Last reviewed: May 24, 2026

What is AI alignment? Definition and business implications

Alignment is the set of techniques used to steer an AI model's behaviour towards the goals and values of its users or its publisher. It turns a raw model, capable of producing anything, into a useful, honest assistant that refuses requests that violate the rules set for it.

Alignment takes place during the post-training stage. It relies on three main techniques.

Supervised instruction tuning: the model is shown thousands of examples of good responses to varied instructions, so that it learns to follow instructions.

Reinforcement learning from human feedback (RLHF, formalised by Christiano et al. in 2017 and used by OpenAI from InstructGPT in 2022): human evaluators rank the model's responses, and a reward model trained on these rankings steers the final model.

Constitutional AI (Anthropic, 2022): part of the human feedback is replaced by a set of written principles that the model uses to critique and revise its own responses.

Alignment remains an open scientific problem. It does not guarantee the absence of undesirable behaviour; it reduces its probability. The boundary between alignment (steering behaviour) and technical guardrails (blocking output) is porous, and the two approaches complement each other.
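The reward-model step of RLHF can be illustrated with a toy sketch. It assumes a pairwise preference loss of the form -log(sigmoid(reward_preferred - reward_rejected)), which is the general shape used for reward models trained on rankings. Everything else here is a simplifying assumption for illustration: the two hand-made features, the linear scoring function and the names score, preference_loss and train_reward_model are hypothetical, and real reward models are neural networks that score full text.

```python
import math

# Hypothetical features describing a response to an investment question:
# [asks_framing_questions, gives_binding_advice]
Features = list[float]


def score(weights: Features, features: Features) -> float:
    """Linear reward: a weighted sum of the response features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(margin)): small when the preferred response scores higher."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit weights so that each preferred response outscores the rejected one."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # The derivative of the loss with respect to the margin is -sigmoid(-margin).
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# One human ranking: the response that asks framing questions was preferred
# over the response that gives binding advice.
pairs = [([1.0, 0.0], [0.0, 1.0])]
weights = train_reward_model(pairs, n_features=2)
```

After training, the weight on asking framing questions is positive and the weight on giving binding advice is negative, so the toy reward model reproduces the evaluators' ranking. In a full RLHF pipeline, a reward signal of this kind is then used to fine-tune the language model itself.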

Concrete example

Compare the same query sent to a raw pre-trained model (GPT-3 davinci in 2020) and to an aligned descendant (ChatGPT in 2022): “How should I invest 10,000 euros?”. The raw model simply continues the text with statistically likely words: sometimes a list of financial products without context, sometimes text that is useless for a real decision. The aligned model asks framing questions (time horizon, risk profile, existing assets), declines to give binding financial advice, and refers the user to a professional. This difference does not stem mainly from a change in raw capability, but from 6 to 9 months of alignment work by hundreds of people.

Three implications

Alignment explains why two models with comparable technical capabilities can behave in radically different ways. This has three implications for executives.

First, choosing a provider is not only a choice of technical capability; it is also a choice of alignment doctrine. Anthropic publishes an explicit constitution, OpenAI publishes its behaviour specifications, Google and Meta publish less. How transparent a provider is about alignment determines how much trust you can place in its model.

Second, the provider's standard alignment reflects its choices, not yours. For sensitive use cases (health, legal, regulated finance), complementary alignment (a reinforced system prompt, application-level guardrails, human validation) is essential.

Third, alignment does not remove biases or hallucinations; it makes them statistically less likely. A deployment that relies only on the model's alignment, with no downstream application controls, is a fragile deployment.
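A downstream application control of the kind mentioned above can be sketched as follows. This is a minimal illustration, not a recommended implementation: the keyword lists, the Decision type and the review_output function are all hypothetical, and a production guardrail would typically use a dedicated classifier or policy engine instead of keyword matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists for the sensitive domains named in the text.
SENSITIVE_KEYWORDS = {
    "health": ("diagnosis", "dosage", "prescription"),
    "legal": ("lawsuit", "contract clause", "liability"),
    "finance": ("invest", "portfolio", "loan"),
}


@dataclass
class Decision:
    action: str            # "deliver" or "human_review"
    domain: Optional[str]  # sensitive domain that triggered the review, if any


def review_output(model_reply: str) -> Decision:
    """Route replies that touch a sensitive domain to human validation."""
    text = model_reply.lower()
    for domain, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return Decision(action="human_review", domain=domain)
    # Nothing sensitive detected: the reply can be delivered as-is.
    return Decision(action="deliver", domain=None)


flagged = review_output("The recommended dosage is 20 mg per day.")
passed = review_output("Here is a summary of the meeting.")
```

In this sketch, flagged is held for human validation in the health domain while passed is delivered directly. The check runs after the model has answered, so it does not depend on the provider's alignment choices.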

Sources

Constitutional AI: Harmlessness from AI Feedback, Bai et al., Anthropic, arXiv:2212.08073, 2022. https://arxiv.org/abs/2212.08073 (accessed 2026-05-24)
Training language models to follow instructions with human feedback (InstructGPT), Ouyang et al., OpenAI, arXiv:2203.02155, 2022. https://arxiv.org/abs/2203.02155 (accessed 2026-05-24)
