Last reviewed: May 24, 2026

What is AI model distillation? Definition and business implications

Distillation is a technique that transfers the knowledge of a large AI model (the teacher model) to a smaller model (the student model) while preserving most of the teacher's performance. It makes it possible to deploy lightweight models with a lower inference cost, which can run on more modest infrastructure.

The concept was formalised by Hinton, Vinyals, and Dean in 2015 in the paper Distilling the Knowledge in a Neural Network. The principle: instead of training the student model only on the original labelled data, it is trained to imitate the outputs of the teacher model (its logits, or the probabilities derived from them). The student thus learns not only the right answers, but also the relative confidence the teacher assigns to each alternative. This so-called soft information carries far more learning signal per example than hard (one-hot) labels alone.

DistilBERT (Sanh et al., 2019) is the historical example: it retains 97% of BERT's language-understanding performance while being 40% smaller and 60% faster at inference. In 2026, many of the lightweight models deployed in enterprise, in the class of Mistral 7B, Llama 3.2 1B, and Gemma 2B, are distilled or built with related compression techniques (Llama 3.2 1B, for instance, was produced with pruning and distillation from larger Llama models). Distillation has become a standard process for producing cost-efficient inference models.
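To make the principle concrete, here is a minimal sketch of a distillation loss in plain Python. It is an illustration under stated assumptions, not the exact recipe of any particular model. The class names, the logit values, the temperature of 2.0, and the weighting factor alpha are all hypothetical. The soft term uses a KL divergence between temperature-softened distributions, a common variant of the soft-target objective, scaled by the square of the temperature as suggested in the 2015 paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft term (imitate the teacher) and a hard term
    (predict the true label). alpha and temperature are illustrative."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scale by T^2 so the soft term keeps a comparable magnitude across T.
    soft_term = temperature ** 2 * kl_divergence(soft_teacher, soft_student)
    # Standard cross-entropy against the one-hot label, at temperature 1.
    hard_term = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_term + (1 - alpha) * hard_term


# Hypothetical three-class problem: index 0 = cat, 1 = dog, 2 = car.
teacher_logits = [6.0, 3.5, -2.0]   # "cat", with "dog" a plausible runner-up
good_student = [5.0, 3.0, -1.5]     # same ranking as the teacher
bad_student = [5.0, -1.5, 3.0]      # same top answer, but prefers "car" to "dog"

loss_good = distillation_loss(good_student, teacher_logits, true_label=0)
loss_bad = distillation_loss(bad_student, teacher_logits, true_label=0)
print(f"good student: {loss_good:.4f}  bad student: {loss_bad:.4f}")
```

Both students assign the same probability to the correct class, so their hard-label loss is identical. Only the soft term separates them: it penalises the student that ranks the alternatives differently from the teacher, which is the extra signal that one-hot labels cannot provide.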

Concrete example

A 50-employee customer-service SME handles 8,000 conversations per month with an AI assistant. With a flagship model (Claude Sonnet 4.6, GPT-5.4), the monthly inference cost is about 320 euros. By migrating to a lighter model of close quality (Llama 3.3 70B Instruct, or a distilled Mistral model), the cost falls to about 35 euros per month, a saving of 285 euros per month or roughly 3,400 euros per year for the same use case. The quality loss, measured on 200 annotated conversations, is 3 points on the first-contact resolution rate, a gap small enough to go largely unnoticed by users.
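The arithmetic behind this example can be checked with a short sketch. The three input figures come from the example above. The variable names are illustrative, and the per-conversation costs are derived values that the example itself does not state.

```python
# Figures taken from the example above.
CONVERSATIONS_PER_MONTH = 8_000
FLAGSHIP_MONTHLY_EUR = 320.0
DISTILLED_MONTHLY_EUR = 35.0

# Derived values (not stated in the example itself).
monthly_saving = FLAGSHIP_MONTHLY_EUR - DISTILLED_MONTHLY_EUR
annual_saving = monthly_saving * 12
reduction_pct = monthly_saving / FLAGSHIP_MONTHLY_EUR * 100
flagship_per_conversation = FLAGSHIP_MONTHLY_EUR / CONVERSATIONS_PER_MONTH
distilled_per_conversation = DISTILLED_MONTHLY_EUR / CONVERSATIONS_PER_MONTH

print(f"Monthly saving: {monthly_saving:.0f} EUR")
print(f"Annual saving: {annual_saving:.0f} EUR")
print(f"Cost reduction: {reduction_pct:.1f}%")
print(f"Cost per conversation: {flagship_per_conversation:.4f} EUR "
      f"vs {distilled_per_conversation:.4f} EUR")
```

The exact annual saving comes to 3,420 euros, which the example rounds to 3,400. The cost per conversation drops from 4 cents to under half a cent.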

Sources

Distilling the Knowledge in a Neural Network, Hinton, Vinyals & Dean, arXiv:1503.02531, 2015. https://arxiv.org/abs/1503.02531 (accessed 2026-05-24)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Sanh et al., arXiv:1910.01108, 2019. https://arxiv.org/abs/1910.01108 (accessed 2026-05-24)
