Last reviewed: May 25, 2026

What is an LLM? Definition and business implications

An LLM (Large Language Model) is a type of artificial intelligence trained on text corpora ranging from several hundred billion to tens of trillions of tokens (words or word fragments). It produces natural language by predicting, token by token, a likely continuation of a given text.

An LLM is a very large neural network, typically built on the transformer architecture (Vaswani et al., 2017), trained to predict the next token in a sequence from the preceding ones. This simple objective, repeated over tens of trillions of tokens, is enough to produce models capable of answering questions, drafting texts, translating, reasoning, and coding. The LLM family spans very different sizes, from lightweight models of about 7 billion parameters (Mistral 7B, for instance) to frontier models estimated at more than a trillion parameters (GPT-4, put at 1.76 trillion by unconfirmed architecture leaks). Size is no longer the sole quality criterion: since 2024, well-trained 70-billion-parameter models have rivalled models five to twenty times larger on common benchmarks, at a much lower inference cost.
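To make the next-word objective concrete, here is a minimal sketch using a deliberately tiny stand-in: a bigram model that only looks at the single previous word, whereas a transformer attends to the whole preceding text. The corpus and the helper names (`train_bigram`, `generate`) are hypothetical, and the sketch always picks the most frequent continuation (greedy decoding), while real LLMs work on sub-word tokens and often sample among several probable continuations.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus (toy model, not a transformer)."""
    words = corpus.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Greedy decoding: repeatedly append the most frequent word seen after the last one."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:  # no known continuation: stop early
            break
        next_word, _ = followers.most_common(1)[0]
        words.append(next_word)
    return " ".join(words)


# Hypothetical miniature corpus, for illustration only.
corpus = "the model predicts the next word . the model reads the text ."
model = train_bigram(corpus)
print(generate(model, "the", max_new_words=4))
# prints "the model predicts the model"
```

With only one word of context the output quickly loops back on itself, which is one reason real models condition on far more of the preceding text.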

Concrete example

The original transformer, published by Google in 2017, contained 65 million parameters. GPT-3, unveiled by OpenAI in 2020, had 175 billion, that is about 2,700 times more in three years. Growth has continued since: Llama 3.1 (Meta) reaches 405 billion parameters as an open-weight model, and GPT-4, reportedly a mixture-of-experts architecture, totals about 1.76 trillion parameters according to public estimates (OpenAI has not disclosed the figure). Yet in 2026, one of the models most often cited for its quality-to-cost ratio on public benchmarks such as MMLU is Llama 3.3, with 70 billion parameters, which rivals models several times its size at a far lower inference cost.
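The growth figure is easy to check. A minimal sketch using only the parameter counts quoted above; the helper `growth_factor` is hypothetical, and the per-year value is simply the constant annual factor that would compound to the same total, not an observed trend.

```python
# Parameter counts as quoted in the text above.
TRANSFORMER_2017 = 65e6  # original transformer (base configuration)
GPT3_2020 = 175e9        # GPT-3


def growth_factor(new: float, old: float, years: int) -> tuple[float, float]:
    """Return (total growth, equivalent constant growth per year)."""
    total = new / old
    per_year = total ** (1 / years)
    return total, per_year


total, per_year = growth_factor(GPT3_2020, TRANSFORMER_2017, years=3)
print(f"{total:,.0f}x in total, about {per_year:.1f}x per year")
# prints "2,692x in total, about 13.9x per year"
```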

Three implications

Choosing an LLM is closer to an infrastructure choice than a software choice, with three implications for executives.

First, parameter size primarily determines inference cost, and only secondarily quality. A well-trained 70-billion-parameter model covers 80 to 90% of enterprise use cases at five to ten times lower cost than a premium model, so systematically test mid-sized models before paying for the flagships.

Second, three families coexist: proprietary (GPT, Claude, Gemini), open-weight (Llama, Mistral, DeepSeek), and sovereign (Mistral AI in Europe). The categories overlap, and the choice affects cost, data confidentiality, and supplier dependence.

Third, a model's performance also depends on its tokeniser, its context window, and its robustness to adversarial instructions. No single benchmark captures all three.
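The link between parameter count and inference cost can be sketched with two common rules of thumb: 16-bit weights take about 2 bytes per parameter, and a forward pass costs roughly 2 floating-point operations (FLOPs) per parameter per generated token. This is a rough approximation for dense models only, assumed here for illustration; it ignores the attention cache, batching, quantisation, and context length, and for a mixture-of-experts model the compute term would use only the parameters active per token. The helper names are hypothetical.

```python
BYTES_PER_PARAM_16BIT = 2  # fp16/bf16 weights (assumption: no quantisation)


def weight_memory_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Memory needed just to hold the weights, in gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def flops_per_token(n_active_params: float) -> float:
    """Rule of thumb for a forward pass: about 2 FLOPs per active parameter per token."""
    return 2 * n_active_params


for name, n in [("7B", 7e9), ("70B", 70e9), ("405B", 405e9)]:
    print(f"{name}: ~{weight_memory_gb(n):,.0f} GB of weights, "
          f"~{flops_per_token(n):.1e} FLOPs per generated token")
```

Under these assumptions both memory and compute scale linearly with size, so a 405-billion-parameter model needs roughly 5.8 times the hardware of a 70-billion-parameter one for each generated token.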

Sources

Attention Is All You Need, Vaswani et al., NeurIPS 2017. https://arxiv.org/abs/1706.03762 (accessed 2026-05-24)
Language Models are Few-Shot Learners, Brown et al., NeurIPS 2020. https://arxiv.org/abs/2005.14165 (accessed 2026-05-24)
