Advanced architectures
The techniques that separate an amateur deployment from a production-grade one: RAG, fine-tuning, distillation, MoE, MCP, APIs. A lexicon of technical trade-offs for executives.
- AI API: An AI API is a technical interface that lets a software application send requests to an AI model hosted by a provider and retrieve its responses. It is the standard way for enterprises to access AI, as opposed to hosting the model locally.
- Distillation: Distillation is a technique that transfers the knowledge of a large AI model (teacher model) to a smaller model (student model), while preserving most of the performance. It enables the deployment of lightweight models with reduced inference cost, viable on more modest infrastructures.
- Fine-tuning: Fine-tuning is a technique for adapting an already-trained AI model by continuing its training on a dataset specific to your use case. It modifies the model's internal parameters, in contrast to RAG, which leaves the model unchanged and simply injects context at query time.
- Function calling: Function calling is the ability of an AI model to request the execution of predefined functions or tools that act on an external system. The model does not run the function itself: instead of free text, it returns a structured object (typically JSON) naming the function and its arguments. The application then calls the function and reinjects the result into the conversation.
- MCP (Model Context Protocol): MCP (Model Context Protocol) is an open standard, introduced by Anthropic in November 2024, that lets an AI model connect to data sources and external tools in a uniform way. It avoids writing specific connectors for every model-application combination.
- MoE (Mixture of Experts): Mixture of Experts (MoE) is an AI model architecture that splits parts of the network into specialised sub-networks, called experts. For each token processed, a router dynamically selects a few experts, leaving the others inactive. The model has the capacity of a large model but a per-token compute cost closer to that of a smaller one, although all experts must still be held in memory.
- Open-source model: An open-source AI model is a foundation model whose weights and architecture are freely downloadable and usable under a permissive licence (for example Apache 2.0 or MIT). It contrasts with the proprietary model (Claude, GPT, Gemini), accessible only via API. The choice has implications for sovereignty, cost, and long-term flexibility.
- RAG (Retrieval-Augmented Generation): RAG (Retrieval-Augmented Generation) is an AI architecture that pairs a search engine over your documents with a generative model. The model answers by relying on citable business data rather than on its training knowledge alone.
- Vector database: A vector database is a database specialised in the storage and retrieval of vectors (embeddings). For a given query, it finds the semantically closest content in a corpus, without requiring an exact lexical match. It is the typical search engine of a RAG system.
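
To make the RAG and vector database entries concrete, here is a minimal sketch of the retrieve-then-generate pattern. Everything in it is illustrative: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `CORPUS`, `retrieve`, and `build_prompt` are hypothetical names rather than any product's API. A real system would obtain embeddings from an embedding model and delegate the search to a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: (text, embedding). The vectors are invented for illustration.
CORPUS = [
    ("Refunds are processed within 14 days.", [0.9, 0.1, 0.0]),
    ("Our offices are closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Enterprise plans include priority support.", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, top_k=1):
    """Return the top_k texts whose embeddings are closest to the query."""
    ranked = sorted(CORPUS, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, query_vec):
    """Inject the retrieved passages into the prompt sent to the generative model."""
    context = "\n".join(retrieve(query_vec, top_k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

query_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of the question
prompt = build_prompt("How long do refunds take?", query_vec)
print(prompt)
```

The question shares no distinctive keyword rule with the corpus here; the match comes from vector proximity alone, which is the point of semantic search.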
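
The function calling entry describes a round trip: the model emits a structured request, the application runs it, and the result goes back into the conversation. The sketch below shows only the application side, under stated assumptions: `get_order_status`, `TOOLS`, and `handle_tool_call` are hypothetical names, and the JSON shape (`name`, `arguments`) is a simplified stand-in, since each provider defines its own exact format.

```python
import json

def get_order_status(order_id: str) -> dict:
    """Hypothetical business function the model is allowed to request."""
    return {"order_id": order_id, "status": "shipped"}

# Registry of functions exposed to the model (names are illustrative).
TOOLS = {"get_order_status": get_order_status}

def handle_tool_call(model_output: str) -> str:
    """Parse the model's structured request, run the function, return the result as JSON."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]           # only registered functions can be called
    result = func(**call["arguments"])   # the application, not the model, executes it
    return json.dumps({"name": call["name"], "result": result})

# What a model might return instead of free text (assumed format).
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A-1042"}}'
tool_message = handle_tool_call(model_output)
print(tool_message)
```

Keeping an explicit registry means the model can only trigger actions the application has chosen to expose.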
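
The routing step in the MoE entry can be sketched in a few lines. This is a toy illustration, not the implementation of any particular model: the experts are simple arithmetic functions, and the router scores are hard-coded, whereas a real model computes them from each token with a learned layer. The names `route` and `moe_layer` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalise their weights to sum to 1."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs; the others stay inactive."""
    return sum(w * experts[i](x) for i, w in route(router_scores, k))

# Four toy experts; only two of them run for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
scores = [0.1, 2.0, -1.0, 1.5]  # assumed router output for one token
selected = route(scores, k=2)
output = moe_layer(3.0, experts, scores, k=2)
print(selected, output)
```

With four experts and `k=2`, half of the expert computation is skipped for this token, which is where the compute saving comes from.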
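
For the distillation entry, one common way to transfer knowledge from teacher to student is to train the student to match the teacher's softened output probabilities. The sketch below computes that matching loss for a single example. It is an assumption about how distillation might be set up, not a description of any specific vendor's method; the logits are invented and `distillation_loss` is a hypothetical name.

```python
import math

def softmax(logits, temperature=1.0):
    """Probabilities from logits; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]        # invented teacher logits for one input
close_student = [3.8, 1.1, 0.4]  # student that nearly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # student that disagrees

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close, loss_far)
```

Training drives this loss down, so the student learns not just the teacher's top answer but also how it ranks the alternatives.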