What is the AI cost per token? Definition and business implications
Cost per token is the elementary economic unit of an AI deployment. Providers bill input tokens (your prompt) and output tokens (the model's response) separately, with output typically priced about five times higher than input. Controlling this cost means distinguishing three things: unit prices, consumed volumes, and optimisation levers.
Cost per token is expressed in dollars per million tokens (MTok). Anthropic's prices as of May 2026 illustrate the orders of magnitude (input / output, per MTok):
- Claude Haiku 4.5 (lightweight model): $1 / $5
- Claude Sonnet 4.6 (quality/cost balance): $3 / $15
- Claude Opus 4.7 (flagship model): $5 / $25
The output/input ratio is constant at 5:1 across the Anthropic line-up. Competitors' rates range from comparable to far lower: OpenAI's GPT-5.4 is around $2.50/MTok, DeepSeek V3.2 around $0.14/MTok.
Three optimisation levers exist:
- Batch API: minus 50% on input and output, for non-interactive uses.
- Prompt caching: up to minus 90% on cached inputs (repeated contexts, system prompts).
- Model choice matched to the task: Haiku 4.5 costs one fifth as much as Opus 4.7, which makes it the natural candidate for classification, extraction, or routing use cases.
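To make the unit prices concrete, here is a minimal Python sketch that turns the rates quoted above into a per-request cost. The dictionary keys (`haiku-4.5`, `sonnet-4.6`, `opus-4.7`) and the function name `request_cost_usd` are illustrative labels chosen for this example, not official API model identifiers, and prompt caching is deliberately left out.

```python
# USD per million tokens (MTok), as quoted in the text.
# The keys are illustrative labels, not official model identifiers.
PRICES_USD_PER_MTOK = {
    "haiku-4.5": {"input": 1.0, "output": 5.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "opus-4.7": {"input": 5.0, "output": 25.0},
}

# Batch API lever from the text: minus 50% on input and output.
BATCH_DISCOUNT = 0.50


def request_cost_usd(model, input_tokens, output_tokens, batch=False):
    """Return the cost in USD of one request, ignoring prompt caching."""
    price = PRICES_USD_PER_MTOK[model]
    cost = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # Compare the three models on an 800-in / 400-out exchange.
    for model in PRICES_USD_PER_MTOK:
        print(f"{model}: ${request_cost_usd(model, 800, 400):.4f} per request")
```

Because both input and output prices scale by the same factor between models, the cost ratio between two models is the same for any request shape. This is why the "five times less" comparison between Haiku 4.5 and Opus 4.7 holds regardless of prompt length.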
Concrete example
A 90-employee SME uses an AI assistant via the Claude Sonnet 4.6 API for 3,000 requests per day, 22 days per month. A typical exchange consumes 800 input tokens and 400 output tokens. Monthly calculation:
- Input: 3,000 × 22 × 800 = 52.8 MTok, at $3/MTok = $158.
- Output: 3,000 × 22 × 400 = 26.4 MTok, at $15/MTok = $396.
- Gross total: $554 per month, or about $6,650 per year.
Two levers then reduce the bill:
- Activating prompt caching on the 600 tokens of common context: an effective reduction of about 50% on input cost, for an annual saving of about $950.
- Switching the model to Haiku 4.5 for 40% of requests, the simple ones such as request classification: about $1,800 of further annual saving.
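The arithmetic of this example can be checked with a short Python sketch. All figures come from the example itself; the variable names are illustrative. One assumption is made explicit in the comments: the Haiku saving is measured against the gross Sonnet bill (before caching), which is the reading that matches the figure quoted in the text.

```python
# Figures taken from the worked example; variable names are illustrative.
REQUESTS_PER_DAY = 3_000
DAYS_PER_MONTH = 22
INPUT_TOKENS = 800
OUTPUT_TOKENS = 400

SONNET_INPUT_USD_PER_MTOK = 3.0
SONNET_OUTPUT_USD_PER_MTOK = 15.0

# Haiku 4.5 ($1 / $5) costs one third of Sonnet 4.6 ($3 / $15).
HAIKU_TO_SONNET_PRICE_RATIO = 1 / 3

requests_per_month = REQUESTS_PER_DAY * DAYS_PER_MONTH

# Monthly gross cost on Sonnet 4.6.
input_cost = requests_per_month * INPUT_TOKENS / 1_000_000 * SONNET_INPUT_USD_PER_MTOK
output_cost = requests_per_month * OUTPUT_TOKENS / 1_000_000 * SONNET_OUTPUT_USD_PER_MTOK
monthly_total = input_cost + output_cost
annual_total = monthly_total * 12

# Lever 1: prompt caching, taken as an effective 50% cut on input cost.
caching_saving_per_year = input_cost * 0.50 * 12

# Lever 2: route 40% of requests to Haiku 4.5.
# Assumption: saving is computed against the gross (uncached) bill.
haiku_saving_per_year = monthly_total * 0.40 * (1 - HAIKU_TO_SONNET_PRICE_RATIO) * 12

print(f"Input:  ${input_cost:.2f} per month")
print(f"Output: ${output_cost:.2f} per month")
print(f"Total:  ${monthly_total:.2f} per month, ${annual_total:.2f} per year")
print(f"Caching saving: ${caching_saving_per_year:.2f} per year")
print(f"Haiku saving:   ${haiku_saving_per_year:.2f} per year")
```

Under these assumptions the script gives $158.40 for input, $396.00 for output, $554.40 per month, and annual savings of $950.40 (caching) and $1,774.08 (Haiku routing), which the text rounds to $950 and $1,800. If the Haiku saving were instead applied after caching, it would be somewhat smaller, so the two savings should not be assumed to add up exactly.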
Sources
- Anthropic API pricing page, Claude Haiku 4.5, Sonnet 4.6, Opus 4.7 rates, May 2026. https://www.anthropic.com/pricing
- OpenAI API pricing page, GPT-5.4 and lightweight models, 2026. https://openai.com/api/pricing/