LLM inference cost calculator.
Tokens in, dollars out. Compare Claude, OpenAI, Gemini, Llama side by side. Numbers updated when providers change their pricing.
Input tokens: system prompt + retrieved chunks + history. Most agents land between 1.5k and 6k.
Output tokens: tier-1 support replies average ~300 tokens. Voice transcripts run longer.
Tool calls: each tool call sends the full context again. This is the honest part of agent cost.
Cache hit rate: for providers that support prompt caching, cached input costs ~10% of the regular rate. Big lever.
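To make the caching lever concrete, here is a minimal sketch of how the input bill shrinks as the cache hit rate rises. It assumes cached tokens are billed at 10% of the base input price, as stated above; the function name and the sample hit rates are illustrative, not taken from the calculator.

```python
def input_cost_multiplier(cache_hit: float, cached_rate: float = 0.10) -> float:
    """Fraction of the uncached input bill paid at a given cache hit rate.

    cache_hit   -- share of input tokens served from the prompt cache (0..1)
    cached_rate -- price of a cached token relative to a fresh one (assumed 10%)
    """
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0 and 1")
    # Fresh tokens pay full price; cached tokens pay the discounted rate.
    return (1.0 - cache_hit) + cache_hit * cached_rate


if __name__ == "__main__":
    for hit in (0.0, 0.5, 0.9):
        share = input_cost_multiplier(hit)
        print(f"cache hit {hit:.0%}: pay {share:.0%} of the uncached input bill")
```

Under that assumption, a 50% hit rate cuts the input bill to roughly 55% of the uncached figure, and a 90% hit rate cuts it to roughly 19%. Output tokens are unaffected.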
- Gemini 2.5 Flash · Google · fast · $0.07 in / $0.30 out · $14 / month · $0.0003 / req
- GPT-5 mini · OpenAI · flagship · $0.40 in / $1.60 out · $74 / month · $0.0015 / req
- Llama 3.3 70B · Meta (host) · open · $0.60 in / $0.60 out · $79 / month · $0.0016 / req
- Claude Haiku · Anthropic · fast · $0.80 in / $4.00 out · $162 / month · $0.0032 / req
- Gemini 2.5 Pro · Google · flagship · $1.25 in / $5.00 out · $231 / month · $0.0046 / req
- Mistral Large 2 · Mistral · flagship · $2.00 in / $6.00 out · $335 / month · $0.0067 / req
- Claude Sonnet · Anthropic · flagship · $3.00 in / $15.00 out · $608 / month · $0.0122 / req
- GPT-5 · OpenAI · frontier · $5.00 in / $25.00 out · $1,012 / month · $0.0202 / req
- Claude Opus · Anthropic · frontier · $15.00 in / $75.00 out · $3,038 / month · $0.0608 / req
At your volume, Gemini 2.5 Flash lands at $14 / month. Cheapest is not always the right answer; the model-swap eval in module 04 is how you decide for real.
How to swap models without breaking your agent
How the numbers are calculated.
per_request_in  = input_tokens * tool_calls
per_request_out = output_tokens
cached          = per_request_in * cache_hit
fresh_in        = per_request_in - cached
cost_req        = ( (fresh_in / 1e6) * price_in
                  + (cached / 1e6) * price_in * 0.10
                  + (per_request_out / 1e6) * price_out )
monthly         = cost_req * requests_per_month
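The formula above can be sketched as a small runnable Python module. The `Workload` class and function names are hypothetical, and prices are taken to be dollars per 1M tokens because the formula divides by 1e6. The example inputs (2,500 input tokens, 350 output tokens, 2 tool calls, 60% cache hit, 50,000 requests per month) are assumptions. The page does not state them; they were chosen because they reproduce the Claude Sonnet row.

```python
from dataclasses import dataclass

CACHED_INPUT_RATE = 0.10  # cached input billed at ~10% of base, as in the formula


@dataclass(frozen=True)
class Workload:
    input_tokens: int        # context tokens sent on each model call
    output_tokens: int       # tokens generated per request
    tool_calls: int          # how many times the full context is sent
    cache_hit: float         # fraction of input tokens served from cache (0..1)
    requests_per_month: int


def cost_per_request(w: Workload, price_in: float, price_out: float) -> float:
    """Dollar cost of one request; prices are dollars per 1M tokens."""
    per_request_in = w.input_tokens * w.tool_calls
    cached = per_request_in * w.cache_hit
    fresh_in = per_request_in - cached
    return (
        (fresh_in / 1e6) * price_in
        + (cached / 1e6) * price_in * CACHED_INPUT_RATE
        + (w.output_tokens / 1e6) * price_out
    )


def monthly_cost(w: Workload, price_in: float, price_out: float) -> float:
    """Per-request cost scaled by monthly request volume."""
    return cost_per_request(w, price_in, price_out) * w.requests_per_month


if __name__ == "__main__":
    # Assumed inputs, not stated on the page (see the lead-in).
    example = Workload(input_tokens=2500, output_tokens=350, tool_calls=2,
                       cache_hit=0.6, requests_per_month=50_000)
    # Claude Sonnet list prices from the table: $3.00 in / $15.00 out.
    print(f"${cost_per_request(example, 3.00, 15.00):.5f} / req")
    print(f"${monthly_cost(example, 3.00, 15.00):,.2f} / month")
```

With those assumed inputs the sketch gives about $0.01215 per request and about $607.50 per month, which agrees with the Claude Sonnet row after rounding. Note that, as written, the formula yields zero input cost when `tool_calls` is 0, so `tool_calls` is best read as the number of model calls per request.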
- Cached input pricing follows Anthropic / OpenAI's published rate of ~10% of base for prompt caching.
- Tool calls add full re-sends of input context. We don't assume KV-cache discounts beyond prompt caching.
- Doesn't include embedding / retrieval cost. Add ~$0.0001/req for hybrid retrieval if you want the full picture.
- Prices are list. Volume discounts and committed-spend programs aren't modeled.
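For the retrieval note above, a minimal sketch of folding the ~$0.0001/req estimate into a per-request figure. The helper name is hypothetical, and the 50,000 requests per month is an assumed volume used only for illustration.

```python
RETRIEVAL_COST_PER_REQ = 0.0001  # the ~$0.0001/req hybrid-retrieval estimate from the notes


def with_retrieval(cost_req: float, requests_per_month: int) -> tuple[float, float]:
    """Return (per-request, monthly) cost after adding the retrieval estimate."""
    full_req = cost_req + RETRIEVAL_COST_PER_REQ
    return full_req, full_req * requests_per_month


if __name__ == "__main__":
    # $0.0003/req is the Gemini 2.5 Flash row; the volume is an assumption.
    full_req, full_month = with_retrieval(0.0003, 50_000)
    print(f"${full_req:.4f} / req, ${full_month:,.2f} / month")
```

At that assumed volume, retrieval adds about $5 per month regardless of model. That is a visible share of the cheapest model's bill and close to noise for the frontier models.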
The hard part is knowing when a $0.012 agent becomes a $0.14 agent overnight.
Module 10 covers cost regressions, alerting, and the Langfuse dashboard that catches drift before your CFO does.