a8A8gent
HomeToolsCost calculator
Tool

LLM inference cost calculator.

Tokens in, dollars out. Compare Claude, OpenAI, Gemini, Llama side by side. Numbers updated when providers change their pricing.

Per request
2,500
100100k

System prompt + retrieved chunks + history. Most agents land 1.5k - 6k.

350
208k

Tier-1 support replies average ~300 tokens. Voice transcripts run longer.

2
110

Each tool call sends the full context again. The honest part of agent cost.

Per month
50,000
1002M
Cache hit rate
60%
0%95%

For supporting providers, cached input is ~10% of regular cost. Big lever.

Side by side
Estimated monthly inference cost
Pricing snapshot · 2026.04
  • G
    Gemini 2.5 Flash
    Google · fast · $0.07 in / $0.30 out
    $14
    $0.0003 / req
  • O
    GPT-5 mini
    OpenAI · flagship · $0.40 in / $1.60 out
    $74
    $0.0015 / req
  • M
    Llama 3.3 70B
    Meta (host) · open · $0.60 in / $0.60 out
    $79
    $0.0016 / req
  • A
    Claude Haiku
    Anthropic · fast · $0.80 in / $4.00 out
    $162
    $0.0032 / req
  • G
    Gemini 2.5 Pro
    Google · flagship · $1.25 in / $5.00 out
    $231
    $0.0046 / req
  • M
    Mistral Large 2
    Mistral · flagship · $2.00 in / $6.00 out
    $335
    $0.0067 / req
  • A
    Claude Sonnet
    Anthropic · flagship · $3.00 in / $15.00 out
    $608
    $0.0122 / req
  • O
    GPT-5
    OpenAI · frontier · $5.00 in / $25.00 out
    $1,012
    $0.0202 / req
  • A
    Claude Opus
    Anthropic · frontier · $15.00 in / $75.00 out
    $3,038
    $0.0608 / req
Bars are relative to the most expensive model in your set.See methodology
For your numbers
Gemini 2.5 Flashcheapest at this volume

At your volume, Gemini 2.5 Flash lands $14 / month. Cheapest is not always the right answer; the model-swap eval in module 04 is how you decide for real.

How to swap models without breaking your agent
Methodology

How the numbers are calculated.

per_request_in  = input_tokens * tool_calls
per_request_out = output_tokens
cached          = per_request_in * cache_hit
fresh_in        = per_request_in - cached

cost_req = (fresh_in / 1e6) * price_in
         + (cached   / 1e6) * price_in * 0.10
         + (per_request_out / 1e6) * price_out

monthly  = cost_req * requests_per_month
  • - Cached input pricing follows Anthropic / OpenAI's published rate of ~10% of base for prompt caching.
  • - Tool calls add full re-sends of input context. We don't assume KV-cache discounts beyond prompt caching.
  • - Doesn't include embedding / retrieval cost. Add ~$0.0001/req for hybrid retrieval if you want the full picture.
  • - Prices are list. Volume discounts and committed-spend programs aren't modeled.
Cost is the easy part

The hard part is knowing when a $0.012 agent becomes a $0.14 agent overnight.

Module 10 covers cost regressions, alerting, and the Langfuse dashboard that catches drift before your CFO does.