Overage Forecaster

📖 What this is / how to use
What this calculator does

Forecast your true monthly cost on hybrid (seat + included quota + overage) AI plans. Account for what insiders call the "false sense of security" in 2026 hybrid pricing.

Why use it
  • Hybrid pricing (seat + included quota + overage) is the dominant model in 2026: Microsoft, GitHub, and Anthropic all switched in Q1
  • Seat fees you can plan; overage you cannot. The calculator separates them so the realistic bill is visible
  • Surfaces which metered dimension you will breach first, before the invoice does
Pricing data:
✓ Curated · verified today

Hybrid pricing (seat + included quota + overage) is the dominant model in 2026. Microsoft Copilot, GitHub Copilot, and Claude Enterprise all switched to hybrid in Q1 2026, so the seat price is no longer the whole story.

📊 How it works (diagram)
Why this matters: Last week one CIO called hybrid pricing a "false sense of security": seat fees you can plan, overage you can’t. This calculator separates the two so you can see your realistic monthly bill.
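The split between plannable seat fees and unplannable overage can be sketched in a few lines. This is a minimal illustration, not the calculator's actual implementation: the `Metric` fields, the plan numbers, and the assumption that each metered dimension bills only for units beyond its included quota are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    consumed: float  # units used this month
    included: float  # units covered by the plan
    rate: float      # USD per unit beyond the included quota (assumed flat)


def forecast(seats: int, seat_price: float, metrics: list[Metric]) -> dict:
    """Return the base (seat) cost, the overage cost, and their sum."""
    base = seats * seat_price
    # Only usage above the included quota is billed; under-use earns no credit.
    overage = sum(max(0.0, m.consumed - m.included) * m.rate for m in metrics)
    return {"base": base, "overage": overage, "total": base + overage}


# Hypothetical plan: 50 seats at $30 with two metered dimensions.
result = forecast(
    seats=50,
    seat_price=30.0,
    metrics=[
        Metric("premium_requests", consumed=20_000, included=15_000, rate=0.04),
        Metric("agent_minutes", consumed=800, included=1_000, rate=0.10),
    ],
)
print(result)
```

In this made-up profile only premium_requests exceeds its quota, so the overage line comes entirely from that one dimension.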

1. Pick your plan

2. Your scale

Forecast

Base (seats): $0
Overage: $0
Total / month: $0

By dimension

Per-metric breakdown

Metric | Consumed | Included | Over | Rate | Cost

vs peer plans (same usage profile)

Plan | Seat | Total/mo | vs target
🎯 Use your Overage Forecaster results to…
🚨 Catch overage early

See which metered dimension pushes you past your plan, before the invoice does.
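One way to estimate which dimension breaches first is to project month-to-date usage forward. The sketch below assumes usage grows linearly through the month, which real workloads may not do; the function name and the sample figures are hypothetical.

```python
def first_breach(usage: dict, day_of_month: int, days_in_month: int = 30):
    """Return (metric, projected_breach_day) for the earliest breach, or None.

    `usage` maps a metric name to (used_so_far, included_quota).
    Assumes a constant daily burn rate equal to the month-to-date average.
    """
    projected = {}
    for name, (used, included) in usage.items():
        if used <= 0:
            continue  # no usage yet, so no projected breach
        breach_day = included * day_of_month / used
        if breach_day <= days_in_month:
            projected[name] = breach_day
    if not projected:
        return None
    earliest = min(projected, key=projected.get)
    return earliest, projected[earliest]


# Hypothetical month-to-date usage on day 10 of a 30-day month.
breach = first_breach(
    {
        "premium_requests": (6_000, 15_000),  # projected to run out on day 25
        "agent_minutes": (250, 1_000),        # projected day 40, after month end
    },
    day_of_month=10,
)
print(breach)
```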

🔁 Compare peer plans

Check whether a different vendor or tier would carry your usage cheaper.
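A peer-plan comparison amounts to pricing the same usage profile against each plan and reporting the difference from the plan you picked. The plans, prices, and pooled (not per-seat) quotas below are invented for illustration and do not describe any real vendor.

```python
def monthly_total(plan: dict, seats: int, usage: dict) -> float:
    """Seat cost plus overage for one plan under a given usage profile."""
    base = seats * plan["seat_price"]
    overage = sum(
        max(0.0, usage.get(metric, 0.0) - quota["included"]) * quota["rate"]
        for metric, quota in plan["quotas"].items()
    )
    return base + overage


# Hypothetical plans; quotas are treated as pooled across all seats.
plans = {
    "Plan A": {"seat_price": 30.0, "quotas": {"requests": {"included": 15_000, "rate": 0.04}}},
    "Plan B": {"seat_price": 35.0, "quotas": {"requests": {"included": 25_000, "rate": 0.03}}},
}
usage = {"requests": 30_000}
seats = 50
target = "Plan A"

totals = {name: monthly_total(plan, seats, usage) for name, plan in plans.items()}
vs_target = {name: total - totals[target] for name, total in totals.items()}

for name in sorted(totals, key=totals.get):
    print(f"{name}: ${totals[name]:,.2f} ({vs_target[name]:+,.2f} vs {target})")
```

Here the plan with the higher seat price ends up cheaper because its larger included quota absorbs most of the usage.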

📊 Right-size seats

Tune seat count and usage to find the plan that fits without overpaying.

🔌 Integrate with your AI agents

MCP available for agentic workflow integration. Plug AICost.ai into your agents to surface real-time cost intelligence.

📖 Data sources & methodology
171 text models · 9 embeddings · 30 vision · 46 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-13

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-13.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).
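As a rough illustration of how the rules above combine, the sketch below prices a single workload from per-1M-token rates. The rates are hypothetical, and stacking the Batch discount on top of the cached-input rate is a simplifying assumption; whether the two combine depends on the provider.

```python
PER_MILLION = 1_000_000


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,   # USD per 1M input tokens
    output_rate: float,  # USD per 1M output tokens
    *,
    cached_tokens: int = 0,
    cached_fraction: float = 0.10,     # cached input billed at 10% of base (90% off)
    batch: bool = False,               # Batch mode: 50% off standard rates
    residency_multiplier: float = 1.0, # e.g. 1.1 for a regional surcharge
) -> float:
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * input_rate
        + cached_tokens * input_rate * cached_fraction
        + output_tokens * output_rate
    ) / PER_MILLION
    if batch:
        cost *= 0.5  # assumption: the discount applies to the whole request
    return cost * residency_multiplier


# Hypothetical rates: $3 input / $15 output per 1M tokens.
standard = token_cost(2_000_000, 200_000, 3.0, 15.0, cached_tokens=500_000)
batched = token_cost(2_000_000, 200_000, 3.0, 15.0, cached_tokens=500_000, batch=True)
regional = token_cost(2_000_000, 200_000, 3.0, 15.0, cached_tokens=500_000,
                      residency_multiplier=1.1)
print(round(standard, 4), round(batched, 4), round(regional, 4))
```

For an embedding model, pass `output_tokens=0`, since embedding prices are input-only.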

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
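The fallback rule for the last-verified date can be sketched as follows. The function assumes both lists already contain only successful entries, and it does not reflect the actual schema of aicost_pricing_snapshots or aicost_crawler_runs, which is not described here.

```python
from datetime import date
from typing import Optional


def last_verified(snapshot_dates: list[date], crawler_run_dates: list[date]) -> Optional[date]:
    """Prefer the most recent daily snapshot; fall back to the latest crawler run."""
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None  # vendor has never been verified


# A vendor with snapshots uses the newest snapshot, even if a crawler run is newer.
with_snapshots = last_verified(
    [date(2026, 6, 12), date(2026, 6, 13)], [date(2026, 6, 14)]
)
# A vendor with no snapshot yet falls back to the crawler run.
crawler_only = last_verified([], [date(2026, 6, 11)])
print(with_snapshots, crawler_only)
```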

Anthropic
2026-06-13
https://www.anthropic.com/pricing
Daily snapshot since Sep 2023 · 586 days captured
Anthropic Docs
2026-06-13
https://platform.claude.com/docs/en/about-claude/pricing
Daily snapshot since Sep 2023 · 586 days captured
OpenAI
2026-06-13
https://openai.com/api/pricing/
Daily snapshot since Sep 2023 · 587 days captured
Google AI
2026-06-13
https://ai.google.dev/gemini-api/docs/pricing
Daily snapshot since Dec 2023 · 562 days captured
Google Vertex
2026-06-13
https://cloud.google.com/vertex-ai/generative-ai/pricing
Daily snapshot since Dec 2023 · 562 days captured
DeepSeek
2026-06-13
https://api-docs.deepseek.com/quick_start/pricing
Daily snapshot since May 2024 · 501 days captured
xAI
2026-06-13
https://x.ai/api
Daily snapshot since Nov 2024 · 419 days captured
Mistral
2026-06-13
https://mistral.ai/pricing
Daily snapshot since Dec 2023 · 560 days captured
Cohere
2026-06-13
https://cohere.com/pricing
Daily snapshot since Sep 2023 · 586 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
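The derivations listed in this table follow a simple pattern: when a vendor does not publish a field, it is computed as a fixed fraction of the published input or output rate and flagged as inferred. A minimal sketch of that pattern, using hypothetical prices and a hypothetical `fill_inferred` helper, might look like this:

```python
def fill_inferred(prices: dict, cached_fraction: float = 0.10,
                  batch_fraction: float = 0.50):
    """Fill missing discount fields from base rates and report which were inferred.

    `prices` holds USD-per-1M-token rates; missing fields are absent or None.
    """
    derivations = {
        "cachedInput": ("input", cached_fraction),
        "batchInput": ("input", batch_fraction),
        "batchOutput": ("output", batch_fraction),
    }
    filled = dict(prices)
    inferred = []
    for field, (base_field, fraction) in derivations.items():
        if filled.get(field) is None:
            filled[field] = filled[base_field] * fraction
            inferred.append(field)  # these would carry the * mark in tables
    return filled, inferred


# Hypothetical model: batchInput is vendor-published, the other two are not.
filled, inferred = fill_inferred(
    {"input": 2.0, "output": 8.0, "cachedInput": None, "batchInput": 1.0}
)
print(filled, inferred)
```

Rows that use a different convention, such as the 25% cached-input entries, would pass `cached_fraction=0.25` instead of the default.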

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
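A changelog with old and new values can be produced by diffing two consecutive snapshots. The sketch below is one possible shape for that step; the snapshot structure, the `(model, field)` keys, and the record layout are assumptions, not the real aicost_price_changelog schema.

```python
def diff_snapshots(old: dict, new: dict) -> list[dict]:
    """Return one change record per (model, field) whose price differs.

    A value of None means the price was absent from that snapshot.
    """
    changes = []
    for key in sorted(set(old) | set(new)):
        old_value, new_value = old.get(key), new.get(key)
        if old_value != new_value:
            changes.append({"key": key, "old": old_value, "new": new_value})
    return changes


# Hypothetical snapshots keyed by (model, field), in USD per 1M tokens.
yesterday = {("model-a", "input"): 3.0, ("model-a", "output"): 15.0}
today = {
    ("model-a", "input"): 2.5,    # price cut
    ("model-a", "output"): 15.0,  # unchanged, so not logged
    ("model-b", "input"): 1.0,    # newly listed
}
changelog = diff_snapshots(yesterday, today)
for entry in changelog:
    print(entry)
```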