Overage Forecaster

What this calculator does

Forecast your true monthly cost on hybrid (seat + included quota + overage) AI plans. Account for what insiders call the "false sense of security" in 2026 hybrid pricing.

Why use it

Hybrid pricing (seat + included quota + overage) is the dominant model in 2026: Microsoft, GitHub, and Anthropic all switched in Q1.
You can plan for seat fees but not for overage; the calculator separates the two so the realistic bill is visible.
It surfaces which metered dimension you will breach first, before the invoice does.

New to this calculator? Start with the ⚡ Playground — a few sliders, instant ballpark. Then switch to the 🧮 Calculator for your exact number.

Two ways to use this: visualize in the Playground, then get your number in the Calculator.

▶ Playground: A quick, visual way to see which factors move your result the most. Open the playground →
▤ Calculator: Enter your real workload for a precise result you can apply to your own usage. Go to the calculator →

Will you blow your budget this month?

Projects month-end spend from your run rate so far. Trend multiplier fixed at 1× (steady).

Projected overage this month

Change the input sliders below to see new estimates.

Monthly budget 10,000

Spent so far 4,700

Days elapsed 11

How we got this estimate

💡Projected month = (spent ÷ days elapsed) × 30 × trend. Overage is anything above budget — $0 means you’re fine.
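As a worked example, the formula above can be applied to the default slider values shown on this page (budget 10,000, spent 4,700, 11 days elapsed, trend 1×). This is a minimal sketch rather than the calculator's own code; the function name `project_month` and its parameter names are assumptions made for illustration.

```python
def project_month(spent, days_elapsed, budget, days_in_month=30, trend=1.0):
    """Return (projected month-end spend, projected overage)."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be at least 1")
    # Run rate so far, scaled to a 30-day month and adjusted by the trend multiplier.
    projected = spent / days_elapsed * days_in_month * trend
    # Overage is only the part above budget; it never goes negative.
    overage = max(0.0, projected - budget)
    return projected, overage


projected, overage = project_month(spent=4_700, days_elapsed=11, budget=10_000)
print(f"Projected month: ${projected:,.2f}")    # Projected month: $12,818.18
print(f"Projected overage: ${overage:,.2f}")    # Projected overage: $2,818.18
```

At this run rate the month would end about 28% over budget, even though less than half the budget has been spent so far.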

Why this matters: Last week one CIO called hybrid pricing a "false sense of security": seat fees you can plan, overage you can’t. This calculator separates the two so you can see your realistic monthly bill.

📊 Not sure of a value? Fields with a ▾ Typical pill offer broad industry ballparks (sourced typical ranges) so you can move forward now — your result gets more accurate as you replace them with your own measured numbers. Values marked * are rough estimates.

Pick vendor + plan

Vendor and plan determine the included quota and overage rate.

Enter seats

Count active seats, not licensed seats. Each seat carries the plan's base cost.

Estimate usage

Optional but powerful: expected metered usage tells us whether you will breach the included quota.

Read base / overage

Three numbers: base cost (predictable), overage cost (the surprise), total monthly bill. Peer-plan comparison shows whether another vendor is cheaper at your usage.
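The three numbers can be sketched as below. This is an illustration under stated assumptions, not the calculator's implementation: the `Plan` structure, the per-seat quota that pools across the account, and every price in the example are hypothetical and do not describe any real vendor plan.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    name: str
    seat_price: float                                        # monthly fee per seat
    included_per_seat: dict = field(default_factory=dict)    # metric -> included units per seat
    overage_rate: dict = field(default_factory=dict)         # metric -> price per unit over quota


def forecast(plan, seats, usage):
    """Return (base, overage, total) for one month of metered usage."""
    base = plan.seat_price * seats
    overage = 0.0
    for metric, consumed in usage.items():
        # Assumption: per-seat quotas pool across the whole account.
        included = plan.included_per_seat.get(metric, 0.0) * seats
        over = max(0.0, consumed - included)
        overage += over * plan.overage_rate.get(metric, 0.0)
    return base, overage, base + overage


# Hypothetical plan: $30 per seat, 300 requests included per seat, $0.04 per extra request.
example_plan = Plan("Example", 30.0, {"requests": 300}, {"requests": 0.04})
base, overage, total = forecast(example_plan, seats=50, usage={"requests": 18_000})
print(f"base={base:.2f} overage={overage:.2f} total={total:.2f}")
```

With 50 seats the pooled quota is 15,000 requests, so 18,000 consumed requests leave 3,000 billed at the overage rate on top of the fixed seat cost.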

👇 Now try the calculator below with your own AI workloads

1. Pick your plan

Vendor Hint: Which subscription provider you are on.

Plan Hint: Which plan tier you pay for.

2. Your scale

Seats Hint: How many seats you pay for.

Forecast

Base (seats)

Overage

Total / month

By dimension

Per-metric breakdown

Metric	Consumed	Included	Over	Rate	Cost
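A per-metric breakdown with these columns could be computed roughly as follows. The metric names, quotas, and rates are hypothetical. Ranking by utilization (consumed ÷ included) to find the dimension that breaches first is an assumption that usage stays steady, not a documented rule of this calculator.

```python
def breakdown(usage, included, rates):
    """Build one row per metric, sorted so the most-utilized dimension comes first."""
    rows = []
    for metric, consumed in usage.items():
        quota = included.get(metric, 0.0)
        rate = rates.get(metric, 0.0)
        over = max(0.0, consumed - quota)
        if quota > 0:
            utilization = consumed / quota
        else:
            # No included quota: any usage at all is already over.
            utilization = float("inf") if consumed > 0 else 0.0
        rows.append({
            "metric": metric, "consumed": consumed, "included": quota,
            "over": over, "rate": rate, "cost": over * rate,
            "utilization": utilization,
        })
    rows.sort(key=lambda r: r["utilization"], reverse=True)
    return rows


# Hypothetical account-level numbers for two metered dimensions.
rows = breakdown(
    usage={"agent_minutes": 2_000, "premium_requests": 18_000},
    included={"agent_minutes": 5_000, "premium_requests": 15_000},
    rates={"agent_minutes": 0.01, "premium_requests": 0.04},
)
print("Metric\tConsumed\tIncluded\tOver\tRate\tCost")
for r in rows:
    print(f"{r['metric']}\t{r['consumed']}\t{r['included']}\t{r['over']:.0f}\t{r['rate']}\t{r['cost']:.2f}")
```

The first row is the dimension closest to, or furthest past, its quota, which is the one to watch before the invoice arrives.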

vs peer plans (same usage profile)

Plan	Seat	Total/mo	vs target
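The peer-plan comparison holds the usage profile fixed and recomputes the total for each plan. The sketch below shows the idea; the three plans, their prices, and the account-level quotas are invented for illustration, and `compare_plans` is a hypothetical helper name.

```python
def monthly_total(plan, seats, usage):
    """Seat cost plus overage for one plan at the given usage."""
    base = plan["seat"] * seats
    overage = sum(
        max(0.0, used - plan["included"].get(metric, 0.0)) * plan["rate"].get(metric, 0.0)
        for metric, used in usage.items()
    )
    return base + overage


def compare_plans(plans, target_name, seats, usage):
    """Return (plan, seat price, total per month, difference vs target), cheapest first."""
    totals = {p["name"]: monthly_total(p, seats, usage) for p in plans}
    target_total = totals[target_name]
    table = [(p["name"], p["seat"], totals[p["name"]], totals[p["name"]] - target_total)
             for p in plans]
    return sorted(table, key=lambda row: row[2])


# Hypothetical plans; quotas here are account-level pools, not per seat.
plans = [
    {"name": "Plan A", "seat": 30.0, "included": {"requests": 15_000}, "rate": {"requests": 0.04}},
    {"name": "Plan B", "seat": 39.0, "included": {"requests": 50_000}, "rate": {"requests": 0.03}},
    {"name": "Plan C", "seat": 19.0, "included": {"requests": 5_000}, "rate": {"requests": 0.10}},
]
result = compare_plans(plans, target_name="Plan A", seats=50, usage={"requests": 18_000})
for name, seat, total, delta in result:
    print(f"{name}\t{seat:.2f}\t{total:.2f}\t{delta:+.2f}")
```

In this made-up case the plan with the lowest seat price ends up the most expensive once overage is counted, which is the pattern the comparison is meant to expose.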

🎯 Use your Overage Forecaster results to…

🚨 Catch overage early — See which metered dimension pushes you past your plan, before the invoice does.
🔁 Compare peer plans — Check whether a different vendor or tier would carry your usage cheaper.
📊 Right-size seats — Tune seat count and usage to find the plan that fits without overpaying.
🔌 Integrate with your AI agents — MCP available for agentic workflow integration. Plug AICost.ai into your agents to surface real-time cost intelligence.

Pricing data:

✓ Curated · verified today

Hybrid pricing (seat + included quota + overage) is the dominant model in 2026. Microsoft Copilot, GitHub Copilot, and Claude Enterprise all switched to hybrid in Q1 2026, and the seat price is no longer the whole story.

Vendor / Model	Field	Why it’s inferred
Anthropic / Claude Sonnet 4.6	cachedInput	Derived at 10% of input rate; Anthropic publishes a 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5	cachedInput	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5	batchInput	Derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5	batchOutput	Derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5	cachedInput	Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini	cachedInput	Derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier.
OpenAI / GPT-5.4 Nano	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2	cachedInput	Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2	batchInput	Derived at 50% of input.
OpenAI / GPT-5.2	batchOutput	Derived at 50% of output.
OpenAI / GPT-5	cachedInput	Derived at 10% of input.
OpenAI / GPT-5	batchInput	Derived at 50% of input.
OpenAI / GPT-5	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.5 Pro	cachedInput	Derived at 10% of input: OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5.5 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.2 Pro	cachedInput	Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5.2 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.1	batchInput	Derived at 50% of input.
OpenAI / GPT-5.1	batchOutput	Derived at 50% of output.
OpenAI / GPT-5 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5 Nano	cachedInput	Derived at 10% of input.
OpenAI / GPT-5 Nano	batchInput	Derived at 50% of input.
OpenAI / GPT-5 Nano	batchOutput	Derived at 50% of output.
Google / Gemini 3 Flash	cachedInput	Derived at 10% of input; Google caching discount convention (~90%).
Google / Gemini 3.1 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro	cachedInput	Derived at 10% of input.
Google / Gemini 2.5 Flash	cachedInput	Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash	cachedInput	Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy)	cachedInput	Extrapolated at 25% of base.
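The derivation rules in the table above reduce to two multipliers: a cached-input fraction (10% in most rows, 25% in a few) and a 50% batch fraction. A minimal sketch, assuming hypothetical list prices per million tokens; the function name `derive_rates` and the example prices are illustrative and are not any vendor's published rates.

```python
def derive_rates(input_rate, output_rate, cache_fraction=0.10, batch_fraction=0.50):
    """Infer cached and batch rates from standard input/output rates."""
    return {
        "cachedInput": round(input_rate * cache_fraction, 6),
        "batchInput": round(input_rate * batch_fraction, 6),
        "batchOutput": round(output_rate * batch_fraction, 6),
    }


# Hypothetical list prices: 3.00 input and 15.00 output per million tokens.
default_rates = derive_rates(3.00, 15.00)
print(default_rates)

# Rows that use a 25% cached-input convention only change the first multiplier.
legacy_rates = derive_rates(3.00, 15.00, cache_fraction=0.25)
print(legacy_rates["cachedInput"])
```

Keeping the fractions as parameters makes it easy to replace an inferred value with a published one as soon as the vendor documents it.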
