Agent Loop Cost

What does each agent run actually cost?

Multi-turn agents accumulate context every step. Cost grows quadratically, not linearly. Model it before you get a $20K overnight bill.

Pricing verified: 2026-07-28 🔴 Highest-risk AI cost pattern

What this calculator does

Model the cost of a multi-turn agent loop — tool calls, context accumulation, retries — where a naive estimate misses 50-80%.

Why use it

Agents have a hidden cost explosion: context grows each turn, and every turn pays for the full accumulated context
Tool-use calls multiply: a 10-step agent loop is not 10x a single call cost — it's often 30x
Prompt caching is the single biggest lever for agent cost — model it here

New to this calculator? Start with the ⚡ Playground — a few sliders, instant ballpark. Then switch to the 🧮 Calculator for your exact number.

Two ways to use this: visualize in the Playground, then get your number in the Calculator.

▶Playground A quick, visual way to see which factors move your result the most. Open the playground → ▤Calculator Enter your real workload for a precise result you can apply to your own usage. Go to the calculator →

What will an agent loop cost?

Tasks × turns × tokens — plus a “runaway tail” for loops that don’t stop. Balanced tier, ~30 days. (Runaway risk fixed at default; see the breakdown.)

Estimated monthly cost

—

Change the input sliders below to see new estimates.

Tasks / day 1,000

Turns / task 6

Tokens / turn 3,000

How we got this estimate

—

💡Base = tasks × turns × tokens × rate. The runaway tail adds the cost of a small fraction of tasks looping to their cap — shown in “Why this number”.

Multi-turn agents accumulate context every turn, and one bad loop shouldn't be a $20K surprise. Set your agent profile below and watch your real per-task cost update instantly.

📊 Not sure of a value? Fields with a ▾ Typical pill offer broad industry ballparks (sourced typical ranges; Agent loops compound context: turn 1 ~2K tokens, turn 10 ~20K. Typical tasks run 4-8 turns; runaways 30-60. The bill is made of the bad days - cap runaway turns/tokens.) so you can move forward now — your result gets more accurate as you replace them with your own measured numbers. Values marked * are rough estimates.

🎮 Interactive Guide & Calculator Playground →

Enter key values

Agent turns per task, tokens per turn, task completion rate, tasks/day.

Model context growth

Pick linear/quadratic context accumulation. Most agents are quadratic — watch out.

Read cost per task

Shows cost explosion from turn 1 → turn N. Total agent cost vs equivalent single-shot.

Act on it

If turns > 10, prompt caching saves dramatically. If context > 100K, switch to summarization pattern.

Enter agent parameters

Turns per task = average tool-call loop depth. Simple agents: 3-5 turns. ReAct agents: 10-20. Planning agents: can be 50+. Tokens per turn = new content added each turn (tool result + reasoning). Task completion rate = % of tasks that succeed (failed tasks retry, doubling cost). Tasks/day drives scale.

Model context growth

Linear growth = agent summarizes context each turn (rare). Quadratic growth = full history passed each turn (common). In quadratic mode, turn N pays for tokens from turns 1..N, so total cost scales as N². A 20-turn agent with 500 fresh tokens/turn ends up passing 10K tokens per call by turn 20 — and you paid for that 10K on every call from turn 1.

Interpret the cost breakdown

Per-task cost shows the progression: turn 1 is cheap, turn N is expensive. Total per-task shows the integral. Monthly is per-task × tasks/day × 30. Comparison to single-shot shows how much "multi-turn premium" you're paying. Prompt cache savings projection shows how much caching the stable prefix (system prompt + early turns) can reduce cost.

Reduce agent cost

Biggest wins: (1) enable prompt caching on the stable context — see Prompt Cache ROI. (2) summarize context every N turns instead of full history. (3) route turns to cheaper models when not reasoning-heavy (Multi-Model Router). (4) watch out for retries — a 20% failure rate doubles total cost if tasks auto-retry.

👇 Now try the calculator below with your own AI workloads

🤖 Your agent setup

Each "task" runs a multi-turn loop until done. Context accumulates across turns.

Pick a typical agent workload — or switch to ⚙️ Advanced to enter exact tokens

Model (the reasoning one) Hint: The agent's main brain. Flagship models cost 3-5x more; only worth it if every step needs deep reasoning.

System prompt + tool definitions (tokens) Hint: Setup text sent every step (instructions + tool list). Turn caching on below or you pay for it each step.

Initial user task (tokens) Hint: The opening task you hand the agent, in words. A sentence or two is typical.

Avg turns per task 10

Hint: Back-and-forth steps per task. Simple ~5, coding ~15, deep research ~30.

Avg tool result size (tokens) Hint: Size of what each tool hands back (file, search result), in words. Piles up every step.

Avg assistant reply per turn (tokens) Hint: What the agent writes back each step, in words. Usually short.

Agent tasks per day Hint: Jobs the agent runs per day at your busiest. Multiplies everything.

Prompt caching (system + tool defs)

Hint: Reuses the setup text instead of re-sending it. Often cuts agent cost 40-60%. Leave on.

Cost per completed agent task

LeanModerateHeavy🔴 Runaway

Daily

—

Monthly

—

Tokens/task

—

Why this result?

Daily spend

Monthly spend

Total tokens / task

Context at final turn

largest single call

📈 Cost accumulates per turn

Why agents get expensive: context grows every turn.

💡 Optimization levers

🔴 Runaway risk

📊 Same workload, different reasoning model

Agent tasks amplify price differences - every turn pays the premium.

Model	$/task	$/day	$/month	$/month (cached)

Agent guardrails playbook → Cache savings calculator → Agent architecture audit →

🎯 Use this result to

🤖 Size your agent budget — Real per-task cost including context bloat from each turn. Most teams underestimate 3-5x.
📈 Spot turn-cost growth — Each turn carries forward all prior context. By turn 8 input cost is 3x turn 1.
🔧 Tune tool overhead — Bloated tool results are silent cost killers. See exactly how much they add per turn.
🔌 Integrate with your AI agents — MCP available for agentic workflow integration. Surface live cost intelligence.

📅 Schedule a call to apply this to your workload

Need help using this calculator for your workloads?

AICost.ai has 50+ calculators and playbooks. Schedule an AvatarVA meeting and we'll work through your real cost scenarios across AI & Cloud: visibility, cost reduction, optimization, forecasting and capacity planning, without sacrificing accuracy or performance.

📅 Schedule an AvatarVA meeting →

Vendor / Model

Field

Why it’s inferred

Anthropic — Claude Sonnet 4.6

cachedInput

Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.

Anthropic — Claude Sonnet 4.5

cachedInput

Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.

Anthropic — Claude Sonnet 4.5

batchInput

Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.

Anthropic — Claude Sonnet 4.5

batchOutput

Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.

Anthropic — Claude Haiku 4.5

cachedInput

Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.

OpenAI — GPT-5.4 Mini

cachedInput

Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.

OpenAI — GPT-5.4 Nano

cachedInput

Derived at 10% of input — OpenAI 90% cache-hit convention.

OpenAI — GPT-5.4 Nano

batchInput

Derived at 50% of input — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Nano

batchOutput

Derived at 50% of output — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Pro

cachedInput

Derived at 10% of input — OpenAI 90% cache-hit convention.

OpenAI — GPT-5.4 Pro

batchInput

Derived at 50% of input — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.4 Pro

batchOutput

Derived at 50% of output — OpenAI Batch API uniform 50% discount.

OpenAI — GPT-5.2

cachedInput

Derived at 10% of input; no residency uplift.

OpenAI — GPT-5.2

batchInput

Derived at 50% of input.

OpenAI — GPT-5.2

batchOutput

Derived at 50% of output.

OpenAI — GPT-5

cachedInput

Derived at 10% of input.

OpenAI — GPT-5

batchInput

Derived at 50% of input.

OpenAI — GPT-5

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.5 Pro

cachedInput

Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.

OpenAI — GPT-5.5 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5.5 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.2 Pro

cachedInput

Derived at 10% of input — pro-tier convention.

OpenAI — GPT-5.2 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5.2 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5.1

batchInput

Derived at 50% of input.

OpenAI — GPT-5.1

batchOutput

Derived at 50% of output.

OpenAI — GPT-5 Pro

batchInput

Derived at 50% of input.

OpenAI — GPT-5 Pro

batchOutput

Derived at 50% of output.

OpenAI — GPT-5 Nano

cachedInput

Derived at 10% of input.

OpenAI — GPT-5 Nano

batchInput

Derived at 50% of input.

OpenAI — GPT-5 Nano

batchOutput

Derived at 50% of output.

Google — Gemini 3 Flash

cachedInput

Derived at 10% of input — Google caching discount convention ~90%.

Google — Gemini 3.1 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 3.1 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 3.1 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.5 Pro

cachedInput

Derived at 10% of input.

Google — Gemini 2.5 Flash

cachedInput

Derived at 10% of input.

Google — Gemini 2.5 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 2.5 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.5 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash

cachedInput

Derived at 25% of input per Google 2.0 family caching rates.

Google — Gemini 2.0 Flash

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash-Lite

cachedInput

Derived at 10% of input — Google caching convention.

Google — Gemini 2.0 Flash-Lite

batchInput

Derived at 50% of input — Google Batch API uniform 50% discount.

Google — Gemini 2.0 Flash-Lite

batchOutput

Derived at 50% of output — Google Batch API uniform 50% discount.

xAI — Grok 4 (legacy)

cachedInput

Extrapolated at 25% of base.

What does each agent run actually cost?

Agent Loop Cost Calculator

Results

💡 Optimization levers

🔴 Runaway risk

Go deeper

Need help using this calculator for your workloads?

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

Immediate steps you can take