Agent Loop Cost

What does each agent run actually cost?

Multi-turn agents accumulate context every step. Cost grows quadratically, not linearly. Model it before you get a $20K overnight bill.
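Here is a minimal sketch of why the growth is quadratic. It assumes every turn appends roughly the same number of tokens. Let S be the fixed prefix sent on every call (system prompt, tool definitions, task) and Δ the tokens each turn adds to the context (tool output plus the model's reply); both symbols are introduced here for illustration. Turn t re-sends S + (t-1)Δ input tokens, so an N-turn task bills:

```latex
I_{\text{total}} = \sum_{t=1}^{N} \bigl( S + (t-1)\,\Delta \bigr)
                 = N S + \Delta \, \frac{N(N-1)}{2}
```

Once ΔN outgrows S, the second term dominates. At that point, doubling the turn count roughly quadruples input spend.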

Pricing verified: 2026-06-13 · 🔴 Highest-risk AI cost pattern
📖 What this is / how to use
What this calculator does

Model the cost of a multi-turn agent loop (tool calls, context accumulation, retries), where a naive per-call estimate can miss 50-80% of the real cost.

Why use it
  • Agents have a hidden cost explosion: context grows each turn, and every turn pays for the full accumulated context
  • Tool-use calls multiply: a 10-step agent loop does not cost 10x a single call; it often costs 30x
  • Prompt caching is the single biggest lever for agent cost; model it here
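The "not 10x but often 30x" point can be checked with a small sketch. It assumes a fixed prefix plus a constant number of tokens appended per turn, and it counts input tokens only. The profile numbers and the function name are hypothetical.

```python
def loop_input_tokens(prefix: int, per_turn: int, turns: int) -> int:
    """Total input tokens billed across an agent loop.

    Turn t re-sends the fixed prefix plus everything appended by the
    previous t - 1 turns, so the sum grows quadratically in `turns`.
    """
    return sum(prefix + (t - 1) * per_turn for t in range(1, turns + 1))


# Hypothetical profile: 4,000-token prompt + tool definitions,
# 2,000 tokens of tool output and reply appended per turn.
PREFIX, PER_TURN, TURNS = 4_000, 2_000, 10

single_call = loop_input_tokens(PREFIX, PER_TURN, 1)  # one call: just the prefix
loop_total = loop_input_tokens(PREFIX, PER_TURN, TURNS)
print(f"naive estimate: {TURNS}x a single call")
print(f"actual loop:    {loop_total / single_call:.1f}x a single call")
```

With these assumed numbers, the 10-turn loop bills 130,000 input tokens. That is 32.5x the first call, not 10x.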
Multi-turn agents accumulate context every turn, and one bad loop shouldn't be a $20K surprise. Set your agent profile below and watch your real per-task cost update instantly.
🤖 Your agent setup

Each "task" runs a multi-turn loop until done. Context accumulates across turns.

  • Model: flagship models (Opus, GPT-5.4) are 3-5x more expensive than smaller tiers. Is that necessary for every turn?
  • System prompt + tool definitions: repeated across every turn. Cache this or pay for it N times.
  • Turns per task: each turn = 1 model call + 1 tool execution. Coding agents average 8-20 turns; research agents, 15-40.
  • Tool output per turn: added to the context every turn (file reads, API responses, search results).
  • Prompt caching: caches the large system prompt and tool definitions. Often cuts agent cost by 40-60%.
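The inputs above can be combined into a per-task cost with a minimal sketch. The following are assumptions, not the calculator's actual formula:

  • Prices are hypothetical.
  • Each turn appends its tool output and the model's reply to the history.
  • With caching on, only the system prompt + tool definitions are billed at the cached rate, and only from turn 2 onward.
  • Cache-write surcharges are ignored.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """USD per 1M tokens. Values used below are hypothetical, not vendor quotes."""
    input: float
    cached_input: float
    output: float


def cost_per_task(p: Prices, system_tokens: int, turns: int,
                  tool_tokens_per_turn: int, output_tokens_per_turn: int,
                  caching: bool) -> float:
    """USD for one agent task under a simple accumulate-every-turn model."""
    total = 0.0
    for t in range(1, turns + 1):
        # Everything earlier turns appended: tool results plus model replies.
        history = (t - 1) * (tool_tokens_per_turn + output_tokens_per_turn)
        if caching and t > 1:
            # Simplification: the system prompt + tool defs are a cache hit from
            # turn 2 onward; cache-write surcharges and history caching are ignored.
            input_cost = system_tokens * p.cached_input + history * p.input
        else:
            input_cost = (system_tokens + history) * p.input
        total += (input_cost + output_tokens_per_turn * p.output) / 1_000_000
    return total


prices = Prices(input=3.00, cached_input=0.30, output=15.00)
profile = dict(system_tokens=10_000, turns=10,
               tool_tokens_per_turn=1_500, output_tokens_per_turn=500)
print(f"no cache: ${cost_per_task(prices, caching=False, **profile):.3f}")
print(f"cached:   ${cost_per_task(prices, caching=True, **profile):.3f}")
```

Under these assumptions, caching only the prefix takes the task from $0.645 to $0.402, about 38% off. A larger fixed prefix relative to per-turn growth pushes the saving higher.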
Cost per completed agent task
Risk band: Lean · Moderate · Heavy · 🔴 Runaway
  • Daily spend
  • Monthly spend
  • Total tokens / task
  • Context at final turn (the largest single call)
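The headline figures above can be rolled up from one task's profile with a short sketch. It assumes a hypothetical tasks-per-day volume, a 30-day month, and a constant number of tokens appended per turn; the risk-band thresholds are not shown on this page and are left out. The function name is hypothetical.

```python
def rollup(cost_per_task: float, tasks_per_day: int, system_tokens: int,
           turns: int, added_per_turn: int, output_per_turn: int,
           days_per_month: int = 30) -> dict:
    """Roll one task's profile up to the panel's headline figures."""
    # Input re-sent on every turn: fixed prefix plus accumulated history.
    total_input = sum(system_tokens + (t - 1) * added_per_turn
                      for t in range(1, turns + 1))
    return {
        "daily_spend": cost_per_task * tasks_per_day,
        "monthly_spend": cost_per_task * tasks_per_day * days_per_month,
        "total_tokens_per_task": total_input + turns * output_per_turn,
        # The final call carries the whole history, so it is the largest single call.
        "context_at_final_turn": system_tokens + (turns - 1) * added_per_turn,
    }


summary = rollup(cost_per_task=0.645, tasks_per_day=1_000, system_tokens=10_000,
                 turns=10, added_per_turn=2_000, output_per_turn=500)
print(summary)
```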
📈 Cost accumulates per turn

Why agents get expensive: context grows every turn, so each later call re-sends everything that came before it.
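The turn-by-turn accumulation can be sketched in a few lines. The profile (10,000-token prefix, 2,000 tokens added and 500 output tokens per turn) and the prices are hypothetical. The sketch prints each turn's cost next to the running total.

```python
from itertools import accumulate


def per_turn_costs(system_tokens: int, added_per_turn: int, output_per_turn: int,
                   turns: int, input_price: float, output_price: float) -> list[float]:
    """USD cost of each turn; prices are USD per 1M tokens."""
    costs = []
    for t in range(1, turns + 1):
        context = system_tokens + (t - 1) * added_per_turn  # grows every turn
        costs.append((context * input_price + output_per_turn * output_price) / 1_000_000)
    return costs


# Hypothetical profile and prices.
costs = per_turn_costs(10_000, 2_000, 500, turns=10, input_price=3.00, output_price=15.00)
running = list(accumulate(costs))
for turn, (cost, total) in enumerate(zip(costs, running), start=1):
    print(f"turn {turn:2d}: ${cost:.4f}  running total ${total:.4f}")
print(f"last 5 turns' share of the bill: {sum(costs[5:]) / running[-1]:.0%}")
```

Under these assumptions, the tenth turn costs about 2.4x the first. Turns 6-10 carry roughly 62% of the task's cost.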

💡 Optimization levers

    🔴 Runaway risk

      📊 Same workload, different reasoning model

      Agent tasks amplify price differences: every turn pays the premium.
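A "same workload, different model" comparison can be sketched by pricing one fixed token profile under several rate cards. The three tiers and their per-1M-token prices are hypothetical placeholders; substitute current vendor rates. The 190K input / 5K output profile and 1,000 tasks per day are also assumptions.

```python
# Hypothetical USD-per-1M-token prices for three tiers; substitute current vendor rates.
TIERS = {
    "flagship": (15.00, 75.00),
    "mid-tier": (3.00, 15.00),
    "small": (0.80, 4.00),
}


def task_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """USD for one task given total input/output tokens and per-1M prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same agent workload for every tier: 190K input + 5K output tokens per task.
INPUT_TOKENS, OUTPUT_TOKENS, TASKS_PER_DAY = 190_000, 5_000, 1_000

for name, (in_price, out_price) in TIERS.items():
    per_task = task_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    print(f"{name:<9} ${per_task:.3f}/task  ${per_task * TASKS_PER_DAY:,.0f}/day  "
          f"${per_task * TASKS_PER_DAY * 30:,.0f}/month")
```

At these assumed prices, the flagship tier runs about $96,750/month against $19,350 for the mid-tier. Every accumulated input token is billed at the premium rate, so the 5x per-token gap passes straight through to the monthly bill.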

      Model | $/task | $/day | $/month | $/month (cached)

      Go deeper

      Our playbooks on cutting this number.

      • 🔁 Agent Loop Guardrails: stop $20K overnight bills
      • 💾 Prompt Cache ROI: cache the system prompt
      • 📈 Scale Projection: what if tasks 10x?
      • 🧮 Cost Calculator: single-call granularity

      Need help using this calculator for your workloads?

      AICost.ai has 50+ calculators and playbooks. Schedule an AvatarVA meeting and we'll work through your real cost scenarios across AI & Cloud: visibility, cost reduction, optimization, forecasting and capacity planning, without sacrificing accuracy or performance.

      📖 Data sources & methodology

      171 text models · 9 embeddings · 30 vision · 46 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-13

      Methodology

      • All prices are USD per 1 million tokens, current as of 2026-06-13.
      • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
      • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
      • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
      • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
      • Long-context pricing tiers apply when input exceeds the model's long-context threshold.
      • Embedding prices are input-only (no output tokens generated).
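The conventions above can be applied to a single call with a minimal sketch. It makes these assumptions:

  • When no cached rate is given, cached input is billed at 10% of the input rate.
  • The 50% Batch discount is applied to the whole call (whether it stacks with caching differs by provider).
  • An optional residency multiplier stands in for the surcharges that are excluded from base rates.

The function name and example numbers are hypothetical.

```python
from typing import Optional


def call_cost(uncached_input: int, cached_input: int, output: int,
              input_price: float, output_price: float,
              cached_price: Optional[float] = None,
              batch: bool = False, residency_multiplier: float = 1.0) -> float:
    """USD for one call. All prices are USD per 1M tokens."""
    if cached_price is None:
        # Convention when no cached rate is published: 10% of base (90% off).
        cached_price = input_price * 0.10
    cost = (uncached_input * input_price
            + cached_input * cached_price
            + output * output_price) / 1_000_000
    if batch:
        # Assumption: the 50% Batch discount applies to the whole call;
        # whether it stacks with caching differs by provider.
        cost *= 0.5
    # Regional data-residency uplifts (e.g. 1.1x) sit outside base rates.
    return cost * residency_multiplier


# Hypothetical call: 2K fresh + 8K cached input tokens, 500 output tokens.
print(call_cost(2_000, 8_000, 500, input_price=3.00, output_price=15.00))
```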

      Primary sources

      Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
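The last-verified rule above reduces to a small fallback, sketched here. The schemas of aicost_pricing_snapshots and aicost_crawler_runs are not shown on this page, so the sketch assumes each has already been reduced to a list of dates of successful runs for one vendor. The function name is hypothetical.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(snapshot_dates: Sequence[date],
                  crawler_run_dates: Sequence[date]) -> Optional[date]:
    """Prefer the newest successful daily snapshot; fall back to the newest
    successful crawler run; None if the vendor has neither yet."""
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None


print(last_verified([date(2026, 6, 12), date(2026, 6, 13)], [date(2026, 6, 14)]))
```

As described, a snapshot always wins when one exists, even if a crawler run is more recent.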

      • Anthropic: https://www.anthropic.com/pricing · verified 2026-06-13 · daily snapshot since Sep 2023 · 586 days captured
      • Anthropic Docs: https://platform.claude.com/docs/en/about-claude/pricing · verified 2026-06-13 · daily snapshot since Sep 2023 · 586 days captured
      • OpenAI: https://openai.com/api/pricing/ · verified 2026-06-13 · daily snapshot since Sep 2023 · 587 days captured
      • Google AI: https://ai.google.dev/gemini-api/docs/pricing · verified 2026-06-13 · daily snapshot since Dec 2023 · 562 days captured
      • Google Vertex: https://cloud.google.com/vertex-ai/generative-ai/pricing · verified 2026-06-13 · daily snapshot since Dec 2023 · 562 days captured
      • DeepSeek: https://api-docs.deepseek.com/quick_start/pricing · verified 2026-06-13 · daily snapshot since May 2024 · 501 days captured
      • xAI: https://x.ai/api · verified 2026-06-13 · daily snapshot since Nov 2024 · 419 days captured
      • Mistral: https://mistral.ai/pricing · verified 2026-06-13 · daily snapshot since Dec 2023 · 560 days captured
      • Cohere: https://cohere.com/pricing · verified 2026-06-13 · daily snapshot since Sep 2023 · 586 days captured

      Inferred values (marked with * in calculator tables)

      Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

      Vendor | Model | Field | Why it’s inferred
      Anthropic | Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
      Anthropic | Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
      Anthropic | Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
      Anthropic | Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
      Anthropic | Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
      OpenAI | GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
      OpenAI | GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
      OpenAI | GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
      OpenAI | GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
      OpenAI | GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
      OpenAI | GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
      OpenAI | GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
      OpenAI | GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
      OpenAI | GPT-5.2 | batchInput | Derived at 50% of input.
      OpenAI | GPT-5.2 | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5 | cachedInput | Derived at 10% of input.
      OpenAI | GPT-5 | batchInput | Derived at 50% of input.
      OpenAI | GPT-5 | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
      OpenAI | GPT-5.5 Pro | batchInput | Derived at 50% of input.
      OpenAI | GPT-5.5 Pro | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
      OpenAI | GPT-5.2 Pro | batchInput | Derived at 50% of input.
      OpenAI | GPT-5.2 Pro | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5.1 | batchInput | Derived at 50% of input.
      OpenAI | GPT-5.1 | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5 Pro | batchInput | Derived at 50% of input.
      OpenAI | GPT-5 Pro | batchOutput | Derived at 50% of output.
      OpenAI | GPT-5 Nano | cachedInput | Derived at 10% of input.
      OpenAI | GPT-5 Nano | batchInput | Derived at 50% of input.
      OpenAI | GPT-5 Nano | batchOutput | Derived at 50% of output.
      Google | Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
      Google | Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
      Google | Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
      Google | Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
      Google | Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
      Google | Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
      Google | Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
      Google | Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
      Google | Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
      Google | Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
      Google | Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
      Google | Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
      Google | Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
      Google | Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
      Google | Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
      xAI | Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
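Filling the inferred fields from base rates can be sketched as follows, using the stated conventions (cached input = 10% of base, Batch = 50% of base). The function fills only fields the vendor did not publish and reports which ones it filled, i.e. the ones a table would mark with *. The function name and example rates are hypothetical.

```python
from typing import Dict, Set, Tuple

CACHED_FRACTION = 0.10  # cached input = 10% of base (90% off)
BATCH_FRACTION = 0.50   # Batch API = 50% of base (50% off)


def fill_inferred(rates: Dict[str, float],
                  cached_fraction: float = CACHED_FRACTION
                  ) -> Tuple[Dict[str, float], Set[str]]:
    """Fill unpublished cachedInput/batchInput/batchOutput from base rates.

    Returns the completed rates and the set of fields that were inferred.
    """
    filled = dict(rates)
    derived = {
        "cachedInput": rates["input"] * cached_fraction,
        "batchInput": rates["input"] * BATCH_FRACTION,
        "batchOutput": rates["output"] * BATCH_FRACTION,
    }
    inferred = set()
    for field, value in derived.items():
        if filled.get(field) is None:  # only fill what the vendor did not publish
            filled[field] = value
            inferred.add(field)
    return filled, inferred


# Hypothetical model with published batch rates but no cached rate.
rates, marked = fill_inferred({"input": 3.00, "output": 15.00,
                               "batchInput": 1.50, "batchOutput": 7.50})
print(rates["cachedInput"], sorted(marked))
```

Passing cached_fraction=0.25 covers the rows derived or extrapolated at 25% of base, such as Gemini 2.0 Flash and Grok 4 (legacy).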

      Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
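A changelog entry with old and new values can be produced by diffing two daily snapshots. The sketch below assumes each snapshot is an in-memory mapping from (model, field) to a USD-per-1M price. The real aicost_pricing_snapshots and aicost_price_changelog schemas are not shown here, so the keys, values, and function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (model, field) -> USD per 1M tokens. Keys and values below are hypothetical.
Snapshot = Dict[Tuple[str, str], float]
Change = Tuple[str, str, Optional[float], Optional[float]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Change]:
    """One (model, field, old_value, new_value) row per price that changed,
    appeared (old_value None), or disappeared (new_value None)."""
    changes: List[Change] = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key[0], key[1], before, after))
    return changes


yesterday = {("model-a", "input"): 3.00, ("model-a", "output"): 15.00}
today = {("model-a", "input"): 2.50, ("model-a", "output"): 15.00,
         ("model-b", "input"): 1.00}
for row in diff_snapshots(yesterday, today):
    print(row)
```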