Methodology: Batch vs Realtime

How we keep this honest

Every number on aicost.ai is verified by 11 independent audit layers that run every day at 03:30 EDT — covering structural integrity, math correctness, source-side freshness, and cross-source agreement. We publish today's snapshot date and per-vendor verification timestamps below so you can verify any number yourself.

calculators

vendors verified

cited claims

634

days of history

audit layers

The 11 audit layers

Layer 1: Architecture (12 structural invariants)
Layer 2: Smoke test (every calc page renders)
Layer 3: Golden values (math correctness vs reference)
Layer 4: Source resilience (independent reference data sources reachable)
Layer 5: Math gotchas (static code analysis)
Layer 6: Hybrid reconciliation (cross-source agreement)
Layer 7: Drift detection (day-over-day price changes)
Layer 8: Vendor cache (per-vendor freshness wiring)
Layer 9: Cross-vendor reachability (live vendor pricing page probes)
Layer 10: Rendered HTML drift (calc page DOM contracts, 45 pages daily)
Layer 11: Pricing freshness (cron heartbeat + per-vendor age tracking)

All 11 layers must pass before any pricing data is considered fresh. The suite runs daily and publishes its results to an internal dashboard. If any layer flags an issue, fixing it is treated as stop-the-line work.
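A minimal sketch of the "every layer must pass" rule, assuming each layer reports a simple pass/fail result per run. The layer names come from the list above. `LayerResult` and `pricing_is_fresh` are hypothetical names, not the site's actual pipeline. Treating a layer that never reported as a failure is also our assumption.

```python
from dataclasses import dataclass

# Layer names as listed on this page; everything else here is hypothetical.
LAYERS = (
    "Architecture",
    "Smoke test",
    "Golden values",
    "Source resilience",
    "Math gotchas",
    "Hybrid reconciliation",
    "Drift detection",
    "Vendor cache",
    "Cross-vendor reachability",
    "Rendered HTML drift",
    "Pricing freshness",
)


@dataclass(frozen=True)
class LayerResult:
    """One layer's outcome from a daily run (hypothetical record shape)."""
    layer: str
    passed: bool
    detail: str = ""


def pricing_is_fresh(results: list[LayerResult]) -> tuple[bool, list[str]]:
    """Return (fresh, problems); fresh only if every layer reported and passed."""
    by_layer = {r.layer: r for r in results}
    problems: list[str] = []
    for name in LAYERS:
        result = by_layer.get(name)
        if result is None:
            # Assumption: a layer that did not report counts as a failure.
            problems.append(f"{name}: no result")
        elif not result.passed:
            problems.append(f"{name}: {result.detail or 'failed'}")
    return (not problems, problems)
```

Any non-empty `problems` list would block publication rather than let a partially checked snapshot through.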

How this calculator sources its numbers

Every value falls into one of five categories. Numbers without an asterisk are vendor-published, directly observable, or computed by arithmetic on published data. Numbers marked with * are typical best-target values — we state the working range and invite you to override with your own number.

Vendor-published: directly from the vendor's pricing or docs page. No asterisk.

Published benchmark: an independent benchmark (e.g. Chatbot Arena, ANN-Benchmarks, vLLM), cited with its date.

Research paper: peer-reviewed or widely accepted research (e.g. LLMLingua, RAGAS).

Typical target (*): no single canonical source exists. We state the working range and explain why.

Computed: arithmetic on vendor-published values (e.g. batch discount × standard rate).

Any individual claim may also be tagged with ^* if its source has not yet been re-verified against the current vendor page — treat such claims as approximate until the next verification cycle resolves them.
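A minimal sketch of how the categories and markers could combine for a "Computed" value, assuming a value carries its category and a re-verified flag. `Category`, `Value`, `label`, and `computed_batch_rate` are hypothetical names, and the rule that a computed value inherits `^*` from any unverified input is our assumption, not a documented policy.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    VENDOR_PUBLISHED = "vendor-published"
    PUBLISHED_BENCHMARK = "published benchmark"
    RESEARCH_PAPER = "research paper"
    TYPICAL_TARGET = "typical target"
    COMPUTED = "computed"


@dataclass(frozen=True)
class Value:
    amount: float
    category: Category
    reverified: bool = True  # False -> shown with ^* until the next cycle


def label(v: Value) -> str:
    """Render a value with the markers described above."""
    text = f"{v.amount:g}"
    if v.category is Category.TYPICAL_TARGET:
        text += "*"
    if not v.reverified:
        text += "^*"
    return text


def computed_batch_rate(standard_rate: Value, batch_fraction: Value) -> Value:
    """Computed value: batch discount x standard rate."""
    return Value(
        standard_rate.amount * batch_fraction.amount,
        Category.COMPUTED,
        # Assumption: a computed value inherits ^* if any input is unverified.
        standard_rate.reverified and batch_fraction.reverified,
    )
```

For a hypothetical $3.00 per 1M-token standard rate and a vendor-published 0.5 batch fraction, `computed_batch_rate` yields 1.5, which `label` renders without an asterisk.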

Vendor verification freshness

Each vendor's pricing page is re-checked independently, on a cadence ranging from daily to weekly. The list below shows when each relevant vendor source was last verified by our automated pipeline.

Vendor source	Last verified	Age	Verified by
mistral	2026-07-28	verified today	auto-pipeline
openai	2026-07-28	verified today	auto-pipeline
xai	2026-07-28	verified today	auto-pipeline
google-ai-plans-compare	2026-07-22	verified 6 days ago	auto-grounded
aws-opensearch	2026-04-26	verified 93 days ago
elastic-cloud	2026-04-26	verified 93 days ago
aws-bedrock-pricing	2026-04-25	verified 94 days ago
azure-openai-pricing	2026-04-25	verified 94 days ago
anthropic-docs	2026-04-17	verified 102 days ago
aws-bedrock-usage-types	2026-04-15	verified 104 days ago
aws-cost-explorer-api	2026-04-15	verified 104 days ago
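The "verified N days ago" ages can be reproduced with plain date arithmetic. This sketch takes 2026-07-28 as the snapshot date (the date of the "verified today" rows). `is_overdue` and its 7-day threshold are hypothetical: the threshold is simply the slowest cadence named above (weekly), not a documented alerting rule.

```python
from datetime import date


def days_since(verified: date, snapshot: date) -> int:
    """Whole days between the last verification and the snapshot date."""
    return (snapshot - verified).days


def age_label(verified: date, snapshot: date) -> str:
    days = days_since(verified, snapshot)
    if days == 0:
        return "verified today"
    if days == 1:
        return "verified 1 day ago"
    return f"verified {days} days ago"


def is_overdue(verified: date, snapshot: date, max_age_days: int = 7) -> bool:
    # Assumption: 7 days matches the slowest cadence named above (weekly).
    return days_since(verified, snapshot) > max_age_days


snapshot = date(2026, 7, 28)
for source, verified in [
    ("mistral", date(2026, 7, 28)),
    ("google-ai-plans-compare", date(2026, 7, 22)),
    ("aws-opensearch", date(2026, 4, 26)),
    ("aws-cost-explorer-api", date(2026, 4, 15)),
]:
    print(source, age_label(verified, snapshot), is_overdue(verified, snapshot))
```

Under that assumed threshold, the rows verified in April would be flagged, while the July rows would not.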

Vendor-published values

Directly from the vendor's own docs. See per-vendor verification dates in the panel above.

Anthropic batch processing is billed at 50% of the standard API prices.

“All usage is charged at 50% of the standard API prices.”

Value: 0.5 (fraction of the standard price)

Source: Anthropic, Message Batches documentation · 2026-07-22

Applies to all models listed in the pricing table.

Most Message Batches API batches complete within 1 hour.

“most batches completing within 1 hour”

Value: 1 hour

Source: Anthropic, Message Batches documentation · 2026-07-08

Stated in 'How the Message Batches API works' section; applies to standard processing.

Amazon Bedrock prices batch inference 50% below on-demand inference for select foundation models (FMs).

“Amazon Bedrock offers select foundation models (FMs) from leading AI providers like Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price compared to on-demand inference pricing.”

Value: 0.5 (fraction of the on-demand price)

Source: AWS, Amazon Bedrock pricing · 2026-07-22

Applies to all supported foundation models in the Batch tier. The Flex tier is also priced at a 50% discount to Standard tier pricing.

Gemini Batch API provides a 50% cost reduction compared to real-time (Standard) pricing.

“Batch API (50% cost reduction)”

Value: 0.5 (fraction of the standard price)

Source: Google, Gemini pricing · 2026-07-22

Applies to input and output token pricing across all models.

Mistral batch processing receives a 50% discount compared to real-time API pricing.

“Batch processing gets a 50% discount.”

Value: 0.5 (fraction of the standard price)

Source: Mistral, pricing page · 2026-07-22

Applies to all batch jobs regardless of model or endpoint; discount is applied to the standard real-time API pricing.

OpenAI Batch API offers a 50% cost discount compared to synchronous APIs.

“Better cost efficiency: 50% cost discount compared to synchronous APIs”

Value: 0.5 (fraction of the standard price)

Source: OpenAI, Batch API documentation · 2026-07-22

Applies to all Batch API endpoints; discount is relative to standard synchronous API pricing.
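Since every discount cited above is 0.5, a batch-vs-realtime comparison reduces to one multiplication. This sketch assumes costs are linear in token counts; the `BATCH_FRACTION` table mirrors the claims above, while `job_cost`, `compare`, and the $3.00/$15.00 rates are hypothetical illustration values, not any vendor's current price list.

```python
# Batch price as a fraction of the standard price, per the claims above.
# Bedrock's discount applies to select models only.
BATCH_FRACTION = {
    "anthropic": 0.5,
    "aws-bedrock": 0.5,
    "gemini": 0.5,
    "mistral": 0.5,
    "openai": 0.5,
}


def job_cost(input_mtok: float, output_mtok: float,
             input_rate: float, output_rate: float,
             fraction: float = 1.0) -> float:
    """Cost in dollars; token counts in millions, rates in $ per 1M tokens."""
    return fraction * (input_mtok * input_rate + output_mtok * output_rate)


def compare(vendor: str, input_mtok: float, output_mtok: float,
            input_rate: float, output_rate: float) -> tuple[float, float, float]:
    """Return (realtime, batch, savings) for one workload."""
    realtime = job_cost(input_mtok, output_mtok, input_rate, output_rate)
    batch = job_cost(input_mtok, output_mtok, input_rate, output_rate,
                     BATCH_FRACTION[vendor])
    return realtime, batch, realtime - batch


# Hypothetical workload: 100M input and 20M output tokens at $3.00 / $15.00 per 1M.
print(compare("anthropic", 100, 20, 3.00, 15.00))  # → (600.0, 300.0, 300.0)
```

Keeping the fraction per vendor, rather than hard-coding 0.5, means a future change on any one vendor's page touches only one entry.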

OpenAI Batch API batches complete within a 24-hour window.

“Fast completion times: Each batch completes within 24 hours (and often more quickly)”

Value: 24 hours (maximum completion window)

Source: OpenAI, Batch API documentation · 2026-07-22

Completion window is fixed at 24 hours; batches may complete sooner.
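The two timing claims differ in kind: OpenAI states a 24-hour completion window, while Anthropic states that most batches finish within 1 hour. This hedged sketch keeps that distinction when checking a job's latency deadline. `Fit` and `batch_fit` are hypothetical names, and the three-way split is our illustration, not a vendor rule.

```python
from enum import Enum
from typing import Optional


class Fit(Enum):
    SAFE = "batch: within the documented window"
    LIKELY = "batch: within typical completion time only"
    REALTIME = "realtime"


def batch_fit(deadline_hours: float,
              window_hours: Optional[float] = None,
              typical_hours: Optional[float] = None) -> Fit:
    """Classify a job by its latency deadline against published batch timings."""
    if window_hours is not None and deadline_hours >= window_hours:
        return Fit.SAFE
    if typical_hours is not None and deadline_hours >= typical_hours:
        # "Most batches" finishing in this time is not a guarantee.
        return Fit.LIKELY
    return Fit.REALTIME
```

With OpenAI's 24-hour window, a 48-hour deadline is `Fit.SAFE` but a 4-hour deadline falls back to realtime; with only Anthropic's typical 1-hour figure, a 4-hour deadline is `Fit.LIKELY`.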

Vertex AI batch requests are billed at 0.5 times the real-time request price for the same model.

“You're charged only for requests that return a 200 response code. Requests returning any other response codes, such as 4xx and 5xx codes, aren't charged for the input or output.”

Value: 0.5 (fraction of the real-time price)

Source: Google, Vertex AI generative AI pricing · 2026-07-22

The quoted passage only covers which response codes are billed; it does not state a batch price or a batch vs. real-time discount. The 0.5 value is therefore inferred, not directly published in this source, and should be treated as approximate until it is re-verified.

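Gemini prompt cache reads are billed at 50% of the base input token price. (The quoted passage below describes the Batch API discount, not cache-read pricing, so it does not directly support this value.)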

“Batch API (50% cost reduction)”

Value: 0.5 (fraction of the base input price)

Source: Google, Gemini pricing · 2026-07-22

Applies to all models supporting Batch API; cost reduction applies to input and output tokens.

Gemini context caching storage fee is $1.00 per 1,000,000 tokens per hour.

“$0.15 | $1.00 / 1,000,000 tokens per hour (storage price)”

Value: $1.00 per 1M tokens per hour

Source: Google, Gemini pricing · 2026-07-22

Storage fee applies to all models with context caching; varies by model tier.
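Storage is billed per token per hour, so its cost scales with both cache size and how long the cache is kept alive. A minimal sketch; `cache_storage_cost` is a hypothetical helper, and its default rate is the $1.00 figure cited above, which the note says varies by model tier, so pass your own tier's rate.

```python
def cache_storage_cost(cached_tokens: int, hours: float,
                       rate_per_mtok_hour: float = 1.00) -> float:
    """Dollar cost of keeping `cached_tokens` in a context cache for `hours`."""
    # Multiply first, then divide by 1M, to limit float rounding.
    return cached_tokens * hours * rate_per_mtok_hour / 1_000_000


# A 200,000-token prompt kept cached for 24 hours at $1.00 / 1M tokens / hour.
print(cache_storage_cost(200_000, 24))  # → 4.8
```

That is about $4.80 per day for storage alone, which is worth weighing against how often the cached prefix is actually read.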

Vendor pricing pages referenced

All vendor-published prices used by this calculator are sourced from the pages below. See the verification panel above for when each was last re-checked.

Inferred values (marked with * in calculator tables)

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
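Most rows above apply one of two fractions to a published base rate: 10% of input for `cachedInput` (25% for Gemini 2.0 Flash and Grok 4) and 50% for `batchInput`/`batchOutput`. This sketch shows that derivation, assuming published rates always override derived ones. `fill_inferred`, `render`, and the example rates are hypothetical, not taken from any vendor page.

```python
def fill_inferred(published: dict[str, float],
                  cache_fraction: float = 0.10,
                  batch_fraction: float = 0.50) -> dict[str, tuple[float, bool]]:
    """Return {field: (rate, inferred)}; published rates always win over derived ones."""
    derived = {
        "cachedInput": published["input"] * cache_fraction,
        "batchInput": published["input"] * batch_fraction,
        "batchOutput": published["output"] * batch_fraction,
    }
    rows = {field: (rate, False) for field, rate in published.items()}
    for field, rate in derived.items():
        if field not in rows:
            rows[field] = (rate, True)  # inferred -> shown with * in tables
    return rows


def render(rate: float, inferred: bool) -> str:
    """Format a $/1M-token rate, appending * when it was inferred."""
    return f"${rate:.4f}" + ("*" if inferred else "")


# Hypothetical model: $2.00 input, $8.00 output, published $0.50 cached input.
rows = fill_inferred({"input": 2.00, "output": 8.00, "cachedInput": 0.50})
for field, (rate, inferred) in rows.items():
    print(field, render(rate, inferred))
```

Here `cachedInput` keeps its published $0.50 with no asterisk, while `batchInput` ($1.0000*) and `batchOutput` ($4.0000*) are derived. For a 25% row, pass `cache_fraction=0.25`.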

