Embedding Cost · for RAG builders

What does your RAG setup cost to build + run?

Indexing, re-indexing, query-side embeddings, vector storage. Compare 9 embedding models side-by-side.

Pricing verified: 2026-07-28 · 9 embedding models

What this calculator does

Compare 9 embedding models on indexing + query-time cost for your RAG corpus.

Why use it

Embedding choice affects not just cost but retrieval quality — see both at once
Separate indexing cost (one-time) from query cost (recurring) — most people conflate them
Compare OpenAI, Voyage, Cohere, Gemini, BGE, Nomic side-by-side

New to this calculator? Start with the ⚡ Playground — a few sliders, instant ballpark. Then switch to the 🧮 Calculator for your exact number.

Two ways to use this: visualize in the Playground, then get your number in the Calculator.

▶ Playground: A quick, visual way to see which factors move your result the most. Open the playground →

▤ Calculator: Enter your real workload for a precise result you can apply to your own usage. Go to the calculator →

What will embeddings cost?

Embeddings are cheap. Indexing is a one-time cost amortized over a year; queries are ongoing. One rebuild/year assumed.

Estimated monthly cost

Change the input sliders below to see new estimates.

Docs to embed 500,000

Tokens / doc 1,500

Query embeddings / day 5,000

How we got this estimate

💡Monthly = (one-time index ÷ 12) + ongoing query embeddings. Corpus size sets indexing; queries/day set the ongoing trickle.
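The formula above can be sketched in a few lines of Python. This is an illustration rather than the calculator's actual code: the function name is hypothetical, and because the Playground has no slider for query length, the 20 tokens per query and the 30-day month are assumptions. The $0.02 per 1M tokens rate is the OpenAI 3-small price quoted on this page.

```python
def playground_monthly_cost(docs, tokens_per_doc, queries_per_day,
                            price_per_m_tokens,
                            tokens_per_query=20,   # assumption: not a Playground slider
                            days_per_month=30):    # assumption
    """Monthly = (one-time index / 12) + ongoing query embeddings."""
    # One-time cost of embedding the whole corpus once.
    index_tokens = docs * tokens_per_doc
    index_one_time = index_tokens / 1_000_000 * price_per_m_tokens

    # Recurring cost of embedding each incoming query.
    query_tokens = queries_per_day * days_per_month * tokens_per_query
    query_monthly = query_tokens / 1_000_000 * price_per_m_tokens

    return {
        "index_one_time": index_one_time,
        "index_monthly": index_one_time / 12,  # one rebuild per year, amortized
        "query_monthly": query_monthly,
        "total_monthly": index_one_time / 12 + query_monthly,
    }


# Playground defaults: 500,000 docs, 1,500 tokens/doc, 5,000 queries/day.
estimate = playground_monthly_cost(500_000, 1_500, 5_000, price_per_m_tokens=0.02)
for name, dollars in estimate.items():
    print(f"{name}: ${dollars:.2f}")
```

Under these assumptions the one-time index comes to about $15, or roughly $1.25 a month once amortized, and the query side adds about $0.06 a month.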

Pricing embeddings for a RAG vector database? Tap a typical corpus below or enter your own. We apply real embedding prices, re-indexing, and query volume so you can compare providers honestly.

📊 Not sure of a value? Fields with a ▾ Typical pill offer broad industry ballparks drawn from sourced typical ranges, so you can move forward now. Your result gets more accurate as you replace them with your own measured numbers. Values marked * are rough estimates.

Embedding API prices, Jul 2026 (per 1M tokens):
OpenAI 3-small: $0.02 ($0.01 batch)
OpenAI 3-large: $0.13
Voyage-4: $0.06 (lite: $0.02)
Cohere Embed v3: $0.10
Jina v3: $0.02
Self-hosted BGE: ~$0.001 on spot GPU (crossover at ~$500/mo API spend)

Chunk overlap adds 11-25% billed tokens over raw corpus size.
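One way to arrive at an overlap surcharge in that 11-25% range is the sliding-window approximation sketched below. It assumes fixed-size chunks with a fixed overlap and ignores edge effects at the end of each document. The function name and the 500-token chunk size are illustrative, not taken from the calculator.

```python
def billed_token_multiplier(chunk_tokens, overlap_tokens):
    """Approximate billed tokens / raw corpus tokens for sliding-window chunking.

    Each new chunk advances by (chunk_tokens - overlap_tokens) raw tokens but is
    billed for chunk_tokens, so the ratio is chunk / stride.
    """
    if not 0 <= overlap_tokens < chunk_tokens:
        raise ValueError("overlap must be >= 0 and smaller than the chunk size")
    stride = chunk_tokens - overlap_tokens
    return chunk_tokens / stride


# Hypothetical 500-token chunks with 10% and 20% overlap.
for overlap in (50, 100):
    extra = billed_token_multiplier(500, overlap) - 1
    print(f"overlap {overlap} tokens: about {extra:.0%} extra billed tokens")
```

With a 10% overlap the surcharge is about 11%, and with a 20% overlap it is 25%.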

🎮 Interactive Guide & Calculator Playground →

Enter key values

Docs, tokens/doc, queries/month. Pick an embedding model.

Adjust re-indexing

How often do you re-embed the full corpus? Default: 2x/year.

Read 3-phase cost

Indexing / re-indexing / query-side costs broken out separately.

Pick the right model

Lower $/M isn't always the winner: bigger models may need fewer chunks at the same recall.

Enter corpus stats

Docs = total documents in your RAG corpus. Tokens/doc = average document length in tokens (1500 typical). Queries/month = retrieval calls (user queries that hit the vector DB). Query tokens = avg user query length. These drive indexing (one-time) vs query-side (recurring) cost.

Pick embedding + re-indexing cadence

Embedding model choice matters for quality AND cost. Voyage and Cohere are retrieval-tuned (better recall). OpenAI 3-small is cheap and general. BGE/Nomic are open-source. Re-indexing cadence depends on how often your corpus drifts — stable knowledge bases: 1-2x/year. Fast-changing content: quarterly or more. Dimension truncation (OpenAI 3-large supports it) can cut storage ~50% with minor quality loss.
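Dimension truncation can be illustrated with a small sketch: keep the leading dimensions of a vector and re-normalize to unit length so cosine and dot-product comparisons stay meaningful. This assumes the model was trained so that the leading dimensions carry most of the signal. The helper name is hypothetical, and in practice a provider may expose truncation as a request parameter instead.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` values of an embedding and L2-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return list(head)  # all-zero vector: nothing to normalize
    return [x / norm for x in head]


# Toy 4-dimensional vector cut to 2 dimensions: storage per vector halves.
full = [3.0, 4.0, 0.0, 0.0]
short = truncate_embedding(full, 2)
saving = 1 - len(short) / len(full)
print(short, f"storage saved: {saving:.0%}")
```

Halving the dimension count halves raw vector storage, which is where the ~50% figure comes from. The quality impact depends on the model and should be measured on your own retrieval set.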

Interpret 3-phase cost

Indexing cost (first-time) is usually negligible unless your corpus is huge (10M+ docs). Re-indexing cost is amortized monthly to show the recurring cost of keeping the index fresh. Query-side cost scales with monthly queries and is often dominant for high-traffic RAG. The alternatives table compares all 9 models at your workload.
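The three phases can be sketched as follows. This is a minimal model under stated assumptions, not the calculator's implementation. The `Workload` fields mirror the input labels on this page, but the class and function names are hypothetical. Treating the total as re-indexing plus queries (with first-time indexing reported separately) is an assumption, as is applying the batch discount to indexing only. The 100,000 queries per month in the example is made up.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    docs: int
    tokens_per_doc: int
    queries_per_month: int
    tokens_per_query: int
    reindexes_per_year: float
    price_per_m_tokens: float
    batch_discount: float = 0.0  # e.g. 0.5 where a Batch tier applies to indexing


def three_phase_cost(w: Workload) -> dict:
    corpus_m_tokens = w.docs * w.tokens_per_doc / 1_000_000
    # Assumption: the batch discount applies to indexing only; queries run live.
    indexing = corpus_m_tokens * w.price_per_m_tokens * (1 - w.batch_discount)
    reindex_monthly = indexing * w.reindexes_per_year / 12
    query_m_tokens = w.queries_per_month * w.tokens_per_query / 1_000_000
    query_monthly = query_m_tokens * w.price_per_m_tokens
    return {
        "one_time_indexing": indexing,
        "monthly_reindexing": reindex_monthly,
        "monthly_query": query_monthly,
        "total_monthly": reindex_monthly + query_monthly,
    }


# 30,000 docs of ~2,000 tokens, re-indexed 2x/year, at $0.02 per 1M tokens.
w = Workload(docs=30_000, tokens_per_doc=2_000, queries_per_month=100_000,
             tokens_per_query=20, reindexes_per_year=2, price_per_m_tokens=0.02)
for phase, dollars in three_phase_cost(w).items():
    print(f"{phase}: ${dollars:.2f}")
```

Keeping the phases as separate keys makes it easy to see which one dominates before comparing models.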

Match model to workload

For retrieval quality benchmarks, see the source citations at the bottom of the calc. Pair this with Vector DB Cost for storage side, and RAG Pipeline for the full end-to-end view. Most teams over-pay on embeddings — switching OpenAI 3-small → Voyage-3-lite keeps cost similar with better retrieval.

👇 Now try the calculator below with your own AI workloads

🎛 CALCULATOR

Quick start — tap a typical corpus

📚 Your document corpus

What goes into the vector database.

Number of documents Hint: Files the AI searches. A round number is fine (e.g. 30,000).

Avg tokens / doc Hint: Length of each doc, in tokens (not words). One-pager ~600, typical ~2,000.

Re-indexing frequency Hint: How often docs change. Rarely for handbooks, monthly for fast-moving info.

🔍 Query patterns

Queries / month Hint: Searches per month. A busy-month guess is fine.

Avg tokens / query Hint: Length of a typical question. A short sentence; leave as-is if unsure.

🧬 Your pick

Embedding model Hint: Pick the embedding model. The note below shows its rate.

Use Batch API for indexing (50% off where available) Hint: On if indexing can wait hours (~half price). Off if you need it now.

📈 RESULTS

Total monthly embedding cost

🏗 One-time indexing

🔄 Monthly re-indexing

🔍 Monthly query cost

💾 Vector storage estimate

Total vectors (chunks)

Dimensions

Raw vector size

With index overhead (~1.3x)

Storage note: Pinecone, Weaviate, Qdrant, pgvector - pricing varies widely. Typical: $0.025-$0.30 per GB/mo. Our vector DB cost guide breaks down each option.
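The storage rows above can be approximated with the sketch below. It assumes 4-byte float32 values and decimal gigabytes (1 GB = 1e9 bytes). The 1.3x overhead and the $0.025-$0.30 per GB/mo range come from this page. The function name and the 1,000,000-vector example are illustrative.

```python
def vector_storage_gb(total_vectors, dimensions,
                      bytes_per_value=4,      # assumption: float32 vectors
                      index_overhead=1.3):    # overhead factor quoted above
    """Return (raw_gb, gb_with_index_overhead) for a vector collection."""
    raw_bytes = total_vectors * dimensions * bytes_per_value
    raw_gb = raw_bytes / 1e9
    return raw_gb, raw_gb * index_overhead


# Hypothetical 1,000,000 chunks at 1,536 dimensions.
raw_gb, indexed_gb = vector_storage_gb(1_000_000, 1_536)
low, high = indexed_gb * 0.025, indexed_gb * 0.30
print(f"raw: {raw_gb:.2f} GB, with index: {indexed_gb:.2f} GB")
print(f"storage: ${low:.2f} to ${high:.2f} per month")
```

Managed vector databases often bill on more than raw gigabytes (pods, read units, replicas), so treat this as a floor for comparison and not as a quote.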

💡 Recommendations

📊 Cost across every embedding model

Same corpus + queries, different embedding provider. Current selection highlighted.

Model	Dimensions	Max input	Indexing cost	Monthly cost	Annual cost


🎯 Use this result to

🧬 Plan embedding budget: Index is a one-time spike. Queries are recurring. See the split before picking a provider.
🔄 Justify reindex cadence: Monthly vs quarterly reindex changes annual re-indexing cost 3x (12 vs 4 rebuilds). Math decides the schedule.
🆚 Compare 9 embedding models: OpenAI, Cohere, Voyage, Mistral side by side. Pricing varies 10x across providers.
🔌 Integrate with your AI agents: MCP available for agentic workflow integration. Cost-aware embedding pipelines.
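For the reindex-cadence point, a rough comparison can be sketched as below. It assumes each rebuild re-embeds a fixed fraction of the corpus at the same per-token price. The function name, the `changed_fraction` parameter, and the $15 full-index cost are hypothetical.

```python
def annual_reindex_cost(full_index_cost, rebuilds_per_year, changed_fraction=1.0):
    """Annual re-indexing spend.

    changed_fraction is the share of the corpus re-embedded per rebuild
    (1.0 means a full re-embed every time).
    """
    if not 0 <= changed_fraction <= 1:
        raise ValueError("changed_fraction must be between 0 and 1")
    return full_index_cost * rebuilds_per_year * changed_fraction


full_index_cost = 15.00  # hypothetical cost of embedding the corpus once
quarterly = annual_reindex_cost(full_index_cost, 4)
monthly = annual_reindex_cost(full_index_cost, 12)
print(f"quarterly: ${quarterly:.2f}/yr, monthly: ${monthly:.2f}/yr, "
      f"ratio: {monthly / quarterly:.1f}x")
```

With full rebuilds, cost scales linearly with the number of rebuilds per year, so the cadence decision is mostly about how much freshness is worth.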

📅 Schedule a call to apply this to your workload

📋 What now?

Index once, query forever — indexing is a near one-time cost; the recurring spend is query-side embeddings, so optimize there first when queries/month is high.
Right-size the model — a cheaper or smaller-dimension model (e.g. OpenAI 3-small or a truncated 3-large) often keeps recall while cutting both $/1M and vector storage.
Batch the indexing job — if a re-index can wait hours, the Batch tier is ~50% off on providers that publish it.

📅 Book a working session to apply this to your workload →
