Gemini API Pricing: Full Breakdown of Costs (Jun 2026)
In this guide, you'll learn what every current Gemini model costs, how the billing works (including the things that can make your bill higher than expected), the most effective ways to cut costs, and the options for using Gemini models for free.
How much does the Gemini API cost?
Google's flagship model, Gemini 3.1 Pro, costs $2.00 per 1 million input tokens and $12.00 per 1 million output tokens for prompts up to 200K tokens. The cheapest current-generation model, Gemini 3.1 Flash-Lite, costs $0.25 per 1 million input tokens and $1.50 per 1 million output tokens.
Here's the quick view of the models most people ask about:
| Model | Best for | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Gemini 3.1 Pro | Flagship: reasoning, agentic work, coding | $2.00 | $12.00 |
| Gemini 3.5 Flash | Frontier intelligence with speed and grounding | $1.50 | $9.00 |
| Gemini 3 Flash | Fast, capable production default | $0.50 | $3.00 |
| Gemini 3.1 Flash-Lite | High-volume simple tasks: translation, extraction, agents | $0.25 | $1.50 |
| Gemini 2.5 Flash-Lite | Cheapest model in the lineup (prior generation) | $0.10 | $0.40 |
| Gemini 3.1 Flash Image (Nano Banana) | Image generation | $0.50 | $0.067 per image at 1K resolution |
| Veo 3.1 | Video generation | — | $0.40 per second |
Two things set Gemini's pricing apart from other providers' pricing. First, the Gemini API has a recurring free tier: the Flash-family models can be used free of charge within rate limits, with the trade-off that free-tier content is used to improve Google's products. Second, the flagship has two price levels: prompts over 200K tokens bill at $4.00 input and $18.00 output, which is 2x and 1.5x the standard rates respectively.
All prices in this guide come from Google's official pricing documentation. Next, let's look at how the pricing works.
"Gemini pricing" can mean three different products. The consumer Gemini app has subscription plans (Free, AI Plus, AI Pro, AI Ultra) for the chat interface; these don't include API access. The Gemini API, accessed through Google AI Studio, is the pay-per-token developer product this article covers. Vertex AI serves the same models through Google Cloud with enterprise features like provisioned throughput and volume discounts, and its prices can differ from the Gemini API; check the Vertex AI pricing page if you're on Google Cloud.
How Gemini API pricing works
The Gemini API uses pay-as-you-go, per-token billing on the paid tier: you're charged for the tokens you send (input) and the tokens the model generates (output). A token is roughly 4 characters or 0.75 English words.
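To make the per-token arithmetic concrete, here is a minimal sketch of a cost estimator. The helper names (`estimateTokens`, `requestCost`) are hypothetical and not part of any Google SDK; the 4-characters-per-token ratio is only the rule of thumb above, and the example rates are the Gemini 3 Flash prices listed in this guide.

```javascript
// Rough rule of thumb from the text: about 4 characters per token.
// This is only an estimate; billed counts are whatever the API reports.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cost in USD for one request, given token counts and per-1M-token rates.
function requestCost(inputTokens, outputTokens, inputRate, outputRate) {
  return (inputTokens / 1e6) * inputRate + (outputTokens / 1e6) * outputRate;
}

// Example: a 2,000-character prompt and a 500-token answer on Gemini 3 Flash
// ($0.50 input / $3.00 output per 1M tokens).
const promptTokens = estimateTokens("x".repeat(2000));
const cost = requestCost(promptTokens, 500, 0.5, 3.0);
console.log(promptTokens, cost.toFixed(6));
```

A single request costs a fraction of a cent at these rates, which is why the examples later in this guide multiply by monthly volume before comparing models.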
The free tier works differently: input and output are free of charge on supported models, limited by per-model rate limits, and your content is used to improve Google's products. On the paid tier, it isn't. This data distinction is stated directly on Google's pricing page for every model.
Thinking tokens are billed as output
Google's pricing tables label the output price as "including thinking tokens," which means the model's internal reasoning bills at the output rate even though you don't see it. A short visible answer can carry thousands of billed thinking tokens on a hard prompt. Thinking-capable models expose a thinking level setting; lowering it for simple tasks reduces output token counts directly.
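As a rough illustration of how hidden reasoning changes the bill, the sketch below prices the output side of one request with and without thinking tokens. The function name and the token counts are hypothetical; only the $12.00 output rate for Gemini 3.1 Pro comes from this guide.

```javascript
// Output cost in USD when thinking tokens bill at the same rate as visible output.
function outputCost(visibleTokens, thinkingTokens, outputRatePerMillion) {
  return ((visibleTokens + thinkingTokens) / 1e6) * outputRatePerMillion;
}

// Hypothetical request on Gemini 3.1 Pro ($12.00 per 1M output tokens):
// a 200-token visible answer backed by 4,000 thinking tokens.
const withThinking = outputCost(200, 4000, 12.0);
const visibleOnly = outputCost(200, 0, 12.0);
console.log(withThinking.toFixed(4), visibleOnly.toFixed(4));
```

In this made-up case the thinking tokens account for about 95% of the output charge, which is the effect a lower thinking level is meant to reduce.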
Context caching has a storage meter
Cached input costs 90% less than fresh input: $0.20 instead of $2.00 per 1M tokens on Gemini 3.1 Pro, $0.15 instead of $1.50 on 3.5 Flash. The part most articles skip: explicit caching also bills a storage charge per token per hour that the cache exists. Storage costs $4.50 per 1M tokens per hour on 3.1 Pro and $1.00 per 1M tokens per hour on the Flash models.
The arithmetic matters. Storing 1M tokens on 3.1 Pro for one hour costs $4.50, which is more than two fresh reads of the same content at $2.00 each. Caching pays off when the cached content is read frequently within a short window; a cache that sits unused is a meter running. Create caches for high-frequency repeated context, set short lifetimes, and delete caches when a job finishes.
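The break-even point can be worked out directly. The sketch below is a simplified model with hypothetical function names: it compares cached reads against fresh reads of the same content and subtracts the hourly storage charge, ignoring any cost of creating the cache in the first place.

```javascript
// Net hourly saving (USD) from explicit caching versus sending fresh input.
// Rates are per 1M tokens; storageRate is per 1M tokens per hour.
function cacheNetSavingPerHour(cachedTokens, readsPerHour, freshRate, cachedRate, storageRate) {
  const millions = cachedTokens / 1e6;
  const savedOnReads = readsPerHour * millions * (freshRate - cachedRate);
  const storage = millions * storageRate;
  return savedOnReads - storage;
}

// Smallest whole number of reads per hour at which savings cover storage.
function breakEvenReadsPerHour(freshRate, cachedRate, storageRate) {
  return Math.ceil(storageRate / (freshRate - cachedRate));
}

// Gemini 3.1 Pro figures from this guide: $2.00 fresh, $0.20 cached, $4.50 storage.
console.log(breakEvenReadsPerHour(2.0, 0.2, 4.5));
console.log(cacheNetSavingPerHour(1e6, 2, 2.0, 0.2, 4.5).toFixed(2));
console.log(cacheNetSavingPerHour(1e6, 10, 2.0, 0.2, 4.5).toFixed(2));
```

Under these assumptions a 1M-token cache on 3.1 Pro loses $0.90 an hour at two reads, needs three reads an hour to come out ahead, and saves $13.50 an hour at ten reads.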
Multimodal input is tokenized at its own rates
Images, video, and PDFs convert to tokens and bill at the model's standard input rate on current models (PDFs technically bill at the image token rate, which currently equals the text rate). Audio input costs more: $0.50 per 1M tokens on 3.1 Flash-Lite versus $0.25 for text, and $1.00 versus $0.50 on Gemini 3 Flash. Voice and audio workloads need their own estimate, because an estimate built on text rates comes out at half the real audio input cost.
What makes your bill higher than expected
The 200K threshold on Pro models
Gemini 3.1 Pro and 2.5 Pro have two price levels. Prompts up to 200K tokens bill at the standard rate; prompts above 200K bill at 2x input and 1.5x output ($4.00/$18.00 on 3.1 Pro). The Flash models have flat pricing at any context length. If your workload routinely sends large contexts, this threshold decides whether Pro costs $2 or $4 per million input tokens, and chunking a 300K-token job into two requests can cost less than sending it whole.
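The chunking claim is easy to check with a small sketch. The object and function names below are hypothetical, and the code assumes, as described above, that the higher rates apply to the whole request once the prompt exceeds 200K tokens.

```javascript
// Gemini 3.1 Pro rates from this guide, per 1M tokens.
const PRO_TIERS = {
  threshold: 200_000,
  standard: { input: 2.0, output: 12.0 },
  long: { input: 4.0, output: 18.0 },
};

// Cost in USD of one Pro request, picking the price level by prompt size.
function proRequestCost(promptTokens, outputTokens, tiers = PRO_TIERS) {
  const rates = promptTokens > tiers.threshold ? tiers.long : tiers.standard;
  return (promptTokens / 1e6) * rates.input + (outputTokens / 1e6) * rates.output;
}

// One 300K-token prompt versus two 150K-token prompts, 2,000 output tokens each.
const whole = proRequestCost(300_000, 2_000);
const chunked = 2 * proRequestCost(150_000, 2_000);
console.log(whole.toFixed(3), chunked.toFixed(3));
```

With these example numbers the single request costs about $1.24 and the two chunks about $0.65 in total, even though the chunked version produces twice the output. This only helps when the job can actually be split without losing context the model needs.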
Grounding bills per search query, not per request
Grounding with Google Search on Gemini 3 models includes 5,000 free prompts per month (shared across the Gemini 3 family), then costs $14 per 1,000 search queries. The detail that surprises people: one request can trigger multiple search queries, and each query is charged. Gemini 2.5 models use an older scheme: 1,500 free requests per day, then $35 per 1,000 grounded prompts. One thing Google does not charge for: retrieved search context is not billed as input tokens, unlike OpenAI's web search tool where it is.
Grounding with Google Maps follows the same structure: 5,000 free prompts per month on Gemini 3 models, then $14 per 1,000 queries.
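To see why per-query billing matters, here is a minimal estimator for the Gemini 3 scheme. It rests on a loudly stated assumption: the first 5,000 grounded prompts in a month are free no matter how many searches they trigger, and every search query after that is billed. Check Google's pricing page for how the allowance is actually applied. The function name is hypothetical.

```javascript
// Estimated monthly grounding charge in USD for Gemini 3 models.
// queriesPerPrompt: one entry per grounded prompt, in the order they were sent.
// ASSUMPTION: the first `freePrompts` prompts are free regardless of how many
// searches they trigger; afterwards every search query is billed.
function groundingCost(queriesPerPrompt, freePrompts = 5000, pricePerThousandQueries = 14) {
  const billableQueries = queriesPerPrompt
    .slice(freePrompts)
    .reduce((sum, queries) => sum + queries, 0);
  return (billableQueries / 1000) * pricePerThousandQueries;
}

// Hypothetical month: 8,000 grounded prompts, each triggering 3 searches.
const month = new Array(8000).fill(3);
console.log(groundingCost(month));
```

In this hypothetical month the 3,000 prompts beyond the allowance trigger 9,000 queries, for $126. A budget that assumed one charge per request would have predicted $42.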
Media generation is billed per image and per second
Image generation on Gemini 3.1 Flash Image (Nano Banana) costs $0.045 to $0.151 per image depending on resolution ($0.067 for a standard 1K image). The higher-quality Gemini 3 Pro Image costs $0.134 per 1K or 2K image and $0.24 per 4K image. Imagen 4 is priced flat per image: $0.02 (Fast), $0.04 (Standard), $0.06 (Ultra).
Video generation on Veo 3.1 costs $0.40 per second at 720p or 1080p and $0.60 per second at 4K; Veo 3.1 Fast costs $0.15 and $0.35. An 8-second standard clip is $3.20. You're only charged for successfully generated videos. Text-to-speech costs $10.00 per 1M audio output tokens on the Flash TTS model and $20.00 on Pro TTS, and the Lyria 3 music model costs $0.08 per full song.
At these rates, media dominates mixed workloads: one 8-second Veo clip costs as much as roughly 1M output tokens on Gemini 3 Flash.
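The comparison can be reproduced with a short sketch. The rate table uses the Veo prices quoted above; the model keys and function names are hypothetical labels chosen for this example, not API identifiers.

```javascript
// Veo 3.1 per-second prices from this guide, in USD.
// "standard" covers 720p and 1080p.
const VEO_RATES = {
  "veo-3.1": { standard: 0.4, "4k": 0.6 },
  "veo-3.1-fast": { standard: 0.15, "4k": 0.35 },
};

// Cost in USD of one generated clip.
function clipCost(model, seconds, resolution = "standard") {
  return seconds * VEO_RATES[model][resolution];
}

// How many output tokens the same money buys at a given per-1M output rate.
function equivalentOutputTokens(usd, outputRatePerMillion) {
  return Math.round((usd / outputRatePerMillion) * 1e6);
}

const clip = clipCost("veo-3.1", 8);
// Compare against Gemini 3 Flash output at $3.00 per 1M tokens.
console.log(clip.toFixed(2), equivalentOutputTokens(clip, 3.0));
```

An 8-second standard clip comes to $3.20, which buys a little over one million output tokens on Gemini 3 Flash.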
Other tools
Code execution has no session fee; the generated code and its results bill as output tokens, then as input tokens when the model reuses them while reasoning. URL context bills the fetched content as input tokens with no per-call fee. File search charges $0.15 per 1M tokens for embedding documents at indexing time, and retrieved tokens bill as regular input. The Deep Research agent bills all of its intermediate reasoning and tool loops at Gemini 3 Pro rates, so a single research task can consume far more tokens than a normal request.
How to reduce Gemini API costs
The methods below are ordered by impact, with the most effective first.
1. Pick the right model for the job
The current lineup spans 8x on input between Flash-Lite and Pro ($0.25 to $2.00), and the prior-generation 2.5 Flash-Lite stretches that to 20x at $0.10. A simple decision guide:
- Translation, extraction, classification, high-volume agent steps → Gemini 3.1 Flash-Lite at $0.25/$1.50, or 2.5 Flash-Lite at $0.10/$0.40 if you don't need the newest generation.
- Production chat and most workloads → Gemini 3 Flash at $0.50/$3.00.
- Harder tasks that still need speed and grounding → Gemini 3.5 Flash at $1.50/$9.00.
- The hardest reasoning and agentic work → Gemini 3.1 Pro at $2.00/$12.00.
Routing most traffic to a Flash model and escalating hard cases reduces spend more than any other change on this list.
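One way to express the decision guide above is a small routing table with an escalation ladder. This is a sketch under assumptions: the task category names are invented for illustration, the model strings are display names rather than API model IDs, and how you detect a "hard case" is left to your application.

```javascript
// Hypothetical task categories mapped to the models recommended above.
const ROUTES = new Map([
  ["translation", "Gemini 3.1 Flash-Lite"],
  ["extraction", "Gemini 3.1 Flash-Lite"],
  ["classification", "Gemini 3.1 Flash-Lite"],
  ["chat", "Gemini 3 Flash"],
  ["grounded_reasoning", "Gemini 3.5 Flash"],
  ["hard_reasoning", "Gemini 3.1 Pro"],
]);

const DEFAULT_MODEL = "Gemini 3 Flash";

// Pick a model for a task; unknown task types fall back to the default.
function pickModel(taskType) {
  return ROUTES.get(taskType) ?? DEFAULT_MODEL;
}

// Cheapest to most expensive, used when a cheaper model's answer is not good enough.
const LADDER = ["Gemini 3.1 Flash-Lite", "Gemini 3 Flash", "Gemini 3.5 Flash", "Gemini 3.1 Pro"];

// Move one rung up the ladder; the top model stays where it is,
// and a name that is not on the ladder starts at the bottom rung.
function escalate(model) {
  const index = LADDER.indexOf(model);
  return LADDER[Math.min(index + 1, LADDER.length - 1)];
}

console.log(pickModel("extraction"), "->", escalate(pickModel("extraction")));
```

Keeping the routing in one table makes it easy to re-point a category when prices or model quality change.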
2. Keep Pro prompts under 200K tokens
Crossing the threshold doubles the input rate and raises output 1.5x. Trim context, summarize history, or split jobs rather than sending oversized prompts to Pro by default. Flash models don't have the threshold, so very long contexts can also simply route to Flash.
3. Use the Batch API for anything that can wait
The Batch API takes 50% off input and output on every model: Gemini 3.1 Pro drops to $1.00/$6.00, Gemini 3 Flash to $0.25/$1.50, and 3.1 Flash-Lite to $0.125/$0.75. The discount also applies to image generation (a 1K Nano Banana image drops to $0.034). Results return asynchronously, so it fits pipelines, evaluations, and bulk processing.
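A quick way to see what the discount is worth for a given job is to price it both ways. The sketch below uses hypothetical helper names and a made-up nightly workload; the Gemini 3 Flash rates and the 50% discount are the figures quoted above.

```javascript
// Batch API discount described above: 50% off both input and output rates.
const BATCH_DISCOUNT = 0.5;

function batchRates(rates) {
  return { input: rates.input * BATCH_DISCOUNT, output: rates.output * BATCH_DISCOUNT };
}

// Cost in USD of a job, given token totals and per-1M-token rates.
function jobCost(inputTokens, outputTokens, rates) {
  return (inputTokens / 1e6) * rates.input + (outputTokens / 1e6) * rates.output;
}

// Hypothetical nightly evaluation run on Gemini 3 Flash: 40M input, 5M output tokens.
const flash = { input: 0.5, output: 3.0 };
const standard = jobCost(40e6, 5e6, flash);
const batched = jobCost(40e6, 5e6, batchRates(flash));
console.log(standard.toFixed(2), batched.toFixed(2));
```

For this example run the bill drops from $35.00 to $17.50 a night, in exchange for waiting on asynchronous results.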
4. Cache repeated context, then delete the cache
Cache reads cost 10% of the input rate. To come out ahead, cache content that gets read repeatedly within its lifetime, account for the hourly storage charge, and delete caches when the workload ends. A long-lived cache of rarely used content costs more than not caching.
5. Control thinking levels
Thinking tokens bill as output. Set the thinking level down for tasks that don't need deep reasoning, and reserve high thinking budgets for the prompts that justify them.
6. Use the free grounding allowance deliberately
The 5,000 free grounded prompts per month, shared across Gemini 3 models, cover a lot of moderate use. Enable grounding only on requests that need live information, and the allowance stretches much further.
Can you use the Gemini API for free?
Yes, and through more routes than with any other major provider. The Gemini API has a recurring free tier of its own, and there are other routes on top of it.
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add Gemini models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have. Unlike the native free tier, there are no rate limits to design around and the Pro models are included.
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // Ask Gemini 3.5 Flash a question through Puter.js; no API key is needed.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "google/gemini-3.5-flash"
    }).then(response => {
      // Render the model's reply into the page.
      document.body.innerHTML = response.message.content;
    });
  </script>
</body>
</html>
To see what this saves, we ran the same numbers we used in our other pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On Gemini 3.5 Flash through the API, our calculation puts that at $22.50 for input and $40.50 for output, about $63 every month, growing linearly with your user base. Through Puter.js, the same app costs you $0 at 500 users, and still $0 at 50,000 users, because each user carries their own usage.
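The calculation behind those figures can be sketched as follows, which also shows how the direct API bill scales with users. The function and field names are hypothetical; the usage profile and the Gemini 3.5 Flash rates are the ones stated above.

```javascript
// Monthly API bill in USD for a chat app, from per-user usage and per-1M-token rates.
function monthlyApiCost(users, usage, rates) {
  const messages = users * usage.messagesPerUser;
  const inputTokens = messages * usage.inputTokensPerMessage;
  const outputTokens = messages * usage.outputTokensPerMessage;
  return (inputTokens / 1e6) * rates.input + (outputTokens / 1e6) * rates.output;
}

const usage = { messagesPerUser: 30, inputTokensPerMessage: 1000, outputTokensPerMessage: 300 };
const flash35 = { input: 1.5, output: 9.0 }; // Gemini 3.5 Flash, per this guide

for (const users of [500, 5000, 50000]) {
  console.log(users, monthlyApiCost(users, usage, flash35).toFixed(2));
}
```

Paying for the API directly, the same profile comes to $63 at 500 users, $630 at 5,000, and $6,300 at 50,000.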
The native free tier
The Gemini API's free tier gives you free input and output tokens on the Flash-family models (including Gemini 3.5 Flash, 3 Flash, and 3.1 Flash-Lite), subject to per-model rate limits, with no credit card required. Google AI Studio itself is free of charge in all available regions. Two caveats to know before building on it. First, free-tier content is used to improve Google's products, so it's not suitable for sensitive or proprietary data. Second, the Pro models are excluded: Gemini 3.1 Pro is paid-tier only. The free tier fits prototyping, personal projects, and low-volume tools; the rate limits page lists the current per-model quotas.
Google Cloud credits
New Google Cloud customers receive trial credits (currently $300) that can pay for Gemini usage through Vertex AI. This is Google Cloud billing rather than the Gemini API, so it suits teams already planning to deploy on Vertex. Check the current terms on Google Cloud's site, since trial amounts and conditions change.
OpenRouter's free endpoints
OpenRouter periodically lists free variants of Gemini models (tagged :free). Free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. Since Gemini already has a native free tier with better limits, OpenRouter mainly makes sense if you're already routing other models through it.
Real-world cost examples
Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads, using the same method as our other pricing guides: estimate tokens per request, multiply by volume, then by the per-million rate.
Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:
| Model | Monthly cost |
|---|---|
| Gemini 3.1 Flash-Lite | ~$5.40 |
| Gemini 3 Flash | ~$10.80 |
| Gemini 3.5 Flash | ~$32.40 |
| Gemini 3.1 Pro | ~$43.20 |
The cheapest model in this table and the flagship differ by 8x on the same workload, which is why model choice comes first. Context caching on the repeated system prompt lowers these further, with the storage charge offsetting part of the saving.
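The table above can be reproduced with a few lines, which makes it easy to swap in your own volumes. The function names are hypothetical; the rates are the paid-tier prices listed in this guide, and caching is not modeled.

```javascript
// Per-1M-token rates (USD) for the four models in the table above.
const RATES = [
  { model: "Gemini 3.1 Flash-Lite", input: 0.25, output: 1.5 },
  { model: "Gemini 3 Flash", input: 0.5, output: 3.0 },
  { model: "Gemini 3.5 Flash", input: 1.5, output: 9.0 },
  { model: "Gemini 3.1 Pro", input: 2.0, output: 12.0 },
];

// Monthly token totals: conversations x messages x tokens per message.
function supportWorkload(conversations, messagesEach, inputPerMessage, outputPerMessage) {
  const messages = conversations * messagesEach;
  return { inputTokens: messages * inputPerMessage, outputTokens: messages * outputPerMessage };
}

// Price one workload on every model in the rate list.
function priceWorkload({ inputTokens, outputTokens }, rates = RATES) {
  return rates.map(({ model, input, output }) => ({
    model,
    monthlyCost: (inputTokens / 1e6) * input + (outputTokens / 1e6) * output,
  }));
}

const workload = supportWorkload(1000, 8, 1200, 250);
for (const row of priceWorkload(workload)) {
  console.log(row.model, row.monthlyCost.toFixed(2));
}
```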
Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. PDFs bill at the same rate as text on current models, so we calculate about $1.15 on Gemini 3 Flash, or about $0.58 through the Batch API.
Daily content generation. For 30 articles a month on Gemini 3.5 Flash, with 2,000-token prompts and roughly 4,000 output tokens each (including thinking tokens), we estimate about $1.17 a month.
Complete Gemini API pricing table
All prices are per 1M tokens, paid tier, standard processing. Where two prices are shown for Pro models, the first applies to prompts up to 200K tokens and the second above 200K.
Text models
| Model | Input | Cached input | Output |
|---|---|---|---|
| Gemini 3.1 Pro | $2.00 / $4.00 | $0.20 / $0.40 | $12.00 / $18.00 |
| Gemini 3.5 Flash | $1.50 | $0.15 | $9.00 |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 |
| Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 |
| Gemini 2.5 Pro | $1.25 / $2.50 | $0.125 / $0.25 | $10.00 / $15.00 |
| Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.01 | $0.40 |
Audio input bills at higher rates (for example, $0.50 on 3.1 Flash-Lite and $1.00 on 3 Flash). Cache storage costs $1.00 per 1M tokens per hour on Flash models and $4.50 on Pro models. The Batch API takes 50% off input and output on all models.
Image, video, and audio generation
| Model | Price |
|---|---|
| Gemini 3.1 Flash Image (Nano Banana) | $0.045–$0.151 per image by resolution ($0.067 at 1K) |
| Gemini 3 Pro Image | $0.134 per 1K/2K image, $0.24 per 4K image |
| Gemini 2.5 Flash Image | $0.039 per image |
| Imagen 4 | $0.02 (Fast), $0.04 (Standard), $0.06 (Ultra) per image |
| Veo 3.1 | $0.40 / second (720p, 1080p), $0.60 / second (4K) |
| Veo 3.1 Fast | $0.15 / second (720p, 1080p), $0.35 / second (4K) |
| Flash TTS / Pro TTS | $10.00 / $20.00 per 1M audio output tokens |
| Lyria 3 (music) | $0.08 per song, $0.04 per 30-second clip |
| Gemini Embedding | $0.15 per 1M tokens ($0.075 batch) |
Tools
| Tool | Price |
|---|---|
| Grounding with Google Search (Gemini 3) | 5,000 prompts / month free, then $14.00 / 1K search queries |
| Grounding with Google Search (Gemini 2.5) | 1,500 requests / day free, then $35.00 / 1K grounded prompts |
| Grounding with Google Maps | 5,000 prompts / month free, then $14.00 / 1K queries |
| Code execution | No fee; code and results bill as tokens |
| URL context | No fee; fetched content bills as input tokens |
| File search | $0.15 / 1M tokens at indexing; retrieved tokens bill as input |
For models not listed here (Live API, computer use, robotics, Gemma open models, and dated snapshots), see Google's full pricing documentation.
Conclusion
Gemini API pricing in 2026 runs from $0.10 per million input tokens on Gemini 2.5 Flash-Lite to $2.00 (or $4.00 above 200K tokens) on Gemini 3.1 Pro, with image generation from $0.02 per image and video from $0.15 per second.
Keeping the bill predictable comes down to a few deliberate choices:
- Route most traffic to a Flash model and escalate only the hard cases.
- Keep Pro prompts under the 200K threshold.
- Push non-urgent work to the Batch API for 50% off.
- Cache only content that gets reused, and account for the hourly storage charge.
- Watch thinking tokens and per-query grounding charges, which don't show up in the headline rates.
Prices here were verified against Google's official pricing pages. Google updates pricing and model availability frequently, so always confirm current rates before committing to a budget.