Gemini API Pricing: Full Breakdown of Costs (Jun 2026)

Nariman Jelveh, Reynaldi Chernando

Updated: June 12, 2026

In this guide, you'll learn what every current Gemini model costs, how the billing works (including the things that can make your bill higher than expected), the most effective ways to cut costs, and the options for using Gemini models for free.

How much does the Gemini API cost?

Google's flagship model, Gemini 3.1 Pro, costs $2.00 per 1 million input tokens and $12.00 per 1 million output tokens for prompts up to 200K tokens. The cheapest current-generation model, Gemini 3.1 Flash-Lite, costs $0.25 per 1 million input tokens and $1.50 per 1 million output tokens.

Here's the quick view of the models most people ask about:

Model	Best for	Input (per 1M tokens)	Output (per 1M tokens)
Gemini 3.1 Pro	Flagship: reasoning, agentic work, coding	$2.00	$12.00
Gemini 3.5 Flash	Frontier intelligence with speed and grounding	$1.50	$9.00
Gemini 3 Flash	Fast, capable production default	$0.50	$3.00
Gemini 3.1 Flash-Lite	High-volume simple tasks: translation, extraction, agents	$0.25	$1.50
Gemini 2.5 Flash-Lite	Cheapest model in the lineup (prior generation)	$0.10	$0.40
Gemini 3.1 Flash Image (Nano Banana)	Image generation	$0.50	$0.067 per image (1K resolution)
Veo 3.1	Video generation	—	$0.40 per second

Two things make Gemini's pricing different from other providers. First, the Gemini API has a recurring free tier: the Flash-family models can be used free of charge within rate limits, with the trade-off that free-tier content is used to improve Google's products. Second, the flagship has two price levels: prompts over 200K tokens bill at $4.00 input and $18.00 output, which is 2x the standard input rate and 1.5x the standard output rate.

All prices in this guide come from Google's official pricing documentation. Next, let's look at how the pricing works.

"Gemini pricing" can mean three different products. The consumer Gemini app has subscription plans (Free, AI Plus, AI Pro, AI Ultra) for the chat interface; these don't include API access. The Gemini API, accessed through Google AI Studio, is the pay-per-token developer product this article covers. Vertex AI serves the same models through Google Cloud with enterprise features like provisioned throughput and volume discounts, and its prices can differ from the Gemini API; check the Vertex AI pricing page if you're on Google Cloud.

How Gemini API pricing works

The Gemini API uses pay-as-you-go, per-token billing on the paid tier: you're charged for the tokens you send (input) and the tokens the model generates (output). A token is roughly 4 characters or 0.75 English words.

The free tier works differently: input and output are free of charge on supported models, limited by per-model rate limits, and your content is used to improve Google's products. On the paid tier, it isn't. This data distinction is stated directly on Google's pricing page for every model.
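On the paid tier, the per-token arithmetic is simple enough to sketch in a few lines. The snippet below is a minimal illustration rather than an official calculator: the keys in `PRICES` are hypothetical labels (not real API model IDs), `tokenCost` and `estimateTokens` are helper names made up for this example, and the rates are the ones quoted in this guide.

```javascript
// Paid-tier prices in USD per 1M tokens, copied from this guide's tables.
// The keys are illustrative labels, not official API model IDs.
const PRICES = {
  "gemini-3.1-pro": { input: 2.0, output: 12.0 }, // prompts up to 200K tokens
  "gemini-3-flash": { input: 0.5, output: 3.0 },
  "gemini-3.1-flash-lite": { input: 0.25, output: 1.5 },
};

// Hypothetical helper: cost in USD for a given number of input and output tokens.
function tokenCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Rough token estimate from text length, using the ~4 characters per token rule of thumb.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(tokenCost("gemini-3-flash", 1_000_000, 1_000_000).toFixed(2)); // 3.50
```

The character-based estimate is only a planning aid; billed token counts come from the API's own tokenizer.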

Thinking tokens are billed as output

Google's pricing tables label the output price as "including thinking tokens," which means the model's internal reasoning bills at the output rate even though you don't see it. A short visible answer can carry thousands of billed thinking tokens on a hard prompt. Thinking-capable models expose a thinking level setting; lowering it for simple tasks reduces output token counts directly.
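To see how much hidden reasoning can weigh on a bill, here is a small sketch using the Gemini 3.1 Pro output rate from this guide. The token counts (200 visible, 4,000 thinking) are illustrative assumptions, not measurements, and `outputCost` is a helper name invented for this example.

```javascript
// Gemini 3.1 Pro output price in USD per 1M tokens (from this guide).
// The rate covers visible output and thinking tokens alike.
const OUTPUT_PRICE_PER_MILLION = 12.0;

// Billed output is the visible answer plus the model's internal reasoning.
function outputCost(visibleTokens, thinkingTokens) {
  const billedTokens = visibleTokens + thinkingTokens;
  return (billedTokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000;
}

// Illustrative numbers: a 200-token answer with and without 4,000 thinking tokens.
const withoutThinking = outputCost(200, 0);
const withThinking = outputCost(200, 4000);

console.log((withThinking / withoutThinking).toFixed(0)); // 21
```

In this made-up case the same visible answer costs about 21 times more in output charges once the reasoning tokens are counted.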

Context caching has a storage meter

Cached input costs 90% less than fresh input: $0.20 instead of $2.00 per 1M tokens on Gemini 3.1 Pro, $0.15 instead of $1.50 on 3.5 Flash. The part most articles skip: explicit caching also bills a storage charge per token per hour that the cache exists. Storage costs $4.50 per 1M tokens per hour on 3.1 Pro and $1.00 per 1M tokens per hour on the Flash models.

The arithmetic matters. Storing 1M tokens on 3.1 Pro for one hour costs $4.50, which is more than two fresh reads of the same content at $2.00 each. Caching pays off when the cached content is read frequently within a short window; a cache that sits unused is a meter running. Create caches for high-frequency repeated context, set short lifetimes, and delete caches when a job finishes.
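The break-even point can be worked out directly from the rates above. The sketch below uses the Gemini 3.1 Pro numbers from this guide and makes one simplifying assumption: it ignores the one-time cost of the request that creates the cache. The function names are hypothetical.

```javascript
// Gemini 3.1 Pro rates from this guide, in USD per 1M tokens.
const PRO_CACHE_RATES = {
  freshInput: 2.0,     // uncached input
  cachedInput: 0.2,    // cached input read
  storagePerHour: 4.5, // explicit cache storage, per hour
};

// Compare sending the same context fresh every time against caching it.
// Assumption: the one-time cost of creating the cache is ignored.
function compareCaching(contextMillions, reads, hours, rates = PRO_CACHE_RATES) {
  const uncached = reads * contextMillions * rates.freshInput;
  const cached =
    reads * contextMillions * rates.cachedInput +
    hours * contextMillions * rates.storagePerHour;
  return { uncached, cached, cachingWins: cached < uncached };
}

// Reads per hour at which cached and uncached costs are equal.
function breakEvenReadsPerHour(rates = PRO_CACHE_RATES) {
  return rates.storagePerHour / (rates.freshInput - rates.cachedInput);
}

console.log(breakEvenReadsPerHour().toFixed(1));  // 2.5
console.log(compareCaching(1, 2, 1).cachingWins); // false: $4.90 cached vs $4.00 fresh
console.log(compareCaching(1, 3, 1).cachingWins); // true: $5.10 cached vs $6.00 fresh
```

Under these assumptions, a 3.1 Pro cache needs at least three reads per hour of its lifetime before it saves money.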

Multimodal input is tokenized at its own rates

Images, video, and PDFs are converted to tokens and, on current models, bill at the model's standard input rate (PDFs use the image token rate, which equals the text rate). Audio input costs more: $0.50 per 1M tokens on 3.1 Flash-Lite versus $0.25 for text, and $1.00 versus $0.50 on Gemini 3 Flash. Voice and audio workloads therefore need their own estimate, because pricing them at text rates understates the input cost by half.

What makes your bill higher than expected

The 200K threshold on Pro models

Gemini 3.1 Pro and 2.5 Pro have two price levels. Prompts up to 200K tokens bill at the standard rate; prompts above 200K bill at 2x input and 1.5x output ($4.00/$18.00 on 3.1 Pro). The Flash models have flat pricing at any context length. If your workload routinely sends large contexts, this threshold decides whether Pro costs $2 or $4 per million input tokens, and chunking a 300K-token job into two requests can cost less than sending it whole.
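The chunking claim is easy to check with the 3.1 Pro rates. The sketch below assumes the whole request bills at the higher rates once the prompt exceeds 200K tokens, as described above, and that the job can be split cleanly into two independent requests with 2,000 output tokens each (an illustrative figure). `proRequestCost` is a hypothetical helper.

```javascript
// Gemini 3.1 Pro price levels from this guide, in USD per 1M tokens.
const PRO_TIERS = {
  threshold: 200_000,
  standard: { input: 2.0, output: 12.0 }, // prompts up to 200K tokens
  long: { input: 4.0, output: 18.0 },     // prompts above 200K tokens
};

// Cost of one request; the prompt size decides which price level applies.
function proRequestCost(promptTokens, outputTokens) {
  const rate = promptTokens > PRO_TIERS.threshold ? PRO_TIERS.long : PRO_TIERS.standard;
  return (promptTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// One 300K-token prompt versus two 150K-token prompts.
const whole = proRequestCost(300_000, 2_000);
const split = 2 * proRequestCost(150_000, 2_000);

console.log(whole.toFixed(3)); // 1.236
console.log(split.toFixed(3)); // 0.648
```

Splitting only helps when the task does not need all 300K tokens in view at once, so treat it as an option to evaluate rather than a default.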

Grounding bills per search query, not per request

Grounding with Google Search on Gemini 3 models includes 5,000 free prompts per month (shared across the Gemini 3 family), then costs $14 per 1,000 search queries. The detail that surprises people: one request can trigger multiple search queries, and each query is charged. Gemini 2.5 models use an older scheme: 1,500 free requests per day, then $35 per 1,000 grounded prompts. One thing Google does not charge for: retrieved search context is not billed as input tokens, unlike OpenAI's web search tool where it is.

Grounding with Google Maps follows the same structure: 5,000 free prompts per month on Gemini 3 models, then $14 per 1,000 queries.
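A rough estimate shows why the per-query detail matters. This sketch is built on an assumption that should be checked against Google's documentation: it treats the first 5,000 grounded prompts each month as entirely free and bills every search query triggered by later prompts at $14 per 1,000. The average queries-per-prompt figure is something you would measure in your own logs; `groundingCost` is a hypothetical helper.

```javascript
// Gemini 3 grounding figures from this guide.
const FREE_PROMPTS_PER_MONTH = 5_000;
const PRICE_PER_1000_QUERIES = 14.0;

// Assumption: prompts inside the free allowance cost nothing, and each
// prompt beyond it triggers avgQueriesPerPrompt billable search queries.
function groundingCost(groundedPrompts, avgQueriesPerPrompt) {
  const billablePrompts = Math.max(0, groundedPrompts - FREE_PROMPTS_PER_MONTH);
  const billableQueries = billablePrompts * avgQueriesPerPrompt;
  return (billableQueries / 1000) * PRICE_PER_1000_QUERIES;
}

console.log(groundingCost(20_000, 1)); // 210
console.log(groundingCost(20_000, 3)); // 630
console.log(groundingCost(4_000, 3));  // 0
```

The same 20,000 grounded prompts cost three times as much when each one fans out into three searches.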

Media generation is billed per image and per second

Image generation on Gemini 3.1 Flash Image (Nano Banana) costs $0.045 to $0.151 per image depending on resolution ($0.067 for a standard 1K image). The higher-quality Gemini 3 Pro Image costs $0.134 per 1K or 2K image and $0.24 per 4K image. Imagen 4 is priced flat per image: $0.02 (Fast), $0.04 (Standard), $0.06 (Ultra).

Video generation on Veo 3.1 costs $0.40 per second at 720p or 1080p and $0.60 per second at 4K; Veo 3.1 Fast costs $0.15 and $0.35 per second at the same resolutions. An 8-second Veo 3.1 clip at 720p or 1080p costs $3.20. You're only charged for successfully generated videos. Text-to-speech costs $10.00 per 1M audio output tokens on the Flash TTS model and $20.00 on Pro TTS, and the Lyria 3 music model costs $0.08 per full song.

At these rates, media dominates mixed workloads: one 8-second Veo clip costs as much as roughly 1M output tokens on Gemini 3 Flash.
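The comparison above can be reproduced with a short sketch. The per-second prices are the ones listed in this guide; the resolution keys (`hd`, `uhd4k`) and function names are labels invented for this example.

```javascript
// Video prices in USD per second, from this guide.
// "hd" covers 720p and 1080p; "uhd4k" covers 4K.
const VIDEO_PRICE_PER_SECOND = {
  "Veo 3.1": { hd: 0.4, uhd4k: 0.6 },
  "Veo 3.1 Fast": { hd: 0.15, uhd4k: 0.35 },
};

// Gemini 3 Flash output price in USD per 1M tokens.
const FLASH_OUTPUT_PRICE_PER_MILLION = 3.0;

function videoCost(model, seconds, resolution = "hd") {
  const prices = VIDEO_PRICE_PER_SECOND[model];
  if (!prices || !(resolution in prices)) {
    throw new Error(`No price for ${model} at ${resolution}`);
  }
  return seconds * prices[resolution];
}

// How many Gemini 3 Flash output tokens the same money would buy.
function equivalentFlashOutputTokens(usd) {
  return Math.round((usd / FLASH_OUTPUT_PRICE_PER_MILLION) * 1_000_000);
}

const clip = videoCost("Veo 3.1", 8);
console.log(clip.toFixed(2));                   // 3.20
console.log(equivalentFlashOutputTokens(clip)); // a little over 1M tokens
```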

Other tools

Code execution has no session fee; the generated code and its results bill as output tokens, then as input tokens when the model reuses them while reasoning. URL context bills the fetched content as input tokens with no per-call fee. File search charges $0.15 per 1M tokens for embedding documents at indexing time, and retrieved tokens bill as regular input. The Deep Research agent bills all of its intermediate reasoning and tool loops at Gemini 3 Pro rates, so a single research task can consume far more tokens than a normal request.

How to reduce Gemini API costs

The methods below are ordered by impact, with the most effective first.

1. Pick the right model for the job

The current lineup spans 8x on input between Flash-Lite and Pro ($0.25 to $2.00), and the prior-generation 2.5 Flash-Lite stretches that to 20x at $0.10. A simple decision guide:

Translation, extraction, classification, high-volume agent steps → Gemini 3.1 Flash-Lite at $0.25/$1.50, or 2.5 Flash-Lite at $0.10/$0.40 if you don't need the newest generation.
Production chat and most workloads → Gemini 3 Flash at $0.50/$3.00.
Harder tasks that still need speed and grounding → Gemini 3.5 Flash at $1.50/$9.00.
The hardest reasoning and agentic work → Gemini 3.1 Pro at $2.00/$12.00.

Routing most traffic to a Flash model and escalating hard cases reduces spend more than any other change on this list.
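One way to apply the decision guide above is a small lookup that maps a task type to a model and falls back to the production default. This is a sketch, not a recommended architecture: the task-type names are hypothetical, and a real router would also need a way to detect hard cases and escalate them.

```javascript
// Task types (hypothetical labels) mapped to the models suggested in this guide.
const MODEL_BY_TASK = new Map([
  ["translation", "Gemini 3.1 Flash-Lite"],
  ["extraction", "Gemini 3.1 Flash-Lite"],
  ["classification", "Gemini 3.1 Flash-Lite"],
  ["agent_step", "Gemini 3.1 Flash-Lite"],
  ["chat", "Gemini 3 Flash"],
  ["grounded_analysis", "Gemini 3.5 Flash"],
  ["hard_reasoning", "Gemini 3.1 Pro"],
]);

// Unknown task types fall back to the production default.
const DEFAULT_MODEL = "Gemini 3 Flash";

function pickModel(taskType) {
  return MODEL_BY_TASK.get(taskType) ?? DEFAULT_MODEL;
}

console.log(pickModel("translation"));    // Gemini 3.1 Flash-Lite
console.log(pickModel("hard_reasoning")); // Gemini 3.1 Pro
console.log(pickModel("something_else")); // Gemini 3 Flash
```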

2. Keep Pro prompts under 200K tokens

Crossing the threshold doubles the input rate and raises output 1.5x. Trim context, summarize history, or split jobs rather than sending oversized prompts to Pro by default. Flash models don't have the threshold, so very long contexts can also simply route to Flash.

3. Use the Batch API for anything that can wait

The Batch API takes 50% off input and output on every model: Gemini 3.1 Pro drops to $1.00/$6.00, Gemini 3 Flash to $0.25/$1.50, and 3.1 Flash-Lite to $0.125/$0.75. The discount also applies to image generation (a 1K Nano Banana image drops to $0.034). Results return asynchronously, so it fits pipelines, evaluations, and bulk processing.
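The batch rates quoted above follow from halving the standard rates, which the sketch below reproduces. The model names are display labels and `batchRates` is a hypothetical helper.

```javascript
// Standard paid-tier rates in USD per 1M tokens, from this guide.
const STANDARD_RATES = {
  "Gemini 3.1 Pro": { input: 2.0, output: 12.0 },
  "Gemini 3 Flash": { input: 0.5, output: 3.0 },
  "Gemini 3.1 Flash-Lite": { input: 0.25, output: 1.5 },
};

// The Batch API charges half the standard input and output rates.
const BATCH_MULTIPLIER = 0.5;

function batchRates(rates) {
  return {
    input: rates.input * BATCH_MULTIPLIER,
    output: rates.output * BATCH_MULTIPLIER,
  };
}

for (const [model, rates] of Object.entries(STANDARD_RATES)) {
  const batch = batchRates(rates);
  console.log(`${model}: $${batch.input} input / $${batch.output} output per 1M tokens`);
}
```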

4. Cache repeated context, then delete the cache

Cache reads cost 10% of the input rate. To come out ahead, cache content that gets read repeatedly within its lifetime, account for the hourly storage charge, and delete caches when the workload ends. A long-lived cache of rarely used content costs more than not caching.

5. Control thinking levels

Thinking tokens bill as output. Lower the thinking level for tasks that don't need deep reasoning, and reserve high thinking levels for the prompts that justify them.

6. Use the free grounding allowance deliberately

The allowance of 5,000 free grounded prompts per month, shared across Gemini 3 models, works out to about 165 prompts a day. Enable grounding only on requests that need live information, and the allowance stretches much further.

Can you use the Gemini API for free?

Yes, and through several routes. The Gemini API has its own recurring free tier, and there are other options on top of it.

Puter.js: the User-Pays model

Puter.js is a JavaScript library that lets you add Gemini models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have. Unlike the native free tier, there are no rate limits to design around and the Pro models are included.

<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
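    // Ask Gemini through Puter.js; no API key is configured in this page.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "google/gemini-3.5-flash"
    }).then(response => {
      // textContent shows the reply as plain text instead of parsing it as HTML.
      document.body.textContent = response.message.content;
    });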
  </script>
</body>
</html>

To see what this saves, we ran the same numbers we used in our other pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On Gemini 3.5 Flash through the API, our calculation puts that at $22.50 for input and $40.50 for output, about $63 every month, growing linearly with your user base. Through Puter.js, the same app costs you $0 at 500 users, and still $0 at 50,000 users, because each user carries their own usage.
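The direct-API side of that comparison can be reproduced with the sketch below, which also shows the linear growth with user count. The workload figures and Gemini 3.5 Flash rates are the ones stated above; the function and constant names are made up for this example.

```javascript
// Per-user monthly workload used in this guide's comparison.
const WORKLOAD = {
  messagesPerUser: 30,
  inputTokensPerMessage: 1000,
  outputTokensPerMessage: 300,
};

// Gemini 3.5 Flash paid-tier rates in USD per 1M tokens.
const GEMINI_35_FLASH = { input: 1.5, output: 9.0 };

// Monthly token volume and direct-API cost for a given number of users.
function monthlyApiCost(users, workload, rates) {
  const messages = users * workload.messagesPerUser;
  const inputTokens = messages * workload.inputTokensPerMessage;
  const outputTokens = messages * workload.outputTokensPerMessage;
  const cost = (inputTokens * rates.input + outputTokens * rates.output) / 1_000_000;
  return { inputTokens, outputTokens, cost };
}

console.log(monthlyApiCost(500, WORKLOAD, GEMINI_35_FLASH).cost);    // 63
console.log(monthlyApiCost(50_000, WORKLOAD, GEMINI_35_FLASH).cost); // 6300
```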

The native free tier

The Gemini API's free tier gives you free input and output tokens on the Flash-family models (including Gemini 3.5 Flash, 3 Flash, and 3.1 Flash-Lite), subject to per-model rate limits, with no credit card required. Google AI Studio itself is free of charge in all available regions. Two caveats to know before building on it. First, free-tier content is used to improve Google's products, so it's not suitable for sensitive or proprietary data. Second, the Pro models are excluded: Gemini 3.1 Pro is paid-tier only. The free tier fits prototyping, personal projects, and low-volume tools; the rate limits page lists the current per-model quotas.

Google Cloud credits

New Google Cloud customers receive trial credits (currently $300) that can pay for Gemini usage through Vertex AI. This is Google Cloud billing rather than the Gemini API, so it suits teams already planning to deploy on Vertex. Check the current terms on Google Cloud's site, since trial amounts and conditions change.

OpenRouter's free endpoints

OpenRouter periodically lists free variants of Gemini models (tagged :free). Free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. Since Gemini already has a native free tier with better limits, OpenRouter mainly makes sense if you're already routing other models through it.

Real-world cost examples

Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads, using the same method as our other pricing guides: estimate tokens per request, multiply by volume, then by the per-million rate.

Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:

Model	Monthly cost
Gemini 3.1 Flash-Lite	~$5.40
Gemini 3 Flash	~$10.80
Gemini 3.5 Flash	~$32.40
Gemini 3.1 Pro	~$43.20

The same workload on the cheapest and flagship models differs 8x, which is why model choice comes first. Context caching on the repeated system prompt lowers these further, with the storage charge offsetting part of the saving.
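The table above can be regenerated with a few lines, which makes it easy to plug in your own conversation counts. This sketch uses the workload and rates from this guide, ignores caching, and uses hypothetical names throughout.

```javascript
// Paid-tier rates in USD per 1M tokens, from this guide.
const CHAT_RATES = {
  "Gemini 3.1 Flash-Lite": { input: 0.25, output: 1.5 },
  "Gemini 3 Flash": { input: 0.5, output: 3.0 },
  "Gemini 3.5 Flash": { input: 1.5, output: 9.0 },
  "Gemini 3.1 Pro": { input: 2.0, output: 12.0 },
};

// Monthly cost of a chatbot workload, before any caching discount.
function chatbotMonthlyCost(conversations, messagesEach, inputPerMessage, outputPerMessage, rates) {
  const messages = conversations * messagesEach;
  const inputTokens = messages * inputPerMessage;   // 9.6M for the example workload
  const outputTokens = messages * outputPerMessage; // 2M for the example workload
  return (inputTokens * rates.input + outputTokens * rates.output) / 1_000_000;
}

// 1,000 conversations, 8 messages each, 1,200 input and 250 output tokens per message.
for (const [model, rates] of Object.entries(CHAT_RATES)) {
  console.log(`${model}: $${chatbotMonthlyCost(1000, 8, 1200, 250, rates).toFixed(2)}`);
}
// Gemini 3.1 Flash-Lite: $5.40
// Gemini 3 Flash: $10.80
// Gemini 3.5 Flash: $32.40
// Gemini 3.1 Pro: $43.20
```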

Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. PDFs bill at the same rate as text on current models, so we calculate about $1.15 on Gemini 3 Flash, or about $0.58 through the Batch API.

Daily content generation. For 30 articles a month on Gemini 3.5 Flash, with 2,000-token prompts and roughly 4,000 output tokens each (including thinking tokens), we estimate about $1.17 a month.
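The PDF and content-generation estimates use the same per-item arithmetic, sketched below with an optional Batch API discount. Names are hypothetical; rates and token counts are the ones given in the two examples above.

```javascript
// Paid-tier rates in USD per 1M tokens, from this guide.
const GEMINI_3_FLASH_RATES = { input: 0.5, output: 3.0 };
const GEMINI_35_FLASH_RATES = { input: 1.5, output: 9.0 };

// Cost of a job made of identical items; batch = true applies the 50% Batch API discount.
function jobCost(items, inputPerItem, outputPerItem, rates, batch = false) {
  const standard =
    (items * inputPerItem * rates.input + items * outputPerItem * rates.output) / 1_000_000;
  return batch ? standard * 0.5 : standard;
}

// 100 PDFs at ~20,000 input tokens each, with 500-token summaries.
console.log(jobCost(100, 20_000, 500, GEMINI_3_FLASH_RATES).toFixed(2));       // 1.15
console.log(jobCost(100, 20_000, 500, GEMINI_3_FLASH_RATES, true).toFixed(3)); // 0.575

// 30 articles with 2,000-token prompts and ~4,000 output tokens each.
console.log(jobCost(30, 2_000, 4_000, GEMINI_35_FLASH_RATES).toFixed(2));      // 1.17
```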

Complete Gemini API pricing table

All prices are per 1M tokens, paid tier, standard processing. Where two prices are shown for Pro models, the first applies to prompts up to 200K tokens and the second above 200K.

Text models

Model	Input	Cached input	Output
Gemini 3.1 Pro	$2.00 / $4.00	$0.20 / $0.40	$12.00 / $18.00
Gemini 3.5 Flash	$1.50	$0.15	$9.00
Gemini 3 Flash	$0.50	$0.05	$3.00
Gemini 3.1 Flash-Lite	$0.25	$0.025	$1.50
Gemini 2.5 Pro	$1.25 / $2.50	$0.125 / $0.25	$10.00 / $15.00
Gemini 2.5 Flash	$0.30	$0.03	$2.50
Gemini 2.5 Flash-Lite	$0.10	$0.01	$0.40

Audio input bills at higher rates (for example, $0.50 on 3.1 Flash-Lite and $1.00 on 3 Flash). Cache storage costs $1.00 per 1M tokens per hour on Flash models and $4.50 on Pro models. The Batch API takes 50% off input and output on all models.

Image, video, and audio generation

Model	Price
Gemini 3.1 Flash Image (Nano Banana)	$0.045–$0.151 per image by resolution ($0.067 at 1K)
Gemini 3 Pro Image	$0.134 per 1K/2K image, $0.24 per 4K image
Gemini 2.5 Flash Image	$0.039 per image
Imagen 4	$0.02 (Fast), $0.04 (Standard), $0.06 (Ultra) per image
Veo 3.1	$0.40 / second (720p, 1080p), $0.60 / second (4K)
Veo 3.1 Fast	$0.15 / second (720p, 1080p), $0.35 / second (4K)
Flash TTS / Pro TTS	$10.00 / $20.00 per 1M audio output tokens
Lyria 3 (music)	$0.08 per song, $0.04 per 30-second clip
Gemini Embedding	$0.15 per 1M tokens ($0.075 batch)

Tools

Tool	Price
Grounding with Google Search (Gemini 3)	5,000 prompts / month free, then $14.00 / 1K search queries
Grounding with Google Search (Gemini 2.5)	1,500 requests / day free, then $35.00 / 1K grounded prompts
Grounding with Google Maps	5,000 prompts / month free, then $14.00 / 1K queries
Code execution	No fee; code and results bill as tokens
URL context	No fee; fetched content bills as input tokens
File search	$0.15 / 1M tokens at indexing; retrieved tokens bill as input

For models not listed here (Live API, computer use, robotics, Gemma open models, and dated snapshots), see Google's full pricing documentation.

Conclusion

Gemini API pricing in 2026 runs from $0.10 per million input tokens on Gemini 2.5 Flash-Lite to $2.00 (or $4.00 above 200K tokens) on Gemini 3.1 Pro, with image generation from $0.02 per image and video from $0.15 per second.

Keeping the bill predictable comes down to a few deliberate choices:

Route most traffic to a Flash model and escalate only the hard cases.
Keep Pro prompts under the 200K threshold.
Push non-urgent work to the Batch API for 50% off.
Cache only content that gets reused, and account for the hourly storage charge.
Watch thinking tokens and per-query grounding charges, which don't show up in the headline rates.

Prices here were verified against Google's official pricing pages. Google updates pricing and model availability frequently, so always confirm current rates before committing to a budget.
