Gemini API Pricing

This guide breaks down everything you need to know about Gemini API pricing: every model, every tier, and every discount. Whether you're budgeting for a side project or planning enterprise-scale usage, you'll find the exact numbers here.

At the end, we'll also show you how to access Gemini models for free using Puter.js: no API keys, no billing setup, no cost to you as a developer. Puter pioneered the "User-Pays" model, which lets developers add AI capabilities to their applications while each user covers their own usage costs.

How Gemini API pricing works

Google charges based on tokens, the pieces of text the model reads and generates. As a rough estimate, 1 token is approximately 4 characters or 0.75 words in English. You're billed separately for:

  • Input tokens: the text you send to the model (your prompt, system instructions, conversation history)
  • Output tokens: the text the model generates in response (including thinking tokens for reasoning models)

All text prices below are per million tokens (MTok) in USD.
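
As a minimal sketch of the billing arithmetic described above, the following JavaScript estimates a token count from the 4-characters-per-token rule of thumb and prices a single request from separate input and output rates. The function names are illustrative and not part of any Google SDK, and the character heuristic is only an approximation of real tokenization.

```javascript
// Rough token estimate using the ~4 characters per token rule of thumb.
// This is an approximation for English text, not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cost in USD for one request, given per-million-token (MTok) rates.
function requestCost(inputTokens, outputTokens, rates) {
  const inputCost = (inputTokens / 1e6) * rates.input;
  const outputCost = (outputTokens / 1e6) * rates.output;
  return inputCost + outputCost;
}

// Example rates: Gemini 2.5 Flash standard tier, as listed in this guide.
const flashRates = { input: 0.30, output: 2.50 };

const prompt = "Summarize the following paragraph in one sentence.";
const promptTokens = estimateTokens(prompt);
const cost = requestCost(promptTokens, 500, flashRates);
console.log(`${promptTokens} estimated input tokens, about $${cost.toFixed(6)}`);
```

For real budgeting, prefer the token counts reported by the API over this estimate, since the heuristic can be noticeably off for code or non-English text.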

A notable advantage of Google's Gemini API is that it offers a free tier, making it one of the few major AI providers where you can get started at zero cost.

Pricing tiers

Google offers three access tiers:

| Tier | Description |
| --- | --- |
| Free | Lower rate limits; your content may be used to improve products |
| Paid (Standard) | Higher limits; your content is not used for product improvement |
| Priority | 1.8x standard pricing for guaranteed availability and lowest latency |

Additionally, Batch processing is available at 50% off standard rates.
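
The relationship between tiers can be expressed as multipliers on the standard rate. The sketch below assumes that Priority is exactly 1.8x and Batch exactly 0.5x the standard price, as stated above; the helper name and the rounding to four decimal places are choices made for this illustration.

```javascript
// Multipliers relative to the standard (paid) rate, as described in this guide.
const TIER_MULTIPLIER = { standard: 1.0, batch: 0.5, priority: 1.8 };

// Derive a tier's per-MTok price from the standard price.
// Rounds to 4 decimals to hide floating-point noise.
function tierPrice(standardPrice, tier) {
  const multiplier = TIER_MULTIPLIER[tier];
  if (multiplier === undefined) {
    throw new Error(`Unknown tier: ${tier}`);
  }
  return Math.round(standardPrice * multiplier * 10000) / 10000;
}

// Example: Gemini 3.1 Flash-Lite standard rates ($0.25 in / $1.50 out).
for (const tier of ["standard", "batch", "priority"]) {
  console.log(tier, tierPrice(0.25, tier), tierPrice(1.50, tier));
}
```

With the Gemini 3.1 Flash-Lite standard rates as input, this reproduces the $0.125 / $0.75 batch and $0.45 / $2.70 priority figures listed for that model.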

Model pricing

Gemini 3.1 Flash-Lite (latest lightweight model)

| Tier | Input | Output |
| --- | --- | --- |
| Free | Free | Free |
| Standard | $0.25 / MTok | $1.50 / MTok |
| Batch | $0.125 / MTok | $0.75 / MTok |
| Priority | $0.45 / MTok | $2.70 / MTok |

Gemini 3.1 Pro Preview (latest flagship)

| Tier | Input (≤200K) | Input (>200K) | Output (≤200K) | Output (>200K) |
| --- | --- | --- | --- | --- |
| Standard | $2.00 / MTok | $4.00 / MTok | $12.00 / MTok | $18.00 / MTok |
| Batch | $1.00 / MTok | $2.00 / MTok | $6.00 / MTok | $9.00 / MTok |
| Priority | $3.60 / MTok | $7.20 / MTok | $21.60 / MTok | $32.40 / MTok |

Gemini 3 Flash Preview

| Tier | Input | Output |
| --- | --- | --- |
| Free | Free | Free |
| Standard | $0.50 / MTok | $3.00 / MTok |
| Priority | $0.90 / MTok | $5.40 / MTok |

Gemini 2.5 Pro

| Tier | Input (≤200K) | Input (>200K) | Output (≤200K) | Output (>200K) |
| --- | --- | --- | --- | --- |
| Free | Free | Free | Free | Free |
| Standard | $1.25 / MTok | $2.50 / MTok | $10.00 / MTok | $15.00 / MTok |
| Batch | $0.625 / MTok | $1.25 / MTok | $5.00 / MTok | $7.50 / MTok |
| Priority | $2.25 / MTok | $4.50 / MTok | $18.00 / MTok | $27.00 / MTok |
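
To make the two-band pricing concrete, here is a small sketch that prices a Gemini 2.5 Pro request at the standard tier. It assumes that the band is chosen by the size of the prompt and that the higher band then applies to both input and output for that request. That reading is an assumption, so confirm the exact rule against Google's pricing page before relying on it. The function and constant names are illustrative.

```javascript
// Gemini 2.5 Pro standard-tier rates in USD per million tokens, from the table above.
const GEMINI_25_PRO = {
  threshold: 200000,
  short: { input: 1.25, output: 10.0 }, // prompts of 200K tokens or fewer
  long: { input: 2.5, output: 15.0 },   // prompts over 200K tokens
};

// Assumption: the prompt size selects the band for the whole request.
function proRequestCost(inputTokens, outputTokens, pricing = GEMINI_25_PRO) {
  const band = inputTokens > pricing.threshold ? pricing.long : pricing.short;
  return (inputTokens * band.input + outputTokens * band.output) / 1e6;
}

console.log(proRequestCost(200000, 2000).toFixed(4)); // prompt exactly at the threshold
console.log(proRequestCost(200001, 2000).toFixed(4)); // prompt one token over
```

Under this assumption, crossing the threshold by a single token roughly doubles the cost of the request (about $0.27 versus about $0.53 in this example).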

Gemini 2.5 Flash

| Tier | Input | Output |
| --- | --- | --- |
| Free | Free | Free |
| Standard | $0.30 / MTok | $2.50 / MTok |
| Batch | $0.15 / MTok | $1.25 / MTok |
| Priority | $0.54 / MTok | $4.50 / MTok |

Gemini 2.5 Flash-Lite

| Tier | Input | Output |
| --- | --- | --- |
| Free | Free | Free |
| Standard | $0.10 / MTok | $0.40 / MTok |
| Batch | $0.05 / MTok | $0.20 / MTok |
| Priority | $0.18 / MTok | $0.72 / MTok |

Specialized models

| Model | Input | Output | Use Case |
| --- | --- | --- | --- |
| Gemini Robotics-ER 1.6 | $1.00 / MTok | $5.00 / MTok | Robotics |
| Gemini 2.5 Computer Use | $1.25–$2.50 / MTok | $10.00–$15.00 / MTok | Computer use / browser automation |
| Gemma 4 (open) | Free | Free | Free tier only |

Legacy models

| Model | Input | Output | Notes |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.10 / MTok | $0.40 / MTok | Deprecated June 1, 2026 |

Which model should you choose?

  • Gemini 2.5 Flash-Lite: The cheapest option at $0.10/$0.40 per MTok. Great for high-volume, low-complexity tasks like classification, extraction, and routing.
  • Gemini 2.5 Flash: Best value for most applications. Strong reasoning at $0.30/$2.50 per MTok, with a free tier available.
  • Gemini 2.5 Pro: The most capable current production model. Best for complex reasoning, research, and multi-step analysis.
  • Gemini 3.1 Flash-Lite: The latest lightweight model with improved capabilities at $0.25/$1.50 per MTok.
  • Gemini 3.1 Pro Preview: Latest flagship preview with top-tier performance. Use when you need the best quality available.

For most developers starting out, Gemini 2.5 Flash is the best value. It's highly capable, has a free tier, and at $0.30/$2.50 per MTok is one of the cheapest flagship-tier models available.

What does this cost in practice?

To give you a sense of real-world costs with Gemini 2.5 Flash ($0.30 input / $2.50 output per MTok):

| Use Case | Approx. Tokens | Estimated Cost |
| --- | --- | --- |
| Single chat message (500 in / 500 out) | 1,000 | $0.001 |
| Summarize a 10-page document | ~5,000 in / 500 out | $0.003 |
| Analyze a 50-page PDF | ~25,000 in / 2,000 out | $0.01 |
| Process 1,000 customer support tickets | ~3.7M total | ~$5.00 |
| 10,000 short API calls / day (30 days) | ~300M/month | ~$420/month |

With Gemini 2.5 Flash-Lite ($0.10/$0.40 per MTok), input costs drop by about 67% and output costs by about 84%, so the overall saving depends on your mix of input and output tokens.
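
The last row of the table can be reproduced with a few lines of arithmetic. This sketch assumes each call uses 500 input and 500 output tokens (the same split as the single chat message row) and a 30-day month; the function name is illustrative.

```javascript
// Project a monthly bill from per-call token counts and per-MTok rates (USD).
function monthlyCost({ callsPerDay, days, inputTokens, outputTokens, rates }) {
  const calls = callsPerDay * days;
  const inputCost = ((calls * inputTokens) / 1e6) * rates.input;
  const outputCost = ((calls * outputTokens) / 1e6) * rates.output;
  return {
    totalTokens: calls * (inputTokens + outputTokens),
    cost: inputCost + outputCost,
  };
}

// Assumed workload: 10,000 calls per day, 500 tokens in and 500 tokens out per call.
const workload = { callsPerDay: 10000, days: 30, inputTokens: 500, outputTokens: 500 };

const flashBill = monthlyCost({ ...workload, rates: { input: 0.30, output: 2.50 } });
const flashLiteBill = monthlyCost({ ...workload, rates: { input: 0.10, output: 0.40 } });

console.log(flashBill.totalTokens, flashBill.cost.toFixed(2));
console.log(flashLiteBill.cost.toFixed(2));
```

With these assumptions the Gemini 2.5 Flash figure comes to about $420 for 300M tokens, matching the table, and the same workload on Gemini 2.5 Flash-Lite comes to about $75.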

Gemini vs GPT vs Claude vs Grok: price comparison

How does Gemini stack up against competing models?

| Model | Input | Output | Context Window |
| --- | --- | --- | --- |
| Gemini 2.5 Flash | $0.30 / MTok | $2.50 / MTok | 1M |
| Gemini 2.5 Flash-Lite | $0.10 / MTok | $0.40 / MTok | 1M |
| GPT-5.4 mini | $0.75 / MTok | $4.50 / MTok | 128K |
| GPT-5.4 nano | $0.20 / MTok | $1.25 / MTok | 128K |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 1M |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K |
| grok-4.3 | $1.25 / MTok | $2.50 / MTok | 1M |
| Gemini 2.5 Pro | $1.25 / MTok | $10 / MTok | 1M |
| Claude Opus 4.7 | $5 / MTok | $25 / MTok | 1M |
| GPT-5.5 | $5 / MTok | $30 / MTok | 272K |

Gemini 2.5 Flash-Lite is one of the cheapest models available from any major provider, and Gemini 2.5 Flash offers an unbeatable combination of quality, price, and a 1M token context window. Plus, the free tier means you can test and prototype at zero cost.

Context caching pricing

Context caching reduces costs by reusing previously processed parts of your prompt across API calls. Cached tokens are served at a fraction of the standard input price.

| Model | Cached Input | Cache Storage |
| --- | --- | --- |
| Gemini 3.1 Flash-Lite | $0.025 / MTok | $1.00 / MTok / hour |
| Gemini 3.1 Pro Preview | $0.20–$0.40 / MTok | $4.50 / MTok / hour |
| Gemini 2.5 Pro | $0.125–$0.25 / MTok | $4.50 / MTok / hour |
| Gemini 2.5 Flash | $0.03 / MTok | $1.00 / MTok / hour |
| Gemini 2.5 Flash-Lite | $0.01 / MTok | $1.00 / MTok / hour |

Cached input is typically 90% cheaper than standard input. However, note that cache storage incurs an hourly cost, so caching is most cost-effective for frequently reused prompts.
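
Whether caching pays off depends on how often the cached tokens are reused within each hour of storage. The sketch below is a simplified break-even estimate: it compares the per-reuse saving (standard input rate minus cached rate) with the hourly storage rate, and it ignores the cost of creating the cache in the first place. The function name is illustrative.

```javascript
// Reuses per hour at which the saving from cached input equals the storage fee.
// Token rates are USD per million tokens; storage is USD per million tokens per hour.
function breakEvenReusesPerHour({ standardInput, cachedInput, storagePerHour }) {
  const savingPerReuse = standardInput - cachedInput;
  if (savingPerReuse <= 0) {
    return Infinity; // caching never pays off if cached input is not cheaper
  }
  return storagePerHour / savingPerReuse;
}

// Gemini 2.5 Flash figures from this guide.
const flashBreakEven = breakEvenReusesPerHour({
  standardInput: 0.30,
  cachedInput: 0.03,
  storagePerHour: 1.00,
});
console.log(flashBreakEven.toFixed(1));
```

For Gemini 2.5 Flash this comes to roughly 3.7, so under these simplifying assumptions a cached prompt needs to be reused about four or more times per hour before caching is cheaper than resending it.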

Batch API pricing (50% off)

The Batch API processes requests asynchronously at half the standard price:

| Model | Batch Input | Batch Output |
| --- | --- | --- |
| Gemini 3.1 Flash-Lite | $0.125 / MTok | $0.75 / MTok |
| Gemini 3.1 Pro Preview | $1.00–$2.00 / MTok | $6.00–$9.00 / MTok |
| Gemini 2.5 Pro | $0.625–$1.25 / MTok | $5.00–$7.50 / MTok |
| Gemini 2.5 Flash | $0.15 / MTok | $1.25 / MTok |
| Gemini 2.5 Flash-Lite | $0.05 / MTok | $0.20 / MTok |

Batch processing is ideal for bulk tasks like document analysis, data extraction, or content moderation where you can tolerate some latency.

Image generation pricing

Imagen 4

| Quality | Price per image |
| --- | --- |
| Fast | $0.02 |
| Standard | $0.04 |
| Ultra | $0.06 |

Gemini 2.5 Flash (native image generation)

| Tier | Price per image |
| --- | --- |
| Standard | $0.039 |
| Batch | $0.0195 |

Video generation pricing

| Model | Price per second |
| --- | --- |
| Veo 3.1 | $0.05–$0.60 (varies by quality/resolution) |
| Veo 3 | $0.10–$0.40 |
| Veo 2 | $0.35 |

Music generation pricing

| Model | Price |
| --- | --- |
| Lyria 3 Clip | $0.04 / 30-second song |
| Lyria 3 Pro | $0.08 / full song |

Audio pricing

Audio input tokens are priced at roughly 2-3x text token pricing. For example, Gemini 2.5 Flash charges $1.00 per MTok for audio input, about 3.3x its $0.30 text input rate:

| Type | Price |
| --- | --- |
| Audio input | $1.00 / MTok |
| Text output | $2.50 / MTok |

Embedding pricing

| Model | Price |
| --- | --- |
| Gemini Embedding 2 (text) | $0.20 / MTok |
| Gemini Embedding 2 (image) | $0.45 / MTok (~$0.00012/image) |
| Gemini Embedding 2 (audio) | $6.50 / MTok (~$0.00016/second) |
| Gemini Embedding 2 (video) | $12.00 / MTok (~$0.00079/frame) |
| Gemini Embedding 001 | $0.15 / MTok (batch: $0.075) |

Tool and feature pricing

Google Search grounding

| Model Generation | Free Tier | Paid Tier |
| --- | --- | --- |
| Gemini 3.x models | Up to 5,000 requests | $14–$35 / 1,000 queries |
| Gemini 2.x models | Up to 500 requests | $14–$35 / 1,000 queries |

Google Maps grounding

$25 per 1,000 grounded prompts after free tier limits.

Code execution

Charged at the model's standard token rates, with no additional per-invocation fee.

File Search

$0.15 per million embedding tokens, plus standard retrieval costs.

Free tier details

Most Gemini models include a free tier with lower rate limits than the paid tiers. Key details:

  • Available for Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite, 3 Flash Preview, and 3.1 Flash-Lite
  • Rate limits vary by model (typically lower RPM and TPM than paid tiers)
  • Content may be used to improve Google products (paid-tier content is not)
  • Great for prototyping and low-volume applications

Billing and payment

  • Billed monthly based on actual usage
  • Payments in USD
  • Free tier available for testing and low-volume use
  • Paid tier requires billing setup in Google AI Studio or Google Cloud
  • Enterprise tier available with custom support and compliance

Tips to reduce your Gemini API costs

Here are practical ways to keep your bill down:

  1. Start with Flash-Lite. At $0.10/$0.40 per MTok, Gemini 2.5 Flash-Lite handles many tasks well. Only upgrade to Flash or Pro when quality demands it.

  2. Use the free tier for prototyping. Test your application logic without spending anything, then switch to paid when you're ready for production.

  3. Leverage context caching. If you're sending the same system prompt or reference documents with every request, enable caching. Cached input is ~90% cheaper than standard input.

  4. Batch non-urgent work. Use the Batch API for 50% off on bulk processing. Document analysis, data extraction, and analytics don't need real-time responses.

  5. Stay under the 200K input threshold. For Pro models, input and output pricing jumps at 200K tokens. Keep prompts under this threshold when possible.

  6. Trim your inputs. Remove unnecessary conversation history, compress system prompts, and avoid sending entire documents when a relevant excerpt will do.

  7. Choose the right image quality. Imagen 4 Fast ($0.02/image) is 67% cheaper than Ultra ($0.06/image). Only use higher quality when you need it.
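
Tip 6 can be sketched as a small helper that keeps only the most recent messages that fit within a token budget. It assumes messages are plain objects with a `content` string (a hypothetical shape, not a specific SDK type) and uses the rough 4-characters-per-token estimate rather than a real tokenizer.

```javascript
// Approximate token count: ~4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages whose combined estimated size fits the budget.
// `messages` is ordered oldest to newest; the result preserves that order.
function trimHistory(messages, maxTokens) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const size = estimateTokens(messages[i].content);
    if (used + size > maxTokens) {
      break; // stop at the first message that no longer fits
    }
    kept.unshift(messages[i]);
    used += size;
  }
  return kept;
}

// Hypothetical conversation history for illustration.
const history = [
  { role: "user", content: "a".repeat(400) },      // ~100 tokens
  { role: "assistant", content: "b".repeat(200) }, // ~50 tokens
  { role: "user", content: "c".repeat(40) },       // ~10 tokens
];
console.log(trimHistory(history, 60).length); // keeps the two newest messages
```

Dropping whole messages from the oldest end is the simplest policy; summarizing older turns instead is another option when earlier context still matters.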

The free alternative: Puter.js

If you're a developer building an app that uses Gemini, there's a way to skip all of the above: no API keys, no billing setup, no rate limit management, and no cost to you.

Puter.js is a JavaScript SDK that gives you access to Gemini and 400+ other AI models directly from your frontend code, including chat, text-to-speech, image generation, and more. It uses a "User-Pays" model: each user of your app covers their own AI usage through their Puter account. You, the developer, pay nothing.

Here's what that means in practice:

| | Gemini API (Direct) | Puter.js |
| --- | --- | --- |
| Cost to developer | Pay per token (free tier available) | Free |
| API key required | Yes | No |
| Billing setup | Google Cloud / AI Studio | None |
| Rate limits | Per-project tiers | Per-user (handled by Puter) |
| Backend required | Yes (to protect your key) | No |
| Models available | Gemini only | Gemini + GPT + Claude + Grok + 500 more |
| Capabilities | Chat, images, video, etc. (separate APIs) | Chat, TTS, image generation, and more in one unified SDK |

Try it now

Add one script tag to your HTML and start using Gemini immediately:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain quantum computing in simple terms", {
            model: "google/gemini-2.5-flash"
        }).then(response => {
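            // textContent renders the reply as plain text instead of parsing it as HTML
            document.body.textContent = response.message.content[0].text;
        }).catch(error => {
            // Log failures instead of leaving the page blank with no explanation
            console.error("Chat request failed:", error);
        });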
    </script>
</body>
</html>

No API key. No backend. No billing. You can also use Gemini 2.5 Pro, Gemini 2.5 Flash-Lite, and every other Gemini model the same way.

You can also stream responses for a better user experience:

<html>
<body>
    <div id="output"></div>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        async function streamResponse() {
            // With stream: true, the response is consumed chunk by chunk below
            const response = await puter.ai.chat("Write a short poem about coding", {
                model: "google/gemini-2.5-flash",
                stream: true
            });
            const output = document.getElementById('output');
            // Append each text chunk to the page as it arrives
            for await (const chunk of response) {
                if (chunk?.text) {
                    output.textContent += chunk.text;
                }
            }
        }
        streamResponse();
    </script>
</body>
</html>

Why developers choose Puter.js over direct API access

  • $0 infrastructure cost: Your users pay for their own usage, so your app costs nothing to run regardless of scale
  • No API key management: No keys to rotate, no secrets to protect, no backend needed to hide them
  • No rate limit headaches: Each user has their own limits, so one user's traffic never blocks another's
  • More than just chat: Text-to-speech, image generation, and other AI capabilities are all available through the same SDK, no juggling separate APIs
  • Access every AI provider: Switch between Gemini, GPT, Claude, Grok, DeepSeek, and more with one line of code, no separate accounts or billing for each
  • Ship faster: Go from idea to production in minutes, not days of billing setup and backend configuration

