Gemini API
Access Gemini instantly with Puter.js and add AI to any app in a few lines of code, with no backend and no API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "gemini-3-flash-preview"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "gemini-3-flash-preview"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
List of Gemini Models
Gemma 4 26B A4B
google/gemma-4-26b-a4b-it
Gemma 4 26B A4B is a Mixture-of-Experts (MoE) open model from Google DeepMind, built from the same research as Gemini 3. It has 26B total parameters but activates only 3.8B per forward pass, delivering near-31B-dense quality at a fraction of the compute cost. The model supports a 256K token context window, multimodal image and text input, built-in step-by-step reasoning (thinking mode), and native function calling for agentic workflows. It currently ranks #6 among open models on the Arena AI text leaderboard with an estimated LMArena score of 1441 — competitive with models many times its active size. It excels at reasoning, coding, long-context tasks, and structured tool use. It's a strong pick for developers who need high throughput and low latency without sacrificing capability.
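Native function calling like this is exposed through the `tools` option of `puter.ai.chat`, which accepts OpenAI-style tool definitions. A minimal sketch follows; the `get_weather` tool is a hypothetical example for illustration, not a real API, and the call is guarded so the snippet is inert outside a page that has loaded Puter.js.

```javascript
// Hypothetical tool definition in the OpenAI-style schema that
// puter.ai.chat's `tools` option accepts.
const tools = [{
    type: "function",
    function: {
        name: "get_weather", // hypothetical tool, for illustration only
        description: "Get the current weather for a city",
        parameters: {
            type: "object",
            properties: {
                city: { type: "string", description: "City name" }
            },
            required: ["city"]
        }
    }
}];

// Only runs where Puter.js has been loaded (e.g. via the script tag above).
if (typeof puter !== "undefined") {
    puter.ai.chat("What's the weather in Paris?", {
        model: "google/gemma-4-26b-a4b-it",
        tools
    }).then(response => console.log(response));
}
```

When the model decides to call a tool, the response carries the tool call instead of plain text; your app executes the function and sends the result back in a follow-up `puter.ai.chat` turn.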
Gemma 4 31B
google/gemma-4-31b-it
Gemma 4 31B is a dense multimodal model from Google DeepMind, built on the same research foundation as Gemini 3. It is the most capable model in the Gemma 4 family, accepting text, image, and video input with a 256K-token context window. It delivers strong benchmark results: 89.2% on AIME 2026, 85.2% on MMLU Pro, 80.0% on LiveCodeBench v6, and 84.3% on GPQA Diamond. On the Arena AI text leaderboard, it ranks as the #3 open model globally, outperforming many models with far higher parameter counts. Gemma 4 31B features native function calling trained into the model, configurable chain-of-thought reasoning, and structured JSON output — making it especially well-suited for agentic workflows, coding tasks, and multi-turn tool use. It supports over 140 languages and serves as a strong foundation for fine-tuning.
Veo 3.1 Lite
google/veo-3.1-lite
Veo 3.1 Lite is Google DeepMind's most cost-effective video generation model, built for high-volume applications where per-clip cost is a primary concern. It generates video at the same speed as Veo 3.1 Fast but at less than half the price — starting at $0.05 per second for 720p. The model supports text-to-video and image-to-video with 720p and 1080p output in landscape (16:9) or portrait (9:16), at configurable durations of 4, 6, or 8 seconds. It does not support 4K output, scene extension, or native audio generation — clips are silent by default. Veo 3.1 Lite is ideal for developers building batch video pipelines, social media automation, or interactive tools where cost per generation matters most and audio can be added in post-production.
Gemini 3.1 Flash Lite Preview
google/gemini-3.1-flash-lite-preview
Gemini 3.1 Flash Lite is Google's fastest and most cost-efficient model in the Gemini 3 series, optimized for high-volume, latency-sensitive tasks like translation, classification, and content moderation. Priced at $0.25/1M input tokens and $1.50/1M output tokens, it outperforms Gemini 2.5 Flash with 2.5x faster time-to-first-token and a 45% boost in output speed.
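A classification task like the ones this model targets can be sketched in a few lines. The prompt-building helper below is illustrative (not part of Puter.js); the `puter.ai.chat` call itself is the same documented API as above, guarded so the snippet is safe outside the browser.

```javascript
// Illustrative helper: wrap an input in a one-word sentiment-classification prompt.
function buildClassifyPrompt(text) {
    return 'Classify the sentiment of the following text as "positive", ' +
           '"negative", or "neutral". Reply with exactly one word.\n\n' +
           "Text: " + text;
}

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.chat(buildClassifyPrompt("I love this product!"), {
        model: "google/gemini-3.1-flash-lite-preview"
    }).then(response => console.log(response));
}
```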
Gemini 3.1 Flash Image
google/gemini-3.1-flash-image-preview
Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding.
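Image generation in Puter.js goes through `puter.ai.txt2img`. A minimal sketch, with one stated assumption: passing a `model` option here is modeled on `puter.ai.chat`, so check the Puter.js docs for the exact options `txt2img` accepts.

```javascript
// Assumption: txt2img accepts a `model` option like puter.ai.chat does.
const imageOptions = { model: "google/gemini-3.1-flash-image-preview" };

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.txt2img("A watercolor lighthouse at dawn", imageOptions)
        .then(image => document.body.appendChild(image)); // resolves to an image element
}
```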
Gemini 3.1 Pro
google/gemini-3.1-pro-preview
Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving.
Gemini 3 Flash
google/gemini-3-flash-preview
Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels.
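For a latency-focused model like this one, streaming lets you show output as it arrives. `puter.ai.chat` supports a `stream: true` option that makes the call resolve to an async iterable; each streamed part exposes a `text` field in current Puter.js. A sketch, written as a helper function so it only touches `puter` when actually called in the browser:

```javascript
// Stream a reply token-by-token and return the accumulated text.
// Requires Puter.js to be loaded before streamReply() is called.
async function streamReply(prompt) {
    const stream = await puter.ai.chat(prompt, {
        model: "google/gemini-3-flash-preview",
        stream: true
    });
    let full = "";
    for await (const part of stream) {
        if (part?.text) {
            full += part.text; // append each chunk as it arrives
        }
    }
    return full;
}
```

In a UI you would typically write each `part.text` straight into the page instead of accumulating it.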
Gemini 3 Pro Image
google/gemini-3-pro-image-preview
Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts.
Gemini 3 Pro
google/gemini-3-pro-preview
Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities.
Veo 3.1
google/veo-3.1
Veo 3.1 is Google DeepMind's flagship AI video generation model, offering the highest quality output in the Veo family. It generates up to 4K resolution video with natively synchronized audio — including dialogue, sound effects, and ambient noise — all produced in a single joint diffusion process. The model supports text-to-video, image-to-video, reference image guidance (up to 3 images), and frame-to-frame generation. Clips are 8 seconds at base, extendable to over 60 seconds via scene chaining. Both 16:9 and native 9:16 aspect ratios are supported. Lip-sync error stays under 120ms. Veo 3.1 achieved top human-preference scores on MovieGenBench for prompt adherence, visual quality, and audio sync, and state-of-the-art results on VBench I2V. It was the first major AI video model to support true 4K output. Best suited for high-fidelity creative and production work where quality is the priority over speed or cost.
Veo 3.1 Fast
google/veo-3.1-fast
Veo 3.1 Fast is the speed-optimized variant of Google DeepMind's Veo 3.1 video model, generating output roughly twice as fast as the standard version with only a minor quality trade-off. Independent testing shows quality differences of 1–8% depending on scene complexity — negligible for most use cases. It retains the full feature set of the standard model: native audio generation, text-to-video, image-to-video, reference images, frame-to-frame generation, and support for 720p, 1080p, and 4K resolutions. An 8-second 720p clip typically completes in 30–45 seconds. At roughly one-fifth the per-second cost of the standard model, Veo 3.1 Fast is well suited for rapid prototyping, iterative prompt testing, and production workflows where turnaround time and budget matter more than maximum fidelity.
Gemini 2.5 Flash Lite Preview 09-2025
google/gemini-2.5-flash-lite-preview-09-2025
Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency.
Gemini 2.5 Flash Preview 09-2025
google/gemini-2.5-flash-preview-09-2025
Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications.
Imagen 4 Fast
google/imagen-4.0-fast
Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility.
Imagen 4 Ultra
google/imagen-4.0-ultra
Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image.
Imagen 4
google/imagen-4.0
Imagen 4 is Google DeepMind's flagship text-to-image generation model, available through the Gemini API and Google AI Studio. It delivers significant improvements over Imagen 3, particularly in rendering text, typography, and fine details like intricate fabrics and textures. The model supports output up to 2K resolution across a range of aspect ratios, generating images in roughly 2.5 seconds. A Fast variant optimized for high-volume use runs at $0.02 per image, while the standard model is $0.04 and the Ultra tier—built for precise prompt adherence—is $0.06. In human evaluations on GenAI-Bench, Imagen 4 scored highly against other leading image generation models on overall preference. All outputs are embedded with Google's SynthID watermark for AI-content traceability. It's a strong fit for developers building creative tools, marketing asset pipelines, or any application requiring reliable, high-quality image generation from text prompts.
Gemma 3n 2B
google/gemma-3n-e2b-it:free
Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding.
Gemma 3n 4B
google/gemma-3n-e4b-it
Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices.
Gemini 2.5 Flash-Lite
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations.
Gemini 2.5 Pro Preview 06-05
google/gemini-2.5-pro-preview
Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities.
Imagen 4 Preview
google/imagen-4.0-preview
Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks.
Veo 3
google/veo-3.0
Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio — including dialogue, sound effects, and ambient noise — directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence, with cinematic-quality 8-second clips at up to 1080p resolution.
Veo 3 with Audio
google/veo-3.0-audio
Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production.
Veo 3 Fast
google/veo-3.0-fast
Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution.
Veo 3 Fast with Audio
google/veo-3.0-fast-audio
Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows.
Gemini 2.5 Pro Preview 05-06
google/gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases.
Gemini 2.5 Flash Image
google/gemini-2.5-flash-image
Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion.
Gemini 2.5 Flash Image
google/flash-image-2.5
Gemini 2.5 Flash Image is a fast, natively multimodal image generation and editing model that excels at character consistency, multi-image fusion, and conversational editing using natural language. It supports targeted edits, style transfer, and leverages Gemini's world knowledge for context-aware image creation at $0.039 per image.
Gemini 2.5 Flash
google/gemini-2.5-flash
Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale.
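For chat applications, `puter.ai.chat` also accepts an array of OpenAI-style role/content messages instead of a single string, which is how you carry multi-turn history. A sketch; the `response.message.content` shape reflects current Puter.js and is worth confirming against the docs:

```javascript
// Multi-turn conversation history in the role/content message format
// that puter.ai.chat accepts in place of a plain prompt string.
const messages = [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Summarize the plot of Hamlet in one sentence." }
];

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.chat(messages, { model: "google/gemini-2.5-flash" })
        .then(response => console.log(response.message.content));
}
```

To continue the conversation, push the assistant's reply and the next user turn onto `messages` and call `puter.ai.chat` again.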
Gemini 2.5 Pro
google/gemini-2.5-pro
Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning.
Gemma 3 12B
google/gemma-3-12b-it
Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment.
Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance.
Gemma 3 4B
google/gemma-3-4b-it
Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities.
Gemini 2.0 Flash
google/gemini-2.0-flash-001
Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support.
Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use.
Veo 2
google/veo-2.0
Google Veo 2 is Google DeepMind's video generation model that creates 5-second videos at resolutions from 720p up to 4K from text or image prompts, with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles, though without native audio generation.
Gemini 2.0 Flash
google/gemini-2.0-flash
Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference.
Gemini 2.0 Flash-Lite
google/gemini-2.0-flash-lite
Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads.
Gemma 2 27B
google/gemma-2-27b-it
Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU.
Gemma 2 9B
google/gemma-2-9b-it
Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware.
Frequently Asked Questions
What can I do with the Gemini API?
The Gemini API gives you access to models for AI chat, image generation, and video generation. Through Puter.js, you can start using Gemini models instantly with zero setup or configuration.
Which Gemini models does Puter.js support?
Puter.js supports a variety of Gemini models, including Gemma 4 26B A4B, Gemma 4 31B, Veo 3.1 Lite, and more. Find all AI models supported by Puter.js in the AI model list.
Who pays for the AI usage?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Gemini API with my framework?
Yes — the Gemini API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.