Gemini API
Access Gemini instantly with Puter.js and add AI to any app in a few lines of code, with no backend or API keys required.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
  model: "gemini-3-flash-preview"
}).then(response => {
  console.log(response);
});
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat("Explain AI like I'm five!", {
      model: "gemini-3-flash-preview"
    }).then(response => {
      console.log(response);
    });
  </script>
</body>
</html>
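For longer replies, the response can also be streamed as it is generated instead of waited on in full. A minimal sketch, assuming `puter.ai.chat` accepts a `stream: true` option and then yields chunks carrying a `text` property (verify the exact streaming interface against the Puter.js docs):

```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    (async () => {
      // Request a streamed response; chunks arrive as they are generated.
      const response = await puter.ai.chat("Explain AI like I'm five!", {
        model: "gemini-3-flash-preview",
        stream: true
      });
      for await (const part of response) {
        // part.text holds the next slice of the reply (assumed chunk shape).
        if (part?.text) document.body.append(part.text);
      }
    })();
  </script>
</body>
</html>
```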
List of Gemini Models
Gemini 3.1 Flash Image
google/gemini-3.1-flash-image-preview
Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding.
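Image models like this one are driven through `puter.ai.txt2img` rather than `puter.ai.chat`. A minimal sketch, assuming `txt2img` accepts an options object with a `model` field and resolves to an image element (check the parameter shape against the Puter.js docs):

```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.txt2img("A watercolor painting of a lighthouse at dawn", {
      model: "google/gemini-3.1-flash-image-preview"
    }).then(image => {
      // Append the generated image to the page as an <img> element.
      document.body.appendChild(image);
    });
  </script>
</body>
</html>
```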
Gemini 3.1 Pro
google/gemini-3.1-pro-preview
Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving.
Gemini 3 Flash
google/gemini-3-flash-preview
Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels.
Gemini 3 Pro Image
google/gemini-3-pro-image-preview
Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts.
Gemini 3 Pro
google/gemini-3-pro-preview
Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities.
Google: Gemini 2.5 Flash Lite Preview 09-2025
google/gemini-2.5-flash-lite-preview-09-2025
Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency.
Google: Gemini 2.5 Flash Preview 09-2025
google/gemini-2.5-flash-preview-09-2025
Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications.
Imagen 4 Ultra
google/imagen-4.0-ultra
Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image.
Google: Gemma 3n 2B
google/gemma-3n-e2b-it:free
Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding.
Google: Gemma 3n 4B
google/gemma-3n-e4b-it
Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices.
Gemini 2.5 Flash-Lite
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations.
Google: Gemini 2.5 Pro Preview 06-05
google/gemini-2.5-pro-preview
Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities.
Imagen 4 Fast
google/imagen-4.0-fast
Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility.
Imagen 4 Preview
google/imagen-4.0-preview
Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks.
Google Veo 3
google/veo-3.0
Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio including dialogue, sound effects, and ambient noise directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence with cinematic quality 8-second clips at up to 1080p resolution.
Google Veo 3 with Audio
google/veo-3.0-audio
Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production.
Google Veo 3 Fast
google/veo-3.0-fast
Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution.
Google Veo 3 Fast with Audio
google/veo-3.0-fast-audio
Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows.
Google: Gemini 2.5 Pro Preview 05-06
google/gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases.
Gemini 2.5 Flash Image
google/gemini-2.5-flash-image
Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion.
Gemini 2.5 Flash Image
google/flash-image-2.5
Gemini 2.5 Flash Image is a fast, natively multimodal image generation and editing model that excels at character consistency, multi-image fusion, and conversational editing using natural language. It supports targeted edits, style transfer, and leverages Gemini's world knowledge for context-aware image creation at $0.039 per image.
Gemini 2.5 Flash
google/gemini-2.5-flash
Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale.
Gemini 2.5 Pro
google/gemini-2.5-pro
Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning.
Google: Gemma 3 12B
google/gemma-3-12b-it
Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment.
Google: Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance.
Google: Gemma 3 4B
google/gemma-3-4b-it
Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities.
Google: Gemini 2.0 Flash
google/gemini-2.0-flash-001
Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support.
Google: Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use.
Google Veo 2
google/veo-2.0
Google Veo 2 is Google DeepMind's video generation model that creates 5-second, 720p-4K resolution videos from text or image prompts with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles without native audio generation.
Gemini 2.0 Flash
google/gemini-2.0-flash
Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference.
Gemini 2.0 Flash-Lite
google/gemini-2.0-flash-lite
Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads.
Google: Gemma 2 27B
google/gemma-2-27b-it
Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU.
Google: Gemma 2 9B
google/gemma-2-9b-it
Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware.
Frequently Asked Questions
What can I do with the Gemini API?
The Gemini API gives you access to models for AI chat, image generation, and video generation. Through Puter.js, you can start using Gemini models instantly with zero setup or configuration.
Which Gemini models does Puter.js support?
Puter.js supports a variety of Gemini models, including Gemini 3.1 Flash Image, Gemini 3.1 Pro, Gemini 3 Flash, and more. Find all AI models supported by Puter.js in the AI model list.
How much does it cost to use the Gemini API with Puter.js?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Gemini API with my existing framework?
Yes: the Gemini API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.