Gemini API
Access Gemini instantly with Puter.js and add AI to any app in a few lines of code, with no backend and no API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "gemini-3-flash-preview"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "gemini-3-flash-preview"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
List of Gemini Models
Gemma 4 26B A4B
google/gemma-4-26b-a4b-it
Gemma 4 26B A4B is a Mixture-of-Experts (MoE) open model from Google DeepMind, built from the same research as Gemini 3. It has 26B total parameters but activates only 3.8B per forward pass, delivering near-31B-dense quality at a fraction of the compute cost. The model supports a 256K token context window, multimodal image and text input, built-in step-by-step reasoning (thinking mode), and native function calling for agentic workflows. It currently ranks #6 among open models on the Arena AI text leaderboard with an estimated LMArena score of 1441 — competitive with models many times its active size. It excels at reasoning, coding, long-context tasks, and structured tool use. It's a strong pick for developers who need high throughput and low latency without sacrificing capability.
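Native function calling like this is exposed through the `tools` option of `puter.ai.chat`, which accepts OpenAI-style tool definitions. A minimal sketch follows; the `get_weather` tool is a hypothetical example for illustration, not a real API, and the call is guarded so the snippet is inert outside a page that has loaded Puter.js.

```javascript
// Hypothetical tool definition in the OpenAI-style schema that
// puter.ai.chat's `tools` option accepts.
const tools = [{
    type: "function",
    function: {
        name: "get_weather", // hypothetical tool, for illustration only
        description: "Get the current weather for a city",
        parameters: {
            type: "object",
            properties: {
                city: { type: "string", description: "City name" }
            },
            required: ["city"]
        }
    }
}];

// Only runs where Puter.js has been loaded (e.g. via the script tag above).
if (typeof puter !== "undefined") {
    puter.ai.chat("What's the weather in Paris?", {
        model: "google/gemma-4-26b-a4b-it",
        tools
    }).then(response => console.log(response));
}
```

When the model decides to call a tool, the response carries the tool call instead of plain text; your app executes the function and sends the result back in a follow-up `puter.ai.chat` turn.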
Gemma 4 31B
google/gemma-4-31b-it
Gemma 4 31B is a dense multimodal model from Google DeepMind, built on the same research foundation as Gemini 3. It is the most capable model in the Gemma 4 family, accepting text, image, and video input with a 256K-token context window. It delivers strong benchmark results: 89.2% on AIME 2026, 85.2% on MMLU Pro, 80.0% on LiveCodeBench v6, and 84.3% on GPQA Diamond. On the Arena AI text leaderboard, it ranks as the #3 open model globally, outperforming many models with far higher parameter counts. Gemma 4 31B features native function calling trained into the model, configurable chain-of-thought reasoning, and structured JSON output — making it especially well-suited for agentic workflows, coding tasks, and multi-turn tool use. It supports over 140 languages and serves as a strong foundation for fine-tuning.
Veo 3.1 Lite
google/veo-3.1-lite
Veo 3.1 Lite is Google DeepMind's most cost-effective video generation model, built for high-volume applications where per-clip cost is a primary concern. It generates video at the same speed as Veo 3.1 Fast but at less than half the price — starting at $0.05 per second for 720p. The model supports text-to-video and image-to-video with 720p and 1080p output in landscape (16:9) or portrait (9:16), at configurable durations of 4, 6, or 8 seconds. It does not support 4K output, scene extension, or native audio generation — clips are silent by default. Veo 3.1 Lite is ideal for developers building batch video pipelines, social media automation, or interactive tools where cost per generation matters most and audio can be added in post-production.
Gemini 3.1 Flash Lite Preview
google/gemini-3.1-flash-lite-preview
Gemini 3.1 Flash Lite is Google's fastest and most cost-efficient model in the Gemini 3 series, optimized for high-volume, latency-sensitive tasks like translation, classification, and content moderation. Priced at $0.25/1M input tokens and $1.50/1M output tokens, it outperforms Gemini 2.5 Flash with 2.5x faster time-to-first-token and a 45% boost in output speed.
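A classification task like the ones this model targets can be sketched in a few lines. The prompt-building helper below is illustrative (not part of Puter.js); the `puter.ai.chat` call itself is the same documented API as above, guarded so the snippet is safe outside the browser.

```javascript
// Illustrative helper: wrap an input in a one-word sentiment-classification prompt.
function buildClassifyPrompt(text) {
    return 'Classify the sentiment of the following text as "positive", ' +
           '"negative", or "neutral". Reply with exactly one word.\n\n' +
           "Text: " + text;
}

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.chat(buildClassifyPrompt("I love this product!"), {
        model: "google/gemini-3.1-flash-lite-preview"
    }).then(response => console.log(response));
}
```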
Gemini 3.1 Flash Image
google/gemini-3.1-flash-image-preview
Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding.
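Image generation in Puter.js goes through `puter.ai.txt2img`. A minimal sketch, with one stated assumption: passing a `model` option here is modeled on `puter.ai.chat`, so check the Puter.js docs for the exact options `txt2img` accepts.

```javascript
// Assumption: txt2img accepts a `model` option like puter.ai.chat does.
const imageOptions = { model: "google/gemini-3.1-flash-image-preview" };

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.txt2img("A watercolor lighthouse at dawn", imageOptions)
        .then(image => document.body.appendChild(image)); // resolves to an image element
}
```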
Gemini 3.1 Pro
google/gemini-3.1-pro-preview
Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving.
Gemini 3 Flash
google/gemini-3-flash-preview
Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels.
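For a latency-focused model like this one, streaming lets you show output as it arrives. `puter.ai.chat` supports a `stream: true` option that makes the call resolve to an async iterable; each streamed part exposes a `text` field in current Puter.js. A sketch, written as a helper function so it only touches `puter` when actually called in the browser:

```javascript
// Stream a reply token-by-token and return the accumulated text.
// Requires Puter.js to be loaded before streamReply() is called.
async function streamReply(prompt) {
    const stream = await puter.ai.chat(prompt, {
        model: "google/gemini-3-flash-preview",
        stream: true
    });
    let full = "";
    for await (const part of stream) {
        if (part?.text) {
            full += part.text; // append each chunk as it arrives
        }
    }
    return full;
}
```

In a UI you would typically write each `part.text` straight into the page instead of accumulating it.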
Gemini 3 Pro Image
google/gemini-3-pro-image-preview
Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts.
Gemini 3 Pro
google/gemini-3-pro-preview
Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities.
Veo 3.1
google/veo-3.1
Veo 3.1 is Google DeepMind's flagship AI video generation model, offering the highest quality output in the Veo family. It generates up to 4K resolution video with natively synchronized audio — including dialogue, sound effects, and ambient noise — all produced in a single joint diffusion process. The model supports text-to-video, image-to-video, reference image guidance (up to 3 images), and frame-to-frame generation. Clips are 8 seconds at base, extendable to over 60 seconds via scene chaining. Both 16:9 and native 9:16 aspect ratios are supported. Lip-sync error stays under 120ms. Veo 3.1 achieved top human-preference scores on MovieGenBench for prompt adherence, visual quality, and audio sync, and state-of-the-art results on VBench I2V. It was the first major AI video model to support true 4K output. Best suited for high-fidelity creative and production work where quality is the priority over speed or cost.
Veo 3.1 Fast
google/veo-3.1-fast
Veo 3.1 Fast is the speed-optimized variant of Google DeepMind's Veo 3.1 video model, generating output roughly twice as fast as the standard version with only a minor quality trade-off. Independent testing shows quality differences of 1–8% depending on scene complexity — negligible for most use cases. It retains the full feature set of the standard model: native audio generation, text-to-video, image-to-video, reference images, frame-to-frame generation, and support for 720p, 1080p, and 4K resolutions. An 8-second 720p clip typically completes in 30–45 seconds. At roughly one-fifth the per-second cost of the standard model, Veo 3.1 Fast is well suited for rapid prototyping, iterative prompt testing, and production workflows where turnaround time and budget matter more than maximum fidelity.
Gemini 2.5 Flash Lite Preview 09-2025
google/gemini-2.5-flash-lite-preview-09-2025
Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency.
Gemini 2.5 Flash Preview 09-2025
google/gemini-2.5-flash-preview-09-2025
Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications.
Imagen 4 Fast
google/imagen-4.0-fast
Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility.
Imagen 4 Ultra
google/imagen-4.0-ultra
Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image.
Imagen 4
google/imagen-4.0
Imagen 4 is Google DeepMind's flagship text-to-image generation model, available through the Gemini API and Google AI Studio. It delivers significant improvements over Imagen 3, particularly in rendering text, typography, and fine details like intricate fabrics and textures. The model supports output up to 2K resolution across a range of aspect ratios, generating images in roughly 2.5 seconds. A Fast variant optimized for high-volume use runs at $0.02 per image, while the standard model is $0.04 and the Ultra tier—built for precise prompt adherence—is $0.06. In human evaluations on GenAI-Bench, Imagen 4 scored highly against other leading image generation models on overall preference. All outputs are embedded with Google's SynthID watermark for AI-content traceability. It's a strong fit for developers building creative tools, marketing asset pipelines, or any application requiring reliable, high-quality image generation from text prompts.
Gemma 3n 2B
google/gemma-3n-e2b-it:free
Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding.
Gemma 3n 4B
google/gemma-3n-e4b-it
Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices.
Gemini 2.5 Flash-Lite
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations.
Gemini 2.5 Pro Preview 06-05
google/gemini-2.5-pro-preview
Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities.
Imagen 4 Preview
google/imagen-4.0-preview
Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks.
Veo 3
google/veo-3.0
Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio — including dialogue, sound effects, and ambient noise — directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence, with cinematic-quality 8-second clips at up to 1080p resolution.
Veo 3 with Audio
google/veo-3.0-audio
Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production.
Veo 3 Fast
google/veo-3.0-fast
Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution.
Veo 3 Fast with Audio
google/veo-3.0-fast-audio
Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows.
Gemini 2.5 Pro Preview 05-06
google/gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases.
Gemini 2.5 Flash Image
google/gemini-2.5-flash-image
Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion.
Gemini 2.5 Flash Image
google/flash-image-2.5
Gemini 2.5 Flash Image is a fast, natively multimodal image generation and editing model that excels at character consistency, multi-image fusion, and conversational editing using natural language. It supports targeted edits, style transfer, and leverages Gemini's world knowledge for context-aware image creation at $0.039 per image.
Gemini 2.5 Flash
google/gemini-2.5-flash
Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale.
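For chat applications, `puter.ai.chat` also accepts an array of OpenAI-style role/content messages instead of a single string, which is how you carry multi-turn history. A sketch; the `response.message.content` shape reflects current Puter.js and is worth confirming against the docs:

```javascript
// Multi-turn conversation history in the role/content message format
// that puter.ai.chat accepts in place of a plain prompt string.
const messages = [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Summarize the plot of Hamlet in one sentence." }
];

// Only runs where Puter.js has been loaded.
if (typeof puter !== "undefined") {
    puter.ai.chat(messages, { model: "google/gemini-2.5-flash" })
        .then(response => console.log(response.message.content));
}
```

To continue the conversation, push the assistant's reply and the next user turn onto `messages` and call `puter.ai.chat` again.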
Gemini 2.5 Pro
google/gemini-2.5-pro
Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning.
Gemma 3 12B
google/gemma-3-12b-it
Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment.
Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance.
Gemma 3 4B
google/gemma-3-4b-it
Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities.
Gemini 2.0 Flash
google/gemini-2.0-flash-001
Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support.
Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use.
Veo 2
google/veo-2.0
Google Veo 2 is Google DeepMind's video generation model that creates 5-second videos at resolutions from 720p up to 4K from text or image prompts, with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles, though without native audio generation.
Gemini 2.0 Flash
google/gemini-2.0-flash
Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference.
Gemini 2.0 Flash-Lite
google/gemini-2.0-flash-lite
Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads.
Gemma 2 27B
google/gemma-2-27b-it
Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU.
Gemma 2 9B
google/gemma-2-9b-it
Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware.
Frequently Asked Questions
What can I do with the Gemini API?
The Gemini API gives you access to models for AI chat, image generation, and video generation. Through Puter.js, you can start using Gemini models instantly with zero setup or configuration.
Which Gemini models does Puter.js support?
Puter.js supports a variety of Gemini models, including Gemma 4 26B A4B, Gemma 4 31B, Veo 3.1 Lite, and more. Find all AI models supported by Puter.js in the AI model list.
Who pays for the AI usage?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Gemini API with my framework?
Yes — the Gemini API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.