Gemini API
Access Gemini instantly with Puter.js and add AI to any app in a few lines of code, with no backend or API keys required.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
  model: "gemini-3-flash-preview"
}).then(response => {
  console.log(response);
});
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat("Explain AI like I'm five!", {
      model: "gemini-3-flash-preview"
    }).then(response => {
      console.log(response);
    });
  </script>
</body>
</html>
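For longer replies, the response can also be streamed as it is generated instead of waited on in full. A minimal sketch, assuming `puter.ai.chat` accepts a `stream: true` option and then yields chunks carrying a `text` property (verify the exact streaming interface against the Puter.js docs):

```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    (async () => {
      // Request a streamed response; chunks arrive as they are generated.
      const response = await puter.ai.chat("Explain AI like I'm five!", {
        model: "gemini-3-flash-preview",
        stream: true
      });
      for await (const part of response) {
        // part.text holds the next slice of the reply (assumed chunk shape).
        if (part?.text) document.body.append(part.text);
      }
    })();
  </script>
</body>
</html>
```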
List of Gemini Models
Gemini 3.1 Flash Image
google/gemini-3.1-flash-image-preview
Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding.
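Image models like this one are driven through `puter.ai.txt2img` rather than `puter.ai.chat`. A minimal sketch, assuming `txt2img` accepts an options object with a `model` field and resolves to an image element (check the parameter shape against the Puter.js docs):

```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.txt2img("A watercolor painting of a lighthouse at dawn", {
      model: "google/gemini-3.1-flash-image-preview"
    }).then(image => {
      // Append the generated image to the page as an <img> element.
      document.body.appendChild(image);
    });
  </script>
</body>
</html>
```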
Gemini 3.1 Pro
google/gemini-3.1-pro-preview
Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving.
Gemini 3 Flash
google/gemini-3-flash-preview
Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels.
Gemini 3 Pro Image
google/gemini-3-pro-image-preview
Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts.
Gemini 3 Pro
google/gemini-3-pro-preview
Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities.
Google: Gemini 2.5 Flash Lite Preview 09-2025
google/gemini-2.5-flash-lite-preview-09-2025
Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency.
Google: Gemini 2.5 Flash Preview 09-2025
google/gemini-2.5-flash-preview-09-2025
Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications.
Imagen 4 Ultra
google/imagen-4.0-ultra
Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image.
Google: Gemma 3n 2B
google/gemma-3n-e2b-it:free
Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding.
Google: Gemma 3n 4B
google/gemma-3n-e4b-it
Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices.
Gemini 2.5 Flash-Lite
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations.
Google: Gemini 2.5 Pro Preview 06-05
google/gemini-2.5-pro-preview
Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities.
Imagen 4 Fast
google/imagen-4.0-fast
Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility.
Imagen 4 Preview
google/imagen-4.0-preview
Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks.
Google Veo 3
google/veo-3.0
Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio including dialogue, sound effects, and ambient noise directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence with cinematic quality 8-second clips at up to 1080p resolution.
Google Veo 3 with Audio
google/veo-3.0-audio
Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production.
Google Veo 3 Fast
google/veo-3.0-fast
Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution.
Google Veo 3 Fast with Audio
google/veo-3.0-fast-audio
Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows.
Google: Gemini 2.5 Pro Preview 05-06
google/gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases.
Gemini 2.5 Flash Image
google/gemini-2.5-flash-image
Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion.
Gemini 2.5 Flash Image
google/flash-image-2.5
Gemini 2.5 Flash Image is a fast, natively multimodal image generation and editing model that excels at character consistency, multi-image fusion, and conversational editing using natural language. It supports targeted edits, style transfer, and leverages Gemini's world knowledge for context-aware image creation at $0.039 per image.
Gemini 2.5 Flash
google/gemini-2.5-flash
Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale.
Gemini 2.5 Pro
google/gemini-2.5-pro
Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning.
Google: Gemma 3 12B
google/gemma-3-12b-it
Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment.
Google: Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance.
Google: Gemma 3 4B
google/gemma-3-4b-it
Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities.
Google: Gemini 2.0 Flash
google/gemini-2.0-flash-001
Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support.
Google: Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use.
Google Veo 2
google/veo-2.0
Google Veo 2 is Google DeepMind's video generation model that creates 5-second, 720p-4K resolution videos from text or image prompts with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles without native audio generation.
Gemini 2.0 Flash
google/gemini-2.0-flash
Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference.
Gemini 2.0 Flash-Lite
google/gemini-2.0-flash-lite
Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads.
Google: Gemma 2 27B
google/gemma-2-27b-it
Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU.
Google: Gemma 2 9B
google/gemma-2-9b-it
Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware.
Frequently Asked Questions
What can I do with the Gemini API?
The Gemini API gives you access to models for AI chat, image generation, and video generation. Through Puter.js, you can start using Gemini models instantly with zero setup or configuration.
Which Gemini models does Puter.js support?
Puter.js supports a variety of Gemini models, including Gemini 3.1 Flash Image, Gemini 3.1 Pro, Gemini 3 Flash, and more. Find all AI models supported by Puter.js in the AI model list.
How much does it cost to use the Gemini API with Puter.js?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Gemini API with my existing framework?
Yes: the Gemini API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.