Gemini

Gemini API

Access Gemini instantly with Puter.js, and add AI to any app in a few lines of code without backend or API keys.

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "gemini-3-flash-preview"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "gemini-3-flash-preview"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>

List of Gemini Models

Image

Gemini 3.1 Flash Image

google/gemini-3.1-flash-image-preview

Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding.

Chat

Gemini 3.1 Pro

google/gemini-3.1-pro-preview

Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving.

Chat

Gemini 3 Flash

google/gemini-3-flash-preview

Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels.

Image

Gemini 3 Pro Image

google/gemini-3-pro-image-preview

Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts.

Chat

Gemini 3 Pro

google/gemini-3-pro-preview

Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities.

Chat

Google: Gemini 2.5 Flash Lite Preview 09-2025

google/gemini-2.5-flash-lite-preview-09-2025

Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency.

Chat

Google: Gemini 2.5 Flash Preview 09-2025

google/gemini-2.5-flash-preview-09-2025

Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications.

Image

Imagen 4 Ultra

google/imagen-4.0-ultra

Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image.

Chat

Google: Gemma 3n 2B

google/gemma-3n-e2b-it:free

Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding.

Chat

Google: Gemma 3n 4B

google/gemma-3n-e4b-it

Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices.

Chat

Gemini 2.5 Flash-Lite

google/gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations.

Chat

Google: Gemini 2.5 Pro Preview 06-05

google/gemini-2.5-pro-preview

Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities.

Image

Imagen 4 Fast

google/imagen-4.0-fast

Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility.

Image

Imagen 4 Preview

google/imagen-4.0-preview

Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks.

Video

Google Veo 3

google/veo-3.0

Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio including dialogue, sound effects, and ambient noise directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence with cinematic quality 8-second clips at up to 1080p resolution.

Video

Google Veo 3 with Audio

google/veo-3.0-audio

Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production.

Video

Google Veo 3 Fast

google/veo-3.0-fast

Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution.

Video

Google Veo 3 Fast with Audio

google/veo-3.0-fast-audio

Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows.

Chat

Google: Gemini 2.5 Pro Preview 05-06

google/gemini-2.5-pro-preview-05-06

Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases.

Image

Gemini 2.5 Flash Image

google/gemini-2.5-flash-image

Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion.

Image

Gemini 2.5 Flash Image

google/flash-image-2.5

Gemini 2.5 Flash Image is a fast, natively multimodal image generation and editing model that excels at character consistency, multi-image fusion, and conversational editing using natural language. It supports targeted edits, style transfer, and leverages Gemini's world knowledge for context-aware image creation at $0.039 per image.

Chat

Gemini 2.5 Flash

google/gemini-2.5-flash

Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale.

Chat

Gemini 2.5 Pro

google/gemini-2.5-pro

Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning.

Chat

Google: Gemma 3 12B

google/gemma-3-12b-it

Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment.

Chat

Google: Gemma 3 27B

google/gemma-3-27b-it

Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance.

Chat

Google: Gemma 3 4B

google/gemma-3-4b-it

Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities.

Chat

Google: Gemini 2.0 Flash

google/gemini-2.0-flash-001

Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support.

Chat

Google: Gemini 2.0 Flash Lite

google/gemini-2.0-flash-lite-001

Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use.

Video

Google Veo 2

google/veo-2.0

Google Veo 2 is Google DeepMind's video generation model that creates 5-second, 720p-4K resolution videos from text or image prompts with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles without native audio generation.

Chat

Gemini 2.0 Flash

google/gemini-2.0-flash

Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference.

Chat

Gemini 2.0 Flash-Lite

google/gemini-2.0-flash-lite

Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads.

Chat

Google: Gemma 2 27B

google/gemma-2-27b-it

Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU.

Chat

Google: Gemma 2 9B

google/gemma-2-9b-it

Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware.

Frequently Asked Questions

What is this Gemini API about?

The Gemini API gives you access to models for AI chat, image generation, and video generation. Through Puter.js, you can start using Gemini models instantly with zero setup or configuration.

Which Gemini models can I use?

Puter.js supports a variety of Gemini models, including Gemini 3.1 Flash Image, Gemini 3.1 Pro, Gemini 3 Flash, and more. Find all AI models supported by Puter.js in the AI model list.

How much does it cost?

With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Gemini API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.