Inception API
Access Inception instantly with Puter.js, and add AI to any app in a few lines of code with no backend and no API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "inception/mercury"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "inception/mercury"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
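For latency-sensitive use cases like the ones Mercury targets, you will usually want to render the reply as it arrives rather than waiting for the full response. A minimal streaming sketch, assuming the browser setup above; the { stream: true } option and the part.text chunk shape follow the Puter.js AI docs, but verify them against the current API:

```javascript
// Stream the reply chunk by chunk instead of waiting for the full response.
async function streamChat(prompt) {
    const stream = await puter.ai.chat(prompt, {
        model: "inception/mercury",
        stream: true // ask for incremental chunks instead of one reply
    });

    let full = "";
    for await (const part of stream) {
        if (part?.text) {
            full += part.text;      // accumulate the reply
            console.log(part.text); // render each chunk as it arrives
        }
    }
    return full;
}

// Usage (in a page that has loaded https://js.puter.com/v2/):
// streamChat("Explain AI like I'm five!").then(text => console.log("done:", text));
```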
List of Inception Models
Mercury 2
inception/mercury-2
Mercury 2 is a diffusion-based reasoning language model by Inception Labs that generates text by refining multiple tokens in parallel rather than sequentially, achieving speeds of ~1,000 tokens per second — roughly 5-10x faster than comparable models like Claude Haiku and GPT-5 Mini. It scores competitively on reasoning benchmarks (91.1 AIME 2025, 73.6 GPQA) while offering pricing at $0.25/$0.75 per million input/output tokens with a 128K context window. It targets latency-sensitive production workloads like agent loops, voice assistants, coding tools, and real-time search.
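To target Mercury 2, pass its model id from the list above. A minimal sketch for a reasoning prompt; reading the text via response.message.content is an assumption about the response shape, so check the Puter.js docs for the exact fields:

```javascript
// Ask Mercury 2 a reasoning question and return the reply text.
async function askMercury2(question) {
    const response = await puter.ai.chat(question, {
        model: "inception/mercury-2"
    });
    // Prefer the structured field; fall back to string coercion.
    return response?.message?.content ?? String(response);
}

// Usage:
// askMercury2("A train travels 60 km in 45 minutes. What is its speed in km/h?")
//     .then(console.log);
```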
Mercury Coder
inception/mercury-coder
Mercury Coder is a diffusion-based large language model specialized for code generation that achieves over 1,000 tokens per second on NVIDIA H100 GPUs. It's optimized for coding workflows including autocomplete, chat-based iteration, and code completion, delivering 5-10x faster speeds than models like GPT-4o Mini while maintaining comparable code quality.
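Using Mercury Coder works the same way, just with its own model id. A sketch of a code-completion style prompt, with the same caveat as above that response.message.content is an assumed response shape:

```javascript
// Ask Mercury Coder to complete a code snippet and return the reply text.
async function completeCode(snippet) {
    const response = await puter.ai.chat(
        "Complete this JavaScript function. Return only code:\n\n" + snippet,
        { model: "inception/mercury-coder" }
    );
    return response?.message?.content ?? String(response);
}

// Usage:
// completeCode("function isPalindrome(s) {").then(console.log);
```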
Mercury
inception/mercury
Mercury is the world's first commercial diffusion large language model (dLLM) from Inception Labs that generates text 5-10x faster than traditional autoregressive LLMs by predicting multiple tokens in parallel. It's designed for latency-sensitive applications like voice agents, search interfaces, and chatbots while matching the quality of speed-optimized models like Claude 3.5 Haiku.
Frequently Asked Questions
What is the Inception API?
The Inception API gives you access to models for AI chat. Through Puter.js, you can start using Inception models instantly with zero setup or configuration.
Which Inception models does Puter.js support?
Puter.js supports a variety of Inception models, including Mercury 2, Mercury Coder, and Mercury. Find all AI models supported by Puter.js in the AI model list.
Who pays for the AI usage?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Inception API with any framework?
Yes. The Inception API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.