Inception: Mercury

This model is no longer available.

Model Card

Mercury is the world's first commercial-scale diffusion large language model from Inception Labs. It generates text through iterative parallel refinement rather than sequential token prediction, enabling dramatically higher throughput without sacrificing output quality.

It matches the performance of frontier speed-optimized models such as GPT-4o Mini and Gemini 1.5 Flash across knowledge, coding, instruction-following, and math benchmarks, while running up to 10x faster. It is OpenAI API-compatible for straightforward integration.
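As a sketch of what that compatibility implies, the request body below follows the standard OpenAI Chat Completions schema. The model id "mercury" and the endpoint path are assumptions; substitute whatever identifier and base URL your provider lists.

```javascript
// Sketch: an OpenAI-style chat request for Mercury. Because the API is
// OpenAI-compatible, the payload follows the Chat Completions schema.
// The model id "mercury" is an assumption -- use the id your provider lists.
function buildMercuryRequest(prompt) {
  return {
    model: "mercury",
    messages: [{ role: "user", content: prompt }],
  };
}

// POST this JSON to the provider's /v1/chat/completions endpoint.
const body = buildMercuryRequest("Explain quantum computing in simple terms");
console.log(JSON.stringify(body));
```

Any existing OpenAI client library can send this payload unchanged once pointed at the right base URL.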

Mercury is well-suited for API use cases that demand high concurrency, fast response times, or cost efficiency — including chat, summarization, and general-purpose text generation at scale.

Context Window: 128K tokens
Max Output: 32K tokens
Input Cost: $0.25 per million tokens
Output Cost: $0.75 per million tokens
Release Date: Feb 24, 2025

Code Example

Add AI to your app with the Puter.js AI API — no API keys or setup required.

Node.js (npm):

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain quantum computing in simple terms").then(response => {
    console.log(response.message.content);
});

Browser (no build step):

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain quantum computing in simple terms").then(response => {
            document.body.innerHTML = response.message.content;
        });
    </script>
</body>
</html>

Frequently Asked Questions

How do I use Mercury?

You can access Mercury by Inception through the Puter.js AI API. Include the library in your web app or Node.js project and start making calls with just a few lines of JavaScript, with no backend and no configuration required. You can also call it from Python or cURL via Puter's OpenAI-compatible API.
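As a minimal sketch, `puter.ai.chat` accepts an options object as its second argument for selecting a model. The model id "mercury" below is an assumption; check Puter's model list for the exact identifier.

```javascript
// Sketch: selecting a specific model in Puter.js. The model id "mercury"
// is an assumption -- consult Puter's model list for the exact name.
function mercuryOptions() {
  return { model: "mercury" };
}

// In a page that loads <script src="https://js.puter.com/v2/"></script>:
if (typeof puter !== "undefined") {
  puter.ai.chat("Summarize diffusion language models in one sentence", mercuryOptions())
    .then(response => console.log(response.message.content));
}
```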

Is Mercury free?

Yes, it is free if you're using it through Puter.js. With the User-Pays Model, you can add Mercury to your app at no cost — your users pay for their own AI usage directly, making it completely free for you as a developer.

What is the pricing for Mercury?

Pricing for Mercury is based on the number of input and output tokens used per request.

Input: $0.25 per 1M tokens
Output: $0.75 per 1M tokens

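As a quick sanity check on these rates, the sketch below estimates the dollar cost of a single request from its token counts:

```javascript
// Sketch: estimating per-request cost for Mercury from token counts,
// using the listed rates ($0.25 input / $0.75 output per million tokens).
const INPUT_RATE = 0.25 / 1_000_000;   // dollars per input token
const OUTPUT_RATE = 0.75 / 1_000_000;  // dollars per output token

function estimateCost(inputTokens, outputTokens) {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// Example: a 2,000-token prompt with a 500-token reply
console.log(estimateCost(2000, 500)); // 0.000875
```

Note that under Puter's User-Pays Model this cost is borne by the end user, not the developer.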
Who created Mercury?

Mercury was created by Inception and released on Feb 24, 2025.

What is the context window of Mercury?

Mercury supports a context window of 128K tokens. For reference, that is roughly equivalent to 256 pages of text.

What is the max output length of Mercury?

Mercury can generate up to 32K tokens in a single response.

Does it work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Mercury API works with any JavaScript framework, Node.js, or plain HTML through Puter.js. Just include the library and start building. See the documentation for more details.

Get started with Puter.js

Add AI to your application without worrying about API keys or setup.
