Blog

Inception Mercury 2 Is Now Available in Puter.js

Puter.js now supports Inception Mercury 2, the fastest reasoning LLM from Inception Labs—powered by a diffusion-based architecture that generates text by refining multiple tokens in parallel rather than sequentially.

What is Mercury 2?

Mercury 2 is the first diffusion-based reasoning language model. Unlike traditional autoregressive models that generate one token at a time, Mercury 2 uses a coarse-to-fine diffusion process that refines entire outputs simultaneously—like an editor reworking a full draft at once.

Key highlights:

  • ~1,000 tokens per second — 5x faster than leading speed-optimized LLMs, with end-to-end latency of just 1.7 seconds
  • Strong reasoning — scores 91.1 on AIME 2025 and 73.6 on GPQA, competitive with much larger models
  • 128K context window — with support for tool usage and JSON output
  • Ultra-low cost — $0.25/$0.75 per million input/output tokens
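Since Mercury 2 supports tool usage, here is a minimal sketch of how a tool definition might be passed to `puter.ai.chat`. The `get_weather` tool is a hypothetical example, and the assumption is that Puter.js accepts OpenAI-style function schemas via a `tools` option; check the Puter.js docs for the exact shape:

```javascript
// Hypothetical tool definition using the OpenAI-style function schema
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name" }
      },
      required: ["city"]
    }
  }
}];

// In the browser, pass the tool schema alongside the model option.
// The model may answer with tool calls instead of plain text.
async function askWithTools(question) {
  return puter.ai.chat(question, {
    model: 'inception/mercury-2',
    tools
  });
}
```

Your application would then inspect the response for tool calls, run the matching function locally, and feed the result back into the conversation.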

Examples

Basic Chat

const response = await puter.ai.chat(
  "Explain the concept of machine learning in simple terms",
  { model: 'inception/mercury-2' }
);
puter.print(response);

Reasoning Tasks

const response = await puter.ai.chat(
  "Solve step by step: If a train leaves at 9am going 60mph and another leaves at 10am going 90mph, when does the second train catch up?",
  { model: 'inception/mercury-2' }
);
puter.print(response);

Streaming

const response = await puter.ai.chat(
  "Explain the evolution of programming languages from assembly to modern high-level languages",
  { model: 'inception/mercury-2', stream: true }
);

for await (const part of response) {
  if (part?.text) puter.print(part.text);
}
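If you need the complete text once streaming finishes, you can accumulate the parts as they arrive. A small sketch, assuming each streamed part carries an optional `text` field as in the loop above:

```javascript
// Collect the streamed parts into a single string.
// Assumes an async-iterable stream whose parts may have a .text field.
async function collectStream(stream) {
  let fullText = "";
  for await (const part of stream) {
    if (part?.text) fullText += part.text;
  }
  return fullText;
}
```

This lets you render tokens incrementally for responsiveness while still keeping the finished answer for later use.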

Get Started Now

Just add one library to your project:

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

Or add one script tag to your HTML:

<script src="https://js.puter.com/v2/"></script>

No API keys and no infrastructure setup. Start building with Mercury 2 immediately.
