Inception Mercury 2 Is Now Available in Puter.js
Puter.js now supports Inception Mercury 2, the fastest reasoning LLM from Inception Labs—powered by a diffusion-based architecture that generates text by refining multiple tokens in parallel rather than sequentially.
What is Mercury 2?
Mercury 2 is the first diffusion-based reasoning language model. Unlike traditional autoregressive models that generate one token at a time, Mercury 2 uses a coarse-to-fine diffusion process that refines entire outputs simultaneously—like an editor reworking a full draft at once.
Key highlights:
- ~1,000 tokens per second — 5x faster than leading speed-optimized LLMs, with end-to-end latency of just 1.7 seconds
- Strong reasoning — scores 91.1 on AIME 2025 and 73.6 on GPQA, competitive with much larger models
- 128K context window — with support for tool usage and JSON output
- Ultra-low cost — $0.25/$0.75 per million input/output tokens
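The pricing above works out to fractions of a cent per request. A minimal sketch of the arithmetic (the rates are from the list above; the token counts are hypothetical):

```javascript
// Mercury 2 pricing: $0.25 per million input tokens, $0.75 per million output tokens
const INPUT_RATE = 0.25 / 1e6;   // dollars per input token
const OUTPUT_RATE = 0.75 / 1e6;  // dollars per output token

// Estimate the cost of a single request in dollars
function requestCost(inputTokens, outputTokens) {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// Example: a 2,000-token prompt with a 1,000-token reply
console.log(requestCost(2000, 1000).toFixed(6)); // "0.001250" — about an eighth of a cent
```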
Examples
Basic Chat
const response = await puter.ai.chat(
    "Explain the concept of machine learning in simple terms",
    { model: 'inception/mercury-2' }
);
puter.print(response);
Reasoning Tasks
const response = await puter.ai.chat(
    "Solve step by step: If a train leaves at 9am going 60mph and another leaves at 10am going 90mph, when does the second train catch up?",
    { model: 'inception/mercury-2' }
);
puter.print(response);
Streaming
const response = await puter.ai.chat(
    "Explain the evolution of programming languages from assembly to modern high-level languages",
    { model: 'inception/mercury-2', stream: true }
);
for await (const part of response) {
    puter.print(part?.text);
}
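Mercury 2 also supports tool usage, as noted in the highlights. A sketch of a tool definition, assuming puter.js accepts OpenAI-style function-calling schemas in a `tools` option (the weather tool itself is hypothetical):

```javascript
// Hypothetical tool definition in OpenAI function-calling format
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a given city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'Name of the city' }
      },
      required: ['city']
    }
  }
}];

// In the browser, the tools would be passed alongside the model, e.g.:
// const response = await puter.ai.chat(
//   "What's the weather in Paris?",
//   { model: 'inception/mercury-2', tools }
// );
console.log(tools[0].function.name); // "get_weather"
```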
Get Started Now
Just add one library to your project:
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';
Or add one script tag to your HTML:
<script src="https://js.puter.com/v2/"></script>
No API keys and no infrastructure setup. Start building with Mercury 2 immediately.
Learn more:
Free, Serverless AI and Cloud
Start creating powerful web applications with Puter.js in seconds!