Xiaomi MiMo-V2-Omni and MiMo-V2-Pro Are Now Available in Puter.js
Puter.js now supports MiMo-V2-Omni and MiMo-V2-Pro, two new flagship models from Xiaomi that bring frontier-level multimodal understanding and agentic reasoning to the MiMo V2 family. Add them to your application for free without any API keys.
What is MiMo-V2-Omni?
MiMo-V2-Omni is Xiaomi's omni-modal foundation model that natively processes text, image, and audio within a unified architecture. Rather than bolting modalities together, it integrates dedicated image and audio encoders into a single shared backbone where perception and action emerge as one continuous reasoning process. Key highlights include:
- Unified Multimodal Input: Processes text, images, and audio simultaneously through a shared backbone with native structured tool calling
- 10+ Hours of Audio Understanding: One of the strongest audio understanding foundation models available, scoring 69.4 on MMAU-Pro (vs Gemini 3 Pro: 65.0)
- Strong Vision: 76.8 on MMMU-Pro for visual reasoning and complex chart analysis
- Agentic Capabilities: 74.8 on SWE-Bench Verified and 81.2 on PinchBench, outperforming Gemini 3 Pro and GPT-5.2 on multiple agentic benchmarks
- 256K Context Window: Supports extended multimodal context for complex real-world tasks
What is MiMo-V2-Pro?
MiMo-V2-Pro is Xiaomi's flagship text-only reasoning model built for the "agent era," featuring over 1T total parameters with 42B active — roughly 3x larger than MiMo-V2-Flash. It was previously tested anonymously as "Hunter Alpha" on OpenRouter, where it topped daily API call charts and accumulated over 1 trillion tokens during early testing. Key highlights include:
- Elite Agentic Performance: 61.5 on ClawEval (#3 globally) approaching Claude Opus 4.6, and 81.0 on PinchBench
- Strong Coding: 78.0 on SWE-Bench Verified with coding ability surpassing Claude 4.6 Sonnet
- 1M-Token Context Window: Hybrid attention architecture with a 7:1 ratio enabling high-intensity real-world applications
- Cost Efficient: At $1/$3 per million tokens (input/output), roughly one-fifth the cost of comparable frontier models
- Ranks 8th Globally: 2nd among Chinese LLMs on the Artificial Analysis Intelligence Index
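To get a feel for what a 1M-token window means in practice, here is a rough fit check. The 4-characters-per-token ratio is only a common rule of thumb (not MiMo's actual tokenizer), and the `estimateTokens`/`fitsInContext` helpers are illustrative, not part of Puter.js:

```javascript
// Rough check of whether a prompt fits MiMo-V2-Pro's 1M-token window.
// NOTE: ~4 characters per token is a heuristic; real token counts vary
// by language and content, so treat this as a ballpark only.
const CONTEXT_WINDOW = 1_000_000;

function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

function fitsInContext(text, reserveForOutput = 8_000) {
    // Leave headroom for the model's own output tokens.
    return estimateTokens(text) + reserveForOutput <= CONTEXT_WINDOW;
}

console.log(fitsInContext("Summarize this paragraph."));  // small prompt: fits
console.log(fitsInContext("x".repeat(5_000_000)));        // ~1.25M tokens: too large
```

At roughly 4 characters per token, 1M tokens works out to around 4 MB of plain text per request.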
| | MiMo-V2-Omni | MiMo-V2-Pro |
|---|---|---|
| SWE-Bench Verified | 74.8 | 78.0 |
| PinchBench (avg) | 81.2 | 81.0 |
| ClawEval | 54.8 | 61.5 |
| Context Window | 256K | 1M |
| Input Cost | $0.40 / 1M tokens | $1.00 / 1M tokens |
| Output Cost | $2.00 / 1M tokens | $3.00 / 1M tokens |
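The rates in the table make per-request costs easy to estimate. The sketch below just multiplies token counts by the published per-million-token prices; `estimateCost` is an illustrative helper, not a Puter.js API, and actual billing may differ:

```javascript
// Estimate request cost from the published per-1M-token rates above.
// Rates are taken from the comparison table; actual billing may differ.
const RATES = {
    'xiaomi/mimo-v2-omni': { input: 0.40, output: 2.00 }, // $ per 1M tokens
    'xiaomi/mimo-v2-pro':  { input: 1.00, output: 3.00 },
};

function estimateCost(model, inputTokens, outputTokens) {
    const r = RATES[model];
    return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}

// A request with 10K prompt tokens and 2K completion tokens:
console.log(estimateCost('xiaomi/mimo-v2-pro', 10_000, 2_000));  // ~$0.016
console.log(estimateCost('xiaomi/mimo-v2-omni', 10_000, 2_000)); // ~$0.008
```

Note that when you call these models through Puter.js, costs are covered by Puter's User Pays model rather than billed to you directly; the arithmetic above is only useful for comparing the models' relative efficiency.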
Examples
Image analysis with MiMo-V2-Omni
```javascript
puter.ai.chat(
    "What do you see in this image? Describe it in detail.",
    "https://assets.puter.site/doge.jpeg",
    { model: 'xiaomi/mimo-v2-omni' }
).then(response => {
    puter.print(response);
});
```
Agentic reasoning with MiMo-V2-Pro
```javascript
const response = await puter.ai.chat(
    "Design a migration plan to convert a REST API to GraphQL, including schema definitions, a phased rollout strategy, and resolver patterns with DataLoader for N+1 prevention.",
    { model: 'xiaomi/mimo-v2-pro', stream: true }
);

// With stream: true, the response is an async iterable of partial chunks.
for await (const part of response) {
    puter.print(part?.text);
}
```
Complex task orchestration with MiMo-V2-Pro
```javascript
puter.ai.chat(
    "Break down the implementation of a real-time collaborative document editor into components, dependencies, and a step-by-step build order with conflict resolution strategy.",
    { model: 'xiaomi/mimo-v2-pro' }
).then(response => {
    puter.print(response);
});
```
Get Started Now
Just add one library to your project:
```javascript
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';
```
Or add one script tag to your HTML:
```html
<script src="https://js.puter.com/v2/"></script>
```
No API keys needed. Start building with MiMo-V2-Omni and MiMo-V2-Pro immediately.
Learn more:
Free, Serverless AI and Cloud
Start creating powerful web applications with Puter.js in seconds!