Xiaomi MiMo-V2-Omni and MiMo-V2-Pro Are Now Available in Puter.js

Puter.js now supports MiMo-V2-Omni and MiMo-V2-Pro, two new flagship models from Xiaomi that bring frontier-level multimodal understanding and agentic reasoning to the MiMo V2 family. Add them to your application for free without any API keys.

What is MiMo-V2-Omni?

MiMo-V2-Omni is Xiaomi's omni-modal foundation model that natively processes text, image, and audio within a unified architecture. Rather than bolting modalities together, it integrates dedicated image and audio encoders into a single shared backbone where perception and action emerge as one continuous reasoning process. Key highlights include:

  • Unified Multimodal Input: Processes text, images, and audio simultaneously through a shared backbone with native structured tool calling
  • 10+ Hours of Audio Understanding: One of the strongest audio understanding foundation models available, scoring 69.4 on MMAU-Pro (vs Gemini 3 Pro: 65.0)
  • Strong Vision: 76.8 on MMMU-Pro for visual reasoning and complex chart analysis
  • Agentic Capabilities: 74.8 on SWE-Bench Verified and 81.2 on PinchBench, outperforming Gemini 3 Pro and GPT-5.2 on multiple agentic benchmarks
  • 256K Context Window: Supports extended multimodal context for complex real-world tasks
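The native structured tool calling mentioned above can be exercised through the OpenAI-style `tools` parameter that `puter.ai.chat()` accepts. Here is a minimal sketch — the `get_weather` tool, its schema, and the response-shape assumptions are illustrative placeholders, not confirmed MiMo behavior:

```javascript
// Sketch: structured tool calling via puter.ai.chat's OpenAI-style
// `tools` parameter. The get_weather tool is a placeholder example.
const tools = [{
    type: 'function',
    function: {
        name: 'get_weather',
        description: 'Get the current weather for a given city',
        parameters: {
            type: 'object',
            properties: {
                city: { type: 'string', description: 'Name of the city' }
            },
            required: ['city']
        }
    }
}];

async function askWithTools(prompt) {
    const response = await puter.ai.chat(prompt, {
        model: 'xiaomi/mimo-v2-omni',
        tools
    });
    // If the model chose to call a tool, the request surfaces on the
    // message; otherwise return the plain response.
    return response?.message?.tool_calls ?? response;
}
```

From there, your app would execute the requested tool and send the result back in a follow-up `puter.ai.chat()` call, as with any OpenAI-style function-calling loop.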

What is MiMo-V2-Pro?

MiMo-V2-Pro is Xiaomi's flagship text-only reasoning model built for the "agent era," featuring over 1T total parameters with 42B active — roughly 3x larger than MiMo-V2-Flash. It was previously tested anonymously as "Hunter Alpha" on OpenRouter, where it topped daily API call charts and accumulated over 1 trillion tokens during early testing. Key highlights include:

  • Elite Agentic Performance: 61.5 on ClawEval (#3 globally), approaching Claude Opus 4.6, and 81.0 on PinchBench
  • Strong Coding: 78.0 on SWE-Bench Verified with coding ability surpassing Claude 4.6 Sonnet
  • 1M-Token Context Window: Hybrid attention architecture with a 7:1 ratio enabling high-intensity real-world applications
  • Cost Efficient: At $1/$3 per million tokens (input/output), roughly one-fifth the cost of comparable frontier models
  • Ranks 8th Globally: 2nd among Chinese LLMs on the Artificial Analysis Intelligence Index

                     MiMo-V2-Omni        MiMo-V2-Pro
SWE-Bench Verified   74.8                78.0
PinchBench (avg)     81.2                81.0
ClawEval             54.8                61.5
Context Window       256K                1M
Input Cost           $0.40 / 1M tokens   $1.00 / 1M tokens
Output Cost          $2.00 / 1M tokens   $3.00 / 1M tokens
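
The comparison boils down to a simple rule: Omni is the only choice when the input includes images or audio, while Pro is the stronger and longer-context option for text-only work. A small helper sketch (the function name and its options are illustrative, not part of Puter.js):

```javascript
// Sketch: route a request to the right MiMo model based on its input.
// MiMo-V2-Omni is the only multimodal option; for text-only tasks,
// MiMo-V2-Pro scores higher and offers a 1M-token context window.
function pickMimoModel({ hasImages = false, hasAudio = false } = {}) {
    return (hasImages || hasAudio)
        ? 'xiaomi/mimo-v2-omni'
        : 'xiaomi/mimo-v2-pro';
}

const model = pickMimoModel({ hasImages: false }); // → 'xiaomi/mimo-v2-pro'
```

The result can then be passed straight into `puter.ai.chat(prompt, { model })`.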

Examples

Image analysis with MiMo-V2-Omni

puter.ai.chat(
    "What do you see in this image? Describe it in detail.",
    "https://assets.puter.site/doge.jpeg",
    { model: 'xiaomi/mimo-v2-omni' }
).then(response => {
    puter.print(response);
});

Agentic reasoning with MiMo-V2-Pro

(async () => {
    const response = await puter.ai.chat(
        "Design a migration plan to convert a REST API to GraphQL, including schema definitions, a phased rollout strategy, and resolver patterns with DataLoader for N+1 prevention.",
        { model: 'xiaomi/mimo-v2-pro', stream: true }
    );

    // With stream: true, the response is an async iterable of parts.
    for await (const part of response) {
        puter.print(part?.text);
    }
})();

Complex task orchestration with MiMo-V2-Pro

puter.ai.chat(
    "Break down the implementation of a real-time collaborative document editor into components, dependencies, and a step-by-step build order with conflict resolution strategy.",
    { model: 'xiaomi/mimo-v2-pro' }
).then(response => {
    puter.print(response);
});

Get Started Now

Just add one library to your project:

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

Or add one script tag to your HTML:

<script src="https://js.puter.com/v2/"></script>

No API keys needed. Start building with MiMo-V2-Omni and MiMo-V2-Pro immediately.
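
Putting the script-tag route together, a complete minimal page might look like this (the prompt is arbitrary):

```html
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat(
            "Summarize the difference between REST and GraphQL in two sentences.",
            { model: 'xiaomi/mimo-v2-pro' }
        ).then(response => {
            puter.print(response);
        });
    </script>
</body>
</html>
```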
