MiniMax API


Access MiniMax models instantly with Puter.js and add AI to any app in a few lines of code, without a backend or API keys.


```javascript
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "minimax/minimax-m3"
}).then(response => {
    console.log(response);
});
```

```html
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "minimax/minimax-m3"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
```
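The examples above log the entire response object. To work with just the reply text, a small helper can normalize the response. This is a sketch that assumes the response exposes `message.content` as either a string or an array of `{ type, text }` parts (the exact shape can vary by model and is not guaranteed here); `extractText` is a hypothetical helper, not part of Puter.js.

```javascript
// Hypothetical helper: pull plain text out of a chat response.
// Assumes response.message.content is a string or an array of
// { type: "text", text: "..." } parts; anything else falls back to String().
function extractText(response) {
  const content = response?.message?.content;
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .filter((part) => part && typeof part.text === "string")
      .map((part) => part.text)
      .join("");
  }
  return String(response);
}

// Usage with Puter.js (inside an async function, with a signed-in Puter session):
// const response = await puter.ai.chat("Explain AI like I'm five!", {
//   model: "minimax/minimax-m3",
// });
// console.log(extractText(response));
```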

List of MiniMax Models

Chat

MiniMax M3

minimax/minimax-m3

MiniMax M3 is a frontier-level multimodal language model built for long-horizon coding, agentic workflows, and complex reasoning. It introduces MiniMax Sparse Attention (MSA), which delivers 15x faster decoding and 9x faster prefill than the prior generation at a 1M-token context. M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro, and achieves the highest score on Claw-Eval (74.5%) for autonomous agent tasks. It natively accepts text, image, and video inputs and supports tool calling through standard MCP scaffolding. It is well suited to developers building coding agents, long-document pipelines, and multi-step automation that require sustained, multi-hour autonomous execution.
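Since M3 supports tool calling, here is a minimal sketch of dispatching tool calls to local JavaScript functions. It assumes Puter.js returns tool calls in an OpenAI-style shape (`message.tool_calls`, each with an `id`, a `function.name`, and a JSON-string `function.arguments`) and that MiniMax models honor the `tools` option through Puter.js; verify both against the Puter.js documentation. `runToolCalls` and the `get_weather` tool are hypothetical.

```javascript
// Hypothetical dispatcher: execute each requested tool and build
// the "tool" messages to send back on the next chat turn.
function runToolCalls(message, handlers) {
  const calls = message?.tool_calls ?? [];
  return calls.map((call) => {
    const handler = handlers[call.function.name];
    let result;
    if (!handler) {
      result = { error: `unknown tool: ${call.function.name}` };
    } else {
      // In the assumed shape, arguments arrive as a JSON string.
      const args = JSON.parse(call.function.arguments || "{}");
      result = handler(args);
    }
    return {
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(result),
    };
  });
}

// Hypothetical tool definition and its local handler.
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
}];
const handlers = {
  get_weather: ({ city }) => ({ city, tempC: 21 }), // stubbed data
};

// Usage sketch (assumed Puter.js flow, inside an async function):
// const messages = [{ role: "user", content: "Weather in Paris?" }];
// const first = await puter.ai.chat(messages, { model: "minimax/minimax-m3", tools });
// messages.push(first.message, ...runToolCalls(first.message, handlers));
// const final = await puter.ai.chat(messages, { model: "minimax/minimax-m3", tools });
```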

Chat

MiniMax M2.7

minimax/minimax-m2.7

MiniMax M2.7 is a proprietary reasoning LLM from the Chinese AI startup MiniMax, released on March 18, 2026. It is notable as one of the first commercial models to actively participate in its own training through autonomous self-evolution loops. It excels at agentic coding workflows, scoring 56.2% on SWE-Pro, and performs strongly on office productivity tasks, with the highest Elo (1495) on GDPval-AA among open-source-tier models. It targets developers building complex agent systems and automated workflows.

Chat

MiniMax M2.7 Highspeed

minimax/minimax-m2.7-highspeed

MiniMax M2.7 Highspeed is a high-throughput, inference-optimized variant of MiniMax M2.7, delivering approximately 100 tokens per second, roughly 66% faster than the standard version. It shares the same model weights and MoE architecture as M2.7, so output quality and reasoning capability are identical; the speed advantage comes entirely from inference-layer routing and batching optimizations. It supports text and image inputs with a 204K context window and features automatic prompt caching and parallel tool calling. It is best suited to live coding assistants, autonomous agent pipelines, and interactive workflows where low latency and high throughput matter.
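As a back-of-the-envelope check of what these figures mean in practice: if Highspeed runs at about 100 tokens per second and is about 66% faster, the standard model's implied rate is about 100 / 1.66 ≈ 60 tokens per second. The sketch below uses only those approximate figures; `decodeSeconds` is a hypothetical helper, and it ignores time-to-first-token and network latency.

```javascript
// Approximate decode throughput quoted above (tokens per second).
const HIGHSPEED_TPS = 100;
// "Roughly 66% faster" means highspeed = standard * 1.66.
const STANDARD_TPS = HIGHSPEED_TPS / 1.66;

// Seconds to stream `outputTokens` at a given rate (decode time only;
// excludes time-to-first-token and network overhead).
function decodeSeconds(outputTokens, tokensPerSecond) {
  if (tokensPerSecond <= 0) throw new RangeError("rate must be positive");
  return outputTokens / tokensPerSecond;
}

const tokens = 2000;
console.log(decodeSeconds(tokens, HIGHSPEED_TPS).toFixed(1)); // 20.0
console.log(decodeSeconds(tokens, STANDARD_TPS).toFixed(1));  // 33.2
```

For a 2,000-token reply, that is roughly 13 seconds saved per response before any latency effects are counted.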

Chat

MiniMax M2.5

minimax/minimax-m2.5

MiniMax M2.5 is a 230B-parameter Mixture-of-Experts model (10B active) from Shanghai-based MiniMax, designed for real-world productivity, with state-of-the-art performance in coding (80.2% on SWE-Bench Verified), agentic tool use, and search tasks. It rivals top models from Anthropic and OpenAI at one-tenth to one-twentieth of the price, positioning itself as frontier intelligence 'too cheap to meter.' The model excels at full-stack development, office work (Word, Excel, PowerPoint), and autonomous agent workflows.

Chat

MiniMax M2.5 Highspeed

minimax/minimax-m2.5-highspeed

MiniMax M2.5 Highspeed is an optimized variant of M2.5 engineered for ultra-low latency and high-throughput workloads. It keeps the core intelligence of the standard M2.5, including top-tier coding performance (80.2% on SWE-Bench Verified) and strong agentic tool use, while rigorous inference optimization raises its speed to approximately 100 tokens per second. The model is well suited to latency-sensitive interactive applications, large-scale document automation, and high-frequency agentic pipelines where responsiveness matters as much as quality. Compared to the standard M2.5, it trades cost efficiency for significantly faster response times, making it the practical choice when speed is a priority.
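Because each Highspeed variant's ID is its standard counterpart's ID plus a `-highspeed` suffix, choosing between them can be a one-line switch. This sketch uses only the model IDs listed on this page; `chooseModel` is a hypothetical helper, and the set of versions that have a Highspeed variant is hard-coded from this list and may change.

```javascript
// Versions that have a "-highspeed" variant in the model list on this page.
const HIGHSPEED_VERSIONS = new Set(["m2.7", "m2.5", "m2.1"]);

// Hypothetical helper: build a Puter.js model ID, preferring the
// Highspeed variant when low latency is requested and one exists.
function chooseModel(version, { lowLatency = false } = {}) {
  const base = `minimax/minimax-${version}`;
  return lowLatency && HIGHSPEED_VERSIONS.has(version)
    ? `${base}-highspeed`
    : base;
}

// Usage sketch (inside an async function):
// const reply = await puter.ai.chat(prompt, {
//   model: chooseModel("m2.5", { lowLatency: true }),
// });
```

Falling back to the standard ID when no Highspeed variant exists keeps callers from requesting a model that is not in the list.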

Chat

MiniMax M2-her

minimax/minimax-m2-her

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. It keeps a consistent tone and personality across conversations and supports rich message roles, which let it learn from example dialogue. This makes it well suited to storytelling, AI companions, and conversational experiences where natural flow matters.
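A multi-turn roleplay request is typically a messages array: a system message that fixes the character, a few example exchanges that show the desired voice, then the live conversation. This sketch uses only the common `system`, `user`, and `assistant` roles and assumes `puter.ai.chat` accepts such an array for this model; M2-her's richer roles are not covered, and `buildRoleplayMessages` and the persona are hypothetical.

```javascript
// Hypothetical helper: assemble a roleplay conversation for a chat model.
// persona: system prompt text; examples: [{ user, assistant }] sample turns;
// history: prior { role, content } messages; userInput: the new user turn.
function buildRoleplayMessages(persona, examples, history, userInput) {
  const messages = [{ role: "system", content: persona }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.user });
    messages.push({ role: "assistant", content: ex.assistant });
  }
  messages.push(...history);
  messages.push({ role: "user", content: userInput });
  return messages;
}

const messages = buildRoleplayMessages(
  "You are Captain Mira, a cheerful starship pilot. Stay in character.",
  [{ user: "Where are we headed?", assistant: "To the Orion belt, cadet! Buckle up." }],
  [],
  "What's our fuel status?"
);

// Usage sketch (assumed, inside an async function):
// const reply = await puter.ai.chat(messages, { model: "minimax/minimax-m2-her" });
```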

Chat

MiniMax M2.1

minimax/minimax-m2.1

MiniMax-M2.1 is an enhanced version of M2 with significantly improved multi-language programming capabilities and better support for office scenarios. It gives more concise responses, follows instructions better, and matches or exceeds Claude Sonnet 4.5 on coding benchmarks while continuing to generalize well across agent and tool scaffolds.

Chat

MiniMax M2.1 Highspeed

minimax/minimax-m2.1-highspeed

MiniMax M2.1 Highspeed is a latency-optimized variant of MiniMax M2.1, designed for production workloads where inference speed is critical. It delivers approximately 100 tokens per second (roughly 1.7x faster than the standard M2.1) while preserving the same MoE architecture (230B total, 10B active parameters) and output quality. It excels in live coding assistants, autonomous agent loops with chained tool calls, and interactive document analysis with streaming output. Compared to the standard M2.1, it reduces time-to-first-token under concurrent load and supports automatic prompt caching for multi-turn agent pipelines, making it the preferred choice when developer-facing latency matters.
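For the streaming use cases mentioned above, the Puter.js documentation describes a `stream: true` option whose result can be consumed with `for await`, each part carrying a `text` field. The sketch below wraps that loop so it works with any async iterable of such parts; that this applies unchanged to MiniMax models through Puter.js is an assumption, and `streamText` is a hypothetical helper.

```javascript
// Hypothetical helper: consume a stream of { text } parts, calling onText
// for each non-empty chunk as it arrives and returning the full reply.
async function streamText(stream, onText = () => {}) {
  let full = "";
  for await (const part of stream) {
    const chunk = part?.text ?? "";
    if (chunk) {
      full += chunk;
      onText(chunk);
    }
  }
  return full;
}

// Usage sketch (assumed Puter.js streaming API, inside an async function):
// const stream = await puter.ai.chat("Summarize this file", {
//   model: "minimax/minimax-m2.1-highspeed",
//   stream: true,
// });
// const reply = await streamText(stream, (t) => process.stdout.write(t));
```

Accepting any async iterable keeps the helper testable with a local generator instead of a live model.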

Chat

MiniMax M2

minimax/minimax-m2

MiniMax-M2 is a compact MoE model (230B total, 10B active parameters) optimized for coding and agentic workflows, with a 128K context window. It ranks #1 among open-source models for tool use and agent tasks, delivering elite performance in multi-step development workflows at 8% of the cost of comparable models.

Video

MiniMax Hailuo 02

minimax/hailuo-02

MiniMax Hailuo 02 is a next-generation AI video model ranked #2 globally, featuring native 1080p output and advanced physics simulation for realistic motion including gravity, fluid dynamics, and complex movements like gymnastics. It uses Noise-aware Compute Redistribution (NCR) architecture for 2.5x improved efficiency, with 3x more parameters and 4x more training data than its predecessor. The model supports both text-to-video and image-to-video generation with clips up to 10 seconds.

Chat

MiniMax M1

minimax/minimax-m1

MiniMax-M1 is the world's first open-source hybrid-attention reasoning model, featuring a 1-million-token context window and an 80K-token reasoning output budget. It excels at software engineering, long-context tasks, and complex reasoning, and was trained with CISPO, an efficient reinforcement learning algorithm.

Video

MiniMax Video-01 Director

minimax/video-01-director

MiniMax Video-01 Director is an AI video generation model that specializes in creating HD videos with precise cinematic camera control. It supports 720p resolution at 25fps and generates clips up to 5 seconds, allowing users to specify camera movements like pans, zooms, and tracking shots through natural language or bracketed commands. The model significantly reduces movement randomness compared to standard video models, enabling more accurate and intentional storytelling.
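Camera moves are written as bracketed instructions inside the prompt. This sketch composes such a prompt from a list of beats, placing each bracketed command before the action it frames. The command vocabulary below (for example `Pan left`, `Push in`, `Zoom in`) is an assumption based on commonly cited Director commands and should be checked against MiniMax's own documentation; `withCameraMoves` is hypothetical, and the code only builds the prompt string without submitting it.

```javascript
// Assumed command vocabulary (verify against MiniMax's documentation).
const CAMERA_MOVES = new Set([
  "Truck left", "Truck right", "Pan left", "Pan right",
  "Push in", "Pull out", "Pedestal up", "Pedestal down",
  "Tilt up", "Tilt down", "Zoom in", "Zoom out",
  "Shake", "Tracking shot", "Static shot",
]);

// Hypothetical helper: turn [{ move, action }] beats into one prompt,
// placing each bracketed camera command before the action it frames.
function withCameraMoves(beats) {
  return beats
    .map(({ move, action }) => {
      if (!CAMERA_MOVES.has(move)) {
        throw new Error(`unsupported camera move: ${move}`);
      }
      return `[${move}] ${action}`;
    })
    .join(" ");
}

const prompt = withCameraMoves([
  { move: "Static shot", action: "A lighthouse stands on a rocky cliff at dusk." },
  { move: "Push in", action: "The beam sweeps across the dark sea." },
]);
console.log(prompt);
// [Static shot] A lighthouse stands on a rocky cliff at dusk. [Push in] The beam sweeps across the dark sea.
```

Validating against a fixed set catches typos in move names before a generation request is spent on them.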

Chat

MiniMax-01

minimax/minimax-01

MiniMax-01 is a 456B parameter foundation model (45.9B activated) using a hybrid Lightning Attention + MoE architecture, achieving top-tier performance on reasoning, math, and coding benchmarks. It supports up to 4 million tokens of context, making it especially strong for long-context tasks and AI agent applications.
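With context windows on this page ranging from 128K tokens (MiniMax M2) to 4 million tokens (MiniMax-01), it can help to check roughly whether an input fits before sending it. The sketch below estimates tokens with a heuristic of about 4 characters per token for English text; that ratio is an assumption, not MiniMax's tokenizer, so results are approximate. `pickByContext` is hypothetical, and the window sizes are the figures quoted above, treated as decimal thousands.

```javascript
// Context windows quoted on this page, treated as decimal thousands
// (an approximation; exact limits may differ slightly).
const CONTEXT_WINDOWS = [
  ["minimax/minimax-m2", 128_000],
  ["minimax/minimax-m2.7-highspeed", 204_000],
  ["minimax/minimax-m1", 1_000_000],
  ["minimax/minimax-01", 4_000_000],
];

// Rough heuristic: ~4 characters per token for English text (assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: first model (smallest window) that fits the input
// plus a reserved output budget; returns null if nothing fits.
function pickByContext(text, reservedOutputTokens = 8_000) {
  const needed = estimateTokens(text) + reservedOutputTokens;
  for (const [model, limit] of CONTEXT_WINDOWS) {
    if (needed <= limit) return model;
  }
  return null;
}
```

Choosing by context size alone ignores quality, speed, and price, so in practice this check would be one input among several.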

Frequently Asked Questions

What is this MiniMax API about?

The MiniMax API gives you access to models for AI chat and video generation. Through Puter.js, you can start using MiniMax models instantly with zero setup or configuration.

Which MiniMax models can I use?

Puter.js supports a variety of MiniMax models, including MiniMax M3, MiniMax M2.7, MiniMax M2.7 Highspeed, and more. Find all AI models supported by Puter.js in the AI model list.

How much does it cost?

With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the MiniMax API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.
