MiniMax: MiniMax M3

Q: Is MiniMax M3 free?

Yes, it is free if you're using it through Puter.js. With the User-Pays Model, you can add MiniMax M3 to your app at no cost: your users pay for their own AI usage directly, so it is free for you as a developer.

minimax/minimax-m3

Access MiniMax M3 from MiniMax using the Puter.js AI API.

Get Started

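// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

// Send one prompt to MiniMax M3 and render the reply in the page.
puter.ai.chat("Explain quantum computing in simple terms", {
    model: "minimax/minimax-m3"
}).then(response => {
    // textContent avoids interpreting model output as HTML.
    document.body.textContent = response.message.content;
}).catch(error => {
    console.error("Chat request failed:", error);
});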

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
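        // Send one prompt to MiniMax M3 and render the reply in the page.
        puter.ai.chat("Explain quantum computing in simple terms", {
            model: "minimax/minimax-m3"
        }).then(response => {
            // textContent avoids interpreting model output as HTML.
            document.body.textContent = response.message.content;
        });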
    </script>
</body>
</html>

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.puter.com/puterai/openai/v1/",
    api_key="YOUR_PUTER_AUTH_TOKEN",
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
)

print(response.choices[0].message.content)

curl https://api.puter.com/puterai/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PUTER_AUTH_TOKEN" \
  -d '{
    "model": "minimax/minimax-m3",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
  }'

Model Card

MiniMax M3 is a frontier-level multimodal language model built for long-horizon coding, agentic workflows, and complex reasoning. It introduces MiniMax Sparse Attention (MSA), which delivers 15x faster decoding and 9x faster prefill at 1M-token context compared to the prior generation.

M3 scores 59.0% on SWE-Bench Pro — surpassing GPT-5.5 and Gemini 3.1 Pro — and achieves the highest score on Claw-Eval (74.5%) for autonomous agent tasks. It accepts text, image, and video inputs natively and supports tool calling through standard MCP scaffolding.

It is suited to developers building coding agents, long-document pipelines, and multi-step automation that require sustained, multi-hour autonomous execution.

Specification	Value
Context Window	1M tokens
Max Output	512K tokens
Input Cost	$0.3 per million tokens
Output Cost	$1.2 per million tokens
Input Modalities	text, image, video
Tool Use	Yes
Release Date	May 31, 2026
Output Speed	94 tokens / sec
Latency (time to first token)	1.41s
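
The output speed and latency figures listed above can be combined into a rough estimate of how long a full response takes. The sketch below assumes a simple model, time to first token plus output tokens divided by decode speed. The function and constant names are illustrative, and real timings will vary with load and prompt length.

```python
# Rough response-time estimate from the figures listed above.
# Assumption: total time ~= time to first token + output_tokens / output speed.
TIME_TO_FIRST_TOKEN_S = 1.41   # listed latency
OUTPUT_TOKENS_PER_S = 94.0     # listed output speed


def estimate_response_seconds(output_tokens: int) -> float:
    """Return an approximate wall-clock time for a response of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TIME_TO_FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_S


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under this model, decoding dominates for long responses, so the 1.41s latency mostly matters for short, interactive replies.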

Benchmarks

How MiniMax M3 performs on standard evaluations.

Index (Artificial Analysis)	Score	Ranking
Intelligence Index	44.4	Better than 94% of tracked models
Coding Index	58.6	Better than 77% of tracked models

Benchmark	Description	Score
GPQA Diamond	Graduate-level science Q&A	92.9%
Humanity's Last Exam	Cross-domain reasoning	37.1%
SciCode	Scientific programming	45.4%
IFBench	Instruction following	82.9%
LCR	Long-context reasoning	74.0%
Terminal-Bench Hard	Agentic terminal tasks	42.4%
τ²-Bench	Tool use / agents	88.9%

Scores sourced from Artificial Analysis.

Other MiniMax Models

MiniMax M2.7

MiniMax M2.7 is a proprietary reasoning LLM from Chinese AI startup MiniMax, released on March 18, 2026, notable for being one of the first commercial models to actively participate in its own training through autonomous self-evolution loops. It excels at agentic coding workflows with a 56.2% score on SWE-Pro and strong performance in office productivity tasks, scoring the highest Elo (1495) on GDPval-AA among open-source-tier models. It targets developers building complex agent systems and automated workflows.

MiniMax M2.7 Highspeed

MiniMax M2.7 Highspeed is a high-throughput, inference-optimized variant of MiniMax M2.7, delivering approximately 100 tokens per second — roughly 66% faster than the standard version. It shares the same model weights and MoE architecture as M2.7, so output quality and reasoning capability are identical; the speed advantage comes entirely from inference-layer routing and batching optimizations. It supports text and image inputs with a 204K context window and features automatic prompt caching and parallel tool calling. Best suited for live coding assistants, autonomous agent pipelines, and interactive workflows where low latency and high throughput matter.

MiniMax M2.5

MiniMax M2.5 is a 230B-parameter Mixture-of-Experts model (10B active) from Shanghai-based MiniMax, designed for real-world productivity with state-of-the-art performance in coding (80.2% SWE-Bench Verified), agentic tool use, and search tasks. It rivals top models from Anthropic and OpenAI while costing 1/10th to 1/20th the price, positioning itself as frontier intelligence 'too cheap to meter.' The model excels at full-stack development, office work (Word, Excel, PowerPoint), and autonomous agent workflows.

Frequently Asked Questions

How do I use MiniMax M3?

You can access MiniMax M3 through the Puter.js AI API. Include the library in your web app or Node.js project and start making calls with a few lines of JavaScript, with no backend and no configuration required. You can also use it from Python or cURL via Puter's OpenAI-compatible API.
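
For the Python route, one way to avoid hardcoding the auth token is to read it from the environment and wrap the call in a small helper. This is a minimal sketch, not an official pattern: the environment variable name `PUTER_AUTH_TOKEN` and the helpers `make_client` and `ask` are hypothetical names chosen for illustration. The base URL and model ID are the ones shown in the Get Started examples.

```python
# Sketch: keep the Puter auth token out of source code.
# Assumptions: PUTER_AUTH_TOKEN is a hypothetical variable name, and
# make_client()/ask() are illustrative helpers, not part of any library.
import os

PUTER_BASE_URL = "https://api.puter.com/puterai/openai/v1/"
MODEL_ID = "minimax/minimax-m3"


def make_client():
    """Create an OpenAI-compatible client pointed at Puter (needs `pip install openai`)."""
    from openai import OpenAI  # imported lazily so ask() can be exercised with a stub

    token = os.environ.get("PUTER_AUTH_TOKEN")
    if not token:
        raise RuntimeError("Set PUTER_AUTH_TOKEN before calling make_client()")
    return OpenAI(base_url=PUTER_BASE_URL, api_key=token)


def ask(client, prompt: str) -> str:
    """Send a single user message and return the reply text."""
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

With the token exported in the shell, `ask(make_client(), "Explain quantum computing in simple terms")` performs the same request as the Python example in Get Started.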

What is the pricing for MiniMax M3?

MiniMax M3 costs $0.3 per 1M input tokens and $1.2 per 1M output tokens.

Token type	Price per 1M tokens
Input	$0.3
Output	$1.2
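
To see what these prices mean for a single request, the following sketch multiplies token counts by the listed rates. The function name and the example token counts are illustrative. `Decimal` is used only to avoid floating-point rounding in currency arithmetic.

```python
# Sketch: estimate the cost of one request from the listed per-token prices.
from decimal import Decimal

INPUT_USD_PER_MILLION = Decimal("0.3")    # listed input price
OUTPUT_USD_PER_MILLION = Decimal("1.2")   # listed output price
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / MILLION


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.4f}")
```

Output tokens cost four times as much as input tokens at these rates, so long generations tend to dominate the bill even when prompts are large.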

Who created MiniMax M3?

MiniMax M3 was created by MiniMax and released on May 31, 2026.

What is the context window of MiniMax M3?

MiniMax M3 supports a context window of 1M tokens. For reference, that is roughly equivalent to 2,097 pages of text.
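
The page figure is consistent with reading "1M" as 2^20 (1,048,576) tokens at roughly 500 tokens per page. Both numbers are assumptions, since the conversion is not stated here, and the sketch below only shows that arithmetic.

```python
# Assumptions (not stated in the source): "1M" means 2**20 tokens,
# and one page of text is about 500 tokens.
CONTEXT_WINDOW_TOKENS = 2**20   # 1,048,576
TOKENS_PER_PAGE = 500


def tokens_to_pages(tokens: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Return the number of whole pages that fit in the given token count."""
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    return tokens // tokens_per_page


if __name__ == "__main__":
    print(tokens_to_pages(CONTEXT_WINDOW_TOKENS))  # 2097
```

Actual tokens per page depend on the language, the formatting, and the tokenizer, so treat the result as an order-of-magnitude guide.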

What is the max output length of MiniMax M3?

MiniMax M3 can generate up to 512K tokens in a single response.

What types of input can MiniMax M3 process?

MiniMax M3 accepts text, image, and video input and produces text output.
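
When calling through the OpenAI-compatible API, an image is usually attached as a content part next to the text. The sketch below builds such a message in the standard OpenAI chat format. Whether Puter's endpoint forwards image parts to MiniMax M3 in this exact shape is an assumption, the helper name is illustrative, and video input is not covered.

```python
# Sketch: an OpenAI-style user message that carries text plus one image.
# Assumption: the endpoint accepts standard "image_url" content parts.
def build_image_message(prompt: str, image_url: str) -> dict:
    """Return a chat message combining a text prompt with an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(build_image_message("Describe this image.", "https://example.com/photo.png"))
```

The returned dictionary can be passed as one element of the `messages` list in a `client.chat.completions.create(...)` call.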

Does MiniMax M3 support tool use (function calling)?

Yes, MiniMax M3 supports tool use (function calling), allowing it to interact with external tools, APIs, and data sources as part of its response flow.
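
As a rough illustration of what tool use involves on the application side, the sketch below defines one tool in the OpenAI function-calling schema and a dispatcher that runs the tool the model asks for. The `get_weather` tool, its parameters, and its canned result are hypothetical, and it is an assumption that Puter's OpenAI-compatible endpoint passes `tools` through unchanged.

```python
# Sketch: declare a tool and execute a tool call requested by the model.
# Assumptions: get_weather is a made-up tool; the schema follows the OpenAI
# function-calling format, which the endpoint is assumed to accept.
import json

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real app would query a weather service."""
    return {"city": city, "forecast": "sunny"}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run the named local tool with JSON-encoded arguments; return a JSON result."""
    if name not in LOCAL_TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return json.dumps(LOCAL_TOOLS[name](**kwargs))
```

In the OpenAI format, `tools=[WEATHER_TOOL]` is passed to `client.chat.completions.create(...)`. Each entry in `response.choices[0].message.tool_calls` carries a `function.name` and a JSON string in `function.arguments`, and the result of `run_tool_call` is sent back as a message with role `tool` and the matching `tool_call_id`.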

How does MiniMax M3 perform on benchmarks?

MiniMax M3 scores 44.4 on the Artificial Analysis Intelligence Index, outperforming 94% of tracked models. On the Coding Index, it scores 58.6, outperforming 77% of tracked models.

Does it work with React / Vue / Vanilla JS / Node / etc.?

Yes — the MiniMax M3 API works with any JavaScript framework, Node.js, or plain HTML through Puter.js. Just include the library and start building. See the documentation for more details.
