
Mistral API Pricing: Full Breakdown of Costs (Jun 2026)


This guide covers current Mistral API pricing across every model, how billing works, what inflates a bill, and how to run Mistral for free, with a companion walkthrough in our free Mistral API tutorial.

How much does the Mistral API cost?

Mistral's featured model, Mistral Medium 3.5, costs $1.5 per million input tokens and $7.5 per million output tokens. The cheapest general-purpose model, Mistral Small 4, costs $0.1 per million input and $0.3 per million output, and Ministral 3 3B runs $0.1 in and out for edge workloads.

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Mistral Medium 3.5 | $1.5 | $7.5 | Featured model, coding and enterprise |
| Mistral Large 3 | $0.5 | $1.5 | Open-weight, general-purpose |
| Mistral Small 4 | $0.1 | $0.3 | Open, multimodal, Apache 2.0 |
| Magistral Medium | $2 | $5 | Reasoning model |
| Magistral Small | $0.5 | $1.5 | Reasoning model, lightweight |
| Ministral 3 3B | $0.1 | $0.1 | Edge |
| Ministral 3 8B | $0.15 | $0.15 | Edge |
| Codestral | $0.3 | $0.9 | Coding, fill-in-the-middle |
| Devstral 2 | $0.4 | $2 | Agentic coding |

Prices are in USD. Mistral is a French company, and its pricing page has a currency toggle for EUR and a country selector. Batch processing takes 50% off the per-token rates; all rates in this guide are the standard, non-batch figures unless a batch price is stated.

How Mistral API pricing works

Mistral bills per token. Input tokens are the text you send (your prompt, system message, and any prior turns you resend). Output tokens are the text the model returns. The two are priced separately, and output is usually the more expensive side. A token is a chunk of text, roughly four characters of English, so about 750 words per thousand tokens.
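A minimal sketch of that billing arithmetic, assuming the per-million rates from the table above and the four-characters-per-token rule of thumb. The keys in `PRICES` are hypothetical labels for this example, not official Mistral API model IDs, and real token counts come from the model's tokenizer rather than character length.

```javascript
// USD per 1M tokens, taken from the rates in this guide.
// The keys are hypothetical labels, not official Mistral API model IDs.
const PRICES = {
  "mistral-medium-3.5": { input: 1.5, output: 7.5 },
  "mistral-large-3": { input: 0.5, output: 1.5 },
  "mistral-small-4": { input: 0.1, output: 0.3 },
};

// Rule-of-thumb estimate: roughly four characters of English per token.
// Real counts depend on the model's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Input and output tokens are billed separately, each at its own rate.
function requestCost(model, inputTokens, outputTokens) {
  const rates = PRICES[model];
  if (!rates) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * rates.input + outputTokens * rates.output) / 1e6;
}

// Example: a 2,000-token prompt with a 500-token reply on each model.
for (const model of Object.keys(PRICES)) {
  console.log(model, requestCost(model, 2000, 500).toFixed(6));
}
```

Because the two sides are priced separately, the same request can cost very different amounts depending on how much of it is output.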

Beyond that base, a few Mistral-specific mechanics affect what you pay.

Reasoning tokens are billed as output

The Magistral models (Magistral Medium at $2 / $5, Magistral Small at $0.5 / $1.5) produce reasoning text before their final answer. That reasoning counts as output tokens at the output rate. A reasoning model can return several times the visible answer in billed tokens, so a Magistral call can cost more than its short reply suggests.
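To make the effect concrete, here is a sketch that bills a Magistral Medium call with and without its reasoning text, using the $2 / $5 rates above. The token counts (1,000 input, 2,000 reasoning, 200 answer) are hypothetical; actual reasoning length varies by prompt.

```javascript
// Magistral Medium rates from this guide, USD per 1M tokens.
const MAGISTRAL_MEDIUM = { input: 2, output: 5 };

// Reasoning tokens are billed at the output rate, alongside the visible answer.
function reasoningCallCost(rates, inputTokens, reasoningTokens, answerTokens) {
  const billedOutputTokens = reasoningTokens + answerTokens;
  return (inputTokens * rates.input + billedOutputTokens * rates.output) / 1e6;
}

// Hypothetical call: 1,000 input tokens, 2,000 reasoning tokens, 200-token answer.
const billed = reasoningCallCost(MAGISTRAL_MEDIUM, 1000, 2000, 200);
// What the same call would cost if only the visible answer were billed.
const visibleOnly = reasoningCallCost(MAGISTRAL_MEDIUM, 1000, 0, 200);

console.log(billed.toFixed(4), visibleOnly.toFixed(4)); // 0.0130 0.0030
```

In this made-up case the short reply hides more than four times its apparent cost.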

Batch processing takes 50% off

Mistral applies a 50% discount to batch jobs. You submit a set of requests for asynchronous processing instead of real-time calls, and the per-token rate is halved. This fits offline work like document processing, evaluation runs, and bulk generation. It does not fit interactive chat, where you need the response immediately.
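A sketch of the batch discount, assuming it halves both the input and output rates as described above. `jobCost` and its `batch` option are illustrative names, not part of any Mistral SDK, and the workload size is hypothetical.

```javascript
// Batch jobs are billed at 50% of the standard per-token rates.
const BATCH_DISCOUNT = 0.5;

// Mistral Large 3 rates from this guide, USD per 1M tokens.
const LARGE_3 = { input: 0.5, output: 1.5 };

function jobCost(rates, inputTokens, outputTokens, { batch = false } = {}) {
  const standard = (inputTokens * rates.input + outputTokens * rates.output) / 1e6;
  return batch ? standard * (1 - BATCH_DISCOUNT) : standard;
}

// A hypothetical offline evaluation run: 10M input tokens, 2M output tokens.
const realTime = jobCost(LARGE_3, 10e6, 2e6);                 // $8.00
const batched = jobCost(LARGE_3, 10e6, 2e6, { batch: true }); // $4.00
console.log(realTime.toFixed(2), batched.toFixed(2));
```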

Prompt caching cuts repeated input by 90%

Mistral bills cached prompt tokens at 10% of the standard input rate, a 90% discount on the cached portion. Caching applies when requests share the same prompt prefix, such as a fixed system prompt or a growing conversation. You opt in by passing a stable prompt_cache_key (a conversation, session, or workflow ID) and keeping the shared prefix identical across calls. It works in 64-token blocks, so prompts under 64 tokens never hit the cache, and a hit is not guaranteed even when the prefix matches. Track cached usage per model under Admin > Usage. This stacks with batch and model choice as a third lever on the input side of the bill.
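A best-case sketch of the cached-input arithmetic. It assumes the whole shared prefix hits the cache (which, as noted, is not guaranteed) and, as a modeling assumption, that only complete 64-token blocks of the prefix are cached. `inputCostWithCache` is a hypothetical helper; the rate is Mistral Medium 3.5's $1.5 input price from this guide.

```javascript
const CACHE_BLOCK_TOKENS = 64;  // cache granularity described in this guide
const CACHED_RATE_FACTOR = 0.1; // cached input bills at 10% of the input rate

// Best case: every complete 64-token block of the shared prefix is a cache hit.
// Assumption: a partial trailing block is billed at the full input rate.
function inputCostWithCache(inputRate, totalInputTokens, sharedPrefixTokens) {
  const prefix = Math.min(sharedPrefixTokens, totalInputTokens);
  const cachedTokens = Math.floor(prefix / CACHE_BLOCK_TOKENS) * CACHE_BLOCK_TOKENS;
  const uncachedTokens = totalInputTokens - cachedTokens;
  return (uncachedTokens * inputRate + cachedTokens * inputRate * CACHED_RATE_FACTOR) / 1e6;
}

const MEDIUM_3_5_INPUT = 1.5; // USD per 1M input tokens

// A 5,000-token request whose first 4,000 tokens repeat on every call.
console.log(inputCostWithCache(MEDIUM_3_5_INPUT, 5000, 4000)); // with caching
console.log(inputCostWithCache(MEDIUM_3_5_INPUT, 5000, 0));    // no shared prefix
// A 50-token prompt is below one block, so nothing is cached.
console.log(inputCostWithCache(MEDIUM_3_5_INPUT, 50, 50));
```

Under these assumptions the repeated prefix cuts the input side of that request by roughly 70%.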

What makes your bill higher than expected

The Medium-over-Large pricing trap

Because Mistral Medium 3.5 ($1.5 / $7.5) is featured first, it is easy to default to it. Yet it costs three times Mistral Large 3 ($0.5 / $1.5) on input and five times on output. For an output-heavy workload, defaulting to Medium 3.5 by name can push the bill toward five times what Large 3 would charge for work Large 3 handles.
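How large the multiplier gets depends on the input/output mix. This sketch computes the Medium 3.5 to Large 3 cost ratio from the rates in this guide; the function names are illustrative.

```javascript
// USD per 1M tokens, from this guide.
const MEDIUM_3_5 = { input: 1.5, output: 7.5 };
const LARGE_3 = { input: 0.5, output: 1.5 };

function cost(rates, inputTokens, outputTokens) {
  return (inputTokens * rates.input + outputTokens * rates.output) / 1e6;
}

// How many times more the same workload costs on Medium 3.5 than on Large 3.
function mediumOverLarge(inputTokens, outputTokens) {
  return cost(MEDIUM_3_5, inputTokens, outputTokens) / cost(LARGE_3, inputTokens, outputTokens);
}

console.log(mediumOverLarge(1e6, 0).toFixed(2));   // input only: 3.00
console.log(mediumOverLarge(1e6, 1e6).toFixed(2)); // even mix: 4.50
console.log(mediumOverLarge(1e6, 4e6).toFixed(2)); // output-heavy: 4.85
console.log(mediumOverLarge(0, 1e6).toFixed(2));   // output only: 5.00
```

The ratio slides from 3x toward 5x as output takes a larger share of the tokens.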

Agent API tools are billed per call, on top of tokens

Mistral's Agent API adds built-in tools, each with its own fee separate from model tokens: web search at $30 per 1,000 calls, code execution at $30 per 1,000 calls, image generation at $100 per 1,000 images, and premium news at $50 per 1,000 calls. A single agent turn can fire several tool calls, and those add to the token cost for the same turn.
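A sketch of one agent turn's total: token cost plus the flat per-call tool fees listed above. The tool keys, token counts, and call counts are hypothetical; the token rates are Mistral Large 3's from this guide.

```javascript
// Flat tool fees from this guide, in USD per 1,000 calls (or images).
// The keys are illustrative labels, not Agent API tool names.
const TOOL_FEES_PER_1000 = {
  webSearch: 30,
  codeExecution: 30,
  imageGeneration: 100,
  premiumNews: 50,
};

const LARGE_3 = { input: 0.5, output: 1.5 }; // USD per 1M tokens

// Tool fees are added on top of the token cost for the same turn.
function agentTurnCost(rates, inputTokens, outputTokens, toolCalls) {
  const tokenCost = (inputTokens * rates.input + outputTokens * rates.output) / 1e6;
  let toolCost = 0;
  for (const [tool, count] of Object.entries(toolCalls)) {
    const fee = TOOL_FEES_PER_1000[tool];
    if (fee === undefined) throw new Error(`Unknown tool: ${tool}`);
    toolCost += (count * fee) / 1000;
  }
  return { tokenCost, toolCost, total: tokenCost + toolCost };
}

// Hypothetical turn: 6,000 input and 800 output tokens, three searches, one code run.
const turn = agentTurnCost(LARGE_3, 6000, 800, { webSearch: 3, codeExecution: 1 });
console.log(turn);
```

In this example the tool fees ($0.12) are more than 25 times the token cost ($0.0042) for the same turn.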

Documents, audio, and images use their own meters

Non-text work is not priced per token. OCR 3 is $2 per 1,000 pages, with annotations at $3 per 1,000 pages. Voxtral TTS is $0.016 per 1,000 characters, transcription is billed per audio minute (from $0.003), and the Libraries tool charges $1 per million tokens to index plus $0.01 per call. For document and media pipelines these meters often dominate the token cost.
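A sketch that adds up the non-token meters for a hypothetical monthly media pipeline, using the OCR 3, Voxtral TTS, and $0.003-per-minute transcription rates from this guide. The volumes and the `mediaCost` helper are illustrative assumptions.

```javascript
// Non-token meters from this guide, in USD.
const METERS = {
  ocrPer1000Pages: 2,
  annotationsPer1000Pages: 3,
  ttsPer1000Chars: 0.016,
  transcriptionPerMinute: 0.003, // Voxtral Mini Transcribe 2
};

function mediaCost({ ocrPages = 0, annotatedPages = 0, ttsChars = 0, audioMinutes = 0 }) {
  return (
    (ocrPages / 1000) * METERS.ocrPer1000Pages +
    (annotatedPages / 1000) * METERS.annotationsPer1000Pages +
    (ttsChars / 1000) * METERS.ttsPer1000Chars +
    audioMinutes * METERS.transcriptionPerMinute
  );
}

// Hypothetical month: 3,000 scanned pages, 1M characters of speech, 2,000 minutes of audio.
const monthly = mediaCost({ ocrPages: 3000, ttsChars: 1e6, audioMinutes: 2000 });
console.log(monthly.toFixed(2)); // 28.00
```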

The free tier may use your data for training

The free Experiment tier can use your API inputs to train Mistral's models by default. You can opt out in the console, and zero-retention is available on paid tiers. If you process anything sensitive on the free tier, set the opt-out first or move to a paid tier.

Region and currency

Pricing shows in USD or EUR via the toggle, with a country selector. Confirm the figures in your own currency and region before you budget, since the displayed numbers change with the selector.

How to reduce Mistral API costs

1. Pick the right model

Model choice moves the bill more than any other change. Start at Mistral Small 4 ($0.1 / $0.3) and only move up when output quality requires it. For general work that needs more capability, Mistral Large 3 ($0.5 / $1.5) sits below Medium 3.5 on both input and output, so try Large 3 before Medium 3.5. Reserve Medium 3.5 and the Magistral reasoning models for work that genuinely needs them.

2. Use batch for anything asynchronous

Batch processing is 50% off. Any workload that does not need an immediate response (summarization, classification, evaluation, bulk content) should run as a batch job rather than real-time calls. That halves the per-token cost with no model change.

3. Reuse prompt prefixes with caching

Cached input tokens cost 10% of the standard input rate. If your requests share a fixed prefix, such as a long system prompt or the running history of a conversation, pass a stable prompt_cache_key and keep that prefix identical so the repeated tokens bill at the 90%-off cached rate instead of full price. This helps most on chat and agent workloads that resend the same context every turn.

4. Control output length

On most models output costs more than input, and the gap is among the widest on Medium 3.5, where output ($7.5) is five times input ($1.5). Cap responses with max-token limits, ask for concise formats, and avoid prompting the model to restate context you already have. On reasoning models, the billed reasoning tokens count as output too.
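A sketch of how a cap bounds the output side of the bill. The request body uses `max_tokens`, the length-cap field in Mistral's chat completions request (check the API reference for your SDK's spelling); the model ID and message are placeholders, and the body is only built, not sent. The ceiling uses Medium 3.5's $7.5 output rate and a hypothetical 100,000 requests a month, and assumes every billed output token counts against the cap.

```javascript
// Request body for a chat completion, built but not sent.
// "MODEL_ID" is a placeholder; max_tokens caps the length of the reply.
const requestBody = {
  model: "MODEL_ID",
  messages: [{ role: "user", content: "Summarize this ticket in three bullet points." }],
  max_tokens: 400,
};

// Upper bound on monthly output spend when every reply is capped.
// Assumes every billed output token counts against max_tokens.
function outputCostCeiling(requestsPerMonth, maxTokens, outputRatePerMillion) {
  return (requestsPerMonth * maxTokens * outputRatePerMillion) / 1e6;
}

const MEDIUM_3_5_OUTPUT = 7.5; // USD per 1M output tokens

console.log(outputCostCeiling(100000, requestBody.max_tokens, MEDIUM_3_5_OUTPUT)); // 300
```

The cap turns an open-ended output bill into a known worst case you can budget against.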

5. Self-host open weights at scale

If you run steady high volume, open-weight models on your own hardware remove per-token billing entirely. You take on infrastructure and operations, which pays off once your monthly API token spend would exceed what the hardware and its upkeep cost you, and your data stays in your own environment.
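A rough break-even sketch. The $2,000 monthly hardware-and-operations figure is a hypothetical placeholder, the 75/25 input/output token mix is an assumption, and the sketch assumes your hardware can actually serve the volume; plug in your own numbers. Rates are Mistral Large 3's from this guide.

```javascript
// Blended API price per 1M tokens for a given share of input tokens.
function blendedRate(rates, inputShare) {
  return inputShare * rates.input + (1 - inputShare) * rates.output;
}

// Monthly token volume (in millions) above which self-hosting is cheaper.
function breakEvenMillionTokens(monthlySelfHostCost, rates, inputShare) {
  return monthlySelfHostCost / blendedRate(rates, inputShare);
}

const LARGE_3 = { input: 0.5, output: 1.5 }; // USD per 1M tokens
const HYPOTHETICAL_MONTHLY_INFRA = 2000;     // USD, placeholder for hardware + ops

const threshold = breakEvenMillionTokens(HYPOTHETICAL_MONTHLY_INFRA, LARGE_3, 0.75);
console.log(`Self-hosting wins above ~${Math.round(threshold)}M tokens/month`);
```

With these placeholders the threshold lands near 2.7 billion tokens a month. Against a cheaper API model such as Small 4 it is about five times higher, because there is less per-token spend to avoid.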

6. Scope your tool calls

Each web search, code execution, image, and news call carries a flat fee on top of tokens. Only enable the tools a task needs, and avoid agent loops that fire tool calls you will not use.
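One way to enforce that scope is a per-task tool budget. This is a hypothetical helper, not a Mistral Agent API feature: the agent loop asks it before each tool call and skips the call if the tool is not enabled for the task or the fee would exceed the budget. Fees are the per-call prices from this guide; the tool keys are illustrative.

```javascript
// Per-call fees in USD, derived from the per-1,000 prices in this guide.
const FEE_PER_CALL = {
  webSearch: 30 / 1000,
  codeExecution: 30 / 1000,
  imageGeneration: 100 / 1000,
  premiumNews: 50 / 1000,
};

class ToolBudget {
  constructor(limitUsd, allowedTools) {
    this.limitUsd = limitUsd;
    this.allowed = new Set(allowedTools);
    this.spentUsd = 0;
  }

  // Returns true and records the fee if the tool is enabled and the call fits the budget.
  // The small tolerance keeps floating-point rounding from rejecting an exact fit.
  tryCharge(tool) {
    if (!this.allowed.has(tool)) return false;
    const fee = FEE_PER_CALL[tool];
    if (fee === undefined || this.spentUsd + fee > this.limitUsd + 1e-12) return false;
    this.spentUsd += fee;
    return true;
  }
}

// A research task that only needs web search, capped at $0.10 of tool fees.
const budget = new ToolBudget(0.1, ["webSearch"]);
const results = ["webSearch", "webSearch", "imageGeneration", "webSearch", "webSearch"]
  .map((tool) => budget.tryCharge(tool));
console.log(results, budget.spentUsd.toFixed(2));
```

Here the image call is refused because the tool is not enabled, and the fourth search is refused because it would push tool spend past $0.10.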

Can you use the Mistral API for free?

Puter.js: the User-Pays model

Puter.js is a JavaScript library that lets you add Mistral models to your app with no API key, no backend, and no bill to you as the developer. It runs on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your cost stays at zero no matter how many users you have.

<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
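  <script>
    // Call a Mistral model through Puter.js: no API key, no backend.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "mistralai/mistral-medium-3-5"
    }).then(response => {
      // textContent shows the reply as plain text instead of parsing it as HTML.
      document.body.textContent = response.message.content;
    }).catch(error => console.error(error));
  </script>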
</body>
</html>

We ran the same workload we use across our pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On Mistral Medium 3.5 through the API, our calculation puts that at $22.50 for input and $33.75 for output, about $56.25 a month, growing linearly with your user base. On Mistral Large 3 the same volume is about $14.25, and on Mistral Small 4 about $2.85.
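The workload arithmetic can be reproduced in a few lines. This sketch derives monthly tokens from users and messages, prices them on the three models with the rates from this guide, and repeats the calculation at twice the users to show the linear growth. The function and field names are illustrative.

```javascript
// USD per 1M tokens, from this guide.
const MODELS = {
  "Mistral Medium 3.5": { input: 1.5, output: 7.5 },
  "Mistral Large 3": { input: 0.5, output: 1.5 },
  "Mistral Small 4": { input: 0.1, output: 0.3 },
};

// Monthly token volume from the usage assumptions.
function monthlyTokens({ users, messagesPerUser, inputPerMessage, outputPerMessage }) {
  const messages = users * messagesPerUser;
  return { input: messages * inputPerMessage, output: messages * outputPerMessage };
}

function monthlyCost(rates, tokens) {
  return (tokens.input * rates.input + tokens.output * rates.output) / 1e6;
}

const workload = { users: 500, messagesPerUser: 30, inputPerMessage: 1000, outputPerMessage: 300 };

// Same workload at 500 and 1,000 users: the bill doubles with the user base.
for (const users of [500, 1000]) {
  const tokens = monthlyTokens({ ...workload, users });
  for (const [name, rates] of Object.entries(MODELS)) {
    console.log(users, name, monthlyCost(rates, tokens).toFixed(2));
  }
}
```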

Through Puter.js the same app costs you $0 at any scale, because each user carries their own usage. At the Small 4 tier the dollar saving is small, but Puter.js also removes API key and billing management and stays free as you grow.

Mistral free Experiment tier

Mistral's developer platform has a free Experiment tier for prototyping. It needs no credit card (phone verification only), gives rate-limited access to the model lineup, and is meant for testing rather than production. Exact token limits are not published on the pricing page. Note the training default above: opt out in the console if your inputs are sensitive.

Self-hosting open weights

Several Mistral models are open-weight under Apache 2.0 or similar licenses. Running them yourself is free of per-token API cost; you pay only for the hardware. This is the most durable free path for high-volume or data-sensitive work.

OpenRouter

Some Mistral models, including free variants, are routed through OpenRouter. Availability and free quotas there change often, so confirm the current listing before relying on it.

Real-world cost examples

We modeled three common workloads at standard (non-batch) rates to show where the money goes.

Customer support chatbot. We assumed 50,000 conversations a month, four turns each, about 1,500 input and 400 output tokens per turn, which is 300M input and 80M output tokens. Chat is interactive, so batch does not apply.

| Model | Monthly cost |
|---|---|
| Mistral Small 4 | $54 |
| Mistral Large 3 | $270 |
| Mistral Medium 3.5 | $1,050 |

The spread is the whole point: the same chatbot is about 19 times more expensive on Medium 3.5 than on Small 4, driven mostly by the output rate.

Summarizing 100 PDFs. We took 100 documents at about 30 pages each, roughly 15,000 tokens of extracted text per document and a 600-token summary, for 1.5M input and 0.06M output tokens. On Mistral Small 4 the summarization is about $0.17, or roughly $0.08 as a batch job. On Mistral Large 3 it is about $0.84. If the PDFs are scanned and need OCR 3 first, that adds about $6 for 3,000 pages, which is larger than the summarization cost itself. For document work, the OCR meter, not the token rate, sets the bill.

Daily content generation. We modeled 50 pieces a day, 30 days, about 500 input and 1,500 output tokens each, for 0.75M input and 2.25M output tokens a month. On Mistral Small 4 that is about $0.75, on Mistral Large 3 about $3.75, and on Mistral Medium 3.5 about $18. Run it as a batch job and each figure halves. Because the work is output-heavy, Medium 3.5's $7.5 output rate is what drives its number up.
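The three scenarios above can be checked with one small calculator. This sketch multiplies each workload's monthly token totals by the standard rates, applies the 50% batch discount only to the non-interactive workloads, and adds the OCR 3 meter for the scanned-PDF case. Leaving the OCR meter undiscounted is an assumption, since this guide does not say batch applies to it; the structure and names are illustrative.

```javascript
// Standard rates from this guide, USD per 1M tokens.
const RATES = {
  "Small 4": { input: 0.1, output: 0.3 },
  "Large 3": { input: 0.5, output: 1.5 },
  "Medium 3.5": { input: 1.5, output: 7.5 },
};

const OCR_PER_PAGE = 2 / 1000; // OCR 3: $2 per 1,000 pages
const BATCH_FACTOR = 0.5;      // batch jobs bill tokens at 50%

// Monthly token totals under the assumptions stated above.
const workloads = [
  // 50,000 conversations x 4 turns; interactive, so batch does not apply.
  { name: "Support chatbot", input: 200000 * 1500, output: 200000 * 400, batchable: false, ocrPages: 0 },
  // 100 PDFs of ~30 pages, assumed scanned so every page goes through OCR 3.
  { name: "100 PDFs", input: 100 * 15000, output: 100 * 600, batchable: true, ocrPages: 3000 },
  // 50 pieces a day for 30 days.
  { name: "Daily content", input: 1500 * 500, output: 1500 * 1500, batchable: true, ocrPages: 0 },
];

// Token cost (halved when a batchable job runs as batch) plus the undiscounted OCR meter.
function workloadCost(w, rates, { batch = false } = {}) {
  let tokenCost = (w.input * rates.input + w.output * rates.output) / 1e6;
  if (batch && w.batchable) tokenCost *= BATCH_FACTOR;
  return tokenCost + w.ocrPages * OCR_PER_PAGE;
}

for (const w of workloads) {
  for (const [model, rates] of Object.entries(RATES)) {
    const standard = workloadCost(w, rates).toFixed(2);
    const batched = w.batchable ? workloadCost(w, rates, { batch: true }).toFixed(2) : "n/a";
    console.log(`${w.name} | ${model} | standard $${standard} | batch ${batched}`);
  }
}
```

For the PDF workload the printed totals include the $6 OCR charge, which is why they barely move between Small 4 and Large 3.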

Complete Mistral API pricing table

Text and chat models, per million tokens:

| Model | Input | Output |
|---|---|---|
| Mistral Medium 3.5 | $1.5 | $7.5 |
| Mistral Large 3 | $0.5 | $1.5 |
| Mistral Small 4 | $0.1 | $0.3 |
| Magistral Medium | $2 | $5 |
| Magistral Small | $0.5 | $1.5 |
| Ministral 3 3B | $0.1 | $0.1 |
| Ministral 3 8B | $0.15 | $0.15 |
| Ministral 3 14B | $0.2 | $0.2 |
| Mistral NeMo | $0.15 | $0.15 |
| Mixtral 8x7B | $0.7 | $0.7 |
| Mixtral 8x22B | $2 | $6 |
| Devstral 2 | $0.4 | $2 |
| Devstral Small 2 | $0.1 | $0.3 |
| Codestral | $0.3 | $0.9 |

Documents and audio:

| Service | Price |
|---|---|
| OCR 3 (OCR) | $2 / 1,000 pages |
| OCR 3 (annotations) | $3 / 1,000 pages |
| Voxtral TTS | $0.016 / 1,000 characters |
| Voxtral Mini Transcribe 2 | $0.003 / audio minute |
| Voxtral Mini Transcribe Realtime | $0.006 / audio minute |
| Voxtral Small | $0.004 / audio minute, $0.10 / 1M text input, $0.40 / 1M text output |

Embeddings, classifiers, and moderation:

| Service | Price |
|---|---|
| Mistral Embed | $0.1 / 1M input |
| Codestral Embed | $0.15 / 1M input |
| Mistral Moderation | $0.1 / 1M input |
| Classifier 3B | $0.10 / 1M input, $0.10 / 1M output, $1 / 1M training, $2 / month storage |
| Classifier 8B | $0.04 / 1M input, $0.04 / 1M output, $1 / 1M training, $2 / month storage |

Agent API tools, billed on top of model tokens:

| Tool | Price |
|---|---|
| Web search | $30 / 1,000 calls |
| Code execution | $30 / 1,000 calls |
| Image generation | $100 / 1,000 images |
| Premium news | $50 / 1,000 calls |
| Libraries (indexing) | $1 / 1M tokens |
| Libraries (call) | $0.01 / call |
| Data capture | $0.04 / 1M tokens |

The full, current catalog with every variant is on the official Mistral pricing page.

Conclusion

Mistral's featured model, Mistral Medium 3.5, costs $1.5 input and $7.5 output per million tokens, with Mistral Large 3 cheaper at $0.5 / $1.5 and Mistral Small 4 the budget option at $0.1 / $0.3.

The levers that move a Mistral bill:

  • Pick the model by task, and remember Large 3 is cheaper than Medium 3.5.
  • Run asynchronous work as batch jobs for 50% off.
  • Reuse prompt prefixes so cached input bills at 10% of the standard rate.
  • Control output length, since output is the expensive side.
  • Self-host open weights at high volume.
  • Scope Agent API tool calls, which carry flat per-call fees.

Pricing verified against the official Mistral pricing page. Rates change, so confirm current figures before you budget.
