Cohere API Pricing: Full Breakdown of Costs (Jun 2026)
This guide covers current Cohere API pricing across every model, what drives your bill, how to bring it down, and how to call Cohere for free (see also our free Cohere API tutorial).
How much does the Cohere API cost?
Cohere's main paid model, Command R, costs $0.15 per 1M input tokens and $0.60 per 1M output tokens. The cheapest paid model, Command R7B, costs $0.0375 per 1M input tokens and $0.15 per 1M output tokens. Cohere's most capable model, Command A+, is open-weight under an Apache 2.0 license: there is no per-token API charge, but you run it on your own hardware, so the cost is your infrastructure rather than a token rate.
| Model | Input / 1M | Output / 1M | Notes |
|---|---|---|---|
| Command R | $0.15 | $0.60 | 128K context, 4K max output |
| Command R7B | $0.0375 | $0.15 | 128K context, 4K max output |
| Command A+ | $0 | $0 | Open-weight (Apache 2.0); self-hosted |
| Embed 4 | $0.12 (text) | — | $0.47 / 1M for image inputs |
| Rerank 4 Fast | $2.00 / 1K searches | — | Per search, not per token |
| Rerank 4 Pro | $2.50 / 1K searches | — | Per search, not per token |
A few caveats before you budget. Production billing requires a Production API key; the free Trial key is rate limited and not permitted for production use. Older models (Command, Command R 03-2024, Command R+) still have published per-token rates, but Cohere lists them as existing-customer pricing only. Rerank is billed per search rather than per token, which changes how you estimate retrieval-heavy workloads (covered below).
Cohere API vs North and Compass
If you landed on Cohere's site looking for prices and mostly saw "contact sales," that is because Cohere sells two separate enterprise products alongside the developer API. North is an enterprise AI platform for workplace productivity and agents. Compass is an enterprise search and discovery system. Both are custom-priced and aimed at organizations, not individual developers.
This article is about the developer API: the per-token Command models, plus Embed and Rerank, that you call with an API key and pay for as you go. North and Compass pricing is quoted directly by Cohere's sales team.
How Cohere API pricing works
Generative models bill per token, split into input tokens (what you send) and output tokens (what the model generates). For typical English text, one token is roughly three-quarters of a word (about 1.3 tokens per word); short, simple text comes closer to one token per word.
Output tokens cost more than input tokens. On Command R the gap is 4x ($0.60 vs $0.15), so the length of the model's replies usually moves your bill more than the length of your prompts.
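The per-token arithmetic can be sketched in a few lines, assuming the rates in the table above. The model keys (`command-r`, `command-r7b`) and the `generationCost` helper are illustrative names chosen for this example, not Cohere identifiers.

```javascript
// Rates in USD per 1M tokens, copied from the pricing table above.
// The keys are illustrative labels, not official model IDs.
const RATES = {
  "command-r": { input: 0.15, output: 0.60 },
  "command-r7b": { input: 0.0375, output: 0.15 },
};

// Cost in USD for a given number of input and output tokens.
function generationCost(model, inputTokens, outputTokens) {
  const rate = RATES[model];
  if (!rate) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// One message with a 1,000-token prompt and a 300-token reply on Command R.
const perMessage = generationCost("command-r", 1000, 300);
console.log(perMessage.toFixed(5)); // 0.00033
```

Of that $0.00033, the 300 output tokens account for $0.00018 and the 1,000 input tokens for $0.00015, which is why reply length tends to dominate.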
Embed billing: per input token, with a higher image rate
Embed 4 charges $0.12 per 1M tokens for text and $0.47 per 1M tokens for image inputs. It is multimodal and handles text, images, and mixed documents such as PDFs across 100+ languages. There are no output token charges; you pay for what you send in to be embedded. Image inputs cost about 4x the text rate, so a multimodal index of scanned pages or screenshots costs more than the same content as plain text.
Rerank billing: per search, not per token
Rerank is priced by the search, not the token. Rerank 4 Fast is $2.00 per 1,000 searches and Rerank 4 Pro is $2.50 per 1,000 searches. One search is defined as a single query with up to 100 documents to rank. Any document longer than 500 tokens (including the query length) is automatically split into chunks, and each chunk counts as a separate document toward the 100-document limit. Long documents can therefore push a single ranking call into more than one billed search.
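A rough estimator for billed searches, under two assumptions that are not confirmed by the pricing text: each chunk holds up to 500 tokens minus the query length, and every started block of 100 chunks is billed as one search. The function names are hypothetical, so check the estimate against your actual invoice.

```javascript
const MAX_DOCS_PER_SEARCH = 100; // from the definition of one search
const CHUNK_LIMIT_TOKENS = 500;  // per-document limit, query included

// Estimated chunks for one document.
// ASSUMPTION: each chunk holds up to (500 - queryTokens) document tokens.
function chunksForDocument(docTokens, queryTokens) {
  const room = CHUNK_LIMIT_TOKENS - queryTokens;
  if (room <= 0) throw new Error("Query leaves no room for document tokens");
  return Math.max(1, Math.ceil(docTokens / room));
}

// Estimated billed searches for one rerank call.
// ASSUMPTION: every started block of 100 chunks counts as one search.
function billedSearches(docTokenCounts, queryTokens) {
  const chunks = docTokenCounts.reduce(
    (sum, tokens) => sum + chunksForDocument(tokens, queryTokens),
    0
  );
  return Math.max(1, Math.ceil(chunks / MAX_DOCS_PER_SEARCH));
}

// 50 short snippets vs. 50 full articles, both with a 20-token query.
const snippets = new Array(50).fill(300);
const articles = new Array(50).fill(2000);
console.log(billedSearches(snippets, 20)); // 1
console.log(billedSearches(articles, 20)); // 3
```

Under these assumptions the same 50 candidates cost three times as much to rerank when they are 2,000-token articles, because each one becomes five chunks.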
When you get billed
Cohere bills pay-as-you-go on Production keys. Your invoice is issued at the end of each calendar month, or whenever your outstanding balance reaches $250, whichever comes first.
What makes your bill higher than expected
Output tokens cost 4x input on Command R
Input is cheap; output is the expensive side. A prompt-heavy, short-answer task (classification, extraction, routing) is inexpensive, while long generated answers cost four times as much per token. Capping output length is the most direct control you have on Command R.
Rerank cost can exceed generation cost
In a retrieval pipeline that reranks on every query, the per-search Rerank charge can be larger than what you spend generating the answer. At $2.00 per 1,000 searches, 100,000 reranked queries a month is $200, which can outweigh a Command R generation bill for the same traffic. We work through this in the chatbot example below.
Rerank chunking on long documents
Because documents over 500 tokens are split into chunks that each count toward the 100-document cap, reranking long passages costs more per call than reranking short snippets. If you feed full articles into Rerank, you pay for the chunks, not the document count you started with.
Embed image inputs cost about 4x text
If you embed PDFs or images through Embed 4, those inputs bill at $0.47 per 1M tokens rather than the $0.12 text rate. A multimodal corpus costs more to index than a text-only one of the same size.
How to reduce Cohere API costs
1. Match the model to the task
This is the largest lever. Command R7B at $0.0375 / $0.15 is one quarter the price of Command R at $0.15 / $0.60. For classification, routing, extraction, and high-volume light summarization, R7B is often enough, and you only move up to Command R where you need the extra quality. For steady, high-volume workloads, self-hosting Command A+ removes per-token cost entirely in exchange for fixed GPU cost.
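To get a feel for what "high-volume" means here, a break-even sketch helps. The $2,500 monthly hosting figure below is purely hypothetical (your GPU cost will differ), and the comparison ignores engineering time and any quality difference between models.

```javascript
// Command R rates in USD per 1M tokens, from the pricing table.
const COMMAND_R = { input: 0.15, output: 0.60 };

// API cost of one message in USD.
function costPerMessage(rate, inputTokens, outputTokens) {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// Messages per month at which a fixed hosting cost equals the API bill.
function breakEvenMessages(monthlyHostingUsd, rate, inputTokens, outputTokens) {
  const perMessage = costPerMessage(rate, inputTokens, outputTokens);
  return Math.ceil(monthlyHostingUsd / perMessage);
}

// HYPOTHETICAL hosting cost of $2,500 a month; 1,000 input and 300 output
// tokens per message.
const messages = breakEvenMessages(2500, COMMAND_R, 1000, 300);
console.log(messages.toLocaleString("en-US")); // 7,575,758
```

At these per-token rates, fixed hosting only pays off in the millions of messages per month, so for smaller workloads picking the cheaper API model is the more practical lever.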
2. Control output length
Output tokens cost 4x input on Command R, so set a sensible maximum output length and prompt for concise responses. Trimming a 600-token answer to 300 tokens halves the output charge, which is the more expensive side of the bill.
3. Right-size Rerank usage
Rerank is billed per search, and long documents are split into chunks that each count toward a search's 100-document limit. Pass fewer candidate documents (retrieve a tighter top-K before reranking), keep documents under 500 tokens where you can to avoid chunk multiplication, and skip reranking on queries that do not need it. Rerank 4 Fast at $2.00 per 1K is cheaper than Rerank 4 Pro at $2.50 per 1K when latency and cost matter more than peak relevance.
4. Trim retrieved context before generation
In RAG, every retrieved passage you stuff into the Command R prompt is billed as input. Reranking first and passing only the top few results keeps input token counts down on the generation step.
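One way to apply this is to keep only the best-scoring passages that fit a token budget. The sketch below assumes a hypothetical passage shape of `{ text, score, tokens }`, where `score` comes from your reranker and `tokens` from your own tokenizer; neither field name is taken from a Cohere response.

```javascript
// Keep the highest-scoring passages that fit inside a token budget.
// `passages` uses a hypothetical shape: { text, score, tokens }.
function trimContext(passages, maxTokens, maxPassages = 3) {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...passages].sort((a, b) => b.score - a.score);
  const kept = [];
  let used = 0;
  for (const passage of ranked) {
    if (kept.length >= maxPassages) break;
    if (used + passage.tokens > maxTokens) continue; // would overflow the budget
    kept.push(passage);
    used += passage.tokens;
  }
  return { kept, used };
}

const reranked = [
  { text: "Refund policy ...", score: 0.91, tokens: 400 },
  { text: "Shipping times ...", score: 0.42, tokens: 350 },
  { text: "Return window ...", score: 0.88, tokens: 450 },
  { text: "Company history ...", score: 0.10, tokens: 600 },
];
const { kept, used } = trimContext(reranked, 1000);
console.log(kept.length, used); // 2 850
```

Here the prompt carries 850 context tokens instead of the 1,800 it would cost to pass all four passages.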
5. Keep Embed inputs as text where possible
Embed image inputs cost about 4x text. If you can extract text from a document rather than embedding it as an image, do so and pay the $0.12 rate instead of $0.47.
6. Batch offline work
For workloads without latency requirements (overnight summarization, bulk classification), batching reduces overhead. Confirm current batch terms in your Cohere dashboard before relying on a specific discount.
Can you use the Cohere API for free?
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add Cohere models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "cohere/command-r-08-2024"
    }).then(response => {
      document.body.innerHTML = response.message.content;
    });
  </script>
</body>
</html>
We ran the same workload we use across our pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On Command R through the API, our calculation puts that at $2.25 for input and $2.70 for output, about $4.95 a month, growing linearly with your user base. Through Puter.js the same app costs you $0 at any scale, because each user carries their own usage.
Cohere's per-token rates are low, so the dollar figure here is small. The value of the User-Pays model on Cohere is less about the few dollars and more about shipping without an API key, without a backend to hold that key, and without managing per-user rate limits yourself. It also stays at $0 to you whether you have 500 users or 500,000.
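The workload arithmetic above can be reproduced with a short script. The `monthlyApiCost` helper is an illustrative name, and the 500,000-user run simply scales the same per-user assumptions to show the linear growth.

```javascript
// Command R rates in USD per 1M tokens, from the pricing table.
const COMMAND_R = { input: 0.15, output: 0.60 };

// Monthly token volume and API cost for a chat workload.
function monthlyApiCost(users, messagesPerUser, inputPerMessage, outputPerMessage, rate) {
  const messages = users * messagesPerUser;
  const inputTokens = messages * inputPerMessage;
  const outputTokens = messages * outputPerMessage;
  const inputCost = (inputTokens / 1e6) * rate.input;
  const outputCost = (outputTokens / 1e6) * rate.output;
  return { inputTokens, outputTokens, inputCost, outputCost, total: inputCost + outputCost };
}

// 500 users, 30 messages each, 1,000 input and 300 output tokens per message.
const small = monthlyApiCost(500, 30, 1000, 300, COMMAND_R);
console.log(small.total.toFixed(2)); // 4.95

// The same per-user usage at 500,000 users.
const large = monthlyApiCost(500000, 30, 1000, 300, COMMAND_R);
console.log(large.total.toFixed(2)); // 4950.00
```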
Command A+ and North Mini Code (open-weight)
Cohere publishes Command A+ and North Mini Code (an agentic coding model) as open-weight models under Apache 2.0, free to download and run. There is no API charge, but you provide the hardware. This is the free path if you have GPU capacity and want full control over deployment.
Cohere Trial API key
Every Cohere account starts with a Trial API key that is free to use. It is rate limited and explicitly not permitted for production or commercial use, so it suits prototyping and evaluation rather than a live app.
OpenRouter
Cohere models are also reachable through aggregators such as OpenRouter, where you can sometimes find a Cohere model offered on a free tier. Availability and rate limits change, and most models there are still billed per token at rates that can differ from Cohere's direct pricing, so check before relying on it.
Real-world cost examples
Customer support chatbot
We modeled a support bot handling 100,000 messages a month, each with about 1,500 input tokens (system prompt, retrieved context, and history) and 350 output tokens.
| Model | Input cost | Output cost | Monthly total |
|---|---|---|---|
| Command R7B | $5.63 | $5.25 | ~$10.88 |
| Command R | $22.50 | $21.00 | ~$43.50 |
That is generation only. If the bot reranks retrieved passages on every message, we add 100,000 Rerank searches. At $2.00 per 1,000 searches (Rerank 4 Fast) that is $200 a month, plus a small Embed charge for the queries. In this setup the retrieval layer costs more than generating the answers on Command R, which is worth checking before you assume generation is your main expense.
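The comparison between generation and reranking can be checked with a short calculation. It assumes each message triggers exactly one billed Rerank search (documents short enough to avoid extra chunks) and leaves out the small Embed charge; the helper names are illustrative.

```javascript
// Rates from the pricing tables: USD per 1M tokens and USD per 1K searches.
const RATES = {
  "command-r": { input: 0.15, output: 0.60 },
  "command-r7b": { input: 0.0375, output: 0.15 },
};
const RERANK_FAST_PER_1K = 2.0;

// Monthly generation cost in USD.
function generationCost(rate, messages, inputPerMessage, outputPerMessage) {
  const perMessage = inputPerMessage * rate.input + outputPerMessage * rate.output;
  return (messages * perMessage) / 1e6;
}

// Monthly Rerank cost in USD.
function rerankCost(searches, ratePer1k) {
  return (searches / 1000) * ratePer1k;
}

const messages = 100000;
const generation = generationCost(RATES["command-r"], messages, 1500, 350);
// ASSUMPTION: one billed search per message.
const rerank = rerankCost(messages, RERANK_FAST_PER_1K);
const rerankShare = rerank / (generation + rerank);

console.log(generation.toFixed(2));               // 43.50
console.log(rerank.toFixed(2));                   // 200.00
console.log(Math.round(rerankShare * 100) + "%"); // 82%
```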
Summarizing 100 PDFs
We calculated a one-time job of 100 PDFs averaging 30,000 tokens each (3M input tokens) producing 800-token summaries (80,000 output tokens). On Command R that is about $0.45 for input and $0.05 for output, roughly $0.50 total. On Command R7B it is about $0.12. If you also embed the documents to index them, Embed 4 adds about $0.36 at the text rate, or more if the PDFs are processed as images.
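The same job, including the Embed step, works out as follows. The image figure assumes the PDFs produce the same token count when processed as images as they do as extracted text, which may not hold in practice.

```javascript
// Rates in USD per 1M tokens, from the pricing tables.
const COMMAND_R = { input: 0.15, output: 0.60 };
const EMBED_4 = { text: 0.12, image: 0.47 };

const docs = 100;
const tokensPerDoc = 30000;
const summaryTokens = 800;

const inputTokens = docs * tokensPerDoc;   // 3,000,000
const outputTokens = docs * summaryTokens; // 80,000

// Summarization on Command R.
const summarize =
  (inputTokens * COMMAND_R.input + outputTokens * COMMAND_R.output) / 1e6;

// Indexing the same documents with Embed 4.
const embedAsText = (inputTokens * EMBED_4.text) / 1e6;
// ASSUMPTION: image processing yields the same token count as text extraction.
const embedAsImage = (inputTokens * EMBED_4.image) / 1e6;

console.log(summarize.toFixed(2));    // 0.50
console.log(embedAsText.toFixed(2));  // 0.36
console.log(embedAsImage.toFixed(2)); // 1.41
```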
Daily content generation
We modeled a content workflow producing 50 pieces a day (1,500 a month), each from a 500-token brief and generating 1,200 tokens. That is 0.75M input and 1.8M output tokens a month. On Command R that comes to about $0.11 for input and $1.08 for output, roughly $1.19 a month. On Command R7B it is about $0.30 a month.
Complete Cohere API pricing table
Generative models:
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| Command R | $0.15 | $0.60 | 128K |
| Command R7B | $0.0375 | $0.15 | 128K |
| Command A+ | $0 (open-weight) | $0 (open-weight) | — |
| Aya Expanse (8B, 32B) | $0.50 | $1.50 | — |
Retrieval models:
| Model | Rate | Unit |
|---|---|---|
| Embed 4 (text) | $0.12 | per 1M tokens |
| Embed 4 (image) | $0.47 | per 1M tokens |
| Rerank 4 Fast | $2.00 | per 1K searches |
| Rerank 4 Pro | $2.50 | per 1K searches |
Other:
| Item | Rate |
|---|---|
| Transcribe | from $3.75 / hour / instance (via Model Vault) |
| Model Vault (dedicated) | $4.00–$10.00 / hour / instance ($2,500–$6,500 / month) |
Existing-customer (legacy) rates:
| Model | Input / 1M | Output / 1M |
|---|---|---|
| Command | $1.00 | $2.00 |
| Command-light | $0.30 | $0.60 |
| Command R 03-2024 | $0.50 | $1.50 |
| Command R+ 04-2024 | $3.00 | $15.00 |
| Command R+ 08-2024 | $2.50 | $10.00 |
For Embed and Rerank model variants beyond those listed, and the latest catalog, see Cohere's official pricing page.
Conclusion
Cohere's practical paid model, Command R, costs $0.15 input and $0.60 output per 1M tokens, with Command R7B at a quarter of that price and the top Command A+ model free to run on your own hardware. For retrieval-heavy apps, the retrieval stack (Embed at $0.12 per 1M text tokens, Rerank at $2.00–$2.50 per 1K searches) is often the larger share of the bill.
The main cost levers:
- Match the model to the task; Command R7B handles most light work at a quarter of Command R's price.
- Cap output length, since output costs 4x input on Command R.
- Right-size Rerank: fewer documents, shorter chunks, skip it where it adds nothing.
- Trim retrieved context before generation.
- Keep Embed inputs as text rather than images where possible.
Pricing verified against cohere.com/pricing. Current per-token rates are shown in the tabbed sections of that page; confirm before relying on them for budgeting.