Kimi API Pricing: Full Breakdown of Costs (Jun 2026)

This guide covers what the Kimi API costs across every model Moonshot AI currently sells, how the billing works, where bills run higher than expected, how to cut them, and how to use it for free.

How much does the Kimi API cost?

Kimi K2.6, Moonshot AI's flagship model, costs $0.95 per million input tokens and $4.00 per million output tokens. The cheapest model in the current K2 lineup, Kimi K2.5, runs $0.60 per million input and $3.00 per million output. Both prices are the standard (cache-miss) rates; cached input is billed far lower, as the caching section below explains.

| Model | Input (cache miss) | Input (cache hit) | Output | Context |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | $0.95 | $0.16 | $4.00 | 256K |
| Kimi K2.7 Code | $0.95 | $0.19 | $4.00 | 256K |
| Kimi K2.5 | $0.60 | $0.10 | $3.00 | 256K |

All rates are per 1M tokens and exclude tax, which is calculated at checkout based on your jurisdiction. A few caveats before you budget against these numbers:

  • The figures are for international billing in USD through api.moonshot.ai. Mainland China billing goes through a separate platform in RMB, and rates there differ.
  • Older Moonshot V1 models are still available at separate rates. They appear in the full pricing table near the end.

How Kimi API pricing works

The Kimi API bills per token. Input tokens are everything you send (your prompt, system instructions, conversation history, and any document text you pass in), and output tokens are everything the model generates back. A token is a chunk of text, roughly 3 to 4 characters of English on average, so a short word is one token and a long one can be several.
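To make the arithmetic concrete, here is a minimal sketch that converts token counts into dollars using the cache-miss rates quoted in this guide. The function name and the rate-table keys are illustrative labels for this example, not official API identifiers.

```javascript
// USD per 1M tokens (cache-miss input, output), taken from this guide.
// The keys are labels for this sketch, not official model IDs.
const RATES = {
  "k2.6": { input: 0.95, output: 4.0 },
  "k2.5": { input: 0.6, output: 3.0 },
};

// Estimate the cost of one request from its token counts.
function requestCost(model, inputTokens, outputTokens) {
  const rate = RATES[model];
  if (!rate) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Example: 1,000 input tokens and 300 output tokens on K2.6.
console.log(requestCost("k2.6", 1000, 300).toFixed(5)); // prints "0.00215"
```

At these rates a single chat message costs a fraction of a cent, which is why the totals only become visible at volume.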

Automatic context caching

Every K2 model lists two input prices: a cache-miss rate and a cache-hit rate. When you send a request whose opening tokens match a recent request, those repeated tokens are billed at the cache-hit rate instead of the full rate. On K2.6 that drops input from $0.95 to $0.16 per million, about 83% off. K2.5 drops from $0.60 to $0.10. Caching is automatic, with no parameters to set and no separate cache to manage. It applies to the repeated prefix of a request, so a stable system prompt and a steady conversation history are what trigger it.
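The effect on a single request can be sketched as a blend of the two input rates. The 4,000-token prompt and the 3,000-token repeated prefix below are hypothetical numbers chosen for illustration; how many tokens actually hit the cache is decided by the platform, not by your code.

```javascript
// K2.6 input rates from this guide, USD per 1M tokens.
const K26_INPUT = { cacheMiss: 0.95, cacheHit: 0.16 };

// Input cost when `cachedTokens` of the prompt match a recent prefix
// and the remaining tokens are billed at the cache-miss rate.
function inputCost(totalTokens, cachedTokens, rates = K26_INPUT) {
  if (cachedTokens < 0 || cachedTokens > totalTokens) {
    throw new RangeError("cachedTokens must be between 0 and totalTokens");
  }
  const freshTokens = totalTokens - cachedTokens;
  return (cachedTokens * rates.cacheHit + freshTokens * rates.cacheMiss) / 1_000_000;
}

// Hypothetical request: 4,000-token prompt, 3,000 tokens of repeated prefix.
const coldCost = inputCost(4000, 0);
const warmCost = inputCost(4000, 3000);

// Discount on the cached portion alone (about 83% on K2.6).
const cachedDiscount = 1 - K26_INPUT.cacheHit / K26_INPUT.cacheMiss;

console.log(coldCost, warmCost, (cachedDiscount * 100).toFixed(0) + "%");
```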

Thinking mode costs the same as standard output

Kimi's models run in thinking and non-thinking modes. Reasoning tokens the model generates while thinking are billed as ordinary output tokens at the same rate. There is no separate reasoning-token price and no surcharge for turning thinking on. The cost effect is indirect: thinking produces more output tokens, so a thinking response costs more than a terse one because it is longer, not because the rate changed.
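As a rough illustration, the sketch below prices one response with and without reasoning tokens at the K2.6 output rate. The token counts are hypothetical; actual reasoning length varies by task.

```javascript
// K2.6 output rate from this guide, USD per 1M tokens.
const K26_OUTPUT_PER_MILLION = 4.0;

// Reasoning tokens and answer tokens are billed at the same output rate,
// so they are simply added together before pricing.
function outputCost(answerTokens, reasoningTokens = 0) {
  return ((answerTokens + reasoningTokens) * K26_OUTPUT_PER_MILLION) / 1_000_000;
}

// Hypothetical 300-token answer, with and without 1,200 reasoning tokens.
const terseCost = outputCost(300);
const thinkingCost = outputCost(300, 1200);

console.log(terseCost, thinkingCost);
```

In this example the thinking response costs five times as much, purely because it contains five times as many output tokens.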

Recharge tiers set rate limits, not prices

Kimi groups accounts into tiers (Tier0 through Tier5) based on cumulative recharge, from $1 up to $3,000. These tiers raise your concurrency, requests per minute, and token throughput. They do not change the per-token price. The price is flat at every tier. If you have seen "tiered pricing" on other providers and assumed volume discounts, that is not how Kimi's tiers work: spending more lifts your rate limits, not your unit cost.

Region and endpoint

International access uses api.moonshot.ai and bills in USD. Mainland China access uses a separate platform and bills in RMB, with its own rate card. Moonshot AI is a Chinese provider, so if data residency is a requirement, weigh that before sending production data through the hosted API. Self-hosting the open weights (covered in the free section) is one way around it.

What makes your bill higher than expected

Web search is billed twice

If you enable the built-in $web_search tool, each successful search call costs $0.005. Separately, the search results the tool returns are fed back into the model as input tokens on your next call, and you pay the normal input rate on those tokens. A single search can add several thousand input tokens to the following request. If a turn triggers a search but you stop without continuing, you pay only the $0.005 call fee and not the result tokens.
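The two charges can be added up as follows. This is a sketch at the K2.6 cache-miss input rate; the 4,000-token result size is a hypothetical figure, and the function name is illustrative.

```javascript
const WEB_SEARCH_FEE = 0.005;       // USD per successful search call
const K26_INPUT_PER_MILLION = 0.95; // USD per 1M input tokens (cache miss)

// Cost attributable to one search: the call fee, plus the result tokens
// billed as input if the conversation continues to the next request.
function searchTurnCost(resultTokens, continued = true) {
  const resultTokenCost = continued
    ? (resultTokens * K26_INPUT_PER_MILLION) / 1_000_000
    : 0;
  return WEB_SEARCH_FEE + resultTokenCost;
}

// Hypothetical search returning 4,000 tokens of results.
console.log(searchTurnCost(4000));        // fee plus result tokens
console.log(searchTurnCost(4000, false)); // fee only, turn abandoned
```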

Cache misses

The low cache-hit rate only applies when the start of your request matches a recent one. Changing your system prompt, reordering history, or leaving a long gap between calls pushes you back to the cache-miss rate. A workload you budgeted at the cache-hit price can quietly run at six times that if the prefix is not stable. Designing prompts so the fixed part comes first keeps more of your input in the cheaper bracket.

Output tokens cost more than input

On K2.6, output is $4.00 per million against $0.95 for input, a bit more than four to one. Workloads that read a lot and write a little are cheap; workloads that generate long responses are where the bill concentrates. Long thinking traces land here too, since they count as output.

Reaching for K2.6 or K2.7 Code when K2.5 would do

K2.6 and K2.7 Code both cost $0.95 input and $4.00 output. K2.5 is $0.60 and $3.00. For tasks K2.5 handles well, defaulting to the newer models adds roughly 58% to input cost and 33% to output cost for no benefit; put the other way, dropping to K2.5 saves about 37% on input and 25% on output. K2.7 Code is tuned for coding work specifically, so using it for general chat is paying its rate without using its strength.
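The size of the gap depends on which price you treat as the baseline. A short sketch that computes it both ways from the rates above (the helper names are illustrative):

```javascript
// USD per 1M tokens, from this guide.
const K25 = { input: 0.6, output: 3.0 };
const K26 = { input: 0.95, output: 4.0 };

// Extra cost of the pricier model, relative to the cheaper one.
const premium = (cheap, pricey) => pricey / cheap - 1;

// Saving from the cheaper model, relative to the pricier one.
const saving = (cheap, pricey) => 1 - cheap / pricey;

const pct = (x) => (x * 100).toFixed(0) + "%";

console.log("Input premium:", pct(premium(K25.input, K26.input)));
console.log("Output premium:", pct(premium(K25.output, K26.output)));
console.log("Input saving:", pct(saving(K25.input, K26.input)));
console.log("Output saving:", pct(saving(K25.output, K26.output)));
```

Moving up from K2.5 adds about 58% on input and 33% on output; moving down from K2.6 saves about 37% and 25%.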

Long context

The 256K context window means you can send very large inputs, and you pay the input rate on every token of them. Pasting a whole document set into each call, rather than retrieving only the relevant parts, turns context size into a recurring cost.
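To see how quickly that recurs, the sketch below compares sending a large document set on every call with sending only retrieved excerpts. The 200,000-token and 8,000-token sizes and the 1,000-call volume are hypothetical, and the figures use the K2.6 cache-miss rate, so a stable cached prefix would lower the first number.

```javascript
// K2.6 cache-miss input rate from this guide, USD per 1M tokens.
const K26_INPUT_PER_MILLION = 0.95;

// Input cost of sending `tokensPerCall` tokens on each of `calls` requests.
function recurringInputCost(tokensPerCall, calls) {
  return (tokensPerCall * calls * K26_INPUT_PER_MILLION) / 1_000_000;
}

// Hypothetical: 1,000 calls, full document set vs. retrieved excerpts.
const fullContextCost = recurringInputCost(200_000, 1_000);
const retrievedCost = recurringInputCost(8_000, 1_000);

console.log(fullContextCost.toFixed(2), retrievedCost.toFixed(2));
```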

How to reduce Kimi API costs

1. Pick the smallest model that clears the bar

Model choice moves the bill more than anything else here. K2.5 at $0.60 / $3.00 is cheaper than K2.6 at $0.95 / $4.00 on every token. Test your task on K2.5 first and only move up if the output quality is not good enough. For coding agents, K2.7 Code earns its rate; for general text, it does not.

2. Keep your prompt prefix stable so caching kicks in

Put the fixed parts of your prompt (system instructions, few-shot examples, reference text) at the front and keep them byte-for-byte identical across calls. That lets the cache-hit rate apply to them, taking K2.6 input from $0.95 to $0.16 per million on the repeated portion. For a chatbot with a long system prompt, this is often the second-largest saving after model choice.
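One way to keep the prefix stable is to build every request from the same constants and append the variable parts last. The sketch below assumes the OpenAI-style `messages` array of `{ role, content }` objects; the prompt text, few-shot pair, and function name are placeholders for illustration.

```javascript
// Fixed prefix: define once and never interpolate timestamps, user names,
// or other per-request values into it.
const SYSTEM_PROMPT = "You are a support assistant for Example Co."; // placeholder
const FEW_SHOT = [
  { role: "user", content: "Where is my order?" },
  { role: "assistant", content: "Please share your order number." },
];

// Build messages with stable content first and per-request content last,
// so consecutive requests share the longest possible identical prefix.
function buildMessages(history, userMessage) {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...FEW_SHOT,
    ...history, // append-only: earlier turns are never reordered or edited
    { role: "user", content: userMessage },
  ];
}

const first = buildMessages([], "Hi, I need help with a refund.");
console.log(first.length); // system prompt + 2 few-shot messages + 1 user message
```

Per-request details such as the current date belong in the final user message rather than the system prompt, since any change near the front invalidates the match for everything after it.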

3. Move non-urgent jobs to the Batch API

The Batch API runs at 60% of the standard price, a 40% discount, in exchange for asynchronous completion within a time window. K2.6 batch is $0.57 input and $2.40 output; K2.5 batch is $0.36 and $1.80. For overnight jobs, bulk summarization, or dataset labeling where you do not need an immediate response, batch is the cheapest way to run a given model.
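The batch figures follow directly from the 60% multiplier. A minimal sketch, assuming rates are rounded to the nearest cent as the published batch numbers appear to be:

```javascript
const BATCH_MULTIPLIER = 0.6; // batch runs at 60% of the standard price

// Batch rate in USD per 1M tokens, rounded to the nearest cent.
function batchRate(standardRate) {
  return Math.round(standardRate * BATCH_MULTIPLIER * 100) / 100;
}

// Standard cache-miss input and output rates from this guide.
const STANDARD = {
  "Kimi K2.6": { input: 0.95, output: 4.0 },
  "Kimi K2.5": { input: 0.6, output: 3.0 },
};

for (const [model, rate] of Object.entries(STANDARD)) {
  console.log(model, batchRate(rate.input).toFixed(2), batchRate(rate.output).toFixed(2));
}
```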

4. Cap and shape output

Because output costs roughly four times input, controlling response length pays off directly. Set a sensible max_tokens, ask for the format you actually need (a list or a field instead of an essay), and turn off thinking mode for tasks that do not need it so the model does not generate reasoning tokens you will not use.
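A max_tokens cap also gives you a hard ceiling to budget against. The sketch below computes the worst-case output spend for a given cap at the K2.6 output rate; the request volume and cap values are hypothetical.

```javascript
// K2.6 output rate from this guide, USD per 1M tokens.
const K26_OUTPUT_PER_MILLION = 4.0;

// Upper bound on output spend if every response runs to the max_tokens cap.
function maxOutputSpend(maxTokens, requests) {
  return (maxTokens * requests * K26_OUTPUT_PER_MILLION) / 1_000_000;
}

// Hypothetical: 100,000 requests a month under two different caps.
console.log(maxOutputSpend(300, 100_000));  // ceiling with a 300-token cap
console.log(maxOutputSpend(1000, 100_000)); // ceiling with a 1,000-token cap
```

Actual spend is usually below the ceiling, because many responses stop before the cap.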

5. Gate web search

Only attach the $web_search tool when a query needs fresh information, and remember each successful call is $0.005 plus the result tokens billed as input next turn. For questions the model can answer from its own knowledge, skipping the tool avoids both charges.
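A simple gate can decide per request whether to attach the tool. In the sketch below, the keyword heuristic and function names are hypothetical, and the tool declaration shape (`type: "builtin_function"`) is an assumption based on the OpenAI-compatible `tools` array; check it against the official Kimi documentation before relying on it.

```javascript
// Assumed declaration shape for the built-in search tool; verify in the docs.
const WEB_SEARCH_TOOL = {
  type: "builtin_function",
  function: { name: "$web_search" },
};

// Hypothetical heuristic: words suggesting the answer depends on fresh data.
const FRESHNESS_HINTS = /\b(today|latest|current|news|this week)\b/i;

function needsFreshInfo(query) {
  return FRESHNESS_HINTS.test(query);
}

// Build a request body, attaching web search only when the gate passes.
function buildRequest(model, messages) {
  const lastContent = messages[messages.length - 1]?.content ?? "";
  const body = { model, messages };
  if (needsFreshInfo(lastContent)) {
    body.tools = [WEB_SEARCH_TOOL];
  }
  return body;
}

// "example-model" is a placeholder, not a real model ID.
const gated = buildRequest("example-model", [
  { role: "user", content: "Explain how recursion works." },
]);
console.log("tools" in gated); // false: no search attached
```

A keyword check is crude; a cheap classifier call or an explicit user toggle would serve the same purpose.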

Can you use the Kimi API for free?

Puter.js: the User-Pays model

Puter.js is a JavaScript library that adds Kimi models to your app with no API key, no backend, and no bill to you as the developer. It runs on the User-Pays model: each user of your app covers their own AI usage through their own Puter account, so your cost stays at zero regardless of how many users you have.

<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
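    // Ask Kimi K2.6 through Puter.js; no API key is needed on the page.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "moonshotai/kimi-k2.6"
    }).then(response => {
      // textContent shows the reply as plain text instead of parsing it as HTML.
      document.body.textContent = response.message.content;
    }).catch(error => {
      console.error("Kimi request failed:", error);
    });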
  </script>
</body>
</html>

We ran the workload we use across our pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On K2.6 through the API, our calculation puts that at $14.25 for input and $18.00 for output, about $32 a month, growing linearly as you add users. Through Puter.js the same app costs you $0 at any scale, because each user carries their own usage. Kimi's rates are low enough that the dollar saving at this volume is modest, but Puter.js also removes API key and billing management, and it stays at zero as the user base grows.
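The API-side figure can be reproduced with a few lines. This sketch uses the workload parameters and K2.6 rates stated above; the function and field names are illustrative.

```javascript
// Monthly API cost in USD for a per-user chat workload.
// Rates are USD per 1M tokens.
function monthlyApiCost({ users, messagesPerUser, inputTokens, outputTokens, inputRate, outputRate }) {
  const messages = users * messagesPerUser;
  const input = (messages * inputTokens * inputRate) / 1_000_000;
  const output = (messages * outputTokens * outputRate) / 1_000_000;
  return { input, output, total: input + output };
}

const workload = {
  users: 500,
  messagesPerUser: 30,
  inputTokens: 1000,
  outputTokens: 300,
  inputRate: 0.95, // K2.6 cache-miss input
  outputRate: 4.0, // K2.6 output
};

const base = monthlyApiCost(workload);
console.log(base.total.toFixed(2)); // prints "32.25"

// Cost grows linearly with users: ten times the users, ten times the bill.
const scaled = monthlyApiCost({ ...workload, users: 5000 });
console.log(scaled.total.toFixed(2)); // prints "322.50"
```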

Open weights (self-host)

The K2 weights are available on Hugging Face under a Modified MIT license. If you have the hardware or pay for cloud compute yourself, you can run the model with no per-token cost and full control over where your data goes. The trade-off is that you take on the deployment and operations work.

Third-party free routes

Kimi models are also hosted by other providers. OpenRouter lists Kimi models and sometimes offers a free routing endpoint, and Cloudflare Workers AI has carried Kimi on its free API tier. Availability and limits on these change, so confirm the current terms before depending on them.

Recharge voucher

This is a credit rather than free access, but it is worth knowing: a minimum $1 recharge activates an account, and when cumulative recharge reaches $5 you receive a $5 voucher. Vouchers do not count toward the recharge total that sets your rate-limit tier.

Real-world cost examples

To show where the money actually goes, we modeled three common workloads at the standard (cache-miss) rates. Caching, batch, and output limits would lower most of these further.

Customer support chatbot

We modeled a chatbot handling 100,000 messages a month, averaging 1,000 input and 300 output tokens per message, for 100M input and 30M output tokens.

| Model | Input cost | Output cost | Monthly total |
| --- | --- | --- | --- |
| Kimi K2.5 | $60.00 | $90.00 | $150.00 |
| Kimi K2.6 | $95.00 | $120.00 | $215.00 |
| Kimi K2.7 Code | $95.00 | $120.00 | $215.00 |

K2.6 and K2.7 Code come out the same here because their cache-miss input and output rates match; they differ only on the cache-hit rate. A stable system prompt that lands most input in the cache-hit bracket would cut the input column substantially, since K2.6 cached input is $0.16 per million rather than $0.95.
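The sketch below recomputes the monthly totals and then applies an assumed cache-hit share to show where K2.6 and K2.7 Code diverge. The 80% share is a hypothetical figure for illustration; your real share depends on how stable your prompt prefix is.

```javascript
// USD per 1M tokens, from the pricing table in this guide.
const MODELS = {
  "Kimi K2.5": { miss: 0.6, hit: 0.1, output: 3.0 },
  "Kimi K2.6": { miss: 0.95, hit: 0.16, output: 4.0 },
  "Kimi K2.7 Code": { miss: 0.95, hit: 0.19, output: 4.0 },
};

// Monthly cost in USD for token volumes given in millions.
// `hitShare` is the fraction of input billed at the cache-hit rate.
function monthlyCost(model, inputMillions, outputMillions, hitShare = 0) {
  const rate = MODELS[model];
  const input = inputMillions * (hitShare * rate.hit + (1 - hitShare) * rate.miss);
  const output = outputMillions * rate.output;
  return { input, output, total: input + output };
}

// 100M input and 30M output tokens, all at the cache-miss rate.
for (const name of Object.keys(MODELS)) {
  console.log(name, monthlyCost(name, 100, 30).total.toFixed(2));
}

// Same volume with an assumed 80% of input landing in the cache.
console.log(monthlyCost("Kimi K2.6", 100, 30, 0.8).total.toFixed(2));
console.log(monthlyCost("Kimi K2.7 Code", 100, 30, 0.8).total.toFixed(2));
```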

Summarizing 100 PDFs

We calculated a batch of 100 PDFs at roughly 10,000 tokens of extracted text each, with a 500-token summary per document, for 1M input and 50,000 output tokens. On K2.6 that is $0.95 for input and $0.20 for output, about $1.15 total. On K2.5 it is $0.60 and $0.15, about $0.75. File extraction itself is currently free on the platform, so you pay only for the extracted text once it is passed to the model as input.
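The same calculation as a small reusable function, using the document counts and token sizes stated above (the function name is illustrative):

```javascript
// Cost in USD of summarizing a set of documents; rates are USD per 1M tokens.
function pdfSummaryCost(docs, tokensPerDoc, summaryTokens, inputRate, outputRate) {
  const input = (docs * tokensPerDoc * inputRate) / 1_000_000;
  const output = (docs * summaryTokens * outputRate) / 1_000_000;
  return { input, output, total: input + output };
}

// 100 PDFs, about 10,000 extracted tokens each, 500-token summaries.
const onK26 = pdfSummaryCost(100, 10_000, 500, 0.95, 4.0);
const onK25 = pdfSummaryCost(100, 10_000, 500, 0.6, 3.0);

console.log(onK26.total.toFixed(2)); // prints "1.15"
console.log(onK25.total.toFixed(2)); // prints "0.75"
```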

Daily content generation

We modeled generating 20 pieces a day, each with a 500-token prompt and 1,500 tokens of output, run every day for a month, for 0.3M input and 0.9M output tokens. On K2.6 that comes to about $0.29 for input and $3.60 for output, roughly $3.89 a month. On K2.5 it is about $0.18 and $2.70, roughly $2.88. Output dominates here, so this is the workload where capping length and using K2.5 helps most.
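A sketch of that calculation, assuming a 30-day month. The final call adds a hypothetical 1,000-token output cap on K2.5 to show how the two levers combine; the cap value is an illustration, not a recommendation.

```javascript
// Monthly cost in USD for a daily generation job; rates are USD per 1M tokens.
function contentMonthCost({ piecesPerDay, days, promptTokens, outputTokens, inputRate, outputRate }) {
  const pieces = piecesPerDay * days;
  const input = (pieces * promptTokens * inputRate) / 1_000_000;
  const output = (pieces * outputTokens * outputRate) / 1_000_000;
  return { input, output, total: input + output };
}

const job = { piecesPerDay: 20, days: 30, promptTokens: 500, outputTokens: 1500 };

const k26Month = contentMonthCost({ ...job, inputRate: 0.95, outputRate: 4.0 });
const k25Month = contentMonthCost({ ...job, inputRate: 0.6, outputRate: 3.0 });

// Hypothetical: K2.5 with output capped at 1,000 tokens per piece.
const k25Capped = contentMonthCost({ ...job, outputTokens: 1000, inputRate: 0.6, outputRate: 3.0 });

console.log(k26Month.total.toFixed(2), k25Month.total.toFixed(2), k25Capped.total.toFixed(2));
```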

Complete Kimi API pricing table

K2 family, per 1M tokens, USD, tax excluded:

| Model | Input (cache miss) | Input (cache hit) | Output | Context |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | $0.95 | $0.16 | $4.00 | 256K |
| Kimi K2.7 Code | $0.95 | $0.19 | $4.00 | 256K |
| Kimi K2.5 | $0.60 | $0.10 | $3.00 | 256K |

Moonshot V1 legacy series, per 1M tokens:

| Model | Input | Output | Context |
| --- | --- | --- | --- |
| moonshot-v1-8k | $0.20 | $2.00 | 8K |
| moonshot-v1-32k | $1.00 | $3.00 | 32K |
| moonshot-v1-128k | $2.00 | $5.00 | 128K |

Vision-preview variants of the three V1 models exist at the same rates as their text counterparts.

Batch API, 60% of standard price, per 1M tokens:

| Model | Input (cache miss) | Input (cache hit) | Output |
| --- | --- | --- | --- |
| Kimi K2.6 (Batch) | $0.57 | $0.10 | $2.40 |
| Kimi K2.5 (Batch) | $0.36 | $0.06 | $1.80 |

Tools:

| Tool | Unit | Price |
| --- | --- | --- |
| Web search | Per successful call | $0.005 |

For the full catalog, including any models added since this was written, see the official pricing pages on the Kimi platform.

Conclusion

Kimi's flagship, K2.6, costs $0.95 per million input tokens and $4.00 per million output tokens, with K2.5 cheaper at $0.60 and $3.00 and K2.7 Code matching K2.6 for coding work. The main levers on your bill:

  • Model choice: K2.5 over K2.6 or K2.7 Code wherever quality allows.
  • Caching: keep a stable prompt prefix to hit the much lower cache-hit input rate.
  • Batch API: 40% off for work that can run asynchronously.
  • Output control: output costs about four times input, so cap and shape it.
  • Web search: gate it, since it bills both a per-call fee and result tokens.

All API rates above were verified against Moonshot AI's official pricing pages. Moonshot adjusts rates periodically, so confirm current numbers before you commit.
