DeepSeek API Pricing: Full Breakdown of Costs (Jun 2026)

In this guide, you'll learn what every current DeepSeek model costs, how the billing works (including the things that can make your bill higher than expected), the most effective ways to cut costs, and the options for using DeepSeek models for free.

How much does the DeepSeek API cost?

DeepSeek's main model, DeepSeek V4 Flash, costs $0.14 per 1 million input tokens and $0.28 per 1 million output tokens. The higher-capability model, DeepSeek V4 Pro, costs $1.74 per 1 million input tokens and $3.48 per 1 million output tokens. V4 Flash is among the cheapest frontier-class APIs available, roughly 35 to 100 times cheaper per token than GPT-5.5 or Claude Opus 4.8 at comparable context lengths.

Here's the full current lineup, which is just two models:

| Model | Best for | Cache-miss input (per 1M) | Cache-hit input (per 1M) | Output (per 1M) |
| --- | --- | --- | --- | --- |
| DeepSeek V4 Flash | Default: chat, extraction, coding help, high-volume work | $0.14 | $0.0028 | $0.28 |
| DeepSeek V4 Pro | Complex reasoning, multi-step analysis, hard coding | $1.74 | $0.145 | $3.48 |

Two things to know before you budget: DeepSeek is text-only, with no image, video, or audio generation, and it offers no hosted tools like web search. If you need either, this isn't the API for them.

All prices in this guide come from DeepSeek's official pricing documentation. Next, let's look at how the pricing works.

How DeepSeek API pricing works

The DeepSeek API uses pay-as-you-go, per-token billing: you're charged for the tokens you send (input) and the tokens the model generates (output). A token is roughly 4 characters or 0.75 English words; we explain tokens in more detail in our OpenAI pricing guide, and the mechanics are the same here.
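As a rough illustration of per-token billing, the sketch below prices a single request from its token counts. The rates are the ones listed in this guide; the `v4-flash` and `v4-pro` keys and the function name are our own labels for the example, not official API identifiers, and the calculation assumes every input token is a cache miss.

```javascript
// Rates in USD per 1M tokens, copied from the pricing table in this guide.
// The keys are illustrative labels, not official API model IDs.
const RATES_PER_MILLION = {
  "v4-flash": { input: 0.14, output: 0.28 },
  "v4-pro": { input: 1.74, output: 3.48 },
};

// Cost of one request, assuming every input token is billed at the cache-miss rate.
function requestCostUsd(model, inputTokens, outputTokens) {
  const rate = RATES_PER_MILLION[model];
  if (!rate) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Example: 1,000 input tokens and 300 output tokens on each model.
console.log(requestCostUsd("v4-flash", 1000, 300).toFixed(6));
console.log(requestCostUsd("v4-pro", 1000, 300).toFixed(6));
```

A single request costs a fraction of a cent on either model, which is why the volume of requests matters far more than any one call.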

Thinking mode is on by default and bills as output

Both V4 models run in thinking mode by default, where the model generates internal reasoning before its answer. Those reasoning tokens bill at the output rate even though you don't see them in the final response. Because thinking is the default, a request you expected to be cheap can generate far more output tokens than the visible answer suggests. You can switch to non-thinking mode for tasks that don't need step-by-step reasoning, which cuts output token counts directly.
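To see how much hidden reasoning can add, here is a minimal sketch that bills reasoning tokens and visible answer tokens together at the output rate. The 3,000 reasoning tokens and 200 answer tokens are hypothetical numbers chosen for illustration, not measurements.

```javascript
// Output rate for V4 Flash in USD per 1M tokens, from this guide.
const FLASH_OUTPUT_RATE = 0.28;

// Billed output = hidden reasoning tokens + visible answer tokens.
function outputCostUsd(reasoningTokens, answerTokens, ratePerMillion) {
  return ((reasoningTokens + answerTokens) * ratePerMillion) / 1_000_000;
}

// Hypothetical request: a 200-token answer preceded by 3,000 reasoning tokens.
const withThinking = outputCostUsd(3000, 200, FLASH_OUTPUT_RATE);
const withoutThinking = outputCostUsd(0, 200, FLASH_OUTPUT_RATE);

// How many times more the same visible answer costs with thinking on.
console.log((withThinking / withoutThinking).toFixed(1));
```

Under these assumed counts the same visible answer bills for sixteen times as many output tokens, so the ratio of reasoning to answer length is the number to watch.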

Context caching is automatic and has no extra cost

Every DeepSeek request automatically uses context caching. When the start of your prompt matches a recent request (a reused system prompt, tool definitions, a shared document), the matching tokens bill at the cache-hit rate instead of the full input rate. On V4 Flash that drops input from $0.14 to $0.0028 per 1M tokens, a 98% reduction.

Unlike some providers, there's nothing to configure and nothing extra to pay: no cache_control parameter, no separate cache-write charge, and no hourly storage fee. Caching is pure savings here. The one condition, per DeepSeek's documentation, is that cache hits are best-effort and depend on an exact matching prefix, so keeping your system prompt byte-for-byte identical across calls is what raises your hit rate.
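The effect of the hit rate on your input bill can be sketched as a blended cost. The function below is our own illustration using the V4 Flash rates from this guide; `hitFraction` is an assumed share of input tokens that land on a cached prefix, which in practice you would read from your usage data rather than guess.

```javascript
// V4 Flash input rates in USD per 1M tokens, from this guide.
const FLASH_INPUT = { cacheMiss: 0.14, cacheHit: 0.0028 };

// Input cost when a given fraction of input tokens is served from cache.
function inputCostUsd(inputTokens, hitFraction, rates) {
  if (hitFraction < 0 || hitFraction > 1) {
    throw new RangeError("hitFraction must be between 0 and 1");
  }
  const hitTokens = inputTokens * hitFraction;
  const missTokens = inputTokens - hitTokens;
  return (hitTokens * rates.cacheHit + missTokens * rates.cacheMiss) / 1_000_000;
}

// 10M input tokens at three assumed hit rates.
for (const hitFraction of [0, 0.5, 0.9]) {
  const cost = inputCostUsd(10_000_000, hitFraction, FLASH_INPUT);
  console.log(`${hitFraction * 100}% hits: $${cost.toFixed(4)}`);
}
```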

Long context does not cost extra

Both V4 models include a 1M-token context window and up to 384K tokens of output at the standard per-token rates. There's no separate, more expensive long-context price schedule, so a large-context request bills at the same rate per token as a small one.

What makes your bill higher than expected

Thinking tokens, because thinking is the default

The most common reason a DeepSeek bill runs higher than expected is thinking mode being left on for simple tasks. The model can produce thousands of reasoning tokens, billed as output, before a short answer. Disabling thinking for routine work is the first thing to check.

Output-heavy generation

Output costs twice the cache-miss input rate on both models, and the 384K maximum output combined with default thinking means a runaway response is the main cost risk. Long generations, verbose agents, and uncapped responses are where bills grow.

How to reduce DeepSeek API costs

The methods below are ordered by impact, with the most effective first.

1. Use V4 Flash for almost everything

Flash costs roughly one-twelfth of Pro on input and output, and DeepSeek's own positioning is that Flash matches Pro on simpler agent tasks. Use Flash for chat, extraction, classification, summarization, and lightweight coding, which covers most production traffic. Reserve V4 Pro for complex coding, multi-step analysis, and long-context work where quality matters more than the lowest token price.
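One way to apply this split is a small routing function that defaults to Flash and escalates only for the task types named above. This is a sketch: the task labels and the `v4-flash`/`v4-pro` return values are hypothetical names for illustration, and you would map them to whatever model IDs your client actually uses.

```javascript
// Task types that justify the more expensive model, per the guidance above.
// These labels are hypothetical; use whatever categories your app already has.
const PRO_TASKS = new Set(["complex-coding", "multi-step-analysis", "long-context"]);

// Default to Flash; escalate to Pro only for the listed task types.
function pickModel(taskType) {
  return PRO_TASKS.has(taskType) ? "v4-pro" : "v4-flash";
}

console.log(pickModel("classification"));
console.log(pickModel("complex-coding"));
```

Making Flash the fallback for unrecognized task types keeps new or mislabeled traffic on the cheaper model by default.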

2. Turn off thinking mode for routine tasks

Since thinking tokens bill as output and thinking is on by default, switching to non-thinking mode for tasks that don't need deep reasoning is an immediate saving. Keep thinking on only for the prompts that justify it.

3. Structure prompts for cache hits

Cache hits cost a small fraction of the cache-miss rate (2% on V4 Flash and about 8% on V4 Pro), and caching is automatic with no write cost to weigh, so there's no downside to optimizing for it. Put your static content (system prompt, instructions, shared context) at the start of the prompt and the variable content at the end, and keep the prefix identical across calls. Even small differences in the prefix break the cache.
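A minimal sketch of that ordering, assuming the role/content message format used by OpenAI-compatible chat endpoints: the static system prompt is defined once as a frozen constant and always comes first, and anything that varies per call goes last. The prompt text and function name are placeholders of ours.

```javascript
// Static prefix: defined once and frozen so no call can alter it.
// Avoid interpolating timestamps, user names, or request IDs here.
const STATIC_PREFIX = Object.freeze([
  Object.freeze({
    role: "system",
    content: "You are a support assistant. Answer briefly and cite the help article you used.",
  }),
]);

// Variable content (conversation history, then the new user message) goes after the prefix.
function buildMessages(history, userMessage) {
  return [...STATIC_PREFIX, ...history, { role: "user", content: userMessage }];
}

const first = buildMessages([], "Where is my order?");
const second = buildMessages([], "How do I reset my password?");

// The leading message serializes identically across calls.
console.log(JSON.stringify(first[0]) === JSON.stringify(second[0])); // true
```

Whether a given call actually hits the cache is still up to the provider, so treat this as raising the odds rather than guaranteeing the cache-hit rate.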

4. Cap output length

Output is the expensive side at twice the input rate, and thinking adds to it. Set a maximum output limit and ask for concise or structured responses, especially on high-volume workloads.
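As a sketch of what a cap looks like in practice, the snippet below builds a request body with `max_tokens`, the output-limit field used by OpenAI-compatible chat completion endpoints, and computes the most a single response could cost at that cap. The model ID `deepseek-v4-flash` is an assumption, as is the idea that the cap bounds every billed output token, reasoning included; check DeepSeek's documentation for both before relying on the ceiling.

```javascript
// Build a request body with an explicit output cap.
// "deepseek-v4-flash" is an assumed model ID; confirm the real one in DeepSeek's docs.
function buildCappedRequest(userPrompt, maxOutputTokens) {
  return {
    model: "deepseek-v4-flash",
    messages: [
      { role: "system", content: "Answer in at most three sentences." },
      { role: "user", content: userPrompt },
    ],
    max_tokens: maxOutputTokens,
  };
}

// Upper bound on output cost for one response, assuming the cap
// limits all billed output tokens.
function outputCeilingUsd(maxOutputTokens, outputRatePerMillion) {
  return (maxOutputTokens * outputRatePerMillion) / 1_000_000;
}

const body = buildCappedRequest("Summarize our refund policy.", 500);
console.log(body.max_tokens);
console.log(outputCeilingUsd(500, 0.28).toFixed(5));     // capped at 500 tokens
console.log(outputCeilingUsd(384_000, 0.28).toFixed(5)); // uncapped, taking 384K as 384,000 tokens
```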

Can you use the DeepSeek API for free?

There are three ways to use DeepSeek models without paying, and one of them scales to any number of users.

Puter.js: the User-Pays model

Puter.js is a JavaScript library that lets you add DeepSeek models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.

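<html>
<body>
  <!-- Load Puter.js; no API key or backend is needed -->
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // Send one prompt to DeepSeek V4 Flash through Puter.js
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "deepseek/deepseek-v4-flash"
    }).then(response => {
      // textContent shows the reply as plain text instead of parsing it as HTML
      document.body.textContent = response.message.content;
    }).catch(error => {
      console.error("Chat request failed:", error);
    });
  </script>
</body>
</html>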

We ran a simple workload: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On V4 Flash through the DeepSeek API, our calculation puts that at about $2.10 for input and $1.26 for output, roughly $3.40 a month. The bill is small because DeepSeek is cheap, but it still grows with usage: the same app at 50,000 users runs about $340 a month. Through Puter.js it stays $0 at any scale, with no API key to manage and no rate limits to design around, because each user carries their own usage.
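The arithmetic behind those figures can be reproduced with a short function. It uses the V4 Flash rates from this guide and the per-user assumptions stated above; the function and field names are ours.

```javascript
// V4 Flash cache-miss rates in USD per 1M tokens, from this guide.
const FLASH = { input: 0.14, output: 0.28 };

// Monthly API cost for an app where every user sends the same number of messages.
function monthlyCostUsd({ users, messagesPerUser, inputPerMessage, outputPerMessage }) {
  const messages = users * messagesPerUser;
  const inputCost = (messages * inputPerMessage * FLASH.input) / 1_000_000;
  const outputCost = (messages * outputPerMessage * FLASH.output) / 1_000_000;
  return inputCost + outputCost;
}

const perUser = { messagesPerUser: 30, inputPerMessage: 1000, outputPerMessage: 300 };
console.log(monthlyCostUsd({ users: 500, ...perUser }).toFixed(2));
console.log(monthlyCostUsd({ users: 50_000, ...perUser }).toFixed(2));
```

Cost scales linearly with users here, so a hundredfold increase in users means a hundredfold increase in the bill.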

Signup grant

New DeepSeek accounts receive a granted balance to test the API, and the granted balance is spent before any topped-up balance. The exact amount and validity window are set by DeepSeek and change over time, so check your balance on the platform after signing up. Once it's used, billing switches to standard pay-per-token rates.

OpenRouter's free endpoints

OpenRouter has long listed free variants of DeepSeek models (tagged :free), which makes it one of the easier ways to try DeepSeek without an account balance. Free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. It's built for testing, not continuous production use, and failed requests still count against your daily quota.
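If you test against the free tier, a small client-side counter can keep you under those limits. The sketch below is our own helper, not part of any OpenRouter SDK, and it assumes rolling 24-hour and 60-second windows, which may not match how OpenRouter actually resets its quotas. Call it before every attempt, retries included, since failed requests count too.

```javascript
// Client-side guard for a per-day and per-minute request budget.
// Rolling windows are an assumption; the provider may reset on a fixed schedule.
function createQuotaTracker({ perDay, perMinute }) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const MINUTE_MS = 60 * 1000;
  let attempts = []; // timestamps (ms) of attempts in the last 24 hours

  return {
    // Returns true and records the attempt if it fits both budgets.
    tryAcquire(nowMs = Date.now()) {
      attempts = attempts.filter((t) => t > nowMs - DAY_MS);
      const inLastMinute = attempts.filter((t) => t > nowMs - MINUTE_MS).length;
      if (attempts.length >= perDay || inLastMinute >= perMinute) return false;
      attempts.push(nowMs);
      return true;
    },
  };
}

// Free-tier limits quoted above: 50 requests per day, 20 per minute.
const quota = createQuotaTracker({ perDay: 50, perMinute: 20 });
if (quota.tryAcquire()) {
  console.log("OK to send a request");
} else {
  console.log("Quota exhausted; wait before retrying");
}
```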

Real-world cost examples

Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads, using the same method each time: estimate tokens per request, multiply by volume, then by the per-million rate.

Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:

| Model | Monthly cost |
| --- | --- |
| V4 Flash | ~$1.90 |
| V4 Pro | ~$23.70 |

With automatic caching on the repeated system prompt and history, the V4 Flash figure drops below $1 a month, since cache-hit input is nearly free.

Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. We calculate about $0.29 on V4 Flash. Note that DeepSeek reads text, not PDF files directly, so you extract the text first.

Daily content generation. For 30 articles a month on V4 Flash, with 2,000-token prompts and roughly 4,000 output tokens each (including thinking tokens), we estimate about $0.04 a month. Low-volume generation on DeepSeek is effectively free.
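The method used for all three examples (tokens per request, times volume, times the per-million rate) fits in one function. The sketch below plugs in the workload numbers from this section and the rates from this guide; the workload names and model keys are our own labels, and every input token is treated as a cache miss.

```javascript
// Cache-miss rates in USD per 1M tokens, from this guide.
const RATES = {
  "v4-flash": { input: 0.14, output: 0.28 },
  "v4-pro": { input: 1.74, output: 3.48 },
};

// tokens per request x request volume x per-million rate
function workloadCostUsd(model, { requests, inputPerRequest, outputPerRequest }) {
  const rate = RATES[model];
  if (!rate) throw new Error(`Unknown model: ${model}`);
  const inputTokens = requests * inputPerRequest;
  const outputTokens = requests * outputPerRequest;
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// The three workloads described in this section.
const WORKLOADS = {
  supportChatbot: { requests: 1000 * 8, inputPerRequest: 1200, outputPerRequest: 250 },
  pdfSummaries: { requests: 100, inputPerRequest: 20000, outputPerRequest: 500 },
  monthlyArticles: { requests: 30, inputPerRequest: 2000, outputPerRequest: 4000 },
};

for (const [name, workload] of Object.entries(WORKLOADS)) {
  const flash = workloadCostUsd("v4-flash", workload).toFixed(2);
  const pro = workloadCostUsd("v4-pro", workload).toFixed(2);
  console.log(`${name}: Flash $${flash}, Pro $${pro}`);
}
```

Swapping in your own request counts and token averages gives a first-pass budget; real bills will come in lower wherever cache hits apply.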

Complete DeepSeek API pricing table

All prices are per 1M tokens.

| Model | Cache-miss input | Cache-hit input | Output | Context | Max output |
| --- | --- | --- | --- | --- | --- |
| DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 | 1M | 384K |
| DeepSeek V4 Pro | $1.74 | $0.145 | $3.48 | 1M | 384K |

Both models support thinking (default) and non-thinking modes, JSON output, and tool calls, and both are served through an OpenAI-compatible endpoint and an Anthropic-compatible endpoint.

Conclusion

DeepSeek API pricing in 2026 is among the lowest of the frontier-class providers: V4 Flash at $0.14 input and $0.28 output per million tokens, and V4 Pro at $1.74/$3.48. The structure rewards a few simple choices:

  • Use V4 Flash for almost everything and reserve Pro for the hardest tasks.
  • Turn off thinking mode for routine work, since it's on by default and bills as output.
  • Keep your prompt prefix identical to get the automatic cache-hit rate, which is nearly free with no write or storage cost.
  • Cap output length, the most expensive side of the bill.

Prices here were verified against DeepSeek's official pricing page. DeepSeek adjusts rates periodically and some current rates may be promotional, so always confirm before committing to a budget.
