Qwen API Pricing: Full Breakdown of Costs (Jun 2026)
In this guide, you'll learn what every current Qwen model costs, how the billing works (including the region and tier rules that can make your bill higher than expected), the most effective ways to cut costs, and the options for using Qwen models for free.
How much does the Qwen API cost?
Alibaba's current flagship, Qwen3.7 Max, costs $1.25 per 1 million input tokens and $3.75 per 1 million output tokens. The most cost-effective model in the current generation, Qwen3.6 Flash, costs $0.19 per 1 million input tokens and $1.13 per 1 million output tokens.
Here's the quick view of the models most people ask about:
| Model | Best for | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Qwen3.7 Max | Flagship: long-horizon reasoning and agentic work, 1M context | $1.25 | $3.75 |
| Qwen3.7 Plus | Multimodal agent: vision, video, and agentic pipelines | $0.32 | $1.28 |
| Qwen3.6 Plus | Balanced production workloads | $0.50 | $3.00 |
| Qwen3.6 Flash | High-volume simple tasks | $0.19 | $1.13 |
| Qwen3 Coder Plus | Agentic coding | $1.00 | $5.00 |
| Qwen3-VL Plus | Vision (image and document understanding) | $0.20 | $1.60 |
| Qwen Image 2.0 | Image generation | — | $0.04 per image |
Two things shape every number above. Qwen's prices depend on the region you deploy in: the rates shown are for the International (Singapore) endpoint, and the Chinese Mainland (Beijing) endpoint is 60 to 70% cheaper. And some models use tiered pricing that rises with the length of each request, so for those models the headline rate is for short prompts, not long ones.
All prices in this guide come from Alibaba Cloud's official Model Studio pricing. Next, let's look at the ways to access Qwen and then at how the pricing works.
Three ways to get Qwen
Qwen is available in more forms than most model families, and the pricing differs for each. The consumer chat app at chat.qwen.ai is free. The developer API, billed per token through Alibaba Cloud Model Studio (the platform formerly called DashScope), is what this article covers. And because most Qwen models are open-weight under the Apache 2.0 license, you can download and run them on your own hardware at no API cost.
How Qwen API pricing works
The Qwen API uses pay-as-you-go, per-token billing: you're charged for the tokens you send (input) and the tokens the model generates (output). A token is roughly 4 characters or 0.75 English words.
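As a rough illustration of the per-token formula, here is a minimal JavaScript sketch. The function names `estimateTokens` and `requestCostUSD` are illustrative and not part of any Qwen SDK, and the 4-characters-per-token rule is only the heuristic quoted above; real counts depend on the model's tokenizer.

```javascript
// Rough token estimate using the ~4 characters per token rule of thumb.
// Real counts depend on the model's tokenizer, so treat this as a ballpark.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cost of one request in USD; `rate` holds prices per 1M tokens.
function requestCostUSD(inputTokens, outputTokens, rate) {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// Qwen3.6 Flash at the International (Singapore) rates quoted above.
const flashRate = { input: 0.19, output: 1.13 };

const prompt = "Summarize the attached meeting notes in three bullet points.";
console.log(estimateTokens(prompt)); // ballpark input tokens for this prompt
console.log(requestCostUSD(1000, 300, flashRate).toFixed(6)); // 0.000529
```

A single 1,000-in, 300-out message on Flash costs a small fraction of a cent, which is why volume and model choice drive the bill far more than any one request.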
Three structural rules make Qwen pricing different from the other major APIs.
Region determines the price
Qwen models are served from several deployment regions, and the same model costs different amounts in each. The International mode (Singapore endpoint) is the default for most non-China developers and is the only region with a free quota. The Chinese Mainland mode (Beijing) is 60 to 70% cheaper but stores data in China and serves users there. The Global mode (US Virginia or Germany Frankfurt) sits in between. Pick the region before you estimate costs, because the gap between them is large.
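Because "X% cheaper" in one direction is not "X% more expensive" in the other, it can help to convert the regional discount into a multiplier. The sketch below is plain arithmetic on the 60 to 70% range quoted above; the helper names are illustrative, and the resulting Beijing figures are estimates from that range, not published rates.

```javascript
// If region B is `discount` cheaper than region A, then A costs
// 1 / (1 - discount) times as much as B.
function premiumMultiplier(discount) {
  if (discount < 0 || discount >= 1) {
    throw new RangeError("discount must be in [0, 1)");
  }
  return 1 / (1 - discount);
}

// Estimated rate range after applying a discount band to a known rate.
function discountedRange(ratePerM, minDiscount, maxDiscount) {
  return {
    low: ratePerM * (1 - maxDiscount),
    high: ratePerM * (1 - minDiscount),
  };
}

console.log(premiumMultiplier(0.6).toFixed(2)); // 2.50
console.log(premiumMultiplier(0.7).toFixed(2)); // 3.33

// Qwen3.6 Plus input is $0.50 per 1M tokens on the Singapore endpoint.
const est = discountedRange(0.5, 0.6, 0.7);
console.log(est.low.toFixed(2), est.high.toFixed(2)); // 0.15 0.20
```

In other words, a 60 to 70% discount in Beijing means the Singapore endpoint costs roughly 2.5 to 3.3 times as much for the same tokens.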
Pricing is tiered by request length
Some Qwen models charge more per token as the request grows. Qwen3 Coder Plus, for example, bills $1.00 input up to 32K tokens, $1.80 up to 128K, $3.00 up to 256K, and $6.00 above 256K. The tier is set by the input tokens in each request, and the whole request bills at that tier's rate. A request that crosses a threshold costs more on every token, not just the tokens past the line. The newest general-purpose models like Qwen3.7 Max and Qwen3.6 Plus use flat per-token pricing across their full context window.
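The whole-request rule is easy to misread as marginal pricing, so here is a minimal sketch of how it could be modelled. The output rates come from the coding-models table later in this guide. Two assumptions are mine: "32K" is read as 32,000 tokens, and each tier bound is inclusive.

```javascript
// Qwen3 Coder Plus input-length tiers, USD per 1M tokens (International rates).
// Assumption: "32K" is read as 32,000 tokens and each bound is inclusive.
const CODER_PLUS_TIERS = [
  { maxInput: 32000, input: 1.0, output: 5.0 },
  { maxInput: 128000, input: 1.8, output: 9.0 },
  { maxInput: 256000, input: 3.0, output: 15.0 },
  { maxInput: 1000000, input: 6.0, output: 60.0 },
];

// The tier is chosen by input length alone.
function tierFor(inputTokens, tiers) {
  const tier = tiers.find((t) => inputTokens <= t.maxInput);
  if (!tier) throw new RangeError("input exceeds the largest tier");
  return tier;
}

// Every token in the request, input and output, bills at the chosen tier's rate.
function tieredCostUSD(inputTokens, outputTokens, tiers) {
  const tier = tierFor(inputTokens, tiers);
  return (inputTokens * tier.input + outputTokens * tier.output) / 1e6;
}

console.log(tieredCostUSD(32000, 1000, CODER_PLUS_TIERS).toFixed(4)); // 0.0370
console.log(tieredCostUSD(32001, 1000, CODER_PLUS_TIERS).toFixed(4)); // 0.0666
```

Under these assumptions, a single extra input token raises the cost of this request by about 80%, because the higher rate applies to all of it.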
Thinking mode bills as output, and the premium varies by model
Qwen models support thinking and non-thinking modes, and reasoning tokens bill at the output rate. How much that costs varies by model: some charge the same rate whether thinking is on or off, others price extended reasoning separately. Check the specific model's pricing page before assuming thinking is cheap or expensive.
One more billing rule: Batch invocation takes 50% off both input and output, and context caching discounts input tokens for repeated prefixes, but the two discounts cannot be combined on the same request.
What makes your bill higher than expected
Crossing a tier threshold
The most common surprise is a request crossing a length tier. On Qwen3 Coder Plus, a prompt that grows from 30K to 40K tokens moves the entire request from $1.00 to $1.80 input. Workloads with variable prompt sizes can straddle a threshold and bill unpredictably unless you watch request length.
The coder long-context step
Qwen3 Coder Plus has the steepest step in the catalog. It bills $1.00 input and $5.00 output up to 32K tokens, rising through $1.80/$9.00 and $3.00/$15.00, then reaching $6.00 input and $60.00 output above 256K tokens. Feeding an entire large codebase into one request moves input to six times and output to twelve times the base rate, so chunking matters more here than anywhere else in the lineup.
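To get a feel for how much chunking can matter, the sketch below compares one oversized request with the same job split into ten smaller ones. The job sizes are hypothetical, "K" is read as 1,000 tokens, and it assumes the work can actually be split without repeating much context in each chunk, which will not hold for every codebase.

```javascript
// Qwen3 Coder Plus rates as [max input tokens, input $/1M, output $/1M].
// Assumption: "K" means 1,000 tokens and bounds are inclusive.
const TIERS = [
  [32000, 1.0, 5.0],
  [128000, 1.8, 9.0],
  [256000, 3.0, 15.0],
  [1000000, 6.0, 60.0],
];

function costUSD(inputTokens, outputTokens) {
  const tier = TIERS.find(([max]) => inputTokens <= max);
  if (!tier) throw new RangeError("input exceeds the largest tier");
  const [, inRate, outRate] = tier;
  return (inputTokens * inRate + outputTokens * outRate) / 1e6;
}

// Hypothetical job: 300K tokens of code, 20K tokens of output in total.
const single = costUSD(300000, 20000);
// Same job split into 10 independent requests of 30K input and 2K output.
const chunked = 10 * costUSD(30000, 2000);

console.log(single.toFixed(2));  // 3.00
console.log(chunked.toFixed(2)); // 0.40
```

In this illustration the chunked version costs about one-seventh as much, with most of the saving coming from output dropping from $60.00 back to $5.00 per 1M tokens.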
Thinking-mode output on the models that charge for it
On models where thinking carries a premium, leaving it on for simple tasks inflates output token costs. It's the first setting to check when a bill runs high.
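Even where thinking carries no rate premium, reasoning tokens still count as output, so the token volume alone can multiply the bill. A minimal sketch, assuming a hypothetical response with 300 visible tokens and 1,500 reasoning tokens, and assuming the same output rate in both modes:

```javascript
// Output-side cost of one response, in USD.
// reasoningTokens: tokens the model spends thinking (0 when thinking is off).
// outputRatePerM: the output rate that applies in the chosen mode, per 1M tokens.
function outputCostUSD(answerTokens, reasoningTokens, outputRatePerM) {
  return ((answerTokens + reasoningTokens) * outputRatePerM) / 1e6;
}

// Hypothetical response: 300 visible tokens, 1,500 reasoning tokens,
// priced at Qwen3.6 Flash's $1.13 output rate in both modes (an assumption).
const off = outputCostUSD(300, 0, 1.13);
const on = outputCostUSD(300, 1500, 1.13);

console.log(off.toFixed(6));        // 0.000339
console.log(on.toFixed(6));         // 0.002034
console.log((on / off).toFixed(1)); // 6.0
```

If a model also prices thinking output at a higher rate, pass that rate as `outputRatePerM` for the thinking case and the gap widens further.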
Multimodal token rates
The Omni models bill audio at far higher rates than text. Qwen3-Omni Flash charges $0.43 per 1M text input tokens, but audio input carries a significant premium on top of that. Estimate voice and audio workloads separately instead of reusing text rates.
The wrong region
Running on the Singapore endpoint when your data could sit in Beijing means paying roughly 2.5 to 3.3 times as much per token, since a 60 to 70% discount leaves Beijing at 30 to 40% of the Singapore price. Region is a cost lever as much as a compliance one.
How to reduce Qwen API costs
The methods below are ordered by impact, with the most effective first.
1. Pick the right model for the job
The spread between Qwen3.6 Flash at $0.19/$1.13 and Qwen3.7 Max at $1.25/$3.75 is about 6.6x on input and 3.3x on output. Use Flash for classification, extraction, summarization, and high-volume chat; Qwen3.6 Plus for most production work; and reserve Qwen3.7 Max for complex reasoning and long-horizon agentic tasks. Routing the bulk of traffic to Flash reduces spend more than any other change here.
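To see what routing does to a bill, the sketch below blends Flash and Max costs by the share of token volume sent to Flash. The monthly volume is hypothetical, and it assumes requests routed to either model have the same token profile, which is a simplification.

```javascript
// International (Singapore) rates, USD per 1M tokens.
const FLASH = { input: 0.19, output: 1.13 };
const MAX = { input: 1.25, output: 3.75 };

function costUSD(inputTokens, outputTokens, rate) {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// Monthly cost when `flashShare` (0..1) of the token volume goes to Flash
// and the remainder goes to Max.
function routedCostUSD(inputTokens, outputTokens, flashShare) {
  if (flashShare < 0 || flashShare > 1) {
    throw new RangeError("flashShare must be between 0 and 1");
  }
  return (
    flashShare * costUSD(inputTokens, outputTokens, FLASH) +
    (1 - flashShare) * costUSD(inputTokens, outputTokens, MAX)
  );
}

// Hypothetical month: 10M input and 3M output tokens.
for (const share of [0, 0.8, 1]) {
  console.log(share, routedCostUSD(10e6, 3e6, share).toFixed(2));
}
// 0 23.75
// 0.8 8.98
// 1 5.29
```

Sending 80% of this traffic to Flash cuts the bill by a little over 60% compared with running everything on Max.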
2. Choose the cheapest region you can use
If your workload can run with data in China, the Beijing endpoint cuts token costs by 60 to 70%. If it can't, the Global (US or Germany) endpoint is still cheaper than Singapore for many models. Match the region to your data-residency needs, then take the lowest price that qualifies.
3. Keep requests under tier thresholds
Because crossing a length tier rebills the whole request, trimming context to stay under a threshold can cut the per-token rate by roughly 40 to 50%: on Qwen3 Coder Plus, staying under 256K halves the input rate from $6.00 to $3.00. Summarize history, retrieve less, or split jobs rather than letting prompts drift over a tier line.
4. Use Batch or context caching, whichever fits
Batch invocation gives 50% off both input and output for work that can run asynchronously. Context caching discounts input on repeated prefixes like system prompts. Pick the one that matches your workload, since they can't stack: Batch for bulk offline jobs, caching for high-frequency requests that share a prefix.
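Since the two discounts cannot stack, it is worth pricing both before choosing. The sketch below does that for a hypothetical month on Qwen3.6 Plus. The 50% Batch discount comes from this guide; the caching discount is not specified here, so `cachedFactor` (the fraction of the input rate charged for cached tokens) is a placeholder value that you would need to replace with the real figure for your model.

```javascript
function baseCostUSD(inputTokens, outputTokens, rate) {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// Batch: 50% off both input and output.
function batchCostUSD(inputTokens, outputTokens, rate) {
  return 0.5 * baseCostUSD(inputTokens, outputTokens, rate);
}

// Caching: cached input tokens bill at `cachedFactor` times the input rate.
// `cachedFactor` is a placeholder; look up the real value for your model.
function cachedCostUSD(inputTokens, outputTokens, rate, cachedTokens, cachedFactor) {
  const freshTokens = inputTokens - cachedTokens;
  const inputCost = (freshTokens + cachedTokens * cachedFactor) * rate.input;
  return (inputCost + outputTokens * rate.output) / 1e6;
}

// Qwen3.6 Plus, International rates; hypothetical month of 10M input
// (8M of it a shared prefix) and 2M output tokens.
const plus = { input: 0.5, output: 3.0 };
console.log(baseCostUSD(10e6, 2e6, plus).toFixed(2));             // 11.00
console.log(batchCostUSD(10e6, 2e6, plus).toFixed(2));            // 5.50
console.log(cachedCostUSD(10e6, 2e6, plus, 8e6, 0.2).toFixed(2)); // 7.80
```

With these illustrative numbers Batch wins because output dominates the bill and caching only touches input; an input-heavy workload with a long shared prefix could tip the other way.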
5. Turn off thinking mode for simple tasks
On models where thinking output costs more than non-thinking, switch it off for work that doesn't need step-by-step reasoning.
6. Self-host the open weights for sustained volume
Most Qwen models are Apache 2.0 licensed, so for steady high-volume workloads, running them on your own or a third-party host can beat per-token API pricing. Third-party providers such as DeepInfra, Novita, and Fireworks often price the open-weight Qwen models below Alibaba's own hosting.
Can you use the Qwen API for free?
There are several ways to use Qwen for free:
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add Qwen models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.
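```html
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        // Send a prompt to Qwen3.6 Flash through Puter.js; no API key is needed.
        puter.ai.chat("Explain quantum computing in simple terms", {
            model: "qwen/qwen3.6-flash"
        }).then(response => {
            // Replace the page body with the model's reply.
            document.body.innerHTML = response.message.content;
        });
    </script>
</body>
</html>
```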
We ran the same workload we use across our pricing guides: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, for 15M input and 4.5M output tokens a month. On Qwen3.6 Flash through the API, our calculation puts that at $2.85 for input and $5.09 for output, about $8 a month, growing linearly with your user base. Through Puter.js the same app costs you $0 at any scale, with no API key to manage, because each user carries their own usage.
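The linear growth is easy to reproduce. This minimal sketch recomputes the direct-API figure for the workload above and then scales the user count; the function name and the larger user counts are illustrative.

```javascript
// Monthly API cost in USD for a chat app; rates are per 1M tokens.
function monthlyCostUSD({ users, messagesPerUser, inputPerMessage, outputPerMessage, rate }) {
  const messages = users * messagesPerUser;
  const inputTokens = messages * inputPerMessage;
  const outputTokens = messages * outputPerMessage;
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

const workload = {
  messagesPerUser: 30,
  inputPerMessage: 1000,
  outputPerMessage: 300,
  rate: { input: 0.19, output: 1.13 }, // Qwen3.6 Flash, International
};

for (const users of [500, 2000, 10000]) {
  console.log(users, Math.round(monthlyCostUSD({ ...workload, users })));
}
// 500 8
// 2000 32
// 10000 159
```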
The free quota: 1 million tokens per model
New Alibaba Cloud accounts get a free quota of 1 million tokens for each eligible model on the International (Singapore) endpoint, valid for 90 days after activating Model Studio. Because the quota is per model rather than shared, evaluating across dozens of Qwen models adds up to tens of millions of free tokens, one of the most generous evaluation tiers available. Two caveats: the free quota exists only in the Singapore region (Beijing, Global, US, and EU have none), and the old free OAuth API tier was discontinued on April 15, 2026, so tutorials promising standing free access are out of date.
Open-weight self-hosting
Most Qwen models are released under Apache 2.0 on Hugging Face, so you can run the same model the API serves on your own hardware with no API key and no usage charge. This suits anyone with the infrastructure to host an open-weight model and a workload large enough to justify it.
OpenRouter's free endpoints
OpenRouter lists free variants of several Qwen models (tagged :free). Free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. It's built for testing rather than production, and failed requests still count against the daily quota.
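Because failed requests still count, it can be worth tracking attempts on the client before sending anything. The class below is a hypothetical helper, not part of any OpenRouter SDK, and it models both limits as rolling windows, which is an assumption; the provider's actual reset schedule may differ.

```javascript
// Client-side guard for a capped free tier.
// Assumption: limits are modelled as rolling windows; the provider's actual
// reset schedule may differ.
class RequestBudget {
  constructor({ perDay = 50, perMinute = 20 } = {}) {
    this.perDay = perDay;
    this.perMinute = perMinute;
    this.attempts = []; // timestamps (ms) of every recorded attempt
  }

  // Records an attempt and returns true if one more request fits in both
  // windows at time `now`; returns false without recording otherwise.
  tryAcquire(now = Date.now()) {
    const dayAgo = now - 24 * 60 * 60 * 1000;
    const minuteAgo = now - 60 * 1000;
    this.attempts = this.attempts.filter((t) => t > dayAgo);
    const inLastMinute = this.attempts.filter((t) => t > minuteAgo).length;
    if (this.attempts.length >= this.perDay || inLastMinute >= this.perMinute) {
      return false;
    }
    this.attempts.push(now);
    return true;
  }
}

const budget = new RequestBudget();
if (budget.tryAcquire()) {
  // Safe to send one request; the attempt is counted even if the call fails.
}
```

Passing `{ perDay: 1000 }` to the constructor would model the higher cap that applies after purchasing credits.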
Real-world cost examples
Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads, using the same method each time: estimate tokens per request, multiply by volume, then by the per-million rate. All figures use the International (Singapore) rates; Beijing would be 60 to 70% lower.
Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:
| Model | Monthly cost |
|---|---|
| Qwen3.6 Flash | ~$4 |
| Qwen3.6 Plus | ~$11 |
| Qwen3.7 Max | ~$20 |
Context caching on the repeated system prompt lowers these further for all three.
Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. We calculate about $0.44 on Qwen3.6 Flash.
Daily content generation. For 30 articles a month on Qwen3.6 Plus, with 2,000-token prompts and roughly 4,000 output tokens each, we estimate about $0.39 a month.
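The three estimates above can be reproduced with a few lines of JavaScript. This is a sketch of the method described (tokens per request, times volume, times the per-million rate), using the International rates from this guide; the object and function names are illustrative.

```javascript
// International (Singapore) rates, USD per 1M tokens.
const RATES = {
  "Qwen3.6 Flash": { input: 0.19, output: 1.13 },
  "Qwen3.6 Plus": { input: 0.5, output: 3.0 },
  "Qwen3.7 Max": { input: 1.25, output: 3.75 },
};

// tokens per request x request volume x per-million rate
function workloadCostUSD({ requests, inputPerRequest, outputPerRequest }, rate) {
  const inputTokens = requests * inputPerRequest;
  const outputTokens = requests * outputPerRequest;
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

const chatbot = { requests: 1000 * 8, inputPerRequest: 1200, outputPerRequest: 250 };
const pdfs = { requests: 100, inputPerRequest: 20000, outputPerRequest: 500 };
const articles = { requests: 30, inputPerRequest: 2000, outputPerRequest: 4000 };

for (const [model, rate] of Object.entries(RATES)) {
  console.log(model, workloadCostUSD(chatbot, rate).toFixed(2));
}
// Qwen3.6 Flash 4.08
// Qwen3.6 Plus 10.80
// Qwen3.7 Max 19.50
console.log(workloadCostUSD(pdfs, RATES["Qwen3.6 Flash"]).toFixed(2));    // 0.44
console.log(workloadCostUSD(articles, RATES["Qwen3.6 Plus"]).toFixed(2)); // 0.39
```

Swapping in your own request counts and token sizes gives a first-pass budget; it ignores tiered pricing, caching, and Batch discounts.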
Complete Qwen API pricing table
All prices are per 1M tokens, International (Singapore) deployment. Tiered models show the rate at each input-length band.
Proprietary text models
| Model | Input | Output |
|---|---|---|
| Qwen3.7 Max | $1.25 | $3.75 |
| Qwen3.7 Plus (multimodal) | $0.32 | $1.28 |
| Qwen3.6 Max Preview | $1.30 | $7.80 |
| Qwen3.6 Plus | $0.50 | $3.00 |
| Qwen3.6 Flash | $0.19 | $1.13 |
| Qwen3 Max | $1.20 | $6.00 |
| Qwen3 Max Thinking | $0.78 | $3.90 |
| QwQ Plus (reasoning) | $0.80 | $2.40 |
| Qwen Flash | $0.05 | $0.40 |
Coding models
| Model | Input tier | Input | Output |
|---|---|---|---|
| Qwen3 Coder 480B A35B | flat | $1.50 | $7.50 |
| Qwen3 Coder Plus | ≤32K / ≤128K / ≤256K / ≤1M | $1.00 / $1.80 / $3.00 / $6.00 | $5.00 / $9.00 / $15.00 / $60.00 |
| Qwen3 Coder Flash | ≤32K / ≤128K / ≤256K / ≤1M | $0.30 / $0.50 / $0.80 / $1.60 | $1.50 / $2.50 / $4.00 / $9.60 |
| Qwen3 Coder Next | flat | $0.11 | $0.80 |
Open-weight hosted models (selection)
| Model | Input | Output |
|---|---|---|
| Qwen3.5-397B-A17B | $0.60 | $3.60 |
| Qwen3.6-27B | $0.60 | $3.60 |
| Qwen3.5-122B-A10B | $0.40 | $3.20 |
| Qwen3.5-27B | $0.30 | $2.40 |
| Qwen3-235B-A22B | $0.70 | $2.80 |
| Qwen3-32B | $0.70 | $2.80 |
Multimodal and specialized
| Model | Price |
|---|---|
| Qwen3-VL Plus (vision) | $0.20 input, $1.60 output |
| Qwen3 VL 235B A22B (vision) | $0.70 input, $2.80 output |
| Qwen3-Omni Flash (audio/video) | $0.43 text input, $1.66 output |
| Qwen Image 2.0 | $0.04 per image |
| Qwen Image 2.0 Pro | $0.08 per image |
| Qwen VL OCR | $0.72 input, $0.72 output |
| Qwen-MT Turbo (translation) | $0.16 input, $0.49 output |
Batch invocation takes 50% off input and output on supported models; context caching discounts input on repeated prefixes; the two cannot be combined. The Chinese Mainland (Beijing) endpoint is 60 to 70% cheaper but has no free quota and stores data in China. For the full catalog of more than 145 model IDs, including embeddings, TTS, video, math, and hosted third-party models, see Alibaba's official pricing page.
Conclusion
Qwen API pricing in 2026 runs from $0.05 per million input tokens on Qwen Flash to $1.30 on Qwen3.6 Max Preview among the general-purpose text models, with the current flagship Qwen3.7 Max at $1.25/$3.75 and Qwen3 Coder Plus reaching $6.00 in its longest input tier, across one of the widest model catalogs of any provider. Keeping the bill predictable comes down to a few choices:
- Route most traffic to Qwen3.6 Flash and reserve Qwen3.7 Max for the hardest tasks.
- Pick the cheapest region your data-residency needs allow.
- For coding models with tiered pricing, keep requests under the length tier thresholds.
- Use Batch or context caching, whichever fits, but not both.
Between the 1-million-token-per-model free quota and the open-weight models you can self-host, there's also a real path to evaluating and even running Qwen at no cost.
Prices here were verified against Alibaba Cloud's official Model Studio pricing. Alibaba updates its catalog and rates frequently, and Qwen is a China-based provider with region and data-residency considerations, so always confirm current pricing and the right region before committing to a budget.