Qwen API
Access Qwen instantly with Puter.js and add AI to any app in a few lines of code, without a backend or API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "qwen/qwen3.5-flash-02-23"
}).then(response => {
    console.log(response);
});
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "qwen/qwen3.5-flash-02-23"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
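For longer replies you can stream the response as it arrives instead of waiting for the whole thing. A minimal sketch, assuming the browser build loaded from js.puter.com; with `stream: true`, `puter.ai.chat` resolves to an async iterable of parts:

```javascript
// Streaming sketch: assumes puter is loaded via <script src="https://js.puter.com/v2/">.
const MODEL = "qwen/qwen3.5-flash-02-23";

async function streamChat(prompt) {
    // Guard so the sketch is a no-op outside the browser.
    if (typeof puter === "undefined") return;

    const response = await puter.ai.chat(prompt, {
        model: MODEL,
        stream: true, // yields the reply incrementally
    });

    // Each part carries a chunk of the reply text.
    for await (const part of response) {
        if (part?.text) appendChunk(part.text);
    }
}

// Hypothetical sink for streamed chunks; replace with your own UI update.
function appendChunk(text) {
    console.log(text);
}
```

Streaming is most useful for chat UIs, where rendering tokens as they arrive makes the app feel much faster than waiting for the full completion.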
List of Qwen Models
Qwen3.5-9B
qwen/qwen3.5-9b
Qwen 3.5 9B is a 9-billion parameter open-source multimodal model by Alibaba's Qwen Team, featuring a 262K native context window (extendable to ~1M tokens), support for text, image, and video input, and coverage of 201 languages. It uses a hybrid Gated DeltaNet architecture and outperforms much larger models like Qwen3-30B and OpenAI's gpt-oss-120B on key benchmarks including reasoning, vision, and document understanding.
Qwen3.5-122B-A10B
qwen/qwen3.5-122b-a10b
Qwen 3.5 122B (10B Active) is Alibaba's largest medium-sized MoE model, activating only 10B of its 122B total parameters per inference pass. It excels at agentic tasks like tool use and multi-step reasoning, leading the Qwen 3.5 lineup on benchmarks such as BFCL-V4 and BrowseComp. It supports 262K native context (extendable to 1M), native multimodal input, and 201 languages under Apache 2.0.
Qwen3.5-27B
qwen/qwen3.5-27b
Qwen 3.5 27B is the only dense (non-MoE) model in the Qwen 3.5 medium series, activating all 27B parameters on every forward pass for maximum per-token reasoning density. It ties GPT-5 mini on SWE-bench Verified at 72.4 and is competitive with Claude Sonnet 4.5 on visual reasoning benchmarks. It runs well on consumer hardware and is open-weight under Apache 2.0.
Qwen3.5-35B-A3B
qwen/qwen3.5-35b-a3b
Qwen 3.5 35B (3B Active) is a sparse MoE model that activates just 3B of its 35B total parameters, yet outperforms the previous-generation 235B flagship across language, vision, coding, and agent tasks. It uses a hybrid Gated DeltaNet + MoE architecture and can run on GPUs with as little as 8GB VRAM when quantized. It's the base model behind the hosted Qwen 3.5 Flash API.
Qwen3.5-Flash
qwen/qwen3.5-flash-02-23
Qwen 3.5 Flash is the production-optimized API version of the 35B-A3B model. It features a default 1M token context window, built-in tool/function calling support, and is priced at ~$0.10/M input tokens for low-latency agentic workflows. The '02-23' suffix indicates the February 23, 2026 snapshot/version date.
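Flash's built-in function calling uses OpenAI-style tool definitions passed through the `tools` option. A minimal sketch, assuming the browser build of Puter.js; the `get_weather` tool and its canned data are purely illustrative:

```javascript
// Tool-calling sketch: the tool schema follows the OpenAI function format.
const tools = [{
    type: "function",
    function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
            type: "object",
            properties: {
                city: { type: "string", description: "City name" },
            },
            required: ["city"],
        },
    },
}];

// Hypothetical local implementation the model's tool call is routed to.
function getWeather(city) {
    const canned = { Paris: "22°C, sunny", London: "15°C, rainy" };
    return canned[city] ?? "unknown";
}

async function askWithTools(prompt) {
    if (typeof puter === "undefined") return; // no-op outside the browser
    const response = await puter.ai.chat(prompt, {
        model: "qwen/qwen3.5-flash-02-23",
        tools,
    });
    // If the model chose to call the tool, run it locally.
    const call = response.message?.tool_calls?.[0];
    if (call?.function?.name === "get_weather") {
        const { city } = JSON.parse(call.function.arguments);
        console.log(getWeather(city));
    } else {
        console.log(response);
    }
}
```

In a real agent loop you would send the tool's result back to the model in a follow-up message rather than just logging it.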
Qwen3.5 397B A17B
qwen/qwen3.5-397b-a17b
Qwen3.5-397B-A17B is an open-weight native vision-language model from Alibaba's Qwen team, released in February 2026. It uses a hybrid architecture combining Gated Delta Networks (linear attention) with a sparse mixture-of-experts design, totaling 397 billion parameters but activating only 17 billion per forward pass for efficient inference. The model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, competing with frontier models like GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro. It supports 201 languages and dialects and features a 250k-token vocabulary. Its decoding throughput is reported at 8.6x that of Qwen3-Max under a 32k context length.
Qwen3.5 Plus 02-15
qwen/qwen3.5-plus-02-15
Qwen3.5-Plus is the hosted flagship model in the Qwen3.5 series, available through Alibaba Cloud Model Studio. It offers a 1 million token context window by default and includes built-in tools with adaptive tool use, including web search and code interpreter capabilities. The model supports reasoning mode (chain-of-thought), search, and a fast response mode without extended thinking. It is accessible via an OpenAI-compatible API and can be integrated with third-party coding tools like Claude Code, Cline, and OpenClaw. Qwen3.5-Plus is designed for agentic workflows that combine multimodal reasoning with tool use.
Qwen3 Max Thinking
qwen/qwen3-max-thinking
Qwen3 Max Thinking is Alibaba Cloud's flagship proprietary reasoning model with a 256K context window, featuring test-time scaling and adaptive tool-use capabilities (web search, code interpreter, memory) that allow it to reason iteratively and autonomously. It scores competitively against GPT-5.2 and Gemini 3 Pro on benchmarks like Humanity's Last Exam and HMMT, excelling in math, complex reasoning, and instruction following.
Qwen3 Coder Next
qwen/qwen3-coder-next
Qwen3-Coder-Next is an open-weight coding model from Alibaba's Qwen team with 80B total parameters but only 3B active per token, designed specifically for coding agents and local development with a 256K context window. It uses a sparse Mixture-of-Experts (MoE) architecture with hybrid attention, trained on 800K executable coding tasks using reinforcement learning to excel at long-horizon reasoning, tool calling, and recovering from execution failures. It achieves performance comparable to models with 10-20x more active parameters on benchmarks like SWE-Bench while maintaining low inference costs.
Qwen3 Next 80B A3B Instruct
qwen/qwen3-next-80b-a3b-instruct
Qwen3 Next 80B A3B Instruct is an innovative MoE model with hybrid attention (Gated DeltaNet + Gated Attention), achieving 10x inference throughput for 32K+ contexts while matching Qwen3-235B performance.
Qwen3 Next 80B A3B Thinking
qwen/qwen3-next-80b-a3b-thinking
Qwen3 Next 80B A3B Thinking is the reasoning-enhanced variant outperforming Gemini-2.5-Flash-Thinking on complex reasoning tasks with hybrid attention and multi-token prediction.
Qwen Plus 0728
qwen/qwen-plus-2025-07-28
Qwen Plus (2025-07-28) is a snapshot version of Qwen Plus from July 2025, offering consistent behavior and performance for production deployments requiring version stability.
Qwen Plus 0728 (thinking)
qwen/qwen-plus-2025-07-28:thinking
Qwen Plus (2025-07-28) Thinking is the reasoning-enhanced version that uses chain-of-thought processing for complex problems, providing step-by-step reasoning before delivering answers.
Qwen3 235B A22B Instruct 2507
qwen/qwen3-235b-a22b-2507
Qwen3 235B A22B (2507) is the July 2025 updated version with significant improvements in instruction following, reasoning, coding, tool usage, and 256K long-context understanding.
Qwen3 235B A22B Thinking 2507
qwen/qwen3-235b-a22b-thinking-2507
Qwen3 235B A22B Thinking (2507) is the reasoning-enhanced variant using extended chain-of-thought processing for complex math, coding, and logical problems with enhanced performance.
Qwen3 30B A3B Instruct 2507
qwen/qwen3-30b-a3b-instruct-2507
Qwen3 30B A3B Instruct (2507) is the July 2025 updated instruction-tuned version with improved capabilities in reasoning, coding, and tool usage at high efficiency.
Qwen3 30B A3B Thinking 2507
qwen/qwen3-30b-a3b-thinking-2507
Qwen3 30B A3B Thinking (2507) is the reasoning-enhanced variant optimized for complex problem-solving with extended chain-of-thought processing at high parameter efficiency.
Qwen3 Coder 480B A35B
qwen/qwen3-coder
Qwen3 Coder is the most agentic code model in the Qwen series, available in 30B and 480B MoE variants. It achieves SOTA on SWE-Bench with 256K native context, extendable to 1M tokens.
Qwen3 Coder 30B A3B Instruct
qwen/qwen3-coder-30b-a3b-instruct
Qwen3 Coder 30B A3B Instruct is an efficient MoE coding model with 30B total and 3.3B active parameters, offering strong agentic coding capabilities with 256K context support.
Qwen3 Coder Flash
qwen/qwen3-coder-flash
Qwen3 Coder Flash is a cost-effective coding model balancing performance and speed, suitable for scenarios requiring fast responses at lower cost while maintaining coding quality.
Qwen3 Coder Plus
qwen/qwen3-coder-plus
Qwen3 Coder Plus is the strongest Qwen coding API model, ideal for complex project generation and in-depth code reviews with up to 1M token context support.
Qwen3 VL 235B A22B Instruct
qwen/qwen3-vl-235b-a22b-instruct
Qwen3 VL 235B A22B Instruct is the flagship vision-language MoE model with 256K context, offering superior visual coding, spatial understanding, and long video comprehension up to 20 minutes.
Qwen3 VL 235B A22B Thinking
qwen/qwen3-vl-235b-a22b-thinking
Qwen3 VL 235B A22B Thinking is the reasoning-enhanced vision-language model excelling at visual math, detail analysis, and causal reasoning with extended chain-of-thought processing.
Qwen3 VL 30B A3B Instruct
qwen/qwen3-vl-30b-a3b-instruct
Qwen3 VL 30B A3B Instruct is an efficient vision-language MoE model offering strong image/video understanding with 3B active parameters and 256K context support.
Qwen3 VL 30B A3B Thinking
qwen/qwen3-vl-30b-a3b-thinking
Qwen3 VL 30B A3B Thinking is the reasoning-enhanced vision-language variant optimized for complex visual reasoning tasks with extended thinking capabilities.
Qwen3 VL 32B Instruct
qwen/qwen3-vl-32b-instruct
Qwen3 VL 32B Instruct is a dense vision-language model with strong text and visual capabilities, featuring visual coding, spatial understanding, and 256K context support.
Qwen3 VL 8B Instruct
qwen/qwen3-vl-8b-instruct
Qwen3 VL 8B Instruct is a compact vision-language model matching flagship text performance while supporting image/video understanding, visual coding, and 256K context length.
Qwen3 VL 8B Thinking
qwen/qwen3-vl-8b-thinking
Qwen3 VL 8B Thinking is the reasoning-enhanced compact vision model for complex visual analysis requiring step-by-step reasoning with efficient resource usage.
Qwen3 Max
qwen/qwen3-max
Qwen3 Max is the most powerful Qwen3 API model with SOTA agent programming and tool usage capabilities. It features a non-thinking mode optimized for complex agent scenarios.
Qwen3 14B
qwen/qwen3-14b
Qwen3 14B is a dense language model with hybrid thinking/non-thinking modes, matching Qwen2.5-32B performance. It supports 119 languages and excels in math, coding, and reasoning tasks.
Qwen3 235B A22B
qwen/qwen3-235b-a22b
Qwen3 235B A22B is the flagship MoE model with 235B total and 22B active parameters, rivaling DeepSeek-R1 and o1. It features hybrid thinking modes and supports 119 languages with strong agentic capabilities.
Qwen3 30B A3B
qwen/qwen3-30b-a3b
Qwen3 30B A3B is an efficient MoE model with 30B total and 3B active parameters, outperforming QwQ-32B while using 10x fewer active parameters. It offers hybrid thinking modes and 119 language support.
Qwen3 32B
qwen/qwen3-32b
Qwen3 32B is a dense language model matching Qwen2.5-72B performance with hybrid thinking/non-thinking modes. It excels in STEM, coding, and reasoning while supporting 119 languages.
Qwen3 4B
qwen/qwen3-4b:free
Qwen3 4B is a compact model rivaling Qwen2.5-72B-Instruct performance, featuring hybrid thinking modes and 119 language support.
Qwen3 8B
qwen/qwen3-8b
Qwen3 8B is a dense model matching Qwen2.5-14B performance with hybrid thinking modes and 128K context. It offers strong reasoning, coding, and multilingual capabilities in a mid-sized package.
Qwen2.5 VL 32B Instruct
qwen/qwen2.5-vl-32b-instruct
Qwen 2.5 VL 32B Instruct is a mid-sized vision-language model offering enhanced image/video understanding with better alignment to human preferences. It bridges the gap between 7B and 72B variants.
QwQ 32B
qwen/qwq-32b
QwQ 32B is a 32B parameter reasoning model rivaling DeepSeek-R1 (671B) through scaled reinforcement learning. It excels in math, coding, and complex reasoning with 131K context and agent capabilities.
Qwen-Max
qwen/qwen-max
Qwen Max is Alibaba's most powerful proprietary API model, a large-scale MoE with hundreds of billions of parameters. It delivers top-tier performance in reasoning, coding, math, and multilingual tasks via Alibaba Cloud Model Studio.
Qwen-Plus
qwen/qwen-plus
Qwen Plus is a high-performance proprietary API model balancing capability and cost, suitable for complex tasks requiring strong reasoning and multilingual support. Available through Alibaba Cloud Model Studio.
Qwen-Turbo
qwen/qwen-turbo
Qwen Turbo is a fast, cost-effective API model with up to 1M context length, ideal for simple tasks requiring quick responses. It supports multiple languages and offers flexible tiered pricing.
Qwen2.5-VL 7B Instruct
qwen/qwen-2.5-vl-7b-instruct
Qwen 2.5 VL 7B Instruct is a vision-language model capable of understanding images, documents, charts, and videos up to 1 hour. It supports OCR, visual reasoning, and can act as a visual agent for computer/phone use.
Qwen VL Max
qwen/qwen-vl-max
Qwen VL Max is Alibaba's most capable vision-language API model based on Qwen2.5-VL, offering superior image/video understanding, OCR, document analysis, and visual reasoning capabilities.
Qwen VL Plus
qwen/qwen-vl-plus
Qwen VL Plus is a balanced vision-language API model offering good performance at lower cost, suitable for image understanding, OCR, and multimodal tasks without requiring maximum capability.
Qwen2.5 VL 72B Instruct
qwen/qwen2.5-vl-72b-instruct
Qwen 2.5 VL 72B Instruct is the flagship open-source vision-language model excelling in document understanding, visual reasoning, and long video comprehension up to 1 hour with event pinpointing.
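The vision-language models above accept an image alongside the text prompt. Puter.js takes an image URL as the second argument to `puter.ai.chat`; passing the model id through the fourth (options) argument, as sketched below, is an assumption, and the image URL is a placeholder:

```javascript
// Vision sketch: assumes puter is loaded via <script src="https://js.puter.com/v2/">.
const VL_MODEL = "qwen/qwen2.5-vl-72b-instruct";

async function describeImage(imageUrl) {
    if (typeof puter === "undefined") return; // no-op outside the browser
    const response = await puter.ai.chat(
        "Describe this image in one sentence.", // text prompt
        imageUrl,                               // image input (any public URL)
        false,                                  // testMode off
        { model: VL_MODEL }
    );
    console.log(response);
}
```

The same call shape works for the other VL variants; swap the model id to trade capability against cost and latency.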
Qwen-Image
qwen/qwen-image
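Qwen-Image is an image-generation model rather than a chat model, so it goes through `puter.ai.txt2img`. A minimal sketch, assuming the browser build of Puter.js and that `txt2img` accepts an options object carrying this model id:

```javascript
// Image-generation sketch: puter.ai.txt2img resolves to an <img> element.
// Whether this model id is honored in the options object is an assumption.
async function generateImage(prompt) {
    if (typeof puter === "undefined") return; // no-op outside the browser
    const image = await puter.ai.txt2img(prompt, { model: "qwen/qwen-image" });
    document.body.appendChild(image);
}
```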
Qwen2.5 Coder 32B Instruct
qwen/qwen-2.5-coder-32b-instruct
Qwen 2.5 Coder 32B Instruct is a code-specialized model matching GPT-4o's coding capabilities, supporting 40+ programming languages. It excels in code generation, repair, and reasoning with 128K context support.
Qwen2.5 72B Instruct
qwen/qwen-2.5-72b-instruct
Qwen 2.5 72B Instruct is Alibaba's flagship open-source language model with 72 billion parameters, trained on 18 trillion tokens with 128K context support. It excels in coding, math, instruction following, and multilingual tasks across 29+ languages.
Qwen2.5 7B Instruct
qwen/qwen-2.5-7b-instruct
Qwen 2.5 7B Instruct is a compact yet capable language model offering strong performance in coding, math, and general tasks. It supports 128K context length and 29+ languages while being efficient enough for smaller deployments.
Qwen2.5 Coder 7B Instruct
qwen/qwen2.5-coder-7b-instruct
Qwen 2.5 Coder 7B Instruct is a compact code-specialized model with strong code generation, reasoning, and repair capabilities. It supports multiple programming languages while being deployable on consumer hardware.
Frequently Asked Questions
What is the Qwen API?
The Qwen API gives you access to models for AI chat and image generation. Through Puter.js, you can start using Qwen models instantly with zero setup or configuration.
Which Qwen models are supported?
Puter.js supports a variety of Qwen models, including Qwen3.5-9B, Qwen3.5-122B-A10B, Qwen3.5-27B, and more. Find all AI models supported by Puter.js in the AI model list.
Who pays for AI usage?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Qwen API with my framework?
Yes. The Qwen API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.