Qwen API
Access Qwen instantly with Puter.js and add AI to any app in a few lines of code, with no backend and no API keys.
// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
  model: "qwen/qwen3.5-flash-02-23"
}).then(response => {
  console.log(response);
});
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat("Explain AI like I'm five!", {
      model: "qwen/qwen3.5-flash-02-23"
    }).then(response => {
      console.log(response);
    });
  </script>
</body>
</html>
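Beyond the single-string form above, puter.ai.chat also accepts an OpenAI-style array of messages, which is the natural form for multi-turn conversations. A minimal sketch, assuming the options object can be passed after the messages array as in the string form (the system prompt is a made-up example, and the guard simply skips the call outside a browser page that loads Puter.js):

```javascript
// Multi-turn form: an OpenAI-style messages array instead of a single string.
const messages = [
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "Explain AI like I'm five!" },
];

// Requires <script src="https://js.puter.com/v2/"></script> on the page.
if (typeof puter !== "undefined") {
  puter.ai.chat(messages, { model: "qwen/qwen3.5-flash-02-23" })
    .then(response => console.log(response));
}
```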
List of Qwen Models
Qwen3.6 Flash (Chat)
qwen/qwen3.6-flash
Qwen3.6 Flash is the speed-optimized tier of Alibaba's Qwen3.6 model family, designed for high-throughput, low-latency inference pipelines. It sits alongside Qwen3.6 Max Preview, Plus, and 35B-A3B in the product lineup, targeting use cases where fast response times matter more than peak benchmark scores. Like other Qwen3.6 models, it builds on a hybrid architecture combining linear attention with sparse mixture-of-experts routing. It is best suited for high-volume production workloads such as classification, extraction, summarization, and lightweight agent tasks where latency and cost efficiency are the primary constraints.
Qwen3.5 Plus 2026-04-20 (Chat)
qwen/qwen3.5-plus-20260420
Qwen3.5 Plus is a proprietary hosted model from Alibaba, built on the Qwen3.5-397B-A17B Mixture-of-Experts architecture with 397 billion total parameters and 17 billion active per token. Its headline feature is a 1-million-token native context window — among the largest available via API — making it well suited for processing entire codebases, long documents, or extended multi-turn conversations in a single request. It supports both a deep-thinking mode and an "Auto" mode that adaptively invokes tools like web search and code interpreters. This April 20, 2026 snapshot reflects ongoing improvements to the model since its original February 2026 launch. The Qwen3.5 series demonstrated strong multimodal performance across reasoning, coding, and vision tasks. A solid general-purpose option for developers needing large-context capabilities without migrating to the newer Qwen3.6 line.
Qwen3.6 27B (Chat)
qwen/qwen3.6-27b
Qwen3.6 27B is a dense 27-billion-parameter multimodal model from Alibaba's Qwen team, purpose-built for agentic coding and repository-level reasoning. It scores 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0, outperforming the previous-generation Qwen3.5-397B-A17B across all major coding benchmarks despite being far smaller. It natively supports text, image, and video inputs with a 262K-token context window, extendable to 1M tokens. A standout feature is Thinking Preservation, which retains reasoning traces across conversation turns — reducing redundant computation in multi-step agent loops. The model uses a hybrid attention architecture combining Gated DeltaNet with traditional self-attention. Ideal for developers building coding agents, multi-turn tool-use workflows, or frontend generation pipelines.
Qwen3.6 Max Preview (Chat)
qwen/qwen3.6-max-preview
Qwen3.6 Max Preview is Alibaba's most capable language model to date — a proprietary flagship that claimed the top score on six major coding benchmarks at its April 20, 2026 release. It leads on SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. The Artificial Analysis Intelligence Index rates it at 52, well above the median for reasoning models in its price tier. It supports a 256K-token context window and is text-only at launch. As a preview release, Alibaba is still actively iterating on the model. Best suited for teams building coding agents, scientific computing tools, or frontend generation systems that need peak benchmark performance.
Qwen3.6 35B A3B (Chat)
qwen/qwen3.6-35b-a3b
Qwen3.6 35B A3B is a sparse Mixture-of-Experts model with 35 billion total parameters but only 3 billion active per token, making it highly efficient for inference. Developed by Alibaba's Qwen team, it scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0 — significantly outperforming dense models like Gemma 4-31B (52.0% on SWE-bench Verified). It natively handles text, image, and video with a 262K-token context window, extendable to 1M tokens. The model supports Thinking Preservation for stable multi-turn reasoning and includes native tool-calling capabilities. Released under Apache 2.0, it was the first open-weight model in the Qwen3.6 family. A strong choice for developers who want frontier-adjacent coding performance at a fraction of the compute cost of larger models.
Qwen3.6 Plus (Chat)
qwen/qwen3.6-plus
Qwen 3.6 Plus is Alibaba's flagship large language model, built on a hybrid architecture combining linear attention with sparse mixture-of-experts routing for high throughput and scalability. It's optimized for agentic coding and complex multi-step workflows. On Terminal-Bench 2.0, it scores 61.6, surpassing Claude 4.5 Opus (59.3), while its 78.8 on SWE-bench Verified places it just behind the category leaders. It also leads on MCPMark (48.2%) for tool-calling reliability. A native multimodal model, it handles text, images, and documents within a 1M-token context window with up to 65K output tokens. Notable features include always-on chain-of-thought reasoning, native function calling, and a preserve_thinking parameter that retains reasoning across multi-turn agent loops. A strong fit for developers building AI coding agents, terminal automation, and tool-using pipelines.
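Many of the models in this list, including Qwen3.6 Plus, support native function calling. A hedged sketch of declaring a tool through puter.ai.chat using an OpenAI-style tool definition — the get_weather tool is a made-up example, and handling the model's returned tool call is left out:

```javascript
// OpenAI-style tool declaration; get_weather is a hypothetical example tool.
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a given city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
}];

if (typeof puter !== "undefined") {
  puter.ai.chat("What's the weather in Lisbon?", {
    model: "qwen/qwen3.6-plus",
    tools,
  }).then(response => console.log(response));
}
```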
Qwen3.6 Plus Preview (Chat)
qwen/qwen3.6-plus-preview:free
Qwen 3.6 Plus Preview is a next-generation large language model from Alibaba's Qwen team, built on a hybrid architecture designed for improved efficiency and scalability. Released as an early preview in March 2026, it succeeds the Qwen 3.5 Plus series with stronger reasoning and more reliable agentic behavior. The model offers a 1-million-token context window and up to 65,536 output tokens, making it well suited for processing large codebases, lengthy documents, or multi-step workflows in a single request. It supports tool use and function calling natively, with built-in chain-of-thought reasoning that is always active. Qwen 3.6 Plus Preview is particularly strong in agentic coding, front-end component generation, and complex problem-solving. It's a good fit for developers building AI-driven code review tools, multi-step agents, or applications that benefit from deep reasoning over large inputs.
Qwen3.5-9B (Chat)
qwen/qwen3.5-9b
Qwen 3.5 9B is a 9-billion-parameter open-source multimodal model by Alibaba's Qwen Team, featuring a 262K native context window (extendable to ~1M tokens), support for text, image, and video input, and coverage of 201 languages. It uses a hybrid Gated DeltaNet architecture and outperforms much larger models like Qwen3-30B and OpenAI's gpt-oss-120B on key benchmarks including reasoning, vision, and document understanding.
Qwen Image 2.0 (Image)
qwen/qwen-image-2.0
Qwen Image 2.0 is Alibaba's second-generation image foundation model, delivering a major upgrade over the original Qwen Image with a leaner 7B-parameter architecture that outperforms its 20B predecessor across the board. It generates natively at 2048×2048 resolution and unifies text-to-image generation and image editing into a single model — no separate pipelines needed. The model scores 88.32 on DPG-Bench, surpassing FLUX.1 (83.84) and GPT Image 1 (85.15), and ranks #1 on AI Arena's blind human evaluation for both generation and editing. Its headline feature is professional typography rendering: it handles prompts up to 1,000 tokens and can generate complete infographics, PPT slides, posters, and comics with accurate bilingual text layout. Ideal for developers building design-oriented workflows where text accuracy and prompt adherence are critical.
Qwen Image 2.0 Pro (Image)
qwen/qwen-image-2.0-pro
Qwen Image 2.0 Pro is the highest-fidelity configuration of Alibaba's Qwen Image 2.0, built on the same 7B-parameter architecture but tuned to maximize visual quality over speed. Compared to the standard tier, Pro delivers richer color accuracy, finer detail rendering — visible in textures like hair strands, fabric weaves, and metallic reflections — and stronger adherence to complex, multi-element prompts. Text rendering is also crisper, making it better suited for commercial assets like branded posters and packaging. The standard Qwen Image 2.0 is optimized for fast iteration and prototyping. Pro is where you go for final production renders where every pixel matters. Best for developers building pipelines that need polished, client-ready output from a single API call.
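Image models like the two above are called through puter.ai.txt2img rather than puter.ai.chat. A minimal sketch, assuming txt2img accepts a { model } option the same way chat does — check the Puter.js docs for the exact signature, and note the prompt is a made-up example:

```javascript
// txt2img resolves to an <img> element that can be attached to the page.
const options = { model: "qwen/qwen-image-2.0" };

if (typeof puter !== "undefined") {
  puter.ai.txt2img("A poster that reads 'Hello, Qwen!'", options)
    .then(image => document.body.appendChild(image));
}
```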
Qwen3.5-Flash (Chat)
qwen/qwen3.5-flash-02-23
Qwen 3.5 Flash is the production-optimized API version of the 35B-A3B model. It features a default 1M token context window, built-in tool/function calling support, and is priced at ~$0.10/M input tokens for low-latency agentic workflows. The '02-23' suffix indicates the February 23, 2026 snapshot/version date.
Qwen3.5-122B-A10B (Chat)
qwen/qwen3.5-122b-a10b
Qwen 3.5 122B (10B Active) is the largest of Alibaba's medium-sized MoE models, activating only 10B of its 122B total parameters per inference pass. It excels at agentic tasks like tool use and multi-step reasoning, leading the Qwen 3.5 lineup on benchmarks such as BFCL-V4 and BrowseComp. It supports 262K native context (extendable to 1M), native multimodal input, and 201 languages under Apache 2.0.
Qwen3.5-27B (Chat)
qwen/qwen3.5-27b
Qwen 3.5 27B is the only dense (non-MoE) model in the Qwen 3.5 medium series, activating all 27B parameters on every forward pass for maximum per-token reasoning density. It ties GPT-5 mini on SWE-bench Verified at 72.4 and is competitive with Claude Sonnet 4.5 on visual reasoning benchmarks. It runs well on consumer hardware and is open-weight under Apache 2.0.
Qwen3.5-35B-A3B (Chat)
qwen/qwen3.5-35b-a3b
Qwen 3.5 35B (3B Active) is a sparse MoE model that activates just 3B of its 35B total parameters, yet outperforms the previous-generation 235B flagship across language, vision, coding, and agent tasks. It uses a hybrid Gated DeltaNet + MoE architecture and can run on GPUs with as little as 8GB VRAM when quantized. It's the base model behind the hosted Qwen 3.5 Flash API.
Qwen3.5 Plus 02-15 (Chat)
qwen/qwen3.5-plus-02-15
Qwen3.5-Plus is the hosted flagship model in the Qwen3.5 series, available through Alibaba Cloud Model Studio. It offers a 1 million token context window by default and includes built-in tools with adaptive tool use, including web search and code interpreter capabilities. The model supports reasoning mode (chain-of-thought), search, and a fast response mode without extended thinking. It is accessible via an OpenAI-compatible API and can be integrated with third-party coding tools like Claude Code, Cline, and OpenClaw. Qwen3.5-Plus is designed for agentic workflows that combine multimodal reasoning with tool use.
Qwen3.5 Plus (Chat)
qwen/qwen3.5-plus
Qwen3.5 Plus is Alibaba's hosted flagship model in the Qwen3.5 series, built on the Qwen3.5-397B-A17B Mixture-of-Experts architecture with 397 billion total parameters and 17 billion active per token. Its headline feature is a 1-million-token native context window — among the largest available via API — making it well suited for processing entire codebases, long documents, or extended multi-turn conversations in a single request. It supports both a deep-thinking mode and an "Auto" mode that adaptively invokes tools like web search and code interpreters. A solid general-purpose option for developers needing large-context capabilities and agentic workflows that combine multimodal reasoning with tool use.
Qwen3.5 397B A17B (Chat)
qwen/qwen3.5-397b-a17b
Qwen3.5-397B-A17B is an open-weight native vision-language model from Alibaba's Qwen team, released in February 2026. It uses a hybrid architecture combining Gated Delta Networks (linear attention) with a sparse mixture-of-experts design, totaling 397 billion parameters but activating only 17 billion per forward pass for efficient inference. The model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, competing with frontier models like GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro. It supports 201 languages and dialects and features a 250k-token vocabulary. Its decoding throughput is reported at 8.6x that of Qwen3-Max under a 32k context length.
Qwen3 Max Thinking (Chat)
qwen/qwen3-max-thinking
Qwen3 Max Thinking is Alibaba Cloud's flagship proprietary reasoning model with a 256K context window, featuring test-time scaling and adaptive tool-use capabilities (web search, code interpreter, memory) that allow it to reason iteratively and autonomously. It scores competitively against GPT-5.2 and Gemini 3 Pro on benchmarks like Humanity's Last Exam and HMMT, excelling in math, complex reasoning, and instruction following.
Qwen3 Coder Next (Chat)
qwen/qwen3-coder-next
Qwen3-Coder-Next is an open-weight coding model from Alibaba's Qwen team with 80B total parameters but only 3B active per token, designed specifically for coding agents and local development with a 256K context window. It uses a sparse Mixture-of-Experts (MoE) architecture with hybrid attention, trained on 800K executable coding tasks using reinforcement learning to excel at long-horizon reasoning, tool calling, and recovering from execution failures. It achieves performance comparable to models with 10-20x more active parameters on benchmarks like SWE-Bench while maintaining low inference costs.
Qwen3 Max (Chat)
qwen/qwen3-max
Qwen3 Max is the most powerful Qwen3 API model with SOTA agent programming and tool usage capabilities. It features non-thinking mode optimized for complex agent scenarios.
Qwen3-VL Plus (Chat)
qwen/qwen3-vl-plus
Qwen3-VL Plus is Alibaba Cloud's hosted vision-language API model in the Qwen3-VL series, offering strong multimodal understanding without requiring self-hosted infrastructure. It handles a wide range of visual tasks including document parsing, chart analysis, OCR, image reasoning, and GUI interaction for PC and mobile interfaces. With a 262K token context window, it is well suited for processing lengthy documents, multi-page PDFs, and extended visual conversations in a single request. The model supports structured output and tool calling, making it a practical choice for developers building document intelligence pipelines, visual agents, and multimodal data extraction workflows via the OpenAI-compatible Alibaba Cloud Model Studio API.
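Vision-language models such as Qwen3-VL Plus can take an image alongside the prompt. Puter.js documents an image-URL second argument to puter.ai.chat (with a testMode flag before the options object); the receipt URL below is a made-up placeholder:

```javascript
const imageUrl = "https://example.com/receipt.jpg"; // hypothetical image URL

if (typeof puter !== "undefined") {
  // Signature sketch: (prompt, imageURL, testMode, options) per the Puter.js docs.
  puter.ai.chat(
    "Extract every line item and the total from this receipt.",
    imageUrl,
    false,
    { model: "qwen/qwen3-vl-plus" }
  ).then(response => console.log(response));
}
```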
Qwen3-Omni Flash (Chat)
qwen/qwen3-omni-flash
Qwen3-Omni Flash is a fast, cost-efficient omni-modal model from Alibaba's Qwen3 series, designed for real-time multimodal applications. As a member of the Qwen3-Omni family, it ingests text, images, audio, and video in a single end-to-end architecture — no separate pipelines or modality-switching required. It produces text responses and supports low-latency streaming, making it well suited for voice assistants, live audio/video analysis, and cost-sensitive production workloads. The Flash tier prioritizes speed and throughput over the maximum capability of the full Qwen3-Omni model, with a 65K context window and 16K output limit optimized for shorter media clips and high-volume inference. Developers building real-time assistants, transcription tools, or multimodal agents who need broad input coverage at a lower cost point will find it a practical choice.
Qwen3 Next 80B A3B Instruct (Chat)
qwen/qwen3-next-80b-a3b-instruct
Qwen3 Next 80B A3B Instruct is an innovative MoE model with hybrid attention (Gated DeltaNet + Gated Attention), achieving 10x inference throughput for 32K+ contexts while matching Qwen3-235B performance.
Qwen3 Next 80B A3B Thinking (Chat)
qwen/qwen3-next-80b-a3b-thinking
Qwen3 Next 80B A3B Thinking is the reasoning-enhanced variant outperforming Gemini-2.5-Flash-Thinking on complex reasoning tasks with hybrid attention and multi-token prediction.
Qwen Image (Image)
qwen/qwen-image
Qwen Image is a 20B-parameter image generation foundation model from Alibaba's Qwen series, built for text-to-image generation, image editing, and image understanding tasks. Its standout capability is high-fidelity text rendering — it accurately places readable text in both English and Chinese within generated images, making it especially strong for posters, slides, and design-heavy visuals. Beyond text, it supports a wide range of styles from photorealism to anime, and handles advanced editing operations like style transfer, object insertion/removal, and in-image text modification. The model also performs image understanding tasks including object detection, segmentation, depth estimation, and super-resolution. A versatile choice for developers who need generation, editing, and visual analysis in a single model. Licensed under Apache 2.0.
Qwen Plus 0728 (Chat)
qwen/qwen-plus-2025-07-28
Qwen Plus (2025-07-28) is a snapshot version of Qwen Plus from July 2025, offering consistent behavior and performance for production deployments requiring version stability.
Qwen Plus 0728 (thinking) (Chat)
qwen/qwen-plus-2025-07-28:thinking
Qwen Plus (2025-07-28) Thinking is the reasoning-enhanced version that uses chain-of-thought processing for complex problems, providing step-by-step reasoning before delivering answers.
Qwen3 235B A22B Instruct 2507 (Chat)
qwen/qwen3-235b-a22b-2507
Qwen3 235B A22B (2507) is the July 2025 updated version with significant improvements in instruction following, reasoning, coding, tool usage, and 256K long-context understanding.
Qwen3 235B A22B Thinking 2507 (Chat)
qwen/qwen3-235b-a22b-thinking-2507
Qwen3 235B A22B Thinking (2507) is the reasoning-enhanced variant using extended chain-of-thought processing for complex math, coding, and logical problems with enhanced performance.
Qwen3 30B A3B Instruct 2507 (Chat)
qwen/qwen3-30b-a3b-instruct-2507
Qwen3 30B A3B Instruct (2507) is the July 2025 updated instruction-tuned version with improved capabilities in reasoning, coding, and tool usage at high efficiency.
Qwen3 30B A3B Thinking 2507 (Chat)
qwen/qwen3-30b-a3b-thinking-2507
Qwen3 30B A3B Thinking (2507) is the reasoning-enhanced variant optimized for complex problem-solving with extended chain-of-thought processing at high parameter efficiency.
Qwen3 Coder Flash (Chat)
qwen/qwen3-coder-flash
Qwen3 Coder Flash is a cost-effective coding model balancing performance and speed, suitable for scenarios requiring fast responses at lower cost while maintaining coding quality.
Qwen Flash (Chat)
qwen/qwen-flash
Qwen Flash is Alibaba's latency-optimized general-purpose language model, designed as the successor to Qwen Turbo for cost-efficient, high-throughput workloads. It offers a 1 million token context window with native support for context caching, making repeated or large-context requests significantly cheaper. The model supports function calling and is accessible via an OpenAI-compatible API through Alibaba Cloud Model Studio. Qwen Flash is a strong choice for developers running high-volume production tasks — classification, extraction, summarization, and lightweight agentic pipelines — where low latency and predictable pricing matter more than peak reasoning capability. Its flexible tiered pricing and context cache support make it especially cost-effective at scale.
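For latency-sensitive workloads like these, Puter.js supports streaming: passing stream: true makes puter.ai.chat resolve to an async iterable whose parts carry a .text field. collectStream below is our own small helper for consuming that iterable, not part of Puter.js:

```javascript
// Concatenate streamed parts, invoking onText for each chunk as it arrives.
async function collectStream(stream, onText = () => {}) {
  let full = "";
  for await (const part of stream) {
    if (part?.text) {
      full += part.text;
      onText(part.text);
    }
  }
  return full;
}

// Browser usage (requires the Puter.js script tag):
if (typeof puter !== "undefined") {
  puter.ai.chat("Summarize the day's support tickets.", {
    model: "qwen/qwen-flash",
    stream: true,
  }).then(stream => collectStream(stream, chunk => console.log(chunk)));
}
```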
Qwen3 Coder Plus (Chat)
qwen/qwen3-coder-plus
Qwen3 Coder Plus is the strongest Qwen coding API model, ideal for complex project generation and in-depth code reviews with up to 1M token context support.
Qwen3 VL 235B A22B Thinking (Chat)
qwen/qwen3-vl-235b-a22b-thinking
Qwen3 VL 235B A22B Thinking is the reasoning-enhanced vision-language model excelling at visual math, detail analysis, and causal reasoning with extended chain-of-thought processing.
Qwen3 VL 30B A3B Instruct (Chat)
qwen/qwen3-vl-30b-a3b-instruct
Qwen3 VL 30B A3B Instruct is an efficient vision-language MoE model offering strong image/video understanding with 3B active parameters and 256K context support.
Qwen3 VL 30B A3B Thinking (Chat)
qwen/qwen3-vl-30b-a3b-thinking
Qwen3 VL 30B A3B Thinking is the reasoning-enhanced vision-language variant optimized for complex visual reasoning tasks with extended thinking capabilities.
Qwen3 VL 32B Instruct (Chat)
qwen/qwen3-vl-32b-instruct
Qwen3 VL 32B Instruct is a dense vision-language model with strong text and visual capabilities, featuring visual coding, spatial understanding, and 256K context support.
Qwen3 VL 8B Instruct (Chat)
qwen/qwen3-vl-8b-instruct
Qwen3 VL 8B Instruct is a compact vision-language model matching flagship text performance while supporting image/video understanding, visual coding, and 256K context length.
Qwen3 VL 8B Thinking (Chat)
qwen/qwen3-vl-8b-thinking
Qwen3 VL 8B Thinking is the reasoning-enhanced compact vision model for complex visual analysis requiring step-by-step reasoning with efficient resource usage.
Qwen3 30B A3B (Chat)
qwen/qwen3-30b-a3b
Qwen3 30B A3B is an efficient MoE model with 30B total and 3B active parameters, outperforming QwQ-32B while using 10x fewer active parameters. It offers hybrid thinking modes and 119 language support.
Qwen3 4B (Chat)
qwen/qwen3-4b:free
Qwen3 4B is a compact model rivaling Qwen2.5-72B-Instruct performance, featuring hybrid thinking modes and 119 language support.
Qwen3 14B (Chat)
qwen/qwen3-14b
Qwen3 14B is a dense language model with hybrid thinking/non-thinking modes, matching Qwen2.5-32B performance. It supports 119 languages and excels in math, coding, and reasoning tasks.
Qwen3 235B A22B (Chat)
qwen/qwen3-235b-a22b
Qwen3 235B A22B is the flagship MoE model with 235B total and 22B active parameters, rivaling DeepSeek-R1 and o1. It features hybrid thinking modes and supports 119 languages with strong agentic capabilities.
Qwen3 32B (Chat)
qwen/qwen3-32b
Qwen3 32B is a dense language model matching Qwen2.5-72B performance with hybrid thinking/non-thinking modes. It excels in STEM, coding, and reasoning while supporting 119 languages.
Qwen3 8B (Chat)
qwen/qwen3-8b
Qwen3 8B is a dense model matching Qwen2.5-14B performance with hybrid thinking modes and 128K context. It offers strong reasoning, coding, and multilingual capabilities in a mid-sized package.
Qwen3 Coder 480B A35B (Chat)
qwen/qwen3-coder-480b-a35b-instruct
Qwen3 Coder is the most agentic code model in the Qwen series, available in 30B and 480B MoE variants. It achieves SOTA on SWE-Bench with 256K native context, extendable to 1M tokens.
Qwen3 Coder 30B A3B Instruct (Chat)
qwen/qwen3-coder-30b-a3b-instruct
Qwen3 Coder 30B A3B Instruct is an efficient MoE coding model with 30B total and 3.3B active parameters, offering strong agentic coding capabilities with 256K context support.
Qwen3 VL 235B A22B Instruct (Chat)
qwen/qwen3-vl-235b-a22b
Qwen3 VL 235B A22B Instruct is the flagship vision-language MoE model with 256K context, offering superior visual coding, spatial understanding, and long video comprehension up to 20 minutes.
Qwen3-VL 30B-A3B (Chat)
qwen/qwen3-vl-30b-a3b
Qwen3-VL 30B-A3B is a compact mixture-of-experts vision-language model from Alibaba's Qwen team, with 30B total parameters and only 3B active per token for efficient inference. It supports image and text inputs with a 131K context window and delivers strong multimodal performance on benchmarks including MMMU and visual-math evaluations. Capabilities include document and chart understanding, OCR, visual coding (generating HTML/CSS/JS from images), 2D spatial grounding, and GUI agent tasks across desktop and mobile interfaces. The MoE architecture gives it the knowledge breadth of a much larger model while matching the latency and cost profile of a 3B dense model — making it a practical choice for developers who need reliable vision-language capabilities without the compute cost of the 235B flagship variant. Supports tool calling.
QVQ Max (Chat)
qwen/qvq-max
QVQ Max is Alibaba's flagship visual reasoning model, built by the Qwen team to combine deep multimodal understanding with rigorous logical inference. Unlike standard vision-language models, QVQ Max is designed to think through what it sees — analyzing charts, diagrams, math problems, and everyday images step by step before responding. It scores 70.3% on MMMU and 71.4% on MathVista (mini), placing it among the top multimodal reasoning models available via API. The model handles text and image inputs across a 131K token context window and supports tool calling for agentic workflows. Ideal for developers building tutoring tools, visual data analysis pipelines, document understanding systems, or any application that requires both image comprehension and structured reasoning.
Qwen2.5 VL 32B Instruct (Chat)
qwen/qwen2.5-vl-32b-instruct
Qwen 2.5 VL 32B Instruct is a mid-sized vision-language model offering enhanced image/video understanding with better alignment to human preferences. It bridges the gap between 7B and 72B variants.
QwQ 32B (Chat)
qwen/qwq-32b
QwQ 32B is a 32B parameter reasoning model rivaling DeepSeek-R1 (671B) through scaled reinforcement learning. It excels in math, coding, and complex reasoning with 131K context and agent capabilities.
QwQ Plus (Chat)
qwen/qwq-plus
QwQ Plus is a proprietary reasoning model from Alibaba's Qwen team, serving as the hosted API counterpart to the open-weight QwQ-32B release. Like QwQ-32B, it uses reinforcement learning to develop extended chain-of-thought reasoning, excelling at math competition problems, scientific reasoning, and complex coding tasks. QwQ-32B achieved 79.5% on AIME 2024, 90.6% on MATH-500, and 63.4% on LiveCodeBench — rivaling much larger models. QwQ Plus exposes these capabilities through a managed API endpoint with a 131K token context window and tool call support. Best suited for developers building applications that require step-by-step mathematical reasoning, algorithmic problem-solving, or multi-step logical inference.
Qwen2.5 VL 72B Instruct (Chat)
qwen/qwen2.5-vl-72b-instruct
Qwen 2.5 VL 72B Instruct is the flagship open-source vision-language model excelling in document understanding, visual reasoning, and long video comprehension up to 1 hour with event pinpointing.
Qwen-Omni Turbo (Chat)
qwen/qwen-omni-turbo
Qwen-Omni Turbo is Alibaba's cost-optimized omnimodal API model, built to process text, image, audio, and video inputs and return text responses in a single unified interface. It is the lighter, faster tier in the Qwen-Omni family, designed for developers who need full multimodal coverage at lower latency and cost than the flagship Qwen-Omni model. Audio files up to 40 seconds and video files up to 150 MB are supported, spanning common formats such as MP3, WAV, MP4, and MOV. The model handles tool calling natively and is accessible via an OpenAI-compatible API. Best suited for developers building applications that need to reason across mixed media inputs — such as audio transcription pipelines, video understanding workflows, or multimodal chatbots — where throughput and cost efficiency matter.
Qwen-MT Plus (Chat)
qwen/qwen-mt-plus
Qwen-MT Plus is a specialized machine translation model from Alibaba's Qwen team, purpose-built for high-quality text translation across 92 languages covering over 95% of the world's population. Unlike general-purpose language models, Qwen-MT Plus is fine-tuned specifically for translation tasks, offering term intervention, domain prompting, and translation memory features that give developers fine-grained control over output. It supports translation between major languages including Chinese, English, Japanese, Korean, French, Spanish, German, Arabic, Thai, Indonesian, and Vietnamese. Best suited for developers building multilingual applications, content localization pipelines, or customer-facing translation features where accuracy, terminology consistency, and domain fidelity matter more than general conversational ability.
Qwen-MT Turbo (Chat)
qwen/qwen-mt-turbo
Qwen-MT Turbo is a fast, cost-effective machine translation model from Alibaba's Qwen team, designed for high-volume text translation across 92 languages. As the Turbo tier of the Qwen-MT family, it trades some of the output fidelity of Qwen-MT Plus for significantly lower cost and faster throughput — making it the practical choice for latency-sensitive or budget-constrained translation workflows. Like its sibling, it supports term intervention, domain prompting, and translation memory, giving developers control over terminology and style. Best suited for developers building high-volume localization pipelines, real-time translation features, or cost-sensitive multilingual applications where speed and price efficiency matter more than maximum output quality.
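The MT models are reached through the same chat call, with the target language given in the prompt. A minimal sketch — whether Qwen-MT's term-intervention and translation-memory features are exposed through extra Puter.js options is not something this page specifies, so only plain prompting is shown:

```javascript
// Hypothetical translation request; the model ID comes from the entry above.
const request = {
  text: "Machine translation keeps getting better.",
  target: "Japanese",
};

if (typeof puter !== "undefined") {
  puter.ai.chat(`Translate into ${request.target}: ${request.text}`, {
    model: "qwen/qwen-mt-turbo",
  }).then(response => console.log(response));
}
```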
Qwen2.5-Omni 7B (Chat)
qwen/qwen2-5-omni-7b
Qwen2.5-Omni 7B is Alibaba's end-to-end omni-modal model capable of perceiving text, images, audio, and video simultaneously while generating text and natural speech in real time. Built on a Thinker-Talker architecture with TMRoPE (Time-aligned Multimodal RoPE) for synchronizing audio and video streams, the 7B model achieves strong benchmark results across all modalities. It ranked first on the MMAU audio understanding leaderboard, scored 59.2 on MMMU image reasoning (near GPT-4o-mini's 60.0), and achieved 64.3 on Video-MME for video understanding without subtitles. On OmniBench, which tests cross-modal integration, it reached 56.13%. The model supports tool/function calling and targets developers building voice assistants, video analysis tools, and multimodal pipelines that require a single model to handle diverse input types.
Qwen2.5 Coder 32B Instruct (Chat)
qwen/qwen-2.5-coder-32b-instruct
Qwen 2.5 Coder 32B Instruct is a code-specialized model matching GPT-4o's coding capabilities, supporting 40+ programming languages. It excels in code generation, repair, and reasoning with 128K context support.
Qwen-Turbo (Chat)
qwen/qwen-turbo
Qwen Turbo is a fast, cost-effective API model with up to 1M context length, ideal for simple tasks requiring quick responses. It supports multiple languages and offers flexible tiered pricing.
Qwen-VL OCR (Chat)
qwen/qwen-vl-ocr
Qwen-VL OCR is Alibaba's specialized vision-language model purpose-built for text extraction and document parsing, derived from the Qwen-VL series. Unlike general-purpose VL models, it's optimized for OCR across scanned documents, tables, receipts, exam papers, forms, and handwritten content. It supports multilingual recognition including English, Chinese, French, German, Japanese, Korean, Russian, Italian, and Arabic. Capabilities include skewed image recognition, text localization with bounding box coordinates, table-to-HTML parsing, document-to-LaTeX conversion, and formula transcription. Built-in task modes return structured output as plain text, JSON, HTML, or LaTeX depending on the workflow. It's the right Qwen API choice for developers building document digitization, receipt parsing, or information extraction pipelines that need OCR-focused accuracy rather than general visual reasoning.
Qwen2.5 Coder 7B Instruct (Chat)
qwen/qwen2.5-coder-7b-instruct
Qwen 2.5 Coder 7B Instruct is a compact code-specialized model with strong code generation, reasoning, and repair capabilities. It supports multiple programming languages while being deployable on consumer hardware.
Qwen2.5 72B Instruct (Chat)
qwen/qwen2-5-72b-instruct
Qwen 2.5 72B Instruct is Alibaba's flagship open-source language model with 72 billion parameters, trained on 18 trillion tokens with 128K context support. It excels in coding, math, instruction following, and multilingual tasks across 29+ languages.
Qwen2.5 7B Instruct (Chat)
qwen/qwen2-5-7b-instruct
Qwen 2.5 7B Instruct is a compact yet capable language model offering strong performance in coding, math, and general tasks. It supports 128K context length and 29+ languages while being efficient enough for smaller deployments.
Qwen2.5-VL 7B Instruct (Chat)
qwen/qwen2-5-vl-7b-instruct
Qwen 2.5 VL 7B Instruct is a vision-language model capable of understanding images, documents, charts, and videos up to 1 hour. It supports OCR, visual reasoning, and can act as a visual agent for computer/phone use.
Qwen2.5 14B Instruct
qwen/qwen2-5-14b-instruct
Qwen2.5 14B Instruct is a 14.7-billion-parameter open-weight model from Alibaba's Qwen team, trained on 18 trillion tokens and released under Apache 2.0. It hits a practical sweet spot in the Qwen2.5 lineup — outperforming both the 7B variant and models like Gemma 2 27B and GPT-4o mini on seven key benchmarks, while remaining far more efficient than the flagship 72B. Core strengths include strong instruction following, structured output (JSON) generation, math, and code. It reaches ~97% tool-call success across hardware, making it reliable for agentic workflows. Multilingual support spans 29+ languages with a 128K context window and up to 8K output tokens. A strong choice for developers who need GPT-4o-mini-class quality at a fraction of the cost of larger frontier models.
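The model's structured-output strength can be exercised by asking for JSON directly in the prompt. A minimal sketch, using made-up sample text and assuming the Puter.js script is loaded:

```javascript
// Model ID from this catalog entry; the contact text is made-up sample data.
const options = { model: "qwen/qwen2-5-14b-instruct" };
const prompt =
  'Return JSON of the form {"name": string, "email": string} for: ' +
  "Reach Jane Doe at jane@example.com.";

if (typeof puter !== "undefined") {
  puter.ai.chat(prompt, options).then(response => {
    // The response text is expected to contain a JSON object.
    console.log(response);
  });
}
```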
Qwen2.5 32B Instruct
qwen/qwen2-5-32b-instruct
Qwen2.5 32B Instruct is a general-purpose language model from Alibaba's Qwen team, sitting at the practical sweet spot between the 14B and 72B variants in the Qwen2.5 series — delivering stronger reasoning and language understanding than the 14B while remaining far more cost-efficient than the 72B. Trained on 18 trillion tokens, the model scores 57.7 on MATH and outperforms Qwen2-72B on comprehensive evaluations despite having fewer parameters. It excels at instruction following, multi-step reasoning, mathematics, coding assistance, and multilingual tasks across 29+ languages, with a 131K token context window and full tool-call support. A well-rounded choice for developers who need reliable general-purpose performance — complex enough for demanding workflows, light enough to keep inference costs manageable.
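For longer multi-step answers like the reasoning tasks described above, responses can be streamed as they are generated. This sketch assumes Puter.js's `{ stream: true }` option, which returns an async-iterable response:

```javascript
// Stream the answer chunk by chunk instead of waiting for the full reply.
const options = { model: "qwen/qwen2-5-32b-instruct", stream: true };

async function streamAnswer() {
  const response = await puter.ai.chat(
    "Walk through solving 3x + 7 = 22 step by step.",
    options
  );
  // Each part carries a fragment of the generated text.
  for await (const part of response) {
    if (part?.text) console.log(part.text);
  }
}

if (typeof puter !== "undefined") streamAnswer();
```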
Qwen2.5-VL 72B Instruct
qwen/qwen2-5-vl-72b-instruct
Qwen2.5-VL 72B Instruct is Alibaba's flagship open-source vision-language model, matching state-of-the-art closed models like GPT-4o and Claude 3.5 Sonnet on multimodal tasks. The model excels at document understanding (96.4 on DocVQA), OCR (88.8 on OCRBench), and structured data extraction from invoices, forms, tables, and charts. On MMMU it scores 70.2, and across 21 benchmarks it outperforms Gemini 2.0 Flash, GPT-4o, and Claude 3.5 Sonnet on 13 of them. Video understanding extends to over one hour of footage with second-level event pinpointing, enabled by dynamic FPS sampling and absolute time encoding. The model also functions as a visual agent capable of computer and phone use. A strong choice for developers building document pipelines, OCR workflows, visual Q&A systems, or multimodal agents.
Qwen VL Max
qwen/qwen-vl-max
Qwen VL Max is Alibaba's most capable vision-language API model based on Qwen2.5-VL, offering superior image/video understanding, OCR, document analysis, and visual reasoning capabilities.
Qwen-Max
qwen/qwen-max
Qwen Max is Alibaba's most powerful proprietary API model, a large-scale MoE with hundreds of billions of parameters. It delivers top-tier performance in reasoning, coding, math, and multilingual tasks via Alibaba Cloud Model Studio.
Qwen-Plus
qwen/qwen-plus
Qwen Plus is a high-performance proprietary API model balancing capability and cost, suitable for complex tasks requiring strong reasoning and multilingual support. Available through Alibaba Cloud Model Studio.
Qwen VL Plus
qwen/qwen-vl-plus
Qwen VL Plus is a balanced vision-language API model offering good performance at lower cost, suitable for image understanding, OCR, and multimodal tasks without requiring maximum capability.
Frequently Asked Questions
What is the Qwen API?
The Qwen API gives you access to models for AI chat and image generation. Through Puter.js, you can start using Qwen models instantly with zero setup or configuration.
Which Qwen models does Puter.js support?
Puter.js supports a variety of Qwen models, including Qwen3.6 Flash, Qwen3.5 Plus 2026-04-20, Qwen3.6 27B, and more. Find all AI models supported by Puter.js in the AI model list.
Who pays for the AI usage?
With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
What is Puter.js?
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.
Can I use the Qwen API with my framework?
Yes — the Qwen API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.