Puter AI Models

https://developer.puter.com/ai/models/ New and updated AI models available on Puter Thu, 28 May 2026 00:00:00 GMT Anthropic: Claude Opus 4.8 https://developer.puter.com/ai/anthropic/claude-opus-4-8/ Thu, 28 May 2026 00:00:00 GMT Claude Opus 4.8 is Anthropic's latest flagship AI model, building on Opus 4.7 with improvements across coding, reasoning, and knowledge work at the same price point. The model excels at complex, multi-step agentic workflows including coding, financial analysis, legal reasoning, and browser-based computer use. Early testers report it outperforms both prior Opus models and GPT-5.5 across several agentic benchmarks. Notably, it's around four times less likely than Opus 4.7 to let flaws in generated code pass without flagging them. Opus 4.8 supports configurable effort levels, letting developers trade quality against speed and token cost. A fast mode generates output up to 2.5x faster. Priced at $5/$25 per million input/output tokens, it targets teams running autonomous agents, analysis pipelines, and long-running async workflows where reliability and judgment are critical. Anthropic Anthropic: Claude Opus 4.8 Fast https://developer.puter.com/ai/anthropic/claude-opus-4.8-fast/ Wed, 27 May 2026 00:00:00 GMT Claude Opus 4.8 Fast is a high-speed configuration of Anthropic's latest flagship model, delivering up to 2.5x faster output token generation at significantly lower cost than previous fast variants. It runs the same Opus 4.8 model, which improves on Opus 4.7 in agentic coding (69.2% vs 64.3%), multidisciplinary reasoning with tools (57.9% vs 54.7%), and knowledge work, but is optimized for lower latency. Fast mode pricing is $10/$50 per million input/output tokens, one third the price of fast mode for Opus 4.6 and 4.7 ($30/$150). It supports the full 1M token context window and 128k max output tokens.
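To make the price comparison concrete, the sketch below works out the cost of a single request at the three per-million-token prices quoted in the entries above. The workload size (200,000 input tokens and 20,000 output tokens) and the helper name `request_cost` are assumptions chosen only for illustration; actual billing may include caching or other adjustments not covered here.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Return the USD cost of one request given per-million-token prices."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


# Hypothetical workload, not taken from the feed.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

# Prices as quoted in the entries above (USD per million input/output tokens).
standard = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, 5.0, 25.0)       # Opus 4.8
fast = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, 10.0, 50.0)          # Opus 4.8 Fast
older_fast = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, 30.0, 150.0)   # Opus 4.6/4.7 fast mode

print(f"Opus 4.8:            ${standard:.2f}")
print(f"Opus 4.8 Fast:       ${fast:.2f}")
print(f"Opus 4.6/4.7 fast:   ${older_fast:.2f}")
print(f"Fast vs standard:    {fast / standard:.1f}x")
print(f"Older fast vs fast:  {older_fast / fast:.1f}x")
```

Because both the input and output prices scale by the same factor between tiers, the ratios hold for any mix of input and output tokens.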
Choose Opus 4.8 Fast for latency-sensitive agentic pipelines, live coding sessions, and real-time workflows where throughput matters. For cost-sensitive or batch workloads, standard Opus 4.8 offers the same intelligence at half the price. Anthropic Qwen: Qwen3.7 Max https://developer.puter.com/ai/qwen/qwen3.7-max/ https://developer.puter.com/ai/qwen/qwen3.7-max/ Thu, 21 May 2026 00:00:00 GMT Qwen3.7 Max is Alibaba's flagship proprietary reasoning model, released in May 2026, built for long-horizon agentic workloads with a 1 million-token context window and a chain-of-thought reasoning architecture. It is purpose-built for complex, multi-step autonomous tasks. Alibaba demonstrated the model running for 35 hours without degradation, executing over 1,000 tool calls in a single session — making it a strong candidate for coding agents, automated pipelines, and deep document analysis. On benchmarks, it ranks 13th globally on LM Arena's text leaderboard and scores 56.6 on the Artificial Analysis Intelligence Index, making it the highest-ranked Chinese model on that index. It posted 90.2 on Arena-Hard v2 and 72.5 on SWE-Bench Verified. Qwen3.7 Max supports the Anthropic API protocol natively, so it integrates cleanly with tooling like Claude Code. It is well-suited for developers building coding assistants, research agents, or any API use case requiring extended reasoning over large contexts. Qwen xAI: Grok Build 0.1 https://developer.puter.com/ai/x-ai/grok-build-0.1/ https://developer.puter.com/ai/x-ai/grok-build-0.1/ Wed, 20 May 2026 00:00:00 GMT Grok Build 0.1 is xAI's fast coding model trained specifically for agentic software engineering workflows. Released in May 2026 and currently in early access, it is purpose-built for interactive coding agents, tool use, and multi-step development tasks rather than general-purpose conversation. The model accepts text and image inputs and produces text output, with a 256,000-token context window. 
It supports function calling, structured outputs, and built-in reasoning that is always active, enabling it to think through problems before responding. Developers building AI coding agents, automated code review pipelines, or multi-step development tools will find it a strong fit. At $1/M input and $2/M output tokens, it offers an accessible price point for agentic, high-throughput use cases. xAI Google: Gemini 3.5 Flash https://developer.puter.com/ai/google/gemini-3.5-flash/ https://developer.puter.com/ai/google/gemini-3.5-flash/ Tue, 19 May 2026 00:00:00 GMT Gemini 3.5 Flash is Google DeepMind's frontier-speed model that combines Flash-tier latency and cost with near-Pro-level reasoning, announced at Google I/O 2026. It processes output 4x faster than comparable frontier models while outperforming Gemini 3.1 Pro on coding and agentic benchmarks — 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning. It's purpose-built for agentic workflows: orchestrating multi-step tool use, long-context document analysis, and iterative code generation. With a 1M token context window and full multimodal input support (text, image, audio, video, PDF), it handles complex real-world tasks at scale. At $1.50 per million input tokens and $9.00 per million output tokens, it's the best choice for developers who need frontier intelligence without frontier latency or cost. Google Perceptron: Perceptron Mk1 https://developer.puter.com/ai/perceptron/perceptron-mk1/ https://developer.puter.com/ai/perceptron/perceptron-mk1/ Tue, 12 May 2026 00:00:00 GMT Perceptron Mk1 is a vision-language model from Perceptron AI designed for video and embodied reasoning, capable of processing native video at up to 2 frames per second within its 32K context window. It excels at video QA, video summarization, event detection, open-vocabulary object detection and counting, OCR on real-world documents, hand pose estimation, and point-by-example grounding from multimodal prompts. 
On spatial reasoning benchmarks, Mk1 scores 85.1 on EmbSpatialBench and 72.4 on RefSpatialBench, outperforming frontier models. On VSI-Bench it reaches 88.5, the highest recorded score among compared models. At $0.15 per million input tokens, Mk1 is priced 80–90% below comparable frontier vision-language models, making it a strong choice for developers building video analysis, robotics data curation, or multimodal pipelines at scale. Perceptron Anthropic: Claude Opus 4.7 Fast https://developer.puter.com/ai/anthropic/claude-opus-4.7-fast/ https://developer.puter.com/ai/anthropic/claude-opus-4.7-fast/ Tue, 12 May 2026 00:00:00 GMT Claude Opus 4.7 Fast is a high-speed configuration of Anthropic's most capable model, delivering up to 2.5x faster output token generation with no reduction in quality or capabilities. It runs the same Opus 4.7 model — which scores 87.6% on SWE-bench Verified (up from 80.8% on Opus 4.6), 94.2% on GPQA Diamond, and 69.4% on Terminal-Bench 2.0 — but optimized for lower latency at premium pricing ($30/$150 per MTok). It supports the full 1M token context window and 128k max output tokens. Fast mode benefits are focused on output tokens per second, not time to first token. It is ideal for latency-sensitive agentic workflows, live coding sessions, and real-time tasks where response speed matters. For cost-sensitive or batch workloads, standard Opus 4.7 offers the same intelligence at lower cost. Anthropic InclusionAI: Ring 2.6 1T https://developer.puter.com/ai/inclusionai/ring-2.6-1t/ https://developer.puter.com/ai/inclusionai/ring-2.6-1t/ Fri, 08 May 2026 00:00:00 GMT Ring 2.6 1T is a trillion-parameter open-weights reasoning model from InclusionAI (Ant Group), released under the MIT license. It uses a Mixture-of-Experts architecture with approximately 63B active parameters per token and supports a 262K context window with up to 66K output tokens. 
The model offers adaptive reasoning effort through "high" and "xhigh" modes, letting developers tune thinking depth against token cost based on task complexity. It is purpose-built for agentic workflows, coding agents, tool use, and long-horizon multi-step task execution. Ring 2.6 1T scores 95.83 on AIME 2026, 88.27 on GPQA Diamond, and 87.60 on PinchBench in agent mode — surpassing GPT-5.4 and Gemini 3.1 Pro on that benchmark. A strong pick for developers building autonomous agent systems or complex reasoning pipelines. InclusionAI Baidu: Qianfan CoBuddy https://developer.puter.com/ai/baidu/cobuddy/ https://developer.puter.com/ai/baidu/cobuddy/ Wed, 06 May 2026 00:00:00 GMT CoBuddy is a code generation model from Baidu, released through the Qianfan platform and optimized for coding tasks and AI agent workflows. The model offers native support for both tool calling and reasoning, making it a strong fit for agentic use cases where the model needs to plan, invoke tools, and iterate. It provides a 131K token context window with up to 65K output tokens, giving it ample room for large codebases and extended generation. CoBuddy is engineered for high inference throughput and low end-to-end latency. It's a solid choice for developers building code-centric agents or assistive coding tools who need responsive performance alongside structured tool use. Baidu xAI: Grok 4.3 https://developer.puter.com/ai/x-ai/grok-4.3/ https://developer.puter.com/ai/x-ai/grok-4.3/ Fri, 01 May 2026 00:00:00 GMT Grok 4.3 is xAI's latest flagship reasoning model, designed for agentic workflows, instruction following, and tasks demanding high factual accuracy. It accepts text and image inputs with always-on reasoning that cannot be disabled. The model supports a 1 million token context window with no output token limit, making it well suited for long-document analysis and multi-step agent tasks. 
Priced at $1.25 per million input tokens and $2.50 per million output tokens, it delivers improved cost-efficiency over its predecessor Grok 4.20 — scoring higher on the Artificial Analysis Intelligence Index while costing roughly 20% less to run. Grok 4.3 showed a major jump in real-world agentic task performance, gaining over 300 Elo points on GDPval-AA versus Grok 4.20. It also scores 98% on τ²-Bench Telecom and 81% on IFBench. A strong pick for developers building cost-sensitive agent systems that need reliable tool use and instruction adherence. xAI IBM Granite: Granite 4.1 8B https://developer.puter.com/ai/ibm-granite/granite-4.1-8b/ https://developer.puter.com/ai/ibm-granite/granite-4.1-8b/ Thu, 30 Apr 2026 00:00:00 GMT IBM Granite 4.1 8B is a dense, decoder-only language model from IBM, built for enterprise workloads like tool calling, RAG, code generation, summarization, and classification. It supports a 131K-token context window and 12 languages including English, German, Spanish, French, Japanese, and Chinese. Despite its compact size, the 8B model matches or outperforms IBM's previous-generation 32B Mixture-of-Experts model across benchmarks — scoring 69.0 on ArenaHard, 68.3 on BFCL V3 (tool calling), and 92.5 on GSM8K. It implements OpenAI-compatible tool calling and supports fill-in-the-middle for code completion. Its dense architecture makes it straightforward to fine-tune for downstream tasks. Released under the Apache 2.0 license, it's a strong pick for developers who need reliable enterprise capabilities at an efficient parameter count. IBM Granite Mistral AI: Mistral Medium 3.5 https://developer.puter.com/ai/mistralai/mistral-medium-3-5/ https://developer.puter.com/ai/mistralai/mistral-medium-3-5/ Wed, 29 Apr 2026 00:00:00 GMT Mistral Medium 3.5 is a dense 128-billion-parameter multimodal model from Mistral AI that unifies instruction-following, reasoning, and coding into a single set of weights. 
It features a 256k-token context window, native function calling, structured JSON output, and vision capabilities via a custom-trained encoder that handles variable image sizes. A per-request reasoning_effort parameter lets you toggle between fast responses and deeper chain-of-thought processing, making the same model suitable for quick chat replies and complex agentic workflows. On benchmarks, it scores 77.6% on SWE-Bench Verified and 91.4% on τ³-Telecom. It replaces Mistral's previous Medium 3.1, Magistral, and Devstral 2 models. Priced at $1.50 per million input tokens and $7.50 per million output tokens, it's a strong fit for developers building tool-calling agents, long-horizon coding tasks, and multi-step automation pipelines. Mistral AI Poolside: Laguna M.1 https://developer.puter.com/ai/poolside/laguna-m.1/ https://developer.puter.com/ai/poolside/laguna-m.1/ Tue, 28 Apr 2026 00:00:00 GMT Laguna M.1 is Poolside's flagship agentic coding model, built for complex, long-horizon software engineering tasks. It's a 225B-parameter Mixture-of-Experts model with 23B activated parameters, offering a 128K context window and support for tool calling and reasoning. On SWE-bench Verified it scores 72.5%, and it reaches 46.9% on the harder SWE-bench Pro. These results place it in the same tier as far larger models like Qwen3.5 and DeepSeek V4-Flash while using a fraction of the active compute. Laguna M.1 is purpose-built for agentic workflows — writing code, running tests, inspecting failures, and iterating across files. If you need a model that can plan and execute multi-step engineering tasks end to end, this is Poolside's strongest option. 
Poolside Poolside: Laguna XS.2 https://developer.puter.com/ai/poolside/laguna-xs.2/ https://developer.puter.com/ai/poolside/laguna-xs.2/ Tue, 28 Apr 2026 00:00:00 GMT Laguna XS.2 is Poolside's compact, open-weight agentic coding model — a 33B-parameter Mixture-of-Experts architecture with only 3B activated parameters, released under the Apache 2.0 license. Despite its small footprint, it performs remarkably close to its larger sibling: 68.2% on SWE-bench Verified and 44.5% on SWE-bench Pro, nearly matching models many times its size. It supports the same agentic coding workflows as Laguna M.1, including tool use and multi-step reasoning. XS.2 is a second-generation model that incorporates lessons from M.1's training pipeline. It's a strong fit for developers who want a capable coding agent with low inference cost, or who need the flexibility of open weights for custom deployments. Poolside NVIDIA: Nemotron 3 Nano Omni https://developer.puter.com/ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning/ https://developer.puter.com/ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning/ Tue, 28 Apr 2026 00:00:00 GMT Nemotron 3 Nano Omni is an open multimodal model from NVIDIA that unifies text, image, video, and audio understanding in a single inference pass. Built on a 30B-parameter hybrid Mamba-Transformer Mixture-of-Experts architecture with only ~3B active parameters per token, it delivers strong reasoning at small-model inference costs. It tops six leaderboards across document intelligence (MMLongBench-Doc, OCRBenchV2), video and audio understanding (WorldSense, DailyOmni, VoiceBench), and achieves the highest throughput of any benchmarked model — open or closed — on MediaPerf's video tasks, with up to 9x higher throughput than comparable open omni models. 
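The Mixture-of-Experts entries above quote both total and activated parameter counts. As a rough illustration, the sketch below computes the share of parameters active per token from those figures. The helper name `active_fraction` is hypothetical, the Nemotron count is the approximate "~3B" from its entry, and active share is only a loose proxy for inference cost, not a measured one.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that are active per token."""
    return active_b / total_b


# (total, active) parameter counts in billions, as quoted in the entries above.
MODELS = {
    "Laguna M.1": (225, 23),
    "Laguna XS.2": (33, 3),
    "Nemotron 3 Nano Omni": (30, 3),  # active count is approximate (~3B)
}

for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

All three models activate roughly a tenth of their weights per token, which is the basis for the "fraction of the active compute" framing in these entries.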
Designed as a multimodal perception sub-agent in agentic systems, it excels at document reasoning, GUI-based computer use, speech transcription, and audio-video analysis — replacing fragmented multi-model pipelines with a single call. Supports up to 256K context with an optional reasoning mode. NVIDIA Qwen: Qwen3.6 Flash https://developer.puter.com/ai/qwen/qwen3.6-flash/ https://developer.puter.com/ai/qwen/qwen3.6-flash/ Mon, 27 Apr 2026 00:00:00 GMT Qwen3.6 Flash is the speed-optimized tier of Alibaba's Qwen3.6 model family, designed for high-throughput, low-latency inference pipelines. It sits alongside Qwen3.6 Max Preview, Plus, and 35B-A3B in the product lineup, targeting use cases where fast response times matter more than peak benchmark scores. Like other Qwen3.6 models, it builds on a hybrid architecture combining linear attention with sparse mixture-of-experts routing. It is best suited for high-volume production workloads such as classification, extraction, summarization, and lightweight agent tasks where latency and cost efficiency are the primary constraints. Qwen Qwen: Qwen3.5 Plus 2026-04-20 https://developer.puter.com/ai/qwen/qwen3.5-plus-20260420/ https://developer.puter.com/ai/qwen/qwen3.5-plus-20260420/ Mon, 27 Apr 2026 00:00:00 GMT Qwen3.5 Plus is a proprietary hosted model from Alibaba, built on the Qwen3.5-397B-A17B Mixture-of-Experts architecture with 397 billion total parameters and 17 billion active per token. Its headline feature is a 1-million-token native context window — among the largest available via API — making it well suited for processing entire codebases, long documents, or extended multi-turn conversations in a single request. It supports both a deep-thinking mode and an "Auto" mode that adaptively invokes tools like web search and code interpreters. This April 20, 2026 snapshot reflects ongoing improvements to the model since its original February 2026 launch. 
The Qwen3.5 series demonstrated strong multimodal performance across reasoning, coding, and vision tasks. A solid general-purpose option for developers needing large-context capabilities without migrating to the newer Qwen3.6 line. Qwen DeepSeek: DeepSeek V4 Flash https://developer.puter.com/ai/deepseek/deepseek-v4-flash/ https://developer.puter.com/ai/deepseek/deepseek-v4-flash/ Fri, 24 Apr 2026 00:00:00 GMT DeepSeek V4 Flash is a lightweight, efficiency-focused Mixture-of-Experts model from DeepSeek, with 284B total parameters and 13B activated per token. It supports a 1M-token context window and configurable reasoning modes (standard, high, and max thinking effort). Designed as the fast and economical option in the V4 family, Flash delivers reasoning capabilities that closely approach the larger V4 Pro, and performs on par with it on simpler agentic tasks. In its max reasoning mode, it achieves comparable reasoning scores to Pro when given a larger thinking budget. At $0.14/M input and $0.28/M output tokens, it's one of the cheapest frontier-tier models available — well suited for high-throughput workloads like coding assistants, chat systems, and agent pipelines where latency and cost matter most. DeepSeek DeepSeek: DeepSeek V4 Pro https://developer.puter.com/ai/deepseek/deepseek-v4-pro/ https://developer.puter.com/ai/deepseek/deepseek-v4-pro/ Fri, 24 Apr 2026 00:00:00 GMT DeepSeek V4 Pro is a 1.6T-parameter Mixture-of-Experts model from DeepSeek with 49B parameters activated per token, supporting a 1M-token context window. It is positioned as the strongest open-weight model currently available. V4 Pro leads all open-source models in math, coding, and STEM reasoning. On LiveCodeBench it scores 93.5, ahead of Gemini 3.1 Pro (91.7) and Claude Opus 4.6 (88.8). Its Codeforces rating of 3206 also tops GPT-5.4 (3168). On agentic tool-use benchmarks like MCPAtlas, it reaches near-parity with Opus 4.6. 
DeepSeek acknowledges it trails GPT-5.4 and Gemini 3.1 Pro overall by roughly 3–6 months of frontier development. Priced at $1.74/M input and $3.48/M output, a fraction of comparable closed-source models, it's a strong pick for complex reasoning, agentic coding, and knowledge-intensive tasks. DeepSeek OpenAI: GPT-5.5 https://developer.puter.com/ai/openai/gpt-5.5/ Thu, 23 Apr 2026 00:00:00 GMT GPT-5.5 is OpenAI's newest frontier model for complex professional work, a fully retrained base model in the GPT-5 family. It excels at agentic coding, computer use, knowledge work, and scientific research, and is designed to plan, use tools, and carry multi-step tasks to completion autonomously. It achieves state-of-the-art scores on several benchmarks, including 82.7% on Terminal-Bench 2.0 (agentic coding workflows), 84.9% on GDPval (knowledge work across 44 occupations), and 78.7% on OSWorld-Verified (desktop computer use). It outperforms Claude Opus 4.7 and Gemini 3.1 Pro on most agentic and math benchmarks, though Opus 4.7 still leads on SWE-Bench Pro. It is available with a 1,050,000-token context window and 128,000 max output tokens. It supports image input, structured outputs, function calling, streaming, prompt caching, Batch, distillation, and a full Responses API tool suite including web search, computer use, MCP, hosted shell, and tool search. It defaults to medium reasoning effort (supports none through xhigh). OpenAI OpenAI: GPT-5.5 Pro https://developer.puter.com/ai/openai/gpt-5.5-pro/ Thu, 23 Apr 2026 00:00:00 GMT GPT-5.5 Pro is a version of GPT-5.5 that uses more compute to produce smarter, more precise responses on the hardest problems. It scores 39.6% on FrontierMath Tier 4 (expert-level mathematics) and 43.1% on Humanity's Last Exam (multidisciplinary zero-shot reasoning). It supports function calling, structured outputs, web search, file search, image generation, code interpreter, hosted shell, and MCP.
Does not support computer use, apply patch, skills, tool search, or distillation. Shares the same 1,050,000-token context window and 128,000 max output tokens as GPT-5.5. Best suited for legal review, financial modeling, scientific research, and scenarios where first-pass accuracy outweighs cost and latency. OpenAI InclusionAI: Ling 2.6 1T https://developer.puter.com/ai/inclusionai/ling-2.6-1t/ https://developer.puter.com/ai/inclusionai/ling-2.6-1t/ Thu, 23 Apr 2026 00:00:00 GMT Ling 2.6 1T is InclusionAI's trillion-parameter flagship non-reasoning model, built by Ant Group's AGI initiative. It uses a Mixture-of-Experts architecture with approximately 50 billion active parameters per token, employing a "fast thinking" approach that reduces token costs to roughly a quarter of comparable models while maintaining top-tier output quality. The model targets advanced coding, complex reasoning, and large-scale agent workflows. It achieves state-of-the-art results on benchmarks like AIME 2025 and SWE-bench Verified, and ranks first among open-source models on ArtifactsBench for front-end code generation. On the Artificial Analysis Intelligence Index, it scores 34 — far above the median of 13 for comparable open-weight non-reasoning models. With a 262K context window and strong tool-use capabilities out of the box, Ling 2.6 1T is a strong fit for developers building autonomous agents or cost-sensitive pipelines that need flagship-level reasoning without a dedicated thinking model. InclusionAI Xiaomi: MiMo-V2.5 https://developer.puter.com/ai/xiaomi/mimo-v2.5/ https://developer.puter.com/ai/xiaomi/mimo-v2.5/ Wed, 22 Apr 2026 00:00:00 GMT MiMo V2.5 is a native omnimodal model from Xiaomi that processes text, images, video, and audio within a single architecture and a 1M-token context window. It delivers agentic performance close to its larger sibling, MiMo V2.5 Pro, at roughly half the token cost — scoring 62.3 on ClawEval (general) and 23.8 on ClawEval Multimodal. 
On video understanding, it reaches 87.7 on Video-MME, competitive with Gemini 3 Pro. On image understanding, it scores 81.0 on CharXiv RQ and 77.9 on MMMU-Pro. Priced at $0.40 per million input tokens and $2.00 per million output tokens, MiMo V2.5 is a strong fit for production agent pipelines where you need multimodal perception and reasoning without flagship-tier cost. Xiaomi Xiaomi: MiMo-V2.5-Pro https://developer.puter.com/ai/xiaomi/mimo-v2.5-pro/ Wed, 22 Apr 2026 00:00:00 GMT MiMo V2.5 Pro is Xiaomi's most capable model, built for complex software engineering, long-horizon agentic tasks, and autonomous multi-step workflows spanning over a thousand tool calls. It scores 57.2 on SWE-bench Pro, 63.8 on ClawEval, and 72.9 on τ³-Bench, placing it alongside Claude Opus 4.6 and GPT-5.4 across most agentic evaluations. Notably, it achieves this while using roughly 40–60% fewer tokens per trajectory than comparable frontier models. The 1M-token context window and 131K max output support entire codebases and extended autonomous sessions. Priced at $1.00 per million input tokens and $3.00 per million output tokens, MiMo V2.5 Pro targets developers building autonomous agents, code-generation pipelines, and complex tool-use workflows where sustained coherence over long contexts is critical. Xiaomi Tencent: Hy 3 Preview https://developer.puter.com/ai/tencent/hy3-preview/ Wed, 22 Apr 2026 00:00:00 GMT Tencent Hy3 is a 295B-parameter Mixture-of-Experts reasoning model developed by Tencent's Hunyuan team, with only 21B parameters active per query. It supports a 256K-token context window and configurable reasoning levels (disabled, low, high), letting you trade off latency and depth per request. Hy3 is particularly strong on coding and agentic tasks. It scores 74.4% on SWE-bench Verified for real-world bug fixing and 67.1% on BrowseComp for complex web research.
Its MoE architecture delivers competitive performance against much larger models — matching Kimi-K2.5 (1T+ parameters) on agent benchmarks at a fraction of the compute cost. Best suited for developers building agentic workflows, code generation pipelines, and multi-step reasoning applications where cost-efficiency matters. Tencent Qwen: Qwen3.6 27B https://developer.puter.com/ai/qwen/qwen3.6-27b/ https://developer.puter.com/ai/qwen/qwen3.6-27b/ Wed, 22 Apr 2026 00:00:00 GMT Qwen3.6 27B is a dense 27-billion-parameter multimodal model from Alibaba's Qwen team, purpose-built for agentic coding and repository-level reasoning. It scores 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0, outperforming the previous-generation Qwen3.5-397B-A17B across all major coding benchmarks despite being far smaller. It natively supports text, image, and video inputs with a 262K-token context window, extendable to 1M tokens. A standout feature is Thinking Preservation, which retains reasoning traces across conversation turns — reducing redundant computation in multi-step agent loops. The model uses a hybrid attention architecture combining Gated DeltaNet with traditional self-attention. Ideal for developers building coding agents, multi-turn tool-use workflows, or frontend generation pipelines. Qwen OpenAI: GPT Image 2 https://developer.puter.com/ai/openai/gpt-image-2/ https://developer.puter.com/ai/openai/gpt-image-2/ Tue, 21 Apr 2026 00:00:00 GMT GPT Image 2 is OpenAI's state-of-the-art image generation and editing model, released in April 2026 as the successor to GPT Image 1. It accepts both text and image inputs, enabling generation from prompts as well as editing of existing images. The model is particularly strong at rendering text within images — signs, UI elements, labels, and multi-word strings — which was a persistent weakness in prior OpenAI image models. It also supports non-Latin scripts including Japanese, Korean, Chinese, Hindi, and Bengali. 
GPT Image 2 outputs up to 2K resolution, supports flexible image sizes, and can generate multiple images from a single prompt. It features a December 2025 knowledge cutoff and built-in reasoning capabilities. Ideal for developers building visual content pipelines, localized marketing assets, infographics, or UI mockups at scale. OpenAI InclusionAI: Ling 2.6 Flash https://developer.puter.com/ai/inclusionai/ling-2.6-flash/ https://developer.puter.com/ai/inclusionai/ling-2.6-flash/ Tue, 21 Apr 2026 00:00:00 GMT Ling 2.6 Flash is a high-efficiency open-weights instruct model from InclusionAI (Ant Group), featuring 104B total parameters with only 7.4B active via a Mixture-of-Experts architecture. It supports a 262K-token context window and is purpose-built for agentic workflows, coding, and document processing. The model scores 26 on the Artificial Analysis Intelligence Index — nearly double the median of 13 among comparable open-weight non-reasoning models, and a 10-point jump over its predecessor Ling-flash-2.0. It also achieves 59.3% on GPQA Diamond. Trained with Agentic Reinforcement Learning, Ling 2.6 Flash is optimized for tool use, terminal operations, and multi-step agent tasks while keeping token consumption notably low. A strong choice for developers building cost-sensitive agent pipelines or high-throughput automation that still demands capable reasoning and code generation. InclusionAI Qwen: Qwen3.6 Max Preview https://developer.puter.com/ai/qwen/qwen3.6-max-preview/ https://developer.puter.com/ai/qwen/qwen3.6-max-preview/ Mon, 20 Apr 2026 00:00:00 GMT Qwen3.6 Max Preview is Alibaba's most capable language model to date — a proprietary flagship that claimed the top score on six major coding benchmarks at its April 20, 2026 release. It leads on SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. The Artificial Analysis Intelligence Index rates it at 52, well above the median for reasoning models in its price tier. 
It supports a 256K-token context window and is text-only at launch. As a preview release, Alibaba is still actively iterating on the model. Best suited for teams building coding agents, scientific computing tools, or frontend generation systems that need peak benchmark performance. Qwen Moonshot AI: Kimi K2.6 https://developer.puter.com/ai/moonshotai/kimi-k2.6/ https://developer.puter.com/ai/moonshotai/kimi-k2.6/ Mon, 20 Apr 2026 00:00:00 GMT Kimi K2.6 is Moonshot AI's latest open-weight multimodal model, built on a 1-trillion-parameter mixture-of-experts architecture with a 256K context window. It excels at agentic coding and long-horizon execution, supporting sustained autonomous workflows with 4,000+ tool calls across languages like Rust, Go, and Python. On key benchmarks, it scores 58.6 on SWE-Bench Pro, 54.0 on HLE with Tools, and 50.0 on Toolathlon — competitive with GPT-5.4 and Claude Opus 4.6 on coding and agent tasks, though trailing them on pure reasoning. The model accepts text, image, and video input, supports both thinking and non-thinking modes, and offers an OpenAI-compatible API. It's a strong pick for developers building multi-step agentic workflows and complex software engineering pipelines. Moonshot AI Baidu: Qianfan OCR Fast https://developer.puter.com/ai/baidu/qianfan-ocr-fast/ https://developer.puter.com/ai/baidu/qianfan-ocr-fast/ Mon, 20 Apr 2026 00:00:00 GMT Qianfan OCR Fast is a document intelligence model from Baidu's Qianfan team, purpose-built for optical character recognition tasks. It is an upgraded variant of the base Qianfan-OCR, trained on specialized OCR data while retaining general multimodal capabilities. The underlying Qianfan-OCR architecture is a 4B-parameter end-to-end vision-language model that replaces traditional multi-stage OCR pipelines with a single model handling document parsing, layout analysis, table extraction, chart understanding, key information extraction, and document QA. 
It performs direct image-to-Markdown conversion and supports 192 languages. The base model scored 93.12 on OmniDocBench v1.5 and 79.8 on OlmOCR Bench, leading all end-to-end models on both. Qianfan OCR Fast offers a 65K-token context window and is well suited for developers building document processing pipelines — invoice parsing, report extraction, exam grading, or RAG over scanned documents. Baidu Qwen: Qwen3.6 35B A3B https://developer.puter.com/ai/qwen/qwen3.6-35b-a3b/ https://developer.puter.com/ai/qwen/qwen3.6-35b-a3b/ Fri, 17 Apr 2026 00:00:00 GMT Qwen3.6 35B A3B is a sparse Mixture-of-Experts model with 35 billion total parameters but only 3 billion active per token, making it highly efficient for inference. Developed by Alibaba's Qwen team, it scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0 — significantly outperforming dense models like Gemma 4-31B (52.0% on SWE-bench Verified). It natively handles text, image, and video with a 262K-token context window, extendable to 1M tokens. The model supports Thinking Preservation for stable multi-turn reasoning and includes native tool-calling capabilities. Released under Apache 2.0, it was the first open-weight model in the Qwen3.6 family. A strong choice for developers who want frontier-adjacent coding performance at a fraction of the compute cost of larger models. Qwen Anthropic: Claude Opus 4.7 https://developer.puter.com/ai/anthropic/claude-opus-4-7/ https://developer.puter.com/ai/anthropic/claude-opus-4-7/ Thu, 16 Apr 2026 00:00:00 GMT Claude Opus 4.7 is Anthropic's most capable generally available model, built for complex reasoning and agentic coding. It offers a step-change improvement in long-horizon agentic work over its predecessor, Opus 4.6, along with strong gains in knowledge work, vision, and file-system-based memory. The model supports a 1M-token context window, 128k max output tokens, and adaptive thinking. 
It introduces high-resolution image input (up to 2576px / 3.75MP), a new `xhigh` effort level for demanding coding tasks, and task budgets (beta) that let the model self-moderate token usage across an agentic loop. It is priced at $5 / $25 per million input/output tokens. Best suited for developers building autonomous agents, multi-step coding workflows, and vision-heavy pipelines where reliability and depth of reasoning matter most. Anthropic Z.AI: GLM 5.1 https://developer.puter.com/ai/z-ai/glm-5.1/ https://developer.puter.com/ai/z-ai/glm-5.1/ Tue, 07 Apr 2026 00:00:00 GMT GLM-5.1 is a frontier-class reasoning model from Z.ai (formerly Zhipu AI), built as a post-training refinement of GLM-5 with a focus on coding and agentic tasks. It uses a 744B-parameter Mixture-of-Experts architecture with 40B active parameters per token and supports a 200K context window. GLM-5.1 scored 58.4 on SWE-Bench Pro, surpassing GPT-5.4 (57.7) and Claude Opus 4.6 (57.3), and reached 95.3 on AIME 2026. It excels at long-horizon agentic workflows, multi-step tool use, and complex software engineering tasks. The model is text-only, with no image or audio input. Z.AI Anthropic: Claude Opus 4.6 Fast https://developer.puter.com/ai/anthropic/claude-opus-4.6-fast/ https://developer.puter.com/ai/anthropic/claude-opus-4.6-fast/ Tue, 07 Apr 2026 00:00:00 GMT Claude Opus 4.6 Fast is a high-speed configuration of Anthropic's most intelligent model, delivering up to 2.5x faster output token generation with no reduction in quality or capabilities. It runs the same Opus 4.6 model, which is state-of-the-art on benchmarks like Terminal-Bench 2.0 for agentic coding, Humanity's Last Exam for multidisciplinary reasoning, and GDPval-AA for professional knowledge work, but is optimized for lower latency at premium pricing ($30/$150 per MTok). It supports the full 1M token context window and 128k max output tokens. 
Fast mode is ideal for latency-sensitive, interactive workflows such as rapid iteration, live debugging, and real-time agentic tasks where waiting on responses breaks your flow. For cost-sensitive or batch workloads, standard Opus 4.6 offers the same intelligence at lower cost. Anthropic Wan AI: Wan 2.7 Text-to-Video https://developer.puter.com/ai/wan-ai/wan2.7-t2v/ https://developer.puter.com/ai/wan-ai/wan2.7-t2v/ Fri, 03 Apr 2026 00:00:00 GMT Wan 2.7 Text-to-Video is a diffusion-based video generation model from Alibaba, designed to produce cinematic video clips directly from text prompts. It generates native 720p and 1080p video with durations from 2 to 15 seconds, supporting flexible aspect ratios including 16:9, 9:16, and 1:1. A standout feature is optional audio input, which synchronizes character motion and lip movement to a provided audio track during generation. The model responds well to detailed, structured prompts and supports multi-shot narrative control through prompt language alone. It's part of a broader four-model suite that includes image-to-video, reference-to-video, and video editing capabilities. Best suited for marketing content, social media clips, film pre-visualization, and any production pipeline that needs programmatic access to high-quality video generation. Wan AI Google: Gemma 4 26B A4B https://developer.puter.com/ai/google/gemma-4-26b-a4b-it/ https://developer.puter.com/ai/google/gemma-4-26b-a4b-it/ Fri, 03 Apr 2026 00:00:00 GMT Gemma 4 26B A4B is a Mixture-of-Experts (MoE) open model from Google DeepMind, built from the same research as Gemini 3. It has 26B total parameters but activates only 3.8B per forward pass, delivering near-31B-dense quality at a fraction of the compute cost. The model supports a 256K token context window, multimodal image and text input, built-in step-by-step reasoning (thinking mode), and native function calling for agentic workflows. 
It currently ranks #6 among open models on the Arena AI text leaderboard with an estimated LMArena score of 1441 — competitive with models many times its active size. It excels at reasoning, coding, long-context tasks, and structured tool use. It's a strong pick for developers who need high throughput and low latency without sacrificing capability. Google Qwen: Qwen3.6 Plus https://developer.puter.com/ai/qwen/qwen3.6-plus/ https://developer.puter.com/ai/qwen/qwen3.6-plus/ Thu, 02 Apr 2026 00:00:00 GMT Qwen 3.6 Plus is Alibaba's flagship large language model, built on a hybrid architecture combining linear attention with sparse mixture-of-experts routing for high throughput and scalability. It's optimized for agentic coding and complex multi-step workflows. On Terminal-Bench 2.0, it scores 61.6, surpassing Claude 4.5 Opus (59.3), while its 78.8 on SWE-bench Verified places it close behind. It also leads on MCPMark (48.2%) for tool-calling reliability. A native multimodal model, it handles text, images, and documents within a 1M-token context window with up to 65K output tokens. Notable features include always-on chain-of-thought reasoning, native function calling, and a preserve_thinking parameter that retains reasoning across multi-turn agent loops. A strong fit for developers building AI coding agents, terminal automation, and tool-using pipelines. Qwen Google: Gemma 4 31B https://developer.puter.com/ai/google/gemma-4-31b-it/ https://developer.puter.com/ai/google/gemma-4-31b-it/ Thu, 02 Apr 2026 00:00:00 GMT Gemma 4 31B is a dense multimodal model from Google DeepMind, built on the same research foundation as Gemini 3. It is the most capable model in the Gemma 4 family, accepting text, image, and video input with a 256K-token context window. It delivers strong benchmark results: 89.2% on AIME 2026, 85.2% on MMLU Pro, 80.0% on LiveCodeBench v6, and 84.3% on GPQA Diamond. 
On the Arena AI text leaderboard, it ranks as the #3 open model globally, outperforming many models with far higher parameter counts. Gemma 4 31B features native function calling trained into the model, configurable chain-of-thought reasoning, and structured JSON output, making it especially well-suited for agentic workflows, coding tasks, and multi-turn tool use. It supports over 140 languages and serves as a strong foundation for fine-tuning. Google Z.AI: GLM 5V Turbo https://developer.puter.com/ai/z-ai/glm-5v-turbo/ https://developer.puter.com/ai/z-ai/glm-5v-turbo/ Wed, 01 Apr 2026 00:00:00 GMT GLM-5V-Turbo is the native multimodal coding model from Z.ai (Zhipu AI), designed to bridge visual perception and code generation in a single architecture. It processes images, video, and text natively and is optimized for agentic workflows, turning design mockups, screenshots, and UI layouts into runnable code. The model scores 94.8 on the Design2Code benchmark (vs. Claude Opus 4.6's 77.3) and leads on GUI agent benchmarks like AndroidWorld and WebVoyager. It also outperforms Claude Opus 4.5 on BrowseComp for agentic browsing tasks. It is built on a 744B-parameter MoE architecture (40B active per token) with a ~200K context window, and was trained with reinforcement learning across 30+ task types to maintain strong text-only coding alongside its vision strengths. Best suited for design-to-code generation, GUI automation, and vision-grounded agentic development. Z.AI Arcee AI: Trinity Large Thinking https://developer.puter.com/ai/arcee-ai/trinity-large-thinking/ https://developer.puter.com/ai/arcee-ai/trinity-large-thinking/ Wed, 01 Apr 2026 00:00:00 GMT Trinity Large Thinking is a 398-billion-parameter sparse Mixture-of-Experts reasoning model from Arcee AI, with approximately 13B active parameters per token, post-trained with extended chain-of-thought and agentic reinforcement learning. 
It generates explicit reasoning traces in thinking blocks before final responses, and its 262K context window accommodates long agentic reasoning chains. Benchmark results include 94.7% on τ²-Bench and 98.2% on LiveCodeBench, placing it at #2 on PinchBench behind only Claude Opus 4.6. Released under Apache 2.0, Trinity Large Thinking is the strongest option in the Trinity family for agentic pipelines, long-horizon planning, complex multi-step coding, and tasks that benefit from transparent reasoning traces. Arcee AI xAI: Grok 4.20 https://developer.puter.com/ai/x-ai/grok-4.20/ https://developer.puter.com/ai/x-ai/grok-4.20/ Tue, 31 Mar 2026 00:00:00 GMT Grok 4.20 is xAI's flagship large language model, offering a rare combination of low hallucination rates and high throughput at competitive pricing. It achieved a record 78% non-hallucination rate on the Artificial Analysis Omniscience benchmark — the highest of any model tested — making it a strong choice for applications where factual reliability matters more than peak reasoning scores. It scored 78.5% on GPQA Diamond and 87.3% on MATH-500. The model supports a 2M-token context window, text and image inputs, parallel function calling, structured outputs, and built-in web search. Reasoning can be toggled on or off per request via API parameter. At $2 per million input tokens and $6 per million output tokens, it's one of the most affordable frontier models available, with output speeds exceeding 230 tokens per second. xAI xAI: Grok 4.20 Multi-Agent https://developer.puter.com/ai/x-ai/grok-4.20-multi-agent/ https://developer.puter.com/ai/x-ai/grok-4.20-multi-agent/ Tue, 31 Mar 2026 00:00:00 GMT Grok 4.20 Multi-Agent is a variant of xAI's Grok 4.20 purpose-built for orchestrating multiple AI agents that collaborate on complex, multi-step tasks in real time. 
Rather than relying on a single inference pass, it coordinates parallel agents that independently search, analyze, and cross-reference information before synthesizing a final response. At low or medium reasoning effort it runs 4 agents; at high or extra-high effort it scales to 16. It scored a 68.7 agentic index on Artificial Analysis — among the highest available. The model shares Grok 4.20's 2M-token context window and natively supports web search, X search, and tool orchestration. It generates up to 2M output tokens per response, making it well suited for deep research workflows, multi-source analysis, and long-running agent pipelines. xAI Google: Veo 3.1 Lite https://developer.puter.com/ai/google/veo-3.1-lite/ https://developer.puter.com/ai/google/veo-3.1-lite/ Tue, 31 Mar 2026 00:00:00 GMT Veo 3.1 Lite is Google DeepMind's most cost-effective video generation model, built for high-volume applications where per-clip cost is a primary concern. It generates video at the same speed as Veo 3.1 Fast but at less than half the price — starting at $0.05 per second for 720p. The model supports text-to-video and image-to-video with 720p and 1080p output in landscape (16:9) or portrait (9:16), at configurable durations of 4, 6, or 8 seconds. It does not support 4K output, scene extension, or native audio generation — clips are silent by default. Veo 3.1 Lite is ideal for developers building batch video pipelines, social media automation, or interactive tools where cost per generation matters most and audio can be added in post-production. Google Qwen: Qwen3.6 Plus Preview https://developer.puter.com/ai/qwen/qwen3.6-plus-preview/ https://developer.puter.com/ai/qwen/qwen3.6-plus-preview/ Mon, 30 Mar 2026 00:00:00 GMT Qwen 3.6 Plus Preview is a next-generation large language model from Alibaba's Qwen team, built on a hybrid architecture designed for improved efficiency and scalability. 
Released as an early preview in March 2026, it succeeds the Qwen 3.5 Plus series with stronger reasoning and more reliable agentic behavior. The model offers a 1-million-token context window and up to 65,536 output tokens, making it well suited for processing large codebases, lengthy documents, or multi-step workflows in a single request. It supports tool use and function calling natively, with built-in chain-of-thought reasoning that is always active. Qwen 3.6 Plus Preview is particularly strong in agentic coding, front-end component generation, and complex problem-solving. It's a good fit for developers building AI-driven code review tools, multi-step agents, or applications that benefit from deep reasoning over large inputs. Qwen KwaiPilot: KAT-Coder-Pro V2 https://developer.puter.com/ai/kwaipilot/kat-coder-pro-v2/ https://developer.puter.com/ai/kwaipilot/kat-coder-pro-v2/ Fri, 27 Mar 2026 00:00:00 GMT KAT-Coder-Pro V2 is the flagship agentic coding model from Kwaipilot (Kuaishou's AI research division), built for enterprise-grade software engineering and SaaS integration. It uses a Mixture-of-Experts architecture with 72B active parameters and offers a 256K token context window. The model achieves a 79.6% solve rate on SWE-Bench Verified, placing it among the top code generation models globally. It scores 44 on the Artificial Analysis Intelligence Index, well above the median of 15 for comparable non-reasoning models in its price tier, and generates output at roughly 109 tokens per second. KAT-Coder-Pro V2 is designed for large-scale production environments, multi-system coordination, and agentic coding workflows. It also supports tool use, function calling, and web aesthetics generation for producing landing pages and presentation decks. 
KwaiPilot Reka AI: Reka Edge https://developer.puter.com/ai/rekaai/reka-edge/ https://developer.puter.com/ai/rekaai/reka-edge/ Fri, 20 Mar 2026 00:00:00 GMT Reka Edge is a 7B multimodal vision-language model that processes text, image, and video inputs with industry-leading performance in its size class for visual reasoning, object detection, and agentic tool-use. It features a ConvNeXt V2 vision encoder that extracts only 64 tokens per image tile, enabling exceptionally fast and low-latency inference ideal for real-time applications like robotics, automotive, and augmented reality. It demonstrates frontier-level tool-calling abilities and strong temporal video reasoning, outperforming comparable models on benchmarks like MLVU, MMVU, and RefCOCO. Reka AI OpenAI: GPT-5.4 Nano https://developer.puter.com/ai/openai/gpt-5.4-nano/ https://developer.puter.com/ai/openai/gpt-5.4-nano/ Thu, 19 Mar 2026 00:00:00 GMT GPT-5.4 Nano is the smallest and cheapest model in the GPT-5.4 family, offering a 400k context window at just $0.20/1M input tokens. It excels at classification, data extraction, ranking, and coding sub-agent tasks, outperforming the previous GPT-5 Mini on SWE-Bench Pro (52.4% vs 45.7%). It's ideal for high-volume, low-latency workloads and as a fast sub-agent in multi-model architectures. OpenAI Xiaomi: MiMo-V2-Omni https://developer.puter.com/ai/xiaomi/mimo-v2-omni/ https://developer.puter.com/ai/xiaomi/mimo-v2-omni/ Wed, 18 Mar 2026 00:00:00 GMT MiMo V2 Omni is Xiaomi's omni-modal foundation model that natively processes text, image, video, and audio within a unified architecture, combining multimodal perception with agentic capabilities like visual grounding, multi-step planning, and tool use. It supports over 10 hours of continuous audio understanding and a 256K context window. It outperformed Gemini 3 Pro and GPT-5.2 on several benchmarks. 
Xiaomi Xiaomi: MiMo-V2-Pro https://developer.puter.com/ai/xiaomi/mimo-v2-pro/ https://developer.puter.com/ai/xiaomi/mimo-v2-pro/ Wed, 18 Mar 2026 00:00:00 GMT MiMo V2 Pro is Xiaomi's flagship text-only reasoning model built for the 'agent era,' featuring over 1T total parameters (42B active) with a 1M-token context window, deeply optimized for agentic workflows like coding, tool calling, and task orchestration. Previously tested anonymously as 'Hunter Alpha' on OpenRouter, where it topped daily API call charts, it ranks 8th globally and 2nd among Chinese LLMs on the Artificial Analysis Intelligence Index. Its agent performance approaches Claude Opus 4.6 at roughly one-fifth the cost. Xiaomi MiniMax: MiniMax M2.7 https://developer.puter.com/ai/minimax/minimax-m2.7/ https://developer.puter.com/ai/minimax/minimax-m2.7/ Wed, 18 Mar 2026 00:00:00 GMT MiniMax M2.7 is a proprietary reasoning LLM from Chinese AI startup MiniMax, released on March 18, 2026, notable for being one of the first commercial models to actively participate in its own training through autonomous self-evolution loops. It excels at agentic coding workflows with a 56.2% score on SWE-Pro and strong performance in office productivity tasks, posting the highest Elo score (1495) on GDPval-AA among open-source-tier models. It targets developers building complex agent systems and automated workflows. MiniMax Google: Gemini 3.1 Flash-Lite https://developer.puter.com/ai/google/gemini-3.1-flash-lite/ https://developer.puter.com/ai/google/gemini-3.1-flash-lite/ Wed, 18 Mar 2026 00:00:00 GMT Gemini 3.1 Flash-Lite is Google's fastest and most cost-efficient model in the Gemini 3.1 series, designed for high-volume, latency-sensitive workloads. It delivers 2.5x faster time-to-first-token and 45% higher output throughput than Gemini 2.5 Flash, scoring 86.9% on GPQA Diamond and 76.8% on MMMU Pro. 
An Intelligence Index score of 34 — nearly triple that of 2.5 Flash-Lite — puts it well above prior budget-tier models at the same price point. The model supports text, image, video, audio, and PDF input with a 1M token context window and configurable thinking levels. It's the right choice for developers building translation pipelines, content moderation systems, classification tasks, or any application where throughput and cost matter most. Google OpenAI: GPT-5.4 Mini https://developer.puter.com/ai/openai/gpt-5.4-mini/ https://developer.puter.com/ai/openai/gpt-5.4-mini/ Tue, 17 Mar 2026 00:00:00 GMT GPT-5.4 Mini is OpenAI's fast, efficient distillation of GPT-5.4, significantly improving over GPT-5 Mini across coding, reasoning, multimodal understanding, and tool use while running 2x faster. It approaches GPT-5.4-level performance on several benchmarks and features a 400k context window. OpenAI Mistral AI: Mistral Small 4 https://developer.puter.com/ai/mistralai/mistral-small-2603/ https://developer.puter.com/ai/mistralai/mistral-small-2603/ Mon, 16 Mar 2026 00:00:00 GMT Mistral Small 4 is a 119B-parameter open-source Mixture-of-Experts model (6B active per token) released under Apache 2.0, unifying instruction-following, reasoning, multimodal (text + image), and agentic coding into a single deployment. It features 128 experts, a 256k context window, and configurable reasoning effort that lets developers toggle between fast responses and deep step-by-step reasoning per request. Compared to its predecessor Mistral Small 3, it delivers 40% lower latency and 3x higher throughput while matching or surpassing GPT-OSS 120B on key benchmarks. 
Mistral AI Z.AI: GLM 5 Turbo https://developer.puter.com/ai/z-ai/glm-5-turbo/ https://developer.puter.com/ai/z-ai/glm-5-turbo/ Sun, 15 Mar 2026 00:00:00 GMT GLM-5 Turbo is a foundation model by Z.ai optimized for fast inference and agent-driven workflows, excelling at tool invocation, complex instruction decomposition, and long-chain task execution in OpenClaw scenarios. It is built on top of the GLM-5 architecture (744B parameters, 40B active) with DeepSeek Sparse Attention for reduced deployment cost and up to 205K token context. GLM-5 Turbo supports reasoning/thinking mode and is designed for real-world multi-step agentic tasks including scheduled, persistent, and high-throughput operations. Z.AI xAI: Grok 4.20 Beta https://developer.puter.com/ai/x-ai/grok-4.20-beta/ https://developer.puter.com/ai/x-ai/grok-4.20-beta/ Thu, 12 Mar 2026 00:00:00 GMT Grok 4.20 Beta is xAI's newest flagship model, featuring a native 4-agent collaboration system (Grok, Harper, Benjamin, Lucas) that reasons in parallel and debates internally before delivering a unified response. It introduces a rapid-learning architecture that improves weekly from real-world feedback, and builds on a ~3T parameter MoE backbone with up to 2M token context. It claims a 65% reduction in hallucinations over Grok 4.1 and strong gains in coding, math, and engineering reasoning. xAI xAI: Grok 4.20 Multi-Agent Beta https://developer.puter.com/ai/x-ai/grok-4.20-multi-agent-beta/ https://developer.puter.com/ai/x-ai/grok-4.20-multi-agent-beta/ Thu, 12 Mar 2026 00:00:00 GMT Grok 4.20 Multi-Agent Beta is an API-specific variant of Grok 4.20 optimized for orchestrating multiple agents that collaborate on deep research tasks. It supports web search and X search tools natively, uses the same 2M token context window, and is designed for developer workflows requiring structured multi-agent collaboration. 
xAI NVIDIA: Nemotron 3 Super https://developer.puter.com/ai/nvidia/nemotron-3-super-120b-a12b/ https://developer.puter.com/ai/nvidia/nemotron-3-super-120b-a12b/ Wed, 11 Mar 2026 00:00:00 GMT Nemotron 3 Super is NVIDIA's open-weight 120B-parameter hybrid Mamba-Transformer MoE model with only 12B active parameters, designed for running complex multi-agent agentic AI systems at scale. It features a 1-million-token context window to prevent goal drift across long tasks and delivers up to 5x higher throughput than its predecessor. The model excels at reasoning, coding, and tool use. NVIDIA Qwen: Qwen3.5-9B https://developer.puter.com/ai/qwen/qwen3.5-9b/ https://developer.puter.com/ai/qwen/qwen3.5-9b/ Tue, 10 Mar 2026 00:00:00 GMT Qwen 3.5 9B is a 9-billion parameter open-source multimodal model by Alibaba's Qwen Team, featuring a 262K native context window (extendable to ~1M tokens), support for text, image, and video input, and coverage of 201 languages. It uses a hybrid Gated DeltaNet architecture and outperforms much larger models like Qwen3-30B and OpenAI's gpt-oss-120B on key benchmarks including reasoning, vision, and document understanding. Qwen ByteDance Seed: Seed 2.0 Lite https://developer.puter.com/ai/bytedance-seed/seed-2.0-lite/ https://developer.puter.com/ai/bytedance-seed/seed-2.0-lite/ Tue, 10 Mar 2026 00:00:00 GMT Seed 2.0 Lite is ByteDance's mid-tier general-purpose LLM that balances strong performance with cost efficiency, scoring 93 on AIME 2025 and 2233 on Codeforces while supporting text, image, and video understanding plus tool-calling capabilities. It serves as the default production-grade model in the Seed 2.0 family, handling roughly 95% of enterprise workloads at about half the cost of the flagship Pro variant. It supports a 256K context window and is positioned as a high-performance alternative for tasks like code review, document processing, information synthesis, and agent-based workflows. 
ByteDance Seed OpenAI: GPT-5.4 https://developer.puter.com/ai/openai/gpt-5.4/ https://developer.puter.com/ai/openai/gpt-5.4/ Thu, 05 Mar 2026 00:00:00 GMT GPT-5.4 is OpenAI's latest frontier model released on March 5, 2026, designed for complex professional work with a 1.05M token context window, built-in computer-use capabilities, and improved coding from GPT-5.3-Codex. It is 33% less likely to make factual errors per claim compared to GPT-5.2 and scores 83% on OpenAI's GDPval knowledge work benchmark. OpenAI OpenAI: GPT-5.4 Pro https://developer.puter.com/ai/openai/gpt-5.4-pro/ https://developer.puter.com/ai/openai/gpt-5.4-pro/ Thu, 05 Mar 2026 00:00:00 GMT GPT-5.4 Pro is a higher-compute version of GPT-5.4 that allocates more reasoning time to produce smarter and more precise answers on complex tasks. It supports reasoning effort levels of medium, high, and xhigh, and shares the same 1.05M token context window as GPT-5.4. OpenAI Inception: Mercury 2 https://developer.puter.com/ai/inception/mercury-2/ https://developer.puter.com/ai/inception/mercury-2/ Wed, 04 Mar 2026 00:00:00 GMT Mercury 2 is a diffusion-based reasoning language model from Inception Labs that refines all tokens in parallel rather than generating them sequentially, achieving over 1,000 tokens per second — roughly 5x faster than speed-optimized competitors like Claude Haiku and GPT-5 Mini at comparable quality. On reasoning benchmarks, Mercury 2 scores 91.1 on AIME 2025 and 73.6 on GPQA. It also placed second on the Copilot Arena leaderboard for quality while ranking first for speed overall. With a 128K context window, it is purpose-built for latency-sensitive applications — real-time assistants, high-throughput pipelines, and cost-conscious production workloads where reasoning capability matters. 
Inception Qwen: Qwen Image 2.0 https://developer.puter.com/ai/qwen/qwen-image-2.0/ https://developer.puter.com/ai/qwen/qwen-image-2.0/ Tue, 03 Mar 2026 00:00:00 GMT Qwen Image 2.0 is Alibaba's second-generation image foundation model, delivering a major upgrade over the original Qwen Image with a leaner 7B-parameter architecture that outperforms its 20B predecessor across the board. It generates natively at 2048×2048 resolution and unifies text-to-image generation and image editing into a single model — no separate pipelines needed. The model scores 88.32 on DPG-Bench, surpassing FLUX.1 (83.84) and GPT Image 1 (85.15), and ranks #1 on AI Arena's blind human evaluation for both generation and editing. Its headline feature is professional typography rendering: it handles prompts up to 1,000 tokens and can generate complete infographics, PPT slides, posters, and comics with accurate bilingual text layout. Ideal for developers building design-oriented workflows where text accuracy and prompt adherence are critical. Qwen Qwen: Qwen Image 2.0 Pro https://developer.puter.com/ai/qwen/qwen-image-2.0-pro/ https://developer.puter.com/ai/qwen/qwen-image-2.0-pro/ Tue, 03 Mar 2026 00:00:00 GMT Qwen Image 2.0 Pro is the highest-fidelity configuration of Alibaba's Qwen Image 2.0, built on the same 7B-parameter architecture but tuned to maximize visual quality over speed. Compared to the standard tier, Pro delivers richer color accuracy, finer detail rendering — visible in textures like hair strands, fabric weaves, and metallic reflections — and stronger adherence to complex, multi-element prompts. Text rendering is also crisper, making it better suited for commercial assets like branded posters and packaging. The standard Qwen Image 2.0 is optimized for fast iteration and prototyping. Pro is where you go for final production renders where every pixel matters. Best for developers building pipelines that need polished, client-ready output from a single API call. 
Qwen OpenAI: GPT-5.3 Chat https://developer.puter.com/ai/openai/gpt-5.3-chat/ https://developer.puter.com/ai/openai/gpt-5.3-chat/ Tue, 03 Mar 2026 00:00:00 GMT GPT-5.3 Chat is OpenAI's latest conversational model update (also known as GPT-5.3 Instant), designed to make everyday ChatGPT interactions smoother and more natural. It reduces hallucinations by up to ~27%, cuts down on overly cautious refusals and 'cringe' preachy tone that plagued its predecessor GPT-5.2 Instant, and better integrates web search results with its own knowledge. OpenAI Google: Gemini 3.1 Flash Image https://developer.puter.com/ai/google/gemini-3.1-flash-image/ https://developer.puter.com/ai/google/gemini-3.1-flash-image/ Thu, 26 Feb 2026 00:00:00 GMT Gemini 3.1 Flash Image (also known as Nano Banana 2) is Google DeepMind's latest state-of-the-art image generation and editing model, combining Pro-level quality with the speed of the Flash architecture. It supports text and image input with up to 1M token context, generates images up to 4K resolution, and features advanced world knowledge, precise text rendering, subject consistency, and web-search grounding. Google ByteDance Seed: Seed 2.0 Mini https://developer.puter.com/ai/bytedance-seed/seed-2.0-mini/ https://developer.puter.com/ai/bytedance-seed/seed-2.0-mini/ Thu, 26 Feb 2026 00:00:00 GMT Seed 2.0 Mini is ByteDance's most lightweight and inference-efficient model in the Seed 2.0 family, released in February 2026 and optimized for low-latency, high-concurrency, and cost-sensitive applications. It features a 256K context window, multimodal capabilities (text, image, video), and a unique 4-level reasoning effort system. Despite being the smallest variant, it delivers strong benchmark scores (AIME 2025: 87.0, SWE-Bench: 67.9) at an extremely competitive price of $0.1/M input tokens. 
ByteDance Seed Qwen: Qwen3.5-Flash https://developer.puter.com/ai/qwen/qwen3.5-flash-02-23/ https://developer.puter.com/ai/qwen/qwen3.5-flash-02-23/ Wed, 25 Feb 2026 00:00:00 GMT Qwen 3.5 Flash is the production-optimized API version of the 35B-A3B model. It features a default 1M token context window, built-in tool/function calling support, and is priced at ~$0.10/M input tokens for low-latency agentic workflows. The '02-23' suffix indicates the February 23, 2026 snapshot/version date. Qwen Liquid AI: LFM2-24B-A2B https://developer.puter.com/ai/liquid/lfm-2-24b-a2b/ https://developer.puter.com/ai/liquid/lfm-2-24b-a2b/ Wed, 25 Feb 2026 00:00:00 GMT LFM2 24B A2B is a sparse Mixture-of-Experts model from Liquid AI featuring a novel hybrid architecture that combines gated short convolution blocks with Grouped Query Attention in a 3:1 ratio, developed through hardware-in-the-loop architecture search. With 24 billion total parameters but only ~2 billion active per token, it outperforms larger MoE competitors like Qwen3-30B-A3B in throughput benchmarks. It supports 9 languages, a 32K context window, native function calling, and structured outputs. A strong API choice for high-volume multi-agent pipelines, RAG backends, and multilingual applications that demand low per-token cost alongside capable general reasoning. Liquid AI OpenAI: GPT-5.3 Codex https://developer.puter.com/ai/openai/gpt-5.3-codex/ https://developer.puter.com/ai/openai/gpt-5.3-codex/ Tue, 24 Feb 2026 00:00:00 GMT GPT-5.3 Codex is OpenAI's most capable agentic coding model, combining frontier coding performance with strong general reasoning and professional knowledge capabilities. It was the first model instrumental in creating itself, having been used to debug its own training and manage its own deployment. It sets a new state of the art on SWE-Bench Pro and Terminal-Bench while being 25% faster than its predecessor. 
OpenAI Qwen: Qwen3.5-122B-A10B https://developer.puter.com/ai/qwen/qwen3.5-122b-a10b/ https://developer.puter.com/ai/qwen/qwen3.5-122b-a10b/ Mon, 23 Feb 2026 00:00:00 GMT Qwen 3.5 122B (10B Active) is Alibaba's largest medium-sized MoE model, activating only 10B of its 122B total parameters per inference pass. It excels at agentic tasks like tool use and multi-step reasoning, leading the Qwen 3.5 lineup on benchmarks such as BFCL-V4 and BrowseComp. It supports 262K native context (extendable to 1M), native multimodal input, and 201 languages under Apache 2.0. Qwen Qwen: Qwen3.5-27B https://developer.puter.com/ai/qwen/qwen3.5-27b/ https://developer.puter.com/ai/qwen/qwen3.5-27b/ Mon, 23 Feb 2026 00:00:00 GMT Qwen 3.5 27B is the only dense (non-MoE) model in the Qwen 3.5 medium series, activating all 27B parameters on every forward pass for maximum per-token reasoning density. It ties GPT-5 mini on SWE-bench Verified at 72.4 and is competitive with Claude Sonnet 4.5 on visual reasoning benchmarks. It runs well on consumer hardware and is open-weight under Apache 2.0. Qwen Qwen: Qwen3.5-35B-A3B https://developer.puter.com/ai/qwen/qwen3.5-35b-a3b/ https://developer.puter.com/ai/qwen/qwen3.5-35b-a3b/ Mon, 23 Feb 2026 00:00:00 GMT Qwen 3.5 35B (3B Active) is a sparse MoE model that activates just 3B of its 35B total parameters, yet outperforms the previous-generation 235B flagship across language, vision, coding, and agent tasks. It uses a hybrid Gated DeltaNet + MoE architecture and can run on GPUs with as little as 8GB VRAM when quantized. It's the base model behind the hosted Qwen 3.5 Flash API. Qwen Aion Labs: Aion-2.0 https://developer.puter.com/ai/aion-labs/aion-2.0/ https://developer.puter.com/ai/aion-labs/aion-2.0/ Mon, 23 Feb 2026 00:00:00 GMT Aion 2.0 is a fine-tuned variant of DeepSeek V3.2, developed by AionLabs and optimized for immersive roleplaying and storytelling. 
It excels at generating narratives with natural tension, conflict, and dramatic stakes, and handles mature or darker themes with notable nuance. The model offers a 131K-token context window with up to 32K tokens of output, making it well-suited for long-form creative sessions. It supports function calling and streaming. On third-party benchmarks, it has scored 99.5% on general knowledge, 96% on mathematics, and 93.5% on coding tasks. Aion 2.0 is a strong pick for developers building interactive fiction, character-driven chat experiences, or creative writing tools where narrative depth and engagement matter more than raw speed. Aion Labs Google: Gemini 3.1 Pro https://developer.puter.com/ai/google/gemini-3.1-pro-preview/ https://developer.puter.com/ai/google/gemini-3.1-pro-preview/ Thu, 19 Feb 2026 00:00:00 GMT Gemini 3.1 Pro is Google's most advanced reasoning model, building on the Gemini 3 series with over double the reasoning performance of its predecessor (77.1% on ARC-AGI-2) and a 1M token context window. It features a three-tier thinking system (low, medium, high) for adjustable reasoning depth and is optimized for agentic workflows, software engineering, and complex problem-solving. Google Anthropic: Claude Sonnet 4.6 https://developer.puter.com/ai/anthropic/claude-sonnet-4-6/ https://developer.puter.com/ai/anthropic/claude-sonnet-4-6/ Tue, 17 Feb 2026 00:00:00 GMT Claude Sonnet 4.6 is Anthropic's latest mid-tier model released February 2026, delivering near-flagship Opus-level performance in coding, computer use, and agentic tasks at a fraction of the cost ($3/$15 per million tokens). It features a 1M token context window in beta, scores 79.6% on SWE-bench Verified and 72.5% on OSWorld. Developers preferred it over both Sonnet 4.5 (~70% of the time) and even Opus 4.5 (~59%) in real-world coding tests. 
Anthropic Qwen: Qwen3.5 Plus 02-15 https://developer.puter.com/ai/qwen/qwen3.5-plus-02-15/ https://developer.puter.com/ai/qwen/qwen3.5-plus-02-15/ Mon, 16 Feb 2026 00:00:00 GMT Qwen3.5-Plus is the hosted flagship model in the Qwen3.5 series, available through Alibaba Cloud Model Studio. It offers a 1 million token context window by default and includes built-in tools with adaptive tool use, including web search and code interpreter capabilities. The model supports reasoning mode (chain-of-thought), search, and a fast response mode without extended thinking. It is accessible via an OpenAI-compatible API and can be integrated with third-party coding tools like Claude Code, Cline, and OpenClaw. Qwen3.5-Plus is designed for agentic workflows that combine multimodal reasoning with tool use. Qwen Qwen: Qwen3.5 Plus https://developer.puter.com/ai/qwen/qwen3.5-plus/ https://developer.puter.com/ai/qwen/qwen3.5-plus/ Mon, 16 Feb 2026 00:00:00 GMT Qwen3.5 Plus is Alibaba's hosted flagship model in the Qwen3.5 series, built on the Qwen3.5-397B-A17B Mixture-of-Experts architecture with 397 billion total parameters and 17 billion active per token. Its headline feature is a 1-million-token native context window — among the largest available via API — making it well suited for processing entire codebases, long documents, or extended multi-turn conversations in a single request. It supports both a deep-thinking mode and an "Auto" mode that adaptively invokes tools like web search and code interpreters. A solid general-purpose option for developers needing large-context capabilities and agentic workflows that combine multimodal reasoning with tool use. Qwen Qwen: Qwen3.5 397B A17B https://developer.puter.com/ai/qwen/qwen3.5-397b-a17b/ https://developer.puter.com/ai/qwen/qwen3.5-397b-a17b/ Sun, 15 Feb 2026 00:00:00 GMT Qwen3.5-397B-A17B is an open-weight native vision-language model from Alibaba's Qwen team, released in February 2026. 
It uses a hybrid architecture combining Gated Delta Networks (linear attention) with a sparse mixture-of-experts design, totaling 397 billion parameters but activating only 17 billion per forward pass for efficient inference. The model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, competing with frontier models like GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro. It supports 201 languages and dialects and features a 250k-token vocabulary. Its decoding throughput is reported at 8.6x that of Qwen3-Max under a 32k context length. Qwen Z.AI: GLM 5 https://developer.puter.com/ai/z-ai/glm-5/ https://developer.puter.com/ai/z-ai/glm-5/ Thu, 12 Feb 2026 00:00:00 GMT GLM-5 is Zhipu AI's (Z.ai) fifth-generation flagship open-weight foundation model with 744B total parameters (40B active) in a Mixture of Experts architecture, designed for agentic engineering, complex systems coding, and long-horizon agent tasks. It achieves state-of-the-art performance among open-weight models on coding and agentic benchmarks like SWE-bench Verified and Terminal Bench 2.0, approaching Claude Opus 4.5-level capability. Z.AI MiniMax: MiniMax M2.5 https://developer.puter.com/ai/minimax/minimax-m2.5/ https://developer.puter.com/ai/minimax/minimax-m2.5/ Thu, 12 Feb 2026 00:00:00 GMT MiniMax M2.5 is a 230B-parameter Mixture-of-Experts model (10B active) from Shanghai-based MiniMax, designed for real-world productivity with state-of-the-art performance in coding (80.2% SWE-Bench Verified), agentic tool use, and search tasks. It rivals top models from Anthropic and OpenAI while costing 1/10th to 1/20th the price, positioning itself as frontier intelligence 'too cheap to meter.' The model excels at full-stack development, office work (Word, Excel, PowerPoint), and autonomous agent workflows. 
MiniMax Qwen: Qwen3 Max Thinking https://developer.puter.com/ai/qwen/qwen3-max-thinking/ https://developer.puter.com/ai/qwen/qwen3-max-thinking/ Mon, 09 Feb 2026 00:00:00 GMT Qwen3 Max Thinking is Alibaba Cloud's flagship proprietary reasoning model with a 256K context window, featuring test-time scaling and adaptive tool-use capabilities (web search, code interpreter, memory) that allow it to reason iteratively and autonomously. It scores competitively against GPT-5.2 and Gemini 3 Pro on benchmarks like Humanity's Last Exam and HMMT, excelling in math, complex reasoning, and instruction following. Qwen Anthropic: Claude Opus 4.6 https://developer.puter.com/ai/anthropic/claude-opus-4-6/ https://developer.puter.com/ai/anthropic/claude-opus-4-6/ Thu, 05 Feb 2026 00:00:00 GMT Claude Opus 4.6 is an Anthropic model released in February 2026. It is a powerful model for coding and agentic tasks, with a 200K-token context window and a 64K-token output limit. Anthropic Qwen: Qwen3 Coder Next https://developer.puter.com/ai/qwen/qwen3-coder-next/ https://developer.puter.com/ai/qwen/qwen3-coder-next/ Wed, 04 Feb 2026 00:00:00 GMT Qwen3-Coder-Next is an open-weight coding model from Alibaba's Qwen team with 80B total parameters but only 3B active per token, designed specifically for coding agents and local development with a 256K context window. It uses a sparse Mixture-of-Experts (MoE) architecture with hybrid attention, trained on 800K executable coding tasks using reinforcement learning to excel at long-horizon reasoning, tool calling, and recovering from execution failures. It achieves performance comparable to models with 10-20x more active parameters on benchmarks like SWE-Bench while maintaining low inference costs. 
Qwen StepFun: Step 3.5 Flash https://developer.puter.com/ai/stepfun/step-3.5-flash/ https://developer.puter.com/ai/stepfun/step-3.5-flash/ Thu, 29 Jan 2026 00:00:00 GMT Step 3.5 Flash is an open-source reasoning model from StepFun, built on a sparse Mixture-of-Experts (MoE) architecture with 196B total parameters but only 11B active per token. It supports a 256K-token context window and native tool calling. The model is purpose-built for agentic and coding workflows, with generation throughput of 100–300 tokens/sec in typical usage. It scores 74.4% on SWE-bench Verified, 97.3 on AIME 2025, 86.4% on LiveCodeBench-V6, and 88.2 on τ²-Bench. Step 3.5 Flash is a strong choice for developers building AI agents, code assistants, or multi-step reasoning pipelines who need frontier-level intelligence at low per-token cost. StepFun Upstage AI: Solar Pro 3 https://developer.puter.com/ai/upstage/solar-pro-3/ https://developer.puter.com/ai/upstage/solar-pro-3/ Tue, 27 Jan 2026 00:00:00 GMT Solar Pro 3 is a Mixture-of-Experts large language model from Upstage, featuring 102B total parameters with only 12B active per forward pass and a 128K token context window. The model is built for agentic workflows and complex reasoning, trained using Upstage's proprietary SnapPO reinforcement learning framework. It scores 72.3 on Tau2-all (the comprehensive agentic evaluation), roughly doubling its predecessor's 36.0. It also claims 100% schema compliance for structured output generation. Solar Pro 3 is particularly strong in Korean, with robust English and Japanese support — making it a standout choice for multilingual teams operating in East Asian markets. It targets enterprise use cases in domains like finance, healthcare, and legal, where reliable instruction following and structured outputs matter most. 
Upstage AI Moonshot AI: Kimi K2.5 https://developer.puter.com/ai/moonshotai/kimi-k2.5/ https://developer.puter.com/ai/moonshotai/kimi-k2.5/ Tue, 27 Jan 2026 00:00:00 GMT Kimi K2.5 is Moonshot AI's most capable open-source model, a natively multimodal (vision + text) trillion-parameter MoE with 32B active parameters released in January 2026. Built through continual pretraining on ~15 trillion mixed visual and text tokens atop the K2 base, it supports both thinking and instant modes with a 256K context window. It scored 76.8% on SWE-bench Verified, 96.1% on AIME 2025, and 50.2% on Humanity's Last Exam with tools — outperforming Claude Opus 4.5 and GPT-5.2 on the latter. Its standout feature is Agent Swarm, which coordinates up to 100 parallel sub-agents for complex tasks. K2.5 excels at vision-to-code generation, frontend development from screenshots, and large-scale agentic workflows, making it a strong choice for developers building multimodal AI agents. Moonshot AI Arcee AI: Trinity Large Preview https://developer.puter.com/ai/arcee-ai/trinity-large-preview/ https://developer.puter.com/ai/arcee-ai/trinity-large-preview/ Tue, 27 Jan 2026 00:00:00 GMT Trinity Large Preview is a 400-billion-parameter sparse Mixture-of-Experts model from Arcee AI, with approximately 13B active parameters per token. It uses 256 experts with 4 active per token, trained on over 17 trillion tokens. On MMLU it scores 87.2, and it achieved 24.0 on AIME 2025, demonstrating strong mathematical reasoning alongside general knowledge. The 128k context window supports long-document analysis and complex reasoning workflows. Trinity Large Preview is suited for complex reasoning, math, and coding-adjacent workflows where developers want near-frontier quality through an API at substantially lower cost than dense models of equivalent scale. 
Arcee AI MiniMax: MiniMax M2-her https://developer.puter.com/ai/minimax/minimax-m2-her/ https://developer.puter.com/ai/minimax/minimax-m2-her/ Fri, 23 Jan 2026 00:00:00 GMT MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. It stays consistent in tone and personality across conversations and supports rich message roles to learn from example dialogue. This makes it well-suited for storytelling, AI companions, and conversational experiences where natural flow matters. MiniMax Writer: Palmyra X5 https://developer.puter.com/ai/writer/palmyra-x5/ https://developer.puter.com/ai/writer/palmyra-x5/ Wed, 21 Jan 2026 00:00:00 GMT Palmyra X5 is Writer's most advanced enterprise LLM, featuring a 1-million-token context window and adaptive reasoning capabilities designed for agentic AI workflows. The model is purpose-built for orchestrating multi-step agents, with sub-second tool-calling latency (~300ms) and the ability to process a full million-token prompt in roughly 22 seconds. It supports code generation, structured outputs, and over 30 languages. On benchmarks, Palmyra X5 scores 48.7 on BigCodeBench (Full, Instruct), 53% on Longbench v2, and 19.1% on OpenAI's MRCR 8-needle test — close to GPT-4.1's 20.25% on the same evaluation. It's priced at $0.60/M input tokens and $6/M output tokens, positioning it as a cost-efficient alternative for teams building complex, data-heavy agent pipelines at scale. Writer Liquid AI: LFM2.5-1.2B-Instruct https://developer.puter.com/ai/liquid/lfm-2.5-1.2b-instruct/ https://developer.puter.com/ai/liquid/lfm-2.5-1.2b-instruct/ Tue, 20 Jan 2026 00:00:00 GMT LFM 2.5 1.2B Instruct is a compact instruction-tuned language model from Liquid AI, designed to deliver best-in-class performance at the 1-billion-parameter scale. 
Trained on 28 trillion tokens with reinforcement learning, it achieves strong scores across knowledge (MMLU-Pro: 44.35), reasoning (GPQA: 38.89), and instruction following (IFEval: 86.23) — outperforming similarly sized models like Llama-3.2-1B and Gemma-3-1B on these benchmarks. The model supports tool use, structured outputs, and function calling, making it a solid choice for lightweight agentic pipelines, chatbots, and latency-sensitive API integrations where cost and throughput matter most. Liquid AI Liquid AI: LFM2.5-1.2B-Thinking https://developer.puter.com/ai/liquid/lfm-2.5-1.2b-thinking/ https://developer.puter.com/ai/liquid/lfm-2.5-1.2b-thinking/ Tue, 20 Jan 2026 00:00:00 GMT LFM 2.5 1.2B Thinking is a compact reasoning model from Liquid AI that generates explicit chain-of-thought traces before producing answers, enabling more reliable performance on multi-step problems at the 1-billion-parameter scale. Compared to its instruct sibling, it shows major benchmark gains in math reasoning (MATH-500: 88 vs. 63), instruction following (Multi-IF: 69 vs. 61), and tool use (BFCLv3: 57 vs. 49). It matches or exceeds Qwen3-1.7B on most reasoning benchmarks despite having 40% fewer parameters. Well-suited for API use cases involving agentic tool calling, math, and code — anywhere a reasoning trace meaningfully improves answer quality. Liquid AI Z.AI: GLM 4.7 Flash https://developer.puter.com/ai/z-ai/glm-4.7-flash/ https://developer.puter.com/ai/z-ai/glm-4.7-flash/ Mon, 19 Jan 2026 00:00:00 GMT GLM 4.7 Flash is designed for speed and efficiency while maintaining strong performance. It features a 200K token context window, making it suitable for processing long documents and generating extended responses. 
Z.AI Z.AI: GLM 4.7 FlashX https://developer.puter.com/ai/z-ai/glm-4.7-flashx/ https://developer.puter.com/ai/z-ai/glm-4.7-flashx/ Mon, 19 Jan 2026 00:00:00 GMT GLM-4.7-FlashX is the fastest inference tier in Z.ai's GLM-4.7 generation, offering the lowest latency in the lineup. It shares the 200K-token context window and core improvements of the 4.7 generation — stronger coding, tool usage, multi-step reasoning, and natural conversational tone — while trading peak capability for maximum speed. The full GLM-4.7 model scores 73.8% on SWE-bench Verified, 84.9% on LiveCodeBench, and 95.7% on AIME 2025. FlashX inherits the same foundational training but is the right pick when response time matters more than squeezing out every point of accuracy. Targets high-throughput coding assistance, real-time agent orchestration, and latency-sensitive chat where the standard GLM-4.7 or GLM-4.7-Flash would be too slow for the concurrency requirements. Z.AI OpenAI: GPT Audio https://developer.puter.com/ai/openai/gpt-audio/ https://developer.puter.com/ai/openai/gpt-audio/ Mon, 19 Jan 2026 00:00:00 GMT OpenAI OpenAI: GPT Audio Mini https://developer.puter.com/ai/openai/gpt-audio-mini/ https://developer.puter.com/ai/openai/gpt-audio-mini/ Mon, 19 Jan 2026 00:00:00 GMT OpenAI Black Forest Labs: FLUX.2 [klein] 4B https://developer.puter.com/ai/black-forest-labs/flux-2-klein-4b/ https://developer.puter.com/ai/black-forest-labs/flux-2-klein-4b/ Thu, 15 Jan 2026 00:00:00 GMT FLUX.2 Klein 4B is a compact, Apache 2.0 licensed model distilled from the full FLUX.2 architecture, capable of sub-second image generation on consumer GPUs with ~13GB VRAM. It supports text-to-image, image editing, and multi-reference generation in a unified model. 
Black Forest Labs Black Forest Labs: FLUX.2 [klein] 9B https://developer.puter.com/ai/black-forest-labs/flux-2-klein-9b-base/ https://developer.puter.com/ai/black-forest-labs/flux-2-klein-9b-base/ Thu, 15 Jan 2026 00:00:00 GMT FLUX.2 Klein 9B is a larger variant of the Klein family built on a 9B flow model with an 8B Qwen3 text embedder, matching or exceeding models 5x its size in quality. It offers higher output diversity than the 4B distilled variant and is ideal for fine-tuning and research. Black Forest Labs Wan AI: Wan2.6 Image https://developer.puter.com/ai/wan-ai/wan2.6-image/ https://developer.puter.com/ai/wan-ai/wan2.6-image/ Thu, 25 Dec 2025 00:00:00 GMT Wan 2.6 Image is a 20-billion-parameter diffusion-based image generation and transformation model developed by Alibaba Cloud. Built on a Multimodal Diffusion Transformer (MMDiT) architecture, it supports text-to-image generation, image-to-image editing, and multi-reference style transfer. The model accepts up to three reference images per request, allowing developers to explicitly control style, subject, and composition by referencing inputs in the prompt (e.g., "image 1" for style, "image 2" for background). It generates outputs up to 2048×2048 pixels across a wide range of aspect ratios. Wan 2.6 Image is particularly strong at localized content generation, with sophisticated understanding of Asian cultural contexts and aesthetics. It's well suited for e-commerce product visualization, brand asset creation, marketing materials, and automated content pipelines where controllability and visual consistency matter more than pure artistic exploration. Wan AI MiniMax: MiniMax M2.1 https://developer.puter.com/ai/minimax/minimax-m2.1/ https://developer.puter.com/ai/minimax/minimax-m2.1/ Tue, 23 Dec 2025 00:00:00 GMT MiniMax-M2.1 is an enhanced version of M2 with significantly improved multi-language programming capabilities and office scenario support. 
It features more concise responses, better instruction following, and matches or exceeds Claude Sonnet 4.5 on coding benchmarks while maintaining excellent agent/tool scaffolding generalization. MiniMax ByteDance Seed: Seed 1.6 https://developer.puter.com/ai/bytedance-seed/seed-1.6/ https://developer.puter.com/ai/bytedance-seed/seed-1.6/ Tue, 23 Dec 2025 00:00:00 GMT Seed 1.6 is a general-purpose multimodal AI model by ByteDance featuring adaptive deep thinking, a 256K context window, and a sparse Mixture-of-Experts architecture with 230B total parameters (23B active per forward pass). ByteDance Seed ByteDance Seed: Seed 1.6 Flash https://developer.puter.com/ai/bytedance-seed/seed-1.6-flash/ https://developer.puter.com/ai/bytedance-seed/seed-1.6-flash/ Tue, 23 Dec 2025 00:00:00 GMT Seed 1.6 Flash is an ultra-fast multimodal model by ByteDance optimized for high-throughput and low-latency inference, supporting text, image, and video inputs with a 256K context window and up to 16K output tokens. ByteDance Seed Google: Gemini 3 Flash https://developer.puter.com/ai/google/gemini-3-flash-preview/ https://developer.puter.com/ai/google/gemini-3-flash-preview/ Wed, 17 Dec 2025 00:00:00 GMT Gemini 3 Flash is Google's frontier intelligence model built for speed, combining Pro-grade reasoning with Flash-level latency at a fraction of the cost. It excels at agentic coding, complex analysis, and multimodal understanding with configurable thinking levels. Google Mistral AI: Ministral 14B https://developer.puter.com/ai/mistralai/ministral-14b-2512/ https://developer.puter.com/ai/mistralai/ministral-14b-2512/ Tue, 16 Dec 2025 00:00:00 GMT Ministral 14B is a 14B-parameter multimodal model with vision capabilities from the Ministral 3 family, released under Apache 2.0. It targets local deployment and comes in instruct, base, and reasoning variants, with the reasoning variant achieving 85% on AIME'25. 
Mistral AI Mistral AI: Ministral 3B https://developer.puter.com/ai/mistralai/ministral-3b-2512/ https://developer.puter.com/ai/mistralai/ministral-3b-2512/ Tue, 16 Dec 2025 00:00:00 GMT Ministral 3B is a compact 3B parameter multimodal model from the Ministral 3 family with vision capabilities. It runs on consumer hardware and edge devices, offering text and image understanding with 256K context in a 3-4GB quantized footprint. Mistral AI Mistral AI: Ministral 8B https://developer.puter.com/ai/mistralai/ministral-8b-2512/ https://developer.puter.com/ai/mistralai/ministral-8b-2512/ Tue, 16 Dec 2025 00:00:00 GMT Ministral 8B is an 8B parameter multimodal model offering best-in-class text and vision capabilities for edge deployment. It supports single-GPU operation and provides an optimal balance of performance and efficiency under Apache 2.0. Mistral AI Mistral AI: Mistral Small Creative https://developer.puter.com/ai/mistralai/mistral-small-creative/ https://developer.puter.com/ai/mistralai/mistral-small-creative/ Tue, 16 Dec 2025 00:00:00 GMT Mistral Small Creative is a specialized Labs model variant optimized for creative content generation. It builds on the Mistral Small architecture with adjustments for more imaginative and varied outputs in writing tasks. Mistral AI Xiaomi: MiMo-V2-Flash https://developer.puter.com/ai/xiaomi/mimo-v2-flash/ https://developer.puter.com/ai/xiaomi/mimo-v2-flash/ Sun, 14 Dec 2025 00:00:00 GMT MiMo-V2-Flash is Xiaomi's open-source Mixture-of-Experts language model with 309B total parameters (15B active), designed for high-speed reasoning, coding, and agentic workflows. It uses a hybrid attention architecture with Multi-Token Prediction to achieve up to 150 tokens/second inference while keeping costs extremely low. The model excels at software engineering benchmarks and supports a 256K context window. 
Xiaomi NVIDIA: Nemotron 3 Nano 30B A3B https://developer.puter.com/ai/nvidia/nemotron-3-nano-30b-a3b/ https://developer.puter.com/ai/nvidia/nemotron-3-nano-30b-a3b/ Sun, 14 Dec 2025 00:00:00 GMT Nemotron 3 Nano 30B A3B is a 31.6B total parameter (3.2B active) hybrid Mamba-Transformer MoE model trained from scratch by NVIDIA with a 1M token context window. It offers up to 3.3x higher throughput than comparable models and supports configurable reasoning traces for both agentic and conversational tasks. NVIDIA Allen AI: Molmo2 8B https://developer.puter.com/ai/allenai/molmo-2-8b/ https://developer.puter.com/ai/allenai/molmo-2-8b/ Sun, 14 Dec 2025 00:00:00 GMT Molmo 2 8B is an open vision-language model from the Allen Institute for AI (AI2), built on a Qwen3-8B language backbone with a SigLIP 2 vision encoder. It supports single images, multi-image inputs, and video clips. On its 11-benchmark image average, Molmo 2 8B leads all open-weight models in its class. It achieves 32.9% on video pointing versus 17% for Gemini 2.5 Pro, and tops open-weight scores across seven video benchmarks including Video-MME and MVBench. It also outperforms the original Molmo 72B on grounding tasks despite being far smaller. A strong choice for multimodal applications requiring precise spatial reasoning, visual grounding, or video understanding via API. Allen AI Z.AI: AutoGLM Phone Multilingual https://developer.puter.com/ai/z-ai/autoglm-phone-multilingual/ https://developer.puter.com/ai/z-ai/autoglm-phone-multilingual/ Thu, 11 Dec 2025 00:00:00 GMT AutoGLM Phone Multilingual is a 9B-parameter vision-language model from Z.ai purpose-built for autonomous smartphone control. It takes a screenshot of a phone screen, interprets the UI through multimodal perception, and outputs precise actions — taps, swipes, text input — to complete multi-step tasks described in natural language. 
The multilingual variant extends coverage beyond Chinese-optimized apps to English and other languages, making it suitable for international mobile automation workflows. Its architecture is based on GLM-4.1V-9B-Thinking, and it supports a 66K-token context window. Ideal for developers building mobile testing pipelines, phone-based AI assistants, or cross-app automation agents. Devices are controlled via ADB (Android) or HDC (HarmonyOS), with the model callable through a standard chat completions API. Z.AI OpenAI: GPT-5.2 https://developer.puter.com/ai/openai/gpt-5.2/ https://developer.puter.com/ai/openai/gpt-5.2/ Thu, 11 Dec 2025 00:00:00 GMT GPT-5.2 is OpenAI's flagship model for professional knowledge work and coding, outperforming industry professionals on GDPval across 44 occupations. It excels at spreadsheets, presentations, code, and complex multi-step projects. OpenAI OpenAI: GPT-5.2 Chat https://developer.puter.com/ai/openai/gpt-5.2-chat/ https://developer.puter.com/ai/openai/gpt-5.2-chat/ Thu, 11 Dec 2025 00:00:00 GMT GPT-5.2 Chat is the ChatGPT-optimized variant of GPT-5.2 with an August 2025 knowledge cutoff. It handles conversational interactions using world knowledge up to that cutoff, so web search is needed only for more recent information. OpenAI OpenAI: GPT-5.2 Codex https://developer.puter.com/ai/openai/gpt-5.2-codex/ https://developer.puter.com/ai/openai/gpt-5.2-codex/ Thu, 11 Dec 2025 00:00:00 GMT GPT-5.2 Codex is OpenAI's most advanced agentic coding model for professional software engineering and defensive cybersecurity. It achieves state-of-the-art results on SWE-Bench Pro, with improved long-horizon work through context compaction. OpenAI OpenAI: GPT-5.2 Pro https://developer.puter.com/ai/openai/gpt-5.2-pro/ https://developer.puter.com/ai/openai/gpt-5.2-pro/ Thu, 11 Dec 2025 00:00:00 GMT GPT-5.2 Pro is a version of GPT-5.2 that thinks longer to produce smarter and more precise responses for challenging problems. It supports medium, high, and xhigh reasoning effort settings. 
OpenAI Google: Gemini 3 Pro Image https://developer.puter.com/ai/google/gemini-3-pro-image/ https://developer.puter.com/ai/google/gemini-3-pro-image/ Thu, 11 Dec 2025 00:00:00 GMT Gemini 3 Pro Image (Nano Banana Pro) is Google's most advanced image generation and editing model built on Gemini 3 Pro, featuring studio-quality output with support for 2K/4K resolution. It excels at accurate text rendering in multiple languages, uses Google Search grounding for real-time data, and employs thinking mode for complex reasoning through prompts. Google Allen AI: Olmo 3.1 32B Instruct https://developer.puter.com/ai/allenai/olmo-3.1-32b-instruct/ https://developer.puter.com/ai/allenai/olmo-3.1-32b-instruct/ Wed, 10 Dec 2025 00:00:00 GMT OLMo 3.1 32B Instruct is a fully open instruction-tuned language model from the Allen Institute for AI (AI2), designed for chat, tool use, and multi-turn dialogue at the 32B parameter scale. AI2 positions it as the most capable fully open 32B-scale instruct model, with strong performance on math (GSM8K, MATH), coding (HumanEval, MBPP+), and instruction-following (IFEval). It uses a hybrid attention architecture and maintains strong long-context retrieval performance (96.1 on RULER at 4K). Released under Apache 2.0 with full data and training transparency, it's a well-rounded choice for instruction-following, tool-augmented, or multi-turn chat applications. Allen AI Allen AI: Olmo 3.1 32B Think https://developer.puter.com/ai/allenai/olmo-3.1-32b-think/ https://developer.puter.com/ai/allenai/olmo-3.1-32b-think/ Wed, 10 Dec 2025 00:00:00 GMT OLMo 3.1 32B Think is the updated flagship reasoning model from the Allen Institute for AI (AI2), an improved successor to OLMo 3 32B Think with an additional 21 days of extended reinforcement learning training. The extended training yielded gains of 5+ points on AIME, 4+ points on ZebraLogic and IFEval, and 20+ points on IFBench over its predecessor. 
It supports a 64K context window and is licensed under Apache 2.0 with full training transparency. For API developers needing a high-performance open reasoning model for math, code, and complex instruction-following, OLMo 3.1 32B Think is AI2's most capable reasoning offering, competitive with Qwen 3 32B at the same scale. Allen AI Z.AI: GLM 4.6V https://developer.puter.com/ai/z-ai/glm-4.6v/ https://developer.puter.com/ai/z-ai/glm-4.6v/ Tue, 09 Dec 2025 00:00:00 GMT GLM-4.6V is a 106B vision-language model featuring native multimodal function calling, billed as the first to pass images directly as tool inputs. It supports 128K context for processing 150+ page documents or 1-hour videos in a single pass. Z.AI Mistral AI: Devstral 2 https://developer.puter.com/ai/mistralai/devstral-2512/ https://developer.puter.com/ai/mistralai/devstral-2512/ Tue, 09 Dec 2025 00:00:00 GMT Devstral 2 is a 123B parameter dense transformer coding model achieving 72.2% on SWE-bench Verified with 256K context. Released under a modified MIT license, it is positioned as the state-of-the-art open model for code agents and is described as 7x more cost-efficient than Claude Sonnet. Mistral AI Z.AI: GLM 4.6V Flash https://developer.puter.com/ai/z-ai/glm-4.6v-flash/ https://developer.puter.com/ai/z-ai/glm-4.6v-flash/ Mon, 08 Dec 2025 00:00:00 GMT GLM-4.6V-Flash is a 9B-parameter vision-language model from Z.ai, the lightweight variant of the GLM-4.6V series. It supports a 128K-token context window and processes images, documents, charts, video frames, and text within a single request. Its key differentiator is native multimodal function calling: images and screenshots can be passed directly as tool parameters, and visual tool outputs are consumed in the same reasoning chain. This bridges the gap between visual perception and executable action for multimodal agent workflows. 
Best for latency-sensitive and cost-conscious applications that need vision-language capabilities: document understanding pipelines, UI-to-code conversion, visual QA, and multimodal agent loops. For maximum accuracy on complex visual reasoning, the full 106B GLM-4.6V model is available. Z.AI Z.AI: GLM 4.6V FlashX https://developer.puter.com/ai/z-ai/glm-4.6v-flashx/ https://developer.puter.com/ai/z-ai/glm-4.6v-flashx/ Mon, 08 Dec 2025 00:00:00 GMT GLM-4.6V-FlashX is the fastest inference tier in Z.ai's GLM-4.6V vision-language model series. Built on the same 9B-parameter architecture as GLM-4.6V-Flash, it shares the 128K-token context window and native multimodal function calling capabilities but is further optimized for throughput and minimal latency. It supports vision input, reasoning, tool use, and structured JSON output — the same feature set as GLM-4.6V-Flash with higher concurrency limits and faster response times. Ideal for high-volume visual processing pipelines where per-request latency is critical: real-time document scanning, automated UI testing at scale, or multimodal chat applications that need vision understanding without waiting on a larger model. Z.AI Relace: Relace Search https://developer.puter.com/ai/relace/relace-search/ https://developer.puter.com/ai/relace/relace-search/ Mon, 08 Dec 2025 00:00:00 GMT Relace Search is an agentic codebase search model that uses 4-12 parallel tool calls (view_file, grep) to explore repositories and return relevant files. It performs multi-step reasoning to produce precise results 4x faster than frontier models, designed to work as a subagent for coding workflows. 
Relace Nex AGI: DeepSeek V3.1 Nex N1 https://developer.puter.com/ai/nex-agi/deepseek-v3.1-nex-n1/ https://developer.puter.com/ai/nex-agi/deepseek-v3.1-nex-n1/ Mon, 08 Dec 2025 00:00:00 GMT DeepSeek V3.1 Nex N1 is an agentic large language model post-trained by Nex AGI on top of DeepSeek's V3.1 base, built specifically for autonomous task execution, tool use, and multi-step workflows. It uses a 670B-parameter Mixture of Experts architecture with 37B activated parameters per token and supports a 131K context window. The model is optimized for agent-oriented use cases: function calling, web search integration, code generation, and complex planning tasks. It performs well on agentic benchmarks including SWE-bench, GAIA 2, BFCL, and Terminal-Bench, with particular strength in practical coding and HTML generation. Nex N1 is a strong pick for developers building AI agents, research assistants, or automated pipelines that need reliable tool use and multi-hop reasoning at an accessible price point. Nex AGI OpenAI: GPT-5.1 Codex Max https://developer.puter.com/ai/openai/gpt-5.1-codex-max/ https://developer.puter.com/ai/openai/gpt-5.1-codex-max/ Thu, 04 Dec 2025 00:00:00 GMT GPT-5.1 Codex Max is OpenAI's frontier agentic coding model built for long-running, detailed work using context compaction. It's the first model trained to operate across multiple context windows coherently. OpenAI Mistral AI: Mistral Large 3 https://developer.puter.com/ai/mistralai/mistral-large-2512/ https://developer.puter.com/ai/mistralai/mistral-large-2512/ Tue, 02 Dec 2025 00:00:00 GMT Mistral Large 3 is a 675B parameter sparse MoE model (41B active) trained on 3000 H200 GPUs, representing Mistral's frontier open-weight multimodal model. It supports 256K context, native vision, and excels in agentic workflows and enterprise applications. 
Mistral AI Amazon: Nova 2 Lite https://developer.puter.com/ai/amazon/nova-2-lite-v1/ https://developer.puter.com/ai/amazon/nova-2-lite-v1/ Tue, 02 Dec 2025 00:00:00 GMT Amazon Nova 2 Lite is a fast, cost-effective multimodal reasoning model for everyday workloads that processes text, images, and video with a 1M token context window. It features extended thinking with adjustable reasoning intensity (low/medium/high) and built-in tools for web grounding and code execution. Released in December 2025, it excels at document processing, customer service chatbots, and agentic workflows. Amazon Z.AI: GLM 4.7 https://developer.puter.com/ai/z-ai/glm-4.7/ https://developer.puter.com/ai/z-ai/glm-4.7/ Mon, 01 Dec 2025 00:00:00 GMT GLM-4.7 is Zhipu AI's (Z.ai) latest ~400B flagship, released in December 2025 and optimized for coding with a 200K context window and 128K output. It scores 73.8% on SWE-bench Verified and 95.7% on AIME 2025. Z.AI OpenAI: GPT Image 1.5 https://developer.puter.com/ai/openai/gpt-image-1.5/ https://developer.puter.com/ai/openai/gpt-image-1.5/ Mon, 01 Dec 2025 00:00:00 GMT GPT Image 1.5 is OpenAI's latest and most advanced image generation model released in December 2025, offering better instruction following, precise editing, and up to 4x faster generation than GPT Image 1. It maintains details during edits, improves on premature cropping and color bias issues, and is 20% cheaper than its predecessor. This model powers the ChatGPT Images feature and represents the current state-of-the-art in OpenAI's image generation lineup. OpenAI DeepSeek: DeepSeek V3.2 https://developer.puter.com/ai/deepseek/deepseek-v3.2/ https://developer.puter.com/ai/deepseek/deepseek-v3.2/ Mon, 01 Dec 2025 00:00:00 GMT DeepSeek V3.2 is the December 2025 flagship model featuring DeepSeek Sparse Attention for efficiency and massive reinforcement learning post-training, achieving GPT-5-level performance. It's the first DeepSeek model to integrate thinking directly into tool use and excels at agentic AI tasks. 
DeepSeek DeepSeek: DeepSeek V3.2 Speciale https://developer.puter.com/ai/deepseek/deepseek-v3.2-speciale/ https://developer.puter.com/ai/deepseek/deepseek-v3.2-speciale/ Mon, 01 Dec 2025 00:00:00 GMT DeepSeek V3.2-Speciale is a high-compute variant designed exclusively for maximum reasoning accuracy, achieving gold-medal performance in IMO 2025, IOI 2025, and ICPC World Finals. It rivals Gemini 3.0 Pro but requires higher token usage and doesn't support tool calling. DeepSeek Arcee AI: Trinity Mini https://developer.puter.com/ai/arcee-ai/trinity-mini/ https://developer.puter.com/ai/arcee-ai/trinity-mini/ Mon, 01 Dec 2025 00:00:00 GMT Trinity Mini is a 26-billion-parameter sparse Mixture-of-Experts model from Arcee AI, with approximately 3B active parameters per token. It uses 128 experts with 8 active per token, blending global sparsity with gated attention techniques. Specifically tuned for multi-turn agent workflows, tool orchestration, function calling, and structured outputs, it scores 84.95 on MMLU and 59.67 on BFCL V3, with throughput exceeding 200 tokens per second. Released under Apache 2.0, the 128k context window and strong function-calling performance make Trinity Mini a practical choice for agentic systems, backend automation, and tool-use pipelines where inference speed and cost efficiency matter. Arcee AI Prime Intellect: INTELLECT-3 https://developer.puter.com/ai/prime-intellect/intellect-3/ https://developer.puter.com/ai/prime-intellect/intellect-3/ Thu, 27 Nov 2025 00:00:00 GMT INTELLECT-3 is a 106B-parameter Mixture-of-Experts reasoning model from Prime Intellect, with 12B active parameters per forward pass. It was post-trained from GLM-4.5-Air-Base using supervised fine-tuning followed by large-scale reinforcement learning. The model excels at math, code, science, and multi-step reasoning tasks. 
It scores 98.1% on MATH-500, 90.8% on AIME 2024, 69.3% on LiveCodeBench v6, and 74.4% on GPQA Diamond, outperforming the base GLM-4.5-Air it was trained from and competing with larger frontier models. Its MoE architecture keeps inference efficient despite the large total parameter count, making it a strong choice for developers who need high reasoning performance without the cost profile of much larger dense models. It is fully open-weight under the MIT license and has a 131K token context window. Prime Intellect Allen AI: Olmo 3 32B Think https://developer.puter.com/ai/allenai/olmo-3-32b-think/ https://developer.puter.com/ai/allenai/olmo-3-32b-think/ Fri, 21 Nov 2025 00:00:00 GMT OLMo 3 32B Think is a fully open reasoning model from the Allen Institute for AI (AI2), and the first fully open 32B thinking model to expose intermediate chain-of-thought reasoning traces. Trained on multi-step math, code, and general problem-solving tasks using a thinking SFT, DPO, and RLVR training flow, it is the strongest fully open reasoning model at the 32B scale, narrowing the gap to open-weight models like Qwen 3-32B-Think while being trained on 6x fewer tokens. All training data, code, weights, and checkpoints are publicly available under Apache 2.0. It is best suited for complex multi-step reasoning and mathematical problem solving via API. Allen AI xAI: Grok 4.1 Fast https://developer.puter.com/ai/x-ai/grok-4-1-fast/ https://developer.puter.com/ai/x-ai/grok-4-1-fast/ Wed, 19 Nov 2025 00:00:00 GMT Grok 4.1 Fast is xAI's best tool-calling model, released in November 2025, featuring a 2M context window and halved hallucination rates versus Grok 4 Fast. It comes in reasoning and non-reasoning modes and is optimized for agentic workflows with native support for web search, X search, and code execution. 
xAI xAI: Grok 4.1 Fast Non-Reasoning https://developer.puter.com/ai/x-ai/grok-4-1-fast-non-reasoning/ https://developer.puter.com/ai/x-ai/grok-4-1-fast-non-reasoning/ Wed, 19 Nov 2025 00:00:00 GMT Grok 4.1 Fast Non-Reasoning is the low-latency, non-reasoning variant of Grok 4.1 Fast that skips extended chain-of-thought for speed-critical applications. It shares the same model weights and 2M context window as Grok 4.1 Fast but delivers instant responses without deliberation overhead, ideal for real-time customer support and streaming interactions. xAI Allen AI: Olmo 3 7B Instruct https://developer.puter.com/ai/allenai/olmo-3-7b-instruct/ https://developer.puter.com/ai/allenai/olmo-3-7b-instruct/ Wed, 19 Nov 2025 00:00:00 GMT OLMo 3 7B Instruct is a lightweight, fully open instruction-tuned chat model from the Allen Institute for AI (AI2), designed for instruction-following, question-answering, and multi-turn conversational dialogue. Among 7B-scale models, it is competitive with Qwen 2.5 and Gemma 3 equivalents, and represents a clear step up from Llama 3.1 8B in instruction-following quality. It supports a 66K token context window with a knowledge cutoff of December 2024. Released under a fully open license with complete training weights and data publicly available, it's well-suited for cost-efficient API usage where a capable small model is preferred. Allen AI Google: Gemini 3 Pro https://developer.puter.com/ai/google/gemini-3-pro-preview/ https://developer.puter.com/ai/google/gemini-3-pro-preview/ Tue, 18 Nov 2025 00:00:00 GMT Gemini 3 Pro is Google's most intelligent model, delivering state-of-the-art performance in reasoning, multimodal understanding, and agentic coding. It handles text, images, video, audio, and code with a 1M token context window and advanced tool-calling capabilities. 
Google OpenAI: GPT-5.1 https://developer.puter.com/ai/openai/gpt-5.1/ https://developer.puter.com/ai/openai/gpt-5.1/ Thu, 13 Nov 2025 00:00:00 GMT GPT-5.1 is OpenAI's model that dynamically adapts reasoning time based on task complexity, making it faster and more token-efficient on simpler tasks. It features 8 customizable personalities and supports multimodal inputs. OpenAI OpenAI: GPT-5.1 Chat https://developer.puter.com/ai/openai/gpt-5.1-chat/ https://developer.puter.com/ai/openai/gpt-5.1-chat/ Thu, 13 Nov 2025 00:00:00 GMT GPT-5.1 Chat is the conversational variant of GPT-5.1 used in ChatGPT with a warmer personality by default. It's available as gpt-5.1-chat-latest in the API for non-reasoning chat interactions. OpenAI OpenAI: GPT-5.1 Codex https://developer.puter.com/ai/openai/gpt-5.1-codex/ https://developer.puter.com/ai/openai/gpt-5.1-codex/ Thu, 13 Nov 2025 00:00:00 GMT GPT-5.1 Codex is a version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments. It's designed for long-running coding workflows with enhanced code generation capabilities. OpenAI OpenAI: GPT-5.1 Codex Mini https://developer.puter.com/ai/openai/gpt-5.1-codex-mini/ https://developer.puter.com/ai/openai/gpt-5.1-codex-mini/ Thu, 13 Nov 2025 00:00:00 GMT GPT-5.1 Codex Mini is a smaller, more cost-effective version of GPT-5.1 Codex providing approximately 4x more usage within subscription limits. It balances coding capability with efficiency. OpenAI Deep Cogito: Cogito v2.1 671B https://developer.puter.com/ai/deepcogito/cogito-v2.1-671b/ https://developer.puter.com/ai/deepcogito/cogito-v2.1-671b/ Thu, 13 Nov 2025 00:00:00 GMT Cogito v2.1 671B is a 671-billion-parameter Mixture-of-Experts language model from DeepCogito, with 37 billion parameters active per forward pass. Built with reinforcement learning via self-play, it excels at instruction following, coding, complex reasoning, multi-turn dialogue, and creative writing. 
It features a hybrid reasoning mode that produces results comparable to or better than DeepSeek R1 while using roughly 60% fewer reasoning tokens. It supports a 128K context window and 30+ languages. Benchmark highlights include 98.57% on MATH-500, 77.72% on GPQA Diamond, 84.69% on MMLU Pro, and 89.47% on AIME 2025. A strong open-weight choice for agents, coding assistants, or math-heavy applications needing frontier-level performance with token-efficient reasoning. Deep Cogito Moonshot AI: Kimi K2 Thinking https://developer.puter.com/ai/moonshotai/kimi-k2-thinking/ https://developer.puter.com/ai/moonshotai/kimi-k2-thinking/ Thu, 06 Nov 2025 00:00:00 GMT Kimi K2 Thinking is Moonshot AI's reasoning-enhanced variant of Kimi K2, trained to interleave step-by-step chain-of-thought with dynamic tool calls. It supports up to 200–300 sequential tool calls without drift, enabling deep autonomous research, coding, and analysis workflows. It achieves 71.3% on SWE-bench Verified, 44.9% on Humanity's Last Exam (with tools), 60.2% on BrowseComp, and 99.1% on AIME 2025 (with Python) — placing it among the top open-source thinking models. It uses native INT4 quantization and a 256K context window. K2 Thinking is designed for complex, multi-step tasks where extended reasoning and sustained tool orchestration matter more than low-latency responses. Moonshot AI Anthropic: Claude Opus 4.5 https://developer.puter.com/ai/anthropic/claude-opus-4-5/ https://developer.puter.com/ai/anthropic/claude-opus-4-5/ Sat, 01 Nov 2025 00:00:00 GMT Claude Opus 4.5 was released in November 2025. It sets the standard for production code, sophisticated agents, and complex enterprise tasks—scoring higher than Anthropic's own engineering candidates on technical tests. 
Anthropic Amazon: Nova Premier 1.0 https://developer.puter.com/ai/amazon/nova-premier-v1/ https://developer.puter.com/ai/amazon/nova-premier-v1/ Fri, 31 Oct 2025 00:00:00 GMT Amazon Nova Premier is the most capable multimodal model in the Nova family, designed for complex reasoning tasks requiring the highest accuracy. It processes text, images, and video with advanced understanding capabilities and serves as the best teacher model for distilling custom variants of smaller Nova models. Best suited for sophisticated enterprise applications demanding top-tier intelligence. Amazon Allen AI: Olmo 3 7B Think https://developer.puter.com/ai/allenai/olmo-3-7b-think/ https://developer.puter.com/ai/allenai/olmo-3-7b-think/ Fri, 31 Oct 2025 00:00:00 GMT OLMo 3 7B Think is an efficient reasoning model from the Allen Institute for AI (AI2), purpose-built for multi-step problem solving in math, coding, and general analytical tasks. Trained using a thinking SFT, thinking DPO, and RLVR pipeline, it generates structured chain-of-thought reasoning traces. On math benchmarks it matches Qwen 3 8B on MATH and comes within a few points on AIME 2024 and 2025. On coding, it leads similarly-sized models on HumanEvalPlus. Fully open under a permissive license, it is the most capable fully open reasoning option at the 7B scale — ideal for API use cases requiring strong reasoning at a small model footprint. Allen AI Perplexity: Sonar Pro Search https://developer.puter.com/ai/perplexity/sonar-pro-search/ https://developer.puter.com/ai/perplexity/sonar-pro-search/ Thu, 30 Oct 2025 00:00:00 GMT Sonar Pro Search is Perplexity's most advanced agentic search system, available exclusively via OpenRouter API, adding autonomous multi-step reasoning to Sonar Pro. Instead of single query+synthesis, it plans and executes entire research workflows using tools, making it ideal for deeper reasoning and analysis. This model powers Perplexity's Pro Search mode on their consumer platform. 
Perplexity Mistral AI: Voxtral Small 24B https://developer.puter.com/ai/mistralai/voxtral-small-24b-2507/ https://developer.puter.com/ai/mistralai/voxtral-small-24b-2507/ Thu, 30 Oct 2025 00:00:00 GMT Voxtral Small 24B is an open-source speech understanding model built on Mistral Small 3.1 under Apache 2.0. It handles transcription, translation, Q&A, and summarization directly from audio in 8+ languages with 32K token context. Mistral AI OpenAI: GPT-OSS Safeguard 20B https://developer.puter.com/ai/openai/gpt-oss-safeguard-20b/ https://developer.puter.com/ai/openai/gpt-oss-safeguard-20b/ Wed, 29 Oct 2025 00:00:00 GMT GPT-OSS Safeguard 20B is a safety-focused variant of the 20B open-weight model with additional content moderation capabilities. It includes enhanced safeguards for responsible deployment. OpenAI IBM Granite: Granite 4.0 Micro https://developer.puter.com/ai/ibm-granite/granite-4.0-h-micro/ https://developer.puter.com/ai/ibm-granite/granite-4.0-h-micro/ Mon, 20 Oct 2025 00:00:00 GMT Granite 4.0 Micro is a 3B-parameter dense language model from IBM, built on a conventional transformer architecture and optimized for low-latency, cost-efficient workloads. Despite its compact size, it significantly outperforms its predecessor Granite 3.3 8B, a model more than twice its size, across the board. It scores 16 on the Artificial Analysis Intelligence Index, placing ahead of Gemma 3 4B (15). In RAG benchmarks, it outperforms much larger models including Llama 3.3 70B and Qwen3 8B. The model natively supports tool calling, function calling, multilingual generation, fill-in-the-middle code completion, RAG, and structured JSON output, with a 128K token context window. It's a strong fit for agentic sub-tasks, API orchestration, and scenarios where speed and cost matter more than peak reasoning power. 
IBM Granite Microsoft: Phi 4 Mini https://developer.puter.com/ai/microsoft/phi-4-mini-instruct/ https://developer.puter.com/ai/microsoft/phi-4-mini-instruct/ Fri, 17 Oct 2025 00:00:00 GMT Phi-4 Mini is a 3.8-billion-parameter small language model developed by Microsoft, designed to deliver strong reasoning performance in a compact form factor. It uses a dense decoder-only Transformer architecture with grouped-query attention and supports context lengths up to 128K tokens. The model excels at math, logic, coding, instruction following, and function calling — making it well-suited for agentic workflows that integrate external tools and APIs. It supports over 20 languages thanks to its expanded 200K-token vocabulary. Despite its small size, Phi-4 Mini performs competitively with much larger models on text-based reasoning tasks. It scored 88.6% on GSM8K and 83.7% on ARC-Challenge. It's a strong choice for developers who need capable reasoning at low latency and minimal compute cost. Microsoft Google: Veo 3.1 https://developer.puter.com/ai/google/veo-3.1/ https://developer.puter.com/ai/google/veo-3.1/ Wed, 15 Oct 2025 00:00:00 GMT Veo 3.1 is Google DeepMind's flagship AI video generation model, offering the highest quality output in the Veo family. It generates up to 4K resolution video with natively synchronized audio — including dialogue, sound effects, and ambient noise — all produced in a single joint diffusion process. The model supports text-to-video, image-to-video, reference image guidance (up to 3 images), and frame-to-frame generation. Clips are 8 seconds at base, extendable to over 60 seconds via scene chaining. Both 16:9 and native 9:16 aspect ratios are supported. Lip-sync accuracy sits under 120ms. Veo 3.1 achieved top human-preference scores on MovieGenBench for prompt adherence, visual quality, and audio sync, and state-of-the-art results on VBench I2V. It was the first major AI video model to support true 4K output. 
Best suited for high-fidelity creative and production work where quality is the priority over speed or cost. Google Google: Veo 3.1 Fast https://developer.puter.com/ai/google/veo-3.1-fast/ https://developer.puter.com/ai/google/veo-3.1-fast/ Wed, 15 Oct 2025 00:00:00 GMT Veo 3.1 Fast is the speed-optimized variant of Google DeepMind's Veo 3.1 video model, generating output roughly twice as fast as the standard version with only a minor quality trade-off. Independent testing shows quality differences of 1–8% depending on scene complexity — negligible for most use cases. It retains the full feature set of the standard model: native audio generation, text-to-video, image-to-video, reference images, frame-to-frame generation, and support for 720p, 1080p, and 4K resolutions. An 8-second 720p clip typically completes in 30–45 seconds. At roughly one-fifth the per-second cost of the standard model, Veo 3.1 Fast is well suited for rapid prototyping, iterative prompt testing, and production workflows where turnaround time and budget matter more than maximum fidelity. Google Anthropic: Claude Haiku 4.5 https://developer.puter.com/ai/anthropic/claude-haiku-4-5/ https://developer.puter.com/ai/anthropic/claude-haiku-4-5/ Wed, 15 Oct 2025 00:00:00 GMT Claude Haiku 4.5 is Anthropic's small, fast model released October 2025, optimized for low latency and cost. Despite being the cheapest option ($1/$5 per million tokens), it matches Sonnet 4 on coding benchmarks (73.3% SWE-bench). Anthropic Qwen: Qwen3 VL 8B Thinking https://developer.puter.com/ai/qwen/qwen3-vl-8b-thinking/ https://developer.puter.com/ai/qwen/qwen3-vl-8b-thinking/ Tue, 14 Oct 2025 00:00:00 GMT Qwen3 VL 8B Thinking is the reasoning-enhanced compact vision model for complex visual analysis requiring step-by-step reasoning with efficient resource usage. 
Qwen OpenAI: OpenAI o3 Deep Research https://developer.puter.com/ai/openai/o3-deep-research/ https://developer.puter.com/ai/openai/o3-deep-research/ Fri, 10 Oct 2025 00:00:00 GMT OpenAI o3 Deep Research is a powerful model that searches and synthesizes hundreds of sources to create comprehensive research reports. It's optimized for browsing and data analysis at the level of a research analyst. OpenAI OpenAI: OpenAI o4 Mini Deep Research https://developer.puter.com/ai/openai/o4-mini-deep-research/ https://developer.puter.com/ai/openai/o4-mini-deep-research/ Fri, 10 Oct 2025 00:00:00 GMT OpenAI o4 Mini Deep Research is a faster, more affordable deep research model for complex multi-step research tasks. It can synthesize information from web search and internal data sources. OpenAI NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 https://developer.puter.com/ai/nvidia/llama-3.3-nemotron-super-49b-v1.5/ https://developer.puter.com/ai/nvidia/llama-3.3-nemotron-super-49b-v1.5/ Fri, 10 Oct 2025 00:00:00 GMT Llama 3.3 Nemotron Super 49B v1.5 is an upgraded 49B parameter reasoning model derived from Llama 3.3 70B Instruct, optimized for single-GPU deployment on H100/H200 through Neural Architecture Search. It supports 128K context and is post-trained for agentic workflows including RAG, tool calling, and multi-turn conversations. NVIDIA Baidu: ERNIE 4.5 21B A3B Thinking https://developer.puter.com/ai/baidu/ernie-4.5-21b-a3b-thinking/ https://developer.puter.com/ai/baidu/ernie-4.5-21b-a3b-thinking/ Thu, 09 Oct 2025 00:00:00 GMT ERNIE 4.5 21B A3B Thinking is Baidu's reasoning-enhanced language model built on the 21B A3B architecture with explicit chain-of-thought capabilities. It activates only 3B of its 21B parameters per token while specializing in logic, mathematics, coding, and multi-step reasoning tasks. The model supports extended context up to 131K tokens and is optimized for complex problem-solving through structured thinking. 
Baidu Qwen: Qwen3 VL 30B A3B Instruct https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b-instruct/ https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b-instruct/ Mon, 06 Oct 2025 00:00:00 GMT Qwen3 VL 30B A3B Instruct is an efficient vision-language MoE model offering strong image/video understanding with 3B active parameters and 256K context support. Qwen Qwen: Qwen3 VL 30B A3B Thinking https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b-thinking/ https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b-thinking/ Mon, 06 Oct 2025 00:00:00 GMT Qwen3 VL 30B A3B Thinking is the reasoning-enhanced vision-language variant optimized for complex visual reasoning tasks with extended thinking capabilities. Qwen OpenAI: GPT-5 Pro https://developer.puter.com/ai/openai/gpt-5-pro/ https://developer.puter.com/ai/openai/gpt-5-pro/ Mon, 06 Oct 2025 00:00:00 GMT GPT-5 Pro is an enhanced version of GPT-5 that thinks longer using parallel test-time compute to provide the highest quality answers. It replaces o3-pro for complex enterprise and research tasks. OpenAI OpenAI: GPT Image 1 Mini https://developer.puter.com/ai/openai/gpt-image-1-mini/ https://developer.puter.com/ai/openai/gpt-image-1-mini/ Wed, 01 Oct 2025 00:00:00 GMT GPT Image 1 Mini is OpenAI's cost-optimized image generation model released in October 2025, offering the same capabilities as GPT Image 1 at approximately 80% lower cost. It's designed for high-throughput production use cases where cost and latency are priorities over peak image fidelity. The model trades some fine detail and photorealism for significantly reduced pricing. 
OpenAI ByteDance Seed: Seedream 4.0 https://developer.puter.com/ai/bytedance-seed/seedream-4.0/ https://developer.puter.com/ai/bytedance-seed/seedream-4.0/ Wed, 01 Oct 2025 00:00:00 GMT ByteDance Seed Z.AI: GLM 4.6 https://developer.puter.com/ai/z-ai/glm-4.6/ https://developer.puter.com/ai/z-ai/glm-4.6/ Tue, 30 Sep 2025 00:00:00 GMT GLM-4.6 is Zhipu AI's 355B-parameter (32B active) flagship text model with 200K context, excelling at coding, agentic workflows, and search tasks. It's 15% more token-efficient than GLM-4.5 and ranks as the #1 domestic model in China. Z.AI DeepSeek: DeepSeek V3.2 Exp https://developer.puter.com/ai/deepseek/deepseek-v3.2-exp/ https://developer.puter.com/ai/deepseek/deepseek-v3.2-exp/ Mon, 29 Sep 2025 00:00:00 GMT DeepSeek V3.2-Exp is the September 2025 experimental predecessor to V3.2, introducing DeepSeek Sparse Attention architecture through continued training on V3.1-Terminus. It served as a testing ground for the sparse attention innovations later refined in V3.2. DeepSeek Anthropic: Claude Sonnet 4.5 https://developer.puter.com/ai/anthropic/claude-sonnet-4-5/ https://developer.puter.com/ai/anthropic/claude-sonnet-4-5/ Mon, 29 Sep 2025 00:00:00 GMT Claude Sonnet 4.5 is Anthropic's most capable model for agents and computer use, released September 2025. It can maintain focus for 30+ hours on complex tasks, supports a 1M token context window (beta), and is described as their "most aligned frontier model." Anthropic TheDrummer: Cydonia 24B V4.1 https://developer.puter.com/ai/thedrummer/cydonia-24b-v4.1/ https://developer.puter.com/ai/thedrummer/cydonia-24b-v4.1/ Sat, 27 Sep 2025 00:00:00 GMT Cydonia 24B v4.1 is a 24-billion parameter uncensored creative writing model based on Mistral Small 3.2, optimized for roleplay, storytelling, and long-form narratives with a 131K token context window. It excels at character consistency, descriptive prose without being overly flowery, and maintains good recall and prompt adherence. 
The model also performs well for coding and instruction-following tasks. TheDrummer Relace: Relace Apply 3 https://developer.puter.com/ai/relace/relace-apply-3/ https://developer.puter.com/ai/relace/relace-apply-3/ Fri, 26 Sep 2025 00:00:00 GMT Relace Apply 3 is a specialized code-patching model that merges AI-generated code edits into existing source files at up to 10,000 tokens per second. It supports a 256K context window and works with diffs from models like Claude and GPT-4, making code integration fast and reliable. Relace Google: Gemini 2.5 Flash Lite Preview 09-2025 https://developer.puter.com/ai/google/gemini-2.5-flash-lite-preview-09-2025/ https://developer.puter.com/ai/google/gemini-2.5-flash-lite-preview-09-2025/ Thu, 25 Sep 2025 00:00:00 GMT Gemini 2.5 Flash-Lite Preview (September 2025) is a preview version of Google's cost-optimized Flash-Lite model. It's designed for high-volume classification, translation, and routing tasks with improved cost efficiency. Google Qwen: Qwen3 Max https://developer.puter.com/ai/qwen/qwen3-max/ https://developer.puter.com/ai/qwen/qwen3-max/ Tue, 23 Sep 2025 00:00:00 GMT Qwen3 Max is the most powerful Qwen3 API model, with state-of-the-art agent programming and tool usage capabilities. It features a non-thinking mode optimized for complex agent scenarios. Qwen Qwen: Qwen3 VL 235B A22B Thinking https://developer.puter.com/ai/qwen/qwen3-vl-235b-a22b-thinking/ https://developer.puter.com/ai/qwen/qwen3-vl-235b-a22b-thinking/ Tue, 23 Sep 2025 00:00:00 GMT Qwen3 VL 235B A22B Thinking is the reasoning-enhanced vision-language model excelling at visual math, detail analysis, and causal reasoning with extended chain-of-thought processing. 
Qwen Qwen: Qwen3-VL Plus https://developer.puter.com/ai/qwen/qwen3-vl-plus/ https://developer.puter.com/ai/qwen/qwen3-vl-plus/ Tue, 23 Sep 2025 00:00:00 GMT Qwen3-VL Plus is Alibaba Cloud's hosted vision-language API model in the Qwen3-VL series, offering strong multimodal understanding without requiring self-hosted infrastructure. It handles a wide range of visual tasks including document parsing, chart analysis, OCR, image reasoning, and GUI interaction for PC and mobile interfaces. With a 262K token context window, it is well suited for processing lengthy documents, multi-page PDFs, and extended visual conversations in a single request. The model supports structured output and tool calling, making it a practical choice for developers building document intelligence pipelines, visual agents, and multimodal data extraction workflows via the OpenAI-compatible Alibaba Cloud Model Studio API. Qwen OpenAI: GPT-5 Codex https://developer.puter.com/ai/openai/gpt-5-codex/ https://developer.puter.com/ai/openai/gpt-5-codex/ Tue, 23 Sep 2025 00:00:00 GMT GPT-5 Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments. It's designed for software engineering workflows with enhanced code generation capabilities. OpenAI DeepSeek: DeepSeek V3.1 Terminus https://developer.puter.com/ai/deepseek/deepseek-v3.1-terminus/ https://developer.puter.com/ai/deepseek/deepseek-v3.1-terminus/ Mon, 22 Sep 2025 00:00:00 GMT DeepSeek V3.1-Terminus is the September 2025 refined update to V3.1, addressing user-reported issues like language mixing and improving Code Agent and Search Agent capabilities. It represents the final, most stable version of the V3 architecture before V3.2. 
DeepSeek xAI: Grok 4 Fast https://developer.puter.com/ai/x-ai/grok-4-fast/ https://developer.puter.com/ai/x-ai/grok-4-fast/ Fri, 19 Sep 2025 00:00:00 GMT Grok 4 Fast is an optimized variant delivering performance similar to Grok 4 but using 40% fewer thinking tokens with a massive 2 million token context window. It offers up to 64x cost reduction versus early frontier models like OpenAI's o3. xAI xAI: Grok 4 Fast Non-Reasoning https://developer.puter.com/ai/x-ai/grok-4-fast-non-reasoning/ https://developer.puter.com/ai/x-ai/grok-4-fast-non-reasoning/ Fri, 19 Sep 2025 00:00:00 GMT Grok 4 Fast Non-Reasoning is the speed-optimized, non-reasoning variant of Grok 4 Fast that bypasses extended chain-of-thought for instant responses. It uses the same unified architecture with a 2M context window but skips deliberation, delivering up to 342 tokens/second throughput for high-volume, latency-sensitive applications. xAI Alibaba: Tongyi DeepResearch 30B A3B https://developer.puter.com/ai/alibaba/tongyi-deepresearch-30b-a3b/ https://developer.puter.com/ai/alibaba/tongyi-deepresearch-30b-a3b/ Thu, 18 Sep 2025 00:00:00 GMT Tongyi DeepResearch 30B A3B is an agentic large language model from Alibaba's Tongyi Lab, purpose-built for long-horizon, multi-step information-seeking and web research tasks. It uses a Mixture-of-Experts architecture with 30.5B total parameters but only 3.3B activated per token, keeping inference costs low. The model achieves state-of-the-art results across agentic research benchmarks, scoring 32.9 on Humanity's Last Exam, 43.4 on BrowseComp, 70.9 on GAIA, 75.0 on xbench-DeepSearch, and 90.6 on FRAMES — outperforming OpenAI o3 and DeepSeek-V3.1 on most of these tasks. It supports a 128K context window and two inference modes: a standard ReAct mode and a heavier iterative research mode for maximum performance. 
Best suited for developers building autonomous research agents, deep fact-finding pipelines, or complex multi-source synthesis workflows — especially where cost efficiency matters. Alibaba Qwen: Qwen3-Omni Flash https://developer.puter.com/ai/qwen/qwen3-omni-flash/ https://developer.puter.com/ai/qwen/qwen3-omni-flash/ Mon, 15 Sep 2025 00:00:00 GMT Qwen3-Omni Flash is a fast, cost-efficient omni-modal model from Alibaba's Qwen3 series, designed for real-time multimodal applications. As a member of the Qwen3-Omni family, it ingests text, images, audio, and video in a single end-to-end architecture — no separate pipelines or modality-switching required. It produces text responses and supports low-latency streaming, making it well suited for voice assistants, live audio/video analysis, and cost-sensitive production workloads. The Flash tier prioritizes speed and throughput over the maximum capability of the full Qwen3-Omni model, with a 65K context window and 16K output limit optimized for shorter media clips and high-volume inference. Developers building real-time assistants, transcription tools, or multimodal agents who need broad input coverage at a lower cost point will find it a practical choice. Qwen Qwen: Qwen Plus 0728 https://developer.puter.com/ai/qwen/qwen-plus-2025-07-28/ https://developer.puter.com/ai/qwen/qwen-plus-2025-07-28/ Mon, 08 Sep 2025 00:00:00 GMT Qwen Plus (2025-07-28) is a snapshot version of Qwen Plus from July 2025, offering consistent behavior and performance for production deployments requiring version stability. Qwen Qwen: Qwen Plus 0728 (thinking) https://developer.puter.com/ai/qwen/qwen-plus-2025-07-28:thinking/ https://developer.puter.com/ai/qwen/qwen-plus-2025-07-28:thinking/ Mon, 08 Sep 2025 00:00:00 GMT Qwen Plus (2025-07-28) Thinking is the reasoning-enhanced version that uses chain-of-thought processing for complex problems, providing step-by-step reasoning before delivering answers. 
Qwen NVIDIA: Nemotron Nano 9B V2 https://developer.puter.com/ai/nvidia/nemotron-nano-9b-v2/ https://developer.puter.com/ai/nvidia/nemotron-nano-9b-v2/ Fri, 05 Sep 2025 00:00:00 GMT Nemotron Nano 9B V2 is a 9B parameter hybrid Mamba-Transformer model trained from scratch by NVIDIA with a 128K context window, achieving up to 6x higher inference throughput than similar models like Qwen3-8B. It features a controllable reasoning budget, allowing developers to balance accuracy and response time for edge deployment. NVIDIA Moonshot AI: Kimi K2 0905 https://developer.puter.com/ai/moonshotai/kimi-k2-0905/ https://developer.puter.com/ai/moonshotai/kimi-k2-0905/ Thu, 04 Sep 2025 00:00:00 GMT Kimi K2 0905 is Moonshot AI's September 2025 update to the original Kimi K2, delivering enhanced coding performance and improved tool-calling reliability. It shares the same 1-trillion-parameter MoE architecture with 32B active parameters but doubles the context window from 128K to 256K tokens. Key improvements include stronger frontend development capabilities (cleaner, more polished UI code for frameworks like React, Vue, and Angular) along with better integration across popular agent scaffolds. It scored 53.7% Pass@1 on LiveCodeBench. This version is ideal for developers who want K2's agentic strengths with improved real-world coding quality and longer context support for large codebases. Moonshot AI Qwen: Qwen3 Next 80B A3B Instruct https://developer.puter.com/ai/qwen/qwen3-next-80b-a3b-instruct/ https://developer.puter.com/ai/qwen/qwen3-next-80b-a3b-instruct/ Mon, 01 Sep 2025 00:00:00 GMT Qwen3 Next 80B A3B Instruct is an innovative MoE model with hybrid attention (Gated DeltaNet + Gated Attention), achieving 10x inference throughput for 32K+ contexts while matching Qwen3-235B performance. 
Qwen Qwen: Qwen3 Next 80B A3B Thinking https://developer.puter.com/ai/qwen/qwen3-next-80b-a3b-thinking/ https://developer.puter.com/ai/qwen/qwen3-next-80b-a3b-thinking/ Mon, 01 Sep 2025 00:00:00 GMT Qwen3 Next 80B A3B Thinking is the reasoning-enhanced variant outperforming Gemini-2.5-Flash-Thinking on complex reasoning tasks with hybrid attention and multi-token prediction. Qwen Mistral AI: Magistral Medium 1.2 https://developer.puter.com/ai/mistralai/magistral-medium-2509/ https://developer.puter.com/ai/mistralai/magistral-medium-2509/ Mon, 01 Sep 2025 00:00:00 GMT Magistral Medium is Mistral's enterprise reasoning model with chain-of-thought capabilities, scoring 73.6% on AIME 2024 (90% with majority voting). It excels in multilingual step-by-step reasoning for legal, financial, and scientific applications. Mistral AI Mistral AI: Magistral Small 1.2 https://developer.puter.com/ai/mistralai/magistral-small-2509/ https://developer.puter.com/ai/mistralai/magistral-small-2509/ Mon, 01 Sep 2025 00:00:00 GMT Magistral Small is a 24B parameter open-source reasoning model under Apache 2.0, achieving 70.7% on AIME 2024. It provides traceable, multilingual chain-of-thought reasoning in English, French, Spanish, German, Italian, Arabic, Russian, and Chinese. Mistral AI MiniMax: MiniMax M2 https://developer.puter.com/ai/minimax/minimax-m2/ https://developer.puter.com/ai/minimax/minimax-m2/ Mon, 01 Sep 2025 00:00:00 GMT MiniMax-M2 is a compact MoE model (230B total, 10B active parameters) optimized for coding and agentic workflows with a 128K context window. It ranks #1 among open-source models for tool use and agent tasks, delivering elite performance in multi-step development workflows at 8% of the cost of comparable models. 
MiniMax Meituan: LongCat Flash Chat https://developer.puter.com/ai/meituan/longcat-flash-chat/ https://developer.puter.com/ai/meituan/longcat-flash-chat/ Mon, 01 Sep 2025 00:00:00 GMT LongCat Flash Chat is a 560-billion-parameter Mixture-of-Experts (MoE) language model developed by Meituan, dynamically activating roughly 27B parameters per token for an efficient balance of capability and cost. As a non-thinking foundation model, it's optimized for conversational and agentic tasks, with particular strengths in tool use and multi-step interactions. It supports a 128K-token context window and delivers over 100 tokens per second at inference. On benchmarks, it scores 86.5 on ArenaHard-V2, 89.7 on MMLU, and 67.7 on τ²-Bench, performing competitively with models like DeepSeek-V3.1 and Kimi-K2 while activating fewer parameters. A strong pick for developers building agentic workflows, coding assistants, or complex tool-calling pipelines where speed and efficiency matter. Meituan KwaiPilot: KAT-Coder-Pro V1 https://developer.puter.com/ai/kwaipilot/kat-coder-pro/ https://developer.puter.com/ai/kwaipilot/kat-coder-pro/ Mon, 01 Sep 2025 00:00:00 GMT KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model, built by Kuaishou's Kwaipilot team and designed specifically for real-world software engineering tasks. It achieves a 73.4% solve rate on SWE-Bench Verified, reflecting strong performance on practical code generation and bug-fixing scenarios. The model has been optimized for tool-use capability, multi-turn interaction, and instruction following through a multi-stage training pipeline that includes supervised fine-tuning, reinforcement fine-tuning, and agentic RL. KAT-Coder-Pro V1 supports multi-tool parallel invocation, enabling it to complete complex agentic workflows with fewer interaction rounds. It offers a 256K-token context window and up to 128K output tokens. 
It's a text-only, non-reasoning model — so expect direct responses without chain-of-thought overhead, well-suited for coding agents and automated engineering pipelines. KwaiPilot Google: Gemini 2.5 Flash Preview 09-2025 https://developer.puter.com/ai/google/gemini-2.5-flash-preview-09-2025/ https://developer.puter.com/ai/google/gemini-2.5-flash-preview-09-2025/ Mon, 01 Sep 2025 00:00:00 GMT Gemini 2.5 Flash Preview (September 2025) is a preview version of Google's hybrid reasoning Flash model with controllable thinking capabilities. It balances quality, cost, and latency for enterprise-scale applications. Google Mistral AI: Codestral (August 2025) https://developer.puter.com/ai/mistralai/codestral-2508/ https://developer.puter.com/ai/mistralai/codestral-2508/ Fri, 29 Aug 2025 00:00:00 GMT Codestral is Mistral's cutting-edge code generation model supporting 80+ programming languages with optimized low-latency performance. It specializes in fill-in-the-middle completion, code correction, and test generation with 2.5x faster performance than its predecessor. Mistral AI xAI: Grok Code Fast 1 https://developer.puter.com/ai/x-ai/grok-code-fast-1/ https://developer.puter.com/ai/x-ai/grok-code-fast-1/ Thu, 28 Aug 2025 00:00:00 GMT Grok Code Fast 1 is a speedy, economical reasoning model built from scratch specifically for agentic coding workflows, released August 2025. It excels at TypeScript, Python, Java, Rust, C++, and Go with a 256K context window and ~92 tokens/second throughput. xAI Qwen: Qwen3 30B A3B Thinking 2507 https://developer.puter.com/ai/qwen/qwen3-30b-a3b-thinking-2507/ https://developer.puter.com/ai/qwen/qwen3-30b-a3b-thinking-2507/ Thu, 28 Aug 2025 00:00:00 GMT Qwen3 30B A3B Thinking (2507) is the reasoning-enhanced variant optimized for complex problem-solving with extended chain-of-thought processing at high parameter efficiency. 
Qwen Nous Research: Hermes 4 405B https://developer.puter.com/ai/nousresearch/hermes-4-405b/ https://developer.puter.com/ai/nousresearch/hermes-4-405b/ Tue, 26 Aug 2025 00:00:00 GMT Hermes 4 405B is a frontier hybrid-mode reasoning model based on Llama-3.1-405B, trained on a 60B token dataset with verified reasoning traces. It features toggleable deep reasoning via think tags, massive improvements in math, code, STEM, and logic, and achieves state-of-the-art results on RefusalBench for reduced censorship. Nous Research Nous Research: Hermes 4 70B https://developer.puter.com/ai/nousresearch/hermes-4-70b/ https://developer.puter.com/ai/nousresearch/hermes-4-70b/ Tue, 26 Aug 2025 00:00:00 GMT Hermes 4 70B is a hybrid reasoning model based on Llama-3.1-70B with a toggleable deep thinking mode using think tags. It offers major improvements in math, code, STEM, logic, and creative writing while supporting JSON schema adherence, function calling, and reduced refusal rates compared to other models. Nous Research DeepSeek: DeepSeek V3.1 https://developer.puter.com/ai/deepseek/deepseek-chat-v3.1/ https://developer.puter.com/ai/deepseek/deepseek-chat-v3.1/ Thu, 21 Aug 2025 00:00:00 GMT DeepSeek V3.1 is an August 2025 hybrid model that combines the capabilities of V3 and R1, supporting both thinking and non-thinking modes via chat template switching. It features 671B parameters (37B activated), 128K context, and significantly improved tool-calling and agent capabilities. DeepSeek NVIDIA: Nemotron Nano 12B V2 VL https://developer.puter.com/ai/nvidia/nemotron-nano-12b-v2-vl/ https://developer.puter.com/ai/nvidia/nemotron-nano-12b-v2-vl/ Mon, 18 Aug 2025 00:00:00 GMT Nemotron Nano 12B V2 VL is a 12.6B parameter multimodal vision-language model built on a hybrid Mamba-Transformer architecture for document intelligence and video understanding.
It processes multiple images, documents, and videos while achieving leading results on OCRBench v2 with up to 2.5x higher throughput using Efficient Video Sampling. NVIDIA OpenAI: GPT-4o Audio Preview https://developer.puter.com/ai/openai/gpt-4o-audio-preview/ https://developer.puter.com/ai/openai/gpt-4o-audio-preview/ Fri, 15 Aug 2025 00:00:00 GMT GPT-4o Audio Preview is a model for audio inputs and outputs with the Chat Completions API. It enables speech-in, speech-out conversational interactions and audio generation capabilities. OpenAI Google: Imagen 4 Fast https://developer.puter.com/ai/google/imagen-4.0-fast/ https://developer.puter.com/ai/google/imagen-4.0-fast/ Fri, 15 Aug 2025 00:00:00 GMT Imagen 4 Fast is Google's speed-optimized text-to-image model offering generation up to 10x faster than Imagen 3 at just $0.02 per image. It's ideal for rapid prototyping, high-volume tasks, and iterative exploration while maintaining improved text rendering and style versatility. Google Google: Imagen 4 Ultra https://developer.puter.com/ai/google/imagen-4.0-ultra/ https://developer.puter.com/ai/google/imagen-4.0-ultra/ Fri, 15 Aug 2025 00:00:00 GMT Imagen 4 Ultra is Google's highest-fidelity text-to-image model designed for professional-grade realism with superior prompt adherence and nuanced interpretation of complex scenes. It delivers exceptional detail in textures, lighting, and atmosphere with 2K resolution output at $0.06 per image. Google Google: Imagen 4 https://developer.puter.com/ai/google/imagen-4.0/ https://developer.puter.com/ai/google/imagen-4.0/ Fri, 15 Aug 2025 00:00:00 GMT Imagen 4 is Google DeepMind's flagship text-to-image generation model, available through the Gemini API and Google AI Studio. It delivers significant improvements over Imagen 3, particularly in rendering text, typography, and fine details like intricate fabrics and textures. 
The model supports output up to 2K resolution across a range of aspect ratios, generating images in roughly 2.5 seconds. A Fast variant optimized for high-volume use runs at $0.02 per image, while the standard model is $0.04 and the Ultra tier—built for precise prompt adherence—is $0.06. In human evaluations on GenAI-Bench, Imagen 4 scored highly against other leading image generation models on overall preference. All outputs are embedded with Google's SynthID watermark for AI-content traceability. It's a strong fit for developers building creative tools, marketing asset pipelines, or any application requiring reliable, high-quality image generation from text prompts. Google Mistral AI: Mistral Medium 3.1 https://developer.puter.com/ai/mistralai/mistral-medium-3.1/ https://developer.puter.com/ai/mistralai/mistral-medium-3.1/ Wed, 13 Aug 2025 00:00:00 GMT Mistral Medium 3.1 (August 2025) is a frontier-class multimodal model with improved tone and performance. It features 128K context, native vision, and enhanced reasoning for STEM and enterprise workflows at competitive pricing. Mistral AI Mistral AI: Mistral Medium 3.1 https://developer.puter.com/ai/mistralai/mistral-medium-2508/ https://developer.puter.com/ai/mistralai/mistral-medium-2508/ Tue, 12 Aug 2025 00:00:00 GMT Mistral Medium 3.1 is Mistral's frontier-class multimodal model released August 2025 with 128K context. It delivers near-frontier performance at $0.4/$2 per million tokens, excelling in reasoning, coding, and enterprise workflows. Mistral AI Baidu: ERNIE 4.5 21B A3B https://developer.puter.com/ai/baidu/ernie-4.5-21b-a3b/ https://developer.puter.com/ai/baidu/ernie-4.5-21b-a3b/ Tue, 12 Aug 2025 00:00:00 GMT ERNIE 4.5 21B A3B is a lightweight text-only language model from Baidu using a Mixture-of-Experts architecture with 21B total parameters but only 3B active per token. It excels at general language understanding, generation, reasoning, and coding tasks while remaining computationally efficient. 
Released under Apache 2.0, it achieves competitive performance against larger models like Qwen3-30B-A3B despite having 30% fewer total parameters. Baidu Baidu: ERNIE 4.5 VL 28B A3B https://developer.puter.com/ai/baidu/ernie-4.5-vl-28b-a3b/ https://developer.puter.com/ai/baidu/ernie-4.5-vl-28b-a3b/ Tue, 12 Aug 2025 00:00:00 GMT ERNIE 4.5 VL 28B A3B is a lightweight multimodal vision-language model with 28B total parameters but only 3B active per token. It processes both images and text simultaneously, enabling tasks like image comprehension, chart analysis, document understanding, and cross-modal reasoning. The model offers both thinking and non-thinking modes while matching performance of larger models like Qwen2.5-VL-32B. Baidu Z.AI: GLM 4.5V https://developer.puter.com/ai/z-ai/glm-4.5v/ https://developer.puter.com/ai/z-ai/glm-4.5v/ Mon, 11 Aug 2025 00:00:00 GMT GLM-4.5V is a 106B-parameter vision-language model achieving SOTA on 42 multimodal benchmarks, capable of image/video reasoning, GUI agent tasks, document parsing, and visual grounding. It features a thinking mode toggle and 64K multimodal context under MIT license. Z.AI AI21 Labs: Jamba Large 1.7 https://developer.puter.com/ai/ai21/jamba-large-1.7/ https://developer.puter.com/ai/ai21/jamba-large-1.7/ Fri, 08 Aug 2025 00:00:00 GMT Jamba Large 1.7 is AI21 Labs' flagship open-weight language model, built on a hybrid SSM-Transformer (Mamba-Transformer) architecture with a Mixture of Experts design — 398B total parameters with 94B active during inference. Its standout feature is a 256K-token context window, making it well suited for processing lengthy documents, contracts, and knowledge bases. The model supports function calling, JSON mode, and nine languages including English, Spanish, French, German, and Arabic. Jamba Large 1.7 emphasizes grounding and instruction-following, delivering contextually faithful responses with strong steerability. 
It generates output at roughly 69 tokens per second via the AI21 API. It targets enterprise workflows in domains like finance, healthcare, and legal — where long-context accuracy and data control matter most. AI21 Labs OpenAI: GPT-5 https://developer.puter.com/ai/openai/gpt-5/ https://developer.puter.com/ai/openai/gpt-5/ Thu, 07 Aug 2025 00:00:00 GMT GPT-5 is OpenAI's unified reasoning system combining a fast model, a deeper thinking model, and an automatic router. It achieves 45% fewer factual errors than GPT-4o and sets state-of-the-art scores on math, coding, and health benchmarks. OpenAI OpenAI: GPT-5 Chat https://developer.puter.com/ai/openai/gpt-5-chat/ https://developer.puter.com/ai/openai/gpt-5-chat/ Thu, 07 Aug 2025 00:00:00 GMT GPT-5 Chat is the non-reasoning version of GPT-5 used in ChatGPT, designed for conversational interactions. It's available as gpt-5-chat-latest in the API and provides fast responses without extended thinking. OpenAI OpenAI: GPT-5 Mini https://developer.puter.com/ai/openai/gpt-5-mini/ https://developer.puter.com/ai/openai/gpt-5-mini/ Thu, 07 Aug 2025 00:00:00 GMT GPT-5 Mini is a faster, more cost-efficient version of GPT-5 optimized for well-defined tasks and precise prompts. It provides a balance between performance and speed for everyday use cases. OpenAI OpenAI: GPT-5 Nano https://developer.puter.com/ai/openai/gpt-5-nano/ https://developer.puter.com/ai/openai/gpt-5-nano/ Thu, 07 Aug 2025 00:00:00 GMT GPT-5 Nano is OpenAI's fastest and cheapest GPT-5 variant, ideal for summarization and classification tasks. It offers extremely low latency for high-volume, simple inference workloads. OpenAI Anthropic: Claude Opus 4.1 https://developer.puter.com/ai/anthropic/claude-opus-4-1/ https://developer.puter.com/ai/anthropic/claude-opus-4-1/ Tue, 05 Aug 2025 00:00:00 GMT Claude Opus 4.1 is an August 2025 incremental upgrade to Opus 4 focused on agentic tasks and real-world coding. 
It improved coding accuracy to 74.5% on SWE-bench Verified with finer-grained refactoring and more precise bug fixes. Anthropic Qwen: Qwen Image https://developer.puter.com/ai/qwen/qwen-image/ https://developer.puter.com/ai/qwen/qwen-image/ Fri, 01 Aug 2025 00:00:00 GMT Qwen Image is a 20B-parameter image generation foundation model from Alibaba's Qwen series, built for text-to-image generation, image editing, and image understanding tasks. Its standout capability is high-fidelity text rendering: it accurately places readable text in both English and Chinese within generated images, making it especially strong for posters, slides, and design-heavy visuals. Beyond text, it supports a wide range of styles from photorealism to anime, and handles advanced editing operations like style transfer, object insertion/removal, and in-image text modification. The model also performs image understanding tasks including object detection, segmentation, depth estimation, and super-resolution. A versatile choice for developers who need generation, editing, and visual analysis in a single model. Licensed under Apache 2.0. Qwen MiniMax: MiniMax Hailuo 02 https://developer.puter.com/ai/minimax/hailuo-02/ https://developer.puter.com/ai/minimax/hailuo-02/ Thu, 31 Jul 2025 00:00:00 GMT MiniMax Hailuo 02 is a next-generation AI video model ranked #2 globally, featuring native 1080p output and advanced physics simulation for realistic motion including gravity, fluid dynamics, and complex movements like gymnastics. It uses a Noise-aware Compute Redistribution (NCR) architecture for 2.5x improved efficiency, with 3x more parameters and 4x more training data than its predecessor. The model supports both text-to-video and image-to-video generation with clips up to 10 seconds.
MiniMax Qwen: Qwen3 30B A3B Instruct 2507 https://developer.puter.com/ai/qwen/qwen3-30b-a3b-instruct-2507/ https://developer.puter.com/ai/qwen/qwen3-30b-a3b-instruct-2507/ Tue, 29 Jul 2025 00:00:00 GMT Qwen3 30B A3B Instruct (2507) is the July 2025 updated instruction-tuned version with improved capabilities in reasoning, coding, and tool usage at high efficiency. Qwen Z.AI: GLM 4.5 https://developer.puter.com/ai/z-ai/glm-4.5/ https://developer.puter.com/ai/z-ai/glm-4.5/ Mon, 28 Jul 2025 00:00:00 GMT GLM-4.5 is Zhipu AI's flagship 355B-parameter open-source model (32B active) designed for agentic AI applications with dual thinking/non-thinking modes. It excels at reasoning, coding, and tool use, ranking 3rd globally among all models on combined benchmarks under MIT license. Z.AI Z.AI: GLM 4.5 Air https://developer.puter.com/ai/z-ai/glm-4.5-air/ https://developer.puter.com/ai/z-ai/glm-4.5-air/ Mon, 28 Jul 2025 00:00:00 GMT GLM-4.5-Air is a compact 106B-parameter variant (12B active) of GLM-4.5, offering competitive agentic performance with significantly lower resource requirements. It supports the same dual reasoning modes and 128K context window as its larger sibling. Z.AI Z.AI: GLM 4.5 AirX https://developer.puter.com/ai/z-ai/glm-4.5-airx/ https://developer.puter.com/ai/z-ai/glm-4.5-airx/ Mon, 28 Jul 2025 00:00:00 GMT GLM-4.5-AirX is the ultra-fast inference variant of Z.ai's GLM-4.5-Air, a 106B-parameter Mixture-of-Experts model with 12B active parameters per forward pass. It shares the same architecture and 128K-token context window as GLM-4.5-Air but is optimized for maximum throughput and minimal latency. GLM-4.5-Air itself delivers strong results — scoring 59.8 across 12 industry benchmarks and outperforming models like Gemini 2.5 Flash and Qwen3-235B on reasoning evaluations. AirX preserves that capability while targeting low-latency, high-concurrency production scenarios. 
Best suited for real-time agent pipelines, high-volume chat, and latency-sensitive coding assistance where the full GLM-4.5's throughput is insufficient but you still need competitive reasoning and tool-use performance. Z.AI Z.AI: GLM 4.5 Flash https://developer.puter.com/ai/z-ai/glm-4.5-flash/ https://developer.puter.com/ai/z-ai/glm-4.5-flash/ Mon, 28 Jul 2025 00:00:00 GMT GLM-4.5-Flash is the free tier in Z.ai's GLM-4.5 model family, optimized for coding, reasoning, and agent tasks. It shares the hybrid reasoning architecture of the broader GLM-4.5 series, supporting both a thinking mode for complex multi-step problems and a non-thinking mode for instant responses. With a 128K-token context window and native support for function calling, structured output, and streaming, it provides a capable baseline for developers prototyping agent workflows or building cost-sensitive applications. It integrates with coding agent frameworks like Claude Code and Roo Code. An excellent starting point for teams evaluating the GLM-4.5 ecosystem — no cost to experiment, with a clear upgrade path to GLM-4.5 or GLM-4.5-X for heavier workloads. Z.AI Z.AI: GLM 4.5 X https://developer.puter.com/ai/z-ai/glm-4.5-x/ https://developer.puter.com/ai/z-ai/glm-4.5-x/ Mon, 28 Jul 2025 00:00:00 GMT GLM-4.5-X is the high-performance, ultra-fast inference variant of Z.ai's flagship GLM-4.5 model. It retains the full 355B-parameter MoE architecture (32B active) and 128K-token context window while being tuned for significantly faster response times — exceeding 100 tokens per second in real-world tests. GLM-4.5 itself ranks among the top models globally across 12 benchmarks spanning reasoning, coding, and agentic tasks, with an aggregate score of 63.2. The X variant delivers that same capability ceiling with latency suitable for interactive applications. 
Designed for production workloads where both quality and speed matter — real-time coding agents, interactive tool-use pipelines, and high-concurrency deployments that can't afford the response time of the standard GLM-4.5 endpoint. Z.AI StepFun: Step3 https://developer.puter.com/ai/stepfun/step3/ https://developer.puter.com/ai/stepfun/step3/ Mon, 28 Jul 2025 00:00:00 GMT Step3 is a multimodal reasoning model from StepFun, built on a Mixture-of-Experts architecture with 321B total parameters and 38B active per token. It accepts both text and image inputs, making it suitable for vision-language tasks. The model is engineered for cost-effective decoding through two co-designed innovations: Multi-Matrix Factorization Attention (MFA) to reduce KV cache size, and Attention-FFN Disaggregation (AFD) for more efficient distributed inference. StepFun reports it achieves significantly higher tokens-per-GPU throughput than DeepSeek-V3 at comparable context lengths. Step3 targets use cases that require grounded multimodal reasoning — interpreting diagrams, documents, and images alongside text — with reduced hallucination. StepFun Qwen: Qwen3 Coder Flash https://developer.puter.com/ai/qwen/qwen3-coder-flash/ https://developer.puter.com/ai/qwen/qwen3-coder-flash/ Mon, 28 Jul 2025 00:00:00 GMT Qwen3 Coder Flash is a cost-effective coding model balancing performance and speed, suitable for scenarios requiring fast responses at lower cost while maintaining coding quality. Qwen Qwen: Qwen Flash https://developer.puter.com/ai/qwen/qwen-flash/ https://developer.puter.com/ai/qwen/qwen-flash/ Mon, 28 Jul 2025 00:00:00 GMT Qwen Flash is Alibaba's latency-optimized general-purpose language model, designed as the successor to Qwen Turbo for cost-efficient, high-throughput workloads. It offers a 1 million token context window with native support for context caching, making repeated or large-context requests significantly cheaper. 
The model supports function calling and is accessible via an OpenAI-compatible API through Alibaba Cloud Model Studio. Qwen Flash is a strong choice for developers running high-volume production tasks — classification, extraction, summarization, and lightweight agentic pipelines — where low latency and predictable pricing matter more than peak reasoning capability. Its flexible tiered pricing and context cache support make it especially cost-effective at scale. Qwen Qwen: Qwen3 235B A22B Thinking 2507 https://developer.puter.com/ai/qwen/qwen3-235b-a22b-thinking-2507/ https://developer.puter.com/ai/qwen/qwen3-235b-a22b-thinking-2507/ Fri, 25 Jul 2025 00:00:00 GMT Qwen3 235B A22B Thinking (2507) is the reasoning-enhanced variant using extended chain-of-thought processing for complex math, coding, and logical problems with enhanced performance. Qwen Z.AI: GLM 4 32B https://developer.puter.com/ai/z-ai/glm-4-32b/ https://developer.puter.com/ai/z-ai/glm-4-32b/ Thu, 24 Jul 2025 00:00:00 GMT GLM-4-32B is a 32-billion parameter bilingual (Chinese-English) foundation model by Zhipu AI, pre-trained on 15TB of reasoning-focused data. It delivers performance comparable to GPT-4o on code generation, function calling, and Q&A tasks while remaining deployable on accessible hardware. Z.AI Qwen: Qwen3 Coder Plus https://developer.puter.com/ai/qwen/qwen3-coder-plus/ https://developer.puter.com/ai/qwen/qwen3-coder-plus/ Wed, 23 Jul 2025 00:00:00 GMT Qwen3 Coder Plus is the strongest Qwen coding API model, ideal for complex project generation and in-depth code reviews with up to 1M token context support. Qwen Qwen: Qwen3 VL 32B Instruct https://developer.puter.com/ai/qwen/qwen3-vl-32b-instruct/ https://developer.puter.com/ai/qwen/qwen3-vl-32b-instruct/ Tue, 22 Jul 2025 00:00:00 GMT Qwen3 VL 32B Instruct is a dense vision-language model with strong text and visual capabilities, featuring visual coding, spatial understanding, and 256K context support. 
Qwen Qwen: Qwen3 VL 8B Instruct https://developer.puter.com/ai/qwen/qwen3-vl-8b-instruct/ https://developer.puter.com/ai/qwen/qwen3-vl-8b-instruct/ Tue, 22 Jul 2025 00:00:00 GMT Qwen3 VL 8B Instruct is a compact vision-language model matching flagship text performance while supporting image/video understanding, visual coding, and 256K context length. Qwen OpenAI: GPT-OSS 120B https://developer.puter.com/ai/openai/gpt-oss-120b/ https://developer.puter.com/ai/openai/gpt-oss-120b/ Tue, 22 Jul 2025 00:00:00 GMT GPT-OSS 120B is OpenAI's most powerful open-weight model under Apache 2.0 license, achieving near-parity with o4-mini on reasoning benchmarks. It has 117B total parameters with 5.1B active, fitting on a single H100 GPU. OpenAI OpenAI: GPT-OSS 20B https://developer.puter.com/ai/openai/gpt-oss-20b/ https://developer.puter.com/ai/openai/gpt-oss-20b/ Tue, 22 Jul 2025 00:00:00 GMT GPT-OSS 20B is OpenAI's smaller open-weight model for lower latency and local inference, matching o3-mini on common benchmarks. It requires only 16GB of memory and runs on consumer hardware. OpenAI ByteDance: UI-TARS 1.5 7B https://developer.puter.com/ai/bytedance/ui-tars-1.5-7b/ https://developer.puter.com/ai/bytedance/ui-tars-1.5-7b/ Tue, 22 Jul 2025 00:00:00 GMT UI-TARS 1.5 7B is a multimodal vision-language agent by ByteDance optimized for GUI automation across desktop, web, mobile, and game environments. It uses reinforcement learning-based reasoning to plan and execute actions on graphical interfaces. The model achieves state-of-the-art results on benchmarks like OSWorld, WebVoyager, and AndroidWorld. ByteDance Qwen: Qwen3 235B A22B Instruct 2507 https://developer.puter.com/ai/qwen/qwen3-235b-a22b-2507/ https://developer.puter.com/ai/qwen/qwen3-235b-a22b-2507/ Mon, 21 Jul 2025 00:00:00 GMT Qwen3 235B A22B (2507) is the July 2025 updated version with significant improvements in instruction following, reasoning, coding, tool usage, and 256K long-context understanding.
Qwen Mistral AI: Voxtral Small https://developer.puter.com/ai/mistralai/voxtral-small-2507/ https://developer.puter.com/ai/mistralai/voxtral-small-2507/ Tue, 15 Jul 2025 00:00:00 GMT Voxtral Small is a 24B parameter speech understanding model built on Mistral Small 3.1 under Apache 2.0. It supports 30-minute transcription, 40-minute audio understanding, Q&A, summarization, and function calling from voice in 8+ languages. Mistral AI SwitchPoint: Router https://developer.puter.com/ai/switchpoint/router/ https://developer.puter.com/ai/switchpoint/router/ Fri, 11 Jul 2025 00:00:00 GMT Switchpoint Router is an intelligent LLM routing system by Switchpoint AI that automatically analyzes each request and directs it to the optimal model from a continuously updated library of LLMs. Rather than being a single model, it acts as a smart proxy — using a cascading approach that attempts lower-cost models first and escalates to more capable ones only when needed. The underlying pool includes models like DeepSeek, Claude, GPT, and Mixtral, selected based on a cost-performance balance. It offers a 131,072-token context window and flat-rate pricing at $0.85 per million input tokens and $3.40 per million output tokens. As new models are released, the router incorporates them automatically, so your integration stays current without code changes. Ideal for developers who want cost-efficient inference without manually selecting or switching between models. SwitchPoint Moonshot AI: Kimi K2 0711 https://developer.puter.com/ai/moonshotai/kimi-k2/ https://developer.puter.com/ai/moonshotai/kimi-k2/ Fri, 11 Jul 2025 00:00:00 GMT Kimi K2 is a trillion-parameter Mixture-of-Experts model by Moonshot AI, activating 32 billion parameters per token. Designed as a non-thinking model optimized for agentic capabilities, it excels at tool use, code generation, and autonomous problem-solving with a 128K token context window. 
On benchmarks, K2 scored 65.8% on SWE-bench Verified, 75.1% on GPQA-Diamond, 49.5% on AIME 2025, and 66.1 on Tau2-bench — surpassing most open- and closed-source models in non-thinking settings. It ranked as the #1 open-source model on the LMSYS Arena leaderboard upon release in July 2025. K2 is well suited for developers building AI agents and tool-calling pipelines who need strong coding and reasoning without extended thinking overhead. Moonshot AI Mistral AI: Devstral Medium https://developer.puter.com/ai/mistralai/devstral-medium-2507/ https://developer.puter.com/ai/mistralai/devstral-medium-2507/ Thu, 10 Jul 2025 00:00:00 GMT Devstral Medium is a high-performance agentic coding model achieving 61.6% on SWE-Bench Verified. It excels at complex software engineering tasks across entire codebases, surpassing GPT-4.1 and Gemini 2.5 Pro in code-related tasks at a fraction of the cost. Mistral AI Mistral AI: Devstral Small 1.1 https://developer.puter.com/ai/mistralai/devstral-small-2507/ https://developer.puter.com/ai/mistralai/devstral-small-2507/ Thu, 10 Jul 2025 00:00:00 GMT Devstral Small is a 24B parameter agentic coding model built with All Hands AI, achieving 46.8% on SWE-Bench Verified. Released under Apache 2.0, it can run locally on a single RTX 4090 or 32GB RAM Mac for autonomous software development. Mistral AI Mistral AI: Devstral Medium https://developer.puter.com/ai/mistralai/devstral-medium/ https://developer.puter.com/ai/mistralai/devstral-medium/ Thu, 10 Jul 2025 00:00:00 GMT Devstral Medium is a high-performance agentic coding model for complex software engineering tasks, achieving 61.6% on SWE-Bench Verified. It's designed for generalization across prompt styles and tool use in code agents and frameworks. 
Mistral AI Mistral AI: Devstral Small https://developer.puter.com/ai/mistralai/devstral-small/ https://developer.puter.com/ai/mistralai/devstral-small/ Thu, 10 Jul 2025 00:00:00 GMT Devstral Small is a 24B parameter agentic LLM for software engineering, achieving 46.8% on SWE-Bench Verified. Released under Apache 2.0, it runs locally on consumer GPUs and excels at solving real-world GitHub issues autonomously. Mistral AI xAI: Grok 4 https://developer.puter.com/ai/x-ai/grok-4/ https://developer.puter.com/ai/x-ai/grok-4/ Wed, 09 Jul 2025 00:00:00 GMT Grok 4 is xAI's flagship reasoning model released July 2025, trained with unprecedented reinforcement learning scale on 200,000 GPUs. It features native tool use and real-time search integration, and the Grok 4 Heavy variant achieves 50% on the Humanity's Last Exam benchmark. xAI xAI: Grok 4 0709 https://developer.puter.com/ai/x-ai/grok-4-0709/ https://developer.puter.com/ai/x-ai/grok-4-0709/ Wed, 09 Jul 2025 00:00:00 GMT Grok 4 0709 is the July 9, 2025 snapshot of xAI's flagship reasoning model, trained with reinforcement learning to use tools like a code interpreter and web browsing. It features a 256K context window, native tool use, parallel tool calling, and support for both image and text inputs. xAI Cognitive Computations: Dolphin Mistral 24B Venice Edition (Uncensored) https://developer.puter.com/ai/cognitivecomputations/dolphin-mistral-24b-venice-edition/ https://developer.puter.com/ai/cognitivecomputations/dolphin-mistral-24b-venice-edition/ Wed, 09 Jul 2025 00:00:00 GMT Dolphin Mistral 24B Venice Edition is an uncensored, general-purpose language model fine-tuned from Mistral Small 24B (Instruct-2501), developed by Cognitive Computations (the Dolphin project, founded by Eric Hartford) in collaboration with Venice.ai. It features a 32K context window and 24 billion parameters.
The model is specifically designed to remove default safety filters and content refusals, giving developers full control over system prompts, alignment, and model behavior. On Venice's censorship benchmark suite, it achieved a refusal rate of just 2.2%, the lowest among tested models. While the base Mistral Small 24B leaned STEM-heavy, this fine-tune adds strong creative writing and storytelling capabilities with consistent character and narrative memory across long interactions. It also features improved tone control — neutral and polite by default, but fully steerable via prompting. Best suited for developers building applications that require maximum output flexibility, custom ethical frameworks, or unrestricted content generation where typical model refusals would be a blocker. Cognitive Computations Tencent: Hunyuan A13B Instruct https://developer.puter.com/ai/tencent/hunyuan-a13b-instruct/ https://developer.puter.com/ai/tencent/hunyuan-a13b-instruct/ Tue, 08 Jul 2025 00:00:00 GMT Hunyuan A13B Instruct is an open-source large language model from Tencent built on a fine-grained Mixture-of-Experts (MoE) architecture, with 80B total parameters and 13B active during inference. It natively supports a 256K-token context window. It performs competitively with OpenAI o1 and DeepSeek R1 across math, science, and reasoning benchmarks, scoring 87.3 on AIME 2024, 89.1 on BBH, and 84.7 on ZebraLogic. Hunyuan A13B particularly excels at agentic tasks and tool use, leading on benchmarks like BFCL-v3 (78.3) and ComplexFuncBench (61.2). It's a strong choice for developers building agent workflows, long-context applications, or cost-sensitive reasoning pipelines. Tencent Mistral AI: Voxtral Mini https://developer.puter.com/ai/mistralai/voxtral-mini-2507/ https://developer.puter.com/ai/mistralai/voxtral-mini-2507/ Tue, 08 Jul 2025 00:00:00 GMT Voxtral Mini is a 3B parameter open-source speech model built on Ministral 3B under Apache 2.0. 
It handles transcription, Q&A from audio, and multilingual speech understanding for up to 40 minutes of audio, optimized for edge deployment. Mistral AI Morph: Morph V3 Fast https://developer.puter.com/ai/morph/morph-v3-fast/ https://developer.puter.com/ai/morph/morph-v3-fast/ Mon, 07 Jul 2025 00:00:00 GMT Morph V3 Fast is a specialized code-editing model built by Morph, designed to serve as the execution layer in AI-assisted development workflows. Rather than generating code from scratch, it applies edits suggested by frontier reasoning models like Claude or GPT-4o to existing code files. It processes at approximately 10,500 tokens per second with around 96% accuracy on code transformations, making it one of the fastest options for automated code apply tasks. The model supports an 81,920-token context window and up to 38,000 output tokens. Morph V3 Fast is built for high-volume, latency-sensitive pipelines where code edits need to be applied rapidly and cheaply. Morph Morph: Morph V3 Large https://developer.puter.com/ai/morph/morph-v3-large/ https://developer.puter.com/ai/morph/morph-v3-large/ Mon, 07 Jul 2025 00:00:00 GMT Morph V3 Large is Morph's high-accuracy code apply model, optimized for complex and precise code transformations. Like its faster sibling, it acts as the execution layer in agentic coding workflows — taking edit suggestions from reasoning models and merging them into existing code. It achieves approximately 98% accuracy on code transformations at speeds around 4,500 tokens per second. The model supports a 262,144-token context window with up to 131,100 output tokens, allowing it to process entire codebases or large files in a single request. Morph V3 Large is the better choice when edit correctness matters more than raw speed — particularly for production codebases or complex multi-file changes where a 2% accuracy gap can mean hundreds of broken edits at scale. 
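As a rough illustration of the accuracy gap described above, the sketch below estimates how many edits would fail at the approximate accuracy figures quoted for the two Morph models (about 96% for V3 Fast, about 98% for V3 Large). The workload of 10,000 edits and the helper name `expected_failed_edits` are hypothetical, and the calculation assumes every edit fails independently at the quoted rate, which real workloads may not satisfy.

```python
def expected_failed_edits(total_edits: int, accuracy: float) -> int:
    """Expected number of failed edits, given a per-edit success rate in [0, 1]."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Round to the nearest whole edit to absorb floating-point noise.
    return round(total_edits * (1.0 - accuracy))


# Hypothetical workload size; this number does not come from the feed.
TOTAL_EDITS = 10_000

fast_failures = expected_failed_edits(TOTAL_EDITS, 0.96)   # Morph V3 Fast, ~96%
large_failures = expected_failed_edits(TOTAL_EDITS, 0.98)  # Morph V3 Large, ~98%
extra_failures = fast_failures - large_failures

print(fast_failures, large_failures, extra_failures)  # 400 200 200
```

Under these assumptions the two-point gap amounts to about 200 additional failed edits per 10,000, which is the scale at which the slower model may be worth its lower throughput.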
Morph Baidu: ERNIE 4.5 300B A47B https://developer.puter.com/ai/baidu/ernie-4.5-300b-a47b/ https://developer.puter.com/ai/baidu/ernie-4.5-300b-a47b/ Mon, 30 Jun 2025 00:00:00 GMT ERNIE 4.5 300B A47B is Baidu's flagship text-only large language model featuring 300B total parameters with 47B active per token via MoE architecture. It demonstrates state-of-the-art performance on instruction following and knowledge benchmarks like IFEval, SimpleQA, and ChineseSimpleQA. The model supports 131K context length and excels at text understanding, generation, reasoning, and coding. Baidu Baidu: ERNIE 4.5 VL 424B A47B https://developer.puter.com/ai/baidu/ernie-4.5-vl-424b-a47b/ https://developer.puter.com/ai/baidu/ernie-4.5-vl-424b-a47b/ Mon, 30 Jun 2025 00:00:00 GMT ERNIE 4.5 VL 424B A47B is Baidu's largest multimodal vision-language model with 424B total parameters and 47B active per token. It supports up to 131K context tokens and excels at visual reasoning, document/chart understanding, and visual question answering with both thinking and non-thinking modes. In thinking mode, it approaches or surpasses OpenAI o1 on reasoning benchmarks like MathVista, MMMU, and VisualPuzzle. Baidu Google: Gemma 3n 2B https://developer.puter.com/ai/google/gemma-3n-e2b-it/ https://developer.puter.com/ai/google/gemma-3n-e2b-it/ Wed, 25 Jun 2025 00:00:00 GMT Gemma 3n E2B Instruct (Free) is Google's mobile-first open model with an effective 2B parameter memory footprint using Per-Layer Embeddings. It's optimized for on-device AI with audio, text, image, and video understanding. Google Google: Gemma 3n 4B https://developer.puter.com/ai/google/gemma-3n-e4b-it/ https://developer.puter.com/ai/google/gemma-3n-e4b-it/ Wed, 25 Jun 2025 00:00:00 GMT Gemma 3n E4B Instruct is Google's mobile-optimized model with a 4B active memory footprint containing a nested 2B submodel for flexible quality-latency tradeoffs. It supports real-time multimodal processing on edge devices. 
Google ByteDance: Seedance 1.0 Lite https://developer.puter.com/ai/bytedance/seedance-1.0-lite/ https://developer.puter.com/ai/bytedance/seedance-1.0-lite/ Wed, 25 Jun 2025 00:00:00 GMT Seedance 1.0 Lite is ByteDance's speed-optimized AI video generation model that creates 5-second videos at 480p-720p resolution from text prompts or images. It supports text-to-video and image-to-video generation with smooth motion and multi-shot narrative capabilities, and is designed for fast iteration and experimentation. ByteDance ByteDance: Seedance 1.0 Pro https://developer.puter.com/ai/bytedance/seedance-1.0-pro/ https://developer.puter.com/ai/bytedance/seedance-1.0-pro/ Wed, 25 Jun 2025 00:00:00 GMT Seedance 1.0 Pro is ByteDance's professional-grade AI video generation model that produces cinematic 1080p videos from text or images. It excels at multi-shot storytelling with consistent subjects and visual style across scenes, featuring smooth motion, rich details, and advanced prompt following for production-quality content. ByteDance Mistral AI: Mistral Small 3.2 https://developer.puter.com/ai/mistralai/mistral-small-2506/ https://developer.puter.com/ai/mistralai/mistral-small-2506/ Fri, 20 Jun 2025 00:00:00 GMT Mistral Small 3.2 is a 24B parameter multimodal model with 128K context, improved instruction following, and reduced repetition. It handles text and images, runs on a single RTX 4090 when quantized, and delivers 150 tokens/second. It is released under Apache 2.0. Mistral AI Mistral AI: Mistral Small 3.2 https://developer.puter.com/ai/mistralai/mistral-small-3.2-24b-instruct/ https://developer.puter.com/ai/mistralai/mistral-small-3.2-24b-instruct/ Fri, 20 Jun 2025 00:00:00 GMT Mistral Small 3.2 improves on 3.1 with better instruction following (84.78% vs 82.75%), fewer infinite generations (1.29% vs 2.11%), and more robust function calling. It maintains the 24B/128K context architecture under Apache 2.0. 
Mistral AI MiniMax: MiniMax M1 https://developer.puter.com/ai/minimax/minimax-m1/ https://developer.puter.com/ai/minimax/minimax-m1/ Tue, 17 Jun 2025 00:00:00 GMT MiniMax-M1 is the world's first open-source hybrid-attention reasoning model, featuring a 1 million token context window and 80K reasoning output budget. It excels in software engineering, long-context tasks, and complex reasoning while being trained with an efficient CISPO reinforcement learning algorithm. MiniMax Google: Gemini 2.5 Flash-Lite https://developer.puter.com/ai/google/gemini-2.5-flash-lite/ https://developer.puter.com/ai/google/gemini-2.5-flash-lite/ Tue, 17 Jun 2025 00:00:00 GMT Gemini 2.5 Flash-Lite is Google's cost-optimized version of 2.5 Flash, designed for high-volume tasks like classification, translation, and intelligent routing. It delivers efficient performance for cost-sensitive, high-scale operations. Google Black Forest Labs: FLUX.1 Kontext [dev] https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-dev/ https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-dev/ Thu, 12 Jun 2025 00:00:00 GMT FLUX.1 Kontext Dev is an open-weight 12B parameter model for in-context image generation and editing, allowing prompting with both text and images to modify visual concepts. It was the first open model to deliver proprietary-level image editing performance and runs on consumer hardware. Black Forest Labs Black Forest Labs: FLUX.1 Kontext [max] https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-max/ https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-max/ Thu, 12 Jun 2025 00:00:00 GMT FLUX.1 Kontext Max is the highest-quality model in the Kontext series, optimized for iteratively modifying existing images via text prompts with maximum fidelity. It offers the best editing consistency and prompt following among Kontext variants. 
Black Forest Labs Black Forest Labs: FLUX.1 Kontext [pro] https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-pro/ https://developer.puter.com/ai/black-forest-labs/flux.1-kontext-pro/ Thu, 12 Jun 2025 00:00:00 GMT FLUX.1 Kontext Pro is a production-grade in-context image generation and editing model that balances quality and speed. It powers integrations in Adobe Photoshop's Generative Fill and Meta's platforms. Black Forest Labs OpenAI: OpenAI o3 Pro https://developer.puter.com/ai/openai/o3-pro/ https://developer.puter.com/ai/openai/o3-pro/ Tue, 10 Jun 2025 00:00:00 GMT OpenAI o3 Pro is a version of o3 designed to think longer and provide the most reliable responses for challenging questions. It's recommended when reliability matters more than speed. OpenAI Google: Gemini 2.5 Pro Preview 06-05 https://developer.puter.com/ai/google/gemini-2.5-pro-preview/ https://developer.puter.com/ai/google/gemini-2.5-pro-preview/ Thu, 05 Jun 2025 00:00:00 GMT Gemini 2.5 Pro Preview is the preview version of Google's most advanced reasoning model with state-of-the-art coding and complex task performance. It features Deep Think mode, 1M token context, and advanced multimodal capabilities. Google TNG Technology: DeepSeek R1T2 Chimera https://developer.puter.com/ai/tngtech/deepseek-r1t2-chimera/ https://developer.puter.com/ai/tngtech/deepseek-r1t2-chimera/ Sun, 01 Jun 2025 00:00:00 GMT DeepSeek R1T2 Chimera is TNG Tech's second-generation 671B parameter tri-parent model assembled from DeepSeek R1-0528, R1, and V3-0324. It runs ~20% faster than R1 and 2x faster than R1-0528 while scoring higher on benchmarks like GPQA and AIME-24, with improved think-token consistency. TNG Technology TNG Technology: R1T Chimera https://developer.puter.com/ai/tngtech/tng-r1t-chimera/ https://developer.puter.com/ai/tngtech/tng-r1t-chimera/ Sun, 01 Jun 2025 00:00:00 GMT TNG R1T Chimera is an experimental LLM from TNG Tech optimized for creative storytelling and character interaction. 
It's a derivative of the original DeepSeek-R1T-Chimera with improved think-token consistency, better tool calling, and an EQ-Bench3 score of ~1305. TNG Technology Liquid AI: LFM2-8B-A1B https://developer.puter.com/ai/liquid/lfm2-8b-a1b/ https://developer.puter.com/ai/liquid/lfm2-8b-a1b/ Sun, 01 Jun 2025 00:00:00 GMT LFM2-8B-A1B is a sparse Mixture-of-Experts language model from Liquid AI with 8.3B total parameters but only 1.5B active per token, using 32 experts per MoE block with the top 4 active per token. This design delivers the quality of a 3-4B dense model at the compute cost of a 1.5B model, making it faster than Qwen3-1.7B in practice. Verified benchmarks include GSM8K 84.4%, MATH500 74.2%, IFEval 77.6%, and MMLU-Pro 37.4%. For API developers, it is a strong choice for latency-sensitive applications requiring larger-model quality at minimal compute cost, particularly high-throughput pipelines where speed and efficiency are priorities. Liquid AI DeepSeek: R1 0528 https://developer.puter.com/ai/deepseek/deepseek-r1-0528/ https://developer.puter.com/ai/deepseek/deepseek-r1-0528/ Wed, 28 May 2025 00:00:00 GMT DeepSeek R1-0528 is the May 2025 major update to R1, featuring dramatically improved reasoning depth with nearly double the thinking tokens (23K vs 12K average) and approaching the performance of o3 and Gemini 2.5 Pro. It adds function calling support, reduces hallucinations, and improves AIME accuracy from 70% to 87.5%. DeepSeek Anthropic: Claude Opus 4 https://developer.puter.com/ai/anthropic/claude-opus-4/ https://developer.puter.com/ai/anthropic/claude-opus-4/ Thu, 22 May 2025 00:00:00 GMT Claude Opus 4 is the flagship model from the May 2025 Claude 4 launch, designed for complex long-running tasks. It can work continuously for several hours (7+ hour coding sessions demonstrated) and leads on coding benchmarks at 72.5% SWE-bench. 
Anthropic Anthropic: Claude Sonnet 4 https://developer.puter.com/ai/anthropic/claude-sonnet-4/ https://developer.puter.com/ai/anthropic/claude-sonnet-4/ Thu, 22 May 2025 00:00:00 GMT Claude Sonnet 4 is the May 2025 successor to Sonnet 3.7 with enhanced steerability and coding (72.7% SWE-bench). It excels at following complex instructions precisely and autonomous multi-feature app development with near-zero navigation errors. Anthropic Google: Imagen 4 Preview https://developer.puter.com/ai/google/imagen-4.0-preview/ https://developer.puter.com/ai/google/imagen-4.0-preview/ Tue, 20 May 2025 00:00:00 GMT Imagen 4 Preview is the preview version of Google's flagship text-to-image diffusion model featuring photorealistic detail, improved typography, and support for up to 2K resolution. It balances quality and cost at $0.04 per image, making it suitable for a wide variety of creative tasks. Google Google: Veo 3 https://developer.puter.com/ai/google/veo-3.0/ https://developer.puter.com/ai/google/veo-3.0/ Tue, 20 May 2025 00:00:00 GMT Google Veo 3 is Google DeepMind's advanced AI video model that generates high-quality videos with native synchronized audio including dialogue, sound effects, and ambient noise directly from text prompts. It delivers state-of-the-art results in physics, realism, and prompt adherence with cinematic quality 8-second clips at up to 1080p resolution. Google Google: Veo 3 with Audio https://developer.puter.com/ai/google/veo-3.0-audio/ https://developer.puter.com/ai/google/veo-3.0-audio/ Tue, 20 May 2025 00:00:00 GMT Google Veo 3 with Audio is the audio-enabled configuration of Veo 3 that generates synchronized sound effects, dialogue, ambient noise, and music natively alongside video content. It produces complete audiovisual experiences from text prompts, eliminating the need for separate audio post-production. 
Google Google: Veo 3 Fast https://developer.puter.com/ai/google/veo-3.0-fast/ https://developer.puter.com/ai/google/veo-3.0-fast/ Tue, 20 May 2025 00:00:00 GMT Google Veo 3 Fast is a speed-optimized variant of Veo 3 that generates videos approximately 2x faster at 60-80% lower cost while maintaining high visual quality. It's designed for rapid iteration, prototyping, and cost-efficient production workflows at 720p resolution. Google Google: Veo 3 Fast with Audio https://developer.puter.com/ai/google/veo-3.0-fast-audio/ https://developer.puter.com/ai/google/veo-3.0-fast-audio/ Tue, 20 May 2025 00:00:00 GMT Google Veo 3 Fast with Audio is the audio-enabled version of the speed-optimized Veo 3 Fast model, combining faster generation times and lower costs with native synchronized audio generation. It delivers sound effects, dialogue, and ambient audio while optimizing for speed and affordability in production workflows. Google Moonshot AI: Kimi Dev 72B https://developer.puter.com/ai/moonshotai/kimi-dev-72b/ https://developer.puter.com/ai/moonshotai/kimi-dev-72b/ Thu, 15 May 2025 00:00:00 GMT Kimi Dev 72B is a 72-billion-parameter coding model by Moonshot AI, purpose-built for software engineering tasks like bug fixing, code generation, and unit test creation. It is based on the Qwen 2.5-72B architecture and fine-tuned with large-scale reinforcement learning on real-world GitHub issues and pull requests. The model achieved 60.4% on SWE-bench Verified, setting a state-of-the-art result among open-source models at the time of its June 2025 release. It uses a two-stage framework — file localization followed by precise code editing — that mirrors how human developers approach issue resolution. Kimi Dev 72B is a strong pick for automated code repair and test generation workflows where a specialized coding model outperforms general-purpose alternatives. 
Moonshot AI Black Forest Labs: FLUX.2 [dev] https://developer.puter.com/ai/black-forest-labs/flux-2-dev/ https://developer.puter.com/ai/black-forest-labs/flux-2-dev/ Thu, 15 May 2025 00:00:00 GMT FLUX.2 Dev is a 32B parameter open-weight flow matching transformer for text-to-image generation and multi-reference image editing. It uses Mistral-3 24B as its vision-language backbone and features a new VAE, improved typography, and support for up to 10 reference images. Black Forest Labs Black Forest Labs: FLUX.2 [flex] https://developer.puter.com/ai/black-forest-labs/flux.2-flex/ https://developer.puter.com/ai/black-forest-labs/flux.2-flex/ Thu, 15 May 2025 00:00:00 GMT FLUX.2 Flex is a specialized FLUX.2 variant focused on typography accuracy and fine detail preservation, with a configurable 'steps' parameter for trading off between quality and speed. It exposes advanced controls like guidance and step count for precise diffusion tuning. Black Forest Labs Black Forest Labs: FLUX.2 [max] https://developer.puter.com/ai/black-forest-labs/flux.2-max/ https://developer.puter.com/ai/black-forest-labs/flux.2-max/ Thu, 15 May 2025 00:00:00 GMT FLUX.2 Max is the most capable model in the FLUX.2 family, delivering the highest editing consistency, strongest prompt following, and best photorealism. It uniquely supports grounded generation with real-time web search to visualize current events and trending content. Black Forest Labs Black Forest Labs: FLUX.2 [pro] https://developer.puter.com/ai/black-forest-labs/flux-2-pro/ https://developer.puter.com/ai/black-forest-labs/flux-2-pro/ Thu, 15 May 2025 00:00:00 GMT FLUX.2 Pro is the production-grade FLUX.2 model balancing high quality and affordability, designed for professional image generation and editing workflows. It uses fixed optimal inference parameters for consistent output without manual tuning. 
Black Forest Labs ByteDance Seed: Seedream 3.0 https://developer.puter.com/ai/bytedance-seed/seedream-3.0/ https://developer.puter.com/ai/bytedance-seed/seedream-3.0/ Wed, 14 May 2025 00:00:00 GMT ByteDance Seed Vidu: Vidu Q1 https://developer.puter.com/ai/vidu/vidu-q1/ https://developer.puter.com/ai/vidu/vidu-q1/ Tue, 13 May 2025 00:00:00 GMT Vidu Q1 is a high-performance generative video model from ShengShu Technology that produces cinematic 1080p videos up to 5 seconds long with integrated AI-generated audio. It features a First-to-Last Frame system for seamless transitions between unrelated images and supports up to 7 reference images for multi-character consistency. The model excels at anime-style content and includes built-in generation of 48kHz sound effects and background music from text prompts. Vidu Mistral AI: Mistral Medium 3 https://developer.puter.com/ai/mistralai/mistral-medium-3/ https://developer.puter.com/ai/mistralai/mistral-medium-3/ Wed, 07 May 2025 00:00:00 GMT Mistral Medium 3 delivers frontier performance at $0.4/$2 per million input/output tokens, performing at 90% or more of Claude Sonnet 3.7's level across benchmarks. It's deployable on four or more GPUs and surpasses Llama 4 Maverick and Cohere Command A. Mistral AI Google: Gemini 2.5 Pro Preview 05-06 https://developer.puter.com/ai/google/gemini-2.5-pro-preview-05-06/ https://developer.puter.com/ai/google/gemini-2.5-pro-preview-05-06/ Wed, 07 May 2025 00:00:00 GMT Gemini 2.5 Pro Preview (May 6) is a dated preview snapshot of Google's flagship reasoning model with improvements in code and function calling. It offers advanced reasoning capabilities for complex enterprise use cases. 
Google Arcee AI: Coder Large https://developer.puter.com/ai/arcee-ai/coder-large/ https://developer.puter.com/ai/arcee-ai/coder-large/ Mon, 05 May 2025 00:00:00 GMT Arcee Coder Large is a 32-billion-parameter code-generation model from Arcee AI, fine-tuned from Qwen2.5-Instruct on permissively-licensed GitHub data, CodeSearchNet, and synthetic bug-fix corpora. It generates compilable code, explains implementations, reviews diffs, and fixes bugs across 30+ programming languages, with particular strength in TypeScript, Go, and Terraform. A reinforcement learning stage specifically rewards compilable outputs, making it more reliable than general-purpose models on real developer prompts. The 32k context window supports multi-file refactoring and long diff review in a single API call. A strong choice for code-heavy pipelines where output correctness and structured explanations matter. Arcee AI Arcee AI: Maestro Reasoning https://developer.puter.com/ai/arcee-ai/maestro-reasoning/ https://developer.puter.com/ai/arcee-ai/maestro-reasoning/ Mon, 05 May 2025 00:00:00 GMT Arcee Maestro Reasoning is a 32-billion-parameter analytical reasoning model from Arcee AI, derived from Qwen2.5-32B and post-trained with DPO and chain-of-thought reinforcement learning to produce step-by-step logical reasoning traces. It targets complex problem-solving, abstract reasoning, multi-step scenario modeling, and tasks requiring transparent, auditable inference chains — a natural fit for legal, financial, and scientific applications. The 128k context window allows reasoning over long documents in a single call. On Yupp's high-reasoning benchmark, Maestro Reasoning ranks among the top five models overall, competing with significantly larger frontier models. It delivers strong reasoning quality at a mid-tier parameter count. 
Arcee AI Arcee AI: Spotlight https://developer.puter.com/ai/arcee-ai/spotlight/ https://developer.puter.com/ai/arcee-ai/spotlight/ Mon, 05 May 2025 00:00:00 GMT Arcee Spotlight is a 7-billion-parameter vision-language model from Arcee AI, derived from Qwen2.5-VL and fine-tuned for tight image-text grounding tasks including visual question answering, image captioning, and diagram analysis. At 7B parameters it is designed for fast inference, making it practical for real-time or high-volume multimodal API workloads where latency and cost are constraints. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on VQA and POPE alignment tests. A strong choice for developers who need capable vision-language understanding without the cost overhead of larger multimodal models — well suited for document parsing, visual QA pipelines, and image-grounded chat. Arcee AI Arcee AI: Virtuoso Large https://developer.puter.com/ai/arcee-ai/virtuoso-large/ https://developer.puter.com/ai/arcee-ai/virtuoso-large/ Mon, 05 May 2025 00:00:00 GMT Arcee Virtuoso Large is a 72-billion-parameter general-purpose language model from Arcee AI, built on Qwen2.5-72B and post-trained using DeepSeek R1 distillation, multi-epoch supervised fine-tuning, and DPO/RLHF alignment. It is designed for cross-domain reasoning, enterprise question answering, creative writing, and long-document comprehension, with a 128k context window that enables processing entire codebases or lengthy documents in a single API call. Virtuoso Large is Arcee's flagship dense general-purpose model — a solid default choice for developers who need reliable, broad-capability performance without the routing complexity of MoE architectures. 
Arcee AI Deep Cogito: Cogito V2 Preview Llama 109B https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-109b-moe/ https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-109b-moe/ Thu, 01 May 2025 00:00:00 GMT Cogito V2 Preview Llama 109B MoE is a sparse Mixture-of-Experts language model built on Llama architecture, developed by DeepCogito using their Iterated Distillation and Amplification (IDA) training method. The MoE design activates only a subset of expert networks per token, delivering strong reasoning at lower per-token compute cost compared to dense models of the same size. It supports dual-mode operation: standard response generation or self-reflective reasoning mode via system prompt. Optimized for coding, STEM, instruction following, multilingual tasks (30+ languages), and tool calling with a 128K context window. A cost-effective option for API workloads that need strong reasoning without dense-model pricing. Deep Cogito Deep Cogito: Cogito V2 Preview Llama 405B https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-405b/ https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-405b/ Thu, 01 May 2025 00:00:00 GMT Cogito V2 Preview Llama 405B is a dense large language model built on Llama architecture and developed by DeepCogito using Iterated Distillation and Amplification (IDA), a training approach that internalizes reasoning capabilities directly into model weights. As DeepCogito's largest dense offering in the v2 preview series, it delivers near-frontier performance among open models across coding, STEM, general instruction following, and multilingual tasks (30+ languages). It supports a 128K context window. The model operates in both standard and self-reflective reasoning modes, with reasoning chains notably shorter than DeepSeek R1 by approximately 60%. Well-suited for high-accuracy API use cases where latency is less constrained. 
Deep Cogito Deep Cogito: Cogito V2 Preview Llama 70B https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-70b/ https://developer.puter.com/ai/deepcogito/cogito-v2-preview-llama-70b/ Thu, 01 May 2025 00:00:00 GMT Cogito V2 Preview Llama 70B is a dense language model built on Llama architecture and trained by DeepCogito using Iterated Distillation and Amplification (IDA), which embeds reasoning ability into model weights to improve standard-mode performance without requiring extended chain-of-thought. It supports dual-mode operation — direct response or self-reflective reasoning — controlled via the system prompt. In standard mode, it achieves 91.73% on MMLU, outperforming Llama 3.3 70B by 6.4 points. It covers 30+ languages and supports a 128K context window with tool calling in both modes. Well-suited for API deployments requiring a balance of speed, cost efficiency, and strong reasoning on coding, STEM, and instruction-following tasks. Deep Cogito Qwen: Qwen3 30B A3B https://developer.puter.com/ai/qwen/qwen3-30b-a3b/ https://developer.puter.com/ai/qwen/qwen3-30b-a3b/ Tue, 29 Apr 2025 00:00:00 GMT Qwen3 30B A3B is an efficient MoE model with 30B total and 3B active parameters, outperforming QwQ-32B while using 10x fewer active parameters. It offers hybrid thinking modes and 119 language support. Qwen Qwen: Qwen3 4B https://developer.puter.com/ai/qwen/qwen3-4b/ https://developer.puter.com/ai/qwen/qwen3-4b/ Tue, 29 Apr 2025 00:00:00 GMT Qwen3 4B is a compact model rivaling Qwen2.5-72B-Instruct performance, featuring hybrid thinking modes and 119 language support. Qwen Wan AI: Wan 2.2 Image-to-Video 14B https://developer.puter.com/ai/wan-ai/wan2.2-i2v-a14b/ https://developer.puter.com/ai/wan-ai/wan2.2-i2v-a14b/ Thu, 24 Apr 2025 00:00:00 GMT Wan 2.2 I2V A14B is an open-source image-to-video generation model that transforms static images into 5-second videos at 480P or 720P resolution. 
It uses a Mixture-of-Experts (MoE) architecture with dual 14B-parameter experts to achieve stable video synthesis with reduced unrealistic camera movements and enhanced support for diverse stylized scenes. Wan AI Wan AI: Wan 2.2 Text-to-Video 14B https://developer.puter.com/ai/wan-ai/wan2.2-t2v-a14b/ https://developer.puter.com/ai/wan-ai/wan2.2-t2v-a14b/ Thu, 24 Apr 2025 00:00:00 GMT Wan 2.2 T2V A14B is an open-source text-to-video generation model that creates 5-second videos at 480P or 720P resolution from text prompts. Built with a Mixture-of-Experts (MoE) architecture featuring specialized high-noise and low-noise experts, it delivers cinematic-quality output with granular control over lighting, composition, and motion. Wan AI OpenAI: GPT Image 1 https://developer.puter.com/ai/openai/gpt-image-1/ https://developer.puter.com/ai/openai/gpt-image-1/ Wed, 23 Apr 2025 00:00:00 GMT GPT Image 1 is OpenAI's natively multimodal image generation model released in April 2025, built on GPT-4o architecture to accept both text and image inputs. It excels at text rendering, detailed instruction following, and photorealistic output with support for image editing and inpainting. The model uses an autoregressive approach rather than diffusion, representing a significant advancement over the DALL·E series. OpenAI PixVerse: PixVerse V5 https://developer.puter.com/ai/pixverse/pixverse-v5/ https://developer.puter.com/ai/pixverse/pixverse-v5/ Tue, 22 Apr 2025 00:00:00 GMT PixVerse V5 is an AI video generation model that converts text or images into cinematic-quality videos with smooth motion, fast rendering speeds, and enhanced prompt adherence. It excels at creating high-fidelity videos with natural camera movements and consistent visual styling across frames. The model is ranked among the top performers in image-to-video and text-to-video benchmarks. 
PixVerse Liquid AI: LFM2-2.6B https://developer.puter.com/ai/liquid/lfm-2.2-6b/ https://developer.puter.com/ai/liquid/lfm-2.2-6b/ Tue, 22 Apr 2025 00:00:00 GMT LFM2-2.6B is a hybrid language model from Liquid AI, built on a novel architecture that alternates Grouped Query Attention blocks with gated short convolutional layers. Trained on 10 trillion tokens, it delivers fast inference with a significantly reduced KV cache footprint compared to pure-transformer models. Despite its 2.6B parameter count, it outperforms larger models in its class including Llama 3.2-3B-Instruct and Gemma-3-4b-it. Verified benchmarks include 82.41% on GSM8K and 79.56% on IFEval, surpassing Llama 3.2-3B's 71.43% on the latter. For API developers, it is well-suited for low-latency, cost-efficient inference tasks such as instruction following, Q&A, and math-related applications. Liquid AI Google: Gemini 2.5 Flash Image https://developer.puter.com/ai/google/gemini-2.5-flash-image/ https://developer.puter.com/ai/google/gemini-2.5-flash-image/ Thu, 17 Apr 2025 00:00:00 GMT Gemini 2.5 Flash Image (codenamed Nano Banana) is Google's state-of-the-art multimodal model for fast, conversational image generation and editing with low latency. It maintains character consistency across prompts, enables precise local edits via natural language, and supports multi-image composition and fusion. Google OpenAI: OpenAI o3 https://developer.puter.com/ai/openai/o3/ https://developer.puter.com/ai/openai/o3/ Wed, 16 Apr 2025 00:00:00 GMT OpenAI o3 is a powerful reasoning model that pushes the frontier in coding, math, science, and visual perception. It can agentically use all ChatGPT tools and makes 20% fewer major errors than o1 on difficult tasks. OpenAI OpenAI: OpenAI o4 Mini https://developer.puter.com/ai/openai/o4-mini/ https://developer.puter.com/ai/openai/o4-mini/ Wed, 16 Apr 2025 00:00:00 GMT OpenAI o4 Mini is a fast, cost-efficient reasoning model optimized for coding and visual tasks. 
It achieves remarkable performance for its size, with full tool access in ChatGPT. It has since been succeeded by GPT-5 Mini. OpenAI OpenAI: OpenAI o4 Mini High https://developer.puter.com/ai/openai/o4-mini-high/ https://developer.puter.com/ai/openai/o4-mini-high/ Wed, 16 Apr 2025 00:00:00 GMT OpenAI o4 Mini High is a higher-intelligence version of o4-mini available in the ChatGPT model picker. It provides enhanced reasoning at the cost of longer response times. OpenAI Kling: Kling 2.1 Master https://developer.puter.com/ai/kwaivgi/kling-2.1-master/ https://developer.puter.com/ai/kwaivgi/kling-2.1-master/ Tue, 15 Apr 2025 00:00:00 GMT Kling 2.1 Master is Kuaishou's premium AI video generation model featuring 1080p output, advanced 3D spatiotemporal attention for cinematic-grade realism, and superior prompt adherence. It supports both text-to-video and image-to-video, with refined facial modeling and complex motion dynamics suited to professional filmmakers and advertisers. Kling Kling: Kling 2.1 Standard https://developer.puter.com/ai/kwaivgi/kling-2.1-standard/ https://developer.puter.com/ai/kwaivgi/kling-2.1-standard/ Tue, 15 Apr 2025 00:00:00 GMT Kling 2.1 Standard is a cost-effective 720p AI video generation model from Kuaishou designed for high-volume content creation. It currently supports image-to-video generation only, offering fast rendering speeds and solid quality for social media clips, quick ads, and personal projects at roughly 5x lower cost than Master mode. Kling Kling: Kling 2.1 Pro https://developer.puter.com/ai/kwaivgi/kling-2.1-pro/ https://developer.puter.com/ai/kwaivgi/kling-2.1-pro/ Tue, 15 Apr 2025 00:00:00 GMT Kling 2.1 Pro is the mid-tier 1080p AI video model from Kuaishou offering enhanced sharpness, realistic lighting, and both first and last frame conditioning for precise transitions. It focuses on image-to-video generation with refined camera tools, sitting between Standard and Master in terms of quality and pricing. 
Kling Z.AI: GLM 4 32B 0414 128K https://developer.puter.com/ai/z-ai/glm-4-32b-0414-128k/ https://developer.puter.com/ai/z-ai/glm-4-32b-0414-128k/ Mon, 14 Apr 2025 00:00:00 GMT GLM-4-32B-0414-128K is a 32B-parameter dense language model from Z.ai with an extended 128K-token context window. Pre-trained on 15 trillion tokens of high-quality data — including substantial reasoning-focused synthetic data — it was further refined with rejection sampling and reinforcement learning for instruction following, code generation, and function calling. It supports bilingual Chinese-English usage and is optimized for tasks like tool use, search-grounded Q&A, and structured output generation. Performance is competitive with models in the GPT and DeepSeek V3/R1 class at a fraction of the parameter count. A strong choice for cost-sensitive workloads that need long-context reasoning, multi-file code editing, or reliable JSON output without stepping up to the larger MoE models in the GLM family. Z.AI OpenAI: GPT-4.1 https://developer.puter.com/ai/openai/gpt-4.1/ https://developer.puter.com/ai/openai/gpt-4.1/ Mon, 14 Apr 2025 00:00:00 GMT GPT-4.1 is OpenAI's smartest non-reasoning model, excelling at instruction following and tool calling with a 1M token context window. It outperforms GPT-4o across coding and multimodal tasks with a June 2024 knowledge cutoff. OpenAI OpenAI: GPT-4.1 Mini https://developer.puter.com/ai/openai/gpt-4.1-mini/ https://developer.puter.com/ai/openai/gpt-4.1-mini/ Mon, 14 Apr 2025 00:00:00 GMT GPT-4.1 Mini is a smaller, faster version of GPT-4.1 that matches or exceeds GPT-4o performance while reducing latency by nearly half and cost by 83%. It features a 1M token context window and strong coding capabilities. 
OpenAI OpenAI: GPT-4.1 Nano https://developer.puter.com/ai/openai/gpt-4.1-nano/ https://developer.puter.com/ai/openai/gpt-4.1-nano/ Mon, 14 Apr 2025 00:00:00 GMT GPT-4.1 Nano is OpenAI's fastest and cheapest model, designed for low-latency tasks like classification and autocompletion. It features a 1M token context window and scores 80.1% on MMLU despite its small size. OpenAI HiDream: I1-Dev https://developer.puter.com/ai/hidream-ai/hidream-i1-dev/ https://developer.puter.com/ai/hidream-ai/hidream-i1-dev/ Mon, 14 Apr 2025 00:00:00 GMT HiDream I1 Dev is a guidance-distilled, 17-billion-parameter text-to-image model from HiDream AI, built on a sparse Diffusion Transformer architecture with dynamic Mixture-of-Experts layers. It runs in approximately 28 diffusion steps, placing it between the Full and Fast variants in the speed-quality tradeoff. Because it is distillation-trained, negative prompts are not required — the classifier-free guidance scale should be set to 1.0 during sampling. I1 Dev is well-suited for iterative development workflows, concept exploration, and production pipelines where response time matters but image fidelity cannot be fully sacrificed. It also supports LoRAs for style control. HiDream HiDream: I1-Fast https://developer.puter.com/ai/hidream-ai/hidream-i1-fast/ https://developer.puter.com/ai/hidream-ai/hidream-i1-fast/ Mon, 14 Apr 2025 00:00:00 GMT HiDream I1 Fast is the lowest-latency variant of HiDream AI's 17-billion-parameter text-to-image model family, completing generation in as few as 14-16 diffusion steps. Like the Dev variant, it is distillation-trained and does not require negative prompts. It is the best-fit variant for latency-sensitive API integrations, delivering strong image quality for its step count. I1 Fast is the right choice for real-time generation features, high-throughput batch pipelines, or user-facing products where responsiveness is the primary constraint. Maximum fidelity is reserved for the Full variant. 
HiDream HiDream: I1-Full https://developer.puter.com/ai/hidream-ai/hidream-i1-full/ https://developer.puter.com/ai/hidream-ai/hidream-i1-full/ Mon, 14 Apr 2025 00:00:00 GMT HiDream I1 Full is the flagship text-to-image model from HiDream AI, a 17-billion-parameter sparse Diffusion Transformer that delivers the highest output quality in the I1 family through 50+ diffusion steps. On verified benchmarks, it scores 0.83 on GenEval (vs. 0.67 for DALL-E 3 and 0.66 for FLUX.1-dev), 85.89 on DPG-Bench for prompt adherence, and 33.82 on HPS v2.1 (vs. 32.47 for FLUX.1-dev and 31.44 for DALL-E 3). It supports negative prompts with a guidance scale of 5.0 for finer output control. I1 Full is the best choice when image quality, prompt fidelity, and detail richness are the top priorities — ideal for asset generation, creative production, and high-stakes visual content pipelines. HiDream AlfredPros: CodeLLaMa 7B Instruct Solidity https://developer.puter.com/ai/alfredpros/codellama-7b-instruct-solidity/ https://developer.puter.com/ai/alfredpros/codellama-7b-instruct-solidity/ Mon, 14 Apr 2025 00:00:00 GMT CodeLLaMa 7B Instruct Solidity is a fine-tuned code generation model specialized in writing Solidity smart contracts from natural language instructions. Built by AlfredPros on top of Meta's CodeLlama 7B Instruct base, it was trained using 4-bit QLoRA on a curated dataset of 6,003 human instruction and Solidity source code pairs. The model is purpose-built for blockchain and Web3 development workflows — you describe what a smart contract should do in plain English, and it generates the corresponding Solidity code. This makes it a lightweight, focused option for teams building dApps, DAOs, or other on-chain tooling. With 7 billion parameters and a 4K context window, it's a compact model that prioritizes speed and efficiency over broad generality. Best suited for developers who need fast, domain-specific Solidity generation rather than general-purpose coding assistance. 
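The HiDream I1 entries above quote different step counts and guidance settings for the Fast, Dev, and Full variants. As a rough sketch, those figures can be collected into a lookup table and used to pick a variant for a given step budget. The names `I1Settings`, `I1_VARIANTS`, and `pick_variant` are hypothetical and are not part of any HiDream or Puter API. The step values 16 and 50 are taken as the upper end of "14-16" and the lower end of "50+", and the guidance scale for Fast is left unset because the description does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class I1Settings:
    steps: int                        # diffusion steps quoted in the description
    guidance_scale: Optional[float]   # None where the description gives no value
    negative_prompts: bool            # True only where negative prompts are supported


# Figures copied from the feed descriptions; the dictionary itself is illustrative.
I1_VARIANTS = {
    "fast": I1Settings(steps=16, guidance_scale=None, negative_prompts=False),
    "dev": I1Settings(steps=28, guidance_scale=1.0, negative_prompts=False),
    "full": I1Settings(steps=50, guidance_scale=5.0, negative_prompts=True),
}


def pick_variant(max_steps: int) -> str:
    """Return the variant with the most steps that still fits within max_steps."""
    fitting = [(s.steps, name) for name, s in I1_VARIANTS.items() if s.steps <= max_steps]
    if not fitting:
        raise ValueError("no I1 variant fits the given step budget")
    return max(fitting)[1]
```

Under these assumptions, a budget of 30 steps selects the Dev variant, while a budget below 16 steps raises an error rather than silently falling back.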
AlfredPros OpenGVLab: InternVL3 78B https://developer.puter.com/ai/opengvlab/internvl3-78b/ https://developer.puter.com/ai/opengvlab/internvl3-78b/ Thu, 10 Apr 2025 00:00:00 GMT InternVL3 78B is an open-source multimodal large language model developed by OpenGVLab, combining a 6B vision transformer with a 72.7B Qwen2.5 language backbone. It is the flagship of the InternVL3 series and achieves state-of-the-art performance among open-source multimodal models. The model excels at visual reasoning, document understanding, OCR, chart interpretation, and video comprehension. On the MMMU benchmark it scores 72.2%, surpassing GPT-4o (70.7%), and on MathVista it reaches approximately 79.0 compared to GPT-4o's 63.8. It also achieves an OCRBench score of 906. A key differentiator is its native multimodal pre-training approach, which trains vision and language capabilities together from the start rather than retrofitting vision onto a text-only model. This improves text performance over the base Qwen2.5, making it a strong choice for developers who need both visual and textual reasoning in a single model with a 32,768-token context window. OpenGVLab NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 https://developer.puter.com/ai/nvidia/llama-3.1-nemotron-ultra-253b-v1/ https://developer.puter.com/ai/nvidia/llama-3.1-nemotron-ultra-253b-v1/ Mon, 07 Apr 2025 00:00:00 GMT Llama 3.1 Nemotron Ultra 253B is a 253B parameter reasoning model derived from Llama 3.1 405B using Neural Architecture Search for improved efficiency, supporting a 128K context window and a reasoning mode that can be toggled on or off. It excels at complex math, scientific reasoning, coding, RAG, and tool calling tasks while fitting on a single 8xH100 node. 
NVIDIA Essential AI: Rnj 1 Instruct https://developer.puter.com/ai/essentialai/rnj-1-instruct/ https://developer.puter.com/ai/essentialai/rnj-1-instruct/ Mon, 07 Apr 2025 00:00:00 GMT Rnj-1 Instruct is an 8B-parameter instruction-tuned model built by Essential AI, trained from scratch and optimized for code, STEM reasoning, and agentic workflows. It offers a 32K context window and is released under Apache 2.0. The model punches well above its weight class in agentic coding, scoring 20.8% on SWE-bench Verified — surpassing Gemini 2.0 Flash and Qwen2.5-Coder 32B Instruct under the same framework. It also posts strong marks across code generation (83.5% HumanEval+, 57.1% BigCodeBench) and function calling (62.2% BFCL v3). Math and science capabilities are equally competitive: 92.6% on GSM8K, 43.3% on AIME '25, and solid results on GPQA-Diamond. Its post-training was deliberately kept minimal, making it a strong base for further fine-tuning. A great fit for developers building coding agents, tool-use pipelines, or STEM-focused applications on a budget. Essential AI Meta Llama: Llama 4 Maverick https://developer.puter.com/ai/meta-llama/llama-4-maverick/ https://developer.puter.com/ai/meta-llama/llama-4-maverick/ Sat, 05 Apr 2025 00:00:00 GMT Llama 4 Maverick is Meta's 400 billion total parameter MoE model with 17B active parameters and 128 experts, supporting 1M token context. It's natively multimodal with state-of-the-art performance on coding, reasoning, and image understanding tasks. Meta Llama Meta Llama: Llama 4 Scout https://developer.puter.com/ai/meta-llama/llama-4-scout/ https://developer.puter.com/ai/meta-llama/llama-4-scout/ Sat, 05 Apr 2025 00:00:00 GMT Llama 4 Scout is Meta's efficient 109 billion parameter MoE model with 17B active parameters and 16 experts, featuring an industry-leading 10M token context window. It fits on a single H100 GPU and handles multimodal text and image inputs. 
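The two Llama 4 entries above give both total and active parameter counts, which makes the mixture-of-experts tradeoff easy to quantify. The sketch below computes the share of parameters active per token from the figures quoted in the descriptions (17B of 400B for Maverick, 17B of 109B for Scout). The function name `active_fraction` and the `LLAMA4` table are hypothetical, chosen for illustration only.

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per token, with both counts given in billions."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


# (active, total) parameter counts in billions, as quoted in the descriptions above.
LLAMA4 = {
    "Llama 4 Maverick": (17, 400),
    "Llama 4 Scout": (17, 109),
}

for name, (active_b, total_b) in LLAMA4.items():
    share = active_fraction(active_b, total_b)
    print(f"{name}: {share:.3f} of parameters active per token")
```

Both models activate the same 17B parameters per token, so the difference in the ratio comes entirely from the total size: Maverick spreads its capacity over many more experts than Scout.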
Meta Llama Meta Llama: Llama Guard 4 12B https://developer.puter.com/ai/meta-llama/llama-guard-4-12b/ https://developer.puter.com/ai/meta-llama/llama-guard-4-12b/ Sat, 05 Apr 2025 00:00:00 GMT Llama Guard 4 12B is Meta's 12 billion parameter multimodal safety model that moderates both text and image inputs across 12 languages. It was built from Llama 4 Scout and detects violations based on the MLCommons hazard taxonomy. Meta Llama Qwen: Qwen3 14B https://developer.puter.com/ai/qwen/qwen3-14b/ https://developer.puter.com/ai/qwen/qwen3-14b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 14B is a dense language model with hybrid thinking/non-thinking modes, matching Qwen2.5-32B performance. It supports 119 languages and excels in math, coding, and reasoning tasks. Qwen Qwen: Qwen3 235B A22B https://developer.puter.com/ai/qwen/qwen3-235b-a22b/ https://developer.puter.com/ai/qwen/qwen3-235b-a22b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 235B A22B is the flagship MoE model with 235B total and 22B active parameters, rivaling DeepSeek-R1 and o1. It features hybrid thinking modes and supports 119 languages with strong agentic capabilities. Qwen Qwen: Qwen3 32B https://developer.puter.com/ai/qwen/qwen3-32b/ https://developer.puter.com/ai/qwen/qwen3-32b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 32B is a dense language model matching Qwen2.5-72B performance with hybrid thinking/non-thinking modes. It excels in STEM, coding, and reasoning while supporting 119 languages. Qwen Qwen: Qwen3 8B https://developer.puter.com/ai/qwen/qwen3-8b/ https://developer.puter.com/ai/qwen/qwen3-8b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 8B is a dense model matching Qwen2.5-14B performance with hybrid thinking modes and 128K context. It offers strong reasoning, coding, and multilingual capabilities in a mid-sized package. 
Qwen Qwen: Qwen3 Coder 480B A35B https://developer.puter.com/ai/qwen/qwen3-coder-480b-a35b-instruct/ https://developer.puter.com/ai/qwen/qwen3-coder-480b-a35b-instruct/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 Coder is the most agentic code model in the Qwen series, available in 30B and 480B MoE variants. It achieves SOTA on SWE-Bench with 256K native context, extendable to 1M tokens. Qwen Qwen: Qwen3 Coder 30B A3B Instruct https://developer.puter.com/ai/qwen/qwen3-coder-30b-a3b-instruct/ https://developer.puter.com/ai/qwen/qwen3-coder-30b-a3b-instruct/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 Coder 30B A3B Instruct is an efficient MoE coding model with 30B total and 3.3B active parameters, offering strong agentic coding capabilities with 256K context support. Qwen Qwen: Qwen3 VL 235B A22B Instruct https://developer.puter.com/ai/qwen/qwen3-vl-235b-a22b/ https://developer.puter.com/ai/qwen/qwen3-vl-235b-a22b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3 VL 235B A22B Instruct is the flagship vision-language MoE model with 256K context, offering superior visual coding, spatial understanding, and long video comprehension up to 20 minutes. Qwen Qwen: Qwen3-VL 30B-A3B https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b/ https://developer.puter.com/ai/qwen/qwen3-vl-30b-a3b/ Tue, 01 Apr 2025 00:00:00 GMT Qwen3-VL 30B-A3B is a compact mixture-of-experts vision-language model from Alibaba's Qwen team, with 30B total parameters and only 3B active per token for efficient inference. It supports image and text inputs with a 131K context window and delivers strong multimodal performance on benchmarks including MMMU and visual-math evaluations. Capabilities include document and chart understanding, OCR, visual coding (generating HTML/CSS/JS from images), 2D spatial grounding, and GUI agent tasks across desktop and mobile interfaces. 
The MoE architecture gives it the knowledge breadth of a much larger model while matching the latency and cost profile of a 3B dense model — making it a practical choice for developers who need reliable vision-language capabilities without the compute cost of the 235B flagship variant. Supports tool calling. Qwen Inception: Mercury Coder https://developer.puter.com/ai/inception/mercury-coder/ https://developer.puter.com/ai/inception/mercury-coder/ Mon, 31 Mar 2025 00:00:00 GMT Mercury Coder is a code-specialized diffusion language model from Inception Labs, built on the same parallel token refinement architecture as Mercury. It is available in Mini and Small sizes. On fill-in-the-middle tasks, Mercury Coder Small scored 84.8% average accuracy, exceeding Codestral 2501 (82.5%). On MultiPL-E, it reaches 82.0% in C++, 83.9% in JavaScript, and 82.6% in TypeScript. In Copilot Arena human evaluations, Mercury Coder Mini ranked second in user preference with an average latency of just 25 milliseconds. It is the go-to choice for real-time code completion, autocomplete, and apply-edit workflows where both speed and accuracy are critical. Inception MiniMax: MiniMax Video-01 Director https://developer.puter.com/ai/minimax/video-01-director/ https://developer.puter.com/ai/minimax/video-01-director/ Fri, 28 Mar 2025 00:00:00 GMT MiniMax Video-01 Director is an AI video generation model that specializes in creating HD videos with precise cinematic camera control. It supports 720p resolution at 25fps and generates clips up to 5 seconds, allowing users to specify camera movements like pans, zooms, and tracking shots through natural language or bracketed commands. The model significantly reduces movement randomness compared to standard video models, enabling more accurate and intentional storytelling. 
MiniMax Qwen: QVQ Max https://developer.puter.com/ai/qwen/qvq-max/ https://developer.puter.com/ai/qwen/qvq-max/ Tue, 25 Mar 2025 00:00:00 GMT QVQ Max is Alibaba's flagship visual reasoning model, built by the Qwen team to combine deep multimodal understanding with rigorous logical inference. Unlike standard vision-language models, QVQ Max is designed to think through what it sees — analyzing charts, diagrams, math problems, and everyday images step by step before responding. It scores 70.3% on MMMU and 71.4% on MathVista (mini), placing it among the top multimodal reasoning models available via API. The model handles text and image inputs across a 131K token context window and supports tool calling for agentic workflows. Ideal for developers building tutoring tools, visual data analysis pipelines, document understanding systems, or any application that requires both image comprehension and structured reasoning. Qwen OpenAI: Sora 2 https://developer.puter.com/ai/openai/sora-2/ https://developer.puter.com/ai/openai/sora-2/ Tue, 25 Mar 2025 00:00:00 GMT Sora 2 is OpenAI's video and audio generation model designed for speed and flexibility, ideal for rapid iteration, concepting, and social media content where quick turnaround matters more than ultra-high fidelity. It generates videos from text prompts or images with synchronized dialogue and sound effects. OpenAI OpenAI: Sora 2 Pro https://developer.puter.com/ai/openai/sora-2-pro/ https://developer.puter.com/ai/openai/sora-2-pro/ Tue, 25 Mar 2025 00:00:00 GMT Sora 2 Pro is OpenAI's state-of-the-art, most advanced media generation model that produces higher quality, more polished and stable video results with synced audio. It takes longer to render and costs more, but is best for high-resolution cinematic footage, marketing assets, and production-quality output where visual precision is critical. 
OpenAI AI21 Labs: Jamba Mini 1.7 https://developer.puter.com/ai/ai21/jamba-mini-1.7/ https://developer.puter.com/ai/ai21/jamba-mini-1.7/ Tue, 25 Mar 2025 00:00:00 GMT Jamba Mini 1.7 is a compact, efficiency-focused model from AI21 Labs, sharing the same hybrid SSM-Transformer architecture as its larger sibling but with just 12B active parameters (52B total) in a Mixture of Experts configuration. It retains the full 256K-token context window and supports function calling, making it capable of handling long-document tasks at a fraction of the cost — priced at $0.20 per million input tokens and $0.40 per million output tokens. Like Jamba Large 1.7, this version improves on grounding and instruction-following over earlier releases. It's a practical choice for cost-sensitive production workloads, high-volume pipelines, and use cases where speed and low latency matter more than peak reasoning power. AI21 Labs Qwen: Qwen2.5 VL 32B Instruct https://developer.puter.com/ai/qwen/qwen2.5-vl-32b-instruct/ https://developer.puter.com/ai/qwen/qwen2.5-vl-32b-instruct/ Mon, 24 Mar 2025 00:00:00 GMT Qwen 2.5 VL 32B Instruct is a mid-sized vision-language model offering enhanced image/video understanding with better alignment to human preferences. It bridges the gap between 7B and 72B variants. Qwen DeepSeek: DeepSeek V3 0324 https://developer.puter.com/ai/deepseek/deepseek-chat-v3-0324/ https://developer.puter.com/ai/deepseek/deepseek-chat-v3-0324/ Mon, 24 Mar 2025 00:00:00 GMT DeepSeek V3-0324 is the March 2025 update to DeepSeek V3, incorporating reinforcement learning techniques from R1 to significantly improve reasoning, coding, and frontend development capabilities. It became the first open-source model to outperform all proprietary non-reasoning models on benchmarks, exceeding GPT-4.5 in math and coding tasks. 
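The Jamba Mini 1.7 entry above quotes per-million-token prices ($0.20 input, $0.40 output). A minimal sketch of turning such prices into a per-request estimate follows. The function `request_cost_usd` and the token counts are made up for illustration, and actual billing on any given provider may differ from this simple linear model.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one request from per-million-token prices in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


# Prices quoted in the Jamba Mini 1.7 description above.
JAMBA_MINI_INPUT_PRICE = 0.20
JAMBA_MINI_OUTPUT_PRICE = 0.40

# Hypothetical long-document request: 200,000 input tokens, 2,000 output tokens.
cost = request_cost_usd(200_000, 2_000, JAMBA_MINI_INPUT_PRICE, JAMBA_MINI_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.4f}")
```

For this hypothetical request the estimate comes to roughly four cents, almost all of it from input tokens, which is typical of long-document workloads that fill much of a large context window.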
DeepSeek Google: Gemini 2.5 Flash https://developer.puter.com/ai/google/gemini-2.5-flash/ https://developer.puter.com/ai/google/gemini-2.5-flash/ Thu, 20 Mar 2025 00:00:00 GMT Gemini 2.5 Flash is Google's hybrid reasoning model balancing speed, cost, and intelligence with controllable thinking capabilities. It supports up to 1M tokens and excels at summarization, chat applications, and data extraction at scale. Google Google: Gemini 2.5 Pro https://developer.puter.com/ai/google/gemini-2.5-pro/ https://developer.puter.com/ai/google/gemini-2.5-pro/ Thu, 20 Mar 2025 00:00:00 GMT Gemini 2.5 Pro is Google's most capable reasoning model with state-of-the-art performance on coding and complex tasks. It features a 1M token context window, advanced multimodal understanding, and Deep Think mode for enhanced reasoning. Google OpenAI: OpenAI o1 Pro https://developer.puter.com/ai/openai/o1-pro/ https://developer.puter.com/ai/openai/o1-pro/ Wed, 19 Mar 2025 00:00:00 GMT OpenAI o1 Pro is a version of o1 with more compute for better responses, designed to think longer and provide the most reliable answers. It's the most expensive model at $150/1M input tokens. OpenAI Mistral AI: Mistral Small 3.1 https://developer.puter.com/ai/mistralai/mistral-small-3.1-24b-instruct/ https://developer.puter.com/ai/mistralai/mistral-small-3.1-24b-instruct/ Mon, 17 Mar 2025 00:00:00 GMT Mistral Small 3.1 is a 24B multimodal model with 128K context, supporting text and image inputs. It outperforms GPT-4o Mini and Gemma 3 while delivering 150 tokens/second, released under Apache 2.0 for commercial use. Mistral AI Ideogram: Ideogram 3.0 https://developer.puter.com/ai/ideogram/ideogram-3.0/ https://developer.puter.com/ai/ideogram/ideogram-3.0/ Thu, 13 Mar 2025 00:00:00 GMT Ideogram 3.0 is a text-to-image generation model from Ideogram AI, built by a team of ex-Google engineers and launched in March 2025. 
It specializes in photorealistic image generation with industry-leading text rendering — producing accurate, stylized typography within images that competing models like Midjourney and DALL-E 3 struggle to match. The model excels at graphic design tasks including posters, logos, marketing visuals, and layouts with complex or lengthy text compositions. It also supports Style References, allowing up to three reference images to guide output aesthetics for consistent branding across batches. In human evaluations, Ideogram 3.0 achieved the highest ELO ratings against other text-to-image models across diverse prompts covering varied subjects, styles, and composition difficulty. It's a strong fit for developers building design, advertising, or content-generation pipelines where typographic accuracy and prompt adherence are critical. Ideogram Google: Gemma 3 12B https://developer.puter.com/ai/google/gemma-3-12b-it/ https://developer.puter.com/ai/google/gemma-3-12b-it/ Thu, 13 Mar 2025 00:00:00 GMT Gemma 3 12B Instruct is Google's mid-sized open multimodal model supporting text and image input with a 128K token context window. It supports 140+ languages and offers strong performance for single-GPU deployment. Google Cohere: Command A https://developer.puter.com/ai/cohere/command-a/ https://developer.puter.com/ai/cohere/command-a/ Thu, 13 Mar 2025 00:00:00 GMT Command A is Cohere's flagship enterprise language model with 111 billion parameters and a 256K token context window, released in March 2025. Built for complex agentic workflows, it leads on tool-use benchmarks including BFCL-v3 and Tau-bench, and performs on par with GPT-4o on MMLU and SQL tasks. It is particularly strong at multi-step tool calling — including knowing when not to invoke a tool, a critical quality for production agents. 
Supporting 23 languages with 150% higher throughput than Command R+, it's a strong choice for developers building RAG pipelines, autonomous agents, or multilingual enterprise applications. Cohere Reka AI: Reka Flash 3 https://developer.puter.com/ai/rekaai/reka-flash-3/ https://developer.puter.com/ai/rekaai/reka-flash-3/ Wed, 12 Mar 2025 00:00:00 GMT Reka Flash 3 is a 21-billion-parameter reasoning model developed by Reka AI, designed as a compact but capable general-purpose LLM. It excels at chat, coding, instruction following, and function calling. The model uses chain-of-thought reasoning via explicit thinking tags, and supports a "budget forcing" mechanism that lets you cap reasoning steps to control latency. It offers a 130K-token context window and is text-only (no image input). Reka AI positions it as competitive with OpenAI's o1-mini while being significantly smaller. It scores 65.0 on MMLU-Pro — modest for knowledge-heavy tasks, so pairing it with search or retrieval is recommended. It's primarily English-focused. Priced at $0.20 per million input tokens and $0.80 per million output tokens via the Reka API, it's a cost-effective option for developers who need solid reasoning at low cost. Reka AI OpenAI: GPT-4o Mini Search Preview https://developer.puter.com/ai/openai/gpt-4o-mini-search-preview/ https://developer.puter.com/ai/openai/gpt-4o-mini-search-preview/ Wed, 12 Mar 2025 00:00:00 GMT GPT-4o Mini Search Preview is a specialized model trained for web search queries in the Chat Completions API. It's a fast, affordable option for search-enabled applications. OpenAI OpenAI: GPT-4o Search Preview https://developer.puter.com/ai/openai/gpt-4o-search-preview/ https://developer.puter.com/ai/openai/gpt-4o-search-preview/ Wed, 12 Mar 2025 00:00:00 GMT GPT-4o Search Preview is a specialized model for web search in Chat Completions, trained to understand and execute search queries. It returns responses with embedded citations and source references. 
OpenAI Google: Gemma 3 27B https://developer.puter.com/ai/google/gemma-3-27b-it/ https://developer.puter.com/ai/google/gemma-3-27b-it/ Wed, 12 Mar 2025 00:00:00 GMT Gemma 3 27B Instruct is Google's most capable single-GPU open model with multimodal support, 128K context, and 140+ language support. It outperforms many larger models and offers state-of-the-art open-weight performance. Google Google: Gemma 3 4B https://developer.puter.com/ai/google/gemma-3-4b-it/ https://developer.puter.com/ai/google/gemma-3-4b-it/ Wed, 12 Mar 2025 00:00:00 GMT Gemma 3 4B Instruct is Google's compact multimodal open model supporting text and images with a 128K token context window. It's optimized for deployment on laptops and edge devices while maintaining strong capabilities. Google Allen AI: Olmo 2 32B Instruct https://developer.puter.com/ai/allenai/olmo-2-0325-32b-instruct/ https://developer.puter.com/ai/allenai/olmo-2-0325-32b-instruct/ Wed, 12 Mar 2025 00:00:00 GMT OLMo 2 32B Instruct is a fully open, 32-billion-parameter instruction-tuned language model from the Allen Institute for AI (AI2), post-trained using supervised fine-tuning, DPO, and RLVR. It is the first fully open model to outperform both GPT-3.5 Turbo and GPT-4o mini across popular multi-skill academic benchmarks including GSM8K, MATH, and IFEval. The model supports a 128K token context window and targets math reasoning, instruction-following, and general chat. Released under Apache 2.0 with full transparency into training data, code, and weights, it's a strong choice for developers who need a capable, commercially permissive instruction model. Allen AI TheDrummer: Skyfall 36B V2 https://developer.puter.com/ai/thedrummer/skyfall-36b-v2/ https://developer.puter.com/ai/thedrummer/skyfall-36b-v2/ Mon, 10 Mar 2025 00:00:00 GMT Skyfall 36B v2 is a 36-billion parameter model that upscales Mistral Small 2501 with specialized training for creativity, roleplay, and coherent storytelling. 
Users report it rivals or exceeds 70B parameter models in creative writing quality while remaining accessible for local deployment with strong chain-of-thought reasoning and tool use capabilities. It features a 32K token context window and supports Mistral v7 Tekken, Metharme, and Alpaca chat templates. TheDrummer Perplexity: Sonar Deep Research https://developer.puter.com/ai/perplexity/sonar-deep-research/ https://developer.puter.com/ai/perplexity/sonar-deep-research/ Fri, 07 Mar 2025 00:00:00 GMT Sonar Deep Research is Perplexity's expert-level research model designed for exhaustive multi-step retrieval, synthesizing hundreds of sources into comprehensive reports. It autonomously searches, reads, and evaluates sources while refining its approach for in-depth analysis across domains like finance, technology, and health. Ideal for detailed market analyses, literature reviews, and projects requiring synthesis of multiple information sources. Perplexity Perplexity: Sonar Pro https://developer.puter.com/ai/perplexity/sonar-pro/ https://developer.puter.com/ai/perplexity/sonar-pro/ Fri, 07 Mar 2025 00:00:00 GMT Sonar Pro is Perplexity's advanced search model with a 200K token context window, delivering 2x more citations and search results than standard Sonar for complex queries. It handles in-depth, multi-step queries with enhanced content understanding and supports longer, more nuanced follow-up conversations. Best for enterprise applications requiring deeper research and comprehensive source attribution. Perplexity Perplexity: Sonar Reasoning Pro https://developer.puter.com/ai/perplexity/sonar-reasoning-pro/ https://developer.puter.com/ai/perplexity/sonar-reasoning-pro/ Fri, 07 Mar 2025 00:00:00 GMT Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT), designed for complex multi-step analysis and logical problem-solving. 
It excels at tasks requiring step-by-step thinking, strict instruction adherence, and information synthesis across sources with built-in web search. It ranks among the top models in Search Arena evaluations, where it is statistically tied with Gemini-2.5-Pro-Grounding. Perplexity TNG Technology: DeepSeek R1T Chimera https://developer.puter.com/ai/tngtech/deepseek-r1t-chimera/ https://developer.puter.com/ai/tngtech/deepseek-r1t-chimera/ Thu, 06 Mar 2025 00:00:00 GMT DeepSeek R1T Chimera is a 685B parameter model created by TNG Tech that merges DeepSeek-R1's reasoning capabilities with DeepSeek-V3's token efficiency. It uses 40% fewer output tokens than R1 while maintaining similar intelligence, constructed via a novel Assembly-of-Experts method rather than fine-tuning. TNG Technology Qwen: QwQ 32B https://developer.puter.com/ai/qwen/qwq-32b/ https://developer.puter.com/ai/qwen/qwq-32b/ Thu, 06 Mar 2025 00:00:00 GMT QwQ 32B is a 32B parameter reasoning model rivaling DeepSeek-R1 (671B) through scaled reinforcement learning. It excels in math, coding, and complex reasoning with 131K context and agent capabilities. Qwen xAI: Grok 3 Beta https://developer.puter.com/ai/x-ai/grok-3-beta/ https://developer.puter.com/ai/x-ai/grok-3-beta/ Wed, 05 Mar 2025 00:00:00 GMT Grok 3 Beta is the API-accessible version of Grok 3, xAI's most advanced model with superior reasoning, mathematics, coding, and world knowledge capabilities refined through large-scale reinforcement learning. It supports enterprise data extraction, coding, and text summarization tasks. xAI xAI: Grok 3 Mini Beta https://developer.puter.com/ai/x-ai/grok-3-mini-beta/ https://developer.puter.com/ai/x-ai/grok-3-mini-beta/ Wed, 05 Mar 2025 00:00:00 GMT Grok 3 Mini Beta is the API version of Grok 3 Mini, a cost-efficient lightweight reasoning model with a configurable reasoning effort parameter (low/high). It excels at STEM tasks and logic-based problems while exposing transparent thinking traces. 
xAI Qwen: QwQ Plus https://developer.puter.com/ai/qwen/qwq-plus/ https://developer.puter.com/ai/qwen/qwq-plus/ Wed, 05 Mar 2025 00:00:00 GMT QwQ Plus is a proprietary reasoning model from Alibaba's Qwen team, serving as the hosted API counterpart to the open-weight QwQ-32B release. Like QwQ-32B, it uses reinforcement learning to develop extended chain-of-thought reasoning, excelling at math competition problems, scientific reasoning, and complex coding tasks. QwQ-32B achieved 79.5% on AIME 2024, 90.6% on MATH-500, and 63.4% on LiveCodeBench — rivaling much larger models. QwQ Plus exposes these capabilities through a managed API endpoint with a 131K token context window and tool call support. Best suited for developers building applications that require step-by-step mathematical reasoning, algorithmic problem-solving, or multi-step logical inference. Qwen Nous Research: DeepHermes 3 Mistral 24B Preview https://developer.puter.com/ai/nousresearch/deephermes-3-mistral-24b-preview/ https://developer.puter.com/ai/nousresearch/deephermes-3-mistral-24b-preview/ Sun, 02 Mar 2025 00:00:00 GMT DeepHermes 3 Mistral 24B Preview is a 24B parameter instruction-tuned model based on Mistral-Small-24B, featuring a dual-mode system that toggles between intuitive chat responses and deep reasoning mode with extended chains of thought. It excels at function calling, structured JSON outputs, and multi-turn reasoning with the ability to use up to 13,000 tokens for complex problems. Nous Research OpenAI: GPT-4.5 Preview (Deprecated) https://developer.puter.com/ai/openai/gpt-4.5-preview/ https://developer.puter.com/ai/openai/gpt-4.5-preview/ Thu, 27 Feb 2025 00:00:00 GMT GPT-4.5 Preview was OpenAI's largest pre-trained model focused on scaling unsupervised learning for improved creativity and reduced hallucinations. It has been deprecated in favor of GPT-4.1 and GPT-5 models. 
OpenAI Google: Gemini 2.0 Flash Lite https://developer.puter.com/ai/google/gemini-2.0-flash-lite-001/ https://developer.puter.com/ai/google/gemini-2.0-flash-lite-001/ Tue, 25 Feb 2025 00:00:00 GMT Gemini 2.0 Flash-Lite 001 is a stable versioned release of Google's most cost-efficient model. It's optimized for large-scale text tasks with simplified pricing and consistent behavior for production use. Google Inception: Mercury https://developer.puter.com/ai/inception/mercury/ https://developer.puter.com/ai/inception/mercury/ Mon, 24 Feb 2025 00:00:00 GMT Mercury is the world's first commercial-scale diffusion large language model from Inception Labs. It generates text through iterative parallel refinement rather than sequential token prediction, enabling dramatically higher throughput without sacrificing output quality. It matches the performance of frontier speed-optimized models such as GPT-4o Mini and Gemini 1.5 Flash across knowledge, coding, instruction-following, and math benchmarks, while running up to 10x faster. It is OpenAI API-compatible for straightforward integration. Mercury is well-suited for API use cases that demand high concurrency, fast response times, or cost efficiency — including chat, summarization, and general-purpose text generation at scale. Inception Anthropic: Claude 3.7 Sonnet (Thinking) https://developer.puter.com/ai/anthropic/claude-3.7-sonnet:thinking/ https://developer.puter.com/ai/anthropic/claude-3.7-sonnet:thinking/ Mon, 24 Feb 2025 00:00:00 GMT Claude 3.7 Sonnet (Thinking Mode) is Claude 3.7 Sonnet with extended thinking enabled by default. It excels at advanced math, competitive programming, and complex problem-solving by showing visible step-by-step reasoning. 
Anthropic Kling: Kling 2.0 Master https://developer.puter.com/ai/kwaivgi/kling-2.0-master/ https://developer.puter.com/ai/kwaivgi/kling-2.0-master/ Thu, 20 Feb 2025 00:00:00 GMT Kling 2.0 Master is Kuaishou's flagship model from the 2.0 generation, delivering 1080p cinema-grade video with 3D spatiotemporal joint attention for realistic motion and physics simulation. It marked a major leap in visual realism and semantic understanding, supporting up to 5-second videos at 24fps with a multi-elements editor for flexible scene control. Kling Anthropic: Claude 3.7 Sonnet https://developer.puter.com/ai/anthropic/claude-3-7-sonnet/ https://developer.puter.com/ai/anthropic/claude-3-7-sonnet/ Wed, 19 Feb 2025 00:00:00 GMT Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, released February 2025. It combines instant responses with an extended thinking mode where users can control the "thinking budget" to balance speed vs. depth. Anthropic xAI: Grok 3 https://developer.puter.com/ai/x-ai/grok-3/ https://developer.puter.com/ai/x-ai/grok-3/ Mon, 17 Feb 2025 00:00:00 GMT Grok 3 is xAI's flagship model launched February 2025, trained with 10x more compute on the Colossus supercluster with 200,000 GPUs. It features advanced reasoning through reinforcement learning, deep domain knowledge in finance/healthcare/law/science, and a 131K token context window. xAI xAI: Grok 3 Fast https://developer.puter.com/ai/x-ai/grok-3-fast/ https://developer.puter.com/ai/x-ai/grok-3-fast/ Mon, 17 Feb 2025 00:00:00 GMT Grok 3 Fast is a latency-optimized variant of Grok 3 using the same underlying model but served on faster infrastructure. It delivers quicker response times for latency-sensitive applications while maintaining equivalent reasoning quality and 131K context window. 
xAI xAI: Grok 3 Mini https://developer.puter.com/ai/x-ai/grok-3-mini/ https://developer.puter.com/ai/x-ai/grok-3-mini/ Mon, 17 Feb 2025 00:00:00 GMT Grok 3 Mini is a lightweight, cost-efficient reasoning model that thinks before responding, ideal for logic-based tasks that don't require deep domain knowledge. It features configurable reasoning effort and exposes accessible thinking traces for transparency. xAI xAI: Grok 3 Mini Fast https://developer.puter.com/ai/x-ai/grok-3-mini-fast/ https://developer.puter.com/ai/x-ai/grok-3-mini-fast/ Mon, 17 Feb 2025 00:00:00 GMT Grok 3 Mini Fast is the speed-optimized variant of Grok 3 Mini, running on faster infrastructure for significantly quicker response times. It provides identical reasoning quality to Grok 3 Mini but is designed for latency-sensitive applications. xAI Mistral AI: Mistral Saba https://developer.puter.com/ai/mistralai/mistral-saba/ https://developer.puter.com/ai/mistralai/mistral-saba/ Mon, 17 Feb 2025 00:00:00 GMT Mistral Saba is a 24B parameter regional model trained for Arabic and South Asian languages including Tamil and Malayalam. It outperforms models 5x its size on Arabic benchmarks while providing culturally relevant responses. Mistral AI OpenAI: OpenAI o3 Mini High https://developer.puter.com/ai/openai/o3-mini-high/ https://developer.puter.com/ai/openai/o3-mini-high/ Wed, 12 Feb 2025 00:00:00 GMT OpenAI o3 Mini High is a higher-intelligence version of o3-mini that takes longer to generate more accurate responses. It uses high reasoning effort for complex STEM and coding tasks. OpenAI Meta Llama: Llama Guard 3 8B https://developer.puter.com/ai/meta-llama/llama-guard-3-8b/ https://developer.puter.com/ai/meta-llama/llama-guard-3-8b/ Wed, 12 Feb 2025 00:00:00 GMT Llama Guard 3 8B is Meta's enhanced safety moderation model providing content classification in 8 languages with support for tool call safety. It detects 14 hazard categories and integrates with Llama 3.1 for comprehensive AI safety. 
Meta Llama Google: Gemini 2.0 Flash https://developer.puter.com/ai/google/gemini-2.0-flash-001/ https://developer.puter.com/ai/google/gemini-2.0-flash-001/ Wed, 05 Feb 2025 00:00:00 GMT Gemini 2.0 Flash 001 is a stable versioned release of Gemini 2.0 Flash, Google's fast multimodal workhorse model. It provides consistent behavior for production deployments with native tool use and 1M token context support. Google Aion Labs: Aion-1.0 https://developer.puter.com/ai/aion-labs/aion-1.0/ https://developer.puter.com/ai/aion-labs/aion-1.0/ Tue, 04 Feb 2025 00:00:00 GMT Aion 1.0 is AionLabs' most powerful reasoning model, a multi-model system built on DeepSeek-R1 and augmented with Tree of Thoughts (ToT) and Mixture of Experts (MoE) techniques. It supports a 131K context window with up to 32K output tokens and includes vision capabilities. The model excels at reasoning and coding tasks, scoring 96.0% on coding, 99.5% on general knowledge, and achieving perfect accuracy on reasoning and email classification benchmarks (Benchable). It also ranks among the fastest models at its price point. Best suited for developers who need strong reasoning, coding assistance, and classification at competitive throughput. Aion Labs Aion Labs: Aion-1.0-Mini https://developer.puter.com/ai/aion-labs/aion-1.0-mini/ https://developer.puter.com/ai/aion-labs/aion-1.0-mini/ Tue, 04 Feb 2025 00:00:00 GMT Aion 1.0 Mini is a 32B-parameter reasoning model from AionLabs, distilled from DeepSeek-R1 and based on a modified FuseAI variant. It is designed for strong performance in mathematics, coding, and logic at a fraction of the cost of full-scale models. It delivers standout speed and pricing, consistently ranking among the fastest and most affordable options available. On Benchable, it scored 99.0% on email classification and 82.0% on reasoning tasks. 
This model is a good fit for developers who need fast, budget-friendly reasoning for structured tasks and can work around its instruction-following limitations. Aion Labs Aion Labs: Aion-RP 1.0 (8B) https://developer.puter.com/ai/aion-labs/aion-rp-llama-3.1-8b/ https://developer.puter.com/ai/aion-labs/aion-rp-llama-3.1-8b/ Tue, 04 Feb 2025 00:00:00 GMT Aion RP 1.0 8B is an uncensored roleplay and creative writing model from AionLabs, fine-tuned from the Llama 3.1 8B base model rather than an instruct variant. This base-model approach is designed to produce more natural and varied writing. It ranks highest in the character evaluation portion of RPBench-Auto, a roleplaying-specific benchmark derived from Arena-Hard-Auto where LLMs evaluate each other's responses. The model supports the full 131K context window and multi-turn conversations. Best suited for character-driven chat applications, interactive storytelling, and persona-consistent dialogue. A temperature of 0.7 is recommended, as higher values can degrade output quality. Aion Labs Mistral AI: Mistral Small 3 https://developer.puter.com/ai/mistralai/mistral-small-24b-instruct-2501/ https://developer.puter.com/ai/mistralai/mistral-small-24b-instruct-2501/ Thu, 30 Jan 2025 00:00:00 GMT Mistral Small 3 is a 24B parameter latency-optimized model achieving ~81% MMLU accuracy at 150 tokens/second. It's designed for fast-response conversational agents and low-latency function calling under Apache 2.0. Mistral AI DeepSeek: R1 Distill Qwen 32B https://developer.puter.com/ai/deepseek/deepseek-r1-distill-qwen-32b/ https://developer.puter.com/ai/deepseek/deepseek-r1-distill-qwen-32b/ Wed, 29 Jan 2025 00:00:00 GMT DeepSeek R1 Distill Qwen 32B is a 32 billion parameter dense model fine-tuned from Qwen 2.5 using R1-generated reasoning data, achieving state-of-the-art results for dense models. It outperforms OpenAI o1-mini on various benchmarks while being efficient enough for local deployment. 
DeepSeek Perplexity: Sonar https://developer.puter.com/ai/perplexity/sonar/ https://developer.puter.com/ai/perplexity/sonar/ Mon, 27 Jan 2025 00:00:00 GMT Sonar is Perplexity's lightweight, cost-effective search model built on Llama 3.3 70B, optimized for speed (1200 tokens/second) and quick factual queries. It provides real-time web search with grounding and citations, ideal for simple Q&A and straightforward integrations. Best for everyday use cases where fast, accurate answers are needed without complex reasoning. Perplexity Qwen: Qwen2.5 VL 72B Instruct https://developer.puter.com/ai/qwen/qwen2.5-vl-72b-instruct/ https://developer.puter.com/ai/qwen/qwen2.5-vl-72b-instruct/ Sun, 26 Jan 2025 00:00:00 GMT Qwen 2.5 VL 72B Instruct is the flagship open-source vision-language model excelling in document understanding, visual reasoning, and long video comprehension up to 1 hour with event pinpointing. Qwen DeepSeek: DeepSeek Reasoner https://developer.puter.com/ai/deepseek/deepseek-reasoner/ https://developer.puter.com/ai/deepseek/deepseek-reasoner/ Mon, 20 Jan 2025 00:00:00 GMT DeepSeek Reasoner is the API alias for DeepSeek's reasoning models (R1 series), which use chain-of-thought reasoning to solve complex math, coding, and logic problems. It displays its thinking process before arriving at answers and achieves performance comparable to OpenAI o1. DeepSeek DeepSeek: R1 https://developer.puter.com/ai/deepseek/deepseek-r1/ https://developer.puter.com/ai/deepseek/deepseek-r1/ Mon, 20 Jan 2025 00:00:00 GMT DeepSeek R1 is DeepSeek's first-generation reasoning model released January 2025, trained via large-scale reinforcement learning to achieve performance comparable to OpenAI o1 on math, code, and reasoning tasks. It pioneered open-source reasoning capabilities with self-verification and reflection behaviors. 
DeepSeek DeepSeek: R1 Distill Llama 70B https://developer.puter.com/ai/deepseek/deepseek-r1-distill-llama-70b/ https://developer.puter.com/ai/deepseek/deepseek-r1-distill-llama-70b/ Mon, 20 Jan 2025 00:00:00 GMT DeepSeek R1 Distill Llama 70B is a 70 billion parameter dense model fine-tuned from Llama 3.3-70B-Instruct using 800K reasoning samples generated by DeepSeek R1. It brings R1's reasoning capabilities to a more accessible size while maintaining strong performance on math and coding benchmarks. DeepSeek Qwen: Qwen-Omni Turbo https://developer.puter.com/ai/qwen/qwen-omni-turbo/ https://developer.puter.com/ai/qwen/qwen-omni-turbo/ Sun, 19 Jan 2025 00:00:00 GMT Qwen-Omni Turbo is Alibaba's cost-optimized omnimodal API model, built to process text, image, audio, and video inputs and return text responses in a single unified interface. It is the lighter, faster tier in the Qwen-Omni family, designed for developers who need full multimodal coverage at lower latency and cost than the flagship Qwen-Omni model. Audio files up to 40 seconds and video files up to 150 MB are supported, spanning common formats such as MP3, WAV, MP4, and MOV. The model handles tool calling natively and is accessible via an OpenAI-compatible API. Best suited for developers building applications that need to reason across mixed media inputs — such as audio transcription pipelines, video understanding workflows, or multimodal chatbots — where throughput and cost efficiency matter. Qwen Moonshot AI: Moonshot v1 8K Vision (Preview) https://developer.puter.com/ai/moonshotai/moonshot-v1-8k-vision-preview/ https://developer.puter.com/ai/moonshotai/moonshot-v1-8k-vision-preview/ Wed, 15 Jan 2025 00:00:00 GMT Moonshot V1 8K Vision Preview is a multimodal variant of Moonshot AI's V1 model that accepts both image and text inputs within an 8,000-token context window. It can interpret screenshots, charts, UI mockups, and photos, returning text-based analysis. 
This makes it useful for tasks like image captioning, visual Q&A, and lightweight document understanding where the source material includes visual elements. As a preview model, it may see changes before a stable release. The API follows the OpenAI-compatible content array format with image_url blocks, making integration straightforward for developers already using similar patterns. Moonshot AI Moonshot AI: Moonshot v1 32K Vision (Preview) https://developer.puter.com/ai/moonshotai/moonshot-v1-32k-vision-preview/ https://developer.puter.com/ai/moonshotai/moonshot-v1-32k-vision-preview/ Wed, 15 Jan 2025 00:00:00 GMT Moonshot V1 32K Vision Preview is a multimodal model from Moonshot AI that processes both images and text within a 32,000-token context window. It extends the base 32K model with the ability to interpret visual inputs — including screenshots, diagrams, charts, and scanned documents — and return text-based responses. This is useful for workflows that combine visual context with moderate-length text, such as analyzing annotated documents or explaining UI designs. As a preview release, the vision capabilities may evolve. The API accepts the standard OpenAI-compatible content array format for multimodal inputs. Moonshot AI Moonshot AI: Moonshot v1 128K Vision (Preview) https://developer.puter.com/ai/moonshotai/moonshot-v1-128k-vision-preview/ https://developer.puter.com/ai/moonshotai/moonshot-v1-128k-vision-preview/ Wed, 15 Jan 2025 00:00:00 GMT Moonshot V1 128K Vision Preview is Moonshot AI's largest-context multimodal model in the V1 series, supporting both image and text inputs within a 128,000-token context window. It combines the long-context strength of the 128K text model with visual understanding capabilities. This makes it well-suited for processing large multimodal documents — think lengthy reports with embedded charts, multi-page scanned PDFs, or extensive UI review sessions. As a preview model, vision features may be refined over time. 
The API uses the standard OpenAI-compatible format for multimodal content, making it a drop-in addition to existing workflows. Moonshot AI MiniMax: MiniMax-01 https://developer.puter.com/ai/minimax/minimax-01/ https://developer.puter.com/ai/minimax/minimax-01/ Wed, 15 Jan 2025 00:00:00 GMT MiniMax-01 is a 456B parameter foundation model (45.9B activated) using a hybrid Lightning Attention + MoE architecture, achieving top-tier performance on reasoning, math, and coding benchmarks. It supports up to 4 million tokens of context, making it especially strong for long-context tasks and AI agent applications. MiniMax Microsoft: Phi 4 https://developer.puter.com/ai/microsoft/phi-4/ https://developer.puter.com/ai/microsoft/phi-4/ Fri, 10 Jan 2025 00:00:00 GMT Phi-4 is a 14B parameter small language model from Microsoft that excels at complex reasoning tasks, especially mathematics, outperforming many larger models on math competition benchmarks while being efficient enough for edge deployment. Microsoft Sao10k: Llama 3.1 70B Hanami x1 https://developer.puter.com/ai/sao10k/l3.1-70b-hanami-x1/ https://developer.puter.com/ai/sao10k/l3.1-70b-hanami-x1/ Wed, 08 Jan 2025 00:00:00 GMT Llama 3.1 70B Hanami x1 is an experimental 70B model built on top of Euryale v2.2 by Sao10K, offering a different feel with enhanced creativity and logical reasoning. The creator considers it an improvement over both Euryale v2.1 and v2.2. Sao10k Vidu: Vidu 2.0 https://developer.puter.com/ai/vidu/vidu-2.0/ https://developer.puter.com/ai/vidu/vidu-2.0/ Tue, 07 Jan 2025 00:00:00 GMT Vidu 2.0 is an AI video generation model by ShengShu Technology that creates high-quality videos from text or images, supporting resolutions up to 1080p. It offers smoother motion, better frame consistency, and start/end frame control compared to its predecessor. The model is significantly faster and more affordable. 
Vidu Qwen: Qwen-MT Plus https://developer.puter.com/ai/qwen/qwen-mt-plus/ https://developer.puter.com/ai/qwen/qwen-mt-plus/ Wed, 01 Jan 2025 00:00:00 GMT Qwen-MT Plus is a specialized machine translation model from Alibaba's Qwen team, purpose-built for high-quality text translation across 92 languages covering over 95% of the world's population. Unlike general-purpose language models, Qwen-MT Plus is fine-tuned specifically for translation tasks, offering term intervention, domain prompting, and translation memory features that give developers fine-grained control over output. It supports translation between major languages including Chinese, English, Japanese, Korean, French, Spanish, German, Arabic, Thai, Indonesian, and Vietnamese. Best suited for developers building multilingual applications, content localization pipelines, or customer-facing translation features where accuracy, terminology consistency, and domain fidelity matter more than general conversational ability. Qwen Qwen: Qwen-MT Turbo https://developer.puter.com/ai/qwen/qwen-mt-turbo/ https://developer.puter.com/ai/qwen/qwen-mt-turbo/ Wed, 01 Jan 2025 00:00:00 GMT Qwen-MT Turbo is a fast, cost-effective machine translation model from Alibaba's Qwen team, designed for high-volume text translation across 92 languages. As the Turbo tier of the Qwen-MT family, it trades some of the output fidelity of Qwen-MT Plus for significantly lower cost and faster throughput — making it the practical choice for latency-sensitive or budget-constrained translation workflows. Like its sibling, it supports term intervention, domain prompting, and translation memory, giving developers control over terminology and style. Best suited for developers building high-volume localization pipelines, real-time translation features, or cost-sensitive multilingual applications where speed and price efficiency matter more than maximum output quality. 
Qwen DeepSeek: DeepSeek Chat https://developer.puter.com/ai/deepseek/deepseek-chat/ https://developer.puter.com/ai/deepseek/deepseek-chat/ Thu, 26 Dec 2024 00:00:00 GMT DeepSeek Chat is the general-purpose conversational alias that points to the latest DeepSeek V3 chat model, a 671B parameter Mixture-of-Experts LLM optimized for everyday conversations, coding assistance, and general tasks. It supports 128K context and provides fast, direct responses without explicit reasoning chains. DeepSeek OpenAI: OpenAI o3 Mini https://developer.puter.com/ai/openai/o3-mini/ https://developer.puter.com/ai/openai/o3-mini/ Fri, 20 Dec 2024 00:00:00 GMT OpenAI o3 Mini is a cost-efficient reasoning model specialized for STEM domains requiring precision and speed. It features three reasoning effort levels (low, medium, high) and supports function calling. OpenAI Sao10k: Llama 3.3 Euryale 70B https://developer.puter.com/ai/sao10k/l3.3-euryale-70b/ https://developer.puter.com/ai/sao10k/l3.3-euryale-70b/ Wed, 18 Dec 2024 00:00:00 GMT Llama 3.3 Euryale 70B v2.3 is the latest in Sao10K's Euryale series, built on Llama 3.3 Instruct with a 131K context window and 16K output limit. It's a direct successor to v2.2, trained without LoRA extraction for more robust creative roleplay and storywriting performance. Sao10k Kling: Kling 1.6 Standard https://developer.puter.com/ai/kwaivgi/kling-1.6-standard/ https://developer.puter.com/ai/kwaivgi/kling-1.6-standard/ Wed, 18 Dec 2024 00:00:00 GMT Kling 1.6 Standard is Kuaishou's accessible 720p AI video model released in December 2024, offering a 195% improvement over Kling 1.5 in image-to-video quality. It provides fast, consistent video generation with enhanced prompt adherence and natural motion, ideal for beginners and creators needing quick social media content. 
Kling Kling: Kling 1.6 Pro https://developer.puter.com/ai/kwaivgi/kling-1.6-pro/ https://developer.puter.com/ai/kwaivgi/kling-1.6-pro/ Wed, 18 Dec 2024 00:00:00 GMT Kling 1.6 Pro is Kuaishou's professional-tier 1080p video model featuring superior motion fluidity, enhanced character realism, and unique first-and-last frame conditioning for 5-second clips. It delivers videos with greater storytelling control, making it ideal for marketing videos and cinematic short-form content. Kling Google: Veo 2 https://developer.puter.com/ai/google/veo-2.0/ https://developer.puter.com/ai/google/veo-2.0/ Mon, 16 Dec 2024 00:00:00 GMT Google Veo 2 is Google DeepMind's video generation model that creates 5-second videos at 720p to 4K resolution from text or image prompts, with realistic physics simulation and cinematic quality. It excels at following complex instructions, simulating real-world physics, and supporting diverse visual styles, but it does not generate native audio. Google Cohere: Command R7B (12-2024) https://developer.puter.com/ai/cohere/command-r7b-12-2024/ https://developer.puter.com/ai/cohere/command-r7b-12-2024/ Sat, 14 Dec 2024 00:00:00 GMT Command R7B is Cohere's smallest and fastest model in the R series, with 7 billion parameters and a 128K token context window. Despite its compact size, it ranked first among similarly-sized open-weights models on the HuggingFace Open LLM Leaderboard, leading across IFEval, BBH, GPQA, MuSR, and MMLU. It supports native tool use, multi-step agentic workflows, and RAG across 23 languages, with particular strength in code tasks including SQL and code translation. For API developers, it's the best option when latency and cost are priorities and a full-scale model isn't required. 
Cohere Google: Gemini 2.0 Flash https://developer.puter.com/ai/google/gemini-2.0-flash/ https://developer.puter.com/ai/google/gemini-2.0-flash/ Wed, 11 Dec 2024 00:00:00 GMT Gemini 2.0 Flash is Google's fast multimodal model with native tool use, 1M token context window, and support for text, images, video, and audio input. It's optimized for agentic workflows with low latency and cost-efficient inference. Google Google: Gemini 2.0 Flash-Lite https://developer.puter.com/ai/google/gemini-2.0-flash-lite/ https://developer.puter.com/ai/google/gemini-2.0-flash-lite/ Wed, 11 Dec 2024 00:00:00 GMT Gemini 2.0 Flash-Lite is Google's most cost-efficient model, optimized for large-scale text output tasks. It offers simplified pricing and lower costs than Flash while maintaining solid performance for high-volume workloads. Google Meta Llama: Llama 3.3 70B Instruct https://developer.puter.com/ai/meta-llama/llama-3.3-70b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.3-70b-instruct/ Fri, 06 Dec 2024 00:00:00 GMT Llama 3.3 70B Instruct is Meta's refined 70 billion parameter multilingual model with improved instruction following and tool use capabilities. It supports 8 languages and offers enhanced reasoning performance over previous versions. Meta Llama xAI: Grok 2 Image https://developer.puter.com/ai/x-ai/grok-2-image/ https://developer.puter.com/ai/x-ai/grok-2-image/ Thu, 05 Dec 2024 00:00:00 GMT Grok 2 Image is xAI's flagship text-to-image generation model powered by their Aurora engine, producing photorealistic visuals from text prompts. It excels at rendering precise visual details, legible text, logos, and realistic human portraits. The model supports generating up to 10 image variations per request and handles diverse styles from photorealism to illustration. 
xAI OpenAI: OpenAI o1 https://developer.puter.com/ai/openai/o1/ https://developer.puter.com/ai/openai/o1/ Thu, 05 Dec 2024 00:00:00 GMT OpenAI o1 is a reasoning model that thinks before answering using chain-of-thought, excelling at complex science and mathematics tasks. It was the first in OpenAI's "o" series designed for step-by-step logical reasoning. OpenAI Amazon: Nova Lite 1.0 https://developer.puter.com/ai/amazon/nova-lite-v1/ https://developer.puter.com/ai/amazon/nova-lite-v1/ Thu, 05 Dec 2024 00:00:00 GMT Amazon Nova Lite is a very low-cost, lightning-fast multimodal model that processes text, images, and video inputs to generate text output. It supports up to 300K input tokens and can analyze multiple images or up to 30 minutes of video in a single request. Ideal for real-time customer interactions, document analysis, and visual question-answering tasks. Amazon Amazon: Nova Micro 1.0 https://developer.puter.com/ai/amazon/nova-micro-v1/ https://developer.puter.com/ai/amazon/nova-micro-v1/ Thu, 05 Dec 2024 00:00:00 GMT Amazon Nova Micro is a text-only model that delivers the lowest latency responses at the lowest cost in the Nova family. With a 128K token context window, it excels at text summarization, translation, content classification, interactive chat, and basic coding tasks. It's the fastest and most economical option when multimodal capabilities aren't needed. Amazon Amazon: Nova Pro 1.0 https://developer.puter.com/ai/amazon/nova-pro-v1/ https://developer.puter.com/ai/amazon/nova-pro-v1/ Thu, 05 Dec 2024 00:00:00 GMT Amazon Nova Pro is a highly capable multimodal model offering the best combination of accuracy, speed, and cost for a wide range of tasks. It supports up to 300K input tokens, excels at video summarization, financial document analysis, agentic workflows, and can process code bases with over 15,000 lines of code. It also serves as a teacher model for distilling custom variants of Nova Micro and Lite. 
Amazon Qwen: Qwen2.5-Omni 7B https://developer.puter.com/ai/qwen/qwen2-5-omni-7b/ https://developer.puter.com/ai/qwen/qwen2-5-omni-7b/ Sun, 01 Dec 2024 00:00:00 GMT Qwen2.5-Omni 7B is Alibaba's end-to-end omni-modal model capable of perceiving text, images, audio, and video simultaneously while generating text and natural speech in real time. Built on a Thinker-Talker architecture with TMRoPE (Time-aligned Multimodal RoPE) for synchronizing audio and video streams, the 7B model achieves strong benchmark results across all modalities. It ranked first on the MMAU audio understanding leaderboard, scored 59.2 on MMMU image reasoning (near GPT-4o-mini's 60.0), and achieved 64.3 on Video-MME for video understanding without subtitles. On OmniBench, which tests cross-modal integration, it reached 56.13%. The model supports tool/function calling and targets developers building voice assistants, video analysis tools, and multimodal pipelines that require a single model to handle diverse input types. Qwen OpenAI: GPT-4o 2024-11-20 https://developer.puter.com/ai/openai/gpt-4o-2024-11-20/ https://developer.puter.com/ai/openai/gpt-4o-2024-11-20/ Wed, 20 Nov 2024 00:00:00 GMT GPT-4o 2024-11-20 is a November 2024 snapshot of GPT-4o providing the latest improvements at that time. It's useful for applications requiring locked model behavior. OpenAI Mistral AI: Pixtral Large https://developer.puter.com/ai/mistralai/pixtral-large-2411/ https://developer.puter.com/ai/mistralai/pixtral-large-2411/ Tue, 19 Nov 2024 00:00:00 GMT Pixtral Large is a 124B parameter open-weights multimodal model built on Mistral Large 2, achieving frontier-level image understanding. It processes up to 30 high-resolution images per input with 128K context, excelling in document and chart analysis. 
Mistral AI Mistral AI: Mistral Large 2 (July 2024) https://developer.puter.com/ai/mistralai/mistral-large-2407/ https://developer.puter.com/ai/mistralai/mistral-large-2407/ Tue, 19 Nov 2024 00:00:00 GMT Mistral Large 2 (24.07) is a 123B parameter model with 128K context, significantly upgraded for long context understanding and function calling. It delivers top-tier performance for enterprise use cases including knowledge exploration and automation. Mistral AI Mistral AI: Mistral Large 2 (November 2024) https://developer.puter.com/ai/mistralai/mistral-large-2411/ https://developer.puter.com/ai/mistralai/mistral-large-2411/ Tue, 19 Nov 2024 00:00:00 GMT Mistral Large 2 (24.11) includes improvements in long context understanding, system prompts, and function calling accuracy. Released alongside Pixtral Large, it's optimized for RAG and agentic workflows in enterprise deployments. Mistral AI Qwen: Qwen2.5 Coder 32B Instruct https://developer.puter.com/ai/qwen/qwen-2.5-coder-32b-instruct/ https://developer.puter.com/ai/qwen/qwen-2.5-coder-32b-instruct/ Mon, 11 Nov 2024 00:00:00 GMT Qwen 2.5 Coder 32B Instruct is a code-specialized model matching GPT-4o's coding capabilities, supporting 40+ programming languages. It excels in code generation, repair, and reasoning with 128K context support. Qwen TheDrummer: UnslopNemo 12B https://developer.puter.com/ai/thedrummer/unslopnemo-12b/ https://developer.puter.com/ai/thedrummer/unslopnemo-12b/ Fri, 08 Nov 2024 00:00:00 GMT UnslopNemo 12B is a 12-billion parameter model where TheDrummer removed repetitive patterns ('slop') from roughly 90% of the roleplay training dataset to make outputs more expressive and varied. It's designed for adventure writing and roleplay scenarios with a 32K token context window. The model aims to generate more natural, less formulaic creative content compared to standard fine-tuned models. 
TheDrummer xAI: Grok Beta https://developer.puter.com/ai/x-ai/grok-beta/ https://developer.puter.com/ai/x-ai/grok-beta/ Fri, 01 Nov 2024 00:00:00 GMT Grok Beta was xAI's initial public API model released in late 2024, offering foundational chat and reasoning capabilities with 131K context window. It served as the enterprise API beta preview before being superseded by versioned Grok 2 models. xAI xAI: Grok Vision Beta https://developer.puter.com/ai/x-ai/grok-vision-beta/ https://developer.puter.com/ai/x-ai/grok-vision-beta/ Fri, 01 Nov 2024 00:00:00 GMT Grok Vision Beta was the initial vision-enabled API model from xAI, providing image understanding and multimodal capabilities for processing text alongside visual inputs. It was released alongside grok-beta for enterprise API testing with an 8K context window. xAI Qwen: Qwen-Turbo https://developer.puter.com/ai/qwen/qwen-turbo/ https://developer.puter.com/ai/qwen/qwen-turbo/ Fri, 01 Nov 2024 00:00:00 GMT Qwen Turbo is a fast, cost-effective API model with up to 1M context length, ideal for simple tasks requiring quick responses. It supports multiple languages and offers flexible tiered pricing. Qwen Qwen: Qwen-VL OCR https://developer.puter.com/ai/qwen/qwen-vl-ocr/ https://developer.puter.com/ai/qwen/qwen-vl-ocr/ Mon, 28 Oct 2024 00:00:00 GMT Qwen-VL OCR is Alibaba's specialized vision-language model purpose-built for text extraction and document parsing, derived from the Qwen-VL series. Unlike general-purpose VL models, it's optimized for OCR across scanned documents, tables, receipts, exam papers, forms, and handwritten content. It supports multilingual recognition including English, Chinese, French, German, Japanese, Korean, Russian, Italian, and Arabic. Capabilities include skewed image recognition, text localization with bounding box coordinates, table-to-HTML parsing, document-to-LaTeX conversion, and formula transcription. 
Built-in task modes return structured output as plain text, JSON, HTML, or LaTeX depending on the workflow. It's the right Qwen API choice for developers building document digitization, receipt parsing, or information extraction pipelines that need OCR-focused accuracy rather than general visual reasoning. Qwen Anthropic: Claude 3.5 Sonnet https://developer.puter.com/ai/anthropic/claude-3-5-sonnet/ https://developer.puter.com/ai/anthropic/claude-3-5-sonnet/ Tue, 22 Oct 2024 00:00:00 GMT Claude 3.5 Sonnet balances intelligence and speed, and was the first Claude model to introduce computer use capabilities (screen navigation, clicking, typing) in public beta. It offered performance close to Claude 3 Opus at one-fifth the cost. Anthropic Anthracite: Magnum v4 72B https://developer.puter.com/ai/anthracite-org/magnum-v4-72b/ https://developer.puter.com/ai/anthracite-org/magnum-v4-72b/ Tue, 22 Oct 2024 00:00:00 GMT Magnum v4 72B is a 72-billion parameter creative writing and conversational model developed by Anthracite, fine-tuned on top of Qwen2.5-72B-Instruct. It was specifically trained to replicate the prose quality of Claude 3 Sonnet and Opus, making it one of the most popular open-weight models for narrative generation. The model excels at creative writing, interactive storytelling, roleplay, and character-driven dialogue. It actively drives narratives forward while maintaining consistent character personas across extended conversations. Multi-language support covers English, French, German, Spanish, Chinese, Japanese, and more. Magnum v4 72B supports up to 32,768 tokens of context and uses the ChatML prompt format. It's a strong choice for developers building applications where engaging, human-like prose matters more than raw benchmark performance. 
Anthracite Mistral AI: Ministral 3B https://developer.puter.com/ai/mistralai/ministral-3b/ https://developer.puter.com/ai/mistralai/ministral-3b/ Wed, 16 Oct 2024 00:00:00 GMT Ministral 3B is a compact 3B parameter model optimized for edge deployment on phones, laptops, and IoT devices. It delivers robust multimodal capabilities in a small footprint, suitable for low-resource environments under Apache 2.0. Mistral AI Mistral AI: Ministral 8B https://developer.puter.com/ai/mistralai/ministral-8b/ https://developer.puter.com/ai/mistralai/ministral-8b/ Wed, 16 Oct 2024 00:00:00 GMT Ministral 8B is an 8B parameter model offering best-in-class text and vision capabilities for single-GPU operation. It provides an excellent balance of performance and efficiency for edge deployment and embedded applications. Mistral AI Inflection AI: Inflection 3 Pi https://developer.puter.com/ai/inflection/inflection-3-pi/ https://developer.puter.com/ai/inflection/inflection-3-pi/ Fri, 11 Oct 2024 00:00:00 GMT Inflection 3 Pi is a conversational AI model by Inflection AI, designed to power emotionally intelligent interactions. It's the model behind Inflection's Pi chatbot, built with a focus on empathy, safety, and natural dialogue rather than pure task completion. The model adapts to each user's tone and communication style, making it well suited for customer support chatbots, roleplay scenarios, and applications where warmth and conversational nuance matter. It also has access to recent news for topical awareness. Inflection 3 Pi offers an 8K context window with a max output of 1,024 tokens. It's a strong pick when your use case prioritizes user experience and conversational quality over structured output or complex reasoning. 
Inflection AI Inflection AI: Inflection 3 Productivity https://developer.puter.com/ai/inflection/inflection-3-productivity/ https://developer.puter.com/ai/inflection/inflection-3-productivity/ Fri, 11 Oct 2024 00:00:00 GMT Inflection 3 Productivity is an enterprise-focused AI model by Inflection AI, optimized for precise instruction-following and structured output generation. Released alongside Inflection 3 Pi as part of the Inflection 3.0 suite, it trades its sibling's emotional intelligence for accuracy and compliance. The model is particularly suited for generating JSON, technical documentation, automated reports, and data extraction from unstructured text. It also has access to recent news. These strengths make it a fit for business automation and workflow integration where consistent, format-adherent output is critical. It shares the same 8K context window and 1,024-token max output as Inflection 3 Pi. Consider this model when your application demands reliable structured outputs and strict adherence to formatting guidelines. Inflection AI OpenAI: GPT-4o Extended https://developer.puter.com/ai/openai/gpt-4o:extended/ https://developer.puter.com/ai/openai/gpt-4o:extended/ Wed, 02 Oct 2024 00:00:00 GMT GPT-4o Extended is a variant of GPT-4o with extended capabilities or context for specific use cases. It provides enhanced features beyond the standard GPT-4o model. 
OpenAI RunDiffusion: Juggernaut Lightning Flux https://developer.puter.com/ai/rundiffusion/juggernaut-lightning-flux/ https://developer.puter.com/ai/rundiffusion/juggernaut-lightning-flux/ Tue, 01 Oct 2024 00:00:00 GMT RunDiffusion RunDiffusion: Juggernaut Pro Flux https://developer.puter.com/ai/rundiffusion/juggernaut-pro-flux/ https://developer.puter.com/ai/rundiffusion/juggernaut-pro-flux/ Tue, 01 Oct 2024 00:00:00 GMT RunDiffusion NVIDIA: Llama 3.1 Nemotron 70B Instruct https://developer.puter.com/ai/nvidia/llama-3.1-nemotron-70b-instruct/ https://developer.puter.com/ai/nvidia/llama-3.1-nemotron-70b-instruct/ Tue, 01 Oct 2024 00:00:00 GMT Llama 3.1 Nemotron 70B Instruct is a 70B parameter LLM customized by NVIDIA using RLHF to improve response helpfulness, achieving top rankings on alignment benchmarks like Arena Hard and AlpacaEval 2 LC. It supports a 128K token context and is optimized for conversational AI and instruction-following tasks. NVIDIA Black Forest Labs: FLUX1.1 [pro] https://developer.puter.com/ai/black-forest-labs/flux-1.1-pro/ https://developer.puter.com/ai/black-forest-labs/flux-1.1-pro/ Tue, 01 Oct 2024 00:00:00 GMT FLUX 1.1 Pro is an improved flagship model released October 2024, offering better quality and efficiency than the original FLUX.1 Pro. It added Ultra mode for 4x higher resolution (up to 4MP) and Raw mode for hyper-realistic candid photography-style images. Generation time is approximately 10 seconds per sample. Black Forest Labs TheDrummer: Rocinante 12B https://developer.puter.com/ai/thedrummer/rocinante-12b/ https://developer.puter.com/ai/thedrummer/rocinante-12b/ Mon, 30 Sep 2024 00:00:00 GMT Rocinante 12B is a 12-billion parameter creative writing model built on the Mistral architecture, designed for adventure-filled storytelling, roleplay, and imaginative text generation. 
Named after Don Quixote's horse, it produces rich, distinct prose with enhanced vocabulary and supports multiple chat templates including ChatML, Alpaca, and Mistral. The model offers a good balance between creative capability and computational efficiency for local deployment. TheDrummer Meta Llama: Llama 3.2 11B Vision Instruct https://developer.puter.com/ai/meta-llama/llama-3.2-11b-vision-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.2-11b-vision-instruct/ Wed, 25 Sep 2024 00:00:00 GMT Llama 3.2 11B Vision Instruct is Meta's multimodal model that processes both text and images with 11 billion parameters. It excels at visual recognition, image reasoning, captioning, and answering questions about images. Meta Llama Meta Llama: Llama 3.2 1B Instruct https://developer.puter.com/ai/meta-llama/llama-3.2-1b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.2-1b-instruct/ Wed, 25 Sep 2024 00:00:00 GMT Llama 3.2 1B Instruct is Meta's ultra-lightweight 1 billion parameter model designed for edge and mobile devices. It supports 128K context and handles summarization, instruction following, and rewriting tasks locally. Meta Llama Meta Llama: Llama 3.2 3B Instruct https://developer.puter.com/ai/meta-llama/llama-3.2-3b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.2-3b-instruct/ Wed, 25 Sep 2024 00:00:00 GMT Llama 3.2 3B Instruct is a compact 3 billion parameter model optimized for on-device use cases with 128K context support. It outperforms comparable models on instruction following, summarization, and tool-use tasks. Meta Llama Qwen: Qwen2.5 Coder 7B Instruct https://developer.puter.com/ai/qwen/qwen2.5-coder-7b-instruct/ https://developer.puter.com/ai/qwen/qwen2.5-coder-7b-instruct/ Thu, 19 Sep 2024 00:00:00 GMT Qwen 2.5 Coder 7B Instruct is a compact code-specialized model with strong code generation, reasoning, and repair capabilities. It supports multiple programming languages while being deployable on consumer hardware. 
Qwen Mistral AI: Pixtral 12B https://developer.puter.com/ai/mistralai/pixtral-12b/ https://developer.puter.com/ai/mistralai/pixtral-12b/ Tue, 17 Sep 2024 00:00:00 GMT Pixtral 12B is Mistral's first multimodal model, pairing a 12B text decoder with a 400M vision encoder and released under Apache 2.0. It processes images at native resolution with 128K context, excelling in document QA and visual reasoning without compromising text performance. Mistral AI OpenAI: OpenAI o1 Mini (Deprecated) https://developer.puter.com/ai/openai/o1-mini/ https://developer.puter.com/ai/openai/o1-mini/ Thu, 12 Sep 2024 00:00:00 GMT OpenAI o1 Mini was a faster, more affordable reasoning alternative to o1, now deprecated in favor of o3-mini. It provided STEM-focused reasoning at lower cost and latency. OpenAI Raifle: SorcererLM 8x22B https://developer.puter.com/ai/raifle/sorcererlm-8x22b/ https://developer.puter.com/ai/raifle/sorcererlm-8x22b/ Mon, 09 Sep 2024 00:00:00 GMT SorcererLM 8x22B is a creative fiction and roleplay model by Raifle, built as a 16-bit LoRA fine-tune on top of Microsoft's WizardLM-2 8x22B (Mixtral-based mixture-of-experts architecture). It targets narrative storytelling and interactive roleplay, offering enhanced vocabulary, vivid prose with spatial and contextual awareness, and stronger emotional intelligence compared to its base model. The fine-tune was specifically designed to improve writing style and literary quality while retaining the underlying reasoning capabilities of WizardLM-2. SorcererLM supports a 16K context window. It's a strong pick for developers building interactive fiction apps, character-driven chatbots, or creative writing tools where prose quality and immersive narrative depth matter more than factual or analytical tasks. 
Raifle Qwen: Qwen2.5 72B Instruct https://developer.puter.com/ai/qwen/qwen2-5-72b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-72b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen 2.5 72B Instruct is Alibaba's flagship open-source language model with 72 billion parameters, trained on 18 trillion tokens with 128K context support. It excels in coding, math, instruction following, and multilingual tasks across 29+ languages. Qwen Qwen: Qwen2.5 7B Instruct https://developer.puter.com/ai/qwen/qwen2-5-7b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-7b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen 2.5 7B Instruct is a compact yet capable language model offering strong performance in coding, math, and general tasks. It supports 128K context length and 29+ languages while being efficient enough for smaller deployments. Qwen Qwen: Qwen2.5-VL 7B Instruct https://developer.puter.com/ai/qwen/qwen2-5-vl-7b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-vl-7b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen 2.5 VL 7B Instruct is a vision-language model capable of understanding images, documents, charts, and videos up to 1 hour. It supports OCR, visual reasoning, and can act as a visual agent for computer/phone use. Qwen Qwen: Qwen2.5 14B Instruct https://developer.puter.com/ai/qwen/qwen2-5-14b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-14b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen2.5 14B Instruct is a 14.7-billion-parameter open-weight model from Alibaba's Qwen team, trained on 18 trillion tokens and released under Apache 2.0. It hits a practical sweet spot in the Qwen2.5 lineup — outperforming both the 7B variant and models like Gemma 2 27B and GPT-4o mini on seven key benchmarks, while remaining far more efficient than the flagship 72B. Core strengths include strong instruction following, structured output (JSON) generation, math, and code. It reaches ~97% tool-call success across hardware, making it reliable for agentic workflows. 
Multilingual support spans 29+ languages with a 128K context window and up to 8K output tokens. A strong choice for developers who need GPT-4o-mini-class quality at a fraction of the cost of larger frontier models. Qwen Qwen: Qwen2.5 32B Instruct https://developer.puter.com/ai/qwen/qwen2-5-32b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-32b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen2.5 32B Instruct is a general-purpose language model from Alibaba's Qwen team, sitting at the practical sweet spot between the 14B and 72B variants in the Qwen2.5 series — delivering stronger reasoning and language understanding than the 14B while remaining far more cost-efficient than the 72B. Trained on 18 trillion tokens, the model scores 57.7 on MATH and outperforms Qwen2-72B on comprehensive evaluations despite having fewer parameters. It excels at instruction following, multi-step reasoning, mathematics, coding assistance, and multilingual tasks across 29+ languages, with a 131K token context window and full tool-call support. A well-rounded choice for developers who need reliable general-purpose performance — complex enough for demanding workflows, light enough to keep inference costs manageable. Qwen Qwen: Qwen2.5-VL 72B Instruct https://developer.puter.com/ai/qwen/qwen2-5-vl-72b-instruct/ https://developer.puter.com/ai/qwen/qwen2-5-vl-72b-instruct/ Sun, 01 Sep 2024 00:00:00 GMT Qwen2.5-VL 72B Instruct is Alibaba's flagship open-source vision-language model, matching state-of-the-art closed models like GPT-4o and Claude 3.5 Sonnet on multimodal tasks. The model excels at document understanding (96.4 on DocVQA), OCR (88.8 on OCRBench), and structured data extraction from invoices, forms, tables, and charts. On MMMU it scores 70.2, and across 21 benchmarks it outperforms Gemini 2.0 Flash, GPT-4o, and Claude 3.5 Sonnet on 13 of them. 
Video understanding extends to over one hour of footage with second-level event pinpointing, enabled by dynamic FPS sampling and absolute time encoding. The model also functions as a visual agent capable of computer and phone use. A strong choice for developers building document pipelines, OCR workflows, visual Q&A systems, or multimodal agents. Qwen Cohere: Command R (08-2024) https://developer.puter.com/ai/cohere/command-r-08-2024/ https://developer.puter.com/ai/cohere/command-r-08-2024/ Fri, 30 Aug 2024 00:00:00 GMT Command R 08-2024 is a 32-billion-parameter generative language model from Cohere, optimized for complex reasoning, retrieval-augmented generation, multilingual tasks, and tool use across a 128K token context window. Compared to its predecessor, this version delivers approximately 50% higher throughput and 20% lower latency while showing competitive performance on math, code, and reasoning tasks. It supports 23 languages. For API developers, it is a practical mid-tier option that balances capability and cost — well-suited for question answering, summarization, and RAG-based applications. Cohere Cohere: Command R+ (08-2024) https://developer.puter.com/ai/cohere/command-r-plus-08-2024/ https://developer.puter.com/ai/cohere/command-r-plus-08-2024/ Fri, 30 Aug 2024 00:00:00 GMT Command R+ 08-2024 is Cohere's 104-billion-parameter enterprise-grade language model, updated in August 2024 with enhanced multi-step tool use, improved instruction following, and stronger structured data analysis. Benchmark scores include 80 on MMLU, 50 on HumanEval, and 88 on GSM8K. On public tool-use benchmarks, the Command R+ line has outperformed GPT-4-Turbo. It supports a 128K context window and 23 languages. Developers building complex pipelines that require reliable tool orchestration and citation-quality RAG will find it a strong fit for demanding agentic and enterprise use cases. 
Cohere Sao10k: Llama 3.1 Euryale 70B v2.2 https://developer.puter.com/ai/sao10k/l3.1-euryale-70b/ https://developer.puter.com/ai/sao10k/l3.1-euryale-70b/ Wed, 28 Aug 2024 00:00:00 GMT Llama 3.1 Euryale 70B v2.2 is Sao10K's creative roleplay model built on Meta's Llama 3.1 architecture with improved multi-turn coherency, system prompt handling, and reasoning capabilities. It features a 32K context window and excels at immersive storytelling with strong prompt adherence. Sao10k Moonshot AI: Moonshot v1 Auto https://developer.puter.com/ai/moonshotai/moonshot-v1-auto/ https://developer.puter.com/ai/moonshotai/moonshot-v1-auto/ Wed, 28 Aug 2024 00:00:00 GMT Moonshot V1 Auto is a smart routing layer from Moonshot AI that automatically selects the most cost-efficient context window — 8K, 32K, or 128K — based on the token count of each request. It uses the same underlying Moonshot V1 model as the fixed-context variants, so there is no difference in output quality. The routing simply ensures you're billed at the lowest applicable tier for each call, eliminating the need to manually choose a context size or overpay for unused capacity. Usage is identical to the other Moonshot V1 models — just set the model ID to `moonshot-v1-auto` and the platform handles the rest. Ideal for applications with variable-length inputs. Moonshot AI xAI: Grok 2 Vision https://developer.puter.com/ai/x-ai/grok-2-vision/ https://developer.puter.com/ai/x-ai/grok-2-vision/ Tue, 20 Aug 2024 00:00:00 GMT Grok 2 Vision is a multimodal AI model that combines text and visual understanding capabilities, excelling at object recognition, visual math reasoning (MathVista), and document-based question answering (DocVQA). It supports image analysis with a 32K context window. 
xAI xAI: Grok 2 Vision 1212 https://developer.puter.com/ai/x-ai/grok-2-vision-1212/ https://developer.puter.com/ai/x-ai/grok-2-vision-1212/ Tue, 20 Aug 2024 00:00:00 GMT Grok 2 Vision 1212 is xAI's updated multimodal vision model released December 2024, featuring improved accuracy, instruction-following, and multilingual capabilities over the original Grok 2 Vision. It combines advanced visual comprehension with text understanding, excelling at object recognition, style analysis, and document-based question answering with a 32K context window. xAI Nous Research: Hermes 3 70B Instruct https://developer.puter.com/ai/nousresearch/hermes-3-llama-3.1-70b/ https://developer.puter.com/ai/nousresearch/hermes-3-llama-3.1-70b/ Sun, 18 Aug 2024 00:00:00 GMT Hermes 3 Llama 3.1 70B is a 70B parameter fine-tune of Llama-3.1-70B offering advanced agentic capabilities, improved roleplaying, reasoning, and multi-turn conversation. It provides reliable function calling and structured outputs while being competitive with Llama-3.1 Instruct models at a more accessible size. Nous Research Nous Research: Hermes 3 405B Instruct https://developer.puter.com/ai/nousresearch/hermes-3-llama-3.1-405b/ https://developer.puter.com/ai/nousresearch/hermes-3-llama-3.1-405b/ Fri, 16 Aug 2024 00:00:00 GMT Hermes 3 Llama 3.1 405B is a frontier-level 405B parameter full fine-tune of Llama-3.1-405B, focused on user alignment with powerful steering capabilities. It features advanced agentic capabilities, roleplaying, reasoning, multi-turn conversation, and improved code generation, competitive with or superior to Llama-3.1 Instruct models. Nous Research xAI: Grok 2 https://developer.puter.com/ai/x-ai/grok-2/ https://developer.puter.com/ai/x-ai/grok-2/ Tue, 13 Aug 2024 00:00:00 GMT Grok 2 is xAI's frontier language model released in August 2024, featuring advanced capabilities in chat, coding, and reasoning with competitive performance against GPT-4 and Claude 3.5 Sonnet. 
It integrates real-time information from the X platform and offers improved reasoning over Grok 1.5. xAI Sao10k: Llama 3 8B Lunaris https://developer.puter.com/ai/sao10k/l3-lunaris-8b/ https://developer.puter.com/ai/sao10k/l3-lunaris-8b/ Tue, 13 Aug 2024 00:00:00 GMT Llama 3 8B Lunaris is a versatile 8B parameter generalist and roleplaying model created by merging five different Llama 3-based models. It balances creativity with improved logical reasoning and general knowledge, serving as an evolution of Stheno v3.2. Sao10k OpenAI: ChatGPT-4o Latest https://developer.puter.com/ai/openai/chatgpt-4o-latest/ https://developer.puter.com/ai/openai/chatgpt-4o-latest/ Thu, 08 Aug 2024 00:00:00 GMT ChatGPT-4o Latest is the GPT-4o model variant used in ChatGPT, not recommended for API use. It's optimized for ChatGPT's conversational interface rather than developer applications. OpenAI OpenAI: GPT-4o 2024-08-06 https://developer.puter.com/ai/openai/gpt-4o-2024-08-06/ https://developer.puter.com/ai/openai/gpt-4o-2024-08-06/ Tue, 06 Aug 2024 00:00:00 GMT GPT-4o 2024-08-06 is an August 2024 snapshot of GPT-4o with improvements and Structured Outputs support. It offers enhanced reliability for applications needing specific version behavior. OpenAI Black Forest Labs: FLUX.1 [dev] https://developer.puter.com/ai/black-forest-labs/flux.1-dev/ https://developer.puter.com/ai/black-forest-labs/flux.1-dev/ Thu, 01 Aug 2024 00:00:00 GMT FLUX.1 Dev is a 12B parameter open-weight text-to-image model released under a non-commercial license. It offers quality comparable to DALL-E 3 and Midjourney 6 in prompt fidelity and photorealism, and is the most popular open image model globally. It's designed for developers and researchers to run on consumer hardware. 
Black Forest Labs Black Forest Labs: FLUX.1 [dev] LoRA https://developer.puter.com/ai/black-forest-labs/flux.1-dev-lora/ https://developer.puter.com/ai/black-forest-labs/flux.1-dev-lora/ Thu, 01 Aug 2024 00:00:00 GMT FLUX.1 Dev LoRA is a fine-tuning adapter layer built on top of FLUX.1 Dev, enabling customization of image generation for specific styles, subjects, or concepts. It allows developers to train lightweight adaptations without retraining the full model. Black Forest Labs Black Forest Labs: FLUX.1 Krea [dev] https://developer.puter.com/ai/black-forest-labs/flux.1-krea-dev/ https://developer.puter.com/ai/black-forest-labs/flux.1-krea-dev/ Thu, 01 Aug 2024 00:00:00 GMT FLUX.1 Krea Dev is an open-weight text-to-image model developed in collaboration with Krea AI, trained to achieve better photorealism and more varied aesthetics than standard FLUX.1 models. It overcomes the oversaturated 'AI look' common in other generators. Black Forest Labs Black Forest Labs: FLUX.1 [pro] https://developer.puter.com/ai/black-forest-labs/flux.1-pro/ https://developer.puter.com/ai/black-forest-labs/flux.1-pro/ Thu, 01 Aug 2024 00:00:00 GMT FLUX.1 Pro is Black Forest Labs' original flagship proprietary text-to-image model, offering high prompt fidelity and photorealistic output comparable to Midjourney 6. It supports fine-tuning via the FLUX Pro Finetuning API for enterprise customization. Black Forest Labs Black Forest Labs: FLUX.1 [schnell] https://developer.puter.com/ai/black-forest-labs/flux-schnell/ https://developer.puter.com/ai/black-forest-labs/flux-schnell/ Thu, 01 Aug 2024 00:00:00 GMT FLUX.1 Schnell (German for 'fast') is the speed-optimized variant of FLUX.1, designed for rapid image generation with lower latency at the cost of some quality. It is the most permissively licensed model in the FLUX.1 family, released under Apache 2.0. Ideal for real-time applications and high-throughput workflows. 
Black Forest Labs Mistral AI: Mistral Nemo 12B https://developer.puter.com/ai/mistralai/open-mistral-nemo-2407/ https://developer.puter.com/ai/mistralai/open-mistral-nemo-2407/ Thu, 25 Jul 2024 00:00:00 GMT Mistral Nemo 12B is a 12B parameter model developed in collaboration with NVIDIA, released under Apache 2.0 with a 128K context window. It uses the Tekken tokenizer trained on 100+ languages, which compresses source code and multilingual text ~30% more efficiently than previous Mistral tokenizers. Mistral Nemo 12B is state-of-the-art in its size category for reasoning, world knowledge, and coding, significantly outperforming Mistral 7B on instruction following, multi-turn conversations, and code generation. Benchmark scores include 68.0% on MMLU (5-shot), 83.5% on HellaSwag, and 76.8% on Winogrande. It supports function calling and is an ideal drop-in replacement for Mistral 7B where stronger multilingual and reasoning capabilities are needed. Mistral AI NeverSleep: Lumimaid v0.2 8B https://developer.puter.com/ai/neversleep/llama-3.1-lumimaid-8b/ https://developer.puter.com/ai/neversleep/llama-3.1-lumimaid-8b/ Wed, 24 Jul 2024 00:00:00 GMT Lumimaid v0.2 8B is a roleplay and creative writing model created by NeverSleep (IkariDev and Undi), fine-tuned on Meta's Llama 3.1 8B Instruct. It offers a 32,768-token context window. Version 0.2 represents a significant dataset overhaul from v0.1, with aggressive cleanup of low-quality and repetitive outputs. Roughly 40% of its training data is non-roleplay, giving it solid general conversational ability alongside its creative strengths. The model is best suited for interactive fiction, character-driven dialogue, and long-form creative text generation. Its 8B parameter size keeps inference costs low while delivering expressive, stylistically consistent output. A good pick for developers building chat-based storytelling or companion apps who need a capable small model with a generous context window. 
NeverSleep Meta Llama: Llama 3.1 405B (base) https://developer.puter.com/ai/meta-llama/llama-3.1-405b/ https://developer.puter.com/ai/meta-llama/llama-3.1-405b/ Tue, 23 Jul 2024 00:00:00 GMT Llama 3.1 405B is Meta's flagship open-source large language model with 405 billion parameters, supporting 128K context length and 8 languages. It offers capabilities comparable to leading closed models for advanced reasoning, coding, and multilingual tasks. Meta Llama Meta Llama: Llama 3.1 405B Instruct https://developer.puter.com/ai/meta-llama/llama-3.1-405b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.1-405b-instruct/ Tue, 23 Jul 2024 00:00:00 GMT Llama 3.1 405B Instruct is the instruction-tuned version of Meta's largest open model, optimized for multilingual dialogue, tool use, and complex reasoning. It supports 8 languages with 128K context and serves as a foundation for enterprise-level AI applications. Meta Llama Meta Llama: Llama 3.1 70B Instruct https://developer.puter.com/ai/meta-llama/llama-3.1-70b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.1-70b-instruct/ Tue, 23 Jul 2024 00:00:00 GMT Llama 3.1 70B Instruct is a multilingual 70 billion parameter model with 128K context length, optimized for dialogue, tool use, and coding tasks. It balances strong performance with resource efficiency across 8 supported languages. Meta Llama Meta Llama: Llama 3.1 8B Instruct https://developer.puter.com/ai/meta-llama/llama-3.1-8b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3.1-8b-instruct/ Tue, 23 Jul 2024 00:00:00 GMT Llama 3.1 8B Instruct is Meta's efficient 8 billion parameter multilingual model supporting 128K context and 8 languages. It's ideal for resource-constrained deployments requiring summarization, classification, and translation capabilities. 
Meta Llama OpenAI: GPT-4o Mini https://developer.puter.com/ai/openai/gpt-4o-mini/ https://developer.puter.com/ai/openai/gpt-4o-mini/ Thu, 18 Jul 2024 00:00:00 GMT GPT-4o Mini is a fast, affordable small model that scores 82% on MMLU and accepts text and image inputs. It's over 60% cheaper than GPT-3.5 Turbo while offering superior reasoning and coding capabilities. OpenAI OpenAI: GPT-4o Mini 2024-07-18 https://developer.puter.com/ai/openai/gpt-4o-mini-2024-07-18/ https://developer.puter.com/ai/openai/gpt-4o-mini-2024-07-18/ Thu, 18 Jul 2024 00:00:00 GMT GPT-4o Mini 2024-07-18 is the initial release snapshot of GPT-4o Mini from July 2024. It provides version-locked behavior for consistent performance in production applications. OpenAI Mistral AI: Mistral Nemo https://developer.puter.com/ai/mistralai/mistral-nemo/ https://developer.puter.com/ai/mistralai/mistral-nemo/ Thu, 18 Jul 2024 00:00:00 GMT Mistral Nemo is a 12B parameter model developed with NVIDIA featuring 128K context and the Tekken tokenizer. It's state-of-the-art in its class for reasoning, world knowledge, and coding in 11+ languages under Apache 2.0. Mistral AI Mistral AI: Mistral Nemo https://developer.puter.com/ai/mistralai/open-mistral-nemo/ https://developer.puter.com/ai/mistralai/open-mistral-nemo/ Mon, 01 Jul 2024 00:00:00 GMT Mistral Nemo is a 12B parameter model built with NVIDIA featuring 128K context and the Tekken tokenizer trained on 100+ languages. It excels in multilingual tasks, coding, and reasoning, serving as a drop-in replacement for Mistral 7B. Mistral AI Google: Gemma 2 27B https://developer.puter.com/ai/google/gemma-2-27b-it/ https://developer.puter.com/ai/google/gemma-2-27b-it/ Thu, 27 Jun 2024 00:00:00 GMT Gemma 2 27B Instruct is Google's open-weight instruction-tuned language model with 27 billion parameters, trained on 13 trillion tokens. It offers competitive performance with models twice its size and runs on a single high-end GPU. 
Google Google: Gemma 2 9B https://developer.puter.com/ai/google/gemma-2-9b-it/ https://developer.puter.com/ai/google/gemma-2-9b-it/ Thu, 27 Jun 2024 00:00:00 GMT Gemma 2 9B Instruct is Google's efficient open-weight language model with 9 billion parameters, trained using knowledge distillation from the 27B model. It delivers strong performance for text generation while running on consumer hardware. Google Sao10k: Llama 3 Euryale 70B v2.1 https://developer.puter.com/ai/sao10k/l3-euryale-70b/ https://developer.puter.com/ai/sao10k/l3-euryale-70b/ Tue, 18 Jun 2024 00:00:00 GMT Llama 3 Euryale 70B v2.1 is a 70-billion parameter model by Sao10K focused on creative roleplay and storytelling, featuring strong prompt adherence, spatial awareness, and non-restrictive creative writing capabilities. It adapts well to custom formatting and produces highly varied, creative outputs. Sao10k Stability AI: Stable Diffusion 3 Medium https://developer.puter.com/ai/stabilityai/stable-diffusion-3-medium/ https://developer.puter.com/ai/stabilityai/stable-diffusion-3-medium/ Wed, 12 Jun 2024 00:00:00 GMT Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model by Stability AI featuring improved image quality, typography, and complex prompt understanding. It uses three pretrained text encoders and was trained on over 1 billion images. The model is optimized for resource efficiency, making it suitable for both consumer hardware and enterprise GPUs. Stability AI Nous Research: Hermes 2 Pro - Llama-3 8B https://developer.puter.com/ai/nousresearch/hermes-2-pro-llama-3-8b/ https://developer.puter.com/ai/nousresearch/hermes-2-pro-llama-3-8b/ Mon, 27 May 2024 00:00:00 GMT Hermes 2 Pro Llama 3 8B is an 8B parameter model fine-tuned on Meta's Llama 3, optimized for function calling (90% accuracy) and structured JSON outputs (84% accuracy). 
It features dedicated tool-call parsing tokens for agentic capabilities and outperforms Llama-3 8B Instruct on AGIEval, TruthfulQA, and BigBench benchmarks. Nous Research Mistral AI: Mistral 7B Instruct v0.3 https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.3/ https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.3/ Wed, 22 May 2024 00:00:00 GMT Mistral 7B Instruct v0.3 features an extended vocabulary with the v3 tokenizer and function calling support. It enhances language understanding and generation while keeping the efficient 7B parameter architecture, and is released under Apache 2.0. Mistral AI OpenAI: GPT-4o https://developer.puter.com/ai/openai/gpt-4o/ https://developer.puter.com/ai/openai/gpt-4o/ Mon, 13 May 2024 00:00:00 GMT GPT-4o ("omni") is OpenAI's multimodal model capable of processing text, audio, images, and video inputs while generating text and images. It offers 4x faster responses than GPT-4 with superior non-English language and vision performance. OpenAI OpenAI: GPT-4o 2024-05-13 https://developer.puter.com/ai/openai/gpt-4o-2024-05-13/ https://developer.puter.com/ai/openai/gpt-4o-2024-05-13/ Mon, 13 May 2024 00:00:00 GMT GPT-4o 2024-05-13 is the initial release snapshot of GPT-4o from May 2024. It provides version-locked behavior for applications requiring consistent model performance. OpenAI Meta Llama: Llama 3 70B Instruct https://developer.puter.com/ai/meta-llama/llama-3-70b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3-70b-instruct/ Thu, 18 Apr 2024 00:00:00 GMT Llama 3 70B Instruct is a 70 billion parameter instruction-tuned language model from Meta, optimized for dialogue and assistant-like chat in English. It uses an optimized transformer architecture with grouped-query attention and was trained on over 15 trillion tokens. 
Meta Llama Meta Llama: Llama 3 8B Instruct https://developer.puter.com/ai/meta-llama/llama-3-8b-instruct/ https://developer.puter.com/ai/meta-llama/llama-3-8b-instruct/ Thu, 18 Apr 2024 00:00:00 GMT Llama 3 8B Instruct is Meta's compact 8 billion parameter instruction-tuned model for dialogue use cases in English. It offers strong performance on common benchmarks while being more efficient to deploy than its larger sibling. Meta Llama Meta Llama: LlamaGuard 2 8B https://developer.puter.com/ai/meta-llama/llama-guard-2-8b/ https://developer.puter.com/ai/meta-llama/llama-guard-2-8b/ Thu, 18 Apr 2024 00:00:00 GMT Llama Guard 2 8B is Meta's 8 billion parameter safety classifier built on Llama 3, designed to moderate both user prompts and AI responses. It classifies content across 11 hazard categories based on the MLCommons taxonomy. Meta Llama Mistral AI: Mixtral 8x22B Instruct https://developer.puter.com/ai/mistralai/mixtral-8x22b-instruct/ https://developer.puter.com/ai/mistralai/mixtral-8x22b-instruct/ Wed, 17 Apr 2024 00:00:00 GMT Mixtral 8x22B is a sparse MoE model with 141B total / 39B active parameters, 64K context, and native function calling. It outperforms Llama 2 70B and matches GPT-3.5 while remaining cost-efficient, and is released under Apache 2.0. Mistral AI Microsoft: WizardLM-2 8x22B https://developer.puter.com/ai/microsoft/wizardlm-2-8x22b/ https://developer.puter.com/ai/microsoft/wizardlm-2-8x22b/ Tue, 16 Apr 2024 00:00:00 GMT WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model, a Mixture of Experts LLM fine-tuned from Mixtral 8x22B that delivers near-GPT-4 performance on complex chat, multilingual, reasoning, and coding tasks while remaining open-source. Microsoft OpenAI: GPT-4 Turbo https://developer.puter.com/ai/openai/gpt-4-turbo/ https://developer.puter.com/ai/openai/gpt-4-turbo/ Tue, 09 Apr 2024 00:00:00 GMT GPT-4 Turbo is an older high-intelligence model designed as a cheaper, faster version of GPT-4 with a 128K context window. 
OpenAI now recommends using newer models like GPT-4o instead. OpenAI Qwen: Qwen VL Max https://developer.puter.com/ai/qwen/qwen-vl-max/ https://developer.puter.com/ai/qwen/qwen-vl-max/ Mon, 08 Apr 2024 00:00:00 GMT Qwen VL Max is Alibaba's most capable vision-language API model based on Qwen2.5-VL, offering superior image/video understanding, OCR, document analysis, and visual reasoning capabilities. Qwen Qwen: Qwen-Max https://developer.puter.com/ai/qwen/qwen-max/ https://developer.puter.com/ai/qwen/qwen-max/ Wed, 03 Apr 2024 00:00:00 GMT Qwen Max is Alibaba's most powerful proprietary API model, a large-scale MoE with hundreds of billions of parameters. It delivers top-tier performance in reasoning, coding, math, and multilingual tasks via Alibaba Cloud Model Studio. Qwen Anthropic: Claude 3 Haiku https://developer.puter.com/ai/anthropic/claude-3-haiku/ https://developer.puter.com/ai/anthropic/claude-3-haiku/ Wed, 13 Mar 2024 00:00:00 GMT Claude 3 Haiku is the fastest and most compact model from the Claude 3 family. It's optimized for near-instant responses and cost-efficiency, ideal for real-time chatbots, content moderation, and high-volume tasks. Anthropic Mistral AI: Mistral Large https://developer.puter.com/ai/mistralai/mistral-large/ https://developer.puter.com/ai/mistralai/mistral-large/ Mon, 26 Feb 2024 00:00:00 GMT Mistral Large is Mistral's flagship large model for high-complexity enterprise tasks with strong reasoning, knowledge, and coding capabilities. It supports function calling and excels in RAG and agentic workflows across multiple languages. Mistral AI Moonshot AI: Moonshot v1 8K https://developer.puter.com/ai/moonshotai/moonshot-v1-8k/ https://developer.puter.com/ai/moonshotai/moonshot-v1-8k/ Wed, 31 Jan 2024 00:00:00 GMT Moonshot V1 8K is a general-purpose text generation model from Moonshot AI, the Beijing-based company behind the Kimi assistant. 
It supports an 8,000-token context window, making it the most lightweight option in the Moonshot V1 family. All Moonshot V1 models share the same underlying capabilities — the only difference is the maximum context length. This variant is best suited for short-form tasks like single-turn Q&A, classification, and concise summaries where you want to minimize token costs. The API is OpenAI-compatible, so you can integrate it by swapping the base URL and API key in any existing OpenAI SDK setup. The model handles both English and Chinese well. Moonshot AI Moonshot AI: Moonshot v1 32K https://developer.puter.com/ai/moonshotai/moonshot-v1-32k/ https://developer.puter.com/ai/moonshotai/moonshot-v1-32k/ Wed, 31 Jan 2024 00:00:00 GMT Moonshot V1 32K is a general-purpose text generation model from Moonshot AI with a 32,000-token context window. It sits in the middle of the Moonshot V1 family, balancing context capacity with cost. All Moonshot V1 variants share the same model quality — only the context length differs. The 32K window is well-suited for multi-turn conversations, medium-length document summarization, and tasks where inputs and outputs together exceed 8K tokens but don't require the full 128K capacity. The API is fully OpenAI-compatible, supporting streaming, tool calling, and standard chat completion parameters. The model performs well in both English and Chinese. Moonshot AI Moonshot AI: Moonshot v1 128K https://developer.puter.com/ai/moonshotai/moonshot-v1-128k/ https://developer.puter.com/ai/moonshotai/moonshot-v1-128k/ Wed, 31 Jan 2024 00:00:00 GMT Moonshot V1 128K is a long-context text generation model from Moonshot AI, offering a 128,000-token context window. Moonshot AI was one of the first companies to ship native 128K-token context support when the Kimi chatbot launched in 2023. 
This variant is designed for tasks that demand large input windows: processing entire codebases, analyzing lengthy legal or financial documents, or maintaining very long conversation histories. It shares the same model quality as the 8K and 32K variants — context length is the only differentiator. The API is OpenAI-compatible and supports streaming, tool calling, and context caching for reduced latency and cost on repeated prompts. Moonshot AI Qwen: Qwen-Plus https://developer.puter.com/ai/qwen/qwen-plus/ https://developer.puter.com/ai/qwen/qwen-plus/ Thu, 25 Jan 2024 00:00:00 GMT Qwen Plus is a high-performance proprietary API model balancing capability and cost, suitable for complex tasks requiring strong reasoning and multilingual support. Available through Alibaba Cloud Model Studio. Qwen Qwen: Qwen VL Plus https://developer.puter.com/ai/qwen/qwen-vl-plus/ https://developer.puter.com/ai/qwen/qwen-vl-plus/ Thu, 25 Jan 2024 00:00:00 GMT Qwen VL Plus is a balanced vision-language API model offering good performance at lower cost, suitable for image understanding, OCR, and multimodal tasks without requiring maximum capability. Qwen OpenAI: GPT-3.5 Turbo 0613 https://developer.puter.com/ai/openai/gpt-3.5-turbo-0613/ https://developer.puter.com/ai/openai/gpt-3.5-turbo-0613/ Thu, 25 Jan 2024 00:00:00 GMT GPT-3.5 Turbo 0613 is a snapshot of GPT-3.5 Turbo from June 2023, providing consistent behavior for applications requiring a locked model version. It's a legacy model with limited support. OpenAI OpenAI: GPT-4 Turbo Preview (Deprecated) https://developer.puter.com/ai/openai/gpt-4-turbo-preview/ https://developer.puter.com/ai/openai/gpt-4-turbo-preview/ Thu, 25 Jan 2024 00:00:00 GMT GPT-4 Turbo Preview is a deprecated research preview of GPT-4 Turbo. It was an early fast GPT model that has been superseded by production releases. 
OpenAI Mistral AI: Mistral 7B Instruct v0.2 https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.2/ https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.2/ Mon, 15 Jan 2024 00:00:00 GMT Mistral 7B Instruct v0.2 introduces a 32K context window and improved performance over v0.1. It outperforms Llama 2 13B and Llama 1 34B on most benchmarks while remaining efficient for local deployment under Apache 2.0. Mistral AI Mistral AI: Mistral Tiny https://developer.puter.com/ai/mistralai/mistral-tiny/ https://developer.puter.com/ai/mistralai/mistral-tiny/ Mon, 11 Dec 2023 00:00:00 GMT Mistral Tiny is an earlier lightweight Mistral model optimized for speed and efficiency. It provides basic language capabilities for simple tasks where minimal latency and resource usage are prioritized over maximum performance. Mistral AI Mistral AI: Mixtral 8x7B Instruct https://developer.puter.com/ai/mistralai/mixtral-8x7b-instruct/ https://developer.puter.com/ai/mistralai/mixtral-8x7b-instruct/ Mon, 11 Dec 2023 00:00:00 GMT Mixtral 8x7B is a sparse MoE model with roughly 45B total / 13B active parameters, using 8 experts per layer of which 2 are routed per token. It outperforms Llama 2 70B on most benchmarks with 6x faster inference, matches or outperforms GPT-3.5, and handles English, French, German, Spanish, and Italian. Mistral AI NeverSleep: Noromaid 20B https://developer.puter.com/ai/neversleep/noromaid-20b/ https://developer.puter.com/ai/neversleep/noromaid-20b/ Thu, 16 Nov 2023 00:00:00 GMT Noromaid 20B is a 20-billion-parameter roleplay and conversation model created by NeverSleep (IkariDev and Undi), built on the Llama 2 architecture. It supports a context window of up to 8,192 tokens. The model was trained on a mix of curated datasets, including the no_robots dataset for more natural, human-like output and the Aesir private RP dataset contributed by the MinervaAI team. This combination aims to produce responses that feel less formulaic than typical merge-based community models. 
Noromaid 20B targets interactive roleplay, character simulation, and open-ended creative dialogue. Its larger parameter count compared to 7–13B alternatives gives it better coherence in longer exchanges, making it a reasonable mid-size option for developers building narrative or conversational applications. NeverSleep OpenAI: GPT-4 1106 Preview https://developer.puter.com/ai/openai/gpt-4-1106-preview/ https://developer.puter.com/ai/openai/gpt-4-1106-preview/ Mon, 06 Nov 2023 00:00:00 GMT GPT-4 1106 Preview is a November 2023 preview of GPT-4 Turbo with improved instruction following and JSON mode. It's a deprecated preview version superseded by GPT-4 Turbo's general release. OpenAI Alpindale: Goliath 120B https://developer.puter.com/ai/alpindale/goliath-120b/ https://developer.puter.com/ai/alpindale/goliath-120b/ Sun, 05 Nov 2023 00:00:00 GMT Goliath 120B is a community-created large language model built by Alpindale by merging two fine-tuned Llama-2 70B models — Xwin and Euryale — into a single 120-billion-parameter model using the mergekit framework. It was one of the earliest and most notable examples of the model-merging technique in the open-source LLM community, demonstrating that interleaving layers from two complementary fine-tunes could produce a capable larger model without traditional training. It supports Vicuna and Alpaca prompt formats, with Vicuna generally recommended. Goliath 120B is primarily suited for creative writing, storytelling, and open-ended text generation. Its context window is limited to around 4–6K tokens, and no official benchmark scores have been published. Developers should consider it an experimental community model best fit for creative and conversational use cases rather than production workloads requiring verified performance. 
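The two prompt formats mentioned for Goliath 120B can be sketched as plain string templates. This is a minimal illustration only: the template wording follows the commonly circulated Vicuna v1.1 and Alpaca conventions and is an assumption here, not something taken from the Goliath model card, and the helper names `vicuna_prompt` and `alpaca_prompt` are hypothetical.

```python
# Illustrative prompt builders for the Vicuna and Alpaca formats.
# ASSUMPTION: the template text below follows the widely used Vicuna v1.1
# and Alpaca conventions; check the model card before relying on it.

VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def vicuna_prompt(user_message: str, system: str = VICUNA_SYSTEM) -> str:
    """Build a single-turn Vicuna-style prompt; the model continues after 'ASSISTANT:'."""
    return f"{system} USER: {user_message.strip()} ASSISTANT:"


def alpaca_prompt(instruction: str, preamble: str = ALPACA_PREAMBLE) -> str:
    """Build a single-turn Alpaca-style prompt; the model continues after '### Response:'."""
    return (
        f"{preamble}\n\n"
        f"### Instruction:\n{instruction.strip()}\n\n"
        f"### Response:\n"
    )


# Example: the same request rendered in both formats.
request = "Write the opening line of a sea shanty."
print(vicuna_prompt(request))
print(alpaca_prompt(request))
```

Keeping the templates in one place makes it easy to switch formats per model, which is useful when the same application also targets Alpaca-format models.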
Alpindale EleutherAI: Llemma 7b https://developer.puter.com/ai/eleutherai/llemma_7b/ https://developer.puter.com/ai/eleutherai/llemma_7b/ Mon, 16 Oct 2023 00:00:00 GMT Llemma 7B is an open-source language model purpose-built for mathematics, developed by EleutherAI. It was created by continuing pretraining of Code Llama 7B on the Proof-Pile-2, a 55-billion-token dataset of scientific papers, math-heavy web content, and mathematical code. The model excels at chain-of-thought mathematical reasoning and can leverage computational tools like Python interpreters and formal theorem provers (Lean, Isabelle) without additional fine-tuning. On the MATH benchmark, Llemma 7B scores 18.0% pass@1, and on GSM8k it achieves 36.4% — significantly outperforming Llama 2 and Code Llama, and surpassing Google's Minerva on an equal-parameter basis. Llemma is best suited as a specialized base model for math-heavy applications such as step-by-step problem solving, formal proof generation, and scientific reasoning. Its fully open weights, data, and training code make it a strong foundation for further fine-tuning. EleutherAI OpenAI: DALL·E 3 https://developer.puter.com/ai/openai/dall-e-3/ https://developer.puter.com/ai/openai/dall-e-3/ Tue, 03 Oct 2023 00:00:00 GMT DALL·E 3 is OpenAI's 2023 text-to-image model that generates higher-quality images at 1024x1024, 1024x1792, or 1792x1024 resolutions with improved prompt understanding and detail rendering. It integrates with ChatGPT for automatic prompt enhancement and offers 'vivid' and 'natural' style options. DALL·E 3 is now deprecated with support ending in May 2026. OpenAI OpenAI: GPT-3.5 Turbo Instruct https://developer.puter.com/ai/openai/gpt-3.5-turbo-instruct/ https://developer.puter.com/ai/openai/gpt-3.5-turbo-instruct/ Thu, 28 Sep 2023 00:00:00 GMT GPT-3.5 Turbo Instruct is an instruction-following model using the Completions API rather than Chat Completions. 
It's designed for single-turn instruction tasks rather than multi-turn conversations. OpenAI Mistral AI: Mistral 7B https://developer.puter.com/ai/mistralai/open-mistral-7b/ https://developer.puter.com/ai/mistralai/open-mistral-7b/ Wed, 27 Sep 2023 00:00:00 GMT Mistral 7B is Mistral's foundational 7.3B parameter open-source model under Apache 2.0, using sliding window attention and grouped-query attention. It outperforms Llama 2 13B on all benchmarks while being efficient enough for consumer hardware. Mistral AI Mistral AI: Mistral 7B Instruct https://developer.puter.com/ai/mistralai/mistral-7b-instruct/ https://developer.puter.com/ai/mistralai/mistral-7b-instruct/ Wed, 27 Sep 2023 00:00:00 GMT Mistral 7B Instruct is the instruction-tuned version of Mistral 7B, fine-tuned on publicly available datasets. It outperforms all 7B models on MT-Bench and competes with 13B chat models while maintaining Apache 2.0 licensing. Mistral AI Mistral AI: Mistral 7B Instruct v0.1 https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.1/ https://developer.puter.com/ai/mistralai/mistral-7b-instruct-v0.1/ Wed, 27 Sep 2023 00:00:00 GMT Mistral 7B Instruct v0.1 is the original instruction-tuned version of Mistral 7B released September 2023. It demonstrates strong instruction-following capabilities while maintaining efficiency through sliding window and grouped-query attention. Mistral AI OpenAI: GPT-3.5 Turbo 16K https://developer.puter.com/ai/openai/gpt-3.5-turbo-16k/ https://developer.puter.com/ai/openai/gpt-3.5-turbo-16k/ Mon, 28 Aug 2023 00:00:00 GMT GPT-3.5 Turbo 16K is a variant with an extended 16,384 token context window, allowing processing of longer documents. It's a legacy model superseded by newer models with larger contexts. 
OpenAI Mancer: Weaver (alpha) https://developer.puter.com/ai/mancer/weaver/ https://developer.puter.com/ai/mancer/weaver/ Wed, 02 Aug 2023 00:00:00 GMT Weaver (alpha) is a Llama 2 13B fine-tune by Mancer, built specifically for roleplay and narrative text generation. The model aims to recreate Claude-style verbose, descriptive prose in an unfiltered package, making it a niche pick for creative storytelling, character-driven dialogue, and interactive fiction. It supports an 8K context window and works best with the Alpaca instruct format. As an alpha release, Weaver lacks published benchmark scores and isn't intended for general-purpose tasks like coding or analysis. It's best suited for developers building narrative-focused applications (chatbots, text adventures, or collaborative fiction tools) where rich, detailed output matters more than factual precision. Mancer Stability AI: Stable Diffusion XL Base 1.0 https://developer.puter.com/ai/stabilityai/stable-diffusion-xl-base-1.0/ https://developer.puter.com/ai/stabilityai/stable-diffusion-xl-base-1.0/ Wed, 26 Jul 2023 00:00:00 GMT Stable Diffusion XL Base 1.0 is a text-to-image latent diffusion model by Stability AI that generates more photorealistic images with better composition and legible text compared to earlier SD versions. It is the base stage of an ensemble-of-experts pipeline for latent diffusion and can be used standalone or paired with an optional refiner model for enhanced results. The model runs efficiently on consumer GPUs with 8GB VRAM. Stability AI Undi95: ReMM SLERP 13B https://developer.puter.com/ai/undi95/remm-slerp-l2-13b/ https://developer.puter.com/ai/undi95/remm-slerp-l2-13b/ Sat, 22 Jul 2023 00:00:00 GMT ReMM SLERP 13B is a community-built 13-billion-parameter language model created by Undi95 as an updated recreation of the popular MythoMax-L2-13B. Built on the Llama 2 architecture, it uses SLERP merging to combine ReML (itself a blend of Chronos-Beluga v2, Airoboros 2.1, and Nous-Hermes) with Huginn v1.2. 
The model is designed for roleplay, creative writing, and interactive storytelling. It inherits the MythoMax lineage's strength in maintaining consistent character voice, generating vivid prose, and sustaining coherent narratives across extended conversations. With a 4,096-token max output and a roughly 6K context window, it's best suited for creative and conversational use cases rather than reasoning, coding, or instruction-following tasks. Developers building character-driven chat experiences or interactive fiction on a budget will find it a lightweight, capable option in the MythoMax family. Undi95 Gryphe: MythoMax 13B https://developer.puter.com/ai/gryphe/mythomax-l2-13b/ https://developer.puter.com/ai/gryphe/mythomax-l2-13b/ Sun, 02 Jul 2023 00:00:00 GMT MythoMax L2 13B is a 13-billion-parameter language model created by Gryphe, built on Llama 2 and specialized for creative writing, storytelling, and character roleplay. Rather than being trained from scratch, it was produced by merging two models — MythoLogic-L2 and Huginn — using an experimental tensor-level blending technique. MythoLogic-L2 contributes strong comprehension at the input layers while Huginn drives expressive writing at the output layers, resulting in unusually coherent long-form narrative generation for its size. The model excels at maintaining consistent character voice across extended exchanges, producing dialogue and scene descriptions with natural pacing. It's a strong fit for interactive fiction, RPG dialogue generation, and narrative branching where frontier-model API costs would be prohibitive. Context length is 4,096 tokens. Not recommended for reasoning, coding, or factual tasks. Gryphe OpenAI: GPT-3.5 Turbo https://developer.puter.com/ai/openai/gpt-3.5-turbo/ https://developer.puter.com/ai/openai/gpt-3.5-turbo/ Sun, 28 May 2023 00:00:00 GMT GPT-3.5 Turbo is a legacy GPT model optimized for chat and non-chat tasks at low cost. 
As of July 2024, OpenAI recommends using GPT-4o Mini instead as it's cheaper, more capable, and multimodal. OpenAI OpenAI: GPT-4 https://developer.puter.com/ai/openai/gpt-4/ https://developer.puter.com/ai/openai/gpt-4/ Sun, 28 May 2023 00:00:00 GMT GPT-4 is an older high-intelligence GPT model that understands and generates complex text for creative writing, data analysis, and code generation. It has a 23,000-25,000 word context window. OpenAI OpenAI: GPT-4 0314 https://developer.puter.com/ai/openai/gpt-4-0314/ https://developer.puter.com/ai/openai/gpt-4-0314/ Sun, 28 May 2023 00:00:00 GMT GPT-4 0314 is a snapshot of GPT-4 from March 2023, providing consistent behavior for applications requiring a specific model version. It's a legacy snapshot with limited ongoing support. OpenAI Lykon: DreamShaper https://developer.puter.com/ai/lykon/dreamshaper/ https://developer.puter.com/ai/lykon/dreamshaper/ Sat, 01 Apr 2023 00:00:00 GMT DreamShaper is a community-developed text-to-image model by Lykon, fine-tuned on Stable Diffusion v1.5 and designed as a versatile, open-source alternative to MidJourney. It excels as a generalist image generator, handling artistic illustrations, photorealistic portraits, anime-style characters, and fantasy artwork without needing style-specific models. Its strength lies in producing painterly, natural-looking outputs rather than CG-heavy or over-filtered results. The model supports LoRA adapters, ControlNet, and inpainting variants, giving developers flexible control over outputs. An LCM (Latent Consistency Model) variant is also available for faster generation with fewer inference steps. DreamShaper is a strong fit for creative applications like character design, concept art, and artistic content generation where stylistic range matters more than narrow specialization. 
Lykon OpenAI: DALL·E 2 https://developer.puter.com/ai/openai/dall-e-2/ https://developer.puter.com/ai/openai/dall-e-2/ Wed, 06 Apr 2022 00:00:00 GMT DALL·E 2 is OpenAI's earlier text-to-image model released in 2022 that generates images up to 1024x1024 pixels and supports inpainting, outpainting, and image variations. It offers more control in prompting and allows multiple images per request but produces lower quality results than newer models. DALL·E 2 is now deprecated and will be discontinued in May 2026. OpenAI Leonardo.Ai: Lucid Origin https://developer.puter.com/ai/leonardoai/lucid-origin/ https://developer.puter.com/ai/leonardoai/lucid-origin/ Lucid Origin is a text-to-image generation model from Leonardo AI, designed to deliver high aesthetic fidelity and creative versatility. It produces Full HD renders with rich color saturation, sharp detail, and strong prompt adherence across multi-element scenes. The model excels at accurate in-image text rendering, making it well suited for branded content, graphic design, and promotional visuals. Its wide stylistic range — from hyper-realistic photography to stylized illustration and concept art — makes it a practical choice for APIs serving diverse creative workflows without requiring heavily engineered prompts. Leonardo.Ai Leonardo.Ai: Phoenix 1.0 https://developer.puter.com/ai/leonardoai/phoenix-1.0/ https://developer.puter.com/ai/leonardoai/phoenix-1.0/ Phoenix 1.0 is Leonardo AI's first foundational image generation model, built from the ground up rather than fine-tuned on an existing architecture. It delivers outputs up to approximately 5 megapixels (e.g. 2048x2048), making it one of the higher-resolution options available via API. Its core differentiator is prompt fidelity — reported at around 95% adherence, significantly above the 70-80% typical of standard models — enabling reliable execution of long, detailed prompts. It also features coherent text rendering for legible typography within generated images. 
Phoenix 1.0 targets professional publishing, marketing, and high-detail creative production pipelines where prompt accuracy and resolution matter. Leonardo.Ai
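As a quick check on the resolution figure quoted for Phoenix 1.0, the sketch below converts pixel dimensions to megapixels and compares them with a pixel budget. The 5.0 MP ceiling is taken loosely from the "approximately 5 megapixels" wording above and is an assumption, not a documented hard limit; the function names are hypothetical.

```python
def megapixels(width: int, height: int) -> float:
    """Return the image size in megapixels (millions of pixels)."""
    return width * height / 1_000_000


def fits_budget(width: int, height: int, max_mp: float = 5.0) -> bool:
    """Check a requested size against a pixel budget.

    ASSUMPTION: max_mp=5.0 mirrors the approximate figure in the
    description; the real service limit may differ.
    """
    return megapixels(width, height) <= max_mp


for w, h in [(1024, 1024), (2048, 2048), (2560, 2048)]:
    print(f"{w}x{h}: {megapixels(w, h):.2f} MP, within budget: {fits_budget(w, h)}")
```

Under these assumptions a 2048x2048 request comes to about 4.19 MP, which sits below the approximate 5 MP ceiling, while 2560x2048 (about 5.24 MP) would exceed it.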