Moonshot AI API

Access Moonshot AI instantly with Puter.js, and add AI to any app in a few lines of code, with no backend or API keys required.

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "moonshotai/kimi-k2"
}).then(response => {
    console.log(response);
});

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "moonshotai/kimi-k2"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>

List of Moonshot AI Models

Chat

Kimi K2.6

moonshotai/kimi-k2.6

Kimi K2.6 is Moonshot AI's latest open-weight multimodal model, built on a 1-trillion-parameter mixture-of-experts architecture with a 256K context window. It excels at agentic coding and long-horizon execution, supporting sustained autonomous workflows with 4,000+ tool calls across languages like Rust, Go, and Python. On key benchmarks, it scores 58.6 on SWE-Bench Pro, 54.0 on HLE with Tools, and 50.0 on Toolathlon — competitive with GPT-5.4 and Claude Opus 4.6 on coding and agent tasks, though trailing them on pure reasoning. The model accepts text, image, and video input, supports both thinking and non-thinking modes, and offers an OpenAI-compatible API. It's a strong pick for developers building multi-step agentic workflows and complex software engineering pipelines.

Chat

Kimi K2.5

moonshotai/kimi-k2.5

Kimi K2.5 is Moonshot AI's most capable open-source model, a natively multimodal (vision + text) trillion-parameter MoE with 32B active parameters released in January 2026. Built through continual pretraining on ~15 trillion mixed visual and text tokens atop the K2 base, it supports both thinking and instant modes with a 256K context window. It scored 76.8% on SWE-bench Verified, 96.1% on AIME 2025, and 50.2% on Humanity's Last Exam with tools — outperforming Claude Opus 4.5 and GPT-5.2 on the latter. Its standout feature is Agent Swarm, which coordinates up to 100 parallel sub-agents for complex tasks. K2.5 excels at vision-to-code generation, frontend development from screenshots, and large-scale agentic workflows, making it a strong choice for developers building multimodal AI agents.

Chat

Kimi K2 Thinking

moonshotai/kimi-k2-thinking

Kimi K2 Thinking is Moonshot AI's reasoning-enhanced variant of Kimi K2, trained to interleave step-by-step chain-of-thought with dynamic tool calls. It supports up to 200–300 sequential tool calls without drift, enabling deep autonomous research, coding, and analysis workflows. It achieves 71.3% on SWE-bench Verified, 44.9% on Humanity's Last Exam (with tools), 60.2% on BrowseComp, and 99.1% on AIME 2025 (with Python) — placing it among the top open-source thinking models. It uses native INT4 quantization and a 256K context window. K2 Thinking is designed for complex, multi-step tasks where extended reasoning and sustained tool orchestration matter more than low-latency responses.
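The pattern of interleaving model turns with tool calls can be sketched as a simple loop. This is a minimal illustration, not Moonshot AI's implementation: `runToolLoop`, `stubModel`, and the `add` tool are hypothetical names, the message shapes assume the OpenAI-style `tool_calls` format, and the cap of 300 calls simply mirrors the upper figure quoted above. The stub stands in for a real model call so the sketch runs without a network request.

```javascript
// Hypothetical agent loop: call the model, run any tools it requests,
// feed the results back, and stop when it answers or the budget runs out.
async function runToolLoop(callModel, tools, messages, maxToolCalls = 300) {
    const history = [...messages];
    let used = 0;
    while (true) {
        const reply = await callModel(history);
        history.push(reply);
        const calls = reply.tool_calls ?? [];
        if (calls.length === 0) {
            return { answer: reply.content, toolCallsUsed: used };
        }
        for (const call of calls) {
            if (used >= maxToolCalls) throw new Error("Tool-call budget exhausted");
            used += 1;
            // OpenAI-style tool calls carry their arguments as a JSON string.
            const args = JSON.parse(call.function.arguments);
            const result = await tools[call.function.name](args);
            history.push({
                role: "tool",
                tool_call_id: call.id,
                content: JSON.stringify(result)
            });
        }
    }
}

// Stub model (assumption): requests one tool call, then answers from its result.
async function stubModel(history) {
    const last = history[history.length - 1];
    if (last.role === "tool") {
        const { sum } = JSON.parse(last.content);
        return { role: "assistant", content: `The sum is ${sum}.` };
    }
    return {
        role: "assistant",
        content: null,
        tool_calls: [{
            id: "call_1",
            type: "function",
            function: { name: "add", arguments: JSON.stringify({ a: 2, b: 3 }) }
        }]
    };
}

const tools = { add: ({ a, b }) => ({ sum: a + b }) };

runToolLoop(stubModel, tools, [{ role: "user", content: "What is 2 + 3?" }])
    .then(result => console.log(result));
```

In a real application, `callModel` would wrap a request to `moonshotai/kimi-k2-thinking`; how tool definitions are passed depends on the client library you use.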

Chat

Kimi K2 0905

moonshotai/kimi-k2-0905

Kimi K2 0905 is Moonshot AI's September 2025 update to the original Kimi K2, delivering enhanced coding performance and improved tool-calling reliability. It shares the same 1-trillion-parameter MoE architecture with 32B active parameters but doubles the context window from 128K to 256K tokens. Key improvements include stronger frontend development capabilities — producing cleaner, more polished UI code for frameworks like React, Vue, and Angular — along with better integration across popular agent scaffolds. It scored 53.7% Pass@1 on LiveCodeBench. This version is ideal for developers who want K2's agentic strengths with improved real-world coding quality and longer context support for large codebases.

Chat

Kimi K2 0711

moonshotai/kimi-k2

Kimi K2 is a trillion-parameter Mixture-of-Experts model by Moonshot AI, activating 32 billion parameters per token. Designed as a non-thinking model optimized for agentic capabilities, it excels at tool use, code generation, and autonomous problem-solving with a 128K token context window. On benchmarks, K2 scored 65.8% on SWE-bench Verified, 75.1% on GPQA-Diamond, 49.5% on AIME 2025, and 66.1 on Tau2-bench — surpassing most open- and closed-source models in non-thinking settings. It ranked as the #1 open-source model on the LMSYS Arena leaderboard upon release in July 2025. K2 is well suited for developers building AI agents and tool-calling pipelines who need strong coding and reasoning without extended thinking overhead.

Chat

Kimi Dev 72B

moonshotai/kimi-dev-72b

Kimi Dev 72B is a 72-billion-parameter coding model by Moonshot AI, purpose-built for software engineering tasks like bug fixing, code generation, and unit test creation. It is based on the Qwen 2.5-72B architecture and fine-tuned with large-scale reinforcement learning on real-world GitHub issues and pull requests. The model achieved 60.4% on SWE-bench Verified, setting a state-of-the-art result among open-source models at the time of its June 2025 release. It uses a two-stage framework — file localization followed by precise code editing — that mirrors how human developers approach issue resolution. Kimi Dev 72B is a strong pick for automated code repair and test generation workflows where a specialized coding model outperforms general-purpose alternatives.

Chat

Moonshot v1 8K Vision (Preview)

moonshotai/moonshot-v1-8k-vision-preview

Moonshot V1 8K Vision Preview is a multimodal variant of Moonshot AI's V1 model that accepts both image and text inputs within an 8,000-token context window. It can interpret screenshots, charts, UI mockups, and photos, returning text-based analysis. This makes it useful for tasks like image captioning, visual Q&A, and lightweight document understanding where the source material includes visual elements. As a preview model, it may see changes before a stable release. The API follows the OpenAI-compatible content array format with image_url blocks, making integration straightforward for developers already using similar patterns.
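The content array format mentioned above can be sketched as follows. This is a minimal illustration that assumes the usual OpenAI-style message shape (a `text` part followed by an `image_url` part). The helper name `buildVisionMessage` and the image URL are hypothetical, and whether Puter.js forwards this array to the model unchanged is an assumption to verify against the Puter.js documentation.

```javascript
// Hypothetical helper: builds one user message in the OpenAI-style
// content-array format (a text part followed by an image_url part).
function buildVisionMessage(question, imageUrl) {
    return {
        role: "user",
        content: [
            { type: "text", text: question },
            { type: "image_url", image_url: { url: imageUrl } }
        ]
    };
}

// Placeholder image URL, used only for illustration.
const message = buildVisionMessage(
    "What does this chart show?",
    "https://example.com/chart.png"
);
console.log(JSON.stringify(message, null, 2));

// Assumed usage with Puter.js (not run here):
// puter.ai.chat([message], { model: "moonshotai/moonshot-v1-8k-vision-preview" })
//     .then(response => console.log(response));
```

The same message shape should apply to the 32K and 128K vision previews; only the model ID changes.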

Chat

Moonshot v1 32K Vision (Preview)

moonshotai/moonshot-v1-32k-vision-preview

Moonshot V1 32K Vision Preview is a multimodal model from Moonshot AI that processes both images and text within a 32,000-token context window. It extends the base 32K model with the ability to interpret visual inputs — including screenshots, diagrams, charts, and scanned documents — and return text-based responses. This is useful for workflows that combine visual context with moderate-length text, such as analyzing annotated documents or explaining UI designs. As a preview release, the vision capabilities may evolve. The API accepts the standard OpenAI-compatible content array format for multimodal inputs.

Chat

Moonshot v1 128K Vision (Preview)

moonshotai/moonshot-v1-128k-vision-preview

Moonshot V1 128K Vision Preview is Moonshot AI's largest-context multimodal model in the V1 series, supporting both image and text inputs within a 128,000-token context window. It combines the long-context strength of the 128K text model with visual understanding capabilities. This makes it well-suited for processing large multimodal documents — think lengthy reports with embedded charts, multi-page scanned PDFs, or extensive UI review sessions. As a preview model, vision features may be refined over time. The API uses the standard OpenAI-compatible format for multimodal content, making it a drop-in addition to existing workflows.

Chat

Moonshot v1 Auto

moonshotai/moonshot-v1-auto

Moonshot V1 Auto is a routing layer from Moonshot AI that automatically selects the most cost-efficient context window (8K, 32K, or 128K) based on the token count of each request. It uses the same underlying Moonshot V1 model as the fixed-context variants, so there is no difference in output quality. The routing simply ensures you're billed at the lowest applicable tier for each call, eliminating the need to manually choose a context size or overpay for unused capacity. Usage is identical to the other Moonshot V1 models: set the model ID to `moonshotai/moonshot-v1-auto` in Puter.js (`moonshot-v1-auto` on Moonshot AI's own API) and the platform handles the rest. It is ideal for applications with variable-length inputs.
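The routing idea can be sketched as picking the smallest window that fits a request. This is only an illustration: the real selection happens on Moonshot AI's side, the function name `pickTier` is hypothetical, the limits are the round context sizes named on this page, and counting prompt tokens plus requested output tokens is an assumption about how the request size is measured.

```javascript
// Illustrative tier limits, taken from the context sizes named on this page.
// The exact server-side thresholds are not documented here (assumption).
const TIERS = [
    { name: "8K", maxTokens: 8000 },
    { name: "32K", maxTokens: 32000 },
    { name: "128K", maxTokens: 128000 }
];

// Hypothetical function: returns the smallest tier that fits the request,
// or null when the request exceeds the largest window.
function pickTier(promptTokens, maxOutputTokens) {
    const needed = promptTokens + maxOutputTokens;
    const tier = TIERS.find(t => needed <= t.maxTokens);
    return tier ? tier.name : null;
}

console.log(pickTier(3000, 1000));    // 8K
console.log(pickTier(20000, 2000));   // 32K
console.log(pickTier(127000, 4000));  // null (131,000 tokens exceed 128K)
```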

Chat

Moonshot v1 8K

moonshotai/moonshot-v1-8k

Moonshot V1 8K is a general-purpose text generation model from Moonshot AI, the Beijing-based company behind the Kimi assistant. It supports an 8,000-token context window, making it the most lightweight option in the Moonshot V1 family. All Moonshot V1 models share the same underlying capabilities; the only difference is the maximum context length. This variant is best suited for short-form tasks like single-turn Q&A, classification, and concise summaries where you want to minimize token costs. Moonshot AI's own API is OpenAI-compatible, so outside Puter.js you can integrate it by swapping the base URL and API key in any existing OpenAI SDK setup. The model handles both English and Chinese well.
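For readers calling an OpenAI-compatible endpoint directly rather than through Puter.js, the base URL and API key swap amounts to changing two strings in an otherwise standard chat completion request. The sketch below assumes the common `/chat/completions` path and Bearer-token header; `buildChatRequest` is a hypothetical helper, and the base URL and key are placeholders, not real values.

```javascript
// Hypothetical helper: builds fetch() arguments for an OpenAI-style
// chat completion call. baseUrl and apiKey are supplied by the caller.
function buildChatRequest(baseUrl, apiKey, model, prompt) {
    return {
        url: `${baseUrl}/chat/completions`,
        options: {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": `Bearer ${apiKey}`
            },
            body: JSON.stringify({
                model,
                messages: [{ role: "user", content: prompt }]
            })
        }
    };
}

// Placeholder base URL and key; substitute the provider's real values.
const request = buildChatRequest(
    "https://api.example.com/v1",
    "YOUR_API_KEY",
    "moonshot-v1-8k",
    "Summarize this paragraph in one sentence."
);
console.log(request.url);

// To send it (not run here):
// fetch(request.url, request.options)
//     .then(res => res.json())
//     .then(data => console.log(data));
```

With Puter.js none of this is needed, since the library handles authentication for you.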

Chat

Moonshot v1 32K

moonshotai/moonshot-v1-32k

Moonshot V1 32K is a general-purpose text generation model from Moonshot AI with a 32,000-token context window. It sits in the middle of the Moonshot V1 family, balancing context capacity with cost. All Moonshot V1 variants share the same model quality — only the context length differs. The 32K window is well-suited for multi-turn conversations, medium-length document summarization, and tasks where inputs and outputs together exceed 8K tokens but don't require the full 128K capacity. The API is fully OpenAI-compatible, supporting streaming, tool calling, and standard chat completion parameters. The model performs well in both English and Chinese.
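Streaming is typically consumed as an async sequence of chunks. The sketch below is a rough illustration of that pattern: `collectStream` and `fakeStream` are hypothetical names, and the assumption that each chunk exposes an optional `text` field (and that Puter.js enables streaming through a `stream: true` option) should be checked against the Puter.js documentation. The fake stream lets the example run without a network call.

```javascript
// Hypothetical helper: concatenates the text of each streamed chunk.
// Assumes each chunk is an object with an optional `text` field.
async function collectStream(stream, onChunk) {
    let full = "";
    for await (const part of stream) {
        const text = part?.text ?? "";
        if (text) {
            full += text;
            if (onChunk) onChunk(text);
        }
    }
    return full;
}

// Stand-in for a streamed response so the sketch runs offline.
async function* fakeStream() {
    yield { text: "Hello" };
    yield { text: ", " };
    yield {};                 // chunk without text is skipped
    yield { text: "world" };
}

collectStream(fakeStream(), chunk => console.log("chunk:", chunk))
    .then(full => console.log("full:", full));

// Assumed usage with Puter.js (not run here):
// const stream = await puter.ai.chat("Tell me a story", {
//     model: "moonshotai/moonshot-v1-32k",
//     stream: true
// });
// const story = await collectStream(stream, text => console.log(text));
```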

Chat

Moonshot v1 128K

moonshotai/moonshot-v1-128k

Moonshot V1 128K is a long-context text generation model from Moonshot AI, offering a 128,000-token context window. Moonshot AI was one of the first companies to ship native 128K-token context support when the Kimi chatbot launched in 2023. This variant is designed for tasks that demand large input windows: processing entire codebases, analyzing lengthy legal or financial documents, or maintaining very long conversation histories. It shares the same model quality as the 8K and 32K variants — context length is the only differentiator. The API is OpenAI-compatible and supports streaming, tool calling, and context caching for reduced latency and cost on repeated prompts.

Frequently Asked Questions

What is this Moonshot AI API about?

The Moonshot AI API gives you access to Moonshot AI's chat models, including the Kimi K2 family, Kimi Dev 72B, and the Moonshot V1 series. Through Puter.js, you can start using these models instantly with zero setup or configuration.

Which Moonshot AI models can I use?

Puter.js supports a variety of Moonshot AI models, including Kimi K2.6, Kimi K2.5, Kimi K2 Thinking, and more. Find all AI models supported by Puter.js in the AI model list.
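To avoid typos when switching between models, the IDs listed on this page can be kept in one lookup table. This is a small convenience sketch; the names `MOONSHOT_MODELS` and `modelId` are hypothetical and not part of Puter.js, and the table covers only a subset of the IDs shown above.

```javascript
// Model IDs copied from the list on this page, keyed by display name.
const MOONSHOT_MODELS = new Map([
    ["Kimi K2.6", "moonshotai/kimi-k2.6"],
    ["Kimi K2.5", "moonshotai/kimi-k2.5"],
    ["Kimi K2 Thinking", "moonshotai/kimi-k2-thinking"],
    ["Kimi K2 0905", "moonshotai/kimi-k2-0905"],
    ["Kimi K2 0711", "moonshotai/kimi-k2"],
    ["Kimi Dev 72B", "moonshotai/kimi-dev-72b"],
    ["Moonshot v1 Auto", "moonshotai/moonshot-v1-auto"],
    ["Moonshot v1 8K", "moonshotai/moonshot-v1-8k"],
    ["Moonshot v1 32K", "moonshotai/moonshot-v1-32k"],
    ["Moonshot v1 128K", "moonshotai/moonshot-v1-128k"]
]);

// Hypothetical helper: resolves a display name to its model ID,
// throwing on unknown names so typos surface early.
function modelId(name) {
    const id = MOONSHOT_MODELS.get(name);
    if (!id) throw new Error(`Unknown Moonshot AI model: ${name}`);
    return id;
}

console.log(modelId("Kimi K2 Thinking")); // moonshotai/kimi-k2-thinking

// Assumed usage with Puter.js (not run here):
// puter.ai.chat("Hello!", { model: modelId("Kimi K2.5") });
```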

How much does it cost?

With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services through a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Moonshot AI API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.

Related Resources

Free, Unlimited Moonshot AI API

Free, Unlimited Kimi K2 API

Getting Started with Puter.js