
Arcee AI API

Access Arcee AI instantly with Puter.js and add AI to any app in a few lines of code, with no backend or API keys required.

Using npm:

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "arcee-ai/coder-large"
}).then(response => {
    console.log(response);
});

Or include Puter.js directly in the browser with a script tag:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "arcee-ai/coder-large"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>
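
The resolved response is an object rather than a plain string. A minimal sketch, assuming the OpenAI-style message shape that current Puter.js builds return, where the generated text lives under response.message.content (the examples further down assume the same shape):

puter.ai.chat("Explain AI like I'm five!", {
    model: "arcee-ai/coder-large"
}).then(response => {
    // The message follows the OpenAI chat shape,
    // so the generated text is under message.content.
    puter.print(response.message.content);
});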

List of Arcee AI Models

Chat

Trinity Large Thinking

arcee-ai/trinity-large-thinking

Trinity Large Thinking is a 398-billion-parameter sparse Mixture-of-Experts reasoning model from Arcee AI, with approximately 13B active parameters per token, post-trained with extended chain-of-thought and agentic reinforcement learning. It generates explicit reasoning traces in thinking blocks before final responses, and its 262K context window accommodates long agentic reasoning chains. Benchmark results include 94.7% on τ²-Bench and 98.2% on LiveCodeBench, and it ranks #2 on PinchBench, behind only Claude Opus 4.6. Released under Apache 2.0, Trinity Large Thinking is the strongest option in the Trinity family for agentic pipelines, long-horizon planning, complex multi-step coding, and tasks that benefit from transparent reasoning traces.
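
Because the model emits long reasoning traces before its final answer, streaming keeps the UI responsive. A sketch using the documented stream option of puter.ai.chat; the prompt is illustrative:

(async () => {
    const response = await puter.ai.chat(
        "Plan a three-step migration from REST to gRPC, reasoning step by step.",
        { model: "arcee-ai/trinity-large-thinking", stream: true }
    );
    // With stream: true the call resolves to an async iterable of chunks.
    for await (const part of response) {
        if (part?.text) puter.print(part.text); // render fragments as they arrive
    }
})();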

Chat

Trinity Large Preview

arcee-ai/trinity-large-preview

Trinity Large Preview is a 400-billion-parameter sparse Mixture-of-Experts model from Arcee AI, with approximately 13B active parameters per token. It uses 256 experts with 4 active per token, trained on over 17 trillion tokens. It scores 87.2 on MMLU and 24.0 on AIME 2025, demonstrating strong mathematical reasoning alongside general knowledge. The 128k context window supports long-document analysis and complex reasoning workflows. Trinity Large Preview is suited for complex reasoning, math, and coding-adjacent workflows where developers want near-frontier quality through an API at substantially lower cost than dense models of equivalent scale.
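
puter.ai.chat also accepts an array of messages, which is handy for steering the model with a system prompt. A sketch against the model ID above; the prompts are illustrative:

puter.ai.chat([
    { role: "system", content: "You are a careful math tutor. Show your work." },
    { role: "user", content: "What is the sum of the first 100 positive integers?" }
], { model: "arcee-ai/trinity-large-preview" }).then(response => {
    puter.print(response.message.content);
});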

Chat

Trinity Mini

arcee-ai/trinity-mini

Trinity Mini is a 26-billion-parameter sparse Mixture-of-Experts model from Arcee AI, with approximately 3B active parameters per token. It uses 128 experts with 8 active per token, blending global sparsity with gated attention techniques. Specifically tuned for multi-turn agent workflows, tool orchestration, function calling, and structured outputs, it scores 84.95 on MMLU and 59.67 on BFCL V3, with throughput exceeding 200 tokens per second. Released under Apache 2.0, the 128k context window and strong function-calling performance make Trinity Mini a practical choice for agentic systems, backend automation, and tool-use pipelines where inference speed and cost efficiency matter.
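
Since Trinity Mini is tuned for tool orchestration, it pairs naturally with Puter.js function calling. A sketch following the documented tools option; get_weather is a hypothetical tool used only for illustration:

const tools = [{
    type: "function",
    function: {
        name: "get_weather", // hypothetical tool, not a real API
        description: "Get the current weather for a given city",
        parameters: {
            type: "object",
            properties: {
                city: { type: "string", description: "City name, e.g. Paris" }
            },
            required: ["city"]
        }
    }
}];

puter.ai.chat("What's the weather in Paris?", {
    model: "arcee-ai/trinity-mini",
    tools
}).then(response => {
    // If the model chose to call a tool, the call appears
    // OpenAI-style under message.tool_calls.
    const call = response.message?.tool_calls?.[0];
    if (call) console.log(call.function.name, call.function.arguments);
});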

Chat

Virtuoso Large

arcee-ai/virtuoso-large

Arcee Virtuoso Large is a 72-billion-parameter general-purpose language model from Arcee AI, built on Qwen2.5-72B and post-trained using DeepSeek R1 distillation, multi-epoch supervised fine-tuning, and DPO/RLHF alignment. It is designed for cross-domain reasoning, enterprise question answering, creative writing, and long-document comprehension, with a 128k context window that enables processing entire codebases or lengthy documents in a single API call. Virtuoso Large is Arcee's flagship dense general-purpose model — a solid default choice for developers who need reliable, broad-capability performance without the routing complexity of MoE architectures.
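
The 128k window means an entire document can ride along in a single prompt. A sketch, assuming the document has already been loaded into a string; the fetch path is hypothetical:

(async () => {
    // Load a long document from your own app; the path is hypothetical.
    const contract = await fetch("/docs/contract.txt").then(r => r.text());

    const response = await puter.ai.chat(
        `Summarize the termination clauses in this contract:\n\n${contract}`,
        { model: "arcee-ai/virtuoso-large" }
    );
    puter.print(response.message.content);
})();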

Chat

Spotlight

arcee-ai/spotlight

Arcee Spotlight is a 7-billion-parameter vision-language model from Arcee AI, derived from Qwen2.5-VL and fine-tuned for tight image-text grounding tasks including visual question answering, image captioning, and diagram analysis. At 7B parameters it is designed for fast inference, making it practical for real-time or high-volume multimodal API workloads where latency and cost are constraints. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on VQA and POPE alignment tests. A strong choice for developers who need capable vision-language understanding without the cost overhead of larger multimodal models — well suited for document parsing, visual QA pipelines, and image-grounded chat.
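
For image input, the standard puter.ai.chat signature takes an image URL as the second argument and a testMode flag before the options object. A sketch pointing Spotlight at a hypothetical diagram URL:

puter.ai.chat(
    "What does this diagram show?",
    "https://example.com/architecture-diagram.png", // hypothetical image URL
    false,                                          // testMode off
    { model: "arcee-ai/spotlight" }
).then(response => puter.print(response.message.content));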

Chat

Coder Large

arcee-ai/coder-large

Arcee Coder Large is a 32-billion-parameter code-generation model from Arcee AI, fine-tuned from Qwen2.5-Instruct on permissively licensed GitHub data, CodeSearchNet, and synthetic bug-fix corpora. It generates compilable code, explains implementations, reviews diffs, and fixes bugs across 30+ programming languages, with particular strength in TypeScript, Go, and Terraform. A reinforcement learning stage specifically rewards compilable outputs, making it more reliable than general-purpose models on real developer prompts. The 32k context window supports multi-file refactoring and long diff review in a single API call. A strong choice for code-heavy pipelines where output correctness and structured explanations matter.
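
A sketch of a bug-fix prompt; the broken snippet is inlined so the example is self-contained:

// A small TypeScript function with a deliberate bug, inlined for the example.
const snippet = `
function sum(xs: number[]): number {
  let total;
  for (const x of xs) total += x; // total starts undefined
  return total;
}
`;

puter.ai.chat(
    "Find and fix the bug in this TypeScript function:\n" + snippet,
    { model: "arcee-ai/coder-large" }
).then(response => puter.print(response.message.content));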

Chat

Maestro Reasoning

arcee-ai/maestro-reasoning

Arcee Maestro Reasoning is a 32-billion-parameter analytical reasoning model from Arcee AI, derived from Qwen2.5-32B and post-trained with DPO and chain-of-thought reinforcement learning to produce step-by-step logical reasoning traces. It targets complex problem-solving, abstract reasoning, multi-step scenario modeling, and tasks requiring transparent, auditable inference chains — a natural fit for legal, financial, and scientific applications. The 128k context window allows reasoning over long documents in a single call. On Yupp's high-reasoning benchmark, Maestro Reasoning ranks among the top five models overall, competing with significantly larger frontier models. It delivers strong reasoning quality at a mid-tier parameter count.
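
A sketch of a multi-step prompt where the auditable reasoning chain is the point; the scenario is illustrative:

puter.ai.chat(
    "A trust earns 4% interest, compounded yearly. " +
    "How many whole years until $10,000 grows past $12,000? " +
    "Show each step of your reasoning.",
    { model: "arcee-ai/maestro-reasoning" }
).then(response => puter.print(response.message.content));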

Frequently Asked Questions

What is this Arcee AI API about?

The Arcee AI API gives you access to Arcee AI's chat models. Through Puter.js, you can start using these models instantly, with zero setup or configuration.

Which Arcee AI models can I use?

Puter.js supports a variety of Arcee AI models, including Trinity Large Thinking, Trinity Large Preview, Trinity Mini, and more. Find all AI models supported by Puter.js in the AI model list.

How much does it cost?

With the User-Pays model, users cover their own AI costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Arcee AI API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.