Deep Cogito API

Access Deep Cogito instantly with Puter.js, and add AI to any app in a few lines of code, with no backend and no API keys.

// npm install @heyputer/puter.js
import { puter } from '@heyputer/puter.js';

puter.ai.chat("Explain AI like I'm five!", {
    model: "deepcogito/cogito-v2-preview-llama-70b"
}).then(response => {
    console.log(response);
});

Or load Puter.js directly in the browser with a script tag, no install needed:

<!DOCTYPE html>
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain AI like I'm five!", {
            model: "deepcogito/cogito-v2-preview-llama-70b"
        }).then(response => {
            console.log(response);
        });
    </script>
</body>
</html>

List of Deep Cogito Models

Chat

Cogito v2.1 671B

deepcogito/cogito-v2.1-671b

Cogito v2.1 671B is a 671-billion-parameter Mixture-of-Experts language model from DeepCogito, with 37 billion parameters active per forward pass. Built with reinforcement learning via self-play, it excels at instruction following, coding, complex reasoning, multi-turn dialogue, and creative writing. It features a hybrid reasoning mode that produces results comparable to or better than DeepSeek R1 while using roughly 60% fewer reasoning tokens. It supports a 128K context window and 30+ languages. Benchmark highlights include 98.57% on MATH-500, 77.72% on GPQA Diamond, 84.69% on MMLU Pro, and 89.47% on AIME 2025. A strong open-weight choice for agents, coding assistants, or math-heavy applications needing frontier-level performance with token-efficient reasoning.
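For long, reasoning-heavy replies, streaming keeps the UI responsive by rendering tokens as they arrive. A minimal sketch, assuming Puter.js's `stream: true` option and the `part.text` chunk shape from Puter's streaming examples; verify both against the current Puter.js docs:

```javascript
// Stream a reply from Cogito v2.1 671B chunk by chunk.
// Assumes Puter.js is already loaded (script tag or npm import).
async function streamCogito(prompt, onText) {
  const parts = await puter.ai.chat(prompt, {
    model: "deepcogito/cogito-v2.1-671b",
    stream: true // receive the reply incrementally instead of all at once
  });
  for await (const part of parts) {
    if (part?.text) onText(part.text); // hand each text chunk to the caller
  }
}

// Browser usage:
// streamCogito("Prove there are infinitely many primes.", t => console.log(t));
```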

Chat

Cogito V2 Preview Llama 109B MoE

deepcogito/cogito-v2-preview-llama-109b-moe

Cogito V2 Preview Llama 109B MoE is a sparse Mixture-of-Experts language model built on Llama architecture, developed by DeepCogito using their Iterated Distillation and Amplification (IDA) training method. The MoE design activates only a subset of expert networks per token, delivering strong reasoning at lower per-token compute cost compared to dense models of the same size. It supports dual-mode operation: standard response generation or self-reflective reasoning mode via system prompt. Optimized for coding, STEM, instruction following, multilingual tasks (30+ languages), and tool calling with a 128K context window. A cost-effective option for API workloads that need strong reasoning without dense-model pricing.
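The dual-mode behavior is selected with a system prompt. A hedged sketch: the toggle phrase below ("Enable deep thinking subroutine.") is the one DeepCogito has published for the Cogito series, but confirm the exact wording on the model card. The helper only builds the request; the commented call shows browser usage:

```javascript
// Build a chat request that switches Cogito into self-reflective reasoning mode.
// The system-prompt phrase is DeepCogito's documented toggle; verify on the model card.
function buildReasoningRequest(question, reasoning = true) {
  const messages = [];
  if (reasoning) {
    messages.push({ role: "system", content: "Enable deep thinking subroutine." });
  }
  messages.push({ role: "user", content: question });
  return {
    messages,
    options: { model: "deepcogito/cogito-v2-preview-llama-109b-moe" }
  };
}

// Browser usage:
// const { messages, options } = buildReasoningRequest("What is 17 * 24?");
// puter.ai.chat(messages, options).then(r => console.log(r));
```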

Chat

Cogito V2 Preview Llama 405B

deepcogito/cogito-v2-preview-llama-405b

Cogito V2 Preview Llama 405B is a dense large language model built on the Llama architecture and developed by DeepCogito using Iterated Distillation and Amplification (IDA), a training approach that internalizes reasoning capabilities directly into model weights. As DeepCogito's largest dense offering in the v2 preview series, it delivers near-frontier performance among open models across coding, STEM, general instruction following, and multilingual tasks (30+ languages). It supports a 128K context window. The model operates in both standard and self-reflective reasoning modes, with reasoning chains approximately 60% shorter than DeepSeek R1's. Well-suited for high-accuracy API use cases where latency is less constrained.

Chat

Cogito V2 Preview Llama 70B

deepcogito/cogito-v2-preview-llama-70b

Cogito V2 Preview Llama 70B is a dense language model built on Llama architecture and trained by DeepCogito using Iterated Distillation and Amplification (IDA), which embeds reasoning ability into model weights to improve standard-mode performance without requiring extended chain-of-thought. It supports dual-mode operation — direct response or self-reflective reasoning — controlled via the system prompt. In standard mode, it achieves 91.73% on MMLU, outperforming Llama 3.3 70B by 6.4 points. It covers 30+ languages and supports a 128K context window with tool calling in both modes. Well-suited for API deployments requiring a balance of speed, cost efficiency, and strong reasoning on coding, STEM, and instruction-following tasks.
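Tool calling is expressed with an OpenAI-style `tools` array, which Puter.js's chat method accepts as an option. The function schema below (`get_weather` and its parameters) is a made-up example for illustration, not part of either API:

```javascript
// OpenAI-style function-calling schema; get_weather is a hypothetical example tool.
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. Paris" }
      },
      required: ["city"]
    }
  }
}];

// Browser usage: the model may answer with a tool call instead of text,
// which your app then executes before sending the result back.
// puter.ai.chat("What's the weather in Paris?", {
//   model: "deepcogito/cogito-v2-preview-llama-70b",
//   tools
// }).then(r => console.log(r));
```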

Frequently Asked Questions

What is the Deep Cogito API?

The Deep Cogito API gives you access to Deep Cogito's chat models. Through Puter.js, you can start using them instantly with zero setup or configuration.

Which Deep Cogito models can I use?

Puter.js supports a variety of Deep Cogito models, including Cogito v2.1 671B, Cogito V2 Preview Llama 109B, Cogito V2 Preview Llama 405B, and more. Find all AI models supported by Puter.js in the AI model list.

How much does it cost?

With the User-Pays model, each user covers their own AI usage through their Puter account. This means you can ship AI features without paying for inference or other infrastructure yourself.

What is Puter.js?

Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from a single API. It handles authentication, infrastructure, and scaling so you can focus on building your app.

Does this work with React / Vue / Vanilla JS / Node / etc.?

Yes — the Deep Cogito API through Puter.js works with any JavaScript framework, Node.js, or plain HTML. Just include the library and start building. See the documentation for more details.