Claude API Pricing

This guide breaks down everything you need to know about Claude API pricing — every model, every tier, and every discount. Whether you're budgeting for a side project or planning enterprise-scale usage, you'll find the exact numbers here.

At the end, we'll also show you how to access Claude models for free using Puter.js: no API keys, no billing setup, no cost to you as a developer. Puter pioneered the "User-Pays" model, which lets developers incorporate AI capabilities into their applications while each user covers their own usage costs.

How Claude API pricing works

Anthropic charges based on tokens — the pieces of text the model reads and generates. As a rough estimate, 1 token is approximately 4 characters or 0.75 words in English. You're billed separately for:

  • Input tokens: the text you send to the model (your prompt, system instructions, conversation history)
  • Output tokens: the text the model generates in response

All prices below are per million tokens (MTok) in USD.

Model pricing

Anthropic offers three tiers of Claude models, each with different price-to-performance tradeoffs:

Current models

| Model | Input | Output | Context Window | Max Output |
|---|---|---|---|---|
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 200K (1M beta) | 128K tokens |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K | 64K tokens |

Legacy models (still available)

| Model | Input | Output | Context Window | Max Output |
|---|---|---|---|---|
| Claude Opus 4.5 | $5 / MTok | $25 / MTok | 200K | 64K tokens |
| Claude Opus 4.1 | $15 / MTok | $75 / MTok | 200K | 32K tokens |
| Claude Opus 4 | $15 / MTok | $75 / MTok | 200K | 32K tokens |
| Claude Sonnet 4.5 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Sonnet 4 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Haiku 3.5 | $0.80 / MTok | $4 / MTok | 200K | 8K tokens |
| Claude Haiku 3 | $0.25 / MTok | $1.25 / MTok | 200K | 4K tokens |

Which model should you choose?

  • Claude Opus 4.6 — Best for complex reasoning, research, multi-step analysis, and code generation. Worth the premium when accuracy matters more than speed.
  • Claude Sonnet 4.6 — The sweet spot for most applications. Strong reasoning at a moderate price. Best default choice for chatbots, content generation, and general-purpose tasks.
  • Claude Haiku 4.5 — Fastest and cheapest. Ideal for high-volume, low-complexity tasks like classification, entity extraction, short summaries, and routing decisions.

For most developers starting out, Sonnet 4.6 is the best value. It's 40% cheaper than Opus on input tokens and 40% cheaper on output tokens, while still being highly capable.

What does this cost in practice?

To give you a sense of real-world costs with Claude Sonnet 4.6 ($3 input / $15 output per MTok):

| Use Case | Approx. Tokens | Estimated Cost |
|---|---|---|
| Single chat message (500 in / 500 out) | 1,000 | $0.009 |
| Summarize a 10-page document | ~5,000 in / 500 out | $0.02 |
| Analyze a 50-page PDF | ~25,000 in / 2,000 out | $0.11 |
| Process 1,000 customer support tickets | ~3.7M total | $37.00 |
| 10,000 short API calls / day (30 days) | ~300M/month | ~$2,700/month |

Costs scale linearly with usage. For high-volume applications, this adds up fast.
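The estimates above follow directly from the per-token rates. A quick sketch of the arithmetic, assuming Claude Sonnet 4.6 pricing ($3 input / $15 output per MTok):

```javascript
// Cost in USD for one request at the given per-MTok rates.
function requestCost(inputTokens, outputTokens, inputPerMTok, outputPerMTok) {
    return (inputTokens * inputPerMTok + outputTokens * outputPerMTok) / 1e6;
}

// Claude Sonnet 4.6: $3 input / $15 output per MTok.
const chat = requestCost(500, 500, 3, 15);        // $0.009 per chat message
const pdf = requestCost(25000, 2000, 3, 15);      // ~$0.11 per 50-page PDF
const monthly = requestCost(150e6, 150e6, 3, 15); // $2,700 for 300M tokens/month
```

Swap in the rates for Opus or Haiku to compare tiers for your own workload.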

Claude vs GPT vs Gemini: price comparison

How does Claude stack up against competing models at similar capability tiers?

| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 200K |
| GPT-4o | $2.50 / MTok | $10 / MTok | 128K |
| Gemini 2.5 Pro | $1.25–$2.50 / MTok | $10–$15 / MTok | 1M |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K |
| GPT-4o mini | $0.15 / MTok | $0.60 / MTok | 128K |
| Gemini 2.5 Flash | $0.15–$0.30 / MTok | $0.60–$3.50 / MTok | 1M |
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 200K |
| OpenAI o3 | $10 / MTok | $40 / MTok | 200K |

Claude's pricing is competitive with OpenAI and Google, though the best value depends on your specific use case. Claude tends to excel at nuanced writing, following complex instructions, and code generation.

Batch API pricing (50% off)

If your workload doesn't need real-time responses, the Batch API processes requests asynchronously at half the standard price:

| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.6 | $2.50 / MTok | $12.50 / MTok |
| Claude Sonnet 4.6 | $1.50 / MTok | $7.50 / MTok |
| Claude Haiku 4.5 | $0.50 / MTok | $2.50 / MTok |

The Batch API is ideal for bulk processing tasks like document analysis, data extraction, or content moderation where you can tolerate some latency.
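Because batch rates are a flat 50% off, the savings on a bulk job are easy to project. A sketch, assuming Sonnet 4.6 rates:

```javascript
// Compare standard vs. Batch API cost for a bulk job (USD).
function batchSavings(inputTokens, outputTokens, inputPerMTok, outputPerMTok) {
    const standard = (inputTokens * inputPerMTok + outputTokens * outputPerMTok) / 1e6;
    const batch = standard / 2; // Batch API is a flat 50% discount
    return { standard, batch, saved: standard - batch };
}

// e.g. moderating 10M input / 2M output tokens with Sonnet 4.6:
const job = batchSavings(10e6, 2e6, 3, 15); // standard $60, batch $30
```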

Prompt caching pricing

Prompt caching reduces costs by reusing previously processed parts of your prompt (like system instructions or large documents) across API calls. Instead of re-reading the same content every time, Claude reads it from cache at a fraction of the price.

| Cache Operation | Price Multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25x base input price | 5 minutes |
| 1-hour cache write | 2x base input price | 1 hour |
| Cache read (hit) | 0.1x base input price | Same as write |

A cache hit costs just 10% of the standard input price. For example, with Claude Sonnet 4.6:

  • Standard input: $3 / MTok
  • Cache hit: $0.30 / MTok

If you have a large system prompt that you send with every request, caching pays for itself after just one reuse (for 5-minute cache) or two reuses (for 1-hour cache).
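The break-even math can be checked directly. A sketch in relative units, where 1.0 is the base input price of the cached prefix:

```javascript
// Total cost of n requests over the same cached prefix, in multiples
// of the base input price for that prefix.
function cachedCost(n, writeMultiplier) {
    // one cache write, then (n - 1) cache reads at 0.1x
    return writeMultiplier + (n - 1) * 0.1;
}
function uncachedCost(n) {
    return n * 1.0; // full input price every time
}

// 5-minute cache (1.25x write): cheaper from the 2nd request on.
cachedCost(2, 1.25); // ≈ 1.35, vs uncachedCost(2) === 2
// 1-hour cache (2x write): cheaper from the 3rd request on.
cachedCost(3, 2);    // ≈ 2.2, vs uncachedCost(3) === 3
```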

Long context pricing

Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support a 1M token context window (beta). When your input exceeds 200K tokens, premium pricing kicks in:

| Model | Standard (≤ 200K input) | Long Context (> 200K input) |
|---|---|---|
| Claude Opus 4.6 | $5 in / $25 out | $10 in / $37.50 out |
| Claude Sonnet 4.6 / 4.5 / 4 | $3 in / $15 out | $6 in / $22.50 out |

The threshold is based on input tokens only. Once you cross 200K input tokens, all tokens in that request are charged at the premium rate.
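A sketch of how the threshold behaves, assuming Sonnet 4.6 rates under the 1M-context beta:

```javascript
// Per-MTok rates for Sonnet 4.6 (1M-context beta).
const SONNET_RATES = {
    standard: { input: 3, output: 15 },     // input <= 200K tokens
    longContext: { input: 6, output: 22.5 } // input > 200K tokens
};

function longContextCost(inputTokens, outputTokens, rates = SONNET_RATES) {
    // The threshold applies to input tokens only, but the premium
    // rate then covers every token in the request.
    const tier = inputTokens > 200_000 ? rates.longContext : rates.standard;
    return (inputTokens * tier.input + outputTokens * tier.output) / 1e6;
}

longContextCost(200_000, 1_000); // $0.615 — standard rate
longContextCost(200_001, 1_000); // ~$1.22 — premium rate for the whole request
```

Note the jump: one extra input token roughly doubles the cost of the request, so it pays to trim inputs that sit just above the threshold.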

Fast mode pricing

Claude Opus 4.6 offers a fast mode (research preview) with significantly faster output at 6x standard rates:

| Mode | Input | Output |
|---|---|---|
| Fast mode | $30 / MTok | $150 / MTok |

Other costs to consider

Tool use

Tool definitions, tool calls, and tool results all count as tokens and are priced at standard rates. Each tool-enabled request adds ~346 tokens of system prompt overhead.

Web search

$10 per 1,000 searches, plus standard token costs for search results.

Data residency

Specifying US-only inference adds a 1.1x multiplier on all token pricing.

Rate limits

Anthropic uses a tiered rate limit system. Higher tiers unlock more capacity:

| Tier | How to Qualify |
|---|---|
| Tier 1 | Entry-level (new accounts) |
| Tier 2 | Growing applications |
| Tier 3 | Established applications |
| Tier 4 | High-volume (1M context beta access) |
| Enterprise | Custom limits via sales |

Rate limits are per-organization and apply to both requests per minute and tokens per minute. If you hit a rate limit, you'll get a 429 error and need to wait or upgrade your tier.
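When you do hit a 429, exponential backoff is the standard response. A minimal sketch (the wrapper works with any HTTP API; the retry schedule and limits here are illustrative defaults, not Anthropic-prescribed values):

```javascript
// Pure backoff schedule: 1s, 2s, 4s, 8s, ...
function backoffDelay(attempt, baseMs = 1000) {
    return baseMs * 2 ** attempt;
}

// Retry a request when the server returns 429 Too Many Requests.
async function fetchWithRetry(url, options, maxRetries = 4) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const res = await fetch(url, options);
        if (res.status !== 429) return res;
        // Respect Retry-After if the server sends it; otherwise back off.
        const retryAfter = res.headers.get("retry-after");
        const delay = retryAfter ? Number(retryAfter) * 1000 : backoffDelay(attempt);
        await new Promise(resolve => setTimeout(resolve, delay));
    }
    throw new Error("Rate limited: retries exhausted");
}
```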

Billing and payment

For developers using the Claude API directly, billing and payment work as follows:

  • Billed monthly based on actual usage
  • Payments in USD
  • Credit card required to start (no free tier for sustained use)
  • New accounts receive a small amount of free credits for testing
  • Enterprise customers can arrange invoicing
  • Usage tracking available in the Claude Platform

Tips to reduce your Claude API costs

If you're using the Claude API directly, here are practical ways to keep your bill down:

  1. Start with Haiku, upgrade as needed. Many tasks (classification, extraction, simple Q&A) don't need Opus-level reasoning. Use Haiku 4.5 at $1/$5 per MTok and only move up when quality demands it.

  2. Use prompt caching aggressively. If you're sending the same system prompt or reference documents with every request, enable caching. A 5-minute cache write costs 1.25x once, but every subsequent hit costs just 0.1x — a 90% discount on cached tokens.

  3. Batch non-urgent work. Document processing, data extraction, content moderation, and analytics can all use the Batch API at 50% off. If you don't need results in real-time, there's no reason to pay full price.

  4. Trim your inputs. Every token you send costs money. Remove unnecessary conversation history, compress system prompts, and avoid sending entire documents when a relevant excerpt will do.

  5. Set max output tokens. Use the max_tokens parameter to cap response length. This prevents the model from generating unnecessarily long responses and keeps costs predictable.

  6. Monitor usage in the Console. Anthropic's usage dashboard shows token consumption by model, so you can spot unexpected spikes before they become expensive surprises.
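Tip 5 in code, using Anthropic's Messages API directly. A sketch: the endpoint, headers, and body shape follow Anthropic's documented API, while the API key, prompt, and the 300-token cap are placeholder values to substitute with your own:

```javascript
// Cap response length with max_tokens to keep per-request cost bounded.
// At Sonnet's $15 / MTok output rate, 300 tokens costs at most $0.0045.
const body = {
    model: "claude-sonnet-4-6",
    max_tokens: 300, // hard cap on output tokens (and output cost)
    messages: [{ role: "user", content: "Summarize this ticket in two sentences." }]
};

async function callClaude(apiKey) {
    const res = await fetch("https://api.anthropic.com/v1/messages", {
        method: "POST",
        headers: {
            "x-api-key": apiKey, // placeholder: your Anthropic key
            "anthropic-version": "2023-06-01",
            "content-type": "application/json"
        },
        body: JSON.stringify(body)
    });
    return res.json();
}
```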

The free alternative: Puter.js

If you're a developer building an app that uses Claude, there's a way to skip all of the above — no API keys, no billing setup, no rate limit management, and no cost to you.

Puter.js is a JavaScript SDK that gives you access to Claude and 500+ other AI models directly from your frontend code. It uses a "User-Pays" model: each user of your app covers their own AI usage through their Puter account. You, the developer, pay nothing.

Here's what that means in practice:

| | Claude API (Direct) | Puter.js |
|---|---|---|
| Cost to developer | Pay per token | Free |
| API key required | Yes | No |
| Billing setup | Credit card required | None |
| Rate limits | Per-organization tiers | Per-user (handled by Puter) |
| Backend required | Yes (to protect your key) | No |
| Models available | Claude only | Claude + GPT + Gemini + 500 more |

Try it now

Add one script tag to your HTML and start using Claude immediately:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.chat("Explain quantum computing in simple terms", {
            model: "claude-sonnet-4-6"
        }).then(response => {
            document.body.innerHTML = response.message.content[0].text;
        });
    </script>
</body>
</html>

No API key. No backend. No billing. You can also use Claude Opus 4.6, Claude Haiku 4.5, and every other Claude model the same way.

You can also stream responses for a better user experience:

<html>
<body>
    <div id="output"></div>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        async function streamResponse() {
            const response = await puter.ai.chat("Write a short poem about coding", {
                model: "claude-sonnet-4-6",
                stream: true
            });
            const output = document.getElementById('output');
            for await (const chunk of response) {
                if (chunk?.text) {
                    output.textContent += chunk.text;
                }
            }
        }
        streamResponse();
    </script>
</body>
</html>

Why developers choose Puter.js over direct API access

  • $0 infrastructure cost: Your users pay for their own usage, so your app costs nothing to run regardless of scale
  • No API key management: No keys to rotate, no secrets to protect, no backend needed to hide them
  • No rate limit headaches: Each user has their own limits, so one user's traffic never blocks another's
  • Access every AI provider: Switch between Claude, GPT, Gemini, Grok, DeepSeek, and more with one line of code — no separate accounts or billing for each
  • Ship faster: Go from idea to production in minutes, not days of billing setup and backend configuration


Free, Serverless AI and Cloud

Start creating powerful web applications with Puter.js in seconds!
