Claude API Pricing
This guide breaks down everything you need to know about Claude API pricing — every model, every tier, and every discount. Whether you're budgeting for a side project or planning enterprise-scale usage, you'll find the exact numbers here.
At the end, we'll also show you how to access Claude models for free using Puter.js: no API keys, no billing setup, no cost to you as a developer. Puter pioneered the "User-Pays" model, which lets developers incorporate AI capabilities into their applications while each user covers their own usage costs.
How Claude API pricing works
Anthropic charges based on tokens — the pieces of text the model reads and generates. As a rough estimate, 1 token is approximately 4 characters or 0.75 words in English. You're billed separately for:
- Input tokens: the text you send to the model (your prompt, system instructions, conversation history)
- Output tokens: the text the model generates in response
All prices below are per million tokens (MTok) in USD.
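The arithmetic above can be sketched in a few lines. This is an illustrative helper, not part of any SDK; the ~4 characters/token figure is the rough rule of thumb stated above, and the $3/$15 rates used in the example are Claude Sonnet 4.6 list prices:

```javascript
// Rough token estimate from the ~4 characters/token rule of thumb (English text).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Cost in USD given per-MTok rates for input and output.
function estimateCost(inputTokens, outputTokens, inputPerMTok, outputPerMTok) {
  return (inputTokens / 1e6) * inputPerMTok + (outputTokens / 1e6) * outputPerMTok;
}

// Example: a 2,000-character prompt (~500 tokens) with a ~500-token reply
// at Sonnet 4.6 rates ($3 in / $15 out per MTok):
const inTok = estimateTokens("x".repeat(2000)); // 500 tokens
console.log(estimateCost(inTok, 500, 3, 15).toFixed(4)); // ~$0.0090
```

Swap in the rates from the tables below for whichever model you plan to use.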
Model pricing
Anthropic offers three tiers of Claude models, each with different price-to-performance tradeoffs:
Current models
| Model | Input | Output | Context Window | Max Output |
|---|---|---|---|---|
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 200K (1M beta) | 128K tokens |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K | 64K tokens |
Legacy models (still available)
| Model | Input | Output | Context Window | Max Output |
|---|---|---|---|---|
| Claude Opus 4.5 | $5 / MTok | $25 / MTok | 200K | 64K tokens |
| Claude Opus 4.1 | $15 / MTok | $75 / MTok | 200K | 32K tokens |
| Claude Opus 4 | $15 / MTok | $75 / MTok | 200K | 32K tokens |
| Claude Sonnet 4.5 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Sonnet 4 | $3 / MTok | $15 / MTok | 200K (1M beta) | 64K tokens |
| Claude Haiku 3.5 | $0.80 / MTok | $4 / MTok | 200K | — |
| Claude Haiku 3 | $0.25 / MTok | $1.25 / MTok | 200K | 4K tokens |
Which model should you choose?
- Claude Opus 4.6 — Best for complex reasoning, research, multi-step analysis, and code generation. Worth the premium when accuracy matters more than speed.
- Claude Sonnet 4.6 — The sweet spot for most applications. Strong reasoning at a moderate price. Best default choice for chatbots, content generation, and general-purpose tasks.
- Claude Haiku 4.5 — Fastest and cheapest. Ideal for high-volume, low-complexity tasks like classification, entity extraction, short summaries, and routing decisions.
For most developers starting out, Sonnet 4.6 is the best value. It's 40% cheaper than Opus on both input and output tokens, while still being highly capable.
What does this cost in practice?
To give you a sense of real-world costs with Claude Sonnet 4.6 ($3 input / $15 output per MTok):
| Use Case | Approx. Tokens | Estimated Cost |
|---|---|---|
| Single chat message (500 in / 500 out) | 1,000 | $0.009 |
| Summarize a 10-page document | ~5,000 in / 500 out | $0.02 |
| Analyze a 50-page PDF | ~25,000 in / 2,000 out | $0.11 |
| Process 1,000 customer support tickets | ~3.7M total | $37.00 |
| 10,000 short API calls / day (30 days) | ~300M/month | ~$2,700/month |
Costs scale linearly with usage. For high-volume applications, this adds up fast.
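That linear scaling makes monthly projections straightforward. The helper below is a sketch that reproduces the last row of the table (10,000 calls/day at ~500 input and ~500 output tokens each); the traffic numbers are assumptions to substitute with your own:

```javascript
// Claude Sonnet 4.6 list prices, per million tokens.
const SONNET = { inPerMTok: 3, outPerMTok: 15 };

// Project a monthly bill from daily call volume and per-call token counts.
function monthlyCost(callsPerDay, inTokensPerCall, outTokensPerCall, rates, days = 30) {
  const inTok = callsPerDay * inTokensPerCall * days;
  const outTok = callsPerDay * outTokensPerCall * days;
  return (inTok / 1e6) * rates.inPerMTok + (outTok / 1e6) * rates.outPerMTok;
}

// 10,000 calls/day, 500 in + 500 out tokens each, over 30 days (~300M tokens):
console.log(monthlyCost(10000, 500, 500, SONNET)); // 2700
```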
Claude vs GPT vs Gemini: price comparison
How does Claude stack up against competing models at similar capability tiers?
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 200K |
| GPT-4o | $2.50 / MTok | $10 / MTok | 128K |
| Gemini 2.5 Pro | $1.25–$2.50 / MTok | $10–$15 / MTok | 1M |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K |
| GPT-4o mini | $0.15 / MTok | $0.60 / MTok | 128K |
| Gemini 2.5 Flash | $0.15–$0.30 / MTok | $0.60–$3.50 / MTok | 1M |
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 200K |
| OpenAI o3 | $10 / MTok | $40 / MTok | 200K |
Claude's pricing is competitive with OpenAI and Google, though the best value depends on your specific use case. Claude tends to excel at nuanced writing, following complex instructions, and code generation.
Batch API pricing (50% off)
If your workload doesn't need real-time responses, the Batch API processes requests asynchronously at half the standard price:
| Model | Batch Input | Batch Output |
|---|---|---|
| Claude Opus 4.6 | $2.50 / MTok | $12.50 / MTok |
| Claude Sonnet 4.6 | $1.50 / MTok | $7.50 / MTok |
| Claude Haiku 4.5 | $0.50 / MTok | $2.50 / MTok |
The Batch API is ideal for bulk processing tasks like document analysis, data extraction, or content moderation where you can tolerate some latency.
Prompt caching pricing
Prompt caching reduces costs by reusing previously processed parts of your prompt (like system instructions or large documents) across API calls. Instead of re-reading the same content every time, Claude reads it from cache at a fraction of the price.
| Cache Operation | Price Multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25x base input price | 5 minutes |
| 1-hour cache write | 2x base input price | 1 hour |
| Cache read (hit) | 0.1x base input price | Same as write |
A cache hit costs just 10% of the standard input price. For example, with Claude Sonnet 4.6:
- Standard input: $3 / MTok
- Cache hit: $0.30 / MTok
If you have a large system prompt that you send with every request, caching pays for itself after just one reuse (for 5-minute cache) or two reuses (for 1-hour cache).
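The break-even claim is easy to verify with the multipliers from the table. The sketch below works in units of "base input cost" and assumes the first request writes the cache and every later request hits it:

```javascript
// Total cost of n requests with caching: one write plus (n - 1) cache reads.
function cachedCost(nRequests, writeMultiplier) {
  return writeMultiplier + (nRequests - 1) * 0.1;
}

// Total cost of n requests without caching, at 1x base input price each.
function uncachedCost(nRequests) {
  return nRequests * 1.0;
}

// 5-minute cache pays off after one reuse (2 requests): 1.35x vs 2.0x
console.log(cachedCost(2, 1.25) < uncachedCost(2)); // true
// 1-hour cache is not yet ahead at 2 requests: 2.1x vs 2.0x
console.log(cachedCost(2, 2) < uncachedCost(2));    // false
// ...but wins after two reuses (3 requests): 2.2x vs 3.0x
console.log(cachedCost(3, 2) < uncachedCost(3));    // true
```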
Long context pricing
Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support a 1M token context window (beta). When your input exceeds 200K tokens, premium pricing kicks in:
| Model | Standard (≤ 200K input) | Long Context (> 200K input) |
|---|---|---|
| Claude Opus 4.6 | $5 in / $25 out | $10 in / $37.50 out |
| Claude Sonnet 4.6 / 4.5 / 4 | $3 in / $15 out | $6 in / $22.50 out |
The threshold is based on input tokens only. Once you cross 200K input tokens, all tokens in that request are charged at the premium rate.
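The all-or-nothing threshold is worth modeling before you rely on long context, because one extra input token can nearly double a request's cost. A sketch using the Sonnet 4.6 rates from the table:

```javascript
// Long-context billing rule: if input exceeds 200K tokens, ALL tokens in the
// request (input and output) are billed at the premium rate.
function sonnetRequestCost(inputTokens, outputTokens) {
  const longContext = inputTokens > 200_000;
  const inRate = longContext ? 6 : 3;      // $/MTok
  const outRate = longContext ? 22.5 : 15; // $/MTok
  return (inputTokens / 1e6) * inRate + (outputTokens / 1e6) * outRate;
}

console.log(sonnetRequestCost(200_000, 1_000)); // 0.615 (standard rate)
console.log(sonnetRequestCost(300_000, 0));     // 1.8 (premium rate)
```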
Fast mode pricing
Claude Opus 4.6 offers a fast mode (research preview) with significantly faster output at 6x standard rates:
| Mode | Input | Output |
|---|---|---|
| Fast mode | $30 / MTok | $150 / MTok |
Other costs to consider
Tool use
Tool definitions, tool calls, and tool results all count as tokens and are priced at standard rates. Each tool-enabled request adds ~346 tokens of system prompt overhead.
Web search
$10 per 1,000 searches, plus standard token costs for search results.
Data residency
Specifying US-only inference adds a 1.1x multiplier on all token pricing.
Rate limits
Anthropic uses a tiered rate limit system. Higher tiers unlock more capacity:
| Tier | How to Qualify |
|---|---|
| Tier 1 | Entry-level (new accounts) |
| Tier 2 | Growing applications |
| Tier 3 | Established applications |
| Tier 4 | High-volume (1M context beta access) |
| Enterprise | Custom limits via sales |
Rate limits are per-organization and apply to both requests per minute and tokens per minute. If you hit a rate limit, you'll get a 429 error and need to wait or upgrade your tier.
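One common way to handle 429s is exponential backoff with a cap. This is a generic illustration, not part of the Anthropic SDK, and it assumes your request function attaches a `status` property to rate-limit errors:

```javascript
// Delay grows 1s, 2s, 4s, ... up to a 60s ceiling.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60_000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a request on 429, waiting longer between each attempt.
async function withRetry(fn, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors; rethrow everything else or on the last try.
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```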
Billing and payment
For developers using the Claude API directly, billing and payment work as follows:
- Billed monthly based on actual usage
- Payments in USD
- Credit card required to start (no free tier for sustained use)
- New accounts receive a small amount of free credits for testing
- Enterprise customers can arrange invoicing
- Usage tracking available in the Claude Platform
Tips to reduce your Claude API costs
If you're using the Claude API directly, here are practical ways to keep your bill down:
Start with Haiku, upgrade as needed. Many tasks (classification, extraction, simple Q&A) don't need Opus-level reasoning. Use Haiku 4.5 at $1/$5 per MTok and only move up when quality demands it.
Use prompt caching aggressively. If you're sending the same system prompt or reference documents with every request, enable caching. A 5-minute cache write costs 1.25x once, but every subsequent hit costs just 0.1x — a 90% discount on cached tokens.
Batch non-urgent work. Document processing, data extraction, content moderation, and analytics can all use the Batch API at 50% off. If you don't need results in real-time, there's no reason to pay full price.
Trim your inputs. Every token you send costs money. Remove unnecessary conversation history, compress system prompts, and avoid sending entire documents when a relevant excerpt will do.
Set max output tokens. Use the `max_tokens` parameter to cap response length. This prevents the model from generating unnecessarily long responses and keeps costs predictable.
Monitor usage in the Console. Anthropic's usage dashboard shows token consumption by model, so you can spot unexpected spikes before they become expensive surprises.
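Capping output is a one-field change. The sketch below builds a request body in the shape of the Anthropic Messages API; the model id and token cap are example values, and you'd adjust both for your client library:

```javascript
// Build a Messages API-style request payload with a hard cap on output tokens.
function buildRequest(prompt, maxTokens = 1024) {
  return {
    model: "claude-sonnet-4-6", // example model id
    max_tokens: maxTokens,      // hard ceiling on generated tokens
    messages: [{ role: "user", content: prompt }],
  };
}

const req = buildRequest("Summarize this article in three bullet points.", 300);
console.log(req.max_tokens); // 300
```

A 300-token cap on a Sonnet 4.6 response bounds the output cost of each call at $0.0045, no matter how verbose the model tries to be.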
The free alternative: Puter.js
If you're a developer building an app that uses Claude, there's a way to skip all of the above — no API keys, no billing setup, no rate limit management, and no cost to you.
Puter.js is a JavaScript SDK that gives you access to Claude and 500+ other AI models directly from your frontend code. It uses a "User-Pays" model: each user of your app covers their own AI usage through their Puter account. You, the developer, pay nothing.
Here's what that means in practice:
| | Claude API (Direct) | Puter.js |
|---|---|---|
| Cost to developer | Pay per token | Free |
| API key required | Yes | No |
| Billing setup | Credit card required | None |
| Rate limits | Per-organization tiers | Per-user (handled by Puter) |
| Backend required | Yes (to protect your key) | No |
| Models available | Claude only | Claude + GPT + Gemini + 500 more |
Try it now
Add one script tag to your HTML and start using Claude immediately:
<html>
<body>
<script src="https://js.puter.com/v2/"></script>
<script>
puter.ai.chat("Explain quantum computing in simple terms", {
model: "claude-sonnet-4-6"
}).then(response => {
document.body.innerHTML = response.message.content[0].text;
});
</script>
</body>
</html>
No API key. No backend. No billing. You can also use Claude Opus 4.6, Claude Haiku 4.5, and every other Claude model the same way.
You can also stream responses for a better user experience:
<html>
<body>
<div id="output"></div>
<script src="https://js.puter.com/v2/"></script>
<script>
async function streamResponse() {
const response = await puter.ai.chat("Write a short poem about coding", {
model: "claude-sonnet-4-6",
stream: true
});
const output = document.getElementById('output');
for await (const chunk of response) {
if (chunk?.text) {
output.textContent += chunk.text;
}
}
}
streamResponse();
</script>
</body>
</html>
Why developers choose Puter.js over direct API access
- $0 infrastructure cost: Your users pay for their own usage, so your app costs nothing to run regardless of scale
- No API key management: No keys to rotate, no secrets to protect, no backend needed to hide them
- No rate limit headaches: Each user has their own limits, so one user's traffic never blocks another's
- Access every AI provider: Switch between Claude, GPT, Gemini, Grok, DeepSeek, and more with one line of code — no separate accounts or billing for each
- Ship faster: Go from idea to production in minutes, not days of billing setup and backend configuration
Related
Free, Serverless AI and Cloud
Start creating powerful web applications with Puter.js in seconds!
Get Started Now