
Anthropic Claude API Pricing: Full Breakdown of Costs (Jul 2026)

Nariman Jelveh, Reynaldi Chernando

Updated: July 27, 2026


In this article, you'll learn what every current Claude model costs, how the billing works, which features change your bill, the ways to reduce costs, and the options for using Claude models without paying.

How much does the Claude API cost?

Anthropic's most capable model, Claude Fable 5, costs $10 per 1 million input tokens and $50 per 1 million output tokens. Claude Opus 5, the current flagship, costs $5 input and $25 output, Claude Sonnet 5 costs $3 input and $15 output, and Claude Haiku 4.5 costs $1 input and $5 output, all per 1 million tokens.

Model	Position	Input (per 1M tokens)	Output (per 1M tokens)
Claude Fable 5	Most capable model	$10.00	$50.00
Claude Opus 5	Flagship for complex reasoning and coding	$5.00	$25.00
Claude Opus 4.8	Previous Opus flagship	$5.00	$25.00
Claude Sonnet 5	Production default	$3.00	$15.00
Claude Haiku 4.5	High-volume, low-cost tasks	$1.00	$5.00

Claude Opus 5, released July 24, 2026 as the successor to Opus 4.8, keeps the same $5/$25 rates as its predecessor. Claude Sonnet 5, released June 30, 2026 as the successor to Sonnet 4.6, has introductory pricing of $2 input and $10 output per 1 million tokens through August 31, 2026, after which it reverts to the standard $3/$15. The calculations in this guide use the standard rates.

All prices in this guide come from Anthropic's official pricing documentation.

Claude.ai subscriptions (Pro, Max, Team, Enterprise) are consumer and team plans for the chat interface and Claude Code. They do not include API access, and API usage is billed separately through the Claude Console on prepaid credits.

How Claude pricing works

The API bills per token on a pay-as-you-go basis. Input tokens cover everything you send: prompts, system instructions, conversation history, documents, and tool definitions. Output tokens cover everything the model generates. A token is roughly 4 characters or 0.75 English words.
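To turn that rule of thumb into a quick estimate before sending anything, a few lines of JavaScript are enough. This is a minimal sketch that assumes the 4-characters and 0.75-words ratios above. The function names are our own, and real counts come from the model's tokenizer, so use it for rough budgeting only.

```javascript
// Rough token estimates from the rules of thumb above:
// ~4 characters per token, or ~0.75 English words per token.
// These are heuristics, not the model's actual tokenizer.
function estimateTokensFromChars(text) {
  return Math.ceil(text.length / 4);
}

function estimateTokensFromWords(wordCount) {
  return Math.ceil(wordCount / 0.75);
}

const prompt = "Summarize the attached contract in three bullet points.";
console.log(estimateTokensFromChars(prompt)); // rough input-token estimate
console.log(estimateTokensFromWords(1500));   // 2000 tokens for a 1,500-word document
```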

Thinking tokens are billed as output

Claude models can use extended thinking, where the model generates internal reasoning tokens before the final answer. These thinking tokens are billed at the output rate. A response that displays 200 words can bill for several thousand tokens if the model reasoned at length first. Opus 4.8 added effort controls that set how much reasoning the model spends per request, which gives you direct control over this cost.
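The gap between what a response shows and what it bills is easy to see with numbers. The sketch below assumes a hypothetical response with about 200 visible words and 3,000 thinking tokens on Opus 5. The thinking-token figure is an illustration, not a measured value.

```javascript
// Output-side cost of one response when extended thinking is on.
// Thinking tokens bill at the output rate, the same as the visible answer.
function outputCostUsd(visibleTokens, thinkingTokens, outputRatePerMillion) {
  return ((visibleTokens + thinkingTokens) / 1_000_000) * outputRatePerMillion;
}

// Hypothetical response: ~200 visible words (~267 tokens at 0.75 words/token)
// after 3,000 thinking tokens, on Opus 5 at $25 per 1M output tokens.
const visible = Math.ceil(200 / 0.75); // 267
const withThinking = outputCostUsd(visible, 3000, 25);
const answerOnly = outputCostUsd(visible, 0, 25);
console.log(withThinking.toFixed(4)); // 0.0817
console.log(answerOnly.toFixed(4));   // 0.0067
```

Under those assumptions, thinking accounts for more than 90% of the output charge for this response.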

Prompt caching has a write cost, not just a read discount

Cache reads cost 0.1x the base input rate (90% off). Writing content to the cache costs extra: 1.25x base input for a 5-minute cache, or 2x for a 1-hour cache. On Sonnet 5, that means $3.75 or $6.00 per million tokens to write, then $0.30 per million to read.

The arithmetic: a 5-minute cache write pays for itself after one cache read, and a 1-hour write pays for itself after two reads. Content that gets cached but never read again costs more than not caching it. Cache prompts that repeat across requests; don't add cache_control to one-off content.
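The break-even counts above follow from comparing one write plus N reads against N + 1 uncached sends, all in multiples of the base input rate. Here is a minimal sketch of that comparison, assuming the 1.25x, 2x, and 0.1x multipliers from this section. The function name is illustrative.

```javascript
// Number of cache reads after which caching a prompt segment costs less
// than sending it uncached every time. Costs are in multiples of the base
// input rate: a write costs writeMultiplier, each read costs readMultiplier.
function breakEvenReads(writeMultiplier, readMultiplier = 0.1) {
  for (let reads = 0; reads < 100; reads++) {
    const cached = writeMultiplier + reads * readMultiplier; // 1 write + N reads
    const uncached = (reads + 1) * 1.0;                     // N + 1 full-price sends
    if (cached < uncached) return reads;
  }
  return Infinity; // caching never pays off within 100 reads
}

console.log(breakEvenReads(1.25)); // 5-minute cache: 1
console.log(breakEvenReads(2.0));  // 1-hour cache: 2
```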

The tokenizer changed in Opus 4.7

Opus 4.7 and later models (including Opus 4.8, Opus 5, and Fable 5) use a new tokenizer that can produce up to 35% more tokens for the same text compared to earlier models. The per-token rate is unchanged, but the same prompt can bill for more tokens after a model upgrade. If you migrated from Opus 4.6 or earlier and your bills rose without a usage change, this is a likely cause. Recalculate token estimates after any model migration rather than reusing old counts.

What raises your bill

Fast mode

Fast mode provides faster output for Opus models at higher rates: $10/$50 per million tokens on Opus 5 and Opus 4.8, and $30/$150 on Opus 4.6 and 4.7. It applies across the full context window and cannot be combined with the Batch API. Use it for latency-sensitive workloads; leave it off otherwise.

Tool use overhead

Enabling tools adds a fixed system prompt to every request: 290 tokens on Opus 4.8 (410 with forced tool choice), and roughly 500-590 tokens on Sonnet 4.6 and Haiku 4.5. Tool definitions, tool calls, and tool results all bill as tokens on top of that. Specific tools add their own input tokens per request: the bash tool adds 245 tokens, the text editor 700, and computer use 735 plus screenshot image tokens.
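The fixed overhead is small per request but scales with volume. The following sketch assumes a hypothetical 100,000 requests per month on Opus 4.8 with the bash tool enabled. It counts only the fixed system prompt and bash tool tokens. Tool definitions, calls, and results would add more.

```javascript
// Fixed per-request input overhead from enabling tools, using the figures
// above for Opus 4.8. Tool definitions, calls, and results are extra and
// are NOT included here.
const OPUS_48_TOOL_SYSTEM_PROMPT = 290; // 410 with forced tool choice
const BASH_TOOL_TOKENS = 245;

function monthlyOverheadUsd(requests, overheadTokens, inputRatePerMillion) {
  return ((requests * overheadTokens) / 1_000_000) * inputRatePerMillion;
}

// Hypothetical workload: 100,000 requests/month with the bash tool on Opus 4.8 ($5/M input).
const overhead = OPUS_48_TOOL_SYSTEM_PROMPT + BASH_TOOL_TOKENS; // 535 tokens
console.log(monthlyOverheadUsd(100_000, overhead, 5).toFixed(2)); // 267.50
```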

Server-side tool fees

Web search costs $10 per 1,000 searches, and the retrieved content is billed as input tokens in that turn and in every later turn of the conversation. Failed searches are not billed. Web fetch has no per-call fee: you pay only the token costs for the fetched content, and the max_content_tokens parameter caps how much a single fetch can add. Code execution is free when used together with web search or web fetch. Used on its own, it bills by container time: each organization gets 1,550 free hours per month, then pays $0.05 per hour per container, with a 5-minute minimum per execution. Requests that attach files are billed for execution time even if the tool is never invoked, because the files are preloaded onto the container.
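The per-call and per-hour fees can be estimated separately from token costs. This sketch assumes the rates above and a hypothetical workload. It covers only standalone code execution (not the free pairing with web search or fetch) and ignores the token cost of retrieved content.

```javascript
// Web search fee: $10 per 1,000 searches (failed searches are not billed,
// so pass only successful ones). Token costs for results are separate.
function webSearchFeeUsd(searches) {
  return (searches / 1000) * 10;
}

// Standalone code execution: 1,550 free container-hours per organization per
// month, then $0.05 per container-hour, with a 5-minute minimum per execution.
function codeExecutionFeeUsd(executionMinutes) {
  const billedMinutes = executionMinutes.reduce((sum, m) => sum + Math.max(m, 5), 0);
  const billedHours = billedMinutes / 60;
  return Math.max(0, billedHours - 1550) * 0.05;
}

console.log(webSearchFeeUsd(25_000)); // 250
// Hypothetical: 30,000 executions of ~4 minutes each, each billed at the 5-minute minimum.
const runs = new Array(30_000).fill(4);
console.log(codeExecutionFeeUsd(runs).toFixed(2)); // 47.50
```

In this example the 5-minute minimum matters: 4-minute runs bill as 2,500 hours instead of 2,000, which pushes usage past the free allowance.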

US-only data residency

Setting inference_geo to "us" applies a 1.1x multiplier to all token categories on Opus 4.6, Sonnet 4.6, and later models. The default global routing uses standard pricing.

Long context does not cost extra

Fable 5, Opus 4.6 through Opus 5, Sonnet 4.6, and Sonnet 5 include the full 1M-token context window at standard per-token rates. There is no separate long-context price schedule. A 900K-token request bills at the same rate per token as a 9K-token request. The cost control at long context is caching: a full 1M-token prompt costs $10 in input alone on Fable 5, or $1 per subsequent request as a cache read.
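To see how much caching matters at this scale, the sketch below compares sending the same 1M-token prompt to Fable 5 ten times with and without caching. It assumes every request lands inside the cache window, counts input only, and uses rates from the pricing table in this guide.

```javascript
// Fable 5 input-side rates in USD per 1M tokens.
const FABLE_5 = { input: 10.0, write5m: 12.5, write1h: 20.0, cacheRead: 1.0 };

// Input cost of sending the same prompt `requests` times.
// Assumes all requests fall inside one cache window.
function repeatedPromptCostUsd(promptTokens, requests, rates, writeRate) {
  const m = promptTokens / 1_000_000;
  const uncached = requests * m * rates.input;
  // First request writes the cache; every later request reads it.
  const cached = m * writeRate + (requests - 1) * m * rates.cacheRead;
  return { uncached, cached };
}

console.log(repeatedPromptCostUsd(1_000_000, 10, FABLE_5, FABLE_5.write5m));
// { uncached: 100, cached: 21.5 }
console.log(repeatedPromptCostUsd(1_000_000, 10, FABLE_5, FABLE_5.write1h));
// { uncached: 100, cached: 29 }
```

Even with the 1-hour write premium, ten requests cost less than a third of the uncached total.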

How to reduce Claude API costs

In order of impact:

1. Route tasks to the right model

The lineup spans 10x on input ($1 to $10) and 10x on output ($5 to $50). Use Haiku 4.5 for classification, extraction, routing, and other high-volume simple tasks. Use Sonnet 5 as the production default. Reserve Opus 5 and Fable 5 for the tasks that need them. Routing the bulk of traffic to Haiku and escalating hard cases reduces spend more than any other single change.
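A router can be as simple as a lookup on task type with an escalation flag. This is a hypothetical sketch. The task categories, the `hard` flag, and the model labels (which are not API model IDs) are illustrative, and production routers often decide escalation with a classifier or a confidence score instead.

```javascript
// Standard rates in USD per 1M tokens. Keys are illustrative labels, not API model IDs.
const RATES = {
  "haiku-4.5": { input: 1, output: 5 },
  "sonnet-5": { input: 3, output: 15 },
  "opus-5": { input: 5, output: 25 },
};

// Escalate hard cases first, send simple high-volume work to Haiku,
// and default everything else to Sonnet.
function pickModel(task) {
  if (task.hard) return "opus-5";
  if (["classification", "extraction", "routing"].includes(task.type)) return "haiku-4.5";
  return "sonnet-5";
}

function costUsd(model, inputTokens, outputTokens) {
  const r = RATES[model];
  return (inputTokens / 1_000_000) * r.input + (outputTokens / 1_000_000) * r.output;
}

const task = { type: "extraction", inputTokens: 1200, outputTokens: 250 };
const model = pickModel(task);
console.log(model, costUsd(model, task.inputTokens, task.outputTokens));
```

Checking the escalation flag before the task type means a hard extraction goes to Opus rather than Haiku, which matches the "escalate hard cases" pattern above.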

2. Cache repeated prompt content

Cache reads cost 10% of standard input. Put static content (system prompt, examples, shared documents) at the start of the prompt and variable content at the end. Use the 5-minute cache for active conversations and the 1-hour cache for content reused across longer gaps, accounting for the 1.25x and 2x write costs. Caching stacks with the Batch API discount and the data residency multiplier.
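When modifiers stack, the effective rate is the base rate times each applicable multiplier. This sketch assumes the modifiers combine multiplicatively, which matches how this guide describes stacking. The option names are our own, and the US-only multiplier applies only to the models listed in the data residency section.

```javascript
// Cache multipliers relative to the base input rate.
const CACHE_MULTIPLIER = { none: 1.0, write5m: 1.25, write1h: 2.0, read: 0.1 };

// Effective input rate per 1M tokens, assuming modifiers multiply together.
function effectiveInputRate(baseRate, { cache = "none", batch = false, usOnly = false } = {}) {
  let rate = baseRate * CACHE_MULTIPLIER[cache];
  if (batch) rate *= 0.5;  // Batch API: 50% off
  if (usOnly) rate *= 1.1; // US-only inference: 1.1x
  return rate;
}

// Sonnet 5 ($3/M input): a batched cache read with US-only inference.
console.log(effectiveInputRate(3, { cache: "read", batch: true, usOnly: true }).toFixed(3)); // 0.165
```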

3. Use the Batch API for non-urgent work

The Batch API processes requests asynchronously, typically within 24 hours, at 50% off both input and output. Haiku 4.5 drops to $0.50/$2.50 per million tokens, Sonnet 5 to $1.50/$7.50, Opus 5 to $2.50/$12.50, and Fable 5 to $5/$25. Combined with caching, batch workloads with repeated context can run at a small fraction of list price.

4. Control thinking and output length

Set effort levels on Opus 5 and Opus 4.8, limit extended thinking budgets, and cap max output tokens. At a 1:5 price ratio, output tokens are the expensive side, and thinking tokens count as output.
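Capping output turns an open-ended cost into a known ceiling per request. The sketch below computes that ceiling, assuming the cap also bounds thinking tokens. `maxOutputTokens` is a hypothetical parameter name here, not a specific API field, and the prompt size is illustrative.

```javascript
// Worst-case cost of a single request when output is capped. Assumes the
// output cap also bounds thinking tokens, since both bill at the output rate.
function worstCaseRequestUsd(inputTokens, maxOutputTokens, rates) {
  return (inputTokens / 1_000_000) * rates.input + (maxOutputTokens / 1_000_000) * rates.output;
}

const SONNET_5 = { input: 3, output: 15 };
// Hypothetical: 2,000-token prompt with output capped at 4,000 vs 16,000 tokens.
console.log(worstCaseRequestUsd(2000, 4000, SONNET_5).toFixed(3));  // 0.066
console.log(worstCaseRequestUsd(2000, 16000, SONNET_5).toFixed(3)); // 0.246
```

Almost all of the ceiling comes from the output side, which is why the cap matters more than trimming the prompt.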

5. For individual coding use, compare against a subscription

Claude Pro and Max subscriptions include Claude Code usage within plan limits. For a single developer's coding workload, a flat-rate subscription can cost less than the equivalent API billing. The API remains the right choice for applications, pipelines, and anything programmatic. If your only use is interactive coding assistance, price both options before defaulting to the API.

Can you use the Claude API for free?

Anthropic does not offer a permanent free API tier. There is a one-time credit for new accounts, two Anthropic programs that grant free access, and one method that removes the developer's bill entirely.

Puter.js: the User-Pays model

Puter.js is a JavaScript library that lets you add Claude models to your app with no API key, no backend, and no bill to you as the developer. It uses the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your cost is zero at any number of users.

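<!DOCTYPE html>
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // No API key: each user's Puter account covers their own usage.
    puter.ai.chat(
      "Explain quantum computing in simple terms.",
      { model: "claude-sonnet-5" }
    ).then(response => {
      // Claude replies arrive as an array of content blocks; show the first block's text.
      // textContent (rather than innerHTML) keeps model output from being parsed as HTML.
      document.body.textContent = response.message.content[0].text;
    });
  </script>
</body>
</html>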

We calculated the numbers for a working app: 500 monthly users sending 30 messages each, at 1,000 input and 300 output tokens per message, consume 15M input and 4.5M output tokens per month. On Sonnet 5 through the Anthropic API, that is $45.00 for input and $67.50 for output, $112.50 per month, scaling linearly with users. Through Puter.js the same app costs the developer $0 at 500 users and $0 at 50,000 users.
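The linear scaling is easy to check by parameterizing the calculation by user count. Here is a sketch using the usage assumptions from the paragraph above and standard Sonnet 5 rates. It models only the developer's Anthropic API bill.

```javascript
// Monthly Anthropic API bill for the example app, as a function of users.
// Defaults are the per-user usage figures above and standard Sonnet 5 rates.
function monthlyApiBillUsd(users, {
  messagesPerUser = 30, inputPerMsg = 1000, outputPerMsg = 300,
  inputRate = 3, outputRate = 15,
} = {}) {
  const messages = users * messagesPerUser;
  const inputCost = ((messages * inputPerMsg) / 1_000_000) * inputRate;
  const outputCost = ((messages * outputPerMsg) / 1_000_000) * outputRate;
  return inputCost + outputCost;
}

for (const users of [500, 5_000, 50_000]) {
  console.log(users, monthlyApiBillUsd(users).toFixed(2));
}
// 500 112.50
// 5000 1125.00
// 50000 11250.00
```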

One-time signup credit

New API accounts receive a small one-time credit to test the API. The amount is subject to change, and once it is spent there is no recurring free allowance. Check the billing page in the Claude Console after signup.

Anthropic for Startups

Anthropic's startup program provides free API credits, priority rate limits, and founder resources. Per the program page, eligibility for credits requires equity funding from an institutional investor, a founding date within the last four years, and no previously received Anthropic startup credits. Startups backed by Anthropic's VC partners may qualify for additional benefits. Apply at claude.com/programs/startups.

Claude for Open Source

Anthropic runs a program for open-source maintainers that grants free Claude access for qualifying projects. Eligibility is based on project reach, and the grant is structured as subscription access rather than raw API credits, so check the current program terms against your use case before applying.

A related note: Claude models are also available through AWS Bedrock and Google Vertex AI, so startup credits from those cloud programs can pay for Claude usage even though they are not Anthropic credits.

Real-world cost examples

For each example below, we calculated cost as tokens divided by 1 million, times the rate, separately for input and output.

Customer support chatbot. We modeled 1,000 conversations per month, 8 messages each, at 1,200 input tokens (system prompt plus history) and 250 output tokens per message, which works out to 9.6M input and 2M output tokens monthly.

Model	Monthly cost
Haiku 4.5	$19.60
Sonnet 5	$58.80
Opus 5	$98.00
Fable 5	$196.00
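The table follows directly from the formula at the top of this section. Here is a short sketch that reproduces it, using the standard rates from this guide and the workload assumptions above.

```javascript
// Standard rates in USD per 1M tokens.
const MODELS = {
  "Haiku 4.5": { input: 1, output: 5 },
  "Sonnet 5": { input: 3, output: 15 },
  "Opus 5": { input: 5, output: 25 },
  "Fable 5": { input: 10, output: 50 },
};

// cost = tokens / 1M x rate, input and output separately.
function monthlyCostUsd(inputTokens, outputTokens, rates) {
  return (inputTokens / 1_000_000) * rates.input + (outputTokens / 1_000_000) * rates.output;
}

// 1,000 conversations x 8 messages, 1,200 input and 250 output tokens per message.
const messages = 1000 * 8;
const inputTokens = messages * 1200; // 9.6M
const outputTokens = messages * 250; // 2M
for (const [name, rates] of Object.entries(MODELS)) {
  console.log(name, monthlyCostUsd(inputTokens, outputTokens, rates).toFixed(2));
}
// Haiku 4.5 19.60
// Sonnet 5 58.80
// Opus 5 98.00
// Fable 5 196.00
```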

With caching on Sonnet 5, where we estimate 70% of input tokens are cache reads of the repeated system prompt and history: 6.72M cache-read tokens at $0.30 per million = $2.02, plus 2.88M standard input tokens at $3.00 per million = $8.64, plus $30.00 for output, for about $41 per month. The first request in each cache window also pays the 1.25x write rate on the cached portion, which adds a small amount on top.
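The cached estimate splits input into cache reads and standard input. This sketch reproduces that split with the 70% cached share assumed above. Like the estimate in the text, it leaves out the cache write premium, so the real figure lands slightly higher.

```javascript
// Sonnet 5 rates in USD per 1M tokens.
const SONNET_5 = { input: 3, cacheRead: 0.3, output: 15 };

// Monthly cost when `cachedShare` of input tokens are cache reads.
// Cache write premiums are not included.
function cachedMonthlyUsd(inputTokens, outputTokens, cachedShare) {
  const readTokens = inputTokens * cachedShare;
  const standardTokens = inputTokens - readTokens;
  return (readTokens / 1e6) * SONNET_5.cacheRead
    + (standardTokens / 1e6) * SONNET_5.input
    + (outputTokens / 1e6) * SONNET_5.output;
}

console.log(cachedMonthlyUsd(9_600_000, 2_000_000, 0.7).toFixed(2)); // 40.66
console.log(cachedMonthlyUsd(9_600_000, 2_000_000, 0).toFixed(2));   // 58.80, no caching
```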

Summarizing 100 PDFs. At 20,000 tokens per document and 500-token summaries: 2M input, 50K output. Our calculation puts this at $6.75 on Sonnet 5, $2.25 on Haiku 4.5, and $1.13 on Haiku through the Batch API.

Monthly content generation. 30 articles with 2,000-token prompts and 4,000 output tokens each, thinking included: 60K input and 120K output. By our calculation, that is about $1.98 per month on Sonnet 5.
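Both smaller examples use the same formula, with the Batch API halving input and output rates. Here is a sketch that checks the figures above.

```javascript
// cost = (input / 1M x input rate + output / 1M x output rate),
// halved when the request goes through the Batch API.
function costUsd(inputTokens, outputTokens, inRate, outRate, batch = false) {
  const factor = batch ? 0.5 : 1;
  return ((inputTokens / 1e6) * inRate + (outputTokens / 1e6) * outRate) * factor;
}

// 100 PDFs: 2M input, 50K output.
console.log(costUsd(2_000_000, 50_000, 3, 15).toFixed(2));      // Sonnet 5: 6.75
console.log(costUsd(2_000_000, 50_000, 1, 5).toFixed(2));       // Haiku 4.5: 2.25
console.log(costUsd(2_000_000, 50_000, 1, 5, true).toFixed(3)); // Haiku 4.5 batch: 1.125, about $1.13
// 30 articles: 60K input, 120K output (thinking included).
console.log(costUsd(60_000, 120_000, 3, 15).toFixed(2));        // Sonnet 5: 1.98
```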

Complete Claude API pricing table

All prices in USD per 1 million tokens, standard API rates.

Current models

Model	Input	5m cache write	1h cache write	Cache read	Output
Claude Fable 5	$10.00	$12.50	$20.00	$1.00	$50.00
Claude Opus 5	$5.00	$6.25	$10.00	$0.50	$25.00
Claude Opus 4.8	$5.00	$6.25	$10.00	$0.50	$25.00
Claude Opus 4.7	$5.00	$6.25	$10.00	$0.50	$25.00
Claude Opus 4.6	$5.00	$6.25	$10.00	$0.50	$25.00
Claude Sonnet 5	$3.00	$3.75	$6.00	$0.30	$15.00
Claude Sonnet 4.6	$3.00	$3.75	$6.00	$0.30	$15.00
Claude Sonnet 4.5	$3.00	$3.75	$6.00	$0.30	$15.00
Claude Haiku 4.5	$1.00	$1.25	$2.00	$0.10	$5.00

Batch API (50% off input and output)

Model	Batch input	Batch output
Claude Fable 5	$5.00	$25.00
Claude Opus 5	$2.50	$12.50
Claude Opus 4.8	$2.50	$12.50
Claude Sonnet 5	$1.50	$7.50
Claude Haiku 4.5	$0.50	$2.50

Fast mode (Opus only, research preview)

Model	Input	Output
Claude Opus 5	$10.00	$50.00
Claude Opus 4.8	$10.00	$50.00
Claude Opus 4.6 / 4.7	$30.00	$150.00

Tools and modifiers

Item	Price
Web search	$10.00 / 1K searches, plus content token costs
Web fetch	No fee; token costs only
Code execution	Free with web search or web fetch; otherwise 1,550 free hours/month, then $0.05 / container-hour (5-minute minimum)
Tool use system prompt	290–590 tokens per request depending on model
US-only inference (inference_geo)	1.1x on all token categories
Long context (up to 1M tokens)	Standard rates, no surcharge

Deprecated models (Opus 4.1 and earlier, Sonnet 4, Haiku 3.5) have their own rates listed in Anthropic's pricing documentation. Claude is also available through AWS Bedrock and Google Vertex AI, where the cloud provider bills you and regional endpoints carry a 10% premium over global endpoints.

Conclusion

Claude API pricing in 2026 runs from $1/$5 per million tokens on Haiku 4.5 to $10/$50 on Fable 5, with a 1:5 input-to-output ratio across the lineup and no long-context surcharge up to 1M tokens. The main cost decisions are model routing, caching content that actually repeats, batching non-urgent work for 50% off, and controlling thinking and output length.

Prices were verified against Anthropic's official pricing documentation. Confirm current rates at claude.com/pricing before committing to a budget.
