OpenAI API Pricing: Full Breakdown of Costs (Jun 2026)

Nariman Jelveh, Reynaldi Chernando

Updated: June 12, 2026

In this guide, you'll learn what every current OpenAI model costs, how the billing works (including the things that can make your bill higher than expected), the most effective ways to cut costs, and the options for using OpenAI models for free.

How much does the OpenAI API cost?

OpenAI's flagship model, GPT-5.5, costs $5.00 per 1 million input tokens and $30.00 per 1 million output tokens. The cheapest model in the current lineup, GPT-5.4 nano, costs $0.20 per 1 million input tokens and $1.25 per 1 million output tokens, roughly 25 times cheaper than the flagship.

Here's the quick view of the models most people ask about:

Model	Best for	Input (per 1M tokens)	Output (per 1M tokens)
GPT-5.5	Best overall: complex coding and professional work	$5.00	$30.00
GPT-5.5 Pro	Maximum reasoning quality, price is secondary	$30.00	$180.00
GPT-5.4	Balanced production workloads	$2.50	$15.00
GPT-5.4 mini	Fast, cheap, still capable; great for subagents	$0.75	$4.50
GPT-5.4 nano	High-volume simple tasks: routing, extraction, classification	$0.20	$1.25
GPT-5.3 Codex	Specialized coding and agentic dev work	$1.75	$14.00

All prices in this guide come from OpenAI's official pricing and model documentation. Next, let's look at how the pricing works.

How OpenAI API pricing works

The OpenAI API uses pay-as-you-go, per-token billing: you're charged for the tokens you send to the model (input) and the tokens it generates (output). There are no subscriptions or seat fees for the API itself. Note that API usage is billed separately from ChatGPT Plus or Business subscriptions, which don't include any API credit.

What's a token?

A token is a chunk of text, usually a word, part of a word, or a punctuation mark. As a rule of thumb, 1,000 tokens is about 750 English words, so 1 million tokens is roughly the length of 10 novels. When you see "$5.00 per 1M input tokens," that means sending 10 novels' worth of text to GPT-5.5 costs five dollars.
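The rule of thumb above can be turned into a rough estimator. This is a minimal sketch: the 0.75 words-per-token ratio comes from the paragraph above, the function names are made up for illustration, and real counts depend on the tokenizer and the language of the text.

```javascript
// Rule of thumb from the text: 1,000 tokens is about 750 English words.
// Treat the result as an estimate, not an exact token count.
const WORDS_PER_TOKEN = 0.75;

// Convert a word count into an approximate token count.
function estimateTokens(wordCount) {
  return Math.round(wordCount / WORDS_PER_TOKEN);
}

// Approximate input cost in dollars for a given word count and per-1M-token price.
function estimateInputCost(wordCount, pricePerMillionTokens) {
  return (estimateTokens(wordCount) / 1_000_000) * pricePerMillionTokens;
}

console.log(estimateTokens(750));
console.log(estimateInputCost(750_000, 5.0));
```

With these numbers, 750,000 words maps to about 1 million tokens, which is the five-dollar case described above.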

Input vs. output tokens

Output tokens cost significantly more than input tokens: 6x more for GPT-5.5 ($5 in, $30 out). This is true across the lineup, and it means token-heavy generation (long articles, verbose code) costs more than token-heavy reading (analyzing documents). When estimating costs, always calculate input and output separately.
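As a rough illustration of calculating input and output separately, the sketch below prices a request from its two token counts. The prices are copied from the table in this guide; the model keys and the function name are illustrative labels, not official identifiers.

```javascript
// Standard-tier prices per 1M tokens, copied from the table in this guide.
// The keys are illustrative labels, not guaranteed API model IDs.
const PRICES = {
  "gpt-5.5": { input: 5.0, output: 30.0 },
  "gpt-5.4": { input: 2.5, output: 15.0 },
  "gpt-5.4-mini": { input: 0.75, output: 4.5 },
  "gpt-5.4-nano": { input: 0.2, output: 1.25 },
};

// Price input and output separately, then add them up.
function estimateCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  const inputCost = (inputTokens / 1_000_000) * price.input;
  const outputCost = (outputTokens / 1_000_000) * price.output;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Reading-heavy request: 100K tokens in, 1K tokens out.
console.log(estimateCost("gpt-5.5", 100_000, 1_000));
// Writing-heavy request: 1K tokens in, 100K tokens out.
console.log(estimateCost("gpt-5.5", 1_000, 100_000));
```

The two calls move the same number of tokens, but the reading-heavy request comes to roughly $0.53 while the writing-heavy one comes to roughly $3.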

Reasoning tokens are billed as output

This is the most common reason a bill comes in higher than expected: on reasoning-enabled models like GPT-5.5, the model generates internal "thinking" tokens before producing its answer, and those reasoning tokens are billed at the output rate even though you never see them.

A 200-word response might use several thousand reasoning tokens that you don't see. For complex prompts, the reasoning tokens can cost more than the answer itself. You can control this with the reasoning effort setting: lower effort means fewer reasoning tokens and a cheaper, faster response; higher effort means better answers on hard problems at a higher cost. For simple tasks, turning reasoning effort down (or using a non-reasoning model) is an immediate saving.
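To see how much of an output bill is hidden reasoning, you can split the cost using the usage data returned with a response. The sketch below assumes a usage object shaped like the one in OpenAI's Responses API (output_tokens including reasoning, with the reasoning share under output_tokens_details.reasoning_tokens); the token counts are hypothetical, so check the field names against the response you actually receive.

```javascript
// Assumed usage shape, modeled on the Responses API usage object:
// output_tokens includes reasoning tokens, and
// output_tokens_details.reasoning_tokens reports the hidden share.
function splitOutputCost(usage, outputPricePerMillion) {
  const reasoning = usage.output_tokens_details?.reasoning_tokens ?? 0;
  const visible = usage.output_tokens - reasoning;
  const perToken = outputPricePerMillion / 1_000_000;
  return {
    visibleCost: visible * perToken,
    reasoningCost: reasoning * perToken,
    totalOutputCost: usage.output_tokens * perToken,
  };
}

// Hypothetical numbers: a ~200-word answer (~270 tokens) plus 3,000 reasoning tokens.
const exampleUsage = {
  input_tokens: 500,
  output_tokens: 3270,
  output_tokens_details: { reasoning_tokens: 3000 },
};

// Priced at the GPT-5.5 output rate of $30.00 per 1M tokens.
console.log(splitOutputCost(exampleUsage, 30.0));
```

In this made-up case the reasoning tokens cost about eleven times as much as the visible answer, which is the effect described above.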

What makes your bill higher than expected

Beyond the base token rates, three things commonly push costs above estimates.

Long context costs more per token

OpenAI's published rates apply to requests under roughly 270K tokens of context. Go above that, and the entire request moves to a long-context price schedule: GPT-5.5 jumps from $5.00 to $10.00 per 1M input tokens (2x) and from $30.00 to $45.00 per 1M output tokens (1.5x). If you put entire codebases or books into a single request, you pay double the input rate and 1.5x the output rate on that whole request, not just on the tokens past the threshold.
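The jump can be sketched as a simple schedule switch. This example uses the GPT-5.5 rates quoted above and treats the threshold as exactly 270,000 input tokens, which is an assumption: the text only gives an approximate figure, and the exact cutoff and how it is measured should be confirmed in OpenAI's documentation.

```javascript
// Approximate threshold from the text; the exact cutoff is an assumption.
const LONG_CONTEXT_THRESHOLD = 270_000;

// GPT-5.5 prices per 1M tokens for both schedules, as quoted in this guide.
const GPT_5_5 = {
  short: { input: 5.0, output: 30.0 },
  long: { input: 10.0, output: 45.0 },
};

// The whole request is priced on one schedule, chosen by context size.
function requestCost(prices, inputTokens, outputTokens) {
  const schedule =
    inputTokens > LONG_CONTEXT_THRESHOLD ? prices.long : prices.short;
  return (
    (inputTokens / 1_000_000) * schedule.input +
    (outputTokens / 1_000_000) * schedule.output
  );
}

// Just under the threshold versus just over it, same 2K-token answer.
console.log(requestCost(GPT_5_5, 250_000, 2_000));
console.log(requestCost(GPT_5_5, 300_000, 2_000));
```

Adding 50K tokens of context here takes the request from about $1.31 to about $3.09, so splitting a very large input into smaller requests can be cheaper when the task allows it.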

Image input is billed as tokens

Sending an image to a text model isn't free: the image is converted into tokens and billed at that model's normal token rates (a typical low-resolution image is a few hundred tokens). A request with several images can cost noticeably more than a text-only one.

Built-in tools have their own fees

Using OpenAI's hosted tools adds charges on top of token costs:

Web search: $10.00 per 1,000 calls, plus the retrieved search content is billed as input tokens at your model's rate.
Code execution (containers): $0.03 per GB per 20-minute session, so a 64GB container runs $1.92 per session.
File search: $2.50 per 1,000 tool calls, plus $0.10 per GB per day of vector storage (the first GB is free).

A single "simple" request that triggers web search and a code run can cost several times more than the same request without tools. The function calls you define yourself (your own tools) carry no extra fee beyond the tokens they add to the conversation.
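A small calculator makes these tool fees easier to budget. This is a sketch using only the rates listed above; the function and field names are invented for illustration, and it leaves out the token charges for retrieved content, which are billed separately at your model's rate.

```javascript
// Tool fees as listed in this guide.
const TOOL_FEES = {
  webSearchPerCall: 10.0 / 1000, // $10.00 per 1,000 calls
  fileSearchPerCall: 2.5 / 1000, // $2.50 per 1,000 calls
  containerPerGbSession: 0.03, // per GB per 20-minute session
  vectorStoragePerGbDay: 0.1, // per GB per day
  freeStorageGb: 1, // first GB of vector storage is free
};

// Sum the tool fees for a period. Token costs are not included.
function toolFees({
  webSearchCalls = 0,
  fileSearchCalls = 0,
  containerGb = 0,
  containerSessions = 0,
  storageGb = 0,
  storageDays = 0,
} = {}) {
  const billableGb = Math.max(0, storageGb - TOOL_FEES.freeStorageGb);
  return (
    webSearchCalls * TOOL_FEES.webSearchPerCall +
    fileSearchCalls * TOOL_FEES.fileSearchPerCall +
    containerGb * containerSessions * TOOL_FEES.containerPerGbSession +
    billableGb * storageDays * TOOL_FEES.vectorStoragePerGbDay
  );
}

// One 64GB container session, matching the example in the list above.
console.log(toolFees({ containerGb: 64, containerSessions: 1 }));
// 5,000 web searches plus 3GB of vector storage held for 30 days.
console.log(toolFees({ webSearchCalls: 5000, storageGb: 3, storageDays: 30 }));
```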

How to reduce OpenAI API costs

The methods below are ordered by impact, with the most effective first.

1. Pick the right model for the job

Model choice is the biggest cost factor in the API: the gap between GPT-5.5 and GPT-5.4 nano is 25x on input and 24x on output. No caching strategy or batch discount comes close to that.

A simple decision guide:

Routing, extraction, classification, formatting → GPT-5.4 nano. These tasks don't need frontier intelligence, and nano handles them at $0.20/$1.25.
Production chat, summarization, agent subtasks → GPT-5.4 mini or GPT-5.4. Strong quality at a fraction of flagship cost.
Complex reasoning, hard coding problems, high-stakes output → GPT-5.5. Use flagship rates only where flagship quality is needed.
The hardest problems where quality matters most → GPT-5.5 Pro, at $30/$180.

Most production systems route 70–90% of traffic to a small model and escalate only the hard cases. If you do nothing else from this list, do this.
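The decision guide above can be sketched as a lookup table plus an escalation step. The task names, the default choice, and the model ID strings are all assumptions made for illustration; substitute the identifiers your provider documents and your own rules for when an answer counts as not good enough.

```javascript
// Task-to-model map following the decision guide above.
// Task names and model ID strings are illustrative assumptions.
const MODEL_BY_TASK = {
  routing: "gpt-5.4-nano",
  extraction: "gpt-5.4-nano",
  classification: "gpt-5.4-nano",
  formatting: "gpt-5.4-nano",
  chat: "gpt-5.4-mini",
  summarization: "gpt-5.4-mini",
  "agent-subtask": "gpt-5.4-mini",
  "complex-reasoning": "gpt-5.5",
  "hard-coding": "gpt-5.5",
};

// Cheapest to most expensive, used when a cheaper model's answer is rejected.
const ESCALATION_LADDER = [
  "gpt-5.4-nano",
  "gpt-5.4-mini",
  "gpt-5.4",
  "gpt-5.5",
  "gpt-5.5-pro",
];

// Unknown task types default to a mid-priced model rather than the flagship.
function pickModel(taskType) {
  return Object.hasOwn(MODEL_BY_TASK, taskType)
    ? MODEL_BY_TASK[taskType]
    : "gpt-5.4-mini";
}

// Move one step up the ladder; stay put at the top or for unknown models.
function escalate(model) {
  const i = ESCALATION_LADDER.indexOf(model);
  if (i === -1 || i === ESCALATION_LADDER.length - 1) return model;
  return ESCALATION_LADDER[i + 1];
}

console.log(pickModel("classification"));
console.log(escalate(pickModel("chat")));
```

A caller would start with pickModel, validate the answer, and call escalate only when validation fails, so the expensive models see just the hard cases.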

2. Use prompt caching for repeated input

Cached input tokens cost 90% less than fresh ones: GPT-5.5 input drops from $5.00 to $0.50 per 1M tokens when cached. Prompt caching kicks in automatically when the beginning of your prompt repeats across requests, which is exactly what happens with system prompts, few-shot examples, and shared documents. To benefit, put the static parts of your prompt first and the variable parts (the user's message) last. For chatbots with long system prompts, this alone routinely cuts input costs by more than half.
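Here is a minimal sketch of both halves of that advice: ordering the prompt so the static prefix comes first, and estimating the saving once part of the input is served from cache. The helper names and the 80% cached share in the example are assumptions; the actual cache hit rate depends on your traffic and on OpenAI's caching rules.

```javascript
// Static content first (system prompt, few-shot examples), variable content last,
// so consecutive requests share the longest possible identical prefix.
function buildMessages(systemPrompt, fewShotExamples, userMessage) {
  return [
    { role: "system", content: systemPrompt },
    ...fewShotExamples,
    { role: "user", content: userMessage },
  ];
}

// Input cost in dollars when some of the input tokens are billed at the cached rate.
function inputCostWithCaching(inputTokens, cachedTokens, price) {
  const freshTokens = inputTokens - cachedTokens;
  return (
    (freshTokens / 1_000_000) * price.input +
    (cachedTokens / 1_000_000) * price.cachedInput
  );
}

// GPT-5.5 input rates from this guide.
const gpt55 = { input: 5.0, cachedInput: 0.5 };

// 10M input tokens a month, with and without an assumed 80% cache hit share.
console.log(inputCostWithCaching(10_000_000, 0, gpt55));
console.log(inputCostWithCaching(10_000_000, 8_000_000, gpt55));
```

Under that assumption the monthly input bill falls from $50 to about $14.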

3. Use the Batch API for anything that can wait

The Batch API gives a flat 50% discount on both input and output tokens in exchange for asynchronous processing: you submit a file of requests and get results back within 24 hours. It's ideal for data pipelines, evaluations, bulk summarization, and content generation jobs where nobody is waiting on the response.
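The request file is a JSONL document with one request per line. The sketch below builds one, following the line format described in OpenAI's Batch API documentation at the time of writing (custom_id, method, url, body); the helper name and the model ID are placeholders, and uploading the file and creating the batch job are left out.

```javascript
// Build the contents of a Batch API input file: one JSON object per line.
// The model ID passed in is a placeholder; use the ID from OpenAI's model list.
function toBatchJsonl(prompts, model) {
  return prompts
    .map((prompt, i) =>
      JSON.stringify({
        custom_id: `request-${i + 1}`, // used to match results back to requests
        method: "POST",
        url: "/v1/chat/completions",
        body: { model, messages: [{ role: "user", content: prompt }] },
      })
    )
    .join("\n");
}

const jsonl = toBatchJsonl(
  ["Summarize document A.", "Summarize document B."],
  "gpt-5.4-mini"
);
console.log(jsonl);
```

Because results can come back in any order, the custom_id is what lets you pair each output with the request that produced it.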

4. Use Flex processing for slow-but-synchronous work

Flex processing is the lesser-known middle option: it prices tokens at Batch rates (the same 50% off) but works through the normal synchronous API. The trade-off is that responses are slower and requests may occasionally fail because resources are unavailable. You enable it by setting service_tier: "flex" on a request. It's a great fit for background jobs, evals, and internal tools where you want a normal request/response flow but don't need speed.
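A sketch of how a Flex request might be wrapped: the service_tier field comes from the paragraph above, while the retry helper, its name, and the assumption that unavailability surfaces as an error with status 429 are illustrative. The send function is whatever makes the actual API call in your code.

```javascript
// Request body with Flex processing enabled.
function buildFlexRequest(model, input) {
  return { model, input, service_tier: "flex" };
}

// Retry with exponential backoff when the service reports it is busy.
// Assumes errors carry a numeric `status`; adapt this to your HTTP client.
async function withRetry(send, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const retryable = err && err.status === 429;
      if (!retryable || attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

console.log(buildFlexRequest("gpt-5.4", "Classify this support ticket."));
```

Since Flex responses are slower by design, it is also worth raising your client's request timeout rather than relying on the default.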

5. Limit output and reasoning effort

Since output tokens cost 6x input tokens, controlling generation length pays off directly: set a max output limit, ask for concise answers or structured output instead of prose, and, on reasoning models, dial reasoning effort down for tasks that don't need it. Verbose responses are slower and cost more.
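As a sketch, both limits can be set in the request body. The field names below (max_output_tokens and reasoning.effort) follow OpenAI's Responses API as documented at the time of writing; the helper names and default values are assumptions, so adjust them to your task.

```javascript
// Request body that caps generation length and lowers reasoning effort.
// Field names follow the Responses API; defaults here are illustrative.
function buildConciseRequest(
  model,
  input,
  { maxOutputTokens = 300, effort = "low" } = {}
) {
  return {
    model,
    input,
    max_output_tokens: maxOutputTokens, // cap on generated tokens, reasoning included
    reasoning: { effort },
  };
}

// Upper bound on output cost for one request, given the cap.
function worstCaseOutputCost(maxOutputTokens, outputPricePerMillion) {
  return (maxOutputTokens / 1_000_000) * outputPricePerMillion;
}

console.log(buildConciseRequest("gpt-5.5", "Summarize this ticket in two sentences."));
// Worst case for a 300-token cap at the GPT-5.5 output rate.
console.log(worstCaseOutputCost(300, 30.0));
```

A cap set too low can cut an answer off mid-sentence, so leave headroom for reasoning tokens on models that use them.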

One tier goes the other direction: Priority processing charges roughly double the Standard rate, and 2.5x on GPT-5.5 ($12.50/$75.00), in exchange for faster, more consistent latency. It's for user-facing applications where speed is worth paying for, the opposite trade-off from the options above.

Can you use the OpenAI API for free?

There's no permanent free tier on the OpenAI API itself, but there are several legitimate ways to use OpenAI models without paying, and one of them scales indefinitely.

Puter.js: the User-Pays model

Puter.js is a JavaScript library that lets you add OpenAI models (including GPT-5.5, GPT-5.4, and GPT-5.3 Codex) to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.

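<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // Ask GPT-5.4 nano through Puter.js; no API key or backend is needed.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "openai/gpt-5.4-nano"
    }).then(response => {
      // Render the reply as plain text so model output is never parsed as HTML.
      document.body.textContent = response.message.content;
    }).catch(error => {
      console.error("Puter.js chat request failed:", error);
    });
  </script>
</body>
</html>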

To see what this saves, we ran the numbers on a modest app: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message. That's 15M input and 4.5M output tokens a month. On GPT-5.4 through the OpenAI API, our calculation puts that at about $37.50 for input and $67.50 for output, roughly $105 every month, growing linearly with your user base. Through Puter.js, the same app costs you $0 at 500 users, and still $0 at 50,000 users, because each user carries their own usage.
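The arithmetic above can be reproduced with a few lines. This is a sketch of the same calculation at GPT-5.4 rates; the function and field names are made up for illustration.

```javascript
// Monthly API cost for an app, from per-user usage and per-1M-token prices.
function monthlyApiCost({
  users,
  messagesPerUser,
  inputTokensPerMessage,
  outputTokensPerMessage,
  price,
}) {
  const messages = users * messagesPerUser;
  const inputCost = ((messages * inputTokensPerMessage) / 1_000_000) * price.input;
  const outputCost = ((messages * outputTokensPerMessage) / 1_000_000) * price.output;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// The workload from the paragraph above, priced at GPT-5.4 rates.
const workload = {
  messagesPerUser: 30,
  inputTokensPerMessage: 1000,
  outputTokensPerMessage: 300,
  price: { input: 2.5, output: 15.0 },
};

console.log(monthlyApiCost({ users: 500, ...workload }));
console.log(monthlyApiCost({ users: 50_000, ...workload }));
```

At 500 users this gives $37.50 for input plus $67.50 for output; at 50,000 users the same formula gives $10,500 a month, which is the linear growth the paragraph refers to.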

Free trial credits

OpenAI has at times offered small trial credits to new accounts. Availability changes, so check your billing page after signing up rather than counting on it. The API otherwise runs on prepaid credits, so requests won't go through until you add funds.

OpenRouter's free endpoints

OpenRouter offers free variants of many models (the ones tagged :free), which is a quick way to test an integration without spending anything. Know the limits going in: free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. It's built for testing and prototyping, not continuous production use. Popular free models also get rate-limited by upstream providers at peak times, and failed requests still count against your daily quota.

Free ChatGPT Pro and Codex for open-source maintainers

If you maintain a widely used open-source project, OpenAI's Codex for Open Source program offers six months of free ChatGPT Pro with Codex, plus API credits, and selective access to Codex Security for eligible repositories. You apply through OpenAI's program page with your GitHub account; applications are reviewed case by case based on your project's reach and role in the ecosystem.

Startup credits

OpenAI and its partners periodically run startup programs that include API credits, typically through accelerators and cloud-partner programs. Eligibility and amounts change frequently, so if you're running a funded startup, check OpenAI's current startup offerings and your accelerator's perks directly.

Real-world cost examples

Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads. We used the same method each time: estimate tokens per request, multiply by volume, then by the per-million rate.

Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:

Model	Monthly cost
GPT-5.4 nano	~$4.40
GPT-5.4 mini	~$16
GPT-5.4	~$54
GPT-5.5	~$108

Same workload, roughly 24x cost difference between the cheapest and most expensive option, which is why model choice comes first. Add prompt caching (most of that 1,200-token input is a repeated system prompt) and the mini figure drops to around $11.
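The sketch below recomputes the chatbot figures using the method described above, with prices from the pricing table in this guide. The 80% cached share used for the caching estimate is an assumption picked to illustrate the effect; your real cache hit rate will differ.

```javascript
// Prices per 1M tokens from the pricing table in this guide.
const MODELS = {
  "GPT-5.4 nano": { input: 0.2, cachedInput: 0.02, output: 1.25 },
  "GPT-5.4 mini": { input: 0.75, cachedInput: 0.075, output: 4.5 },
  "GPT-5.4": { input: 2.5, cachedInput: 0.25, output: 15.0 },
  "GPT-5.5": { input: 5.0, cachedInput: 0.5, output: 30.0 },
};

// Cost of a workload, with an optional share of input billed at the cached rate.
function workloadCost(price, inputTokens, outputTokens, cachedFraction = 0) {
  const cached = inputTokens * cachedFraction;
  const fresh = inputTokens - cached;
  return (
    (fresh / 1_000_000) * price.input +
    (cached / 1_000_000) * price.cachedInput +
    (outputTokens / 1_000_000) * price.output
  );
}

// 1,000 conversations x 8 messages, 1,200 input and 250 output tokens each.
const chatInputTokens = 1000 * 8 * 1200;
const chatOutputTokens = 1000 * 8 * 250;

for (const [name, price] of Object.entries(MODELS)) {
  const cost = workloadCost(price, chatInputTokens, chatOutputTokens);
  console.log(`${name}: $${cost.toFixed(2)}`);
}

// GPT-5.4 mini again, assuming 80% of input tokens hit the cache.
const miniCached = workloadCost(MODELS["GPT-5.4 mini"], chatInputTokens, chatOutputTokens, 0.8);
console.log(`GPT-5.4 mini with caching: $${miniCached.toFixed(2)}`);
```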

Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. We calculate about $5.75 on GPT-5.4, or about $2.90 through the Batch API. Document processing is relatively cheap.

Daily content generation. For 30 articles a month on GPT-5.5, with 2,000-token prompts and roughly 4,000 output tokens each (including reasoning tokens), we estimate about $3.90 a month. Low-volume generation work like this is cheap; high-volume, always-on traffic is what builds large bills.

Complete OpenAI API pricing table

All prices are per 1M tokens, Standard tier, short context (under ~270K tokens).

Text models

Model	Input	Cached input	Output
GPT-5.5	$5.00	$0.50	$30.00
GPT-5.5 Pro	$30.00	—	$180.00
GPT-5.4	$2.50	$0.25	$15.00
GPT-5.4 mini	$0.75	$0.075	$4.50
GPT-5.4 nano	$0.20	$0.02	$1.25
GPT-5.4 Pro	$30.00	—	$180.00
GPT-5.3 Codex	$1.75	$0.175	$14.00
ChatGPT (chat-latest)	$5.00	$0.50	$30.00

Long-context rates (requests above ~270K tokens): GPT-5.5 moves to $10.00 / $1.00 / $45.00, GPT-5.4 to $5.00 / $0.50 / $22.50, and the Pro models to $60.00 input / $270.00 output.

Service tier multipliers

Tier	Price vs. Standard	Trade-off
Batch	50% off	Asynchronous, results within 24 hours
Flex	50% off	Synchronous but slower, occasional unavailability
Standard	Baseline	Default
Priority	~2x (2.5x on GPT-5.5)	Faster, more consistent latency

Built-in tools

Tool	Price
Web search	$10.00 / 1K calls + content tokens at model rates
Containers (code execution)	$0.03 per GB per 20-minute session
File search	$2.50 / 1K calls + $0.10 / GB / day storage (first GB free)
Moderation API	Free

For models not listed here (legacy GPT-4-series, deep research, embeddings variants, and every dated snapshot), see OpenAI's full pricing documentation.

Conclusion

OpenAI API pricing in 2026 runs from $0.20 per million input tokens on GPT-5.4 nano to $30.00 on the Pro models, with the flagship GPT-5.5 at $5.00 input and $30.00 output.

Keeping the bill predictable comes down to a few deliberate choices:

Pick the smallest model that does the job.
Cache your repeated prompts.
Push non-urgent work to Batch or Flex for 50% off.
Watch the costs that don't show up in the headline rates: reasoning tokens, long context, and tool calls.

Prices here were checked against OpenAI's official pricing pages. OpenAI updates pricing frequently, so always confirm current rates before committing to a budget.
