OpenAI API Pricing: Full Breakdown of Costs (Jun 2026)
In this guide, you'll learn what every current OpenAI model costs, how the billing works (including the things that can make your bill higher than expected), the most effective ways to cut costs, and the options for using OpenAI models for free.
How much does the OpenAI API cost?
OpenAI's flagship model, GPT-5.5, costs $5.00 per 1 million input tokens and $30.00 per 1 million output tokens. The cheapest model in the current lineup, GPT-5.4 nano, costs $0.20 per 1 million input tokens and $1.25 per 1 million output tokens, roughly 25 times cheaper than the flagship.
Here's the quick view of the models most people ask about:
| Model | Best for | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| GPT-5.5 | Best overall: complex coding and professional work | $5.00 | $30.00 |
| GPT-5.5 Pro | Maximum reasoning quality, price is secondary | $30.00 | $180.00 |
| GPT-5.4 | Balanced production workloads | $2.50 | $15.00 |
| GPT-5.4 mini | Fast, cheap, still capable; great for subagents | $0.75 | $4.50 |
| GPT-5.4 nano | High-volume simple tasks: routing, extraction, classification | $0.20 | $1.25 |
| GPT-5.3 Codex | Specialized coding and agentic dev work | $1.75 | $14.00 |
All prices in this guide come from OpenAI's official pricing and model documentation. Next, let's look at how the pricing works.
How OpenAI API pricing works
The OpenAI API uses pay-as-you-go, per-token billing: you're charged for the tokens you send to the model (input) and the tokens it generates (output). There are no subscriptions or seat fees for the API itself. Note that API usage is billed separately from ChatGPT Plus or Business subscriptions, which don't include any API credit.
What's a token?
A token is a chunk of text, usually a word, part of a word, or a punctuation mark. As a rule of thumb, 1,000 tokens is about 750 English words, so 1 million tokens is roughly the length of 10 novels. When you see "$5.00 per 1M input tokens," that means sending 10 novels' worth of text to GPT-5.5 costs five dollars.
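To turn the rule of thumb into numbers, here's a rough estimator in JavaScript. It's a sketch that assumes exactly 0.75 English words per token. Actual counts depend on the model's tokenizer (a tokenizer library such as OpenAI's tiktoken gives exact figures), so treat the result as a ballpark.

```javascript
// Rough token estimate from an English word count.
// Assumption: ~750 words per 1,000 tokens (0.75 words per token), per the
// rule of thumb above. Real tokenizers will give different counts.
const WORDS_PER_TOKEN = 0.75;

function estimateTokens(wordCount) {
  return Math.ceil(wordCount / WORDS_PER_TOKEN);
}

// Input cost in USD for a given word count at a per-1M-token input rate.
function estimateInputCostUSD(wordCount, pricePerMillionTokens) {
  return (estimateTokens(wordCount) / 1_000_000) * pricePerMillionTokens;
}

console.log(estimateTokens(750)); // 1000
// ~750,000 words is ~1M tokens; at GPT-5.5's $5.00 per 1M input tokens:
console.log(estimateInputCostUSD(750_000, 5.0).toFixed(2)); // 5.00
```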
Input vs. output tokens
Output tokens cost significantly more than input tokens: six times as much on GPT-5.5 ($5 in, $30 out), and between 6x and 8x across the rest of the lineup. That means token-heavy generation (long articles, verbose code) costs more than token-heavy reading (analyzing documents). When estimating costs, always calculate input and output separately.
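Here's what calculating input and output separately looks like in practice. This is a minimal sketch using the Standard, short-context rates from this guide. The model keys (such as "gpt-5.5") are labels for this example and may not match the exact API model IDs. It compares a reading-heavy request with a writing-heavy one that uses the same total number of tokens.

```javascript
// Per-request cost with input and output priced separately.
// Rates: USD per 1M tokens, Standard tier, short context (from this guide).
// Model keys are illustrative labels, not necessarily real API model IDs.
const PRICES = {
  "gpt-5.5":      { input: 5.0,  output: 30.0 },
  "gpt-5.4":      { input: 2.5,  output: 15.0 },
  "gpt-5.4-mini": { input: 0.75, output: 4.5 },
  "gpt-5.4-nano": { input: 0.2,  output: 1.25 },
};

function requestCostUSD(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  if (!p) throw new Error(`No price entry for ${model}`);
  const inputCost = (inputTokens / 1e6) * p.input;
  const outputCost = (outputTokens / 1e6) * p.output;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Same 10,500 tokens in total, very different bills on GPT-5.5:
const reading = requestCostUSD("gpt-5.5", 10_000, 500);
const writing = requestCostUSD("gpt-5.5", 500, 10_000);
console.log(reading.total.toFixed(4)); // 0.0650
console.log(writing.total.toFixed(4)); // 0.3025
```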
Reasoning tokens are billed as output
This is the most common reason a bill comes in higher than expected: on reasoning-enabled models like GPT-5.5, the model generates internal "thinking" tokens before producing its answer, and those reasoning tokens are billed at the output rate even though you never see them.
A 200-word response might use several thousand reasoning tokens that you don't see. For complex prompts, the reasoning tokens can cost more than the answer itself. You can control this with the reasoning effort setting: lower effort means fewer reasoning tokens and a cheaper, faster response; higher effort means better answers on hard problems at a higher cost. For simple tasks, turning reasoning effort down (or using a non-reasoning model) is an immediate saving.
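To see how much of a bill is reasoning, read the usage data the API returns with each response. The sketch below assumes a Responses API-style usage object, where `output_tokens` comes with an `output_tokens_details.reasoning_tokens` breakdown (Chat Completions uses `completion_tokens` and `completion_tokens_details` instead). In that shape, reasoning tokens are already counted inside `output_tokens`, so they must not be added a second time. The sample numbers are made up for illustration.

```javascript
// Split a response's output cost into visible answer vs. hidden reasoning.
// Assumes a Responses API-style usage object in which output_tokens already
// INCLUDES reasoning tokens, so reasoning is not added on top.
function reasoningCostBreakdown(usage, outputPricePerM) {
  const reasoning = usage.output_tokens_details?.reasoning_tokens ?? 0;
  const visible = usage.output_tokens - reasoning;
  const perToken = outputPricePerM / 1e6;
  return {
    visibleTokens: visible,
    reasoningTokens: reasoning,
    visibleCost: visible * perToken,
    reasoningCost: reasoning * perToken,
  };
}

// Hypothetical usage for a short answer on GPT-5.5 ($30 per 1M output tokens).
const usage = {
  input_tokens: 1200,
  output_tokens: 4300,
  output_tokens_details: { reasoning_tokens: 4000 },
};
const b = reasoningCostBreakdown(usage, 30);
console.log(b.visibleTokens, b.reasoningTokens); // 300 4000
console.log(b.reasoningCost.toFixed(4), b.visibleCost.toFixed(4)); // 0.1200 0.0090
```

In this made-up example, the hidden reasoning costs over 13 times more than the visible answer.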
What makes your bill higher than expected
Beyond the base token rates and the reasoning tokens covered above, three more things commonly push costs above estimates.
Long context costs more per token
OpenAI's published rates apply to requests under roughly 270K tokens of context. Go above that, and the entire request moves to a long-context price schedule: GPT-5.5 jumps from $5.00 to $10.00 per 1M input tokens (2x) and from $30.00 to $45.00 per 1M output tokens (1.5x). If you put entire codebases or books into a single request, you pay double rates on input.
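The threshold is easy to miss in a cost estimator, so here's a sketch that handles it. It assumes the cutoff is exactly 270,000 input tokens (the published figure is "roughly" that) and that crossing it reprices the whole request, as described above. Notice that a 300K-token request costs more than twice as much as a 250K-token one.

```javascript
// Choose short- or long-context rates from the request's input size.
// Assumptions: an exact 270K-token cutoff (the real one is approximate) and
// whole-request repricing once it is crossed. Rates: USD per 1M tokens.
const LONG_CONTEXT_THRESHOLD = 270_000;

const RATES = {
  "gpt-5.5": { short: { input: 5.0, output: 30.0 }, long: { input: 10.0, output: 45.0 } },
  "gpt-5.4": { short: { input: 2.5, output: 15.0 }, long: { input: 5.0, output: 22.5 } },
};

function contextAwareCostUSD(model, inputTokens, outputTokens) {
  if (!RATES[model]) throw new Error(`No rates for ${model}`);
  const schedule = inputTokens > LONG_CONTEXT_THRESHOLD ? "long" : "short";
  const r = RATES[model][schedule];
  return {
    schedule,
    cost: (inputTokens / 1e6) * r.input + (outputTokens / 1e6) * r.output,
  };
}

const a = contextAwareCostUSD("gpt-5.5", 250_000, 2_000);
const c = contextAwareCostUSD("gpt-5.5", 300_000, 2_000);
console.log(a.schedule, a.cost.toFixed(2)); // short 1.31
console.log(c.schedule, c.cost.toFixed(2)); // long 3.09
```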
Image input is billed as tokens
Sending an image to a text model isn't free: the image is converted into tokens and billed at that model's normal token rates (a typical low-resolution image is a few hundred tokens). A request with several images can cost noticeably more than a text-only one.
Built-in tools have their own fees
Using OpenAI's hosted tools adds charges on top of token costs:
- Web search: $10.00 per 1,000 calls, plus the retrieved search content is billed as input tokens at your model's rate.
- Code execution (containers): $0.03 per GB per 20-minute session, so a 64GB container runs $1.92 per session.
- File search: $2.50 per 1,000 tool calls, plus $0.10 per GB per day of vector storage (the first GB is free).
A single "simple" request that triggers web search and a code run can cost several times more than the same request without tools. The function calls you define yourself (your own tools) carry no extra fee beyond the tokens they add to the conversation.
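Here is a rough sketch of how the tool fees add up on top of token costs, using the rates listed above. It leaves out the search-content tokens, which you'd price separately at your model's input rate. It also assumes each entry in `containerSessions` is one 20-minute session of the given size in GB.

```javascript
// Hosted-tool fees only (token costs for retrieved content are NOT included).
const TOOL_RATES = {
  webSearchPerCall: 10.0 / 1000, // $10.00 per 1,000 calls
  containerPerGbSession: 0.03,   // $0.03 per GB per 20-minute session
  fileSearchPerCall: 2.5 / 1000, // $2.50 per 1,000 calls
  storagePerGbDay: 0.1,          // $0.10 per GB per day, first GB free
};

function toolFeesUSD({ webSearchCalls = 0, containerSessions = [], fileSearchCalls = 0,
                       storageGb = 0, storageDays = 0 } = {}) {
  const web = webSearchCalls * TOOL_RATES.webSearchPerCall;
  // containerSessions: one entry per 20-minute session, value = container size in GB
  const containers = containerSessions.reduce(
    (sum, gb) => sum + gb * TOOL_RATES.containerPerGbSession, 0);
  const fileSearch = fileSearchCalls * TOOL_RATES.fileSearchPerCall;
  const billableGb = Math.max(0, storageGb - 1); // first GB of storage is free
  const storage = billableGb * storageDays * TOOL_RATES.storagePerGbDay;
  return { web, containers, fileSearch, storage,
           total: web + containers + fileSearch + storage };
}

// One request that runs 2 web searches and one 64GB container session:
const fees = toolFeesUSD({ webSearchCalls: 2, containerSessions: [64] });
console.log(fees.total.toFixed(2)); // 1.94
```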
How to reduce OpenAI API costs
The methods below are ordered by impact, with the most effective first.
1. Pick the right model for the job
Model choice is the biggest cost factor in the API: the gap between GPT-5.5 and GPT-5.4 nano is 25x on input and 24x on output. No caching strategy or batch discount comes close to that.
A simple decision guide:
- Routing, extraction, classification, formatting → GPT-5.4 nano. These tasks don't need frontier intelligence, and nano handles them at $0.20/$1.25.
- Production chat, summarization, agent subtasks → GPT-5.4 mini or GPT-5.4. Strong quality at a fraction of flagship cost.
- Complex reasoning, hard coding problems, high-stakes output → GPT-5.5. Use flagship rates only where flagship quality is needed.
- The hardest problems where quality matters most → GPT-5.5 Pro, at $30/$180.
A common production pattern is to route most traffic (often 70–90%) to a small model and escalate only the hard cases. If you do nothing else from this list, do this.
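The routing pattern can be as simple as a lookup from task type to model, with an escalation path for hard cases. This is a minimal sketch. The task labels, the `isHard` flag, and the model ID strings are hypothetical placeholders. In practice, deciding what counts as "hard" is often itself a call to a small model. The blended-price helper shows why sending most traffic to nano matters.

```javascript
// Hypothetical task labels mapped to the cheapest tier that handles them.
const ROUTES = {
  routing: "gpt-5.4-nano",
  extraction: "gpt-5.4-nano",
  classification: "gpt-5.4-nano",
  formatting: "gpt-5.4-nano",
  chat: "gpt-5.4-mini",
  summarization: "gpt-5.4-mini",
  agentSubtask: "gpt-5.4-mini",
  complexReasoning: "gpt-5.5",
  hardCoding: "gpt-5.5",
};

function pickModel(taskType, { isHard = false } = {}) {
  if (isHard) return "gpt-5.5";             // escalate only flagged cases
  return ROUTES[taskType] ?? "gpt-5.4-mini"; // unknown tasks default to mid-tier
}

// Blended input price (USD per 1M) if `smallShare` of traffic goes to nano
// ($0.20) and the rest to GPT-5.5 ($5.00).
function blendedInputPrice(smallShare) {
  return smallShare * 0.2 + (1 - smallShare) * 5.0;
}

console.log(pickModel("classification"));        // gpt-5.4-nano
console.log(pickModel("chat", { isHard: true })); // gpt-5.5
console.log(blendedInputPrice(0.8).toFixed(2));  // 1.16
```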
2. Use prompt caching for repeated input
Cached input tokens cost 90% less than fresh ones: GPT-5.5 input drops from $5.00 to $0.50 per 1M tokens when cached. Prompt caching kicks in automatically when the beginning of your prompt repeats across requests, which is exactly what happens with system prompts, few-shot examples, and shared documents. To benefit, put the static parts of your prompt first and the variable parts (the user's message) last. For chatbots with long system prompts, this alone routinely cuts input costs by more than half.
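Two things matter here: prompt order and the share of input that hits the cache. The sketch below does two things. It builds a chat-style `role`/`content` message list with the static parts first, and it prices input with a cached share at the discounted rate. The helper names and the 80% cached share in the example are assumptions for illustration, not measured figures.

```javascript
// 1) Keep the static prefix identical across requests; put variable input last.
function buildMessages(systemPrompt, fewShotExamples, userMessage) {
  return [
    { role: "system", content: systemPrompt }, // static: cacheable prefix
    ...fewShotExamples,                         // static: cacheable prefix
    { role: "user", content: userMessage },    // variable: always last
  ];
}

// 2) Input cost with some tokens billed at the cached rate (USD per 1M tokens).
function inputCostWithCacheUSD(inputTokens, cachedTokens, rate, cachedRate) {
  const uncached = inputTokens - cachedTokens;
  return (uncached / 1e6) * rate + (cachedTokens / 1e6) * cachedRate;
}

// GPT-5.5: 10M input tokens a month, assumed 80% served from cache ($5.00 vs $0.50).
const noCache = inputCostWithCacheUSD(10_000_000, 0, 5.0, 0.5);
const withCache = inputCostWithCacheUSD(10_000_000, 8_000_000, 5.0, 0.5);
console.log(noCache.toFixed(2), withCache.toFixed(2)); // 50.00 14.00
```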
3. Use the Batch API for anything that can wait
The Batch API gives a flat 50% discount on both input and output tokens in exchange for asynchronous processing: you submit a file of requests and get results back within 24 hours. It's ideal for data pipelines, evaluations, bulk summarization, and content generation jobs where nobody is waiting on the response.
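A batch job is submitted as a JSONL file, with one request per line. The sketch below builds that file's contents in memory. The line shape (`custom_id`, `method`, `url`, `body`) follows OpenAI's Batch API input format. The endpoint choice, model ID, and prompt wording are assumptions to adapt. Results aren't guaranteed to come back in input order, which is why each line carries a unique `custom_id`. Uploading the file and creating the batch are separate API calls and are not shown.

```javascript
// Build Batch API input: one JSON request per line, each with a unique custom_id.
function buildBatchJsonl(documents, model) {
  return documents
    .map((text, i) =>
      JSON.stringify({
        custom_id: `summary-${i}`, // used to match results back to inputs
        method: "POST",
        url: "/v1/chat/completions",
        body: {
          model, // assumed model ID, e.g. "gpt-5.4"
          messages: [
            { role: "system", content: "Summarize the document in 5 bullet points." },
            { role: "user", content: text },
          ],
        },
      })
    )
    .join("\n");
}

const jsonl = buildBatchJsonl(["First document...", "Second document..."], "gpt-5.4");
console.log(jsonl.split("\n").length); // 2
```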
4. Use Flex processing for slow-but-synchronous work
Flex processing is the lesser-known middle option: it prices tokens at Batch rates (the same 50% off) but works through the normal synchronous API. The trade-off is slower responses and occasional resource-unavailable errors when capacity is short. You enable it by setting service_tier: "flex" on a request. It's a great fit for background jobs, evals, and internal tools where you want a normal request/response flow but don't need speed.
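Because Flex requests can be turned away when capacity is short, a common pattern is to try Flex first and fall back to the default tier. This is a sketch. It assumes the official `openai` Node SDK's `client.responses.create(params)`, and it assumes a capacity shortfall surfaces as an error with `status === 429`. The API call is injected as a function so the example runs without an API key. The fake client is purely illustrative.

```javascript
// Try Flex (50% off) first; on an HTTP 429, retry once on the default tier.
// `createResponse` is injected; with the openai SDK it could be
// (params) => client.responses.create(params)  (assumed wiring).
async function createWithFlexFallback(createResponse, params) {
  try {
    return await createResponse({ ...params, service_tier: "flex" });
  } catch (err) {
    if (err?.status !== 429) throw err; // only fall back on capacity-style errors
    return await createResponse({ ...params, service_tier: "default" });
  }
}

// Illustrative fake: Flex is "unavailable", the default tier succeeds.
async function fakeCreate(params) {
  if (params.service_tier === "flex") {
    const e = new Error("Resource Unavailable");
    e.status = 429;
    throw e;
  }
  return { service_tier: params.service_tier, output_text: "ok" };
}

createWithFlexFallback(fakeCreate, { model: "gpt-5.4-mini", input: "Hello" })
  .then((res) => console.log(res.service_tier)); // default
```

A 429 can also mean an ordinary rate limit, so production code might inspect the error details or add backoff before falling back to full price.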
5. Limit output and reasoning effort
Since output tokens cost at least 6x as much as input tokens, controlling generation length pays off directly: set a max output limit, ask for concise answers or structured output instead of prose, and, on reasoning models, dial reasoning effort down for tasks that don't need it. Verbose responses are slower and cost more.
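In a request, those controls might look like this. This is a sketch of Responses API-style parameters (`max_output_tokens` and `reasoning.effort`). The model ID and the 300-token cap are placeholders. On reasoning models the output cap covers reasoning tokens as well as the visible answer, so a cap that's too tight can cut a response off before any visible text appears.

```javascript
// Lean request parameters for a simple task (Responses API-style field names).
function leanParams(model, input, { maxOutputTokens = 300, effort = "low" } = {}) {
  return {
    model,
    input,
    max_output_tokens: maxOutputTokens, // caps visible output + reasoning tokens
    reasoning: { effort },              // lower effort -> fewer reasoning tokens
  };
}

// Worst-case output spend per request at GPT-5.4 mini's $4.50 per 1M output tokens.
const params = leanParams("gpt-5.4-mini", "Answer in one sentence: what is a token?");
const maxOutputCost = (params.max_output_tokens / 1e6) * 4.5;
console.log(maxOutputCost.toFixed(5)); // 0.00135
```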
One tier goes the other direction: Priority processing charges a premium over the Standard rate (2.5x on GPT-5.5, or $12.50/$75.00; roughly 2x on other models) in exchange for faster, more consistent latency. It's for user-facing applications where speed is worth paying for, the opposite trade-off from the options above.
Can you use the OpenAI API for free?
There's no permanent free tier on the OpenAI API itself, but there are several legitimate ways to use OpenAI models without paying, and one of them scales indefinitely.
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add OpenAI models (including GPT-5.5, GPT-5.4, and GPT-5.3 Codex) to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.
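```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // No API key or backend: each user covers their own usage via their Puter account.
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "openai/gpt-5.4-nano"
    }).then(response => {
      // textContent (rather than innerHTML) so model output is never parsed as HTML
      document.body.textContent = response.message.content;
    });
  </script>
</body>
</html>
```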
To see what this saves, we ran the numbers on a modest app: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message. That's 15M input and 4.5M output tokens a month. On GPT-5.4 through the OpenAI API, our calculation puts that at about $37.50 for input and $67.50 for output, roughly $105 every month, growing linearly with your user base. Through Puter.js, the same app costs you $0 at 500 users, and still $0 at 50,000 users, because each user carries their own usage.
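The arithmetic behind those figures fits in a small function you can rerun with your own numbers. It only reproduces the direct-API side of the comparison. Under the User-Pays model, the developer's figure stays at $0 regardless of the inputs.

```javascript
// Monthly direct-API bill for a chat app (rates in USD per 1M tokens).
function monthlyAppCostUSD({ users, messagesPerUser, inTokens, outTokens, inRate, outRate }) {
  const messages = users * messagesPerUser;
  const inputTokens = messages * inTokens;
  const outputTokens = messages * outTokens;
  return (inputTokens / 1e6) * inRate + (outputTokens / 1e6) * outRate;
}

// The example above: 30 messages/user, 1,000 in + 300 out tokens, GPT-5.4 ($2.50 / $15.00).
const base = { messagesPerUser: 30, inTokens: 1000, outTokens: 300, inRate: 2.5, outRate: 15 };
console.log(monthlyAppCostUSD({ ...base, users: 500 }).toFixed(2));    // 105.00
console.log(monthlyAppCostUSD({ ...base, users: 50_000 }).toFixed(2)); // 10500.00
```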
Free trial credits
OpenAI has at times offered small trial credits to new accounts. Availability changes, so check your billing page after signing up rather than counting on it. The API otherwise runs on prepaid credits, so nothing happens until you add funds.
OpenRouter's free endpoints
OpenRouter offers free variants of many models (the ones tagged :free), which is a quick way to test an integration without spending anything. Know the limits going in: free usage is capped at 50 requests per day and 20 per minute, rising to 1,000 per day once you've purchased at least $10 in credits. It's built for testing and prototyping, not continuous production use. Popular free models also get rate-limited by upstream providers at peak times, and failed requests still count against your daily quota.
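If you prototype against the free endpoints, a small client-side guard keeps you from spending your daily quota on requests that would be rejected anyway. The sketch below assumes the limits quoted above (20 per minute, 50 per day) and uses simple rolling windows. OpenRouter's own counting window may differ. Because failed requests also count, the guard records every attempt, not only successes.

```javascript
// Client-side guard for free-tier limits (assumed: 20/min, 50/day, rolling windows).
class FreeTierGuard {
  constructor({ perMinute = 20, perDay = 50 } = {}) {
    this.perMinute = perMinute;
    this.perDay = perDay;
    this.attempts = []; // timestamps (ms) of every attempt, successful or not
  }

  canSend(now = Date.now()) {
    // Forget attempts older than 24 hours.
    this.attempts = this.attempts.filter((t) => now - t < 24 * 60 * 60 * 1000);
    const lastMinute = this.attempts.filter((t) => now - t < 60 * 1000).length;
    return lastMinute < this.perMinute && this.attempts.length < this.perDay;
  }

  record(now = Date.now()) {
    this.attempts.push(now);
  }
}

// 25 attempts in the same instant: the per-minute cap stops it at 20.
const guard = new FreeTierGuard();
let sent = 0;
for (let i = 0; i < 25; i++) {
  if (guard.canSend(0)) { guard.record(0); sent++; }
}
console.log(sent); // 20
```

After buying credits, you'd construct it with `{ perDay: 1000 }` to match the higher daily limit.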
Free ChatGPT Pro and Codex for open-source maintainers
If you maintain a widely used open-source project, OpenAI's Codex for Open Source program offers six months of free ChatGPT Pro with Codex, plus API credits, and selective access to Codex Security for eligible repositories. You apply through OpenAI's program page with your GitHub account; applications are reviewed case by case based on your project's reach and role in the ecosystem.
Startup credits
OpenAI and its partners periodically run startup programs that include API credits, typically through accelerators and cloud-partner programs. Eligibility and amounts change frequently, so if you're running a funded startup, check OpenAI's current startup offerings and your accelerator's perks directly.
Real-world cost examples
Per-million-token prices are hard to apply directly, so here's what we calculated for a few real workloads. We used the same method each time: estimate tokens per request, multiply by volume, then by the per-million rate.
Customer support chatbot. We modeled 1,000 conversations a month, averaging 8 messages each, with about 1,200 input tokens (system prompt plus history) and 250 output tokens per message. That's 9.6M input and 2M output tokens monthly, which we priced as:
| Model | Monthly cost |
|---|---|
| GPT-5.4 nano | ~$4.40 |
| GPT-5.4 mini | ~$16 |
| GPT-5.4 | ~$54 |
| GPT-5.5 | ~$108 |
Same workload, roughly a 25x cost difference, which is why model choice comes first. Add prompt caching (most of that 1,200-token input is a repeated system prompt) and, with around 80% of input tokens served from cache, the mini figure drops to about $11.
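The table can be reproduced, and rerun with your own volumes, in a few lines. The caching line is a hedged reconstruction. It uses an assumed cached share of about 80% of input tokens billed at the cached rate, which is the share the ~$11 figure implies.

```javascript
// Support-chatbot estimate: 1,000 conversations x 8 messages,
// 1,200 input + 250 output tokens per message. Rates: USD per 1M tokens.
const MODELS = {
  "GPT-5.4 nano": { input: 0.2,  cached: 0.02,  output: 1.25 },
  "GPT-5.4 mini": { input: 0.75, cached: 0.075, output: 4.5 },
  "GPT-5.4":      { input: 2.5,  cached: 0.25,  output: 15.0 },
  "GPT-5.5":      { input: 5.0,  cached: 0.5,   output: 30.0 },
};

function chatbotMonthlyUSD(rates, cachedShare = 0) {
  const messages = 1000 * 8;
  const inputM = (messages * 1200) / 1e6; // 9.6M input tokens
  const outputM = (messages * 250) / 1e6; // 2M output tokens
  const inputRate = (1 - cachedShare) * rates.input + cachedShare * rates.cached;
  return inputM * inputRate + outputM * rates.output;
}

for (const [name, rates] of Object.entries(MODELS)) {
  console.log(name, chatbotMonthlyUSD(rates).toFixed(2));
}
// prints 4.42, 16.20, 54.00, 108.00 for nano, mini, GPT-5.4, GPT-5.5

// Assumed ~80% of input served from cache on mini:
console.log(chatbotMonthlyUSD(MODELS["GPT-5.4 mini"], 0.8).toFixed(2)); // 11.02
```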
Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries, that's 2M input and 50K output tokens. We calculate about $5.75 on GPT-5.4, or about $2.90 through the Batch API. Document processing is relatively cheap.
Daily content generation. For 30 articles a month on GPT-5.5, with 2,000-token prompts and roughly 4,000 output tokens each (including reasoning tokens), we estimate about $3.90 a month. Low-volume generation work like this is cheap; high-volume, always-on traffic is what builds large bills.
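Both of these examples follow the same method (tokens per item, times items, times the rate), so one small estimator covers them. It's a sketch that models the Batch API as a flat 50% off input and output, as described earlier.

```javascript
// Tokens per item x items x rate (USD per 1M tokens), with an optional Batch discount.
function jobCostUSD({ items, inPerItem, outPerItem, inRate, outRate, batch = false }) {
  const cost = ((items * inPerItem) / 1e6) * inRate + ((items * outPerItem) / 1e6) * outRate;
  return batch ? cost * 0.5 : cost; // Batch API: 50% off input and output
}

// 100 PDFs on GPT-5.4 ($2.50 / $15.00)
const pdfs = { items: 100, inPerItem: 20_000, outPerItem: 500, inRate: 2.5, outRate: 15 };
console.log(jobCostUSD(pdfs).toFixed(2));                     // 5.75
console.log(jobCostUSD({ ...pdfs, batch: true }).toFixed(3)); // 2.875

// 30 articles on GPT-5.5 ($5.00 / $30.00); output includes reasoning tokens
const articles = { items: 30, inPerItem: 2000, outPerItem: 4000, inRate: 5, outRate: 30 };
console.log(jobCostUSD(articles).toFixed(2)); // 3.90
```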
Complete OpenAI API pricing table
All prices are per 1M tokens, Standard tier, short context (under ~270K tokens).
Text models
| Model | Input | Cached input | Output |
|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 |
| GPT-5.5 Pro | $30.00 | — | $180.00 |
| GPT-5.4 | $2.50 | $0.25 | $15.00 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 |
| GPT-5.4 Pro | $30.00 | — | $180.00 |
| GPT-5.3 Codex | $1.75 | $0.175 | $14.00 |
| ChatGPT (chat-latest) | $5.00 | $0.50 | $30.00 |
Long-context rates (requests above ~270K tokens): GPT-5.5 moves to $10.00 / $1.00 / $45.00, GPT-5.4 to $5.00 / $0.50 / $22.50, and the Pro models to $60.00 input / $270.00 output.
Service tier multipliers
| Tier | Price vs. Standard | Trade-off |
|---|---|---|
| Batch | 50% off | Asynchronous, results within 24 hours |
| Flex | 50% off | Synchronous but slower, occasional unavailability |
| Standard | Baseline | Default |
| Priority | ~2x (2.5x on GPT-5.5) | Faster, more consistent latency |
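Applied to a model's Standard rates, the multipliers in the table work out as follows. This is a sketch. The Priority factor is approximate per the table (about 2x generally, 2.5x on GPT-5.5), and Batch and Flex are modeled as a flat 50% off both input and output. The model keys are illustrative labels.

```javascript
// Apply service-tier multipliers to Standard per-1M-token rates.
const TIER_MULTIPLIER = { batch: 0.5, flex: 0.5, standard: 1.0, priority: 2.0 };
const PRIORITY_OVERRIDES = { "gpt-5.5": 2.5 }; // per the table above

function tierRates(model, standard, tier) {
  const m = tier === "priority"
    ? (PRIORITY_OVERRIDES[model] ?? TIER_MULTIPLIER.priority)
    : TIER_MULTIPLIER[tier];
  if (m === undefined) throw new Error(`Unknown tier: ${tier}`);
  return { input: standard.input * m, output: standard.output * m };
}

const gpt55 = { input: 5.0, output: 30.0 };
console.log(tierRates("gpt-5.5", gpt55, "priority")); // { input: 12.5, output: 75 }
console.log(tierRates("gpt-5.5", gpt55, "flex"));     // { input: 2.5, output: 15 }
```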
Built-in tools
| Tool | Price |
|---|---|
| Web search | $10.00 / 1K calls + content tokens at model rates |
| Containers (code execution) | $0.03 per GB per 20-minute session |
| File search | $2.50 / 1K calls + $0.10 / GB / day storage (first GB free) |
| Moderation API | Free |
For models not listed here (legacy GPT-4-series, deep research, embeddings variants, and every dated snapshot), see OpenAI's full pricing documentation.
Conclusion
OpenAI API pricing in 2026 runs from $0.20 per million input tokens on GPT-5.4 nano to $30.00 on the Pro models, with the flagship GPT-5.5 at $5.00 input and $30.00 output.
Keeping the bill predictable comes down to a few deliberate choices:
- Pick the smallest model that does the job.
- Cache your repeated prompts.
- Push non-urgent work to Batch or Flex for 50% off.
- Watch the costs that don't show up in the headline rates: reasoning tokens, long context, and tool calls.
Prices here were checked against OpenAI's official pricing pages. OpenAI updates pricing frequently, so always confirm current rates before committing to a budget.