OpenAI API Pricing
This guide breaks down everything you need to know about OpenAI API pricing: every model, every tier, and every discount. Whether you're budgeting for a side project or planning enterprise-scale usage, you'll find the exact numbers here.
At the end, we'll also show you how to access OpenAI models for free using Puter.js: no API keys, no billing setup, no cost to you as a developer. Puter pioneered the "User-Pays" model, which lets developers add AI capabilities to their applications while each user covers their own usage costs.
How OpenAI API pricing works
OpenAI charges based on tokens, the pieces of text the model reads and generates. As a rough estimate, 1 token is approximately 4 characters or 0.75 words in English. You're billed separately for:
- Input tokens: the text you send to the model (your prompt, system instructions, conversation history)
- Cached input tokens: previously seen input that's served from cache at a steep discount
- Output tokens: the text the model generates in response
All prices below are per million tokens (MTok) in USD.
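To make the billing rule concrete, here is a minimal sketch that turns token counts into a dollar figure. The function names `estimateTokens` and `requestCost` are made up for illustration, the 4-characters-per-token figure is only the rough heuristic quoted above, and the GPT-5.4 mini rates are the ones listed in this guide.

```javascript
// Rough heuristic from this guide: about 4 characters per English token.
// Real counts come from the model's tokenizer and will differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Prices are USD per million tokens (MTok).
// Tokens counted in `cachedInput` are billed at the cached rate
// instead of the standard input rate.
function requestCost({ input = 0, cachedInput = 0, output = 0 }, price) {
  const perToken = (usdPerMTok) => usdPerMTok / 1e6;
  return (
    input * perToken(price.input) +
    cachedInput * perToken(price.cachedInput) +
    output * perToken(price.output)
  );
}

// GPT-5.4 mini rates as listed in this guide.
const GPT_5_4_MINI = { input: 0.75, cachedInput: 0.075, output: 4.5 };

// A single chat message: 500 tokens in, 500 tokens out.
const cost = requestCost({ input: 500, output: 500 }, GPT_5_4_MINI);
console.log(cost.toFixed(6)); // 0.002625
```

For real budgeting, read the token counts from the usage data returned with each response rather than from a character estimate.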
Model pricing
OpenAI offers a wide range of models across different capability and price tiers. Here are the current flagship models:
GPT models (text generation)
| Model | Input | Cached Input | Output |
|---|---|---|---|
| GPT-5.5 | $5 / MTok | $0.50 / MTok | $30 / MTok |
| GPT-5.4 | $2.50 / MTok | $0.25 / MTok | $15 / MTok |
| GPT-5.4 mini | $0.75 / MTok | $0.075 / MTok | $4.50 / MTok |
| GPT-5.4 nano | $0.20 / MTok | $0.02 / MTok | $1.25 / MTok |
| GPT-5.2 | $1.75 / MTok | $0.175 / MTok | $14 / MTok |
| GPT-5.1 | $1.25 / MTok | $0.125 / MTok | $10 / MTok |
| GPT-5 | $1.25 / MTok | $0.125 / MTok | $10 / MTok |
| GPT-5 mini | $0.25 / MTok | $0.025 / MTok | $2 / MTok |
| GPT-5 nano | $0.05 / MTok | $0.005 / MTok | $0.40 / MTok |
| GPT-4o | $2.50 / MTok | $1.25 / MTok | $10 / MTok |
| GPT-4o mini | $0.15 / MTok | $0.075 / MTok | $0.60 / MTok |
| GPT-4.1 | $2 / MTok | $0.50 / MTok | $8 / MTok |
| GPT-4.1 mini | $0.40 / MTok | $0.10 / MTok | $1.60 / MTok |
| GPT-4.1 nano | $0.10 / MTok | $0.025 / MTok | $0.40 / MTok |
Reasoning models
| Model | Input | Cached Input | Output |
|---|---|---|---|
| o3 | $2 / MTok | $0.50 / MTok | $8 / MTok |
| o4-mini | $1.10 / MTok | $0.275 / MTok | $4.40 / MTok |
| o3-mini | $1.10 / MTok | $0.55 / MTok | $4.40 / MTok |
| o1 | $15 / MTok | $7.50 / MTok | $60 / MTok |
| o1-mini | $1.10 / MTok | $0.55 / MTok | $4.40 / MTok |
Pro models (highest quality, premium pricing)
| Model | Input | Output |
|---|---|---|
| GPT-5.5 Pro | $30 / MTok | $180 / MTok |
| GPT-5.4 Pro | $30 / MTok | $180 / MTok |
| GPT-5.2 Pro | $21 / MTok | $168 / MTok |
| GPT-5 Pro | $15 / MTok | $120 / MTok |
| o3 Pro | $20 / MTok | $80 / MTok |
| o1 Pro | $150 / MTok | $600 / MTok |
Specialized models
| Model | Input | Output | Use Case |
|---|---|---|---|
| o3-deep-research | $10 / MTok | $40 / MTok | Deep research tasks |
| o4-mini-deep-research | $2 / MTok | $8 / MTok | Deep research (budget) |
| computer-use-preview | $3 / MTok | $12 / MTok | Computer use / browser automation |
Legacy models
| Model | Input | Output |
|---|---|---|
| GPT-4 Turbo (2024-04-09) | $10 / MTok | $30 / MTok |
| GPT-3.5 Turbo | $0.50 / MTok | $1.50 / MTok |
Which model should you choose?
- GPT-5.5: The most capable GPT model. Best for complex reasoning, creative tasks, and the hardest problems. At $5/$30 per MTok, it's a premium choice for when quality matters most.
- GPT-5.4: Excellent all-around model at half the output cost of GPT-5.5. A strong default for most production workloads.
- GPT-5.4 mini: Great balance of quality and cost for chatbots, content generation, and general-purpose tasks.
- GPT-5.4 nano: Ultra-cheap at $0.20/$1.25 per MTok. Perfect for high-volume, low-complexity tasks like classification, extraction, and routing.
- o3 / o4-mini: Reasoning models that think step-by-step. Use o3 for complex logic and math, o4-mini for budget-friendly reasoning.
- GPT-4o: Still widely used. Solid multimodal model with vision, audio, and text capabilities.
For most developers starting out, GPT-5.4 mini is the best value: it is highly capable at $0.75/$4.50 per MTok, with cached input at just $0.075/MTok.
What does this cost in practice?
To give you a sense of real-world costs with GPT-5.4 mini ($0.75 input / $4.50 output per MTok):
| Use Case | Approx. Tokens | Estimated Cost |
|---|---|---|
| Single chat message (500 in / 500 out) | 1,000 | $0.003 |
| Summarize a 10-page document | ~5,000 in / 500 out | $0.006 |
| Analyze a 50-page PDF | ~25,000 in / 2,000 out | $0.03 |
| Process 1,000 customer support tickets | ~3.7M total | ~$10.00 |
| 10,000 short API calls / day (30 days) | ~300M/month | ~$790/month |
For the cheapest option, GPT-5 nano at $0.05/$0.40 per MTok cuts these numbers by more than 90%.
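The monthly row in the table can be reproduced with a few lines of arithmetic. This sketch assumes each "short API call" is 500 input and 500 output tokens (the same shape as the single-message row), which is our assumption and not something the table states. It then runs the same workload through the GPT-5 nano rates for comparison.

```javascript
// USD per million tokens, as listed in this guide.
const PRICES = {
  "GPT-5.4 mini": { input: 0.75, output: 4.5 },
  "GPT-5 nano": { input: 0.05, output: 0.4 },
};

function monthlyCost(price, { callsPerDay, days, inputPerCall, outputPerCall }) {
  const calls = callsPerDay * days;
  const inputMTok = (calls * inputPerCall) / 1e6;
  const outputMTok = (calls * outputPerCall) / 1e6;
  return inputMTok * price.input + outputMTok * price.output;
}

// Assumed workload: 10,000 calls/day for 30 days, 500 tokens in and out per call.
const workload = { callsPerDay: 10000, days: 30, inputPerCall: 500, outputPerCall: 500 };

const mini = monthlyCost(PRICES["GPT-5.4 mini"], workload);
const nano = monthlyCost(PRICES["GPT-5 nano"], workload);

console.log(mini.toFixed(2)); // 787.50
console.log(nano.toFixed(2)); // 67.50
console.log(((1 - nano / mini) * 100).toFixed(1) + "% cheaper"); // 91.4% cheaper
```

Changing the input/output split moves the total a lot, because output tokens cost six times as much as input tokens on GPT-5.4 mini.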
OpenAI vs Claude vs Gemini: price comparison
How does OpenAI stack up against competing models at similar capability tiers?
| Model | Input | Output | Context Window |
|---|---|---|---|
| GPT-5.4 mini | $0.75 / MTok | $4.50 / MTok | 128K |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 1M |
| Gemini 2.5 Pro | $1.25–$2.50 / MTok | $10–$15 / MTok | 1M |
| GPT-5.4 nano | $0.20 / MTok | $1.25 / MTok | 128K |
| Claude Haiku 4.5 | $1 / MTok | $5 / MTok | 200K |
| Gemini 2.5 Flash | $0.15–$0.30 / MTok | $0.60–$3.50 / MTok | 1M |
| GPT-5.5 | $5 / MTok | $30 / MTok | 272K |
| Claude Opus 4.7 | $5 / MTok | $25 / MTok | 1M |
OpenAI's nano and mini models are among the cheapest high-quality options available. At the flagship tier, GPT-5.5 and Claude Opus 4.7 have similar input pricing, but Claude is cheaper on output ($25 vs $30 per MTok).
Cached input pricing
One of the biggest cost-saving features in the OpenAI API is automatic input caching. When you send the same prompt prefix across multiple requests, cached tokens are served at a steep discount: 90% off standard input pricing on the GPT-5.4 and GPT-5.5 models, 75% off on o3 and o4-mini, and 50% off on GPT-4o.
| Model | Standard Input | Cached Input | Savings |
|---|---|---|---|
| GPT-5.5 | $5 / MTok | $0.50 / MTok | 90% |
| GPT-5.4 | $2.50 / MTok | $0.25 / MTok | 90% |
| GPT-5.4 mini | $0.75 / MTok | $0.075 / MTok | 90% |
| GPT-5.4 nano | $0.20 / MTok | $0.02 / MTok | 90% |
| o3 | $2 / MTok | $0.50 / MTok | 75% |
| o4-mini | $1.10 / MTok | $0.275 / MTok | 75% |
| GPT-4o | $2.50 / MTok | $1.25 / MTok | 50% |
Caching is automatic: no special configuration is needed. If your requests share common prefixes (system prompts, conversation history, reference documents), you'll see cache hits automatically. This makes repeated calls with the same context dramatically cheaper.
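How much caching saves depends on what share of your input tokens actually hits the cache. The sketch below computes the discount per cached token and the blended input price for a given hit rate. Both function names are hypothetical, and the 80% hit rate is an illustrative assumption, not a measured figure.

```javascript
// Fraction saved on a cached token versus a standard input token.
function cacheDiscount(standard, cached) {
  return 1 - cached / standard;
}

// Effective input price (USD per MTok) when a share of input tokens
// is served from cache. cacheHitRate is a fraction between 0 and 1.
function blendedInputPrice(standard, cached, cacheHitRate) {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  return cacheHitRate * cached + (1 - cacheHitRate) * standard;
}

// GPT-5.4 mini: $0.75 standard, $0.075 cached (from the table above).
const effective = blendedInputPrice(0.75, 0.075, 0.8);
console.log(effective.toFixed(3)); // 0.210

// GPT-4o: $2.50 standard, $1.25 cached.
console.log((cacheDiscount(2.5, 1.25) * 100).toFixed(0) + "%"); // 50%
```

Because caching matches on prompt prefixes, keeping the stable parts of a prompt (system instructions, reference documents) at the front is what pushes the hit rate up.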
Batch API pricing (50% off)
If your workload doesn't need real-time responses, the Batch API processes requests asynchronously at half the standard price:
| Model | Batch Input | Batch Cached Input | Batch Output |
|---|---|---|---|
| GPT-5.5 | $2.50 / MTok | $0.25 / MTok | $15 / MTok |
| GPT-5.4 | $1.25 / MTok | $0.125 / MTok | $7.50 / MTok |
| GPT-5.4 mini | $0.375 / MTok | $0.0375 / MTok | $2.25 / MTok |
| o3 | $1 / MTok | $0.25 / MTok | $4 / MTok |
| o4-mini | $0.55 / MTok | $0.1375 / MTok | $2.20 / MTok |
| GPT-4o | $1.25 / MTok | $0.625 / MTok | $5 / MTok |
| GPT-4o mini | $0.075 / MTok | $0.0375 / MTok | $0.30 / MTok |
The Batch API is ideal for bulk processing tasks like document analysis, data extraction, or content moderation where you can tolerate some latency (results typically returned within 24 hours).
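A batch job is submitted as a JSONL file with one request per line. The sketch below builds such a file for a ticket-classification job. The per-line fields (`custom_id`, `method`, `url`, `body`) follow OpenAI's batch input format as we understand it, while the helper name `toBatchJsonl`, the model ID `gpt-5.4-mini`, and the sample tickets are hypothetical placeholders. Check all of them against the current Batch API reference before relying on this.

```javascript
// Build the JSONL payload for a batch job: one JSON object per line.
function toBatchJsonl(tickets, model) {
  return tickets
    .map((ticket) =>
      JSON.stringify({
        // custom_id lets you match each result back to its input later.
        custom_id: `ticket-${ticket.id}`,
        method: "POST",
        url: "/v1/chat/completions",
        body: {
          model,
          messages: [
            { role: "system", content: "Classify the ticket as billing, bug, or other." },
            { role: "user", content: ticket.text },
          ],
        },
      })
    )
    .join("\n");
}

// Hypothetical sample data.
const tickets = [
  { id: 101, text: "I was charged twice this month." },
  { id: 102, text: "The export button crashes the app." },
];

// The model ID is an assumption; use the ID from the official model list.
const jsonl = toBatchJsonl(tickets, "gpt-5.4-mini");
console.log(jsonl.split("\n").length); // 2
```

Sharing one system prompt across every line keeps the file simple and also gives each request a common prefix.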
Priority and Flex tiers
OpenAI offers alternative processing tiers beyond the standard tier:
Priority tier (up to 2.5x standard pricing)
For workloads that need guaranteed availability and the lowest latency. Available for select models:
| Model | Priority Input | Priority Output |
|---|---|---|
| GPT-5.5 | $12.50 / MTok | $75 / MTok |
| GPT-4o | $4.25 / MTok | $17 / MTok |
Flex tier (variable pricing)
A lower-cost option with limited availability. Requests may be queued during peak times. Available for GPT-5.5, GPT-5.4, GPT-5, o3, and o4-mini.
Tool and feature pricing
Web search
Web search pricing depends on the model type:
| Model Type | Price |
|---|---|
| Reasoning models (o3, o4-mini, etc.) | $10 / 1,000 searches |
| Non-reasoning models (GPT-5.5, GPT-5.4, etc.) | $25 / 1,000 searches |
Standard token costs for search-generated content apply on top.
File search
| Component | Price |
|---|---|
| Storage | $0.10 / GB per day |
| Tool calls | $2.50 / 1,000 calls |
Containers (Code Interpreter)
Containers run code in isolated environments with configurable memory. Pricing is per 20-minute session:
| Memory | Price per session |
|---|---|
| 1 GB | $0.03 |
| 4 GB | $0.04 |
| 16 GB | $0.12 |
| 64 GB | $0.48 |
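The web search, file search, and container prices above can be folded into one estimator. This is a sketch under stated assumptions: the `toolCost` helper and its parameter names are made up for illustration, and it assumes every started 20-minute block of container time is billed as one full session, which the price list does not spell out. Token costs for the model itself come on top.

```javascript
// Prices in USD, as listed in the tables above.
const TOOL_PRICES = {
  webSearchPer1000: { reasoning: 10, nonReasoning: 25 },
  fileSearch: { storagePerGbDay: 0.1, per1000Calls: 2.5 },
  containerPerSession: { 1: 0.03, 4: 0.04, 16: 0.12, 64: 0.48 }, // keyed by GB of memory
};

function toolCost({
  searches = 0,
  modelType = "nonReasoning",
  storageGb = 0,
  storageDays = 0,
  fileSearchCalls = 0,
  containerMinutes = 0,
  containerMemoryGb = 1,
}) {
  const web = (searches / 1000) * TOOL_PRICES.webSearchPer1000[modelType];
  const storage = storageGb * storageDays * TOOL_PRICES.fileSearch.storagePerGbDay;
  const calls = (fileSearchCalls / 1000) * TOOL_PRICES.fileSearch.per1000Calls;
  // Assumption: each started 20-minute block counts as one session.
  const sessions = Math.ceil(containerMinutes / 20);
  const container = sessions * TOOL_PRICES.containerPerSession[containerMemoryGb];
  return { web, storage, calls, container, total: web + storage + calls + container };
}

const estimate = toolCost({
  searches: 2000,
  modelType: "reasoning",
  storageGb: 5,
  storageDays: 30,
  fileSearchCalls: 4000,
  containerMinutes: 45,
  containerMemoryGb: 4,
});
console.log(estimate.total.toFixed(2)); // 45.12
```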
Embeddings
| Model | Price |
|---|---|
| text-embedding-3-small | $0.02 / MTok |
| text-embedding-3-large | $0.13 / MTok |
Moderation
The moderation endpoint is free to use.
Image generation
OpenAI's image generation models (gpt-image-2, gpt-image-1) are priced per image based on quality and resolution. For current image generation pricing, see the image generation guide.
Audio and Realtime API
OpenAI offers audio capabilities including real-time voice conversations, text-to-speech, and transcription. Audio tokens are priced differently from text tokens. For current audio and Realtime API pricing, see the official pricing page.
Data residency pricing
Regional processing endpoints (data residency) incur a 10% uplift on standard pricing for select models including GPT-5.5, GPT-5.4, and GPT-5.4 Pro variants.
Rate limits
OpenAI uses a tiered system based on your usage and payment history:
| Tier | How to Qualify |
|---|---|
| Free | Default for new accounts (limited usage) |
| Tier 1 | $5+ paid |
| Tier 2 | $50+ paid, 7+ days since first payment |
| Tier 3 | $100+ paid, 7+ days since first payment |
| Tier 4 | $250+ paid, 14+ days since first payment |
| Tier 5 | $1,000+ paid, 30+ days since first payment |
Rate limits apply to requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD). Higher tiers unlock significantly more capacity.
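The qualification rules in the table reduce to a simple lookup: find the highest tier whose spend and account-age thresholds are both met. The function name `usageTier` is hypothetical, and the thresholds are copied from the table above, so treat this as a sketch rather than OpenAI's actual logic.

```javascript
// Thresholds from the table above, checked from the highest tier down.
const TIERS = [
  { name: "Tier 5", minPaid: 1000, minDays: 30 },
  { name: "Tier 4", minPaid: 250, minDays: 14 },
  { name: "Tier 3", minPaid: 100, minDays: 7 },
  { name: "Tier 2", minPaid: 50, minDays: 7 },
  { name: "Tier 1", minPaid: 5, minDays: 0 },
];

function usageTier(paidUsd, daysSinceFirstPayment) {
  const match = TIERS.find(
    (tier) => paidUsd >= tier.minPaid && daysSinceFirstPayment >= tier.minDays
  );
  return match ? match.name : "Free";
}

console.log(usageTier(0, 0)); // Free
console.log(usageTier(60, 3)); // Tier 1
console.log(usageTier(1200, 10)); // Tier 3
```

The last call shows why spend alone is not enough: $1,200 paid only 10 days after the first payment meets the Tier 3 rule but not the 14-day or 30-day waits for Tiers 4 and 5.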
Billing and payment
- Billed monthly based on actual usage
- Payments in USD
- Credit card required for paid tiers
- Free tier available with limited usage for testing
- Prepaid credits available for enterprise customers
- Usage tracking available in the OpenAI Platform dashboard
Tips to reduce your OpenAI API costs
If you're using the OpenAI API directly, here are practical ways to keep your bill down:
Start with nano or mini models. Many tasks don't need GPT-5.5. GPT-5.4 nano at $0.20/$1.25 per MTok handles classification, extraction, and simple Q&A well. Only upgrade when quality demands it.
Leverage cached input. OpenAI automatically caches repeated prompt prefixes at up to 90% off. Structure your requests with consistent system prompts and shared context to maximize cache hits.
Batch non-urgent work. Document processing, data extraction, and analytics can use the Batch API at 50% off. If you don't need results in real-time, there's no reason to pay full price.
Trim your inputs. Every token costs money. Remove unnecessary conversation history, compress system prompts, and avoid sending entire documents when a relevant excerpt will do.
Set max output tokens. Use the max_tokens parameter to cap response length and keep costs predictable.
Use reasoning models only when needed. o3 and o4-mini are great for math and logic, but overkill for simple tasks. Route requests to the cheapest model that can handle them.
Monitor usage in the dashboard. OpenAI's usage page shows token consumption by model, so you can spot unexpected spikes before they become expensive surprises.
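As one way to apply the "trim your inputs" tip, the sketch below keeps the system prompt plus as many of the most recent turns as fit in a token budget. The helper names `approxTokens` and `trimHistory` are made up for illustration, and token counts use the rough 4-characters-per-token heuristic from earlier in this guide, so swap in a real tokenizer if you need accurate limits.

```javascript
// Rough heuristic: about 4 characters per token.
const approxTokens = (text) => Math.ceil(text.length / 4);

// Keep all system messages, then add turns from newest to oldest
// until the budget runs out. Messages use { role, content } objects.
function trimHistory(messages, maxInputTokens) {
  const system = messages.filter((m) => m.role === "system");
  const turns = messages.filter((m) => m.role !== "system");

  let budget =
    maxInputTokens - system.reduce((sum, m) => sum + approxTokens(m.content), 0);

  const kept = [];
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i].content);
    if (cost > budget) break; // older turns are dropped from here on
    budget -= cost;
    kept.unshift(turns[i]); // preserve chronological order
  }
  return [...system, ...kept];
}

const history = [
  { role: "system", content: "Be brief." },
  { role: "user", content: "a".repeat(40) },
  { role: "assistant", content: "b".repeat(40) },
  { role: "user", content: "c".repeat(20) },
];

const trimmed = trimHistory(history, 20);
console.log(trimmed.map((m) => m.role).join(", ")); // system, assistant, user
```

Dropping old turns changes what the model can see, so this trade-off suits chat apps where recent context matters most.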
The free alternative: Puter.js
If you're a developer building an app that uses OpenAI models, there's a way to skip all of the above: no API keys, no billing setup, no rate limit management, and no cost to you.
Puter.js is a JavaScript SDK that gives you access to OpenAI and 400+ other AI models directly from your frontend code, including chat, text-to-speech, image generation, and more. It uses a "User-Pays" model: each user of your app covers their own AI usage through their Puter account. You, the developer, pay nothing.
Here's what that means in practice:
| | OpenAI API (Direct) | Puter.js |
|---|---|---|
| Cost to developer | Pay per token | Free |
| API key required | Yes | No |
| Billing setup | Credit card required | None |
| Rate limits | Per-organization tiers | Per-user (handled by Puter) |
| Backend required | Yes (to protect your key) | No |
| Models available | OpenAI only | GPT + Claude + Gemini + 500 more |
| Capabilities | Chat, TTS, images, etc. (separate APIs) | Chat, TTS, image generation, and more in one unified SDK |
Try it now
Add one script tag to your HTML and start using GPT immediately:
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat("Explain quantum computing in simple terms", {
      model: "openai/gpt-5.4-nano"
    }).then(response => {
      document.body.innerHTML = response.message.content[0].text;
    });
  </script>
</body>
</html>
No API key. No backend. No billing. You can also use GPT-5.5, GPT-5.4, o3, o4-mini, and every other OpenAI model the same way.
You can also stream responses for a better user experience:
<html>
<body>
  <div id="output"></div>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    async function streamResponse() {
      const response = await puter.ai.chat("Write a short poem about coding", {
        model: "openai/gpt-5.4-nano",
        stream: true
      });
      const output = document.getElementById('output');
      for await (const chunk of response) {
        if (chunk?.text) {
          output.textContent += chunk.text;
        }
      }
    }
    streamResponse();
  </script>
</body>
</html>
Why developers choose Puter.js over direct API access
- $0 infrastructure cost: Your users pay for their own usage, so your app costs nothing to run regardless of scale
- No API key management: No keys to rotate, no secrets to protect, no backend needed to hide them
- No rate limit headaches: Each user has their own limits, so one user's traffic never blocks another's
- More than just chat: Text-to-speech, image generation, and other AI capabilities are all available through the same SDK, no juggling separate APIs
- Access every AI provider: Switch between GPT, Claude, Gemini, Grok, DeepSeek, and more with one line of code, no separate accounts or billing for each
- Ship faster: Go from idea to production in minutes, not days of billing setup and backend configuration