Grok API Pricing: Full Breakdown of Costs (Jun 2026)
In this guide, you'll learn what every current Grok model costs, how the billing works (including a few fees that aren't obvious), ways to cut your bill, and the options for using Grok completely free.
How much does the Grok API cost?
xAI's flagship model, Grok 4.3, costs $1.25 per 1 million input tokens and $2.50 per 1 million output tokens, with a 1 million token context window. That output price is a small fraction of what OpenAI, Anthropic, or Google charge for their flagship models, which makes Grok one of the cheapest frontier-class APIs available.
Here's the quick view of the current lineup:
| Model | Best for | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Grok 4.3 | Flagship: reasoning, agentic tool calling, low hallucination rate | $1.25 | $2.50 |
| Grok 4.20 (reasoning / non-reasoning / multi-agent) | Previous flagship family, same price | $1.25 | $2.50 |
| Grok Build 0.1 | Coding and app building | $1.00 | $2.00 |
| Grok Imagine (image) | Image generation | $0.002–$0.01 / image | $0.02–$0.07 / image |
| Grok Imagine (video) | Video generation | — | $0.05–$0.14 / second |
| Voice API (realtime) | Real-time voice agents | — | $0.05 / minute |
All pricing in this article is sourced from xAI's official docs.
Note that this is the API, not the subscriptions. SuperGrok and X Premium are consumer plans for the Grok chat interface and the X app; they don't include API access, and the API doesn't require them. The API is a separate, pay-as-you-go developer product billed per token through the xAI Console.
How Grok API pricing works
The Grok API uses per-token, pay-as-you-go billing: you pay for the tokens you send (input) and the tokens the model generates (output). A token is a chunk of text, roughly three-quarters of an English word, so 1 million tokens is about 750,000 words. The API is compatible with the OpenAI SDK, so if you're migrating from GPT models you can reuse your client code and the same per-token estimation method, though exact token counts will differ somewhat because the models use different tokenizers.
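The per-token arithmetic is easy to script. Below is a minimal sketch, assuming the Grok 4.3 standard rates quoted above and ignoring caching, tools, and regional pricing. `RATES`, `estimateCost`, and `wordsToTokens` are illustrative names, not part of any xAI SDK, and the words-to-tokens conversion is only the rough rule of thumb from this section.

```javascript
// Grok 4.3 standard rates from this article, in USD per 1M tokens.
const RATES = { input: 1.25, output: 2.5 };

// Cost = (tokens / 1,000,000) × per-million rate, computed separately for input and output.
function estimateCost(inputTokens, outputTokens, rates = RATES) {
  const input = (inputTokens / 1_000_000) * rates.input;
  const output = (outputTokens / 1_000_000) * rates.output;
  return { input, output, total: input + output };
}

// Rough rule of thumb: one token is about 0.75 English words.
function wordsToTokens(words) {
  return Math.ceil(words / 0.75);
}

// Example: a 1,500-word prompt that produces a 600-word answer.
const cost = estimateCost(wordsToTokens(1500), wordsToTokens(600));
console.log(cost.total.toFixed(4)); // → 0.0045
```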
A few structural points worth knowing:
Output costs only 2x input. Grok 4.3's $1.25-in/$2.50-out ratio is unusually narrow; most providers charge 4–6x more for output than input. Practically, this means generation-heavy workloads (long answers, content creation, verbose agents) are where Grok's pricing advantage is largest.
Reasoning tokens are billed. Grok's models can run in reasoning mode, where the model generates internal thinking tokens before answering. Those reasoning tokens are billed like any other generated tokens, even though you don't see them in the final response. The Grok 4.20 family makes the trade-off explicit with separate reasoning and non-reasoning variants at the same per-token price. The non-reasoning variant generates far fewer tokens per request, which makes it cheaper in practice for simple tasks.
No long-context surcharge. Grok 4.3's published rate applies across its full 1M-token context window; there's no separate, more expensive long-context schedule of the kind OpenAI uses. For workloads that routinely send hundreds of thousands of tokens per request, this flat pricing is a structural advantage, not just a lower sticker price.
Pricing can vary by region. xAI serves models from multiple clusters, and individual model pages in the docs list region-based pricing. If you're pinning requests to a specific data residency region, check the model's detail page rather than assuming the headline rate.
What can push your bill higher
Four things commonly push a real Grok bill above a simple token estimate.
Server-side tools charge per invocation
Grok's agentic tools each cost money per invocation on top of the tokens they consume. Current rates: Web Search, X Search, and Code Execution each cost $5.00 per 1,000 calls; File Attachments search costs $10.00 per 1,000 calls; and Collections Search (RAG) costs $2.50 per 1,000 calls. In agentic requests, the model decides how many tool calls to make. A single complex query might trigger five searches and two code executions, so tool costs scale with query complexity, not request count. The retrieved content also flows back into the model as billed input tokens.
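To see how invocation fees scale with query complexity, here's a sketch that prices one agentic request from a tally of tool calls, assuming the per-1,000-call rates listed above. The tool keys and the `toolCost` helper are made-up names for illustration, not API identifiers, and only invocation fees are counted; the retrieved content's input tokens would be billed on top.

```javascript
// USD per 1,000 invocations, as quoted in this article. Keys are illustrative labels.
const TOOL_RATES_PER_1K = {
  webSearch: 5.0,
  xSearch: 5.0,
  codeExecution: 5.0,
  fileAttachmentsSearch: 10.0,
  collectionsSearch: 2.5,
};

// Sum the invocation fees for one request. Token costs for retrieved content are NOT included.
function toolCost(calls) {
  let total = 0;
  for (const [tool, count] of Object.entries(calls)) {
    const rate = TOOL_RATES_PER_1K[tool];
    if (rate === undefined) throw new Error(`Unknown tool: ${tool}`);
    total += (count / 1000) * rate;
  }
  return total;
}

// The "complex query" above: five searches and two code executions.
const perQuery = toolCost({ webSearch: 5, codeExecution: 2 });
console.log(perQuery.toFixed(3)); // → 0.035
// Over 10,000 such queries a month, invocation fees alone would be about $350.
console.log((perQuery * 10_000).toFixed(2));
```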
Reasoning mode multiplies output tokens
A short visible answer can sit on top of thousands of billed reasoning tokens. If you're seeing higher-than-expected output token counts in your usage data, reasoning mode is the first place to look.
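To see how large that multiplier can get, price the visible answer and the hidden reasoning together. This sketch assumes you've already read both counts from the response's usage data; OpenAI-compatible responses commonly report reasoning tokens under `usage.completion_tokens_details.reasoning_tokens`, but treat that field path as an assumption and confirm it in xAI's API reference. The `outputCost` helper and the token counts in the example are hypothetical.

```javascript
// Grok 4.3 output rate in USD per 1M tokens (from this article).
const OUTPUT_RATE_PER_1M = 2.5;

// Reasoning tokens are billed at the output rate even though they never appear in the reply.
// visibleTokens and reasoningTokens are assumed to come from the response's usage data.
function outputCost(visibleTokens, reasoningTokens) {
  const billedTokens = visibleTokens + reasoningTokens;
  return {
    billedTokens,
    cost: (billedTokens / 1_000_000) * OUTPUT_RATE_PER_1M,
    // How many times larger the billed output is than the visible answer alone.
    multiplier: billedTokens / visibleTokens,
  };
}

// Hypothetical request: a 200-token answer on top of 3,000 reasoning tokens.
const r = outputCost(200, 3000);
console.log(r.billedTokens, r.multiplier); // → 3200 16
```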
Storage and downloads are billed separately
Files stored on the platform cost $0.025 per GiB per day, and indexed collections (for RAG) cost 4x that at $0.10 per GiB per day. Downloading your own data back out costs $0.20 per GiB. None of these are large numbers, but a large RAG index left in place for months adds a recurring line to your bill.
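Because storage is billed per GiB per day, the recurring line is easy to project. A minimal sketch using the rates above; `storageCost` is an illustrative helper, and the 20 GiB collection, 90-day retention, and single full export are hypothetical inputs.

```javascript
// USD rates quoted in this article.
const FILE_STORAGE_PER_GIB_DAY = 0.025;
const COLLECTION_STORAGE_PER_GIB_DAY = 0.1;
const DOWNLOAD_PER_GIB = 0.2;

// Projected bill: size × daily rate × days kept, plus any one-off downloads.
function storageCost({ fileGiB = 0, collectionGiB = 0, days = 30, downloadGiB = 0 }) {
  const files = fileGiB * FILE_STORAGE_PER_GIB_DAY * days;
  const collections = collectionGiB * COLLECTION_STORAGE_PER_GIB_DAY * days;
  const downloads = downloadGiB * DOWNLOAD_PER_GIB;
  return files + collections + downloads;
}

// A 20 GiB RAG collection left indexed for 90 days, then exported once.
console.log(storageCost({ collectionGiB: 20, days: 90, downloadGiB: 20 }).toFixed(2)); // → 184.00
```

Of that total, $180 is the index simply sitting in place, which is why unused collections are worth cleaning up.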
The usage guidelines violation fee
This fee is specific to xAI: if a request violates the usage guidelines and gets blocked before generation, you're charged a $0.05 fee per blocked request, and requests flagged after generation are still billed for the tokens generated. For most developers this never comes up, but if you're running user-generated prompts at scale, moderating input on your side has a direct dollar value.
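To put a number on that, here's a sketch that estimates monthly pre-generation block fees from request volume and block rate. `blockFees` is an illustrative helper, the 200,000-request volume and 1% block rate are hypothetical, and tokens billed for post-generation flags are not modeled.

```javascript
const BLOCK_FEE = 0.05; // USD per request blocked before generation

// Expected monthly fees = requests × fraction blocked × per-block fee.
function blockFees(monthlyRequests, blockedFraction) {
  return monthlyRequests * blockedFraction * BLOCK_FEE;
}

// 200,000 user-generated prompts a month with 1% blocked: roughly $100 in fees,
// which is what catching those prompts on your side first would avoid.
console.log(blockFees(200_000, 0.01).toFixed(2));
```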
How to reduce Grok API costs
Grok's lineup is nearly flat-priced (the flagship and the 4.20 family all cost the same), so unlike most providers, "use a smaller model" isn't the main lever here. These are, in order of impact:
1. Turn off reasoning for simple tasks
Since all the chat models cost the same per token, the number of tokens generated is what drives the cost difference, and reasoning mode is the largest multiplier on that number. Use the non-reasoning variant (or disable reasoning per request) for classification, extraction, formatting, and straightforward chat, and save reasoning mode for problems that need it. For many workloads, this cuts output costs by half or more with no quality loss on the easy cases.
2. Use the Batch API for anything that can wait
The Batch API processes requests asynchronously, typically within 24 hours, at 20–50% off standard token rates, applied to every token type including reasoning and cached tokens. Batch requests also don't count against your rate limits, which makes it useful for bulk jobs like evaluations, dataset processing, and content moderation backlogs. The discount applies to text models only; image and video generation can run through Batch but at standard rates. Each model's exact batch price is on its detail page in the docs.
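Because the exact batch discount depends on the model, it's worth computing both ends of the 20–50% band. This sketch assumes the discount applies uniformly to the whole token bill, as described above; `batchCostRange` is a hypothetical helper, and the example reuses the 100-PDF workload from the cost examples later in this article.

```javascript
// Batch pricing bounds: 20% to 50% off the standard token bill (text models only).
function batchCostRange(standardCost, minDiscount = 0.2, maxDiscount = 0.5) {
  return {
    best: standardCost * (1 - maxDiscount), // cheapest case: largest discount
    worst: standardCost * (1 - minDiscount), // most expensive case: smallest discount
  };
}

// 100 PDFs: 2M input + 50K output tokens at Grok 4.3 standard rates (≈ $2.63).
const standard = (2_000_000 / 1e6) * 1.25 + (50_000 / 1e6) * 2.5;
const { best, worst } = batchCostRange(standard);
console.log(best.toFixed(2), worst.toFixed(2)); // → 1.31 2.10
```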
3. Structure prompts for caching
Cached input drops from $1.25 to $0.20 per 1M tokens, an 84% discount. Caching applies to repeated prompt prefixes, so put your static content (system prompt, instructions, shared documents) at the start of the prompt and the variable user content at the end. For chatbots and agents that resend a long system prompt on every request, this cuts input costs substantially.
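The sketch below illustrates both halves of that advice: keeping the static prefix identical at the start of every request, and estimating the blended input price for a given cached share. The OpenAI-style `messages` array follows the compatible chat format, but `buildMessages`, `inputCost`, and the 70% cached share are assumptions for illustration; in practice the cached share is whatever your usage data reports.

```javascript
const INPUT_RATE = 1.25; // USD per 1M uncached input tokens (Grok 4.3)
const CACHED_INPUT_RATE = 0.2; // USD per 1M cached input tokens

// Static content first, variable content last, so consecutive requests share a prefix.
function buildMessages(systemPrompt, sharedDocs, userMessage) {
  return [
    { role: "system", content: systemPrompt + "\n\n" + sharedDocs }, // identical on every request
    { role: "user", content: userMessage }, // changes on every request
  ];
}

// Blended input cost when a fraction of input tokens is served from cache.
function inputCost(inputTokens, cachedFraction) {
  const cached = inputTokens * cachedFraction;
  const uncached = inputTokens - cached;
  return (uncached / 1e6) * INPUT_RATE + (cached / 1e6) * CACHED_INPUT_RATE;
}

// The support-chatbot workload from the cost examples: 9.6M input tokens a month.
console.log(inputCost(9_600_000, 0).toFixed(2)); // no caching
console.log(inputCost(9_600_000, 0.7).toFixed(2)); // assumed 70% of input cached
```

Under that assumed 70% cached share, the chatbot's monthly input cost falls from $12.00 to roughly $4.94.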
4. Constrain tool use in agentic requests
Because the model autonomously decides how many $5-per-thousand tool calls to make, unconstrained agentic prompts are the least predictable line in a Grok bill. Scope your prompts ("search at most twice"), only enable the tools a request actually needs, and monitor per-request costs. The API returns exact cost data with each response, and the console's cost tracking breaks down spend by tool.
5. Cap output length
The standard advice still applies: set output limits and ask for concise or structured responses. With output at only 2x input, this matters less on Grok than on other providers, but at high volume it still adds up.
Can you use the Grok API for free?
There's no permanent free tier on the xAI API itself, but there are two ways to use Grok models without paying, and the first scales to any number of users.
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add Grok models (including Grok 4.3, Grok 4.20, and Grok Build) to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your cost stays at zero regardless of how many users you have.
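<!DOCTYPE html>
<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        // No API key or backend: each user's Puter account covers their own usage
        puter.ai.chat(
            "Explain quantum computing in a witty and engaging way.",
            { model: "x-ai/grok-4.3" }
        ).then(response => {
            // textContent displays the reply as plain text instead of parsing model output as HTML
            document.body.textContent = response.message.content;
        });
    </script>
</body>
</html>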
We calculated this for a sample app: 500 monthly users sending 30 messages each, averaging 1,000 input and 300 output tokens per message, consumes 15M input and 4.5M output tokens a month. Through the xAI API on Grok 4.3, our estimate comes to about $18.75 for input and $11.25 for output, roughly $30 every month, scaling linearly as you grow. The same app through Puter.js costs you $0 at 500 users and still $0 at 50,000 users (where we calculate the API bill would have reached ~$3,000/month), because each user carries their own usage.
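The sample-app figures follow from the same per-token arithmetic. This sketch reproduces them on Grok 4.3's standard rates and shows the linear growth with user count; `monthlyApiBill` and its parameter names are illustrative.

```javascript
const RATES = { input: 1.25, output: 2.5 }; // Grok 4.3, USD per 1M tokens

// Monthly API bill if the developer pays for every user's usage.
function monthlyApiBill({ users, messagesPerUser, inputPerMsg, outputPerMsg }) {
  const messages = users * messagesPerUser;
  const inputTokens = messages * inputPerMsg;
  const outputTokens = messages * outputPerMsg;
  return (inputTokens / 1e6) * RATES.input + (outputTokens / 1e6) * RATES.output;
}

const app = { messagesPerUser: 30, inputPerMsg: 1000, outputPerMsg: 300 };
for (const users of [500, 5_000, 50_000]) {
  // Under User-Pays, the developer's own cost stays at $0 whatever this figure is.
  console.log(users, monthlyApiBill({ users, ...app }).toFixed(2));
}
// Prints: 500 30.00, then 5000 300.00, then 50000 3000.00
```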
OpenRouter's free endpoints
OpenRouter periodically offers free variants of Grok models (tagged :free), useful for testing without any account spend. The limits make it a testing option only: 50 requests per day and 20 per minute on a free account, rising to 1,000 per day once you've purchased at least $10 in credits. Free endpoints also get rate-limited by upstream providers during peak times, and failed requests still count against your daily quota, so prototype on it but don't ship on it.
Real-world cost examples
We worked through a few common workloads using the same method each time: tokens per request × volume × the per-million rate, input and output calculated separately.
Customer support chatbot. 1,000 conversations a month, averaging 8 messages each, with ~1,200 input tokens (system prompt plus history) and 250 output tokens per message, so 9.6M input and 2M output tokens monthly. We calculate this at $12.00 input + $5.00 output ≈ $17/month on the flagship model. For comparison, we estimate the identical workload at ~$108/month on OpenAI's flagship. Add prompt caching (that repeated system prompt) and it drops to about $10.
Summarizing 100 PDFs. At ~20,000 tokens per document with 500-token summaries: 2M input, 50K output. On Grok 4.3, our calculation comes to about $2.63. Through the Batch API: roughly $1.30–$2.10 depending on the model's batch discount.
Daily content generation. 30 articles a month with 2,000-token prompts and ~4,000 output tokens each (reasoning included): we estimate about $0.38 a month. Effectively free.
Across all three, we found the same pattern: on Grok, low-to-mid volume workloads cost well under $20 a month, and two of the three cost less than $3.
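Each estimate above can be reproduced with the stated method. A sketch, assuming Grok 4.3 standard rates and the workload figures given in each example; caching, batching, and the OpenAI comparison are not modeled, and the `workloads` keys are just labels.

```javascript
const RATE = { input: 1.25, output: 2.5 }; // Grok 4.3, USD per 1M tokens

// Tokens per request × volume × per-million rate, input and output calculated separately.
function workloadCost({ requests, inputPerReq, outputPerReq }) {
  const input = ((requests * inputPerReq) / 1e6) * RATE.input;
  const output = ((requests * outputPerReq) / 1e6) * RATE.output;
  return { input, output, total: input + output };
}

const workloads = {
  supportChatbot: { requests: 1000 * 8, inputPerReq: 1200, outputPerReq: 250 }, // 1,000 chats × 8 messages
  pdfSummaries: { requests: 100, inputPerReq: 20_000, outputPerReq: 500 },
  contentGeneration: { requests: 30, inputPerReq: 2000, outputPerReq: 4000 },
};

for (const [name, w] of Object.entries(workloads)) {
  console.log(name, workloadCost(w).total.toFixed(3));
}
```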
Complete Grok API pricing table
All prices in USD, standard real-time rates.
Chat models (per 1M tokens)
| Model | Context | Input | Cached input | Output |
|---|---|---|---|---|
| Grok 4.3 | 1M | $1.25 | $0.20 | $2.50 |
| Grok 4.20 (reasoning) | 1M | $1.25 | $0.20 | $2.50 |
| Grok 4.20 (non-reasoning) | 1M | $1.25 | $0.20 | $2.50 |
| Grok 4.20 (multi-agent) | 1M | $1.25 | $0.20 | $2.50 |
| Grok Build 0.1 | 256K | $1.00 | $0.20 | $2.00 |
Image and video (Grok Imagine)
| Model | Input | Output |
|---|---|---|
| Image (quality) | $0.01 / image | $0.05 / image (1K), $0.07 / image (2K) |
| Image (standard) | $0.002 / image | $0.02 / image |
| Video | $0.01 / sec + $0.002 / image | $0.05 / sec (480p), $0.07 / sec (720p) |
| Video 1.5 preview (image-to-video) | $0.01 / image | $0.08 / sec (480p), $0.14 / sec (720p) |
Voice
| Mode | Price |
|---|---|
| Realtime voice | $0.05 / minute ($3.00 / hour) |
| Realtime text input | $0.004 / message |
| Text to speech | $15.00 / 1M characters |
| Speech to text | $0.10 / hour (REST), $0.20 / hour (streaming) |
Tools (per invocation, plus token costs)
| Tool | Price |
|---|---|
| Web Search | $5.00 / 1K calls |
| X Search | $5.00 / 1K calls |
| Code Execution | $5.00 / 1K calls |
| File Attachments search | $10.00 / 1K calls |
| Collections Search (RAG) | $2.50 / 1K calls |
| Image/video understanding, Remote MCP | Token-based only |
Storage and other fees
| Item | Price |
|---|---|
| File storage | $0.025 / GiB / day |
| Collection storage | $0.10 / GiB / day |
| File or collection downloads | $0.20 / GiB |
| Usage guidelines violation (pre-generation block) | $0.05 / request |
| Batch API | 20–50% off standard token rates |
xAI retired most legacy models (including the Grok 4 and Grok 3 families) in May 2026, with requests redirecting to current models, so if you're carrying old model names in your code, check the migration guide and the console's models page for what your team can actually access.
Conclusion
Grok 4.3 costs $1.25 per million input tokens and $2.50 per million output tokens, with no long-context surcharge across its 1M-token window. The rest of the lineup is priced within the same range, so the model you choose rarely changes the bill much.
Because the lineup is nearly flat-priced, picking a smaller model isn't the main way to cut costs. The levers are:
- Toggling reasoning off for simple tasks
- Caching repeated prompts for 84% off
- Batching non-urgent work for another 20–50%
- Constraining agentic tool calls
Prices were verified against xAI's official docs. Models retire and prices shift, so confirm current rates in the xAI Console before committing to a budget.