ElevenLabs API Pricing: Full Breakdown of Costs (Jun 2026)
This guide breaks down what the ElevenLabs API costs across every product, what drives your bill, how to lower it, and how to use it for free (see also our free ElevenLabs API tutorial).
How much does the ElevenLabs API cost?
ElevenLabs bills text to speech per character: its high-quality Multilingual v2/v3 model costs $0.10 per 1,000 characters, and the faster Flash/Turbo model costs $0.05 per 1,000 characters, which is the cheapest way to generate speech on the platform. The rest of the catalog is priced by audio duration (per minute or per hour) or per generation, because ElevenLabs is a suite of voice products rather than a single model.
| Product | Model | Price | Unit |
|---|---|---|---|
| Text to Speech | Flash / Turbo | $0.05 | per 1,000 characters |
| Text to Speech | Multilingual v2/v3 | $0.10 | per 1,000 characters |
| Speech to Text | Scribe v1/v2 | $0.22 | per hour |
| Speech to Text | Scribe v2 Realtime | $0.39 | per hour |
| Agents | Speech Engine | $0.08 | per minute |
| Music | Eleven Music | $0.30 | per minute |
| Voice Changer | — | $0.12 | per minute |
| Voice Isolator | — | $0.12 | per minute |
| Sound Effects | — | $0.12 | per generation |
| Dubbing | Dubbing v1 | $0.33 | per minute (watermarked) |
Prices exclude tax. The per-unit rate is the same on every plan, from Free to Business: paid tiers bundle more included usage and add features, but they do not lower the rate.
ElevenCreative, ElevenAPI, and ElevenAgents are different products
ElevenLabs sells three things that share one credit system, and people building apps often pay for the wrong one:
- ElevenCreative is the consumer subscription (web app, Studio, voice cloning), measured in monthly credits.
- ElevenAPI is the developer interface billed per character, minute, or generation. This is what this article covers.
- ElevenAgents (the Speech Engine) is the real-time conversational layer billed per minute.
A Creator or Pro subscription gives you credits you can spend through the API, but you don't need one to call the API: pay-as-you-go works on its own.
How ElevenLabs API pricing works
ElevenLabs does not bill by token. There are no tokens, no input/output split, and no context window. You pay for the audio you produce or process.
Characters, not tokens
Text to speech is billed on the number of characters you send to be synthesized. The text you submit is the meter: a 500-character paragraph costs 500 characters' worth of generation regardless of how long the resulting audio is. In the consumer credit system, one character equals one credit on Multilingual, and Flash costs roughly half a credit per character, which is why Flash comes out at $0.05 against Multilingual's $0.10 per 1,000 characters.
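As a rough illustration of character-based metering, the sketch below estimates text-to-speech cost from the length of the submitted text. The rates come from the table in this article; the model labels `flash` and `multilingual` are illustrative keys rather than ElevenLabs model IDs, and using JavaScript string length as the character count is an assumption about how characters are counted.

```javascript
// Rates from this article's pricing table, in US cents per 1,000 characters.
// Keys are illustrative labels, not official model IDs.
const TTS_CENTS_PER_1K = { flash: 5, multilingual: 10 };

// Estimated cost in US dollars for synthesizing `text`.
// Assumption: one JavaScript string unit counts as one billed character.
function estimateTtsCost(text, model = "multilingual") {
  if (!Object.hasOwn(TTS_CENTS_PER_1K, model)) {
    throw new Error(`Unknown model: ${model}`);
  }
  return (text.length * TTS_CENTS_PER_1K[model]) / 100000;
}

const paragraph = "a".repeat(500); // stand-in for a 500-character paragraph
console.log(estimateTtsCost(paragraph, "multilingual")); // 0.05
console.log(estimateTtsCost(paragraph, "flash"));        // 0.025
```

Working in integer cents and dividing once at the end keeps the arithmetic free of accumulated floating-point error.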
Audio duration for everything that isn't text to speech
Speech to text is billed per hour of audio; dubbing, music, voice changer, and voice isolator are billed per minute of audio processed or produced. The agents Speech Engine is billed per minute of conversation. Sound effects are billed per generation. This matters when you plan a budget: a one-minute clip costs the same to transcribe whether it contains ten words or two hundred.
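A minimal sketch of duration-based metering follows. The rates are the ones listed in this article; the product keys are illustrative labels, and the linear per-second prorating is an assumption, since the article does not say how partial minutes or hours are rounded.

```javascript
// Rates from this article's pricing table, in US cents per billing unit.
// Keys are illustrative labels, not ElevenLabs identifiers.
const DURATION_RATES = {
  scribe:         { cents: 22, unitSeconds: 3600 }, // per hour
  scribeRealtime: { cents: 39, unitSeconds: 3600 }, // per hour
  speechEngine:   { cents: 8,  unitSeconds: 60 },   // per minute
  music:          { cents: 30, unitSeconds: 60 },   // per minute
  voiceChanger:   { cents: 12, unitSeconds: 60 },   // per minute
  voiceIsolator:  { cents: 12, unitSeconds: 60 },   // per minute
};

// Estimated cost in US dollars for `seconds` of audio.
// Assumption: usage is prorated linearly by the second.
function estimateDurationCost(product, seconds) {
  if (!Object.hasOwn(DURATION_RATES, product)) {
    throw new Error(`Unknown product: ${product}`);
  }
  const rate = DURATION_RATES[product];
  return (seconds * rate.cents) / (rate.unitSeconds * 100);
}

// Word count is not an input: only the clip length matters.
console.log(estimateDurationCost("scribe", 3600));      // 0.22
console.log(estimateDurationCost("speechEngine", 180)); // 0.24
```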
Subscription credits versus pay as you go
You can pay two ways. A subscription (Starter at $6/month through Business at $990/month) gives you a monthly credit allowance at a fixed price, with unused credits rolling over for up to two months while the subscription stays active. Pay-as-you-go charges the per-unit rates above with no monthly commitment. The per-unit price is identical either way, so the subscription is worth it only if you reliably use the bundled allowance.
The model you pick sets the rate
On text to speech, Flash and Turbo cost half what Multilingual costs. Flash and Turbo are functionally equivalent, and ElevenLabs recommends Flash over Turbo in every case. On speech to text, the batch Scribe v2 model at $0.22/hour costs less than Scribe v2 Realtime at $0.39/hour. Picking the cheaper model where it meets your quality bar is where most cost control starts.
What makes your bill higher than expected
Multilingual costs twice as much as Flash
The default text-to-speech model in many examples is Multilingual v2 at $0.10 per 1,000 characters. Flash synthesizes the same text for $0.05 per 1,000 characters. If you default to Multilingual for content that does not need its quality, you are paying double.
Scribe add-ons stack on the base rate
Scribe transcription starts at $0.22/hour, but two features add to that. Keyterm prompting adds $0.05/hour (about 23% over the base rate), and entity detection, which is API-only, adds $0.07/hour (about 32%). Enable both and your effective rate is about $0.34/hour, roughly 55% above the base.
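The stacking can be checked with a few lines of arithmetic. This is only a sketch using the dollar figures quoted above; the add-on names `keyterms` and `entities` are illustrative labels, not API parameter names.

```javascript
// Figures from this article, in US cents per hour of audio.
const SCRIBE_BASE_CENTS = 22;
const SCRIBE_ADDON_CENTS = { keyterms: 5, entities: 7 }; // illustrative labels

// Effective hourly rate in cents once the chosen add-ons are stacked on the base.
function scribeCentsPerHour(addons = []) {
  let total = SCRIBE_BASE_CENTS;
  for (const name of new Set(addons)) { // ignore accidental duplicates
    if (!Object.hasOwn(SCRIBE_ADDON_CENTS, name)) {
      throw new Error(`Unknown add-on: ${name}`);
    }
    total += SCRIBE_ADDON_CENTS[name];
  }
  return total;
}

// Estimated cost in US dollars for `hours` of batch transcription.
function scribeCost(hours, addons = []) {
  return (hours * scribeCentsPerHour(addons)) / 100;
}

console.log(scribeCentsPerHour(["keyterms", "entities"])); // 34
console.log(scribeCost(100));                               // 22
console.log(scribeCost(100, ["keyterms", "entities"]));     // 34
```

At 100 hours a month, enabling both add-ons moves the bill from $22 to $34.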
Watermark-free dubbing costs more
Dubbing is $0.33 per source minute with a watermark on the output. Removing the watermark raises the rate to $0.50 per minute, about 50% more. If you are dubbing content for commercial release, budget for the clean rate, not the headline one.
Realtime transcription carries a premium
Scribe v2 Realtime at $0.39/hour costs about 77% more than batch Scribe at $0.22/hour. If your workload is recorded audio that does not need live results, batch transcription is the cheaper path.
Concurrency limits push you up a tier
Higher subscription tiers raise how many requests you can run in parallel. A workload that needs to process many files or calls at once can hit a concurrency ceiling on a lower plan even when you have credits left, which forces an upgrade based on throughput rather than volume.
Free output is not licensed for commercial use
The free tier includes 10,000 credits a month but no commercial license. Commercial rights begin on the Starter tier at $6/month. Audio you generate on the free plan cannot be used in monetized or commercial work.
How to reduce ElevenLabs API costs
1. Use Flash or Turbo instead of Multilingual where quality allows
Flash at $0.05 per 1,000 characters halves your text-to-speech cost against Multilingual at $0.10. Use Multilingual for final, high-quality production audio and Flash for drafts, real-time responses, and high-volume content where the quality difference does not matter.
2. Cut the characters you synthesize
You are billed per character, so generate each piece of audio once and store it. Cache static prompts, menu messages, and repeated responses instead of re-synthesizing them on every request. Trimming boilerplate text before synthesis directly lowers the bill.
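One way to apply this is a small in-memory cache in front of whatever function performs the synthesis. The sketch below is an assumption-heavy illustration: `synthesize` is a hypothetical function you supply (for example, a wrapper around your own API call), and a production setup would more likely persist audio to disk or object storage than keep it in a `Map`.

```javascript
// Wraps a hypothetical `synthesize(text, model)` function (returns audio or a
// promise of audio) so that identical requests are only sent once.
function createCachedSynthesizer(synthesize) {
  const cache = new Map(); // "model + text" key -> promise of audio
  let charactersSent = 0;  // characters actually passed to `synthesize`

  function speak(text, model = "flash") {
    const key = `${model}\u0000${text}`;
    let pending = cache.get(key);
    if (!pending) {
      charactersSent += text.length;
      pending = Promise.resolve()
        .then(() => synthesize(text, model))
        .catch((err) => {
          cache.delete(key); // do not cache failures; allow a later retry
          throw err;
        });
      cache.set(key, pending);
    }
    return pending;
  }

  return { speak, getCharactersSent: () => charactersSent };
}

// Usage with a stand-in synthesizer that just returns a label.
const fakeSynthesize = async (text, model) => `audio(${model}): ${text}`;
const tts = createCachedSynthesizer(fakeSynthesize);
tts.speak("Welcome");
tts.speak("Welcome"); // served from the cache, nothing new is sent
console.log(tts.getCharactersSent()); // 7
```

Caching the promise rather than the finished audio means two simultaneous requests for the same text also share a single synthesis call.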
3. Match your subscription tier to your real usage
Because the per-unit rate is flat, a subscription only saves money if you use its bundled credits. If your usage is spiky or below a tier's allowance, pay-as-you-go is cheaper. If you consistently exceed a tier, moving up bundles the overage at the same rate while adding concurrency and features.
4. Turn off Scribe add-ons you don't need
Entity detection (+$0.07/hour) and keyterm prompting (+$0.05/hour) each raise your transcription rate. Enable them only for the jobs that need them rather than as global defaults.
5. Accept watermarked dubbing when you can
Watermarked dubbing at $0.33/minute costs a third less than the $0.50/minute clean rate. For internal review, previews, or non-commercial use, the watermarked output is enough.
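To see what the watermark choice is worth for a given project, the following sketch compares both rates for a number of source minutes. The rates are the ones quoted in this article; the `watermark` option name is made up for the example.

```javascript
// Dubbing rates from this article, in US cents per source minute.
const DUBBING_CENTS_PER_MINUTE = { watermarked: 33, clean: 50 };

// Estimated dubbing cost in US dollars. `watermark` is an illustrative option name.
function dubbingCost(sourceMinutes, { watermark = true } = {}) {
  const cents = watermark
    ? DUBBING_CENTS_PER_MINUTE.watermarked
    : DUBBING_CENTS_PER_MINUTE.clean;
  return (sourceMinutes * cents) / 100;
}

console.log(dubbingCost(60));                       // 19.8
console.log(dubbingCost(60, { watermark: false })); // 30
```

For an hour of source video, a watermarked review pass costs $19.80 against $30 for the clean output.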
6. Batch generations to stay within concurrency limits
Group work into batches that fit your plan's concurrency rather than upgrading for occasional parallel spikes. For transcription, Scribe processes long files in parallel internally, so consolidating short clips into longer files can use your throughput more efficiently.
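A minimal sketch of client-side batching is shown below: it runs a list of jobs with a fixed number in flight at once. The `limit` value is whatever your plan allows (the article does not list per-tier numbers), and each task stands in for a real synthesis or transcription request.

```javascript
// Runs `tasks` (functions that return promises) with at most `limit` in flight.
// Results are returned in the same order as the tasks.
// If any task rejects, the returned promise rejects with that error.
async function runWithConcurrency(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each worker pulls the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example with stand-in tasks; replace each with a real API call.
const demoTasks = [1, 2, 3, 4, 5].map((n) => async () => n * 10);
runWithConcurrency(demoTasks, 2).then((results) => console.log(results));
// [ 10, 20, 30, 40, 50 ]
```

A worker-pool loop like this keeps the pipeline full without ever exceeding the limit, which is usually enough to avoid upgrading purely for occasional spikes.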
Can you use the ElevenLabs API for free?
Puter.js: the User-Pays model
Puter.js is a JavaScript library that lets you add ElevenLabs models to your app with no API key, no backend, and no bill to you as the developer. It works on the User-Pays model: each user of your app covers their own AI usage through their Puter account, so your costs stay at zero no matter how many users you have.
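```html
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    // Request speech from ElevenLabs through Puter.js; no API key is needed.
    puter.ai.txt2speech("Explain quantum computing in simple terms", {
      provider: "elevenlabs",
      model: "eleven_multilingual_v2"
    }).then(audio => {
      // Play the returned audio in the browser.
      audio.play();
    });
  </script>
</body>
</html>
```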
We ran the same workload we use across our pricing guides: 500 monthly users sending 30 messages each, or 15,000 messages a month. Each 300-token response is about 1,300 characters of speech, so that's roughly 19.5M characters a month. Text to speech bills only the characters you synthesize, so the text users type is not charged; only the spoken responses are.
Here's what that costs:
- Multilingual v2 ($0.10 / 1K chars): about $1,950 a month
- Flash ($0.05 / 1K chars): about $975 a month
- Puter.js: $0 at any scale, because each user carries their own usage
The ElevenLabs costs grow linearly with your user base; Puter.js doesn't. (If your app also transcribes user audio, Scribe adds $0.22/hour on top.)
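The figures above can be reproduced, and re-run for your own traffic, with a short calculation. This is a sketch that only uses the numbers from this article; the function and field names are invented for the example.

```javascript
// Monthly text-to-speech cost for a chat-style workload.
// `centsPer1kChars` is 10 for Multilingual and 5 for Flash, per this article.
function monthlySpeechCost({ users, messagesPerUser, charsPerResponse, centsPer1kChars }) {
  const messages = users * messagesPerUser;
  const characters = messages * charsPerResponse;
  return { messages, characters, dollars: (characters * centsPer1kChars) / 100000 };
}

const workload = { users: 500, messagesPerUser: 30, charsPerResponse: 1300 };

console.log(monthlySpeechCost({ ...workload, centsPer1kChars: 10 }));
// { messages: 15000, characters: 19500000, dollars: 1950 }
console.log(monthlySpeechCost({ ...workload, centsPer1kChars: 5 }));
// { messages: 15000, characters: 19500000, dollars: 975 }

// Doubling the user base doubles the bill.
console.log(monthlySpeechCost({ ...workload, users: 1000, centsPer1kChars: 10 }).dollars); // 3900
```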
ElevenLabs Startup Grants
The Startup Grants Program gives eligible new companies 12 months of free access and 33,000,000 characters, along with higher concurrency limits and improved support. It is aimed at startups building real-time conversational agents, and you apply through ElevenLabs.
The ElevenLabs free tier
The free plan includes 10,000 credits a month, which covers roughly 10 minutes of Multilingual text to speech, and gives access to most API endpoints. It has no commercial license, so it suits prototyping and testing rather than production. Commercial use starts at the Starter tier ($6/month).
Real-world cost examples
A customer support voice bot
We modeled a support bot that speaks its replies, handling 10,000 conversations a month with about 1,500 characters of synthesized response per conversation, for 15M characters a month. We also priced the same workload as a full conversational agent on the Speech Engine, assuming an average three-minute call.
| Approach | Rate | Monthly volume | Monthly cost |
|---|---|---|---|
| Flash text to speech | $0.05 / 1K chars | 15M characters | $750 |
| Multilingual text to speech | $0.10 / 1K chars | 15M characters | $1,500 |
| Speech Engine (full agent) | $0.08 / minute | 30,000 minutes | $2,400 |
Raw text to speech is cheaper when you only need to voice text you already have. The Speech Engine costs more per minute because it handles the whole real-time loop (transcription, turn-taking, and synthesis) in one pipeline.
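For readers who want to plug in their own volumes, here is a sketch of the same comparison that also breaks the cost down per conversation. The inputs mirror the assumptions stated above (10,000 conversations, 1,500 characters each, three-minute calls); the names are illustrative.

```javascript
// Rates from this article, in US cents.
const APPROACH_CENTS = {
  flashPer1kChars: 5,
  multilingualPer1kChars: 10,
  speechEnginePerMinute: 8,
};

// Monthly and per-conversation cost in US dollars for each approach.
function supportBotCosts({ conversations, charsPerConversation, minutesPerCall }) {
  const characters = conversations * charsPerConversation;
  const minutes = conversations * minutesPerCall;
  const monthly = {
    flash: (characters * APPROACH_CENTS.flashPer1kChars) / 100000,
    multilingual: (characters * APPROACH_CENTS.multilingualPer1kChars) / 100000,
    speechEngine: (minutes * APPROACH_CENTS.speechEnginePerMinute) / 100,
  };
  const perConversation = Object.fromEntries(
    Object.entries(monthly).map(([name, dollars]) => [name, dollars / conversations])
  );
  return { monthly, perConversation };
}

const bot = { conversations: 10000, charsPerConversation: 1500, minutesPerCall: 3 };
console.log(supportBotCosts(bot).monthly);
// { flash: 750, multilingual: 1500, speechEngine: 2400 }
```

Per conversation, that works out to about $0.075 on Flash, $0.15 on Multilingual, and $0.24 on the Speech Engine.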
Narrating 100 PDFs
ElevenLabs cannot summarize documents: it has no language model, so it converts text to audio rather than condensing it. To narrate a batch of PDFs you would first extract or summarize the text with an LLM, then send that text to ElevenLabs. We calculated the narration step for 100 PDFs averaging 10 pages each, at about 3,000 characters per page, for 3M characters total. On Flash that is $150; on Multilingual, $300. The summarization step, if you want one, is billed by whichever LLM you use, not by ElevenLabs.
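Once the text has been extracted, the narration budget is just a character count. The sketch below assumes you already have each PDF as an array of page strings (the extraction step is outside its scope) and applies the per-character rates from this article.

```javascript
// `documents` is an array of documents, each an array of extracted page strings.
// Returns the total character count and the estimated cost in US dollars
// at this article's Flash (5 cents) and Multilingual (10 cents) rates per 1,000 characters.
function narrationCost(documents) {
  const characters = documents.reduce(
    (sum, pages) => sum + pages.reduce((n, page) => n + page.length, 0),
    0
  );
  return {
    characters,
    flash: (characters * 5) / 100000,
    multilingual: (characters * 10) / 100000,
  };
}

// Synthetic batch matching the example: 100 PDFs, 10 pages each, 3,000 characters per page.
const samplePage = "x".repeat(3000);
const sampleBatch = Array.from({ length: 100 }, () => Array(10).fill(samplePage));
console.log(narrationCost(sampleBatch));
// { characters: 3000000, flash: 150, multilingual: 300 }
```

Running this on real extracted text before synthesis gives a firmer number than the per-page average.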
Daily content generation
We modeled a daily 10-minute audio brief, about 9,000 characters per episode, published every day for a 30-day month, for 270,000 characters. On Flash that comes to about $13.50 a month; on Multilingual, about $27. Adding a one-minute music bed per episode through the Music API at $0.30/minute adds about $9 a month, and each one-off custom music finetune costs $1.50.
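The same estimate can be expressed as a small function so the episode length, model, and music usage can be varied. It is a sketch built only from the rates in this article; the option names are made up for the example.

```javascript
// Monthly cost of a daily audio brief, in US dollars.
// Defaults mirror the example: 30 episodes of 9,000 characters on Flash (5 cents per 1,000 characters).
function monthlyBriefCost({
  days = 30,
  charsPerEpisode = 9000,
  ttsCentsPer1k = 5,          // 5 = Flash, 10 = Multilingual
  musicMinutesPerEpisode = 0, // Music is 30 cents per minute
  finetunes = 0,              // each music finetune is 150 cents
} = {}) {
  const speech = (days * charsPerEpisode * ttsCentsPer1k) / 100000;
  const music = (days * musicMinutesPerEpisode * 30) / 100;
  const finetune = (finetunes * 150) / 100;
  return { speech, music, finetune, total: speech + music + finetune };
}

console.log(monthlyBriefCost());
// { speech: 13.5, music: 0, finetune: 0, total: 13.5 }
console.log(monthlyBriefCost({ musicMinutesPerEpisode: 1 }));
// { speech: 13.5, music: 9, finetune: 0, total: 22.5 }
console.log(monthlyBriefCost({ ttsCentsPer1k: 10, musicMinutesPerEpisode: 1, finetunes: 1 }).total); // 37.5
```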
Complete ElevenLabs API pricing table
Text to Speech (per 1,000 characters)
| Model | Price |
|---|---|
| Flash / Turbo | $0.05 |
| Multilingual v2/v3 | $0.10 |
Speech to Text (per hour)
| Model | Base | Entity detection | Keyterm prompting |
|---|---|---|---|
| Scribe v1/v2 | $0.22 | +$0.07 | +$0.05 |
| Scribe v2 Realtime | $0.39 | — | — |
Other products
| Product | Price | Unit |
|---|---|---|
| Speech Engine (agents) | $0.08 included / $0.16 additional | per minute |
| Music | $0.30 (+$1.50 per finetune) | per minute |
| Voice Changer | $0.12 | per minute |
| Voice Isolator | $0.12 | per minute |
| Sound Effects | $0.12 | per generation |
| Dubbing v1 | $0.33 watermarked / $0.50 clean | per minute |
Subscription tiers (bundled credits, same per-unit rates)
| Tier | Price/month | Credits/month |
|---|---|---|
| Free | $0 | 10,000 |
| Starter | $6 | 30,000 |
| Creator | $22 (first month $11) | 121,000 |
| Pro | $99 | 600,000 |
| Scale | $299 | 1,800,000 |
| Business | $990 | 6,000,000 |
| Enterprise | Custom | Custom |
For the full per-tier included-usage breakdown across every product, see the official ElevenLabs API pricing page.
Conclusion
ElevenLabs text to speech costs $0.10 per 1,000 characters on Multilingual v2/v3 and $0.05 on Flash, with the rest of the catalog billed by audio duration (per minute or hour) or per generation. The main ways to control your bill:
- Use Flash or Turbo instead of Multilingual where quality allows (halves text-to-speech cost).
- Generate audio once and cache it instead of re-synthesizing.
- Match your subscription tier to your real usage, or stay on pay-as-you-go.
- Turn off Scribe add-ons and watermark-free dubbing when you do not need them.
- Use batch Scribe instead of Realtime for recorded audio.
Pricing verified against ElevenLabs' official pages.