Speech to Text API
Add speech recognition to your app with Puter.js.
No backend or API keys required.
Multiple Transcription Models
Access GPT-4o Transcribe, Whisper, and diarization models through one API.
- Switch models by changing one parameter
- Standard and advanced transcription engines
- No vendor lock-in
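Because the model is just an option on the call, trying a second model when the first one fails takes only a small loop. The sketch below is illustrative only: `transcribeWithFallback` is a hypothetical helper, not part of Puter.js, and the transcription function is passed in as an argument so the logic can be exercised without the library loaded.

```javascript
// Hypothetical helper (not part of Puter.js): try each model in order
// and return the first successful result together with the model used.
// `transcribe` is injected; in a page it would typically be
//   (source, options) => puter.ai.speech2txt(source, options)
async function transcribeWithFallback(source, models, transcribe) {
  const errors = [];
  for (const model of models) {
    try {
      const result = await transcribe(source, { model });
      return { model, result };
    } catch (err) {
      // Remember why this model failed, then move on to the next one.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All models failed (${errors.join("; ")})`);
}
```

In a page this might be called as `transcribeWithFallback(audioFile, ["gpt-4o-transcribe-diarize", "gpt-4o-transcribe"], (src, opts) => puter.ai.speech2txt(src, opts))`, using the model names that appear elsewhere on this page.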
No API Keys Required
Skip the sign-ups, key management, and credential juggling. Just add Puter.js and start transcribing audio immediately.
Speaker Diarization
Identify who said what in multi-speaker recordings. Ideal for meeting transcriptions, interviews, and podcast notes.
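To turn a diarized result into a readable transcript, one option is to merge consecutive segments from the same speaker. The sketch below assumes the result carries a `segments` array whose items have `speaker` and `text` fields. That shape is an assumption for illustration, so check the documentation for the actual `diarized_json` structure before relying on it.

```javascript
// Hypothetical formatter. ASSUMED input shape (not confirmed by this page):
//   { segments: [{ speaker: "A", text: "Hello" }, ...] }
function formatDiarized(result) {
  const segments = Array.isArray(result && result.segments) ? result.segments : [];
  const turns = [];
  let last = null;
  for (const seg of segments) {
    const speaker = seg.speaker ?? "Unknown";
    const text = String(seg.text ?? "").trim();
    if (!text) continue; // skip empty segments
    if (last && last.speaker === speaker) {
      last.text += " " + text; // same speaker keeps talking: merge
    } else {
      last = { speaker, text };
      turns.push(last);
    }
  }
  return turns.map((t) => `${t.speaker}: ${t.text}`).join("\n");
}
```

Each line of the returned string is one speaker turn, which suits meeting notes or interview transcripts.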
No Backend Required
Transcribe audio directly from the browser with client-side JavaScript. No server code needed.
User-Pays Model
Your users cover their own transcription costs, so you can ship speech-to-text features without worrying about billing or runaway API expenses.
How It Works
1. Include Puter.js
Add Puter.js to your app:
<script src="https://js.puter.com/v2/"></script>
or with NPM:
npm install @heyputer/puter.js
2. Transcribe Audio
Use the simple JavaScript API to convert speech to text:
puter.ai.speech2txt("audio.mp3")
View the speech-to-text documentation for a full list of available features.
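In application code it usually helps to wrap the call so that the UI always receives plain text or a clear error. The sketch below is a minimal wrapper under two assumptions: the call may resolve to either a string or an object with a `text` field, and the transcription function is injected so the example runs without Puter.js loaded. Both helper names are hypothetical.

```javascript
// Hypothetical helper: reduce a transcription result to plain text.
// ASSUMPTION: the result is either a string or an object with a `text` field.
function toPlainText(result) {
  if (typeof result === "string") return result.trim();
  if (result && typeof result.text === "string") return result.text.trim();
  return "";
}

// Hypothetical wrapper: never throws, always returns { ok, text } or { ok, error }.
// In a page, pass (source) => puter.ai.speech2txt(source) as `transcribe`.
async function transcribeToText(source, transcribe) {
  try {
    const result = await transcribe(source);
    return { ok: true, text: toPlainText(result) };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error };
  }
}
```

A caller can then branch on `ok` to show either the transcript or an error message.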
✓ That's it!
There are no servers or infrastructure to set up, and no API keys, configuration, or rate limiting to manage. Puter.js handles all of it.
Speech Recognition Without the Complexity
Convert audio to text using different models and output formats.
The API handles all the complexity so you can focus on building your app.
// Basic speech to text transcription
const text = await puter.ai.speech2txt(audioFile);
console.log(text);

// Transcribe from a URL
const result = await puter.ai.speech2txt(
  "https://example.com/audio.mp3"
);

// Use GPT-4o for higher accuracy
const transcript = await puter.ai.speech2txt(audioFile, {
  model: "gpt-4o-transcribe"
});

// Speaker diarization - identify who said what
const diarized = await puter.ai.speech2txt(audioFile, {
  model: "gpt-4o-transcribe-diarize",
  response_format: "diarized_json"
});

// Generate SRT subtitles
const subtitles = await puter.ai.speech2txt(audioFile, {
  response_format: "srt"
});
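Once you have SRT output, you may want the cues as structured data, for example to sync captions with a video player. The sketch below parses standard SRT text into `{ start, end, text }` objects with times in seconds. It assumes the SRT content is available as a plain string, and `parseSrt` and `srtTimeToSeconds` are hypothetical helpers, not part of Puter.js.

```javascript
// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, min, s, ms] = m;
  return Number(h) * 3600 + Number(min) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse SRT text into an array of cues: { start, end, text }.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, "\n") // normalize Windows line endings
    .trim()
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => {
      const lines = block.split("\n");
      // The timing line is the one containing "-->".
      const timeIndex = lines.findIndex((line) => line.includes("-->"));
      if (timeIndex === -1) return null;
      const [start, end] = lines[timeIndex].split("-->");
      return {
        start: srtTimeToSeconds(start),
        end: srtTimeToSeconds(end),
        text: lines.slice(timeIndex + 1).join("\n").trim(),
      };
    })
    .filter(Boolean); // drop blocks without a timing line
}
```

This handles the common case only. Cue lines with extra positioning data after the end timestamp would need additional handling.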
Frequently Asked Questions
Use the Puter.js Speech to Text API to convert audio into accurate text transcriptions. Access multiple transcription models, including GPT-4o Transcribe, Whisper, and diarization models, through a simple JavaScript API. Transcribe audio directly from your frontend code without managing API keys or backend infrastructure.
Puter.js is a JavaScript library that provides access to AI, storage, and other cloud services directly from your frontend code. It handles authentication, infrastructure, and scaling so you can focus on building your app.
With the User-Pays model, users cover their own transcription costs through their Puter account. This means you can build apps without worrying about infrastructure expenses.
You can use the API for meeting transcriptions, podcast notes, interview recordings, voice note apps, subtitle generation, accessibility features, language learning apps, voice assistants, and any application where users need to convert spoken audio into text.
Add Speech to Text to Your App
Get started with Puter.js and add speech recognition with a few lines of code.