Free, Unlimited Speech-to-Text API

This tutorial demonstrates how to use Puter.js to transcribe audio files and convert speech to text for free, without requiring API keys or dealing with usage restrictions. With Puter.js, you can access powerful speech recognition models including GPT-4o transcription and Whisper for converting spoken audio into written text, perfect for meeting transcriptions, podcast notes, and multilingual translation needs.

Puter is the pioneer of the "User-Pays" model, enabling developers to integrate AI-powered transcription into their apps while users cover their own usage costs. This approach lets you offer professional-grade speech recognition features without managing API keys or backend infrastructure.

Getting Started

No API keys or account registration are needed to use Puter.js. Simply add this script tag to your HTML file, in either the <head> or <body> section:

<script src="https://js.puter.com/v2/"></script>

That's all the setup required to start transcribing audio with Puter.js!

Example 1: Basic Audio Transcription

To transcribe an audio file, use the puter.ai.speech2txt() function. Here's a complete working example:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        puter.ai.speech2txt('https://assets.puter.site/speech.mp3')
            .then(result => {
                puter.print(result.text || result);
            });
    </script>
</body>
</html>

Example 2: Upload and Transcribe Audio Files

Create an interactive interface that lets users upload audio files for instant transcription:

<html>
<body>
    <input type="file" id="audio-upload" accept="audio/*">
    <button id="transcribe-btn">Transcribe Audio</button>
    <div id="transcript-output"></div>

    <script src="https://js.puter.com/v2/"></script>
    <script>
        document.getElementById('transcribe-btn').addEventListener('click', async () => {
            const fileInput = document.getElementById('audio-upload');
            const outputDiv = document.getElementById('transcript-output');
            
            if (!fileInput.files[0]) {
                outputDiv.textContent = 'Please select an audio file first';
                return;
            }

            outputDiv.textContent = 'Transcribing...';
            
            try {
                const result = await puter.ai.speech2txt(fileInput.files[0]);
                outputDiv.textContent = result.text || result;
            } catch (error) {
                outputDiv.textContent = 'Error: ' + error.message;
            }
        });
    </script>
</body>
</html>
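Before handing an uploaded file to the transcription call, you may want to validate it client-side so users get immediate feedback. A minimal sketch of such a check (the 25 MB cap here is an illustrative assumption, not a documented Puter limit):

```javascript
// Validate an audio file before transcription.
// Returns null if the file looks usable, otherwise an error message.
// The 25 MB default cap is an illustrative assumption, not a Puter limit.
function validateAudioFile(file, maxBytes = 25 * 1024 * 1024) {
    if (!file) return 'No file selected';
    if (!file.type.startsWith('audio/')) return 'Not an audio file: ' + file.type;
    if (file.size > maxBytes) return 'File too large: ' + file.size + ' bytes';
    return null;
}
```

Call it at the top of the click handler and show the returned message in the output div instead of starting a transcription that is bound to fail.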

Example 3: Translate Foreign Language Audio to English

Automatically translate audio in any language into English text using the translation feature:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        (async () => {
            const translation = await puter.ai.speech2txt({
                file: 'https://assets.puter.site/spanish-audio.mp3',
                translate: true,
                model: 'whisper-1'
            });
            
            puter.print('English translation: ' + translation.text);
        })();
    </script>
</body>
</html>

Example 4: Speaker Diarization for Meetings

Identify different speakers in meeting recordings or interviews using the diarization feature:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        (async () => {
            const meeting = await puter.ai.speech2txt({
                file: 'https://assets.puter.site/interview.ogg',
                model: 'gpt-4o-transcribe-diarize',
                response_format: 'diarized_json',
                chunking_strategy: 'auto'
            });

            meeting.segments.forEach(segment => {
                puter.print(`<strong>${segment.speaker}:</strong> ${segment.text}<br>`);
            });
        })();
    </script>
</body>
</html>
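Diarized output often contains several consecutive segments from the same speaker. For a more readable transcript you can merge those into single turns client-side; a sketch assuming the `{speaker, text}` segment shape shown above:

```javascript
// Merge consecutive diarized segments spoken by the same speaker into
// single turns, assuming each segment has the shape { speaker, text }.
function mergeTurns(segments) {
    const turns = [];
    for (const seg of segments) {
        const last = turns[turns.length - 1];
        if (last && last.speaker === seg.speaker) {
            last.text += ' ' + seg.text;   // extend the current turn
        } else {
            turns.push({ speaker: seg.speaker, text: seg.text });
        }
    }
    return turns;
}
```

Run the `segments` array through `mergeTurns()` before printing, and each speaker change starts a new line.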

Example 5: Choose Different Transcription Models

Puter.js supports multiple transcription models optimized for different use cases. Select the model that best fits your requirements:

// Fast, efficient transcription
puter.ai.speech2txt('https://assets.puter.site/speech.mp3', {
    model: 'gpt-4o-mini-transcribe'
}).then(result => {
    puter.print(result.text);
});

// High-quality transcription
puter.ai.speech2txt('https://assets.puter.site/speech.mp3', {
    model: 'gpt-4o-transcribe'
}).then(result => {
    puter.print(result.text);
});

// Transcription with speaker identification
puter.ai.speech2txt('https://assets.puter.site/interview.ogg', {
    model: 'gpt-4o-transcribe-diarize',
    response_format: 'diarized_json',
    chunking_strategy: 'auto'
}).then(result => {
    result.segments.forEach(seg => {
        puter.print(`${seg.speaker}: ${seg.text}`);
    });
});

Here's a full interactive example comparing different models:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        const audioFile = 'https://assets.puter.site/speech.mp3';

        // Using gpt-4o-mini-transcribe
        puter.ai.speech2txt(audioFile, {
            model: 'gpt-4o-mini-transcribe'
        }).then(result => {
            puter.print('<h2>GPT-4o Mini Transcribe</h2>');
            puter.print(result.text);
        });

        // Using gpt-4o-transcribe
        puter.ai.speech2txt(audioFile, {
            model: 'gpt-4o-transcribe'
        }).then(result => {
            puter.print('<h2>GPT-4o Transcribe</h2>');
            puter.print(result.text);
        });

        // Using whisper-1
        puter.ai.speech2txt(audioFile, {
            model: 'whisper-1'
        }).then(result => {
            puter.print('<h2>Whisper-1</h2>');
            puter.print(result.text);
        });
    </script>
</body>
</html>

Example 6: Advanced Options: Timestamps and Custom Formatting

Generate transcripts with timestamps or export in different formats like SRT for subtitles:

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <script>
        (async () => {
            // Get transcript with word-level timestamps
            const timestamped = await puter.ai.speech2txt({
                file: 'https://assets.puter.site/speech.mp3',
                model: 'whisper-1',
                response_format: 'verbose_json',
                timestamp_granularities: ['word', 'segment']
            });
            
            puter.print('<h3>Transcript with Timestamps:</h3>');
            timestamped.words.forEach(word => {
                puter.print(`[${word.start}s] ${word.word} `);
            });

            // Generate SRT subtitle file
            const subtitles = await puter.ai.speech2txt({
                file: 'https://assets.puter.site/speech.mp3',
                model: 'whisper-1',
                response_format: 'srt'
            });
            
            puter.print('<h3>SRT Subtitles:</h3>');
            puter.print(`<pre>${subtitles.text}</pre>`);
        })();
    </script>
</body>
</html>
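If you build subtitle cues yourself from word- or segment-level timestamps, the start and end times must use SRT's `HH:MM:SS,mmm` format. A small formatting helper (an illustrative utility, not part of Puter.js):

```javascript
// Format a time given in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds) {
    const ms = Math.round(seconds * 1000);
    const h = Math.floor(ms / 3600000);
    const m = Math.floor((ms % 3600000) / 60000);
    const s = Math.floor((ms % 60000) / 1000);
    const frac = ms % 1000;
    const pad = (n, w) => String(n).padStart(w, '0');
    return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(frac, 3)}`;
}
```

For example, a word starting at 61.5 seconds formats as `00:01:01,500`.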

Supported Transcription Models

The following speech-to-text models are available through Puter.js via the puter.ai.speech2txt() function:

gpt-4o-mini-transcribe
gpt-4o-transcribe
gpt-4o-transcribe-diarize
whisper-1
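If you expose the model choice in a UI, a small lookup keeps the options self-describing. The descriptions below summarize the use cases covered in this tutorial; the helper itself is illustrative, not part of Puter.js:

```javascript
// Map each supported model ID to a short description, summarized from
// the examples in this tutorial (illustrative helper, not part of Puter.js).
const TRANSCRIPTION_MODELS = {
    'gpt-4o-mini-transcribe': 'Fast, efficient transcription',
    'gpt-4o-transcribe': 'High-quality transcription',
    'gpt-4o-transcribe-diarize': 'Transcription with speaker identification',
    'whisper-1': 'Transcription, translation, timestamps, and SRT output',
};

function describeModel(id) {
    return TRANSCRIPTION_MODELS[id] || 'Unknown model: ' + id;
}
```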

Supported Audio Formats

Puter.js accepts various audio sources:

  • Remote HTTPS URLs
  • File or Blob objects from browser uploads
  • Data URLs: data:audio/wav;base64,...
  • Puter paths: ~/Desktop/audio.mp3
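When an app accepts audio from several of these sources, it can help to classify the input before passing it along, e.g. for logging or validation. A sketch covering the four source types above (an illustrative helper, not part of Puter.js; `File` instances are covered by the `Blob` check in browsers):

```javascript
// Classify an audio source by the kinds listed above. Strings are
// distinguished by prefix; File/Blob detection assumes an environment
// where the Blob constructor exists (File extends Blob in browsers).
function classifyAudioSource(src) {
    if (typeof src === 'string') {
        if (src.startsWith('data:audio/')) return 'data-url';
        if (src.startsWith('https://')) return 'https-url';
        return 'puter-path';
    }
    if (typeof Blob !== 'undefined' && src instanceof Blob) return 'blob';
    return 'unknown';
}
```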

That's everything you need! You now have free, unlimited access to professional speech-to-text capabilities using Puter.js. Convert audio to text, translate foreign languages, identify speakers, and generate subtitles—all without API keys or backend servers.
