Speech-to-Text Streaming

Real-time Audio Transcription with WebSocket

The Speechall API provides a WebSocket endpoint for real-time speech-to-text transcription, enabling you to stream audio and receive transcription results as they become available. This is ideal for live applications, voice assistants, and real-time communication tools.

WebSocket Endpoint

The streaming endpoint mirrors the functionality of the REST /transcribe endpoint but uses the WebSocket protocol for bidirectional communication:

wss://api.speechall.com/v1/transcribe

Note: Use wss:// (WebSocket Secure) instead of https:// for the streaming endpoint.

Supported Providers

The streaming service currently supports four speech-to-text providers:

  • AssemblyAI - High-accuracy transcription with speaker diarization
  • Deepgram - Low-latency streaming optimized for real-time applications
  • OpenAI - Whisper and GPT-4o models for transcription with multilingual support
  • Gladia - Advanced AI transcription with custom vocabulary support

Audio Requirements

For optimal performance and compatibility, ensure your audio meets these specifications:

  • Sample Rate: 16,000 Hz (16 kHz)
  • Channels: Mono (single channel)
  • Encoding: 16-bit PCM
  • Format: Raw binary audio chunks
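
The specification above fixes the byte rate at 32,000 bytes per second (16,000 samples × 2 bytes). The helper below shows how chunk size and duration relate under that spec; the function names are illustrative, not part of the Speechall API.

```javascript
// Sizing helpers for the 16 kHz / mono / 16-bit PCM spec above.
const SAMPLE_RATE = 16000;   // Hz
const BYTES_PER_SAMPLE = 2;  // 16-bit PCM

// Bytes needed to hold `ms` milliseconds of audio.
function chunkBytes(ms) {
  return Math.round((ms / 1000) * SAMPLE_RATE) * BYTES_PER_SAMPLE;
}

// Duration in milliseconds of a buffer of `sampleCount` samples.
function chunkDurationMs(sampleCount) {
  return (sampleCount / SAMPLE_RATE) * 1000;
}

console.log(chunkBytes(256));        // 8192 bytes for a 256 ms chunk
console.log(chunkDurationMs(4096));  // 256 ms for a 4096-sample chunk
```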

Connection Parameters

When establishing a WebSocket connection, provide transcription configuration as query parameters, just like the REST endpoint:

const wsUrl = new URL('wss://api.speechall.com/v1/transcribe');
wsUrl.searchParams.set('model', 'assemblyai.best');
wsUrl.searchParams.set('language', 'en');
wsUrl.searchParams.set('output_format', 'json');
wsUrl.searchParams.set('punctuation', 'true');

const ws = new WebSocket(wsUrl.toString());

Authentication

Important for Browser Applications: Browsers do not allow custom headers (including Authorization headers) to be set on WebSocket requests. If you’re making WebSocket requests from a browser, you must include your API key as a query parameter:

const wsUrl = new URL('wss://api.speechall.com/v1/transcribe');
wsUrl.searchParams.set('api_key', apiKey);
wsUrl.searchParams.set('model', 'deepgram.nova-2');
wsUrl.searchParams.set('language', 'en');

const ws = new WebSocket(wsUrl.toString());

For server-side applications (Node.js, etc.), you can use either method:

  • Query parameter: api_key=YOUR_API_KEY
  • Authorization header: Authorization: Bearer YOUR_API_KEY
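
For a server-side connection with the Authorization header, a minimal sketch is shown below. It assumes the popular `ws` npm package (`npm install ws`), whose client constructor accepts a `headers` option; that option belongs to the `ws` package, not to the Speechall API itself.

```javascript
// Build the connection URL and header options for a server-side client.
function buildConnection(apiKey) {
  const url = new URL('wss://api.speechall.com/v1/transcribe');
  url.searchParams.set('model', 'deepgram.nova-2');
  url.searchParams.set('language', 'en');
  return {
    url: url.toString(),
    // Passed as the options argument to the `ws` package's WebSocket client.
    options: { headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// Usage with the `ws` package (uncomment in a Node.js project):
// const WebSocket = require('ws');
// const { url, options } = buildConnection(process.env.SPEECHALL_API_KEY);
// const ws = new WebSocket(url, options);
```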

Available Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| api_key | string | Your API key (required for browser requests) | Required |
| model | string | Provider and model identifier (e.g., assemblyai.best, deepgram.nova-2) | Required |
| language | string | Language code in ISO 639-1 format (e.g., en, es) or auto for detection | en |
| output_format | string | Response format: text, json, or verbose_json | text |
| punctuation | boolean | Enable automatic punctuation | true |
| diarization | boolean | Enable speaker diarization | false |
| temperature | number | Controls randomness (0-1) | - |
| initial_prompt | string | Text prompt to guide the model | - |
| speakers_expected | integer | Expected number of speakers (1-10) | - |
| custom_vocabulary | array | List of custom words/phrases for better recognition | - |
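
As a sketch, the parameters can be assembled into a connection URL with a small helper. The helper name is illustrative; only scalar parameters from the table are shown, since the wire encoding of array parameters such as custom_vocabulary is not covered here.

```javascript
// Illustrative helper: turn a parameter object into the connection URL.
function buildTranscribeUrl(params) {
  const url = new URL('wss://api.speechall.com/v1/transcribe');
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const url = buildTranscribeUrl({
  model: 'assemblyai.best',
  language: 'en',
  output_format: 'verbose_json',
  diarization: true,
  speakers_expected: 2,
});
console.log(url);
```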

Basic Usage Example

Here’s a complete example of how to use the WebSocket streaming API in a browser:

// Establish WebSocket connection with parameters (browser-compatible)
const apiKey = 'YOUR_API_KEY';
const wsUrl = new URL('wss://api.speechall.com/v1/transcribe');
wsUrl.searchParams.set('api_key', apiKey);
wsUrl.searchParams.set('model', 'deepgram.nova-2');
wsUrl.searchParams.set('language', 'en');
wsUrl.searchParams.set('output_format', 'json');
wsUrl.searchParams.set('punctuation', 'true');

const ws = new WebSocket(wsUrl.toString());

// Handle connection events
ws.onopen = () => {
    console.log('WebSocket connected');
    startAudioCapture();
};

ws.onmessage = (event) => {
    if (wsUrl.searchParams.get('output_format') === 'json') {
        const transcription = JSON.parse(event.data);
        console.log('Transcription:', transcription.text);
    } else {
        // Plain text response
        console.log('Transcription:', event.data);
    }
};

ws.onerror = (error) => {
    console.error('WebSocket error:', error);
};

ws.onclose = () => {
    console.log('WebSocket connection closed');
};

// Send audio chunks
function sendAudioChunk(audioBuffer) {
    if (ws.readyState === WebSocket.OPEN) {
        ws.send(audioBuffer);
    }
}

Audio Capture Example

Here’s how to capture audio from the microphone and send it to the WebSocket:

async function startAudioCapture() {
    try {
        const stream = await navigator.mediaDevices.getUserMedia({
            audio: {
                sampleRate: 16000,
                channelCount: 1,
                echoCancellation: true,
                noiseSuppression: true
            }
        });

        const audioContext = new AudioContext({ sampleRate: 16000 });
        const source = audioContext.createMediaStreamSource(stream);
        // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet,
        // but it remains widely supported and keeps this example simple.
        const processor = audioContext.createScriptProcessor(4096, 1, 1);

        processor.onaudioprocess = (event) => {
            const inputBuffer = event.inputBuffer;
            const inputData = inputBuffer.getChannelData(0);
            
            // Convert float32 to int16
            const int16Array = new Int16Array(inputData.length);
            for (let i = 0; i < inputData.length; i++) {
                int16Array[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
            }
            
            // Send audio chunk to WebSocket
            sendAudioChunk(int16Array.buffer);
        };

        source.connect(processor);
        processor.connect(audioContext.destination);
        
    } catch (error) {
        console.error('Error accessing microphone:', error);
    }
}
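
One caveat: the sampleRate values passed to getUserMedia and AudioContext are hints, and some browsers run the context at the hardware rate (often 44,100 or 48,000 Hz) regardless. If audioContext.sampleRate does not equal 16000, resample before converting to 16-bit PCM. The helper below is an illustrative linear-interpolation sketch, not part of the Speechall API.

```javascript
// Resample a Float32Array down to 16 kHz using linear interpolation.
function resampleTo16k(input, inputRate) {
  const TARGET_RATE = 16000;
  if (inputRate === TARGET_RATE) return input; // already at target rate
  const ratio = inputRate / TARGET_RATE;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const next = Math.min(idx + 1, input.length - 1);
    // Interpolate between the two nearest input samples.
    output[i] = input[idx] * (1 - frac) + input[next] * frac;
  }
  return output;
}
```

In the onaudioprocess handler, call resampleTo16k(inputData, audioContext.sampleRate) before the Float32-to-Int16 conversion.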

Response Formats

JSON Format (output_format: 'json')

When using JSON output format, you’ll receive structured responses:

{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "text": "Hello, this is a test transcription.",
    "language": "en",
    "words": [
        {
            "text": "Hello",
            "start": 0.0,
            "end": 0.5,
            "confidence": 0.98
        }
    ]
}

Text Format (output_format: 'text')

With text format, you’ll receive plain text chunks:

Hello, this is a test transcription.
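
A handler can normalize both formats into one shape before the rest of the application sees them. This is an illustrative sketch; the function name and the normalized shape are not part of the API.

```javascript
// Normalize an incoming WebSocket message into { text, words }.
function parseMessage(data, outputFormat) {
  if (outputFormat === 'json' || outputFormat === 'verbose_json') {
    const result = JSON.parse(data);
    return { text: result.text, words: result.words || [] };
  }
  // With the text format, the payload is the transcript itself.
  return { text: data, words: [] };
}
```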

Best Practices

  1. Buffer Management: Implement proper audio buffering to handle network latency
  2. Error Handling: Always handle WebSocket errors and implement reconnection logic
  3. Audio Quality: Ensure good microphone quality and minimize background noise
  4. Chunk Size: Send audio chunks of 4096 samples (256ms at 16kHz) for optimal performance
  5. Connection Management: Close WebSocket connections properly when done
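
Best practice 1 can be sketched as a small queue that buffers chunks while the socket is still connecting and flushes them once it opens. The class name and API are illustrative; in real code you would also compose with any existing onopen handler rather than overwrite it.

```javascript
const WS_OPEN = 1; // Numeric value of WebSocket.OPEN

// Buffer audio chunks until the socket is ready, then flush in order.
class BufferedSender {
  constructor(ws) {
    this.ws = ws;
    this.queue = [];
    ws.onopen = () => this.flush(); // overwrites any prior onopen handler
  }

  send(chunk) {
    if (this.ws.readyState === WS_OPEN) {
      this.ws.send(chunk);
    } else {
      this.queue.push(chunk); // hold the chunk until the connection opens
    }
  }

  flush() {
    while (this.queue.length > 0) {
      this.ws.send(this.queue.shift());
    }
  }
}
```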

Error Handling

ws.onerror = (error) => {
    console.error('WebSocket error:', error);
    // Implement reconnection logic
    setTimeout(() => {
        reconnectWebSocket();
    }, 1000);
};

ws.onclose = (event) => {
    if (event.code !== 1000) {
        console.log('Connection closed unexpectedly:', event.code, event.reason);
        // Implement reconnection logic
    }
};
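
The reconnectWebSocket call above is left to you. A minimal sketch with exponential backoff follows; connect() stands in for whatever function creates your WebSocket, and all names here are illustrative.

```javascript
let reconnectAttempts = 0;
const MAX_ATTEMPTS = 5;

// Delay before the n-th retry: 1 s, 2 s, 4 s, ... capped at 30 s.
function backoffDelay(attempt) {
  return Math.min(1000 * 2 ** attempt, 30000);
}

function reconnectWebSocket() {
  if (reconnectAttempts >= MAX_ATTEMPTS) {
    console.error('Giving up after', MAX_ATTEMPTS, 'attempts');
    return;
  }
  setTimeout(() => {
    reconnectAttempts += 1;
    // connect() should rebuild the WebSocket and reset reconnectAttempts
    // back to 0 in its onopen handler.
    connect();
  }, backoffDelay(reconnectAttempts));
}
```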

SDK Support

WebSocket streaming support is coming soon to our official SDKs:

  • TypeScript SDK - Real-time streaming with TypeScript support
  • Python SDK - Async WebSocket client for Python applications

Stay tuned for updates on SDK availability.

Rate Limits and Quotas

The same rate limits and quotas that apply to the REST API also apply to the WebSocket streaming endpoint. Monitor your usage through the console dashboard.

Troubleshooting

Common Issues:

  • Audio Format: Ensure audio is 16kHz, mono, 16-bit PCM
  • Authentication:
    • For browser applications: Use api_key query parameter (headers are not supported)
    • For server applications: Use either Authorization header or api_key query parameter
  • Network: Check for firewall restrictions on WebSocket connections
  • Browser Support: Ensure WebSocket and MediaDevices API support

Need Help?

If you encounter issues with the streaming API, please contact our support team with details about your implementation and any error messages.