Speech-to-Text
Providing Audio
This guide explains how best to provide audio data to the Speechall API for transcription, focusing on the performance trade-offs of each method.
The Speechall API offers three ways to send your audio for transcription:
Direct File Upload (`POST /transcribe`)
- Method: You send the raw binary audio data directly in the request body (`Content-Type: audio/*`). Transcription options such as `model`, `language`, and `output_format` are provided as query parameters.
- Performance: This method generally offers the lowest latency for transcribing local files. Because the audio data is sent directly as the request body, the API can begin processing the byte stream as soon as it arrives, without first having to parse a multipart form structure.
- Best For: Latency-sensitive applications, command-line usage, and direct integration from file systems.
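The sketch below builds (but does not send) a direct-upload request using only Python's standard library, to show where each piece goes: raw audio bytes as the whole body, an `audio/*` content type, and options in the query string. `BASE_URL`, the bearer-token `Authorization` header, and the `build_transcribe_request` helper are hypothetical, not documented parts of the Speechall API; check the API reference for the real base URL, authentication scheme, and accepted `model` values.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical placeholder: replace with the real Speechall API base URL.
BASE_URL = "https://api.example.com/v1"


def build_transcribe_request(
    audio: bytes,
    api_key: str,
    model: str,
    language: Optional[str] = None,
    output_format: Optional[str] = None,
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST /transcribe request with a raw audio body."""
    # Transcription options go in the query string, not in the body.
    params = {"model": model}
    if language is not None:
        params["language"] = language
    if output_format is not None:
        params["output_format"] = output_format
    url = f"{BASE_URL}/transcribe?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=audio,  # the raw audio bytes are the entire request body
        method="POST",
        headers={
            "Content-Type": content_type,  # any audio/* media type
            # Assumption: bearer-token auth; confirm the scheme in the API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send it, read the file with `open(path, "rb").read()`, pass the result in as `audio`, and hand the returned request to `urllib.request.urlopen`. Because nothing wraps the bytes, the server can start consuming audio as soon as the first bytes arrive.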
OpenAI-Compatible Upload (`POST /openai-compatible/audio/transcriptions`, `POST /openai-compatible/audio/translations`)
- Method: You send the audio file and transcription options (such as `model`, `prompt`, and `response_format`) in a `multipart/form-data` request body, mirroring the structure of OpenAI's audio API.
- Performance: While convenient for compatibility, processing `multipart/form-data` adds a small initial overhead: the API server must parse the form parts to extract the audio file and options before it can send the audio to the transcription engine. This introduces a slight latency penalty compared to the direct upload method, although the difference is often negligible for non-realtime applications.
- Best For: Compatibility with OpenAI SDKs or tools built around OpenAI's audio API structure.
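To make the multipart overhead concrete, the sketch below hand-encodes a `multipart/form-data` body and wraps it in a request for the transcriptions endpoint. It is illustrative only: `BASE_URL`, the bearer-token header, and both helper functions are hypothetical, and naming the audio part `file` is an assumption carried over from OpenAI's audio API rather than something this page states.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

# Hypothetical placeholder: replace with the real Speechall API base URL.
BASE_URL = "https://api.example.com/v1"


def encode_multipart(
    fields: Dict[str, str], filename: str, audio: bytes, file_type: str = "audio/wav"
) -> Tuple[bytes, str]:
    """Encode text fields plus one file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex  # random delimiter unlikely to occur in the data
    chunks = []
    for name, value in fields.items():
        part_header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(part_header.encode() + value.encode() + b"\r\n")
    # Assumption: the file part is named "file", as in OpenAI's audio API.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    chunks.append(file_header.encode() + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_openai_compatible_request(
    audio: bytes,
    filename: str,
    api_key: str,
    model: str,
    response_format: str = "json",
    prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /openai-compatible/audio/transcriptions request."""
    fields = {"model": model, "response_format": response_format}
    if prompt is not None:
        fields["prompt"] = prompt
    body, content_type = encode_multipart(fields, filename, audio)
    return urllib.request.Request(
        f"{BASE_URL}/openai-compatible/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            # Assumption: bearer-token auth; confirm the scheme in the API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Every field and the audio itself sit between boundary lines, so the server has to scan for those delimiters before it can isolate the audio bytes. That scan is the parsing step the direct upload method avoids. In practice an OpenAI SDK would produce this encoding for you; the hand-built version just makes the structure visible.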
Remote URL Transcription (`POST /transcribe-remote`)
- Method: You provide the URL of a publicly accessible audio file in the `file_url` field of the JSON request body, along with other transcription options. The Speechall API server then fetches the audio file from that URL.
- Requirement: The audio file must be hosted at a publicly accessible URL that our API servers can reach via standard HTTP/S requests.
- Handling Private URLs: If your audio files are in private storage (e.g., within a VPC, behind a firewall, or requiring signed URLs) and cannot be made publicly accessible, please contact Speechall support. We can discuss potential solutions or specific arrangements to accommodate your requirements. Your feedback also helps us prioritize features like pre-signed URL support or private network connectivity options.
- Best For: Transcribing audio files already stored online (cloud storage, web servers), integrating with systems that generate public links.
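The sketch below builds (but does not send) a remote-URL request: a small JSON body carrying `file_url`, with no audio bytes uploaded by the client. `file_url` comes from this page; `BASE_URL`, the bearer-token header, the `build_remote_request` helper, and the assumption that `model` and `language` use the same names as on `POST /transcribe` are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical placeholder: replace with the real Speechall API base URL.
BASE_URL = "https://api.example.com/v1"


def build_remote_request(
    file_url: str, api_key: str, model: str, language: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /transcribe-remote request with a JSON body."""
    # The server fetches the audio over HTTP/S, so reject other schemes early.
    # This cannot tell whether the URL is publicly reachable; only the fetch can.
    if urllib.parse.urlparse(file_url).scheme not in ("http", "https"):
        raise ValueError(f"file_url must be an http(s) URL, got {file_url!r}")
    # Assumption: options other than file_url share the names used by /transcribe.
    payload = {"file_url": file_url, "model": model}
    if language is not None:
        payload["language"] = language
    return urllib.request.Request(
        f"{BASE_URL}/transcribe-remote",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; confirm the scheme in the API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

The scheme check only catches obvious mistakes such as an `s3://` path. A private or firewalled `https` URL passes it but still cannot be fetched by the server.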
Choose the method that best suits your technical setup, latency requirements, and integration needs. For the absolute lowest transcription latency from a local file, the `POST /transcribe` endpoint with direct binary upload is recommended.