# Getting Started with the Speechall API
This guide will walk you through the basic steps to make your first transcription request using the Speechall API.
## Prerequisites
Before you begin, you will need:
- An API Key: Obtain your API key from the Speechall user dashboard.
- An audio file: A short audio file (e.g., MP3, WAV, M4A) to test the transcription.
- A tool to make HTTP requests: We’ll use `curl` in the examples, but you can use any programming language with an HTTP client.
## Authentication
Authentication is done by including your API key in the `Authorization` header of every request using the Bearer scheme. The format is:

```
Authorization: Bearer YOUR_API_KEY
```

Replace `YOUR_API_KEY` with the actual API key obtained from your Speechall dashboard.
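The same header works from any HTTP client. Here is a minimal Python sketch using only the standard library; reading the key from an environment variable is a common convention, not a requirement of the API:

```python
import os
import urllib.request

# Keep the key out of source code; fall back to a placeholder for illustration.
API_KEY = os.environ.get("SPEECHALL_API_KEY", "YOUR_API_KEY")

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header required on every Speechall request."""
    return {"Authorization": f"Bearer {api_key}"}

request = urllib.request.Request(
    "https://api.speechall.com/v1/speech-to-text-models",
    headers=auth_headers(API_KEY),
)
# urllib.request.urlopen(request) would send the authenticated request.
```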
## Choosing a Model

The Speechall API supports multiple underlying STT models from various providers. Each model has a unique identifier in the format `provider.model_name` (e.g., `openai.whisper-1`, `deepgram.nova-2`).

To see the list of all available models and their capabilities (supported languages, features, etc.), you can use the `/speech-to-text-models` endpoint:
```bash
curl -X GET https://api.speechall.com/v1/speech-to-text-models \
  -H "Authorization: Bearer YOUR_API_KEY"
```
The response will be a JSON array listing the available models with their properties. Choose a model identifier from this list for your transcription requests. For this guide, we’ll use `openai.whisper-1` as an example.
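Once you have that array, you can filter it in code. A Python sketch follows; note that the field names (`id`, `languages`) are illustrative assumptions about the model objects, not guaranteed by this guide, so check them against an actual response:

```python
# Hypothetical response shape: the guide only guarantees a JSON array of
# model objects. The "id" and "languages" keys below are assumptions.
sample_models = [
    {"id": "openai.whisper-1", "languages": ["en", "de", "fr"]},
    {"id": "deepgram.nova-2", "languages": ["en"]},
]

def models_supporting(models: list, language: str) -> list:
    """Return identifiers of models that list the given language."""
    return [m["id"] for m in models if language in m.get("languages", [])]

print(models_supporting(sample_models, "de"))  # ['openai.whisper-1']
```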
## Making Your First Transcription Request (Direct Upload)

The `/transcribe` endpoint allows you to upload an audio file directly in the request body. This is the simplest way to transcribe a local file.

Endpoint: `POST https://api.speechall.com/v1/transcribe`

Parameters (Query String):
- `model` (required): The identifier of the model to use (e.g., `openai.whisper-1`).
- `language` (optional): The language of the audio (e.g., `en` for English). Providing this helps accuracy. Defaults to `en`. Use `auto` for auto-detection if supported by the model.
- `output_format` (optional): The desired output format. Use `text`, `json_text`, `json`, `srt`, or `vtt`. Defaults to `text`.
Request Body: The raw binary data of your audio file (`Content-Type: audio/*`).

`curl` Example:

Replace `/path/to/your/audio.wav` with the actual path to your audio file and `YOUR_API_KEY` with your key.
```bash
curl -X POST "https://api.speechall.com/v1/transcribe?model=openai.whisper-1&language=en&output_format=text" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --data-binary @/path/to/your/audio.wav \
  -H "Content-Type: audio/wav" # Adjust Content-Type based on your file type
```
- `-X POST`: Specifies the HTTP POST method.
- `"https://api.speechall.com/v1/transcribe?..."`: The full URL including query parameters for `model`, `language`, and `output_format`.
- `-H "Authorization: Bearer YOUR_API_KEY"`: Includes your API key for authentication.
- `--data-binary @/path/to/your/audio.wav`: Sends the raw binary content of the specified file as the request body. The `@` prefix tells `curl` to read the content from the file.
- `-H "Content-Type: audio/wav"`: Specifies the MIME type of the audio data being sent. Adjust this based on your file (e.g., `audio/mpeg` for MP3, `audio/flac` for FLAC).
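The same direct upload can be done from Python's standard library. This is a minimal sketch: `build_transcribe_url` and `transcribe_file` are illustrative helper names, and the Content-Type is guessed from the file extension rather than hard-coded:

```python
import mimetypes
import urllib.parse
import urllib.request

def build_transcribe_url(model: str, language: str = "en",
                         output_format: str = "text") -> str:
    """Assemble the /transcribe URL with its query-string parameters."""
    query = urllib.parse.urlencode(
        {"model": model, "language": language, "output_format": output_format}
    )
    return f"https://api.speechall.com/v1/transcribe?{query}"

def transcribe_file(path: str, api_key: str, model: str) -> str:
    """Upload a local audio file as the raw request body; return the response text."""
    # Guess the MIME type from the extension; fall back to a generic binary type.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        request = urllib.request.Request(
            build_transcribe_url(model),
            data=f.read(),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": content_type,
            },
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `transcribe_file("meeting.wav", API_KEY, "openai.whisper-1")` would return the plain-text transcription, since `output_format` defaults to `text` here.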
## Understanding the Response
If the request is successful (HTTP status code 200), the response body will contain the transcription in the format you specified:
- `output_format=text`: The response body is plain text containing the full transcription.

  ```
  This is the transcribed text of your audio file.
  ```

- `output_format=json_text`: The response is a simple JSON object.

  ```json
  { "id": "txn_...", "text": "This is the transcribed text of your audio file." }
  ```

- `output_format=json`: The response is a detailed JSON object including segments, timestamps, detected language, and potentially speaker labels or provider metadata. See the `TranscriptionDetailed` schema in the API reference for the full structure.

  ```json
  {
    "id": "txn_...",
    "text": "Hello world. This is a test.",
    "language": "en",
    "segments": [
      { "start": 0.5, "end": 2.1, "text": "Hello world." },
      { "start": 2.5, "end": 4.0, "text": "This is a test." }
    ]
    // ... other fields like provider_metadata
  }
  ```

- `output_format=srt` or `vtt`: The response is plain text in the respective subtitle format.
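For the `json` format, a few lines of Python are enough to pull out the per-segment timings. The payload below mirrors the example structure above; `segment_lines` is an illustrative helper, not part of any SDK:

```python
import json

# Example payload in the shape shown for output_format=json.
payload = json.loads("""
{
  "id": "txn_123",
  "text": "Hello world. This is a test.",
  "language": "en",
  "segments": [
    {"start": 0.5, "end": 2.1, "text": "Hello world."},
    {"start": 2.5, "end": 4.0, "text": "This is a test."}
  ]
}
""")

def segment_lines(transcription: dict) -> list:
    """Render each segment as 'start-end: text' for quick inspection."""
    return [
        f"{seg['start']:.1f}-{seg['end']:.1f}: {seg['text']}"
        for seg in transcription.get("segments", [])
    ]

print(segment_lines(payload)[0])  # 0.5-2.1: Hello world.
```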
## Handling Errors
The API uses standard HTTP status codes to indicate the outcome of a request.
- `200 OK`: Request successful.
- `400 Bad Request`: The request was invalid (e.g., missing required parameters, invalid value). The response body will contain a JSON object with an error `message`.
- `401 Unauthorized`: Authentication failed. Your API key is missing or invalid.
- `404 Not Found`: The requested endpoint or a referenced resource (like a `ruleset_id`) was not found.
- `429 Too Many Requests`: You have exceeded your rate limit. Check the `Retry-After` header.
- `5xx Server Errors`: An error occurred on the server side. Retrying might work.
For 4xx and 5xx errors, the response body is typically a JSON object like:
```json
{
  "message": "A description of the error.",
  "code": "error_code_identifier" // Optional
}
```
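A simple client-side retry policy follows from the status codes above: retry on `429` (honouring `Retry-After`) and on `5xx`, and surface other `4xx` errors immediately. The Python sketch below uses only the standard library; `send_with_retry` is an illustrative helper:

```python
import time
import urllib.error
import urllib.request

def should_retry(status: int) -> bool:
    """Retry on rate limiting and server-side errors; other 4xx won't recover."""
    return status == 429 or 500 <= status <= 599

def send_with_retry(request: urllib.request.Request, attempts: int = 3):
    """Send a request, retrying when the status code suggests it may help."""
    for attempt in range(attempts):
        try:
            return urllib.request.urlopen(request)
        except urllib.error.HTTPError as err:
            if not should_retry(err.code) or attempt == attempts - 1:
                raise
            # Honour Retry-After when present (assumed to be in seconds here);
            # otherwise back off exponentially.
            delay = int(err.headers.get("Retry-After", 2 ** attempt))
            time.sleep(delay)
```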
## Next Steps
You’ve successfully made your first transcription! Now you can explore other capabilities:
- Transcribe from a URL: Use the `/transcribe-remote` endpoint if your audio is already online.
- OpenAI Compatible Endpoints: Integrate with systems expecting the OpenAI API structure using `/openai-compatible/audio/transcriptions` and `/openai-compatible/audio/translations`.
- Apply Replacement Rules: Create rulesets with `/replacement-rulesets` and apply them using the `ruleset_id` parameter on transcription endpoints.
- Experiment with Features: Try enabling `diarization`, requesting `word`-level `timestamp_granularity`, or using `custom_vocabulary` with supported models.
- Explore More Models: Use the `/speech-to-text-models` endpoint to find models that best fit your language, required features (like diarization), and performance needs.
- Use our TypeScript SDK: For developers working with JavaScript or TypeScript, our official TypeScript SDK offers a convenient way to integrate with the Speechall API, providing type safety and simplified request management.
Refer to the full API Reference (generated automatically from the OpenAPI document) for complete details on all endpoints, parameters, schemas, and response formats.