Getting Started with the Speechall API

This guide will walk you through the basic steps to make your first transcription request using the Speechall API.

Prerequisites

Before you begin, you will need:

  1. An API Key: Obtain your API key from the Speechall user dashboard.
  2. An audio file: A short audio file (e.g., MP3, WAV, M4A) to test the transcription.
  3. A tool to make HTTP requests: We’ll use curl in the examples, but you can use any programming language with an HTTP client.

Authentication

Authentication is done by including your API key in the Authorization header of every request using the Bearer scheme.

The format is:

Authorization: Bearer YOUR_API_KEY

Replace YOUR_API_KEY with the actual API key obtained from your Speechall dashboard.
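As a minimal sketch in Python's standard library, the header can be attached to a request object like this (the key is a placeholder; the request is constructed but not sent):

```python
import urllib.request

API_KEY = "YOUR_API_KEY"

# Build a request with the Bearer token in the Authorization header.
request = urllib.request.Request(
    "https://api.speechall.com/v1/speech-to-text-models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# The header is now set on the request object (nothing has been sent yet):
print(request.get_header("Authorization"))  # → Bearer YOUR_API_KEY
```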

Choosing a Model

The Speechall API supports multiple underlying STT models from various providers. Each model has a unique identifier in the format provider.model_name (e.g., openai.whisper-1, deepgram.nova-2).

To see the list of all available models and their capabilities (supported languages, features, etc.), you can use the /speech-to-text-models endpoint:

curl -X GET https://api.speechall.com/v1/speech-to-text-models \
     -H "Authorization: Bearer YOUR_API_KEY"

The response will be a JSON array listing the available models with their properties. Choose a model identifier from this list for your transcription requests. For this guide, we’ll use openai.whisper-1 as an example.
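Once you have the model list, you may want to filter it client-side. The sketch below assumes `id` and `languages` fields for illustration; check the actual response shape against the API reference.

```python
# Hypothetical excerpt of the /speech-to-text-models response.
# Field names here are assumptions for illustration only.
models = [
    {"id": "openai.whisper-1", "languages": ["en", "de", "fr"]},
    {"id": "deepgram.nova-2", "languages": ["en"]},
]

def models_supporting(models, language):
    """Return identifiers of models that list the given language."""
    return [m["id"] for m in models if language in m.get("languages", [])]

print(models_supporting(models, "de"))  # → ['openai.whisper-1']
```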

Making Your First Transcription Request (Direct Upload)

The /transcribe endpoint allows you to upload an audio file directly in the request body. This is the simplest way to transcribe a local file.

Endpoint: POST https://api.speechall.com/v1/transcribe

Parameters (Query String):

  • model (required): The identifier of the model to use (e.g., openai.whisper-1).
  • language (optional): The language of the audio (e.g., en for English). Providing it can improve accuracy. Defaults to en. Use auto for auto-detection if supported by the model.
  • output_format (optional): The desired output format. Use text, json_text, json, srt, or vtt. Defaults to text.

Request Body:

The raw binary data of your audio file (Content-Type: audio/*).

curl Example:

Replace /path/to/your/audio.wav with the actual path to your audio file and YOUR_API_KEY with your key.

curl -X POST "https://api.speechall.com/v1/transcribe?model=openai.whisper-1&language=en&output_format=text" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: audio/wav" \
     --data-binary @/path/to/your/audio.wav

  • -X POST: Specifies the HTTP POST method.
  • "https://api.speechall.com/v1/transcribe?...": The full URL including query parameters for model, language, and output_format.
  • -H "Authorization: Bearer YOUR_API_KEY": Includes your API key for authentication.
  • --data-binary @/path/to/your/audio.wav: Sends the raw binary content of the specified file as the request body. The @ prefix tells curl to read the content from the file.
  • -H "Content-Type: audio/wav": Specifies the MIME type of the audio data being sent. Adjust this based on your file (e.g., audio/mpeg for MP3, audio/flac for FLAC).
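The same request URL can be assembled in Python with the standard library, which takes care of URL-encoding the query parameters (the file upload itself is omitted here; this only shows URL construction):

```python
import urllib.parse

# Query parameters from the curl example above.
params = {
    "model": "openai.whisper-1",
    "language": "en",
    "output_format": "text",
}

# urlencode percent-encodes values and joins them with '&'.
url = "https://api.speechall.com/v1/transcribe?" + urllib.parse.urlencode(params)
print(url)
# → https://api.speechall.com/v1/transcribe?model=openai.whisper-1&language=en&output_format=text
```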

Understanding the Response

If the request is successful (HTTP status code 200), the response body will contain the transcription in the format you specified:

  • output_format=text: The response body is plain text containing the full transcription.
    This is the transcribed text of your audio file.
  • output_format=json_text: The response is a simple JSON object.
    {
      "id": "txn_...",
      "text": "This is the transcribed text of your audio file."
    }
  • output_format=json: The response is a detailed JSON object including segments, timestamps, detected language, and potentially speaker labels or provider metadata. See the TranscriptionDetailed schema in the API reference for the full structure.
    {
      "id": "txn_...",
      "text": "Hello world. This is a test.",
      "language": "en",
      "segments": [
        {
          "start": 0.5,
          "end": 2.1,
          "text": "Hello world."
        },
        {
          "start": 2.5,
          "end": 4.0,
          "text": "This is a test."
        }
      ]
      // ... other fields like provider_metadata
    }
  • output_format=srt or vtt: The response is plain text in the respective subtitle format.
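Although the API can return SRT directly via output_format=srt, the detailed JSON segments are easy to post-process yourself. The sketch below converts the `segments` array from the example response above into SRT cues; it is purely illustrative.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 2.5 -> 00:00:02,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as SRT cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}"
        )
    return "\n\n".join(cues)

segments = [
    {"start": 0.5, "end": 2.1, "text": "Hello world."},
    {"start": 2.5, "end": 4.0, "text": "This is a test."},
]
print(segments_to_srt(segments))
```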

Handling Errors

The API uses standard HTTP status codes to indicate the outcome of a request.

  • 200 OK: Request successful.
  • 400 Bad Request: The request was invalid (e.g., missing required parameters, invalid value). The response body will contain a JSON object with an error message.
  • 401 Unauthorized: Authentication failed. Your API key is missing or invalid.
  • 404 Not Found: The requested endpoint or a referenced resource (like a ruleset_id) was not found.
  • 429 Too Many Requests: You have exceeded your rate limit. Check the Retry-After header.
  • 5xx Server Errors: An error occurred on the server side. Retrying the request after a short delay may succeed.

For 4xx and 5xx errors, the response body is typically a JSON object like:

{
  "message": "A description of the error.",
  "code": "error_code_identifier" // Optional
}
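A client can use the status code and the Retry-After header to decide whether and when to retry, as described above. The helper below is a sketch of that logic, not part of any SDK:

```python
# Status codes worth retrying: rate limits and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}

def retry_delay(status, headers, default=2.0):
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to the default
    return default

print(retry_delay(429, {"Retry-After": "5"}))  # → 5.0
print(retry_delay(400, {}))                    # → None
```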

Next Steps

You’ve successfully made your first transcription! Now you can explore other capabilities:

  • Transcribe from a URL: Use the /transcribe-remote endpoint if your audio is already online.
  • OpenAI Compatible Endpoints: Integrate with systems expecting the OpenAI API structure using /openai-compatible/audio/transcriptions and /openai-compatible/audio/translations.
  • Apply Replacement Rules: Create rulesets with /replacement-rulesets and apply them using the ruleset_id parameter on transcription endpoints.
  • Experiment with Features: Try enabling diarization, requesting word level timestamp_granularity, or using custom_vocabulary with supported models.
  • Explore More Models: Use the /speech-to-text-models endpoint to find models that best fit your language, required features (like diarization), and performance needs.
  • Use our TypeScript SDK: For developers working with JavaScript or TypeScript, our official TypeScript SDK offers a convenient way to integrate with the Speechall API, providing type safety and simplified request management.

Refer to the full API Reference (generated automatically from the OpenAPI document) for complete details on all endpoints, parameters, schemas, and response formats.