ElevenLabs Scribe v1

Scribe is a speech-to-text model built for accurate transcription of real-world audio. It supports 99 languages and provides word-level timestamps, speaker diarization, and audio-event tagging.

Model ID

elevenlabs.scribe-v1

Use this ID to reference this model when making API calls.

Provider

ElevenLabs

Model Type

general

Accuracy Tier

premium

Release Date

February 26, 2025

Supported Languages

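af, ar, hy, az, be, bs, bg, ca, zh, hr, cs, da, nl, en, et, fi, fr, gl, de, el, he, hi, hu, is, id, it, ja, kn, kk, ko, lv, lt, mk, ms, mr, mi, ne, no, fa, pl, pt, ro, ru, sr, sk, sl, es, sw, sv, tl, ta, th, tr, uk, ur, vi, cy

Automatic Language Detection: Yes

Performance & Cost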

Cost

$0.40000/hour

$0.00011/second
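
As a rough illustration of how the listed rates translate into a bill, the sketch below estimates the cost of a recording from its duration. It assumes billing is linear and prorated per second, which this page does not confirm; the function name estimate_cost_usd is hypothetical. The listed per-second price is the hourly rate divided by 3600 and rounded, so the sketch works from the hourly rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed price per hour of audio for elevenlabs.scribe-v1.
HOURLY_RATE_USD = Decimal("0.40")


def estimate_cost_usd(duration_seconds: float) -> Decimal:
    """Estimate transcription cost, assuming linear per-second proration.

    The proration rule is an assumption; actual billing may round
    differently (for example, up to the next second or minute).
    """
    hours = Decimal(str(duration_seconds)) / Decimal(3600)
    cost = hours * HOURLY_RATE_USD
    # Round to four decimal places for display.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


# A 90-minute recording is 1.5 hours of audio.
print(estimate_cost_usd(90 * 60))
```

Decimal is used rather than float so that small per-second amounts do not pick up binary rounding noise.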

Maximum Duration

2h 0m

Maximum File Size

953.67 MB
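
The two limits above can be checked locally before uploading. The following is a minimal sketch with a hypothetical helper, check_limits. It assumes the 953.67 MB figure is 1,000,000,000 bytes expressed in units of 1024 * 1024 bytes, since 10**9 / 1024**2 is approximately 953.67; the page does not state this explicitly.

```python
from typing import List

MAX_DURATION_SECONDS = 2 * 60 * 60   # listed maximum duration: 2h 0m
# Assumption: "953.67 MB" is 10**9 bytes shown in units of 1024**2 bytes.
MAX_FILE_SIZE_BYTES = 1_000_000_000


def check_limits(duration_seconds: float, size_bytes: int) -> List[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds:.0f}s exceeds {MAX_DURATION_SECONDS}s"
        )
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(
            f"size {size_bytes} bytes exceeds {MAX_FILE_SIZE_BYTES} bytes"
        )
    return problems


# A one-hour, 50 MB file passes both checks.
print(check_limits(3600, 50_000_000))
```

A file that fails either check would need to be split or re-encoded before submission.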

Features

Supported capabilities and functionalities

Core Features

Punctuation
Diarization
Streaming
Speaker Labels
Word Timestamps
Confidence Scores
Custom Vocabulary
Profanity Filtering
Noise Reduction
Voice Activity Detection

Subtitle Formats

SRT Support
VTT Support

Technical Specifications

Input/output formats and technical details

Subtitle Format Support

SRT, VTT

Supported Audio Encodings

mp3, mp4
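
A simple pre-flight filter on the listed encodings might look like the sketch below. It assumes the file extension reflects the actual encoding, which is not guaranteed; the helper name is_supported_encoding is hypothetical.

```python
from pathlib import Path

# Encodings listed for elevenlabs.scribe-v1 on this page.
SUPPORTED_ENCODINGS = {"mp3", "mp4"}


def is_supported_encoding(path: str) -> bool:
    """Check the file extension against the listed encodings.

    This inspects only the file name, not the file contents.
    """
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_ENCODINGS


print(is_supported_encoding("interview.MP3"))
print(is_supported_encoding("interview.wav"))
```

Files in other formats would need to be converted to one of the listed encodings first.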