GPT-4o Transcribe

Speech-to-text model powered by GPT-4o. It offers a lower word error rate and better language recognition and accuracy than the original Whisper models.

Provider

OpenAI

Model Type

general

Accuracy Tier

premium

Release Date

March 20, 2025

Supported Languages

No language information available

Performance & Cost

Cost

$0.36000/hour

$0.00010/second
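
The two listed rates agree with each other (0.00010 × 3600 = 0.36). As a rough sketch of estimating the cost of a recording from its duration, assuming billing is linear per second with no rounding or minimum charge (the listing does not say either way), and using a hypothetical helper name:

```python
from decimal import Decimal

# Per-second rate taken from the listing above (USD).
RATE_PER_SECOND = Decimal("0.00010")

def estimate_cost(duration_seconds):
    """Estimate the USD cost of transcribing audio of the given length.

    Assumption: cost scales linearly with duration, with no minimum
    charge and no rounding to whole seconds or minutes.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Convert via str() so float inputs do not carry binary rounding noise.
    return RATE_PER_SECOND * Decimal(str(duration_seconds))

# One hour of audio reproduces the listed hourly rate.
print(estimate_cost(3600))
# A 90-second clip.
print(estimate_cost(90))
```

Decimal is used instead of float so that small per-second amounts add up exactly.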

Maximum File Size

25.00 MB
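
A file can be checked against this limit before it is uploaded. The sketch below is illustrative only: the function names are hypothetical, and it assumes "MB" means decimal megabytes (1,000,000 bytes), which is the more conservative reading since the listing does not specify.

```python
import os

# Assumption: 25.00 MB is read as 25 * 10^6 bytes (decimal megabytes).
MAX_FILE_BYTES = 25 * 1000 * 1000

def fits_size_limit(size_bytes, limit=MAX_FILE_BYTES):
    """Return True if a non-empty payload of size_bytes is within the limit."""
    return 0 < size_bytes <= limit

def file_fits_size_limit(path):
    """Check an on-disk file against the limit using its size in bytes."""
    return fits_size_limit(os.path.getsize(path))
```

Recordings over the limit would need to be compressed or split into smaller pieces before submission.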

Features

Supported capabilities and functionalities

Core Features

Punctuation
Diarization
Streaming
Speaker Labels
Word Timestamps
Confidence Scores
Language Detection
Custom Vocabulary
Profanity Filtering
Noise Reduction
Technical Specifications

Input/output formats and technical details

Supported Output Formats

json, text

Supported Audio Encodings

MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM
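
A simple pre-upload filter can reject files whose extension is not among the encodings listed above, read here as MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM. This is a minimal sketch with a hypothetical function name. It checks only the file name, not the actual audio container, so it is a first-pass filter and not a guarantee that the file will be accepted.

```python
from pathlib import Path

# Extensions derived from the encodings in the listing, lowercased.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def has_supported_extension(filename):
    """Return True if the file name ends in one of the listed extensions."""
    # Path.suffix returns the last extension including the dot, e.g. ".MP3".
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```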