IBM Watson Speech to Text

A cloud-based speech recognition service from IBM Watson that uses machine-learning models to convert audio into text. It is built for enterprise applications that need accurate, scalable transcription that can be customized, for example with custom vocabularies.

Model ID

ibm.standard

Use this ID to reference this model in API calls.
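The sketch below shows where the model ID would sit in a request body. It is illustrative only: the field names `model`, `language`, `audio_url`, and `features`, and the helper `build_request`, are hypothetical and not a documented schema for this service. Only the ID `ibm.standard` comes from this page.

```python
import json

# Model ID from this page; everything else in the body is a hypothetical schema.
MODEL_ID = "ibm.standard"

def build_request(audio_url: str, language: str) -> str:
    """Serialize a hypothetical transcription request that references the model by ID."""
    body = {
        "model": MODEL_ID,        # the ID listed above
        "language": language,     # passed explicitly: no automatic language detection
        "audio_url": audio_url,   # hypothetical field for the audio source
        "features": {"punctuation": True, "word_timestamps": True},  # hypothetical flags
    }
    return json.dumps(body)

print(build_request("https://example.com/call.wav", "en-US"))
```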

Provider

IBM

Model Type

general

Accuracy Tier

enhanced

Supported Languages

ar-MS, zh-CN, cs-CZ, nl-BE, nl-NL, en-AU, en-IN, en-GB, en-US, fr-CA, fr-FR, de-DE, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, es-ES, es-LA, sv-SE
Automatic Language Detection: No
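Because automatic language detection is not available, a client has to choose the language code up front. A minimal sketch of a client-side check against the 20 codes listed above; the helper name `require_language` is hypothetical, and matching is exact and case-sensitive.

```python
# The 20 language codes listed on this page.
SUPPORTED_LANGUAGES = frozenset({
    "ar-MS", "zh-CN", "cs-CZ", "nl-BE", "nl-NL", "en-AU", "en-IN",
    "en-GB", "en-US", "fr-CA", "fr-FR", "de-DE", "hi-IN", "it-IT",
    "ja-JP", "ko-KR", "pt-BR", "es-ES", "es-LA", "sv-SE",
})

def require_language(code: str) -> str:
    """Return the code unchanged if it is listed, otherwise raise ValueError."""
    if code not in SUPPORTED_LANGUAGES:
        # No automatic detection, so an unlisted code cannot fall back to auto mode.
        raise ValueError(f"unsupported language {code!r}; pass one of the listed codes")
    return code
```

Validating before upload avoids paying for a transcription run in the wrong language.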

Performance & Cost

Cost

$1.20 per hour of audio

$0.00033 per second (the hourly rate divided by 3600, rounded; the exact value is about $0.000333)
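A small cost estimator, sketched under the assumption that billing is pro-rated per second with no minimum charge or billing increment; this page does not state the actual rounding rules. It derives the per-second rate from the hourly price, because multiplying by the rounded $0.00033 figure undercharges: 3600 × $0.00033 = $1.188, not $1.20. `Decimal` avoids binary floating-point error in currency arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("1.20")  # USD per hour of audio, from this page

def estimate_cost(audio_seconds: int) -> Decimal:
    """Estimate the charge for a clip, pro-rated per second from the hourly rate.

    Assumption: strictly per-second billing with no minimum charge.
    """
    raw = HOURLY_RATE * Decimal(audio_seconds) / Decimal(3600)
    # Round to five decimal places, matching the precision shown above.
    return raw.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)

print(estimate_cost(90))    # a 90-second clip
print(estimate_cost(3600))  # one hour
```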

Features

Supported capabilities and functionalities

Core Features

Punctuation
Diarization
Streaming
Speaker Labels
Word Timestamps
Confidence Scores
Custom Vocabulary
Profanity Filtering
Noise Reduction
Voice Activity Detection
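Diarization and word timestamps are most useful together: each word can be attributed to a speaker and consecutive words merged into turns. The sketch assumes IBM's Speech to Text JSON layout, in which each entry in `results[*].alternatives[0].timestamps` is a `[word, start, end]` list and a top-level `speaker_labels` list holds entries with `from`, `to`, and `speaker`. It also assumes that label start times match word start times exactly. If this model returns a different response shape, adjust the lookups. The function name `speaker_turns` is hypothetical.

```python
from typing import Any

def speaker_turns(response: dict[str, Any]) -> list[tuple[int, str]]:
    """Merge word timestamps with speaker labels into (speaker, text) turns.

    Assumes IBM's JSON layout: [word, start, end] timestamps and
    speaker_labels entries keyed by "from", "to", and "speaker".
    """
    # Look up each word's speaker by the label whose start time equals the word's start.
    speaker_at = {label["from"]: label["speaker"] for label in response.get("speaker_labels", [])}
    turns: list[tuple[int, str]] = []
    for result in response.get("results", []):
        for word, start, _end in result["alternatives"][0].get("timestamps", []):
            speaker = speaker_at.get(start, -1)  # -1 marks a word with no label
            if turns and turns[-1][0] == speaker:
                # Same speaker as the previous word: extend the current turn.
                turns[-1] = (speaker, turns[-1][1] + " " + word)
            else:
                turns.append((speaker, word))
    return turns
```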

Subtitle Formats

SRT Support
VTT Support
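One way to produce SRT captions is to build them client-side from word timestamps. The sketch assumes words arrive as `(word, start, end)` tuples in seconds, the same ordering IBM uses in its `timestamps` field. The seven-word cue length is an arbitrary choice, and both helper names are hypothetical. WebVTT differs mainly in its `WEBVTT` header line and in using `.` rather than `,` before the milliseconds.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Group (word, start, end) tuples into numbered SRT cues of up to max_words words."""
    cues = []
    for index, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]  # cue spans first word start to last word end
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```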

Technical Specifications

Input/output formats and technical details

Subtitle Format Support

No native subtitle output; SRT and VTT captions can be generated from the word timestamps.

Supported Audio Encodings

FLAC, MPEG, MP3, Ogg, PCM, WAV, WebM, Opus, LINEAR16

Supported Sample Rates

8000 Hz, 16000 Hz
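Only two sample rates are listed, so it can help to check WAV input before uploading it. A minimal sketch using Python's standard `wave` module. The helper name `check_wav` is hypothetical. It reads the sample rate from the WAV header only and does not resample; audio at other rates, such as 44100 Hz, would need to be converted first.

```python
import io
import wave

SUPPORTED_RATES_HZ = {8000, 16000}  # sample rates listed on this page

def check_wav(data: bytes) -> int:
    """Return the WAV sample rate, raising ValueError if it is not a listed rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()  # read from the WAV header
    if rate not in SUPPORTED_RATES_HZ:
        raise ValueError(f"sample rate {rate} Hz is not in {sorted(SUPPORTED_RATES_HZ)}; resample first")
    return rate
```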