IBM Watson Speech to Text

Model Information

IBM Watson Speech to Text is a cloud-based speech recognition service that converts audio into text using AI and machine learning. It is designed for enterprise applications that require robust, scalable, and customizable transcription.

Model ID

ibm.standard

Use this ID when making API calls to reference this model.

Provider

IBM

Model Type

general

Accuracy Tier

enhanced

Supported Languages

ar-MS, zh-CN, cs-CZ, nl-BE, nl-NL, en-AU, en-IN, en-GB, en-US, fr-CA, fr-FR, de-DE, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, es-ES, es-LA, sv-SE

Automatic Language Detection: No
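
Because automatic language detection is not available, the caller has to choose one of the listed language codes before submitting audio. A minimal sketch of that check in Python follows; the helper name `require_supported_language` is hypothetical and is not part of any IBM SDK.

```python
# Language codes copied from the Supported Languages entry on this page.
SUPPORTED_LANGUAGES = frozenset({
    "ar-MS", "zh-CN", "cs-CZ", "nl-BE", "nl-NL",
    "en-AU", "en-IN", "en-GB", "en-US", "fr-CA",
    "fr-FR", "de-DE", "hi-IN", "it-IT", "ja-JP",
    "ko-KR", "pt-BR", "es-ES", "es-LA", "sv-SE",
})


def require_supported_language(code: str) -> str:
    """Return `code` unchanged if it is listed; otherwise raise ValueError.

    Hypothetical helper for illustration. The match is exact and
    case-sensitive, so "en-us" is rejected.
    """
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Language {code!r} is not listed for this model")
    return code
```

Running the check before upload avoids paying for a request that cannot be transcribed in the intended language.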

Performance & Cost

Cost

$1.20000/hour

$0.00033/second
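
The per-second figure is the hourly rate divided by 3600 and rounded to five decimal places, so estimates are more precise when computed from the hourly rate. A rough sketch in Python, assuming cost scales linearly with audio duration; the function name `estimate_cost_usd` is hypothetical, and any billing minimums or rounding rules the provider applies are not modeled.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed price for this model, in US dollars per hour of audio.
HOURLY_RATE_USD = Decimal("1.20")
SECONDS_PER_HOUR = Decimal(3600)


def estimate_cost_usd(duration_seconds) -> Decimal:
    """Estimate transcription cost for `duration_seconds` of audio.

    Assumes purely linear pricing (an assumption, not a documented
    billing rule). The result is rounded to four decimal places.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    seconds = Decimal(str(duration_seconds))
    cost = HOURLY_RATE_USD * seconds / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

For example, `estimate_cost_usd(90)` gives 0.0300, that is, about three cents for a 90-second clip.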

Features

Supported capabilities and functionalities

Core Features

Punctuation

Diarization

Streaming

Speaker Labels

Word Timestamps

Confidence Scores

Custom Vocabulary

Profanity Filtering

Noise Reduction

Voice Activity Detection

Subtitle Formats

SRT Support

VTT Support

Technical Specifications

Input/output formats and technical details

Subtitle Format Support

No subtitle formats supported

Supported Audio Encodings

FLAC, MPEG, MP3, Ogg, PCM, WAV, WebM, Opus, LINEAR16

Supported Sample Rates

8000 Hz, 16000 Hz
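
A file's sample rate can be checked against the listed values before it is submitted. The sketch below uses Python's standard `wave` module and therefore only handles uncompressed PCM WAV input. The helper name `wav_sample_rate_supported` is hypothetical, and whether the service accepts or resamples other rates is not stated on this page.

```python
import wave

# Sample rates listed for this model, in Hz.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000)


def wav_sample_rate_supported(source) -> bool:
    """Report whether a PCM WAV file uses one of the listed sample rates.

    `source` may be a file path or a binary file object opened for
    reading. Hypothetical helper for illustration only.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate() in SUPPORTED_SAMPLE_RATES_HZ
```

Audio recorded at another rate, such as 44100 Hz, would fail this check and could be resampled locally before upload.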