Watson Speech to Text

A cloud-based speech recognition service from IBM Watson that converts audio into text using machine learning. It is designed for enterprise applications that need scalable, customizable transcription.

Provider

IBM

Model Type

general

Accuracy Tier

enhanced

Supported Languages

ar-MS, zh-CN, cs-CZ, nl-BE, nl-NL, en-AU, en-IN, en-GB, en-US, fr-CA, fr-FR, de-DE, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, es-ES, es-LA, sv-SE

Performance & Cost

Cost

$1.20000/hour

$0.00033/second
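
The two listed rates are the same price in different units: $1.20 divided by 3600 seconds is about $0.000333, shown rounded to $0.00033. As a rough sketch of how a transcription job might be costed from the hourly rate, the Python below assumes simple per-second billing with no minimum charge, free tier, or volume discount; those billing rules and the function names are illustrative assumptions, not something this listing states.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed hourly rate from this page; everything else here is an assumption.
HOURLY_RATE_USD = Decimal("1.20")
SECONDS_PER_HOUR = Decimal(3600)


def per_second_rate(hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Convert an hourly rate to a per-second rate (unrounded)."""
    return hourly_rate / SECONDS_PER_HOUR


def estimate_cost(duration_seconds: float,
                  hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Estimate the cost in USD of transcribing `duration_seconds` of audio.

    Assumes linear per-second billing; the result is rounded to whole cents.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Compute from the hourly rate so the rounded per-second figure
    # does not introduce drift on long recordings.
    cost = Decimal(str(duration_seconds)) * hourly_rate / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Working from the hourly rate rather than the rounded per-second figure matters at scale: one hour at $0.00033/second comes to $1.188, not $1.20.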

Features

Supported capabilities and functionalities

Core Features

Punctuation
Diarization
Streaming
Speaker Labels
Word Timestamps
Confidence Scores
Language Detection
Custom Vocabulary
Profanity Filtering
Noise Reduction

Technical Specifications

Input/output formats and technical details

Supported Output Formats

text, json

Supported Audio Encodings

Ogg, WebM, MP3, WAV, FLAC, LINEAR16, G.729, A-Law, Mu-law, Basic audio

Supported Sample Rates

8000 Hz, 16000 Hz
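
Before uploading audio, it can be useful to check a file's sample rate against the rates listed above. The sketch below does this for WAV files only, using Python's standard `wave` module. It treats the two listed rates as a strict allowlist, which is an assumption: the service may accept or resample other rates. The constant and function names are hypothetical.

```python
import wave

# Sample rates listed on this page. Treating them as an exact allowlist
# is an assumption made for this sketch.
SUPPORTED_SAMPLE_RATES_HZ = {8000, 16000}


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz of a WAV file.

    `source` may be a filesystem path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def is_supported_rate(rate_hz: int) -> bool:
    """Check a sample rate against the listed rates."""
    return rate_hz in SUPPORTED_SAMPLE_RATES_HZ


def check_wav(source) -> bool:
    """Return True if the WAV file's sample rate is one of the listed rates."""
    return is_supported_rate(wav_sample_rate(source))
```

A file that fails the check, for example a 44100 Hz recording, could be resampled with an audio tool before submission. Compressed formats such as MP3 or Ogg would need a different reader, since `wave` handles only WAV.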