Azure AI Speech-to-Text

Azure's default, general-purpose speech-to-text model, trained on a vast amount of Microsoft-owned data. It is suitable for conversational and dictation scenarios and supports both real-time and batch transcription.

Provider

Azure

Model Type

general

Accuracy Tier

standard

Supported Languages

autoaf-ZAam-ETar-AEar-BHar-DZar-EGar-ILar-IQar-JOar-KWar-LBar-LYar-MAar-OMar-QAar-SAar-SYar-TNar-YEaz-AZbg-BGbn-BDbn-INbs-BAca-EScs-CZcy-GBda-DKde-ATde-CHde-DEel-GRen-AUen-CAen-GBen-HKen-IEen-INen-KEen-NGen-NZen-PHen-SGen-TZen-USen-ZAes-ARes-BOes-CLes-COes-CRes-CUes-DOes-ECes-ESes-GTes-HNes-MXes-NIes-PAes-PEes-PRes-PYes-SVes-UYes-VEet-EEeu-ESfa-IRfi-FIfil-PHfr-BEfr-CAfr-CHfr-FRga-IEgl-ESgu-INhe-ILhi-INhr-HRhu-HUid-IDis-ISit-ITja-JPjv-IDka-GEkk-KZkm-KHkn-INko-KRlo-LAlt-LTlv-LVmk-MKml-INmn-MNmr-INms-MYmt-MTmy-MMnb-NOne-NPnl-BEnl-NLpl-PLps-AFpt-BRpt-PTro-ROru-RUsi-LKsk-SKsl-SIso-SOsq-ALsr-RSsv-SEsw-KEsw-TZta-INte-INth-THtr-TRuk-UAur-PKuz-UZvi-VNzh-CNzh-HKzh-TWzu-ZA
Performance & Cost

Cost

$0.18000/hour

$0.00005/second

Maximum Duration

5h 0m

Maximum File Size

2.00 GB

Features

Supported capabilities and functionalities

Core Features

Punctuation
Diarization
Streaming
Speaker Labels
Word Timestamps
Confidence Scores
Language Detection
Custom Vocabulary
Profanity Filtering
Noise Reduction
Technical Specifications

Input/output formats and technical details

Supported Output Formats

textjsonsrtvtt

Supported Audio Encodings

LINEAR16FLACMP3OpusWAV

Supported Sample Rates

8000 Hz16000 Hz