Azure AI Speech-to-Text

Model Information

Azure's default, general-purpose speech-to-text model, trained on a vast amount of Microsoft-owned data. It is suitable for conversational and dictation scenarios and supports both real-time and batch transcription.

Model ID

azure.standard

Use this ID when making API calls to reference this model

Provider

Azure

Model Type

general

Accuracy Tier

standard

Supported Languages

autoaf-ZAam-ETar-AEar-BHar-DZar-EGar-ILar-IQar-JOar-KWar-LBar-LYar-MAar-OMar-QAar-SAar-SYar-TNar-YEaz-AZbg-BGbn-BDbn-INbs-BAca-EScs-CZcy-GBda-DKde-ATde-CHde-DEel-GRen-AUen-CAen-GBen-HKen-IEen-INen-KEen-NGen-NZen-PHen-SGen-TZen-USen-ZAes-ARes-BOes-CLes-COes-CRes-CUes-DOes-ECes-ESes-GTes-HNes-MXes-NIes-PAes-PEes-PRes-PYes-SVes-UYes-VEet-EEeu-ESfa-IRfi-FIfil-PHfr-BEfr-CAfr-CHfr-FRga-IEgl-ESgu-INhe-ILhi-INhr-HRhu-HUid-IDis-ISit-ITja-JPjv-IDka-GEkk-KZkm-KHkn-INko-KRlo-LAlt-LTlv-LVmk-MKml-INmn-MNmr-INms-MYmt-MTmy-MMnb-NOne-NPnl-BEnl-NLpl-PLps-AFpt-BRpt-PTro-ROru-RUsi-LKsk-SKsl-SIso-SOsq-ALsr-RSsv-SEsw-KEsw-TZta-INte-INth-THtr-TRuk-UAur-PKuz-UZvi-VNzh-CNzh-HKzh-TWzu-ZA

Automatic Language Detection: Yes

Performance & Cost

Cost

$0.18000/hour

$0.00005/second

Maximum Duration

2h 0m

Maximum File Size

250.00 MB

Features

Supported capabilities and functionalities

Core Features

Punctuation

Diarization

Streaming

Speaker Labels

Word Timestamps

Confidence Scores

Custom Vocabulary

Profanity Filtering

Noise Reduction

Voice Activity Detection

Subtitle Formats

SRT Support

VTT Support

Technical Specifications

Input/output formats and technical details

Subtitle Format Support

SRT VTT

Supported Audio Encodings

MP3MP4MPEGMPGAM4AWAVWebMOgg

Supported Sample Rates

8000 Hz16000 Hz