Speech Transcription and Synthesis

Use a pretrained model or third-party APIs for text-to-speech and speech-to-text

Audio Toolbox™ provides examples for small-vocabulary recognition and sound synthesis. For general speech-to-text transcription, use speech2text with the wav2vec 2.0 pretrained network. For text-to-speech and speech-to-text through popular third-party APIs, download the Audio Toolbox extended functionality from File Exchange. Supported APIs include Google® Speech, IBM® Watson Speech, and Microsoft® Azure Speech.

You can also use speech-to-text interactively in the Signal Labeler app to quickly label regions of speech.

Apps

Signal Labeler - Label signal attributes, regions, and points of interest, and extract features

Functions

speech2text - Transcribe speech signal to text
text2speech - Synthesize speech from text
speechClient - Interface with pretrained model or third-party speech service

Topics