eBook

End-to-End Speech AI Pipelines

An in-depth explainer of ASR and TTS, the two main components of Speech AI.

What’s Included in This eBook?

Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) are the two core Speech AI technologies. Each pipeline includes multiple stages, such as data preprocessing, deep learning models, and post-processing. This eBook explains what happens in each stage and how to evaluate the performance of both technologies.

What Is Automatic Speech Recognition?

ASR, also known as speech-to-text, is the process of automatically converting spoken audio into written form.

How Does a Speech AI System Work?

What Is Text-to-Speech?

TTS, also known as speech synthesis, takes text as an input and generates a human-like synthesized voice.

How Is Speech AI Being Used in Industries?

How Do I Evaluate ASR and TTS?

Metrics, such as word error rate (WER) and mean opinion score (MOS), are used to assess the performance of ASR and TTS pipelines, respectively.
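
As a rough illustration of these two metrics, the sketch below computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript, and MOS as the arithmetic mean of listener ratings. The function names are hypothetical, the whitespace tokenization is a simplification (real evaluations usually normalize case and punctuation first), and the 1-to-5 rating scale is an assumption based on common listening-test practice rather than something this page specifies.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """Average listener rating; assumes a 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are assumed to lie between 1 and 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(mean_opinion_score([4, 5, 3, 4]))
```

Lower WER is better, and it can exceed 1.0 when the hypothesis contains many inserted words. Higher MOS is better, but because it is a subjective score it is only comparable across systems rated in the same listening test.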
