Speech AI Summit

Free digital event, hosted by NVIDIA

November 2, 2022, 9:00 a.m.–2:00 p.m. PT

Join us for an engaging online conversation with experts from Google, Meta, NVIDIA, and more on trends and techniques in automatic speech recognition (ASR) and text-to-speech (TTS) technologies.

Speech AI is becoming ubiquitous, helping businesses around the world power virtual assistants, scale call centers, take meeting notes, enhance AR experiences, and more. During the summit, you’ll hear from speech AI leaders on their latest work in research, production, and open source to make speech AI more accurate, engaging, and globally accessible.

Get a Free, Self-Paced Course

The first 100 summit registrants will receive credit for a free online, self-paced speech AI course from the NVIDIA Deep Learning Institute. All summit registrants will receive a discount on the course. See terms and conditions.


Dr. Ahmad Bazzi

Research Associate at NYU Abu Dhabi | YouTube Educator

Edresson Casanova

TTS Deep Learning Engineer | Coqui.ai

Michael Davies

SVP, Technical and Field Operations | FOX Sports

Caroline de Brito Gottlieb

Product Manager, Data Strategy - Riva | NVIDIA

EM Lewis-Jong

Product Lead, Mozilla Foundation | Common Voice

Abdelrahman Mohamed

Research Scientist in FAIR | Meta

Tara Sainath

Principal Research Scientist | Google

Dr. Sunil Sivadas

Practice Lead, NEXT Product & Platforms, Next Gen Tech | NCS

David Weinstein

Director of XR | NVIDIA


9:00 – 9:45 a.m. PT

Unlocking Speech AI Technology for Global Language Users

Voice-enabled technology is becoming ubiquitous, but many people are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common Voice and NVIDIA are collaborating to change that. By partnering on a public, crowdsourced, multilingual speech corpus (now the largest of its kind in the world) and open source pre-trained models, we are making it easier than ever to build automatic speech recognition that works for speakers of the world's many languages.

EM Lewis-Jong, Product Lead, Mozilla Foundation | Common Voice
Caroline de Brito Gottlieb, Product Manager, Data Strategy - Riva | NVIDIA

10:00 – 10:45 a.m. PT

Overview of Zero-Shot Multi-speaker TTS Systems

Text-to-speech (TTS) systems have advanced significantly in recent years thanks to deep learning approaches. These advances have motivated research that aims to synthesize speech in the voice of a target speaker using just a few seconds of that speaker's speech. This approach is called Zero-Shot Multi-speaker TTS. In this talk, we will explore the timeline of this task and its current state of the art.

Edresson Casanova, TTS Deep Learning Engineer | Coqui.ai

11:00 – 11:45 a.m. PT

Speech representation learning and the emergence of Textless NLP research

Speech representation learning approaches have achieved groundbreaking performance in low-resource, multilingual, and audio-visual speech recognition. Furthermore, they have opened the door to a new Textless NLP research direction that harnesses the rich nonverbal information in human interaction that is missing from text pretraining resources, and that facilitates working with languages and dialects without a standard orthography. This talk highlights work in both directions, connects them, and discusses future directions.

Abdelrahman Mohamed, Research Scientist in FAIR | Meta

12:00 – 12:45 p.m. PT

End-to-End Speech Recognition: The Journey from Research to Production

End-to-end (E2E) speech recognition has become a popular research paradigm in recent years, allowing the modular components of a conventional speech recognition system (acoustic model, pronunciation model, language model) to be replaced by one neural network. In this talk, we will discuss a multi-year research journey of E2E modeling for speech recognition at Google. This journey has resulted in E2E models that can surpass the performance of conventional models across many different quality and latency metrics, as well as the productionization of E2E models for Pixel 4, 5, and 6 phones. We will also touch upon future research efforts with E2E models, including multilingual speech recognition.

Tara Sainath, Principal Research Scientist | Google

1:00 – 1:45 p.m. PT

Speech AI in Industry Panel

Hear from industry leaders about new and interesting applications of speech AI, as well as challenges, tips, and emerging trends.

Michael Davies, SVP, Technical and Field Operations | FOX Sports
Dr. Sunil Sivadas, Practice Lead, NEXT Product & Platforms, Next Gen Tech | NCS
David Weinstein, Director of XR | NVIDIA
Moderated by Dr. Ahmad Bazzi, Research Associate at NYU Abu Dhabi | YouTube Educator

Featured Resources

Intro to Speech AI E-book

Explore the fundamentals of ASR and TTS and how they are used in various industries.

Essential Guide to Speech AI Terminology

Get an overview of the important terminology in the world of speech AI.

Speech AI User Stories

Dive into real-life speech AI use cases for contact centers, video conferencing, and more.

Build Speech Apps at Scale with NVIDIA Riva

Try NVIDIA® Riva’s interactive ASR and TTS demos to see, in real time, how Riva delivers highly accurate transcription and natural-sounding, expressive voices.