Speech AI Summit

Free digital event, hosted by NVIDIA

November 2, 2022, 9:00 a.m.–2:00 p.m. PT

Watch Replays

Join us for an engaging online conversation with experts from Google, Meta, NVIDIA, and more on trends and techniques in automatic speech recognition (ASR) and text-to-speech (TTS) technologies.

Speech AI is becoming ubiquitous, helping businesses around the world power virtual assistants, scale call centers, take meeting notes, enhance AR experiences, and more. During the summit, you’ll hear from speech AI leaders on their latest work - research, production, and open source - to make speech AI more accurate, engaging, and globally accessible.

Get a Free, Self-Paced Course

The first 100 summit registrants will receive credit for a free online, self-paced speech AI course from the NVIDIA Deep Learning Institute. All summit registrants will receive a discount on the course. See terms and conditions.

Speakers

Dr. Ahmad Bazzi

Research Associate at NYU Abu Dhabi | YouTube Educator

Edresson Casanova

TTS Deep Learning Engineer | Coqui.ai

Michael Davies

SVP, Technical and Field Operations | FOX Sports

Caroline de Brito Gottlieb

Product Manager, Data Strategy - Riva | NVIDIA

EM Lewis-Jong

Product Lead, Mozilla Foundation | Common Voice

Abdelrahman Mohamed

Research Scientist in FAIR | Meta

Tara Sainath

Principal Research Scientist | Google

Dr. Sunil Sivadas

Practice Lead, NEXT Product & Platforms, Next Gen Tech | NCS

David Weinstein

Director of XR | NVIDIA

Sessions

9:00 – 9:45 a.m. PT

Unlocking Speech AI Technology for Global Language Users

Watch Replay

Voice-enabled technology is becoming ubiquitous, but many people are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common Voice and NVIDIA are collaborating to change that. By partnering on a public, crowdsourced, multilingual speech corpus (now the largest of its kind in the world) and open source pre-trained models, we are making it easier than ever to build automatic speech recognition that works for speakers of the world's many languages.

EM Lewis-Jong, Product Lead, Mozilla Foundation | Common Voice

Caroline de Brito Gottlieb, Product Manager, Data Strategy - Riva | NVIDIA

10:00 – 10:45 a.m. PT

Overview of Zero-Shot Multi-speaker TTS Systems

Watch Replay

Text-to-speech (TTS) systems have advanced significantly in recent years with deep learning approaches. These advances have motivated research that aims to synthesize speech in the voice of a target speaker using just a few seconds of speech. This approach is called Zero-Shot Multi-speaker TTS. In this talk, we will explore the timeline and the state of the art for this task.

Edresson Casanova, TTS Deep Learning Engineer | Coqui.ai

11:00 – 11:45 a.m. PT

Speech representation learning and the emergence of Textless NLP research

Watch Replay

Speech representation learning approaches have achieved groundbreaking performance in low-resource, multilingual, and audio-visual speech recognition. Furthermore, they have opened the door to a new Textless NLP research direction that harnesses the rich nonverbal information in human interaction that is missing from text pretraining resources, and that facilitates work with languages and dialects lacking a standard orthography. This talk highlights work in both directions, connects them, and discusses future directions.

Abdelrahman Mohamed, Research Scientist in FAIR | Meta

12:00 – 12:45 p.m. PT

End-to-End Speech Recognition: The Journey from Research to Production

Watch Replay

End-to-end (E2E) speech recognition has become a popular research paradigm in recent years, allowing the modular components of a conventional speech recognition system (acoustic model, pronunciation model, language model) to be replaced by one neural network. In this talk, we will discuss a multi-year research journey of E2E modeling for speech recognition at Google. This journey has resulted in E2E models that can surpass the performance of conventional models across many different quality and latency metrics, as well as the productionization of E2E models for Pixel 4, 5, and 6 phones. We will also touch upon future research efforts with E2E models, including multilingual speech recognition.

Tara Sainath, Principal Research Scientist | Google

1:00 – 1:45 p.m. PT

Speech AI in Industry Panel

Watch Replay

Hear from industry leaders about new and interesting applications of speech AI, as well as challenges, tips, and emerging trends.

Michael Davies, SVP, Technical and Field Operations | FOX Sports

Dr. Sunil Sivadas, Practice Lead, NEXT Product & Platforms, Next Gen Tech | NCS

David Weinstein, Director of XR | NVIDIA

Moderated by Dr. Ahmad Bazzi, Research Associate at NYU Abu Dhabi | YouTube Educator

Featured Resources

Intro to Speech AI E-book

Explore the fundamentals of ASR and TTS and how they are used in various industries.

Read Now

Essential Guide to Speech AI Terminology

Get an overview of the important terminology in the world of speech AI.

Read Now

Speech AI User Stories

Dive into real-life speech AI use cases for contact centers, video conferencing, and more.

Read Now

Build Speech Apps at Scale with NVIDIA Riva

Try NVIDIA® Riva’s interactive ASR and TTS demos to see, in real time, how Riva delivers highly accurate transcription and natural-sounding, expressive voices.

Try Now