AVSE Challenge: Audio-Visual Speech Enhancement Challenge

Andrea Lorena Aldana Blanco, Cassia Valentini Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity of improving speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes, in which audio from the target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated by conducting AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.
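
The noisy mixes described above can be illustrated with a minimal sketch. It assumes the interferer (a competing speaker or a noise signal) is scaled to a chosen signal-to-noise ratio and added to the target speech. The function name `mix_at_snr`, the 16 kHz sampling rate, the -5 dB SNR, and the synthetic signals are all illustrative assumptions; the abstract does not state how the challenge data was actually mixed.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Add `interferer` to `target` so the target-to-interferer power ratio is `snr_db`.

    Hypothetical helper for illustration; not taken from the challenge scripts.
    """
    target = np.asarray(target, dtype=np.float64)
    interferer = np.asarray(interferer, dtype=np.float64)

    # Loop the interferer if it is shorter than the target, then trim to length.
    if len(interferer) < len(target):
        repeats = int(np.ceil(len(target) / len(interferer)))
        interferer = np.tile(interferer, repeats)
    interferer = interferer[: len(target)]

    target_power = np.mean(target ** 2)
    interferer_power = np.mean(interferer ** 2)
    if interferer_power == 0.0:
        raise ValueError("interferer is silent; cannot scale it to an SNR")

    # Gain that makes 10*log10(target_power / scaled_interferer_power) == snr_db.
    gain = np.sqrt(target_power / (interferer_power * 10.0 ** (snr_db / 10.0)))
    return target + gain * interferer


# Synthetic stand-ins: a 220 Hz tone as "speech", white noise as the interferer.
sample_rate = 16000  # assumed sampling rate
rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate // 2)
mixture = mix_at_snr(target, noise, snr_db=-5.0)
```

With real data, `target` would be the audio track of the target speaker's video and `interferer` either another speaker's audio or a noise recording; the video stream is left untouched by this step.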
Original language: English
Title of host publication: Proceedings of the 2022 IEEE Spoken Language Technology Workshop
Publisher: Institute of Electrical and Electronics Engineers
Pages: 465-471
Number of pages: 7
ISBN (Electronic): 979-8-3503-9690-4, 979-8-3503-9689-8
ISBN (Print): 979-8-3503-9691-1
Publication status: Published - 27 Jan 2023
Event: The IEEE Spoken Language Technology Workshop, 2022 - Doha, Qatar
Duration: 9 Jan 2023 – 12 Jan 2023
https://slt2022.org/

Workshop

Workshop: The IEEE Spoken Language Technology Workshop, 2022
Abbreviated title: SLT 2022
Country/Territory: Qatar
City: Doha
Period: 9/01/23 – 12/01/23

Keywords / Materials (for Non-textual outputs)

  • Audio-visual speech enhancement
  • subjective intelligibility
  • LRS3 dataset
