AVSE Challenge: Audio-Visual Speech Enhancement Challenge

Andrea Lorena Aldana Blanco, Cassia Valentini Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity to improve speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes in which audio from the target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated by conducting AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.
Original language: English
Title of host publication: Proceedings of the 2022 IEEE Spoken Language Technology Workshop
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 7
ISBN (Electronic): 979-8-3503-9690-4, 979-8-3503-9689-8
ISBN (Print): 979-8-3503-9691-1
Publication status: Published - 27 Jan 2023
Event: The IEEE Spoken Language Technology Workshop, 2022 - Doha, Qatar
Duration: 9 Jan 2023 - 12 Jan 2023


Workshop: The IEEE Spoken Language Technology Workshop, 2022
Abbreviated title: SLT 2022


  • Audio-visual speech enhancement
  • subjective intelligibility
  • LRS3 dataset

