Abstract / Description of output
Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity to improve speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes in which audio from the target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated through AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2022 IEEE Spoken Language Technology Workshop |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 465-471 |
Number of pages | 7 |
ISBN (Electronic) | 979-8-3503-9690-4, 979-8-3503-9689-8 |
ISBN (Print) | 979-8-3503-9691-1 |
Publication status | Published - 27 Jan 2023 |
Event | The IEEE Spoken Language Technology Workshop, 2022, Doha, Qatar. Duration: 9 Jan 2023 → 12 Jan 2023. https://slt2022.org/ |
Workshop
Workshop | The IEEE Spoken Language Technology Workshop, 2022 |
---|---|
Abbreviated title | SLT 2022 |
Country/Territory | Qatar |
City | Doha |
Period | 9/01/23 → 12/01/23 |
Keywords / Materials (for Non-textual outputs)
- Audio-visual speech enhancement
- subjective intelligibility
- LRS3 dataset