Efficient intelligibility evaluation using keyword spotting: A study on audio-visual speech enhancement

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We propose a new method for human speech intelligibility evaluation based on keyword spotting. In this method, participants play a stimulus and select the word they hear from a closed set of alternatives. To choose the sentence, the target word, and the alternatives, we mine a large set of stimuli using a phonetic dictionary and a language model. Unlike other tests, our method does not rely on specially designed sentences and can be used to evaluate in-the-wild material such as TED talks. We focus on audio-visual (AV) speech enhancement (SE) evaluation as a case study. We compared our method to a transcription task and observed that the two produce highly correlated results, although our task requires substantially less participation time. We then applied it to a large-scale evaluation of AVSE systems. Results show that keyword spotting is a suitable and efficient alternative for assessing intelligibility from AV stimuli.
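
The abstract does not say how the alternatives are selected, so the following is only an illustrative sketch of one plausible step: ranking candidate words by phoneme-level edit distance to the target word, using a phonetic dictionary. The toy dictionary `PHONES`, its ARPAbet-like symbols, and the helper names `phone_distance` and `pick_alternatives` are all assumptions made for illustration, not the authors' implementation. The language-model step mentioned in the abstract is not covered here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy phonetic dictionary (ARPAbet-like symbols).
# A real study would load a full pronunciation lexicon instead.
PHONES: Dict[str, Tuple[str, ...]] = {
    "cat": ("K", "AE", "T"),
    "cap": ("K", "AE", "P"),
    "bat": ("B", "AE", "T"),
    "cut": ("K", "AH", "T"),
    "dog": ("D", "AO", "G"),
    "table": ("T", "EY", "B", "AH", "L"),
}


def phone_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def pick_alternatives(target: str, n: int = 3) -> List[str]:
    """Return the n words phonetically closest to the target (assumed criterion)."""
    target_phones = PHONES[target]
    candidates = [w for w in PHONES if w != target]
    # Sort by distance first, then alphabetically so ties are deterministic.
    candidates.sort(key=lambda w: (phone_distance(target_phones, PHONES[w]), w))
    return candidates[:n]


if __name__ == "__main__":
    print(pick_alternatives("cat"))
```

In this sketch the closed set shown to a participant would be the target word plus the returned alternatives. Whether the original method prefers close or distant phonetic neighbours is not stated in the abstract.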
Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Number of pages: 5
ISBN (Electronic): 9781728163277
ISBN (Print): 9781728163284
DOIs
Publication status: Published - 5 May 2023
Event: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023
https://2023.ieeeicassp.org/

Publication series

Name: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23
Internet address
