A sound engineering approach to near end listening enhancement

Carol Chermaz*, Simon King

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We present the beta version of ASE (the Automatic Sound Engineer), a NELE (Near End Listening Enhancement) algorithm based on audio engineering knowledge. Generations of sound engineers have improved the intelligibility of speech against competing sounds and reverberation, while maintaining high sound quality and artistic integrity (e.g., audio track mixing in music and movies). We try to grasp the essential aspects of this expert knowledge and apply it to the more mundane context of speech playback in realistic noise. The algorithm described here was entered into the Hurricane Challenge 2.0, an evaluation of NELE algorithms. Results from those listening tests across three languages show the potential of our approach, which achieved improvements of over 7 dB EIC (Equivalent Intensity Change), corresponding to an absolute increase of 58% WAR (Word Accuracy Rate).
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Pages: 1356-1360
Number of pages: 5
Volume: 2020-October
Publication status: Published - 31 Oct 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Keywords / Materials (for Non-textual outputs)

  • near end listening enhancement
  • sound engineering
  • speech modifications
