Abstract
Explainable AI (XAI) techniques have been widely used to explain and understand the outputs of deep learning models in fields such as image classification and Natural Language Processing. Interest in using XAI techniques to explain deep learning-based Automatic Speech Recognition (ASR) is emerging, but there is not yet enough evidence on whether these explanations can be trusted. To address this, we adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task. This simple task provides a controlled setting for evaluation, and its expert-annotated ground truth lets us assess the quality of the explanations. We find that a variant of LIME that we propose in this paper, based on time-partitioned audio segments, produces the most reliable explanations: 96% of the time, its top three audio segments contain the ground truth.
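The abstract names the time-partitioned LIME variant and the top-three evaluation but does not give the procedure, so the following is only an illustrative sketch under our own assumptions, not the authors' implementation and not the `lime` package's API. It assumes equal-length time segments, "removing" a segment by zeroing its samples, a cosine-distance proximity kernel in the spirit of LIME, a weighted ridge surrogate, and that a segment "contains" the ground truth when it overlaps the annotated phoneme span. All function names (`time_segments`, `lime_time_segments`, `top_k_hit`, `toy_predict`) are hypothetical, and `toy_predict` merely stands in for a phoneme recogniser.

```python
import numpy as np


def time_segments(n_samples, n_segments):
    """Split [0, n_samples) into equal-length (start, end) windows (assumption)."""
    bounds = np.linspace(0, n_samples, n_segments + 1).astype(int)
    return [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:])]


def lime_time_segments(audio, predict_fn, target, n_segments=10,
                       n_perturb=500, kernel_width=0.25, ridge=1.0, seed=0):
    """LIME-style attribution over time segments; returns (segments, weights)."""
    rng = np.random.default_rng(seed)
    segs = time_segments(len(audio), n_segments)

    # Random binary masks: 1 keeps a segment, 0 silences it (samples zeroed).
    masks = rng.integers(0, 2, size=(n_perturb, n_segments)).astype(float)
    masks[0] = 1.0  # always include the unperturbed input

    # Query the black-box model on each perturbed copy of the audio.
    preds = np.empty(n_perturb)
    for i, mask in enumerate(masks):
        perturbed = audio.copy()
        for keep, (start, end) in zip(mask, segs):
            if keep == 0.0:
                perturbed[start:end] = 0.0
        preds[i] = predict_fn(perturbed)[target]

    # Proximity to the original input: cosine distance to the all-ones mask,
    # passed through an exponential kernel.
    norms = np.linalg.norm(masks, axis=1) * np.sqrt(n_segments)
    cosine = masks.sum(axis=1) / np.where(norms == 0.0, 1.0, norms)
    sample_w = np.exp(-((1.0 - cosine) ** 2) / kernel_width ** 2)

    # Weighted ridge regression of predictions on masks; intercept not penalised.
    X = np.hstack([np.ones((n_perturb, 1)), masks])
    penalty = ridge * np.eye(n_segments + 1)
    penalty[0, 0] = 0.0
    A = X.T @ (sample_w[:, None] * X) + penalty
    b = X.T @ (sample_w * preds)
    coef = np.linalg.solve(A, b)
    return segs, coef[1:]  # one importance weight per time segment


def top_k_hit(segs, weights, gt_start, gt_end, k=3):
    """True if any of the k highest-weighted segments overlaps the ground-truth span."""
    top = np.argsort(weights)[::-1][:k]
    return any(segs[j][0] < gt_end and gt_start < segs[j][1] for j in top)


# Hypothetical toy "model": the class-0 score is the mean magnitude of samples 400-600.
def toy_predict(x):
    score = float(np.mean(np.abs(x[400:600])))
    return np.array([score, 1.0 - score])


audio = np.ones(1000)
segs, weights = lime_time_segments(audio, toy_predict, target=0)
print(top_k_hit(segs, weights, gt_start=450, gt_end=550))
```

Averaging `top_k_hit` over many annotated utterances would give a hit rate comparable in spirit to the "ground truth in the top three segments" figure; the segment count, kernel width and masking strategy here are illustrative choices, not values taken from the paper.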
Original language | English |
---|---|
Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 10296-10300 |
Number of pages | 5 |
ISBN (Electronic) | 979-8-3503-4485-1 |
ISBN (Print) | 979-8-3503-4486-8 |
DOIs | |
Publication status | Published - 18 Mar 2024 |
Event | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing - Seoul, Korea, Republic of. Duration: 14 Apr 2024 → 19 Apr 2024. https://2024.ieeeicassp.org/ |
Publication series
Name | International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2024 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 14/04/24 → 19/04/24 |
Internet address | |
Keywords / Materials (for Non-textual outputs)
- Explanation
- Phoneme Recognition