Abstract / Description of output
Automatic Speech Recognition (ASR) systems often face challenges in alignment quality, particularly with the Connectionist Temporal Classification (CTC) approach, which frequently results in a high number of blank frames, known as the “peaky” issue. In this study, we explore the impact of modifying ASR model topologies on alignment quality without compromising Word Error Rate (WER) performance. Our findings demonstrate that introducing additional states to the CTC topology significantly improves alignment quality and mitigates the peaky issue. Conversely, increasing the minimum traversal frame can degrade alignment quality in our specific settings. These insights emphasise the critical importance of topology design in balancing alignment accuracy and recognition performance in ASR systems.
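To make the "peaky" issue concrete, the sketch below measures the share of blank frames in a frame-level alignment. It is only an illustration, not the evaluation used in the paper: the blank symbol `<b>`, the function name `blank_ratio`, and both toy alignments are assumptions made for this example.

```python
from typing import Sequence

BLANK = "<b>"  # hypothetical blank symbol, chosen for this example only


def blank_ratio(alignment: Sequence[str], blank: str = BLANK) -> float:
    """Return the fraction of frames labelled as blank in an alignment."""
    if not alignment:
        raise ValueError("alignment must contain at least one frame")
    blank_frames = sum(1 for label in alignment if label == blank)
    return blank_frames / len(alignment)


# Toy 10-frame alignments for the label sequence "a b" (invented data).
# Peaky: each label is emitted on a single frame, the rest are blank.
peaky = ["<b>", "<b>", "a", "<b>", "<b>", "<b>", "b", "<b>", "<b>", "<b>"]
# Spread: the same labels occupy most of the frames.
spread = ["<b>", "a", "a", "a", "a", "b", "b", "b", "b", "<b>"]

print(f"peaky:  {blank_ratio(peaky):.2f}")
print(f"spread: {blank_ratio(spread):.2f}")
```

A higher ratio means more frames are absorbed by blank, which is the behaviour the abstract describes as peaky. The ratio alone says nothing about whether label boundaries are placed correctly, so it is at best a rough proxy for alignment quality.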
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 IEEE Spoken Language Technology Workshop |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-7 |
Number of pages | 7 |
Publication status | Accepted/In press - 30 Aug 2024 |
Event | IEEE Spoken Language Technology Workshop 2024, Banyan Tree Macau, Macau, China. Duration: 2 Dec 2024 → 5 Dec 2024. https://2024.ieeeslt.org |
Publication series
Name | Proceedings of the IEEE Spoken Language Technology Workshop |
---|---|
Publisher | IEEE |
ISSN (Print) | 2639-5479 |
Conference
Conference | IEEE Spoken Language Technology Workshop 2024 |
---|---|
Abbreviated title | SLT 2024 |
Country/Territory | China |
City | Macau |
Period | 2/12/24 → 5/12/24 |
Internet address | https://2024.ieeeslt.org |
Keywords / Materials (for Non-textual outputs)
- ASR
- CTC
- topology
- alignment