Advancing CTC models for better speech alignment: A topological approach

Zeyu Zhao, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Automatic Speech Recognition (ASR) systems often face challenges in alignment quality, particularly with the Connectionist Temporal Classification (CTC) approach, which frequently results in a high number of blank frames, known as the “peaky” issue. In this study, we explore the impact of modifying ASR model topologies on alignment quality without compromising Word Error Rate (WER) performance. Our findings demonstrate that introducing additional states to the CTC topology significantly improves alignment quality and mitigates the peaky issue. Conversely, increasing the minimum traversal frame can degrade alignment quality in our specific settings. These insights emphasise the critical importance of topology design in balancing alignment accuracy and recognition performance in ASR systems.
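To make the "peaky" behaviour concrete, the following minimal sketch measures the fraction of blank frames in a frame-level alignment and applies the standard CTC collapse rule (merge repeats, then drop blanks). It is an illustration only and not the authors' method or evaluation metric: the blank symbol `<b>`, the helper names `blank_ratio` and `ctc_collapse`, and both toy alignments are assumptions invented for this example.

```python
from itertools import groupby

# Hypothetical blank symbol; the actual symbol depends on the toolkit in use.
BLANK = "<b>"


def blank_ratio(frames):
    """Return the fraction of frames labelled blank in a frame-level alignment."""
    if not frames:
        return 0.0
    return sum(1 for label in frames if label == BLANK) / len(frames)


def ctc_collapse(frames):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != BLANK]


# Two made-up 10-frame alignments for the same label sequence c-a-t.
# "peaky": each label occupies a single frame, the rest are blank.
peaky = ["<b>", "<b>", "c", "<b>", "<b>", "a", "<b>", "<b>", "t", "<b>"]
# "spread": labels cover most frames, so blanks are rare.
spread = ["<b>", "c", "c", "c", "a", "a", "a", "t", "t", "<b>"]

if __name__ == "__main__":
    for name, frames in (("peaky", peaky), ("spread", spread)):
        print(name, ctc_collapse(frames), round(blank_ratio(frames), 2))
```

Both toy alignments collapse to the same label sequence, so they would be indistinguishable by WER, yet their blank ratios differ sharply. That is the sense in which alignment quality can change without affecting recognition output.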
Original language: English
Title of host publication: Proceedings of the 2024 IEEE Spoken Language Technology Workshop
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1-7
Number of pages: 7
Publication status: Accepted/In press - 30 Aug 2024
Event: IEEE Spoken Language Technology Workshop 2024 - Banyan Tree Macau, Macau, China
Duration: 2 Dec 2024 to 5 Dec 2024
https://2024.ieeeslt.org

Publication series

Name: Proceedings of the IEEE Spoken Language Technology Workshop
Publisher: IEEE
ISSN (Print): 2639-5479

Conference

Conference: IEEE Spoken Language Technology Workshop 2024
Abbreviated title: SLT 2024
Country/Territory: China
City: Macau
Period: 2/12/24 to 5/12/24

Keywords / Materials (for Non-textual outputs)

  • ASR
  • CTC
  • topology
  • alignment
