Investigating Sequence-Level Normalisation for CTC-Like End-To-End ASR

Zeyu Zhao, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

End-to-end Automatic Speech Recognition (E2E ASR) significantly simplifies the training process of an ASR model. Connectionist Temporal Classification (CTC) is one of the most popular methods for E2E ASR training. Implicitly, CTC has a unique topology which is very useful for sequence modelling. However, we find that by changing to another topology, we can make it even more effective. In this paper, we propose a new CTC-like method for E2E ASR training by modifying the topology of the original CTC, so that the well-known abuse of the blank label in CTC can be resolved theoretically. As we change the topology, a normalisation term is necessary, which makes the form of the final loss function similar to Maximum Mutual Information (MMI); we hence name our method MMI-CTC. In addition to maximising the posterior probability of the target sequence, the normalisation enables models to explicitly minimise the probability of competing hypotheses at the word sequence level. Our experimental results show that MMI-CTC is more efficient than CTC, and that the normalisation is essential for sequence training.
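
For orientation, the two standard objectives that the abstract contrasts can be written as below. This is a sketch of the textbook forms only, not the paper's MMI-CTC loss, which this record does not give. The notation is assumed here rather than taken from the source: $x$ is the acoustic input, $y$ the target label sequence, $T$ the number of frames, $\mathcal{B}$ the CTC collapsing map (remove repeats, then blanks), and $y'$ ranges over competing sequences.

```latex
% Standard CTC: sum over all frame-level paths that collapse to y.
% Frame posteriors are locally normalised, so no denominator appears.
\mathcal{L}_{\mathrm{CTC}}(x, y)
  = -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x)

% Standard MMI: the score of the target sequence is normalised by the
% total score of all competing sequences y'.
\mathcal{L}_{\mathrm{MMI}}(x, y)
  = -\log \frac{p(x \mid y)\, P(y)}{\sum_{y'} p(x \mid y')\, P(y')}
```

Minimising the second form raises the numerator and lowers the denominator, which is the sense in which a sequence-level normalisation term penalises competing hypotheses.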
Original language: English
Title of host publication: Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers
Pages: 7792-7796
Number of pages: 5
ISBN (Electronic): 978-1-6654-0540-9
ISBN (Print): 978-1-6654-0541-6
Publication status: Published - 27 Apr 2022
Event: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Online, Singapore
Duration: 7 May 2022 to 27 May 2022
Conference number: 47
https://2022.ieeeicassp.org/index.php

Publication series

Name: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abbreviated title: ICASSP 2022
Period: 7/05/22 to 27/05/22

Keywords / Materials (for Non-textual outputs)

  • ASR
  • E2E ASR
  • CTC
  • MMI
  • Sequence Training
