Abstract
Some End-to-End (E2E) Automatic Speech Recognition (ASR) models, such as Attention-based Encoder-Decoder (AED) and Recurrent Neural Network Transducer (RNN-T), are known to have components that effectively act as internal language models (ILM), implicitly modelling the prior probability of the output sequence. However, the existence of an ILM in pure Connectionist Temporal Classification (CTC) ASR systems remains debated. In this paper, we investigate the existence and strength of an ILM in CTC systems. Since CTC posterior probabilities cannot be analytically factorised, we propose a novel empirical method to probe the ILM. After validating our method on a hybrid DNN model with various external language models, we apply it to CTC models trained under different conditions, examining the effects of training data, modelling units, and training or pre-training methods. Our results show no strong evidence of an ILM in CTC-based ASR systems, even with the largest training dataset in our experiments. However, we make the surprising finding that when a CTC encoder is jointly trained with an AED loss, an ILM emerges, even when only the CTC component is used in decoding.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Place of Publication | Piscataway, NJ, USA |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-5 |
Number of pages | 5 |
ISBN (Electronic) | 9798350368741 |
ISBN (Print) | 9798350368758 |
Publication status | Published - 7 Mar 2025 |
Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, Hyderabad International Convention Centre, Hyderabad, India. Duration: 6 Apr 2025 → 11 Apr 2025. https://2025.ieeeicassp.org/ |
Publication series
Name | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Publisher | Institute of Electrical and Electronics Engineers |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP 2025 |
Country/Territory | India |
City | Hyderabad |
Period | 6/04/25 → 11/04/25 |
Keywords / Materials (for Non-textual outputs)
- automatic speech recognition
- connectionist temporal classification
- internal language model