Learning Noise Invariant Features Through Transfer Learning for Robust End-to-End Speech Recognition

Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

End-to-end models yield impressive speech recognition results on clean datasets while having inferior performance on noisy datasets. To address this, we propose transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME-4) for connectionist temporal classification models. We argue that the clean classifier (the upper layers of a neural network trained on clean data) can force the feature extractor (the lower layers) to learn the underlying noise invariant patterns in the noisy dataset. While training on the noisy dataset, the clean classifier is either frozen or trained with a small learning rate. The feature extractor is trained with no learning rate re-scaling. The proposed method gives up to 15.5% relative character error rate (CER) reduction compared to models trained only on CHiME-4. Furthermore, we use the test sets of Aurora-4 to perform evaluation on unseen noisy conditions. Our method has significantly lower CERs (11.3% relative on average) on all 14 Aurora-4 test sets compared to the conventional transfer learning method (no learning rate re-scale for any layer), indicating our method enables the model to learn noise invariant features.
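The layer-wise training scheme described in the abstract can be sketched as a per-layer learning-rate plan. This is a minimal, framework-agnostic illustration rather than the authors' implementation: the layer names, the split point between feature extractor and classifier, the base learning rate, and the scale factor of 0.1 are all hypothetical values chosen for the example, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerGroup:
    """Learning-rate setting for one named layer (hypothetical structure)."""
    name: str
    lr: float
    trainable: bool


def plan_learning_rates(
    layer_names: List[str],
    n_extractor_layers: int,
    base_lr: float,
    classifier_lr_scale: Optional[float],
) -> List[LayerGroup]:
    """Assign per-layer learning rates for fine-tuning on the noisy dataset.

    layer_names is ordered from the input (lowest layer) to the output.
    The first n_extractor_layers form the feature extractor and keep
    base_lr unchanged (no re-scaling). The remaining layers form the
    clean classifier: they are frozen when classifier_lr_scale is None,
    otherwise trained at base_lr * classifier_lr_scale.
    """
    if not 0 <= n_extractor_layers <= len(layer_names):
        raise ValueError("n_extractor_layers must be within the layer count")

    groups = []
    for index, name in enumerate(layer_names):
        if index < n_extractor_layers:
            # Feature extractor: full learning rate.
            groups.append(LayerGroup(name, base_lr, True))
        elif classifier_lr_scale is None:
            # Clean classifier, frozen variant: no updates at all.
            groups.append(LayerGroup(name, 0.0, False))
        else:
            # Clean classifier, small-learning-rate variant.
            groups.append(LayerGroup(name, base_lr * classifier_lr_scale, True))
    return groups


# Hypothetical 5-layer model; the real architecture is not given in the abstract.
layers = ["blstm1", "blstm2", "blstm3", "blstm4", "output"]

# Variant 1: clean classifier frozen.
frozen = plan_learning_rates(layers, 2, 1e-3, None)

# Variant 2: clean classifier trained with an (assumed) 10x smaller rate.
slow = plan_learning_rates(layers, 2, 1e-3, 0.1)

for group in slow:
    print(group.name, group.lr, group.trainable)
```

In a deep learning framework, a plan like this would typically be translated into per-parameter optimizer settings, with frozen layers excluded from gradient updates. The conventional transfer learning baseline mentioned in the abstract corresponds to using the same learning rate for every layer.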
Original language: English
Title of host publication: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 7024-7028
Number of pages: 5
ISBN (Electronic): 978-1-5090-6631-5
ISBN (Print): 978-1-5090-6632-2
Publication status: Accepted/In press - 24 Jan 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020
Conference number: 45

Publication series

Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2020
Country: Spain
City: Barcelona
Period: 4/05/20 to 8/05/20

Keywords

  • end-to-end
  • robust speech recognition
  • transfer learning
