Abstract
Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units expected by conventional machine translation (MT) systems for spoken language translation. In this work, we propose a model for correcting the acoustic segmentation of ASR systems for low-resource languages to improve performance on downstream tasks. We propose the use of subtitles as a proxy dataset for correcting ASR acoustic segmentation, creating synthetic acoustic utterances by modeling common error modes. We train a neural tagging model for correcting ASR acoustic segmentation and show that it improves downstream performance on MT and audio-document cross-language information retrieval (CLIR).
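The abstract does not specify the tagger's architecture, but the approach can be framed as per-token boundary tagging over ASR output. The sketch below is a minimal illustration under that assumption; the BiLSTM encoder, the two-label scheme, the `BoundaryTagger` name, and all hyperparameters are hypothetical choices for exposition, not the paper's actual model.

```python
import torch
import torch.nn as nn

class BoundaryTagger(nn.Module):
    """Tags each ASR token as ending a sentence-like unit (1) or not (0)."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Two labels per token: 0 = inside a unit, 1 = unit-final token.
        self.classifier = nn.Linear(2 * hidden_dim, 2)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer IDs -> (batch, seq_len, 2) logits.
        hidden, _ = self.encoder(self.embed(token_ids))
        return self.classifier(hidden)

# Tag a fake ASR token sequence; predicted boundaries would then be used to
# re-segment the transcript into sentence-like units before MT or CLIR.
model = BoundaryTagger(vocab_size=10_000)
tokens = torch.randint(1, 10_000, (1, 12))
is_boundary = model(tokens).argmax(dim=-1)  # (1, 12) tensor of 0s and 1s
```

Such a tagger would be trained on synthetic data in which sentence-segmented text (e.g., subtitles, as the abstract describes) is deliberately re-segmented to mimic common acoustic segmentation errors.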
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume |
Publisher | Association for Computational Linguistics |
Pages | 2842-2854 |
Number of pages | 13 |
ISBN (Print) | 978-1-954085-02-2 |
Publication status | Published - 19 Apr 2021 |
Event | 16th Conference of the European Chapter of the Association for Computational Linguistics, Virtual Conference. Duration: 19 Apr 2021 → 23 Apr 2021. https://2021.eacl.org/ |
Conference
Conference | 16th Conference of the European Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | EACL 2021 |
City | Virtual Conference |
Period | 19/04/21 → 23/04/21 |
Internet address | https://2021.eacl.org/ |