Beyond Sentence-Level End-to-End Speech Translation: Context Helps

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Document-level contextual information has shown benefits to text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied. We fill this gap through extensive experiments using a simple concatenation-based context-aware ST model, paired with adaptive feature selection on speech encodings for computational efficiency. We investigate several decoding approaches, and introduce in-model ensemble decoding which jointly performs document- and sentence-level translation using the same model. Our results on the MuST-C benchmark with Transformer demonstrate the effectiveness of context to E2E ST. Compared to sentence-level ST, context-aware ST obtains better translation quality (+0.18-2.61 BLEU), improves pronoun and homophone translation, shows better robustness to (artificial) audio segmentation errors, and reduces latency and flicker to deliver higher quality for simultaneous translation.
Original language: English
Title of host publication: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Place of Publication: Online
Publisher: Association for Computational Linguistics
Pages: 2566-2578
Number of pages: 13
ISBN (Electronic): 978-1-954085-52-7
Publication status: Published - 1 Aug 2021
Event: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing - Bangkok, Thailand
Duration: 1 Aug 2021 to 6 Aug 2021
https://2021.aclweb.org/

Conference

Conference: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Abbreviated title: ACL-IJCNLP 2021
Country/Territory: Thailand
City: Bangkok
Period: 1/08/21 to 6/08/21