Exploring Unsupervised Pretraining Objectives for Machine Translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT) by drastically reducing the need for large parallel data. Most approaches adapt masked language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English↔German, English↔Nepali and English↔Sinhala monolingual data, and evaluate them on NMT. In (semi-)supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to a few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT, which also requires models with strong cross-lingual abilities.
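To make the objectives concrete, the sketch below illustrates the three families of input corruptions the abstract contrasts: masking, reordering, and word replacement. It is a minimal illustration, not the authors' implementation; the function names, the mask symbol, and the noise probabilities are assumptions, and the replacement here samples random words rather than context-conditioned ones as in the paper.

    import random

    MASK = "<mask>"

    def mask_tokens(tokens, p=0.35):
        # MLM-style input: a fraction of tokens is hidden and the
        # decoder must reconstruct the original sentence.
        return [MASK if random.random() < p else t for t in tokens]

    def shuffle_tokens(tokens, k=3):
        # Reordering-style input: each token moves at most ~k positions,
        # so the input still looks like a full (unmasked) sentence.
        keys = [i + random.uniform(0, k + 1) for i in range(len(tokens))]
        return [t for _, t in sorted(zip(keys, tokens))]

    def replace_tokens(tokens, vocab, p=0.15):
        # Replacement-style input: some tokens are swapped for other
        # words (randomly sampled here; the paper uses context-based
        # replacements, which this sketch does not model).
        return [random.choice(vocab) if random.random() < p else t for t in tokens]

    if __name__ == "__main__":
        sent = "the quick brown fox jumps over the lazy dog".split()
        print(mask_tokens(sent))
        print(shuffle_tokens(sent))
        print(replace_tokens(sent, vocab=["cat", "runs", "red"]))

Unlike masking, the reordering and replacement corruptions keep every input position filled with a real word, which is the property the paper's alternative objectives are designed to preserve.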
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics
Pages: 2956-2971
Number of pages: 16
ISBN (Electronic): 978-1-954085-54-1
Publication status: Published - 1 Aug 2021
Event: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing - Bangkok, Thailand
Duration: 1 Aug 2021 - 6 Aug 2021
https://2021.aclweb.org/

Conference

Conference: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Abbreviated title: ACL-IJCNLP 2021
Country/Territory: Thailand
City: Bangkok
Period: 1/08/21 - 6/08/21
Internet address: https://2021.aclweb.org/
