Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ∼10 BLEU, approaching conventional pivot-based methods.
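
The random online backtranslation idea described in the abstract is straightforward to sketch. The outline below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a multilingual model that selects the output language via a target-language prefix token (e.g. "<2de>"), and the names model.greedy_decode, LANGS, and robt_batch are hypothetical. For each training pair, a random intermediate language is sampled and the target side is back-translated with the current model, producing a synthetic pair for a direction absent from the parallel data.

import random

LANGS = ["en", "de", "fr", "zh"]  # languages covered by the multilingual corpus

def robt_batch(model, batch):
    """Augment a training batch with synthetic zero-shot pairs (ROBT sketch).

    Each element of `batch` is (source_sentence, target_sentence, target_lang).
    `model.greedy_decode` is an assumed interface for decoding with the
    model currently being trained.
    """
    synthetic = []
    for src, tgt, tgt_lang in batch:
        # Sample a random intermediate language other than the real target.
        mid_lang = random.choice([l for l in LANGS if l != tgt_lang])
        # Online back-translation: decode with the current model, on the fly.
        pseudo_src = model.greedy_decode(f"<2{mid_lang}> {tgt}")
        # Train on (pseudo source -> original target), exercising a
        # zero-shot direction never seen in the training corpus.
        synthetic.append((pseudo_src, tgt, tgt_lang))
    return batch + synthetic

Because the back-translation happens online during training, no separate decoding pass over the corpus is needed; the synthetic pairs track the improving model as training proceeds.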
Original language: English
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics (ACL)
Pages: 1628–1639
Number of pages: 12
ISBN (Electronic): 978-1-952148-25-5
DOIs
Publication status: Published - 10 Jul 2020
Event: 2020 Annual Conference of the Association for Computational Linguistics - Hyatt Regency Seattle, Virtual conference, United States
Duration: 5 Jul 2020 – 10 Jul 2020
Conference number: 58
https://acl2020.org/

Conference

Conference: 2020 Annual Conference of the Association for Computational Linguistics
Abbreviated title: ACL 2020
Country: United States
City: Virtual conference
Period: 5/07/20 – 10/07/20
Internet address: https://acl2020.org/
