Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Using a mix of shared and language-specific (LS) parameters has shown promise in multilingual neural machine translation (MNMT), but the question of when and where LS capacity matters most is still under-studied. We offer such a study by proposing conditional language-specific routing (CLSR). CLSR employs hard binary gates conditioned on token representations to dynamically select either an LS path or a shared path. By manipulating these gates, it can schedule LS capacity across sub-layers in MNMT, guided by translation signals and subject to budget constraints. Moreover, CLSR scales easily to massively multilingual settings. Experiments with the Transformer on the OPUS-100 and WMT datasets show that: 1) MNMT is sensitive to both the amount and the position of LS modeling: distributing 10%-30% of the computation to LS paths in the top and/or bottom encoder/decoder layers delivers the best performance; and 2) one-to-many translation benefits more from CLSR than many-to-one translation, particularly with unbalanced training data. Our study further verifies the trade-off between shared capacity and LS capacity in multilingual translation. We corroborate our analysis by confirming the soundness of our findings when using them as the foundation of improved multilingual Transformers. Source code and models are available at https://github.com/googleinterns/cct-m4.
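The routing idea in the abstract (a hard 0/1 gate per token that picks an LS path or a shared path, plus a budget on how often the LS path is used) can be illustrated with a small plain-Python sketch. It is not the authors' implementation: the linear gate score thresholded at zero, the single projection matrix per path, the LS weights keyed by a language code, and the absolute-gap budget penalty are all assumptions made for illustration, and every function name here is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def hard_gate(h: Sequence[float], gate_w: Sequence[float], gate_b: float) -> int:
    """Hypothetical gate: 1 routes the token to its LS path, 0 to the shared path.

    Assumption: a linear score on the token representation, thresholded at zero.
    """
    score = sum(w * x for w, x in zip(gate_w, h)) + gate_b
    return 1 if score > 0.0 else 0


def clsr_sublayer(
    h: Sequence[float],
    lang: str,
    shared_w: Matrix,
    ls_w: Dict[str, Matrix],
    gate_w: Sequence[float],
    gate_b: float,
) -> Tuple[Vector, int]:
    """Apply either the LS projection for `lang` or the shared projection to one token."""
    g = hard_gate(h, gate_w, gate_b)
    weights = ls_w[lang] if g == 1 else shared_w
    return matvec(weights, h), g


def budget_penalty(gates: Sequence[int], budget: float) -> float:
    """Hypothetical budget term: gap between the fraction of LS-routed tokens and the target."""
    if not gates:
        return 0.0
    return abs(sum(gates) / len(gates) - budget)


# Toy usage: 2-d token vectors, identity shared weights, one LS matrix for "de".
shared = [[1.0, 0.0], [0.0, 1.0]]
ls = {"de": [[2.0, 0.0], [0.0, 2.0]]}
gate_w, gate_b = [1.0, -1.0], 0.0
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 0.5]]

results = [clsr_sublayer(h, "de", shared, ls, gate_w, gate_b) for h in tokens]
gates = [g for _, g in results]
print(gates)  # [1, 0, 1, 0]
print(budget_penalty(gates, budget=0.3))  # 0.2: half the tokens use LS paths vs. a 0.3 budget
```

Because the gate is hard, each token pays for exactly one projection at inference time. Training such a gate end to end requires some differentiable relaxation or gradient estimator, which this sketch deliberately leaves out.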
Original language: English
Title of host publication: International Conference on Learning Representations (ICLR 2021)
Number of pages: 19
Publication status: Published, 4 May 2021
Event: Ninth International Conference on Learning Representations 2021, Virtual Conference
Duration: 4 May 2021 to 7 May 2021
https://iclr.cc/Conferences/2021/Dates

Conference

Conference: Ninth International Conference on Learning Representations 2021
Abbreviated title: ICLR 2021
City: Virtual Conference
Period: 4/05/21 to 7/05/21
