Abstract
We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions combine several efficiency strategies: knowledge distillation, a Simpler Simple Recurrent Unit (SSRU) decoder with one or two layers, lexical shortlists, smaller numerical formats, and pruning. For the CPU track, we used quantized 8-bit models. For the GPU track, we experimented with FP16 and 8-bit integers on tensor cores. Some of our submissions optimize for size via 4-bit log quantization and omitting a lexical shortlist. We have extended pruning to more parts of the network, emphasizing component- and block-level pruning, which, unlike coefficient-wise pruning, actually improves speed.
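The abstract mentions 4-bit log quantization among the size-oriented techniques. As a rough illustration of the general idea only (not the authors' implementation; the function name, per-tensor scaling, and level count are assumptions), a log-scale quantizer stores each weight as a sign plus a small integer exponent, so values are reconstructed as signed powers of two:

```python
import numpy as np

def log_quantize_4bit(w, num_levels=15):
    """Hypothetical sketch of 4-bit log quantization: each weight is
    approximated as sign(w) * 2**(max_exp - exponent), where `exponent`
    is a small integer code (one code is implicitly left for zero)."""
    sign = np.sign(w)
    absw = np.maximum(np.abs(w), 1e-12)
    # Scale exponents relative to the largest magnitude in the tensor.
    max_exp = np.floor(np.log2(absw.max()))
    exponent = np.clip(np.round(max_exp - np.log2(absw)), 0, num_levels - 1)
    # Dequantize back to floating point for comparison.
    dequantized = sign * 2.0 ** (max_exp - exponent)
    dequantized[w == 0] = 0.0
    return exponent.astype(np.uint8), sign, dequantized

# Example: weights near powers of two are recovered exactly.
codes, signs, approx = log_quantize_4bit(np.array([0.5, -0.25, 0.0, 1.0]))
```

Because the codes are exponents rather than linear steps, this scheme spends its few levels covering several orders of magnitude, which is why it suits aggressive 4-bit size reduction better than uniform quantization.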
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Sixth Conference on Machine Translation |
| Place of publication | Stroudsburg, PA, USA |
| Publisher | Association for Computational Linguistics |
| Pages | 775-780 |
| Number of pages | 6 |
| ISBN (Print) | 978-1-954085-94-7 |
| Publication status | Published - 10 Nov 2021 |
| Event | EMNLP 2021 Sixth Conference on Machine Translation (WMT), Punta Cana, Dominican Republic, 10 Nov 2021 → 11 Nov 2021 (conference number 6), https://www.statmt.org/wmt21/ |
Conference
| Conference | EMNLP 2021 Sixth Conference on Machine Translation (WMT) |
| --- | --- |
| Abbreviated title | WMT 21 |
| Country/Territory | Dominican Republic |
| City | Punta Cana |
| Period | 10/11/21 → 11/11/21 |
| Internet address | https://www.statmt.org/wmt21/ |