From Research to Production and Back: Ludicrously Fast Neural Machine Translation

Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019)
Place of Publication: Hong Kong
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 9
ISBN (Print): 978-1-950737-83-3
Publication status: Published - 4 Nov 2019
Event: The 3rd Workshop on Neural Generation and Translation: at EMNLP-IJCNLP 2019 - Hong Kong, Hong Kong
Duration: 4 Nov 2019 to 4 Nov 2019


Workshop: The 3rd Workshop on Neural Generation and Translation
Abbreviated title: WNGT 2019
Country/Territory: Hong Kong
City: Hong Kong