Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task

Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, Mateusz Chudyk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multicore CPU, and GPU. At the model level, we used teacher-student training with a variety of student sizes, tied embeddings and sometimes layers, used the Simpler Simple Recurrent Unit, and introduced head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and, for the multicore setting, ran multiple processes with core affinity. To reduce model size, we experimented with 4-bit log quantization but used floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
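The record gives no implementation details, so the following is a minimal sketch of what 4-bit log quantization with float dequantization at load time can look like, in the spirit of the abstract's "4-bit log quantization but use floats at runtime". The code layout (1 sign bit plus a 3-bit exponent) and the per-tensor scale are assumptions for illustration, not the authors' implementation.

```python
# Sketch of 4-bit log quantization: each weight is stored as a 4-bit code
# (1 sign bit + 3 exponent bits, an assumed layout) and expanded back to
# float32 before inference, so runtime kernels stay in floating point.
import numpy as np

def quantize_log4(w: np.ndarray):
    """Map each weight to sign * scale * 2^-e with a 3-bit exponent e."""
    scale = np.max(np.abs(w))                 # per-tensor scale (assumed)
    sign = np.signbit(w).astype(np.uint8)     # 1 bit of sign
    mag = np.abs(w) / scale                   # magnitudes in (0, 1]
    # Exponent 0 encodes the largest magnitude; 7 the smallest kept level.
    e = np.clip(np.round(-np.log2(np.maximum(mag, 2.0 ** -8))), 0, 7)
    codes = (sign << 3) | e.astype(np.uint8)  # 4 bits per weight
    return codes, float(scale)

def dequantize_log4(codes: np.ndarray, scale: float) -> np.ndarray:
    """Expand 4-bit codes back to float32 at model-load time."""
    sign = np.where(codes >> 3, -1.0, 1.0)
    e = (codes & 0b111).astype(np.float32)
    return (sign * scale * 2.0 ** -e).astype(np.float32)

w = np.random.randn(4, 4).astype(np.float32)
codes, scale = quantize_log4(w)
w_hat = dequantize_log4(codes, scale)
```

Because the 4-bit codes are expanded back to float32 before inference, the saving in such a scheme is in model storage and download size rather than compute, which is consistent with the abstract's statement that floats are used at runtime.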
Original language: English
Title of host publication: Proceedings of the Fourth Workshop on Neural Generation and Translation
Place of Publication: Seattle
Publisher: Association for Computational Linguistics (ACL)
Pages: 218–224
Number of pages: 7
ISBN (Electronic): 978-1-952148-17-0
Publication status: Published - 10 Jul 2020
Event: The 4th Workshop on Neural Generation and Translation, Online workshop, Seattle, United States
Duration: 10 Jul 2020 – 10 Jul 2020
https://sites.google.com/view/wngt20

Workshop

Workshop: The 4th Workshop on Neural Generation and Translation
Abbreviated title: WNGT 2020
Country/Territory: United States
City: Seattle
Period: 10/07/20 – 10/07/20
Internet address: https://sites.google.com/view/wngt20
