Abstract / Description of output
We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multicore CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and used multiple processes with core affinity for the multicore setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
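The 4-bit log quantization mentioned above (quantize weights to signed powers of two for storage, dequantize to floats at runtime) can be illustrated with a minimal sketch. This is not the submission's actual implementation (that lives in the Marian C++ toolkit); it is a hypothetical NumPy version assuming one sign bit plus a 3-bit exponent code, with the function name `log_quantize` invented for illustration.

```python
import numpy as np

def log_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round weights to signed powers of two, then dequantize to floats.

    Hypothetical sketch: one sign bit plus (bits - 1) exponent bits,
    with exponents clamped to a window below the largest one observed.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    nonzero = mag > 0
    if not nonzero.any():
        return w.astype(np.float32)
    exp = np.zeros(w.shape, dtype=np.int32)
    # Nearest power-of-two exponent for each nonzero weight.
    exp[nonzero] = np.round(np.log2(mag[nonzero])).astype(np.int32)
    # 4 bits give 2**(bits - 1) = 8 exponent levels plus the sign.
    levels = 2 ** (bits - 1)
    top = int(exp[nonzero].max())
    exp = np.clip(exp, top - levels + 1, top)
    # Runtime uses plain floats, so dequantize immediately.
    deq = sign * np.exp2(exp.astype(np.float32))
    return np.where(nonzero, deq, 0.0).astype(np.float32)

W = np.random.default_rng(0).normal(scale=0.1, size=(3, 3)).astype(np.float32)
print(log_quantize(W))
```

A real deployment would pack the 4-bit codes and keep a per-tensor scale on disk; this sketch skips the packing and only shows the rounding and float dequantization step.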
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Fourth Workshop on Neural Generation and Translation |
| Place of Publication | Seattle |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 218–224 |
| Number of pages | 7 |
| ISBN (Electronic) | 978-1-952148-17-0 |
| DOIs | |
| Publication status | Published - 10 Jul 2020 |
| Event | The 4th Workshop on Neural Generation and Translation, Seattle, United States (online workshop). Duration: 10 Jul 2020 → 10 Jul 2020. https://sites.google.com/view/wngt20 |
Workshop
| Workshop | The 4th Workshop on Neural Generation and Translation |
| --- | --- |
| Abbreviated title | WNGT 2020 |
| Country/Territory | United States |
| City | Seattle |
| Period | 10/07/20 → 10/07/20 |
| Internet address | https://sites.google.com/view/wngt20 |