Abstract
Quantization is one way to compress Neural Machine Translation (NMT) models, especially for edge devices. This paper pushes quantization from 8 bits, seen in current work on machine translation, to 4 bits. Instead of fixed-point quantization, we use logarithmic quantization since parameters are skewed towards zero. We then observe that quantizing the bias terms in this way damages quality, so we leave them uncompressed. Bias terms are a tiny fraction of the model so the impact on compression rate is minimal. Retraining is necessary to preserve quality, for which we propose to use an error-feedback mechanism that treats compression errors like noisy gradients. We empirically show that NMT models based on the Transformer or RNN architectures can be compressed up to 4-bit precision without any noticeable quality degradation. Models can be compressed up to binary precision, albeit with lower quality. The RNN architecture appears more robust towards compression, compared to the Transformer.
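The abstract's key idea is to quantize in the log domain rather than with fixed-point steps, because weight magnitudes cluster near zero. A minimal sketch of that idea is below; the helper name `log_quantize`, the choice of max-absolute-value as the scale, and the bit layout (one sign bit, remaining bits indexing a power-of-two exponent) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Sketch of logarithmic quantization (illustrative, not the paper's code).

    With b bits, 1 bit stores the sign and the remaining b-1 bits index an
    exponent k, so each weight is mapped to sign * scale * 2**(-k) with
    k in 0 .. 2**(b-1) - 1. The scale here is simply max |w|.
    """
    sign = np.sign(w)
    scale = np.abs(w).max()
    ratio = np.abs(w) / scale
    # Zeros would break log2; they stay zero anyway because sign(0) == 0.
    ratio = np.where(ratio > 0, ratio, 1.0)
    # Round to the nearest exponent in the log domain, then clamp to range.
    k = np.clip(np.round(-np.log2(ratio)), 0, 2 ** (bits - 1) - 1)
    return sign * scale * 2.0 ** (-k)
```

For example, with 4 bits and weights `[0.5, -0.25, 0.1, 0.0]`, the scale is 0.5 and each weight snaps to the nearest signed power-of-two fraction of it: 0.1 rounds to 0.125, while 0.5 and -0.25 are representable exactly. The retraining step in the abstract would then feed the difference between `w` and `log_quantize(w)` back as error feedback, treating it like gradient noise.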
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Fourth Workshop on Neural Generation and Translation |
| Place of Publication | Seattle |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 35–42 |
| Number of pages | 8 |
| ISBN (Electronic) | 978-1-952148-17-0 |
| Publication status | Published - 10 Jul 2020 |
| Event | The 4th Workshop on Neural Generation and Translation (online workshop), Seattle, United States, 10 Jul 2020 → 10 Jul 2020, https://sites.google.com/view/wngt20 |
Workshop
| Workshop | The 4th Workshop on Neural Generation and Translation |
|---|---|
| Abbreviated title | WNGT 2020 |
| Country/Territory | United States |
| City | Seattle |
| Period | 10/07/20 → 10/07/20 |
| Internet address | https://sites.google.com/view/wngt20 |
Fingerprint
Dive into the research topics of 'Compressing Neural Machine Translation Models with 4-bit Precision'. Together they form a unique fingerprint.
Projects
Browser-based Multilingual Translation
Heafield, K. (Principal Investigator)
1/01/19 → 30/06/22
Project: Research