Abstract
This paper describes the submissions to the efficiency track for GPUs by members of the University of Edinburgh, Adam Mickiewicz University, Tilde and University of Alicante. We focus on efficient implementation of the recurrent deep-learning model as implemented in Amun, the fast inference engine for neural machine translation. We improve the performance with an efficient mini-batching algorithm, and by fusing the softmax operation with the k-best extraction algorithm.
Original language | English |
---|---|
Title of host publication | The 2nd Workshop on Neural Machine Translation and Generation 2018 |
Place of Publication | Melbourne, Australia |
Publisher | ACL Anthology |
Pages | 116-121 |
Number of pages | 6 |
Publication status | Published - Jul 2018 |
Event | 2nd Workshop on Neural Machine Translation and Generation, Melbourne, Australia. Duration: 15 Jul 2018 → 20 Jul 2018. https://sites.google.com/site/wnmt18/home |
Conference
Conference | 2nd Workshop on Neural Machine Translation and Generation |
---|---|
Abbreviated title | WNMT 2018 |
Country/Territory | Australia |
City | Melbourne |
Period | 15/07/18 → 20/07/18 |
Internet address | https://sites.google.com/site/wnmt18/ |