Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers, Allen Malony

Research output: Contribution to conference › Paper › peer-review

Abstract / Description of output

Neural machine translation (NMT), driven by deep neural networks, has overtaken statistics-based approaches, owing to the abundance and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and the massive training corpora generated by news outlets, government agencies and social media. Training a neural network classifier entails tuning hyper-parameters to yield the best performance. Unfortunately, the hyper-parameters for machine translation include discrete categories as well as continuous options, which makes for a combinatorially explosive problem. This research explores optimizing hyper-parameters when training deep neural networks for machine translation. Specifically, our work investigates training a language model with Marian NMT. Results compare NMT under various hyper-parameter settings across a variety of modern GPU architecture generations in single-node and multi-node settings, revealing which hyper-parameters matter most for performance metrics such as words processed per second, convergence rate, and translation accuracy, and providing guidance on how to best achieve high-performing NMT systems.
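The abstract frames the search space as a mix of discrete categories and continuous options. As a minimal sketch of the kind of sweep described, the Python snippet below enumerates a small grid and launches one short Marian training run per configuration; the corpus and vocabulary paths, the grid values, and the single-node four-GPU device list are illustrative assumptions, not settings from the paper.

```python
import itertools
import subprocess

# Hypothetical sweep. The flags (--mini-batch, --learn-rate, --optimizer,
# --train-sets, --vocabs, --model, --devices, --after-epochs) follow Marian's
# command-line interface; the paths, grid values, and device list are
# illustrative placeholders, not settings reported in the paper.
grid = {
    "--mini-batch": [32, 64, 128],            # discrete category
    "--learn-rate": [0.0001, 0.0003, 0.001],  # continuous option, sampled
    "--optimizer": ["sgd", "adagrad", "adam"],
}

keys = list(grid)
for i, values in enumerate(itertools.product(*(grid[k] for k in keys))):
    cmd = [
        "marian",
        "--train-sets", "corpus.en", "corpus.de",    # placeholder corpora
        "--vocabs", "vocab.en.yml", "vocab.de.yml",  # placeholder vocabularies
        "--model", f"model.run{i}.npz",              # one model per config
        "--devices", "0", "1", "2", "3",             # four GPUs, single node
        "--after-epochs", "1",                       # short run per sweep point
    ]
    for flag, value in zip(keys, values):
        cmd += [flag, str(value)]
    subprocess.run(cmd, check=True)
```

A full sweep would then rank configurations by the metrics the abstract names (words processed per second, convergence rate, translation accuracy), which in practice would be read back from Marian's training and validation logs.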
Original language: English
Pages: 1-8
Number of pages: 8
Publication status: Published - 2018
Event: Second Annual Workshop on Naval Applications of Machine Learning - San Diego, United States
Duration: 13 Feb 2018 – 15 Feb 2018
https://sites.google.com/go.spawar.navy.mil/naml2018/home

Conference

Conference: Second Annual Workshop on Naval Applications of Machine Learning
Abbreviated title: NAML 2018
Country/Territory: United States
City: San Diego
Period: 13/02/18 – 15/02/18
Internet address: https://sites.google.com/go.spawar.navy.mil/naml2018/home
