On the Efficiency of Recurrent Neural Network Optimization Algorithms

Ben Krause, Liang Lu, Iain Murray, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM) architecture and the newly proposed multiplicative LSTM (mLSTM) architecture. The results yield a number of insights into these architectures and optimization algorithms, including that Hessian-free optimization has the potential for large efficiency gains in a highly parallel setup.
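For readers unfamiliar with the mLSTM architecture named in the abstract, the sketch below shows one way its hidden-state update is commonly written, following the formulation the same authors give in the later paper "Multiplicative LSTM for sequence modelling": an intermediate state m_t, formed as an elementwise product of input and hidden projections, replaces the previous hidden state in the LSTM gate computations. The NumPy implementation, the weight names, the tanh on the candidate update, and the omission of bias terms are illustrative assumptions, not the code or exact equations used in this workshop paper.

```python
import numpy as np

def mlstm_step(x_t, h_prev, c_prev, params):
    """One multiplicative-LSTM step (illustrative sketch, not the authors' code).

    The intermediate state m_t = (Wmx x_t) * (Wmh h_prev) replaces h_prev
    in the usual LSTM gate computations; weight names are assumptions.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Multiplicative intermediate state (elementwise product of projections)
    m_t = (params["Wmx"] @ x_t) * (params["Wmh"] @ h_prev)

    # LSTM gates, conditioned on m_t instead of h_prev
    i_t = sigmoid(params["Wix"] @ x_t + params["Wim"] @ m_t)  # input gate
    f_t = sigmoid(params["Wfx"] @ x_t + params["Wfm"] @ m_t)  # forget gate
    o_t = sigmoid(params["Wox"] @ x_t + params["Wom"] @ m_t)  # output gate
    u_t = np.tanh(params["Whx"] @ x_t + params["Whm"] @ m_t)  # candidate update (tanh is a common variant)

    c_t = f_t * c_prev + i_t * u_t   # cell state update
    h_t = o_t * np.tanh(c_t)         # hidden state output
    return h_t, c_t
```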
Original language: English
Title of host publication: OPT2015 Optimization for Machine Learning at the Neural Information Processing Systems Conference, 2015
Number of pages: 5
Publication status: Published - Dec 2015

