Abstract
This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM)
architecture and the newly proposed multiplicative LSTM (mLSTM) architecture.
Results demonstrate a number of insights into these architectures and optimization
algorithms, including that Hessian-free optimization has the potential for large
efficiency gains in a highly parallel setup.
| Original language | English |
|---|---|
| Title of host publication | OPT2015 Optimization for Machine Learning at the Neural Information Processing Systems Conference, 2015 |
| Number of pages | 5 |
| Publication status | Published - Dec 2015 |