## Abstract

Traditionally, neural networks are parameterized using optimization procedures such as stochastic gradient descent, RMSProp and ADAM. These procedures tend to drive the parameters of the network toward a local minimum. In this article, we employ alternative "sampling" algorithms (referred to here as "thermodynamic parameterization methods") which rely on discretized stochastic differential equations for a defined target distribution on parameter space. We show that the thermodynamic perspective already improves neural network training. Moreover, by partitioning the parameters based on natural layer structure we obtain schemes with very rapid convergence for data sets with complicated loss landscapes.

We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including a multi-layer Langevin algorithm, AdLaLa (combining the adaptive Langevin and Langevin algorithms) and LOL (combining Langevin and Overdamped Langevin); we examine the convergence of these methods using numerical studies and compare their performance among themselves and in relation to standard alternatives such as stochastic gradient descent and ADAM. We present evidence that thermodynamic parameterization methods can be (i) faster, (ii) more accurate, and (iii) more robust than standard algorithms used within machine learning frameworks.
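
To illustrate what a "discretized stochastic differential equation for a defined target distribution" can look like, the following is a minimal sketch of an Euler-Maruyama discretization of overdamped Langevin dynamics, which samples approximately from a density proportional to exp(-L(θ)/τ). It is not the partitioned AdLaLa or LOL scheme of the article. The one-dimensional toy loss, the function names `grad_loss` and `overdamped_langevin`, and the step size and temperature values are all assumptions chosen for illustration.

```python
import math
import random


def grad_loss(theta):
    # Gradient of a toy loss L(theta) = 0.5 * (theta - 1)^2.
    # The loss is an assumption for illustration, not taken from the article.
    return theta - 1.0


def overdamped_langevin(theta0, step, temperature, n_steps, seed=0):
    """Euler-Maruyama discretization of d(theta) = -grad L dt + sqrt(2*tau) dW.

    For a small step size the iterates are approximately distributed
    according to a density proportional to exp(-L(theta) / temperature).
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    noise_scale = math.sqrt(2.0 * temperature * step)
    for _ in range(n_steps):
        # Gradient step plus Gaussian noise scaled by temperature and step size.
        theta = theta - step * grad_loss(theta) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


# Hypothetical settings: start far from the minimum and discard a burn-in phase.
samples = overdamped_langevin(theta0=5.0, step=0.01, temperature=0.1, n_steps=50000)
burned = samples[5000:]
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

For this quadratic toy loss the target distribution is Gaussian with mean 1 and variance equal to the temperature, so the sample statistics should land close to those values. Setting `temperature=0.0` removes the noise term and reduces the update to plain gradient descent, which is one way to see how a sampling scheme of this kind relates to the optimizers it is compared against.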

| Original language | English |
|---|---|
| Number of pages | 33 |
| Journal | Foundations of Data Science |
| Publication status | Accepted/In press - 4 Dec 2019 |