Partitioned integrators for thermodynamic parameterization of neural networks

Benedict Leimkuhler, Charles Matthews, Tiffany Vlaar

Research output: Contribution to journal › Article › peer-review

Abstract

Traditionally, neural networks are parameterized using optimization procedures such as stochastic gradient descent, RMSProp and ADAM. These procedures tend to drive the parameters of the network toward a local minimum. In this article, we employ alternative "sampling" algorithms (referred to here as "thermodynamic parameterization methods") which rely on discretized stochastic differential equations for a defined target distribution on parameter space. We show that the thermodynamic perspective already improves neural network training. Moreover, by partitioning the parameters based on natural layer structure we obtain schemes with very rapid convergence for data sets with complicated loss landscapes.
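
For orientation, the overdamped and underdamped Langevin equations are the standard type of stochastic differential equation that such thermodynamic schemes discretize; the partitioned formulations studied in the article may differ in detail. Writing L(\theta) for the loss and \beta for the inverse temperature, the overdamped form is

    d\theta = -\nabla L(\theta)\,dt + \sqrt{2\beta^{-1}}\,dW_t ,

whose stationary distribution is proportional to \exp(-\beta L(\theta)), while the underdamped (kinetic) form introduces momenta p and a friction parameter \gamma:

    d\theta = p\,dt, \qquad dp = -\nabla L(\theta)\,dt - \gamma p\,dt + \sqrt{2\gamma\beta^{-1}}\,dW_t .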
We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including a multi-layer Langevin algorithm, AdLaLa (combining the adaptive Langevin and Langevin algorithms) and LOL (combining Langevin and Overdamped Langevin); we examine the convergence of these methods using numerical studies and compare their performance among themselves and in relation to standard alternatives such as stochastic gradient descent and ADAM. We present evidence that thermodynamic parameterization methods can be (i) faster, (ii) more accurate, and (iii) more robust than standard algorithms used within machine learning frameworks.
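
As a rough sketch of the partitioned idea (not the authors' exact AdLaLa or LOL formulations), the Python fragment below applies an Euler-Maruyama discretized underdamped Langevin step to one group of parameters and an overdamped Langevin step to another. The layer split, step size h, friction gamma, and temperature tau are placeholder choices, and partitioned_langevin_step is an illustrative name, not an interface from the article.

    import torch

    # Illustrative sketch only (placeholder hyperparameters): a partitioned update
    # with underdamped Langevin on one parameter group and overdamped Langevin on
    # the other, in the spirit of the hybrid schemes described above.
    def partitioned_langevin_step(model, loss_fn, batch,
                                  langevin_params, overdamped_params, momenta,
                                  h=1e-3, gamma=1.0, tau=1e-4):
        x, y = batch
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, list(langevin_params) + list(overdamped_params))
        g_lang = grads[:len(langevin_params)]
        g_over = grads[len(langevin_params):]

        with torch.no_grad():
            # Underdamped Langevin: momentum update with friction and thermal
            # noise, followed by a position update.
            for p, m, g in zip(langevin_params, momenta, g_lang):
                m.mul_(1.0 - h * gamma).add_(g, alpha=-h)
                m.add_(torch.randn_like(m), alpha=(2.0 * gamma * tau * h) ** 0.5)
                p.add_(m, alpha=h)
            # Overdamped Langevin: gradient step plus Gaussian noise.
            for p, g in zip(overdamped_params, g_over):
                p.add_(g, alpha=-h)
                p.add_(torch.randn_like(p), alpha=(2.0 * tau * h) ** 0.5)
        return loss.item()

Here momenta would be a list of zero-initialized tensors matching langevin_params, and the two groups could be, for example, the hidden-layer and output-layer parameters, mirroring the layerwise partitioning described in the abstract.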
Original language: English
Number of pages: 33
Journal: Foundations of Data Science
Publication status: Accepted/In press - 4 Dec 2019
