Abstract
Our work is motivated by a desire to study the theoretical underpinning for the convergence of stochastic gradient-type algorithms widely used for nonconvex learning tasks such as training of neural networks. The key insight is that a certain class of finite-dimensional nonconvex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional defined on the space of probability measures has a unique minimiser, which can be characterised by a first-order condition using the notion of the linear functional derivative. Next, we study the corresponding gradient flow structure in the 2-Wasserstein metric, which we call Mean-Field Langevin Dynamics (MFLD), and show that the flow of marginal laws induced by the gradient flow converges to a stationary distribution, which is exactly the minimiser of the energy functional. We observe that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. Our proof of convergence to the stationary probability measure is novel: it relies on a generalisation of LaSalle's invariance principle combined with the HWI inequality. Importantly, we assume neither that the interaction potential of MFLD is of convolution type nor that it has any particular symmetric structure. Furthermore, we allow for a general convex objective function, unlike most papers in the literature, which focus on the quadratic loss. Finally, we show that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.
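The dynamics described in the abstract can be illustrated with a minimal particle simulation. This is a hypothetical toy setup, not the paper's exact objective: each of N particles is one hidden-unit parameter of a width-N mean-field network, and an Euler–Maruyama step discretises a Langevin dynamics of the form dX_t = -∇(δF/δm)(m_t, X_t) dt + σ dW_t, where F is a regularised quadratic loss. The data, network architecture, and all constants below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task (assumed for illustration): fit y = sin(x)
# with a mean-field network f(x) = (1/N) * sum_i tanh(theta_i * x).
xs = np.linspace(-2.0, 2.0, 50)
ys = np.sin(xs)

N = 200        # number of particles (network width)
sigma = 0.1    # noise level / entropic regularisation strength
lr = 0.05      # step size of the Euler-Maruyama discretisation
theta = rng.normal(size=N)  # initial particle positions


def predict(theta):
    # Mean-field network output: average of tanh features over particles.
    return np.tanh(np.outer(xs, theta)).mean(axis=1)


def loss(theta):
    # Convex (quadratic) loss in the prediction, hence convex in the measure.
    return 0.5 * np.mean((predict(theta) - ys) ** 2)


loss_before = loss(theta)

for _ in range(2000):
    acts = np.tanh(np.outer(xs, theta))      # shape (50, N)
    resid = acts.mean(axis=1) - ys           # shape (50,)
    # Gradient of the linear functional derivative at each particle:
    # mean over data of resid(x) * x * sech^2(theta_i * x).
    grad = (resid[:, None] * xs[:, None] * (1.0 - acts ** 2)).mean(axis=0)
    # Euler-Maruyama step of the mean-field Langevin dynamics.
    theta = theta - lr * grad + np.sqrt(2.0 * lr) * sigma * rng.normal(size=N)

loss_after = loss(theta)
```

Because the loss is convex in the law of the particles, the flow of the empirical measure drives the loss down towards the unique regularised minimiser; the injected noise keeps the stationary loss slightly above zero, reflecting the entropic regularisation term.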
Original language: English
Publisher: ArXiv
Pages: 1-31
Number of pages: 31
Publication status: Published - 19 May 2019
Fingerprint: Dive into the research topics of 'Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks'. Together they form a unique fingerprint.
Profiles

Lukas Szpruch
School of Mathematics: Reader, Programme Director (ATI)
Person: Academic: Research Active