Edinburgh Research Explorer

Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Stanislaw Jastrzębski
  • Zachary Kenton
  • Devansh Arpit
  • Nicolas Ballas
  • Asja Fischer
  • Yoshua Bengio
  • Amos Storkey

Open Access permissions: Open

Documents

https://link.springer.com/chapter/10.1007/978-3-030-01424-7_39
Original language: English
Title of host publication: Proceedings of 27th International Conference on Artificial Neural Networks
Place of Publication: Rhodes, Greece
Publisher: Springer, Cham
Pages: 392-402
Number of pages: 10
ISBN (Electronic): 978-3-030-01424-7
ISBN (Print): 978-3-030-01423-0
DOI: 10.1007/978-3-030-01424-7_39
Publication status: Published - Oct 2018
Event: 27th International Conference on Artificial Neural Networks - Rhodes, Greece
Duration: 4 Oct 2018 – 7 Oct 2018
https://e-nns.org/icann2018/

Conference

Conference: 27th International Conference on Artificial Neural Networks
Abbreviated title: ICANN 2018
Country: Greece
City: Rhodes
Period: 4/10/18 – 7/10/18
Internet address: https://e-nns.org/icann2018/

Abstract

We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
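
The central quantity in the abstract is the ratio of learning rate to batch size. As a rough illustration (not the paper's code; the loss, noise model, and constants below are chosen purely for this example), the following sketch runs SGD on a one-dimensional quadratic loss with mini-batch gradient noise. Runs that share the same learning-rate-to-batch-size ratio settle into a region of similar width around the minimum, while a smaller ratio concentrates the iterates more tightly.

import numpy as np

def sgd_stationary_std(eta, batch_size, lam=1.0, sigma=1.0,
                       steps=60_000, burn_in=20_000, seed=0):
    """Noisy SGD on L(theta) = 0.5*lam*theta^2; return the std of the late iterates."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    tail = []
    for t in range(steps):
        # Mini-batch gradient: true gradient plus noise whose variance is sigma^2 / B
        # (a standard modelling assumption, not something taken from the paper's setup).
        grad = lam * theta + rng.normal(0.0, sigma / np.sqrt(batch_size))
        theta -= eta * grad
        if t >= burn_in:
            tail.append(theta)
    return float(np.std(tail))

if __name__ == "__main__":
    # Same eta/B ratio -> SGD explores a region of comparable width around the minimum.
    print(sgd_stationary_std(eta=0.10, batch_size=10))  # eta/B = 0.01
    print(sgd_stationary_std(eta=0.20, batch_size=20))  # eta/B = 0.01, comparable spread
    # Smaller eta/B ratio -> narrower region (roughly sqrt(10) times smaller std here).
    print(sgd_stationary_std(eta=0.05, batch_size=50))  # eta/B = 0.001

For this toy quadratic, the stationary variance of the iterates is approximately eta * sigma^2 / (2 * lam * B), which is why configurations sharing the same eta/B ratio behave alike in the sketch.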

ID: 75846465