Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
Abstract
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
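One standard way to see why this ratio governs the dynamics (a heuristic sketch in notation introduced here, with learning rate $\eta$, batch size $S$, and per-sample gradient covariance $C(\theta)$; the paper's own analysis may differ in detail) is to view the mini-batch gradient as the full gradient plus sampling noise:

$$
\theta_{t+1} = \theta_t - \eta\, g_B(\theta_t), \qquad
g_B(\theta) = \frac{1}{S} \sum_{i \in B} \nabla \ell_i(\theta) \approx \nabla L(\theta) + \epsilon, \qquad
\operatorname{Cov}(\epsilon) \approx \frac{C(\theta)}{S}.
$$

Matching this update to a discretized stochastic differential equation with time step $\mathrm{d}t = \eta$ gives a diffusion term whose magnitude scales as $\eta/S$, so scaling the learning rate and the batch size together leaves the stochastic dynamics approximately unchanged, while increasing $\eta/S$ raises the effective noise level that biases SGD toward wider minima.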
Original language | English |
---|---|
Title of host publication | Proceedings of 27th International Conference on Artificial Neural Networks |
Place of Publication | Rhodes, Greece |
Publisher | Springer, Cham |
Pages | 392-402 |
Number of pages | 10 |
ISBN (Electronic) | 978-3-030-01424-7 |
ISBN (Print) | 978-3-030-01423-0 |
DOIs | |
Publication status | Published - Oct 2018 |
Event | 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4 Oct 2018 → 7 Oct 2018, https://e-nns.org/icann2018/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer, Cham |
Volume | 11141 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

Name | Theoretical Computer Science and General Issues |
---|---|
Volume | 11141 |
Conference
Conference | 27th International Conference on Artificial Neural Networks |
---|---|
Abbreviated title | ICANN 2018 |
Country/Territory | Greece |
City | Rhodes |
Period | 4/10/18 → 7/10/18 |
Internet address | https://e-nns.org/icann2018/ |
Projects
- Bonseyes - Platform for Open Development of Systems of Artificial Intelligence (Finished)
  1/12/16 → 31/01/20
  Project: Research