Abstract / Description of output
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
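The abstract's central quantity, the learning-rate-to-batch-size ratio, has a standard justification in the stochastic-differential-equation view of SGD. The following derivation sketch is illustrative rather than quoted from the paper: it assumes i.i.d. per-example gradients g_i with covariance C, and the notation (η for learning rate, B for batch size, L for the loss) is ours.

```latex
% Illustrative sketch, not the paper's own derivation.
% Assumption: per-example gradients g_i are i.i.d. with covariance C.
% One SGD step with learning rate \eta and batch size B:
\theta_{t+1} = \theta_t - \eta\,\hat g_B(\theta_t),
\qquad
\hat g_B = \frac{1}{B}\sum_{i=1}^{B} g_i,
\qquad
\operatorname{Cov}\!\big[\hat g_B\big] = \frac{C}{B}.
% The noise injected in one step is \eta\,(\hat g_B - \nabla L), with covariance
\operatorname{Cov}\!\big[\eta\,(\hat g_B - \nabla L)\big] = \frac{\eta^{2}}{B}\,C.
% Measuring continuous "training time" as t = \eta \times (\text{step count}),
% each step advances time by \Delta t = \eta, so the diffusion coefficient
% (the effective noise temperature) scales as
\frac{\eta^{2}/B}{\eta} = \frac{\eta}{B}.
```

On this reading, rescaling η and B by the same factor leaves the effective noise temperature unchanged, which is consistent with the abstract's claim that the ratio, rather than either hyperparameter alone, controls the width of the minima SGD reaches.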
Original language | English |
---|---|
Title of host publication | Proceedings of the 27th International Conference on Artificial Neural Networks |
Place of Publication | Rhodes, Greece |
Publisher | Springer |
Pages | 392-402 |
Number of pages | 10 |
ISBN (Electronic) | 978-3-030-01424-7 |
ISBN (Print) | 978-3-030-01423-0 |
DOIs | |
Publication status | Published - Oct 2018 |
Event | 27th International Conference on Artificial Neural Networks - Rhodes, Greece. Duration: 4 Oct 2018 → 7 Oct 2018. https://e-nns.org/icann2018/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Subseries | Theoretical Computer Science and General Issues |
Publisher | Springer, Cham |
Volume | 11141 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 27th International Conference on Artificial Neural Networks |
---|---|
Abbreviated title | ICANN 2018 |
Country/Territory | Greece |
City | Rhodes |
Period | 4/10/18 → 7/10/18 |
Internet address | https://e-nns.org/icann2018/ |
Projects
- Bonseyes - Platform for Open Development of Systems of Artificial Intelligence, 1/12/16 → 31/01/20 (Project: Research, Finished)