Abstract
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
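
As a rough illustration of why the ratio of learning rate to batch size could matter, the sketch below runs SGD on a toy one-dimensional quadratic loss with Gaussian gradient noise. This setup is an assumption made for illustration and is not the authors' analysis or experimental protocol; the names `stationary_variance`, `simulate_sgd`, `curvature`, and `noise_std` are hypothetical. In this toy model, the spread of the iterate around the minimum is approximately proportional to the ratio when the learning rate is small.

```python
import random


def stationary_variance(lr, batch_size, curvature=1.0, noise_std=1.0):
    """Closed-form stationary variance of theta for the toy recursion
    theta <- (1 - lr * curvature) * theta - lr * xi,
    where xi ~ N(0, noise_std**2 / batch_size) is the mini-batch noise."""
    return lr * noise_std**2 / (batch_size * curvature * (2.0 - lr * curvature))


def simulate_sgd(lr, batch_size, steps=200_000, curvature=1.0,
                 noise_std=1.0, seed=0):
    """Empirical variance of theta after a burn-in period (toy model only)."""
    rng = random.Random(seed)
    theta = 0.0
    burn_in = steps // 10
    total = 0.0
    total_sq = 0.0
    count = 0
    for step in range(steps):
        # Averaging batch_size i.i.d. Gaussian noise terms shrinks the
        # standard deviation by a factor of sqrt(batch_size).
        noise = rng.gauss(0.0, noise_std / batch_size**0.5)
        grad = curvature * theta + noise
        theta -= lr * grad
        if step >= burn_in:
            total += theta
            total_sq += theta * theta
            count += 1
    mean = total / count
    return total_sq / count - mean * mean


if __name__ == "__main__":
    # The first two settings share lr / batch_size = 0.01; the third has 0.0025.
    for lr, batch_size in [(0.01, 1), (0.04, 4), (0.01, 4)]:
        exact = stationary_variance(lr, batch_size)
        empirical = simulate_sgd(lr, batch_size)
        print(f"lr={lr}, B={batch_size}, lr/B={lr / batch_size:.4f}, "
              f"exact={exact:.5f}, simulated={empirical:.5f}")
```

The first two settings share the same ratio and give nearly the same variance, while the third has a quarter of the ratio and roughly a quarter of the variance. The toy model says nothing about generalization or the width of minima in deep networks; it only shows how the noise scale of SGD can depend on the ratio rather than on either quantity alone.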
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 27th International Conference on Artificial Neural Networks |
| Place of Publication | Rhodes, Greece |
| Publisher | Springer |
| Pages | 392-402 |
| Number of pages | 10 |
| ISBN (Electronic) | 978-3-030-01424-7 |
| ISBN (Print) | 978-3-030-01423-0 |
| Publication status | Published - Oct 2018 |
| Event | 27th International Conference on Artificial Neural Networks, Rhodes, Greece. Duration: 4 Oct 2018 → 7 Oct 2018. https://e-nns.org/icann2018/ |
Publication series
| Name | Publisher | Volume | ISSN (Print) | ISSN (Electronic) |
|---|---|---|---|---|
| Lecture Notes in Computer Science | Springer, Cham | 11141 | 0302-9743 | 1611-3349 |
| Theoretical Computer Science and General Issues | | 11141 | | |
Conference
| Conference | 27th International Conference on Artificial Neural Networks |
|---|---|
| Abbreviated title | ICANN 2018 |
| Country/Territory | Greece |
| City | Rhodes |
| Period | 4/10/18 → 7/10/18 |
Fingerprint
Research topics of 'Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio'.
Projects
- Bonseyes - Platform for Open Development of Systems of Artificial Intelligence (Finished)
  Storkey, A. (Principal Investigator) & O'Boyle, M. (Co-investigator)
  1/12/16 → 31/01/20
  Project: Research