Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
We show that the dynamics and convergence properties of stochastic gradient descent (SGD) are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by its control over the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
| Field | Value |
| --- | --- |
| Title of host publication | Proceedings of 27th International Conference on Artificial Neural Networks |
| Place of publication | Rhodes, Greece |
| Number of pages | 10 |
| Publication status | Published - Oct 2018 |
| Series | Lecture Notes in Computer Science (Theoretical Computer Science and General Issues) |
| Conference | 27th International Conference on Artificial Neural Networks (ICANN 2018), Rhodes, Greece, 4 Oct 2018 → 7 Oct 2018 |