Abstract
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across a massive number of devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge this conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-SGD implicitly minimizes the loss function of an average-direction sharpness-aware minimization (SAM) algorithm under general non-convex, non-β-smooth settings. This surprising asymptotic equivalence reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization: (1) there exists a free uncertainty evaluation mechanism in D-SGD to improve posterior estimation; (2) D-SGD exhibits a gradient smoothing effect; and (3) the sharpness regularization effect of D-SGD does not decrease as the total batch size increases, which justifies the potential generalization benefit of D-SGD over centralized SGD (C-SGD) in large-batch scenarios.
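For readers unfamiliar with the update the abstract refers to, the following is a minimal sketch of a common D-SGD formulation: each worker takes a noisy gradient step on its own local model and then averages only with its neighbours. Everything concrete here is an assumption made for illustration and is not taken from the paper: the ring topology, the toy quadratic losses, and the names `ring_mixing_matrix`, `d_sgd`, `targets`, `lr`, and `noise`. The sketch shows the D-SGD update only; it does not demonstrate the equivalence to average-direction SAM.

```python
import random


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a ring of n workers."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):  # left neighbour, self, right neighbour
            W[i][j % n] += 1.0 / 3.0
    return W


def d_sgd(targets, steps=200, lr=0.1, noise=0.1, seed=0):
    """Run D-SGD where worker i holds the toy loss 0.5 * (x - targets[i]) ** 2."""
    rng = random.Random(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n  # one local model per worker (hypothetical scalar models)
    for _ in range(steps):
        # Noisy local gradients, each evaluated at the worker's own model.
        grads = [(x[i] - targets[i]) + rng.gauss(0.0, noise) for i in range(n)]
        # Gossip step: average with ring neighbours only, no central server.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local update applied to the mixed model.
        x = [mixed[i] - lr * grads[i] for i in range(n)]
    return x


if __name__ == "__main__":
    targets = [0.0, 1.0, 2.0, 3.0]
    models = d_sgd(targets)
    average = sum(models) / len(models)
    print("local models:", models)
    print("average model:", average, "target mean:", sum(targets) / len(targets))
```

In this toy setting the average of the local models tracks the minimizer of the average loss (the mean of `targets`), while the individual local models stay spread around that average because each worker sees a different objective and the step size is constant.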
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 40th International Conference on Machine Learning |
| Publisher | PMLR |
| Pages | 43005-43036 |
| Number of pages | 32 |
| Volume | 202 |
| Publication status | Published - 10 Jul 2023 |
| Event | The Fortieth International Conference on Machine Learning, Honolulu, United States. Duration: 23 Jul 2023 → 29 Jul 2023. Conference number: 40. https://icml.cc/ |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Publisher | PMLR |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | The Fortieth International Conference on Machine Learning |
|---|---|
| Abbreviated title | ICML 2023 |
| Country/Territory | United States |
| City | Honolulu |
| Period | 23/07/23 → 29/07/23 |
| Internet address | https://icml.cc/ |
Title
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent