Abstract
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning, including hyperparameter optimization, loss function learning, few-shot learning, invariance learning and more. These problems are often formalized as Bi-Level optimization (BLO) problems. We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution. To solve this stochastic optimization, we adopt Stochastic Gradient Langevin Dynamics (SGLD) MCMC to sample from the inner distribution, and propose a recurrent algorithm to compute the Monte Carlo (MC) estimate of the hypergradient. Our derivation is similar to forward-mode differentiation, but we introduce a new first-order approximation that makes it feasible for large models without needing to store huge Jacobian matrices. The main benefits are two-fold: i) Our stochastic formulation takes uncertainty into account, which makes the method robust to suboptimal inner optimization and to multiple, non-unique inner minima caused by overparametrization; ii) Compared to existing methods that often exhibit unstable behavior and hyperparameter sensitivity in practice, our method leads to considerably more reliable solutions. We demonstrate that the new approach achieves promising results on diverse meta learning problems and easily scales to learning 87M hyperparameters in the case of Vision Transformers.
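The sampling idea in the abstract can be illustrated with a toy sketch. It is not the paper's recurrent hypergradient algorithm or its first-order approximation; it only shows SGLD drawing samples of an inner parameter and a Monte Carlo estimate of the expected outer loss. Everything here is an assumption made for illustration: the Gibbs form p(theta | lam) ∝ exp(-L_in(theta, lam)) at temperature 1, the one-dimensional quadratic inner loss with an L2 penalty `lam` as the hyperparameter, and the names `sgld_sample`, `inner_grad` and `outer_loss`.

```python
import numpy as np


def sgld_sample(grad_fn, theta0, step, n_steps, burn_in, rng):
    """Approximate samples from p(theta) proportional to exp(-L(theta)).

    Each update takes a noisy gradient step on L and adds Gaussian noise
    with variance 2 * step (the Langevin noise term).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    kept = []
    for t in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad_fn(theta, rng) + np.sqrt(2.0 * step) * noise
        if t >= burn_in:
            kept.append(theta.copy())
    return np.stack(kept)


rng = np.random.default_rng(0)
x_train = rng.normal(2.0, 1.0, size=200)  # toy training data (assumption)
x_val = rng.normal(2.0, 1.0, size=200)    # toy validation data (assumption)
lam = 1.0                                 # hyperparameter: L2 penalty strength


def inner_grad(theta, rng, batch_size=8):
    """Minibatch gradient of L_in = 0.5*mean((theta - x)^2) + 0.5*lam*theta^2."""
    idx = rng.integers(0, len(x_train), size=batch_size)
    return (theta - x_train[idx].mean()) + lam * theta


def outer_loss(theta):
    """Validation loss evaluated at a sampled inner parameter."""
    return 0.5 * (theta - x_val.mean()) ** 2


samples = sgld_sample(inner_grad, theta0=np.zeros(1), step=0.05,
                      n_steps=20000, burn_in=2000, rng=rng)

# Monte Carlo estimate of the expected outer loss over the inner distribution.
expected_outer = float(outer_loss(samples).mean())

# For this quadratic toy, exp(-L_in) is Gaussian with this mean.
target_mean = x_train.mean() / (1.0 + lam)
print(samples.mean(), target_mean, expected_outer)
```

In this toy the inner distribution is Gaussian, so the sample mean can be compared against `target_mean` as a sanity check. A hypergradient would additionally require differentiating `expected_outer` with respect to `lam`, which this sketch does not attempt.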
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence |
| Editors | Toby Walsh, Julie Shah, Zico Kolter |
| Place of Publication | Washington, DC, USA |
| Publisher | AAAI Press |
| Pages | 17913-17920 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781577358978 |
| Publication status | Published - 11 Apr 2025 |
| Event | The 39th Annual AAAI Conference on Artificial Intelligence, Pennsylvania Convention Center, Philadelphia, United States. Duration: 25 Feb 2025 → 4 Mar 2025. Conference number: 39. https://aaai.org/conference/aaai/aaai-25/ |
Publication series
| Name | Proceedings of the AAAI Conference on Artificial Intelligence |
|---|---|
| Publisher | AAAI Press |
| Number | 17 |
| Volume | 39 |
| ISSN (Print) | 2159-5399 |
| ISSN (Electronic) | 2374-3468 |
Conference
| Conference | The 39th Annual AAAI Conference on Artificial Intelligence |
|---|---|
| Abbreviated title | AAAI-25 |
| Country/Territory | United States |
| City | Philadelphia |
| Period | 25 Feb 2025 → 4 Mar 2025 |
Keywords / Materials (for Non-textual outputs)
- machine learning
Fingerprint
Dive into the research topics of 'A stochastic approach to Bi-Level optimization for hyperparameter optimization and meta learning'. Together they form a unique fingerprint.