EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization

Ondrej Bohdal, Yongxin Yang, Timothy M Hospedales

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters. Nevertheless, existing approaches are relatively expensive, as they need to compute second-order derivatives and store a longer computational graph. This cost prevents scaling them to larger network architectures. We present EvoGrad, a new approach to meta-learning that draws upon evolutionary techniques to compute hypergradients more efficiently. EvoGrad estimates hypergradients with respect to hyperparameters without calculating second-order gradients or storing a longer computational graph, leading to significant improvements in efficiency. We evaluate EvoGrad on two substantial recent meta-learning applications, namely cross-domain few-shot learning with feature-wise transformations and noisy label learning with MetaWeightNet. The results show that EvoGrad significantly improves efficiency and enables scaling meta-learning to bigger CNN architectures, e.g. from ResNet18 to ResNet34.
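The core idea described above, estimating a hypergradient from randomly perturbed model copies rather than by differentiating through the inner gradient step, can be illustrated with a toy numpy sketch. This is our own illustration under simplifying assumptions (a scalar weight `w`, a single L2-penalty hyperparameter `lam`, and a softmax combination of perturbed weights), not the authors' implementation; the loss functions and constants are hypothetical.

```python
import numpy as np

# Toy EvoGrad-style hypergradient sketch (illustrative, not the paper's code).
# Training loss:   L_tr(w, lam) = (w - 1)^2 + lam * w^2
# Validation loss: L_val(w)     = (w - 0.5)^2

def evograd_update(w, lam, eps, sigma=0.1):
    """Combine K perturbed model copies via a softmax over negative losses."""
    wk = w + sigma * eps                       # perturbed model copies
    losses = (wk - 1.0) ** 2 + lam * wk ** 2   # training loss of each copy
    z = -losses
    p = np.exp(z - z.max())
    p /= p.sum()                               # softmax over negative losses
    w_new = (p * wk).sum()                     # combined (updated) weight
    return w_new, wk, p

def evograd_hypergrad(w, lam, eps, sigma=0.1):
    """First-order hypergradient dL_val(w_new)/dlam.

    Only the softmax weights depend on lam (through the training losses),
    so no second-order derivatives of the model are needed.
    """
    w_new, wk, p = evograd_update(w, lam, eps, sigma)
    dl_dlam = wk ** 2                                   # d losses / d lam
    # Derivative of p = softmax(-losses) with respect to lam:
    dp_dlam = p * (-dl_dlam + (p * dl_dlam).sum())
    dwnew_dlam = (dp_dlam * wk).sum()
    return 2.0 * (w_new - 0.5) * dwnew_dlam             # chain rule via L_val
```

Because the combined weight is a closed-form function of the sampled losses, the hypergradient here is a plain first-order derivative; this is the efficiency source the abstract refers to, in contrast to methods that backpropagate through an inner SGD step.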
Original language: English
Title of host publication: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Publisher: Neural Information Processing Systems
Number of pages: 12
Publication status: Published - 6 Dec 2021
Event: 35th Conference on Neural Information Processing Systems - Virtual
Duration: 6 Dec 2021 - 14 Dec 2021
https://nips.cc/

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Conference

Conference: 35th Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS 2021
Period: 6/12/21 - 14/12/21
Internet address: https://nips.cc/
