Worst-case Feature Risk Minimization for Data-Efficient Learning

Jingshi Lei, Da Li, Chengming Xu, Liming Fang, Timothy M Hospedales, Yanwei Fu

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Deep learning models typically require massive amounts of annotated data to train a strong model for a task of interest. However, data annotation is time-consuming and costly. How to train a satisfactory model using labeled data from a related but distinct domain, or from only a few samples, is thus an important question. To achieve this goal, models should resist overfitting to the specifics of the training data in order to generalize well to new data. This paper proposes a novel Worst-case Feature Risk Minimization (WFRM) method that helps improve model generalization. Specifically, we tackle a minimax optimization problem in feature space at each training iteration: given the input features, we seek the feature perturbation that maximizes the current training loss, and then minimize the training loss of the resulting worst-case features. By incorporating WFRM during training, we significantly improve model generalization under distributional shift (Domain Generalization, DG) and in the low-data regime (Few-shot Learning, FSL). We theoretically analyze WFRM and identify the key reason it works better than ERM: it induces an empirical risk-based, semi-adaptive L2 regularization of the classifier weights, enabling a better risk-complexity trade-off. We evaluate WFRM on two data-efficient learning tasks: DG, using three standard benchmarks (PACS, VLCS, and OfficeHome), and FSL, using the most challenging benchmark, Meta-Dataset. Despite its simplicity, our method consistently improves various DG and FSL methods, leading to new state-of-the-art performance in all settings.
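The minimax step described in the abstract can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the paper's implementation: it uses a single gradient-ascent step for the inner maximization and an L2-bounded feature perturbation, and all names (feature_extractor, classifier, epsilon, wfrm_step) are hypothetical.

```python
# Hypothetical sketch of a worst-case feature perturbation training step.
# The paper's exact inner-maximization procedure and constraint may differ.
import torch
import torch.nn.functional as F

def wfrm_step(feature_extractor, classifier, optimizer, x, y, epsilon=1.0):
    """One training iteration: maximize the loss over a bounded feature
    perturbation, then minimize the loss on the worst-case features."""
    features = feature_extractor(x)  # shape: (batch, feature_dim)

    # Inner maximization: one gradient-ascent step on a feature perturbation,
    # assumed to be constrained to an L2 ball of radius epsilon.
    delta = torch.zeros_like(features, requires_grad=True)
    inner_loss = F.cross_entropy(classifier(features.detach() + delta), y)
    grad, = torch.autograd.grad(inner_loss, delta)
    delta_star = epsilon * grad / (grad.norm(dim=1, keepdim=True) + 1e-12)

    # Outer minimization: train feature extractor and classifier on the
    # perturbed (worst-case) features.
    optimizer.zero_grad()
    loss = F.cross_entropy(classifier(features + delta_star.detach()), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the perturbation is detached before the outer step, so only the worst-case loss is minimized with respect to the model parameters, mirroring the max-then-min structure described above.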
Original language: English
Pages (from-to): 1-19
Number of pages: 19
Journal: Transactions on Machine Learning Research
Volume: 2023
Issue number: 09
Publication status: Published - 26 Oct 2023
