Why Do Self-Supervised Models Transfer? On the Impact of Invariance on Downstream Tasks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Self-supervised learning is a powerful paradigm for representation learning on unlabelled images. A wealth of effective new methods based on instance matching rely on data-augmentation to drive learning, and these have reached a rough agreement on an augmentation scheme that optimises popular recognition benchmarks. However, there is strong reason to suspect that different tasks in computer vision require features to encode different (in)variances, and therefore likely require different augmentation strategies. In this paper, we measure the invariances learned by contrastive methods and confirm that they do learn invariance to the augmentations used and further show that this invariance largely transfers to related real-world changes in pose and lighting. We show that learned invariances strongly affect downstream task performance and confirm that different downstream tasks benefit from polar opposite (in)variances, leading to performance loss when the standard augmentation strategy is used. Finally, we demonstrate that a simple fusion of representations with complementary invariances ensures wide transferability to all the diverse downstream tasks considered.
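As a rough illustration of two ideas in the abstract (measuring invariance to a transformation, and fusing representations with complementary invariances), the following is a minimal sketch. It is not the authors' implementation. It assumes mean cosine similarity between features of an image and its transformed copy as the invariance score, and plain concatenation of L2-normalised features as the fusion; the abstract states neither. The names `invariance_score`, `fuse`, `mean_encoder`, `raw_encoder` and `reverse` are hypothetical, and the "images" are toy lists of numbers.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def invariance_score(encoder, images, transform):
    """Mean cosine similarity between features of x and transform(x).

    Assumption: values near 1 are read as "largely invariant to transform".
    """
    sims = [cosine(encoder(x), encoder(transform(x))) for x in images]
    return sum(sims) / len(sims)


def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def fuse(encoders, x):
    """Concatenate L2-normalised features from several encoders (assumed fusion)."""
    fused = []
    for enc in encoders:
        fused.extend(l2_normalise(enc(x)))
    return fused


# Toy stand-ins (hypothetical): an "image" is a list of 8 pixel values.
def mean_encoder(x):
    # Order-independent statistics, so invariant to reversing the pixels.
    return [sum(x) / len(x), max(x)]


def raw_encoder(x):
    # Keeps pixel order, so sensitive to reversing the pixels.
    return list(x)


def reverse(x):
    # Toy "augmentation": reverse the pixel order.
    return x[::-1]


random.seed(0)
images = [[random.random() for _ in range(8)] for _ in range(16)]

inv_a = invariance_score(mean_encoder, images, reverse)
inv_b = invariance_score(raw_encoder, images, reverse)
fused = fuse([mean_encoder, raw_encoder], images[0])

print(f"order-invariant encoder: {inv_a:.3f}")
print(f"order-sensitive encoder: {inv_b:.3f}")
print(f"fused feature length: {len(fused)}")
```

In this toy setting the first encoder scores close to 1 and the second scores lower, while the fused vector carries both kinds of feature, so a downstream classifier could draw on whichever suits its task.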
Original language: English
Title of host publication: Proceedings of the 33rd British Machine Vision Conference 2022 (BMVC 2022)
Publisher: BMVA Press
Number of pages: 14
Publication status: Published - 25 Nov 2022
Event: The 33rd British Machine Vision Conference, 2022 - London, United Kingdom
Duration: 21 Nov 2022 to 24 Nov 2022
Conference number: 33
https://www.bmvc2022.org/

Conference

Conference: The 33rd British Machine Vision Conference, 2022
Abbreviated title: BMVC 2022
Country/Territory: United Kingdom
City: London
Period: 21/11/22 to 24/11/22