Abstract
Large-scale pre-trained vision models are becoming increasingly prevalent, offering expressive and generalizable visual representations that benefit various downstream tasks. Recent studies on the emergent properties of these models have revealed their high-level geometric understanding, in particular in the context of depth perception. However, it remains unclear how depth perception arises in these models without explicit depth supervision provided during pre-training. To investigate this, we examine whether monocular depth cues, similar to those used by the human visual system, emerge in these models. We introduce a new benchmark, DepthCues, designed to evaluate depth cue understanding, and present findings across 20 diverse and representative pre-trained vision models. Our analysis shows that human-like depth cues emerge in more recent, larger models. We also explore enhancing depth perception in large vision models by fine-tuning on DepthCues, and find that even without dense depth supervision, this improves depth estimation. To support further research, our benchmark and evaluation code will be made publicly available for studying depth perception in vision models.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-21 |
Number of pages | 21 |
Publication status | Accepted/In press - 26 Feb 2025 |
Event | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025, Music City Center, Nashville, United States. Duration: 11 Jun 2025 → 15 Jun 2025. https://cvpr.thecvf.com/ |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Publisher | Institute of Electrical and Electronics Engineers |
ISSN (Print) | 1063-6919 |
ISSN (Electronic) | 2575-7075 |
Conference
Conference | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 |
---|---|
Abbreviated title | CVPR 2025 |
Country/Territory | United States |
City | Nashville |
Period | 11/06/25 → 15/06/25 |
Internet address | https://cvpr.thecvf.com/ |
Keywords / Materials (for Non-textual outputs)
- computer vision and pattern recognition
Fingerprint
Dive into the research topics of 'DepthCues: Evaluating monocular depth perception in large vision models'. Together they form a unique fingerprint.
Projects
- 2 Active
- TEAMER: Teaching Machines to Reason Like Humans
  Lapata, M. (Principal Investigator)
  Engineering and Physical Sciences Research Council
  1/10/21 → 30/09/26
  Project: Research
- Visual AI: An Open World Interpretable Visual Transformer
  Bilen, H. (Principal Investigator)
  Engineering and Physical Sciences Research Council
  1/12/20 → 30/11/26
  Project: Research