Abstract
Computing accurate depth from multiple views is a fundamental and longstanding challenge in computer vision. However, most existing approaches do not generalize well across different domains and scene types (e.g. indoor vs. outdoor). Training a general-purpose multi-view stereo model is challenging and raises several questions, e.g. how best to make use of transformer-based architectures, how to incorporate additional metadata when the number of input views varies, and how to estimate the range of valid depths, which can vary considerably across scenes and is typically not known a priori. To address these issues, we introduce MVSA, a novel and versatile Multi-View Stereo architecture that aims to work Anywhere by generalizing across diverse domains and depth ranges. MVSA combines monocular and multi-view cues with an adaptive cost volume to deal with scale-related issues. We demonstrate state-of-the-art zero-shot depth estimation on the Robust Multi-View Depth Benchmark, surpassing existing multi-view stereo and monocular baselines.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-28 |
Number of pages | 28 |
Publication status | Accepted/In press - 26 Feb 2025 |
Event | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025, Music City Center, Nashville, United States. Duration: 11 Jun 2025 → 15 Jun 2025. https://cvpr.thecvf.com/ |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Publisher | Institute of Electrical and Electronics Engineers |
ISSN (Print) | 1063-6919 |
ISSN (Electronic) | 2575-7075 |
Conference
Conference | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 |
---|---|
Abbreviated title | CVPR 2025 |
Country/Territory | United States |
City | Nashville |
Period | 11/06/25 → 15/06/25 |
Internet address | https://cvpr.thecvf.com/ |
Keywords / Materials (for Non-textual outputs)
- computer vision and pattern recognition