Abstract
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), but with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross-dataset) of ZS-SBIR with just one network ("everything"), and (ii) we aim to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. With this change alone, we achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across the two modalities, and (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show that our method delivers superior performance across all ZS-SBIR settings. The all-important explainability goal is elegantly achieved by visualizing cross-modal token correspondences and, for the first time, via sketch-to-photo synthesis by universal replacement of all matched photo patches.
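The idea of reducing sketch-photo matching to comparisons of local tokens can be illustrated with a toy sketch. This is not the authors' network: the token vectors are hand-written stand-ins for learned visual tokens, the function names (`cross_attend`, `best_matches`, `pair_similarity`) and the `temperature` parameter are hypothetical, and a plain average of cosine similarities stands in for the kernel-based relation network.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(sketch_tokens, photo_tokens, temperature=0.1):
    """For each sketch token, attention weights over all photo tokens.

    Assumption: cosine similarity scaled by a temperature, not the
    paper's learned cross-attention.
    """
    q = [normalize(t) for t in sketch_tokens]
    k = [normalize(t) for t in photo_tokens]
    return [softmax([dot(qi, kj) / temperature for kj in k]) for qi in q]


def best_matches(sketch_tokens, photo_tokens):
    """Index of the most-attended photo token for each sketch token."""
    weights = cross_attend(sketch_tokens, photo_tokens)
    return [max(range(len(w)), key=w.__getitem__) for w in weights]


def pair_similarity(sketch_tokens, photo_tokens, temperature=0.1):
    """Average cosine similarity between each sketch token and its
    attention-weighted mix of photo tokens (a stand-in for the
    relation network, not the paper's kernel-based module)."""
    weights = cross_attend(sketch_tokens, photo_tokens, temperature)
    photo = [normalize(t) for t in photo_tokens]
    score = 0.0
    for s, w in zip(sketch_tokens, weights):
        attended = [sum(wj * p[d] for wj, p in zip(w, photo))
                    for d in range(len(s))]
        score += dot(normalize(s), normalize(attended))
    return score / len(sketch_tokens)


# Toy 2-D tokens: photo_a holds the same local patterns as the sketch
# (in a different order); photo_b holds the opposite patterns.
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_a = [[0.0, 2.0], [3.0, 0.0]]
photo_b = [[-1.0, 0.0], [0.0, -1.0]]

matches = best_matches(sketch, photo_a)
score_a = pair_similarity(sketch, photo_a)
score_b = pair_similarity(sketch, photo_b)
```

In this toy setup, `matches` pairs each sketch token with a photo token, which is the kind of local correspondence the abstract proposes to visualize, and `score_a` exceeds `score_b` because only `photo_a` contains matching local patterns.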
Original language | English |
---|---|
Title of host publication | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 23349-23358 |
Number of pages | 10 |
ISBN (Electronic) | 9798350301298 |
ISBN (Print) | 9798350301304 |
Publication status | Published - 22 Aug 2023 |
Event | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver Convention Center, Vancouver, Canada. Duration: 18 Jun 2023 → 22 Jun 2023. https://cvpr2023.thecvf.com/ |
Publication series
Name | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
---|---|
Publisher | IEEE |
ISSN (Print) | 1063-6919 |
ISSN (Electronic) | 2575-7075 |
Conference
Conference | The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 |
---|---|
Abbreviated title | CVPR 2023 |
Country/Territory | Canada |
City | Vancouver |
Period | 18/06/23 → 22/06/23 |