Abstract / Description of output
We present StyleBabel, a unique open-access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. The collection method is iterative and inspired by ‘Grounded Theory’: a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Visual Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.
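The third task in the abstract, fine-grained text search of style, can be pictured as nearest-neighbour ranking in a shared embedding space. The snippet below is only a minimal sketch of that general idea, not the StyleBabel or ALADIN implementation: the function names, the three-dimensional toy vectors, and the use of cosine similarity as the scoring function are all assumptions made for illustration. In a real system the embeddings would come from trained image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_by_style(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the query first.

    query_embedding: hypothetical embedding of a text query.
    image_embeddings: dict mapping an artwork name to its hypothetical embedding.
    """
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: made-up vectors standing in for encoder outputs.
toy_images = {
    "artwork_a": [0.9, 0.1, 0.0],
    "artwork_b": [0.1, 0.9, 0.1],
    "artwork_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]  # stand-in for an embedded style description

ranking = rank_by_style(toy_query, toy_images, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because both modalities are scored in the same space, the same ranking function would serve tag or caption lookup by swapping which side supplies the query; that symmetry is the main appeal of a cross-modal embedding.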
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2022 |
Editors | Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella |
Publisher | Springer |
Pages | 219-236 |
Number of pages | 18 |
ISBN (Electronic) | 9783031200748 |
ISBN (Print) | 9783031200731 |
Publication status | Published - 12 Nov 2022 |
Event | European Conference on Computer Vision 2022, Tel Aviv, Israel. Duration: 23 Oct 2022 → 27 Oct 2022. https://eccv2022.ecva.net/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Computer Vision 2022 |
---|---|
Abbreviated title | ECCV 2022 |
Country/Territory | Israel |
City | Tel Aviv |
Period | 23/10/22 → 27/10/22 |
Keywords / Materials (for Non-textual outputs)
- datasets and evaluation
- image and video retrieval
- vision and language
- vision applications and systems