Abstract / Description of output
Coreference resolution, a core task in natural language processing, aims to identify words and phrases that refer to the same entity in a text. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First, we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets contain only short sentences without coreferring expressions or labeled chains. We then propose a new technique that learns to identify coreference chains using weak supervision, relying only on image-text pairs and a regularization based on prior linguistic knowledge. Our model yields large performance gains over several strong baselines in resolving coreferences. We also show that coreference resolution helps improve the grounding of narratives in images.
Original language | English |
---|---|
Title of host publication | 2023 IEEE/CVF International Conference on Computer Vision (ICCV) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 15201-15212 |
Number of pages | 12 |
ISBN (Electronic) | 979-8-3503-0718-4 |
ISBN (Print) | 979-8-3503-0719-1 |
Publication status | Published - 15 Jan 2024 |
Event | International Conference on Computer Vision 2023, Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023. https://iccv2023.thecvf.com/ |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Publisher | IEEE |
ISSN (Print) | 1550-5499 |
ISSN (Electronic) | 2380-7504 |
Conference
Conference | International Conference on Computer Vision 2023 |
---|---|
Abbreviated title | ICCV 2023 |
Country/Territory | France |
City | Paris |
Period | 2 Oct 2023 → 6 Oct 2023 |
Keywords / Materials (for Non-textual outputs)
- cs.CV
- cs.CL