Abstract / Description of output
In this paper, we study multimodal coreference resolution, specifically the setting where a longer descriptive text, i.e., a narration, is paired with an image. This poses significant challenges due to the need for fine-grained image-text alignment, the inherent ambiguity of narrative language, and the unavailability of large annotated training sets. To tackle these challenges, we present a data-efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and ground narrations in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively on the tasks of coreference resolution and narrative grounding.
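The abstract's core training idea, a supervised loss on labeled image-narration pairs combined with an unsupervised loss on unlabeled pairs, can be illustrated with a minimal sketch. This is not the paper's actual formulation: the model interface, batch fields, consistency-based unlabeled loss, and weighting factor `lambda_u` below are all illustrative assumptions.

```python
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, lambda_u=0.5):
    """Hypothetical training step mixing labeled and unlabeled losses.

    `model`, the batch fields, and the loss choices are assumptions made
    for illustration, not the formulation used in the paper.
    """
    # Supervised term: cross-entropy between predicted and annotated
    # coreference links on the labeled image-narration pairs.
    logits = model(labeled_batch["images"], labeled_batch["narrations"])
    sup_loss = F.cross_entropy(logits, labeled_batch["coref_labels"])

    # Unsupervised term: encourage consistent predictions on unlabeled
    # pairs under two stochastic forward passes (e.g. with dropout active).
    p1 = model(unlabeled_batch["images"], unlabeled_batch["narrations"]).softmax(-1)
    p2 = model(unlabeled_batch["images"], unlabeled_batch["narrations"]).softmax(-1)
    unsup_loss = F.mse_loss(p1, p2)

    # Weighted sum of the two terms; lambda_u is an assumed hyperparameter.
    return sup_loss + lambda_u * unsup_loss
```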
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Pages | 11067–11081 |
| Number of pages | 15 |
| ISBN (Electronic) | 979-8-89176-060-8 |
| DOIs | |
| Publication status | Published - 1 Dec 2023 |
| Event | The 2023 Conference on Empirical Methods in Natural Language Processing, Resorts World Convention Centre, Sentosa, Singapore, 6 Dec 2023 → 10 Dec 2023, Conference number: 28, https://2023.emnlp.org/ |
Conference

| Conference | The 2023 Conference on Empirical Methods in Natural Language Processing |
| --- | --- |
| Abbreviated title | EMNLP 2023 |
| Country/Territory | Singapore |
| City | Sentosa |
| Period | 6/12/23 → 10/12/23 |
| Internet address | https://2023.emnlp.org/ |
Keywords
- cs.CL
- cs.CV
Projects
- Visual AI: An Open World Interpretable Visual Transformer (Active)
  Engineering and Physical Sciences Research Council
  1/12/20 → 30/11/26
  Project: Research