Abstract
Describing the main event of an image involves
identifying the objects depicted and
predicting the relationships between them.
Previous approaches have represented images
as unstructured bags of regions, which makes
it difficult to accurately predict meaningful
relationships between regions. In this paper,
we introduce visual dependency representations
to capture the relationships between
the objects in an image, and hypothesize that
this representation can improve image description.
We test this hypothesis using a
new data set of region-annotated images, associated
with visual dependency representations
and gold-standard descriptions. We describe
two template-based description generation
models that operate over visual dependency
representations. In an image description
task, we find that these models outperform
approaches that rely on object proximity
or corpus information to generate descriptions
on both automatic measures and human
judgements.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL |
Pages | 1292-1302 |
Number of pages | 11 |
Publication status | Published - Oct 2013 |