Learning to Interpret and Describe Abstract Scenes

Luis Gilberto Mateos Ortiz, Clemens Wolff, Mirella Lapata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Given a (static) scene, a human can effortlessly describe what is going on (who is doing what to whom, how, and why). The process requires knowledge about the world and about how it is perceived and described. In this paper we study the problem of interpreting and verbalizing visual information using abstract scenes created from collections of clip art images. We propose a model inspired by machine translation operating over a large parallel corpus of visual relations and linguistic descriptions. We demonstrate that this approach produces human-like scene descriptions which are both fluent and relevant, outperforming a number of competitive alternatives based on templates, sentence-based retrieval, and a multi-modal neural language model.
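To make the machine-translation analogy concrete, the sketch below shows one way it could be realised: spatial relation tuples extracted from a clip-art scene act as the "source language", and tuple-to-phrase mappings learned from a parallel corpus act as a phrase table. This is a minimal illustrative sketch, not the authors' implementation; the relation inventory, the data layout, and all function names (extract_relations, learn_phrase_table, describe) are assumptions introduced here for illustration.

```python
# Illustrative sketch of an MT-style scene description pipeline.
# NOT the paper's model: relations, names, and data format are assumed.
from collections import defaultdict

def extract_relations(scene):
    """Derive (subject, relation, object) tuples from clip-art objects
    given as dicts with a name and (x, y) coordinates."""
    relations = []
    for a in scene:
        for b in scene:
            if a is b:
                continue
            rel = "left_of" if a["x"] < b["x"] else "right_of"
            relations.append((a["name"], rel, b["name"]))
    return relations

def learn_phrase_table(parallel_corpus):
    """Count co-occurrences of visual relations and description phrases
    over (relations, phrases) pairs; normalise counts into
    translation-style probabilities, as in phrase-based MT."""
    counts = defaultdict(lambda: defaultdict(int))
    for relations, phrases in parallel_corpus:
        for rel in relations:
            for phrase in phrases:
                counts[rel][phrase] += 1
    table = {}
    for rel, phrase_counts in counts.items():
        total = sum(phrase_counts.values())
        table[rel] = {p: c / total for p, c in phrase_counts.items()}
    return table

def describe(scene, table):
    """Verbalize a scene by picking the most probable phrase for each
    relation observed in it."""
    phrases = []
    for rel in extract_relations(scene):
        if rel in table:
            phrases.append(max(table[rel], key=table[rel].get))
    return " ".join(phrases)

# Toy usage with made-up data:
scene = [{"name": "boy", "x": 1, "y": 2}, {"name": "dog", "x": 4, "y": 2}]
corpus = [([("boy", "left_of", "dog")], ["the boy stands next to the dog"])]
table = learn_phrase_table(corpus)
print(describe(scene, table))  # -> "the boy stands next to the dog"
```

The paper's actual model operates over a large parallel corpus of visual relations and descriptions; this toy version only conveys the source-to-target mapping intuition behind that design.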
Original language: English
Title of host publication: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Place of publication: Denver, Colorado
Publisher: Association for Computational Linguistics
Pages: 1505-1515
Number of pages: 11
Publication status: Published - 1 May 2015
