A Shared Task on Multimodal Machine Translation and Crosslingual Image Description

Lucia Specia, Stella Frank, Khalil Sima'an, Desmond Elliott

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be translated to a target language, (optionally) with additional cues from the corresponding image, and (ii) a description generation task, in which a target language description needs to be generated for an image, (optionally) with additional cues from source language descriptions of the same image. In this first edition of the shared task, 16 systems were submitted for the translation task and seven for the image description task, from a total of 10 teams.
Original language: English
Title of host publication: Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 11
Publication status: E-pub ahead of print - 12 Aug 2016
Event: First Conference on Machine Translation - Berlin, Germany
Duration: 11 Aug 2016 to 12 Aug 2016


Conference: First Conference on Machine Translation
Abbreviated title: WMT16

