Multi30K: Multilingual English-German Image Descriptions

Desmond Elliott, Stella Frank, Khalil Sima'an, Lucia Specia

Research output: Chapter in Book/Report/Conference proceeding: Conference contribution


We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated almost exclusively on English-language datasets, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations, created by professional translators, of a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions. We describe the data and outline how it can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.
Original language: English
Title of host publication: Proceedings of the 5th Workshop on Vision and Language, hosted by the 54th Annual Meeting of the Association for Computational Linguistics, VL@ACL 2016, August 12, Berlin, Germany
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 5
Publication status: Published - 12 Aug 2016
Event: 5th Workshop on Vision and Language - Berlin, Germany
Duration: 12 Aug 2016 to 12 Aug 2016


Conference: 5th Workshop on Vision and Language
Abbreviated title: VL 2016
