Image Pivoting for Learning Multilingual Multimodal Representations

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
Original languageEnglish
Title of host publicationProceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Place of PublicationCopenhagen, Denmark
PublisherAssociation for Computational Linguistics (ACL)
Pages2839-2845
Number of pages7
DOIs
Publication statusPublished - 11 Sep 2017
EventEMNLP 2017: Conference on Empirical Methods in Natural Language Processing - Copenhagen, Denmark
Duration: 7 Sep 201711 Sep 2017
http://emnlp2017.net/index.html
http://emnlp2017.net/

Conference

ConferenceEMNLP 2017: Conference on Empirical Methods in Natural Language Processing
Abbreviated titleEMNLP 2017
Country/TerritoryDenmark
CityCopenhagen
Period7/09/1711/09/17
Internet address

Fingerprint

Dive into the research topics of 'Image Pivoting for Learning Multilingual Multimodal Representations'. Together they form a unique fingerprint.

Cite this