Sentence Compression for Arbitrary Languages via Multilingual Pivoting

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we advocate the use of bilingual corpora, which are abundantly available, for training sentence compression models. Our approach borrows much of its machinery from neural machine translation and leverages bilingual pivoting: compressions are obtained by translating a source string into a foreign language and then back-translating it into the source language while controlling the translation length. Our model can be trained for any language as long as a bilingual corpus is available, and it performs arbitrary rewrites without access to compression-specific data. We release MOSS, a new parallel Multilingual Compression dataset for English, German, and French which can be used to evaluate compression models across languages and genres.
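The bilingual pivoting idea described above can be sketched as a small pipeline: translate into a pivot language, then translate back under a length budget. The sketch below is a hypothetical illustration, not the paper's implementation; `toy_en_de` and `toy_de_en` are toy stand-ins for trained NMT systems, and the length control is reduced to a simple token cap.

```python
def compress_via_pivot(sentence, translate_fwd, translate_back, max_tokens):
    """Compress by round-trip translation: source -> pivot -> source,
    constraining the length of the back-translation."""
    pivot = translate_fwd(sentence)
    return translate_back(pivot, max_tokens)

# Toy stand-ins for NMT systems (assumptions for illustration only):
def toy_en_de(sentence):
    # Pretend "German" output so the round trip is visible.
    return " ".join(word.upper() for word in sentence.split())

def toy_de_en(sentence, max_tokens):
    # A real decoder would keep the most salient content within the
    # length budget; this stub simply truncates to max_tokens words.
    return " ".join(sentence.lower().split()[:max_tokens])

compressed = compress_via_pivot(
    "the quick brown fox jumps over the lazy dog",
    toy_en_de, toy_de_en, max_tokens=5)
print(compressed)  # -> "the quick brown fox jumps"
```

In the actual model, the back-translation step is a neural decoder whose output length is controlled during generation, so the "truncation" is learned rather than mechanical.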
Original language: English
Title of host publication: 2018 Conference on Empirical Methods in Natural Language Processing
Place of publication: Brussels, Belgium
Publisher: Association for Computational Linguistics
Pages: 2453-2464
Number of pages: 12
Publication status: Published - Nov 2018
Event: 2018 Conference on Empirical Methods in Natural Language Processing, Square Meeting Center, Brussels, Belgium
Duration: 31 Oct 2018 - 4 Nov 2018
http://emnlp2018.org/

Conference

Conference: 2018 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2018
Country: Belgium
City: Brussels
Period: 31/10/18 - 4/11/18
Internet address: http://emnlp2018.org/

