Parallel Sentence Mining by Constrained Decoding

Patrick Chen, Nikolay Bogoychev, Kenneth Heafield, Faheem Kirefu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus. We argue that a neural machine translation system by itself can be a sentence similarity scorer and it efficiently approximates pairwise comparison with a modified beam search. When benchmarked on the BUCC shared task, our method achieves results comparable to other submissions.
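The idea of constraining decoding with a prefix tree can be illustrated with a minimal sketch. This is not the authors' implementation: `build_trie`, `constrained_beam_search`, and `toy_score` are hypothetical names, the `</s>` end marker is an assumption, and `toy_score` is a hand-written stand-in for a real translation model's log-probability of the next token. The sketch only shows how a beam search can be restricted so that every hypothesis is a prefix of some sentence in the other corpus, which makes the model score act as a similarity score over candidate sentences.

```python
import math
from typing import Callable, List, Tuple

END = "</s>"  # assumed end-of-sentence marker, not taken from the paper


def build_trie(sentences: List[List[str]]) -> dict:
    """Build a nested-dict prefix tree over tokenized sentences."""
    root: dict = {}
    for tokens in sentences:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_beam_search(
    score_next: Callable[[List[str], str], float],
    trie: dict,
    beam_size: int = 2,
    max_len: int = 50,
) -> List[Tuple[float, List[str]]]:
    """Return finished hypotheses as (log-prob, tokens), best first.

    Only tokens that continue a path in the trie are ever expanded,
    so every finished hypothesis is a sentence from the indexed corpus.
    """
    beam = [(0.0, [], trie)]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = []
        for logp, prefix, node in beam:
            for tok, child in node.items():
                new_logp = logp + score_next(prefix, tok)
                if tok == END:
                    finished.append((new_logp, prefix))
                else:
                    candidates.append((new_logp, prefix + [tok], child))
        if not candidates:
            break
        # Keep only the highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    finished.sort(key=lambda h: h[0], reverse=True)
    return finished


def toy_score(prefix: List[str], tok: str) -> float:
    """Hypothetical stand-in for an NMT model's log P(tok | source, prefix)."""
    preferred = {"the", "cat", "sleeps", END}
    return math.log(0.5) if tok in preferred else math.log(0.05)


target_corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["a", "cat", "sleeps"],
]
trie = build_trie(target_corpus)
hyps = constrained_beam_search(toy_score, trie, beam_size=2)
best_logp, best_tokens = hyps[0]
```

With this toy scorer the best hypothesis is the corpus sentence the scorer prefers, and the beam size bounds how many candidate sentences are compared per step instead of scoring every sentence pair.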
Original language: English
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics (ACL)
Pages: 1672–1678
Number of pages: 7
ISBN (Electronic): 978-1-952148-25-5
Publication status: Published - 10 Jul 2020
Event: 2020 Annual Conference of the Association for Computational Linguistics - Hyatt Regency Seattle, Virtual conference, United States
Duration: 5 Jul 2020 – 10 Jul 2020
Conference number: 58
https://acl2020.org/

Conference

Conference: 2020 Annual Conference of the Association for Computational Linguistics
Abbreviated title: ACL 2020
Country/Territory: United States
City: Virtual conference
Period: 5/07/20 – 10/07/20
