Abstract
We present a novel method for extracting parallel sentences from two monolingual corpora using neural machine translation. Our method translates sentences from one corpus while constraining the decoder with a prefix tree built from the sentences of the other corpus. We argue that a neural machine translation system can itself serve as a sentence similarity scorer, and that, with a modified beam search, it efficiently approximates exhaustive pairwise comparison. When benchmarked on the BUCC shared task, our method achieves results comparable to those of other submissions.
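To make the decoding idea concrete, here is a minimal sketch of translation constrained by a prefix tree (trie). Several assumptions apply. `score_fn` is a hypothetical stand-in for the NMT model's next-token log-probability, and `toy_score` with its tiny `LEXICON` is an invented scorer used only for illustration. The names `build_trie` and `constrained_beam_search` are illustrative and do not come from the paper. Details such as length normalization and the score threshold for accepting a pair as parallel are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

EOS = "</s>"

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.sentence_ids: List[int] = []  # target sentences that end at this node

def build_trie(sentences: List[List[str]]) -> TrieNode:
    """Insert every tokenized target sentence (plus EOS) into a prefix tree."""
    root = TrieNode()
    for sid, tokens in enumerate(sentences):
        node = root
        for tok in tokens + [EOS]:
            node = node.children.setdefault(tok, TrieNode())
        node.sentence_ids.append(sid)
    return root

ScoreFn = Callable[[List[str], List[str], str], float]

def constrained_beam_search(source: List[str], root: TrieNode, score_fn: ScoreFn,
                            beam_size: int = 2, max_len: int = 50
                            ) -> Optional[Tuple[float, List[int]]]:
    """Beam search that may only emit tokens continuing a path in the trie,
    so every finished hypothesis is an existing target sentence."""
    beam = [(0.0, [], root)]  # (log-prob, prefix tokens, trie node)
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates = []
        for logp, prefix, node in beam:
            for tok, child in node.children.items():  # only trie-allowed tokens
                new_logp = logp + score_fn(source, prefix, tok)
                if tok == EOS:
                    # No length normalization in this sketch.
                    finished.append((new_logp, child.sentence_ids))
                else:
                    candidates.append((new_logp, prefix + [tok], child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return max(finished, key=lambda f: f[0]) if finished else None

# HYPOTHETICAL toy scorer standing in for an NMT model's log P(token | source, prefix).
LEXICON = {"der": "the", "hund": "dog", "schläft": "sleeps"}

def toy_score(source: List[str], prefix: List[str], token: str) -> float:
    expected = [LEXICON.get(w) for w in source]
    if token == EOS:
        return math.log(0.9) if len(prefix) >= len(source) else math.log(0.05)
    if token in expected and token not in prefix:
        return math.log(0.8)
    return math.log(0.01)

target_corpus = [["the", "dog", "sleeps"], ["the", "cat", "eats"], ["a", "dog", "barks"]]
trie = build_trie(target_corpus)
result = constrained_beam_search(["der", "hund", "schläft"], trie, toy_score)
print(result)  # best (log-prob, matching target sentence ids)
```

Because the hypotheses are forced to stay inside the trie, a single decoding pass searches over all target sentences at once, instead of scoring the source against each target sentence in a separate pass. The beam width controls how much of that space is explored.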
Original language | English |
---|---|
Title of host publication | Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1672–1678 |
Number of pages | 7 |
ISBN (Electronic) | 978-1-952148-25-5 |
DOIs | |
Publication status | Published - 10 Jul 2020 |
Event | 2020 Annual Conference of the Association for Computational Linguistics - Hyatt Regency Seattle, Virtual conference, United States; Duration: 5 Jul 2020 → 10 Jul 2020; Conference number: 58; https://acl2020.org/ |
Conference
Conference | 2020 Annual Conference of the Association for Computational Linguistics |
---|---|
Abbreviated title | ACL 2020 |
Country/Territory | United States |
City | Virtual conference |
Period | 5/07/20 → 10/07/20 |
Internet address | https://acl2020.org/ |