Multiword expression aware neural machine translation

Andreas Zaninello, Alexandra Birch-Mayne

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multiword Expressions (MWEs) are a frequently occurring phenomenon found in all natural languages and are of great importance to linguistic theory, natural language processing applications, and machine translation systems. Neural Machine Translation (NMT) architectures do not handle these expressions well, and previous studies have rarely addressed MWEs in this framework. In this work, we show that annotation and data augmentation, using external linguistic resources, can improve both the translation of MWEs that occur in the source and the generation of MWEs on the target side, increasing performance by up to 5.09 BLEU points on MWE test sets. We also devise an MWE score that specifically assesses the quality of MWE translation and agrees with human evaluation. We make available the MWE score implementation, along with MWE-annotated training sets and corpus-based lists of MWEs, for reproduction and extension.
Original language: English
Title of host publication: Proceedings of The 12th Language Resources and Evaluation Conference
Publisher: European Language Resources Association (ELRA)
Pages: 3816–3825
Number of pages: 10
ISBN (Print): 979-10-95546-34-4
Publication status: Published - 16 May 2020
Event: 12th Language Resources and Evaluation Conference - Le Palais du Pharo, Marseille, France
Duration: 11 May 2020 to 16 May 2020
Conference number: 12
https://lrec2020.lrec-conf.org/en/

Conference

Conference: 12th Language Resources and Evaluation Conference
Abbreviated title: LREC 2020
Country/Territory: France
City: Marseille
Period: 11/05/20 to 16/05/20

Keywords

  • multiword expressions
  • neural machine translation
  • evaluation
