Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT

Elena Voita, Rico Sennrich, Ivan Titov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process. Although neural machine translation is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirrors the different models in traditional SMT. In this work, we look at the competences related to three core SMT components and find that during training, NMT first focuses on learning target-side language modeling, then improves translation quality approaching word-by-word translation, and finally learns more complicated reordering patterns. We show that this behavior holds for several models and language pairs. Additionally, we explain how such an understanding of the training process can be useful in practice and, as an example, show how it can be used to improve vanilla non-autoregressive neural machine translation by guiding teacher model selection.
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Place of Publication: Online and Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Pages: 8478-8491
Number of pages: 14
ISBN (Electronic): 978-1-955917-09-4
Publication status: Published - 7 Nov 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing - Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021
https://2021.emnlp.org/

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 7/11/21 to 11/11/21
