Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

Gongbo Tang, Rico Sennrich, Joakim Nivre

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models for translating Finnish into English, examining their ability to learn word senses and morphological inflections as well as the behavior of the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to force character hidden states to capture the full word-level information. Experimental results show that word-level attention with a single head results in a drop of 1.2 BLEU points.
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics
Place of Publication: Barcelona, Spain (Online)
Publisher: International Committee on Computational Linguistics
Pages: 4251-4262
Number of pages: 12
ISBN (Print): 978-1-952148-27-9
Publication status: Published - 8 Dec 2020
Event: The 28th International Conference on Computational Linguistics - Online
Duration: 8 Dec 2020 to 13 Dec 2020
https://coling2020.org/

Conference

Conference: The 28th International Conference on Computational Linguistics
Abbreviated title: COLING 2020
Period: 8/12/20 to 13/12/20
