Abstract / Description of output
We study the relationship between word order freedom and preordering in statistical machine translation. To assess word order freedom, we first introduce a novel entropy measure which quantifies how difficult it is to predict word order given a source sentence and its syntactic analysis. We then address preordering for two target languages at the far ends of the word order freedom spectrum, German and Japanese, and argue that for languages with more word order freedom, attempting to predict a unique word order given source clues only is less justified. Subsequently, we examine lattices of n-best word order predictions as a unified representation for languages from across this broad spectrum and present an effective solution to a resulting technical issue, namely how to select a suitable source word order from the lattice during training. Our experiments show that lattices are crucial for good empirical performance for languages with freer word order (English–German) and can provide additional improvements for fixed word order languages (English–Japanese).
Original language | English |
---|---|
Title of host publication | Proceedings of the First Conference on Machine Translation, Volume 1: Research Papers |
Place of Publication | Berlin, Germany |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 118-130 |
Number of pages | 13 |
Publication status | Published - 12 Aug 2016 |
Event | First Conference on Machine Translation, Berlin, Germany. Duration: 11 Aug 2016 → 12 Aug 2016. http://www.statmt.org/wmt16/ |
Conference
Conference | First Conference on Machine Translation |
---|---|
Abbreviated title | WMT16 |
Country/Territory | Germany |
City | Berlin |
Period | 11/08/16 → 12/08/16 |