Abstract
Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence level become decisive in discriminating the quality of different translation outputs.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing |
Place of Publication | Brussels, Belgium |
Publisher | Association for Computational Linguistics |
Pages | 4791-4796 |
Number of pages | 6 |
Publication status | Published - Nov 2018 |
Event | 2018 Conference on Empirical Methods in Natural Language Processing, Square Meeting Center, Brussels, Belgium. Duration: 31 Oct 2018 → 4 Nov 2018. http://emnlp2018.org/ |
Conference
Conference | 2018 Conference on Empirical Methods in Natural Language Processing |
---|---|
Abbreviated title | EMNLP 2018 |
Country/Territory | Belgium |
City | Brussels |
Period | 31/10/18 → 4/11/18 |
Internet address | http://emnlp2018.org/ |