Abstract
Reordering is a serious challenge in statistical
machine translation. We propose
a method for analysing syntactic reordering
in parallel corpora and apply it to understanding
the differences in the performance
of SMT systems. Results at recent
large-scale evaluation campaigns show
that synchronous grammar-based statistical
machine translation models produce
superior results for language pairs such as
Chinese to English. However, for language
pairs such as Arabic to English, phrase-based
approaches continue to be competitive.
Until now, our understanding of these
results has been limited to differences in
BLEU scores. Our analysis shows that current
state-of-the-art systems fail to capture
the majority of reorderings found in real
data.
Original language | English |
---|---|
Title of host publication | Proceedings of the Fourth Workshop on Statistical Machine Translation 2009 |
Publisher | Association for Computational Linguistics |
Pages | 197-205 |
Number of pages | 9 |
Publication status | Published - 2009 |