TY - GEN
T1 - Empirical Methods for Compound Splitting
AU - Koehn, Philipp
AU - Knight, Kevin
PY - 2003
Y1 - 2003
AB - Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
U2 - 10.3115/1067807.1067833
DO - 10.3115/1067807.1067833
M3 - Conference contribution
SN - 1-333-56789-0
T3 - EACL '03
SP - 187
EP - 193
BT - Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 1
PB - Association for Computational Linguistics
CY - Stroudsburg, PA, USA
T2 - Tenth Conference on European Chapter of the Association for Computational Linguistics (EACL '03)
Y2 - 12 April 2003 through 17 April 2003
ER -