Empirical Methods for Compound Splitting

Philipp Koehn, Kevin Knight

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
Original language: English
Title of host publication: Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 1
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 187-193
Number of pages: 7
ISBN (Print): 1-333-56789-0
Publication status: Published - 2003
Event: Tenth Conference on European Chapter of the Association for Computational Linguistics (EACL '03) - Agro Hotel, Budapest, Hungary
Duration: 12 Apr 2003 to 17 Apr 2003

Publication series

Name: EACL '03
Publisher: Association for Computational Linguistics

Conference

Conference: Tenth Conference on European Chapter of the Association for Computational Linguistics (EACL '03)
Country/Territory: Hungary
City: Budapest
Period: 12/04/03 to 17/04/03
