Influence of accurate compound noun splitting on bilingual vocabulary extraction

Marcin Junczys-Dowmunt

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To accomplish this, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. These collected splits serve as training data for a probabilistic model that abstracts away from the errors made by the direct methods and reaches an f-measure of 95.10%. Furthermore, these methods are evaluated in terms of word alignment quality and extraction accuracy where linguistically accurate methods are found to outperform the corpus-based methods proposed in the literature. A comparison of alignment quality achieved with the best splitting method and the baseline implies that the effort to build supervised splitting methods might result in minimal or no performance gains.
Original language: English
Title of host publication: Text Resources and Lexical Knowledge
Subtitle of host publication: Selected Papers from the 9th Conference on Natural Language Processing KONVENS 2008
Editors: Angelika Storrer, Alexander Geyken, Alexander Siebert, Kay-Michael Würzner
Publisher: De Gruyter Mouton
Pages: 91-104
Number of pages: 14
Volume: 8
ISBN (Electronic): 978-3-11-021181-8
Publication status: Published - 17 Oct 2008

Publication series

Name: Text, Translation, Computational Processing (TTCP)
Publisher: De Gruyter Mouton
Volume: 8
ISSN (Print): 1861-4272
