An Investigation of Decompounding for Cross-language Patent Search

Johannes Leveling, Walid Magdy, Gareth J.F. Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) Few resources, such as parallel corpora, may be available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding significantly improves IR performance when only small training corpora are available.
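The abstract does not say which decompounding method was used, so the following is only an illustration of the general idea, not the authors' method. It is a minimal sketch of a frequency-based compound splitter in the style of Koehn and Knight (2003): a word is cut into known parts, and the split whose parts have the highest geometric-mean corpus frequency wins, provided it beats the frequency of the unsplit word. The vocabulary counts, the `LINKERS` set of German linking elements, and the `min_len` threshold are all assumptions chosen for the example. The connection to OOV is that a compound never seen in a small parallel corpus can still be translated if its parts were seen.

```python
from math import prod
from typing import Dict, Iterator, List, Tuple

# Assumed linking elements ("Fugenelemente") allowed between German parts.
LINKERS: Tuple[str, ...] = ("", "s", "es")


def segmentations(word: str, vocab: Dict[str, int], min_len: int = 3) -> Iterator[List[str]]:
    """Yield every way to cut `word` into vocabulary parts of at least `min_len` chars."""
    if not word:
        yield []
        return
    for i in range(min_len, len(word) + 1):
        part = word[:i]
        if part not in vocab:
            continue
        rest = word[i:]
        for link in LINKERS:
            if not rest.startswith(link):
                continue
            # A non-empty linker must be followed by another part.
            if link and len(rest) == len(link):
                continue
            for tail in segmentations(rest[len(link):], vocab, min_len):
                yield [part] + tail


def split_compound(word: str, vocab: Dict[str, int], min_len: int = 3) -> List[str]:
    """Return the highest-scoring split of `word`, or [word] if no split beats it.

    Score = geometric mean of part frequencies; the unsplit word scores its own
    frequency (0 if unseen), so frequent whole words are left intact.
    """
    word = word.lower()
    best, best_score = [word], float(vocab.get(word, 0))
    for parts in segmentations(word, vocab, min_len):
        if len(parts) < 2:
            continue  # the unsplit word is already the baseline
        score = prod(vocab[p] for p in parts) ** (1.0 / len(parts))
        if score > best_score:
            best, best_score = parts, score
    return best
```

For example, with the made-up counts `{"patent": 500, "anspruch": 200, "patentanspruch": 5}`, `split_compound("Patentanspruch", vocab)` returns `["patent", "anspruch"]`, and `split_compound("Verfahrensschritt", {"verfahren": 300, "schritt": 400})` returns `["verfahren", "schritt"]` by consuming the linking "s". The exhaustive recursion is exponential in the worst case, which is acceptable for a sketch but would need memoisation for real patent text.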
Original language: English
Title of host publication: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Place of Publication: New York, NY, USA
Number of pages: 2
ISBN (Print): 978-1-4503-0757-4
Publication status: Published - 2011

Publication series

Name: SIGIR '11


