Zmorge: A German Morphological Lexicon Extracted from Wiktionary

Rico Sennrich, Beat Kunz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Resources for morphological analysis should ideally be open, permissively licensed, have a wide coverage, and be regularly updated to reflect language change. Current German morphology analysers fail to meet one or several of these requirements. The most open tool, Morphisto (Zielinski and Simon, 2009), combines the SMOR grammar (Schmid et al., 2004) with an open lexicon, but the lexicon is only licensed for non-commercial use, and no scalable workflow is in place to maintain and extend it.

To address these issues, we present a tool to automatically extract a German morphological lexicon from Wiktionary. Wiktionary is open, permissively licensed, and has a respectable size, with about 48 000 noun stems and 5500 verb stems at the time of writing. Moreover, the crowd-sourced architecture of Wiktionary and its active community ensure that the lexicon will be updated to include new word forms and to reflect future changes in orthography.

The result of our extraction tool is a morphological lexicon that is compatible with the SMOR grammar and can thus be compiled into a finite-state morphological analyser. Finite-state morphological analysers are important for processing morphologically productive languages such as German, and SMOR has been used to improve NLP tasks such as parsing (Seeker et al., 2010; Sennrich et al., 2013) and statistical machine translation (Fritzinger and Fraser, 2010; Williams and Koehn, 2011).

Original language: English
Title of host publication: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Place of Publication: Reykjavik, Iceland
Publisher: European Language Resources Association (ELRA)
Number of pages: 5
Publication status: Published - 1 May 2014

