Abstract
Resources for morphological analysis should ideally be open, permissively licensed, have wide coverage, and be regularly updated to reflect language change. Current German morphological analysers fail to meet one or more of these requirements. The most open tool, Morphisto (Zielinski and Simon, 2009), combines the SMOR grammar (Schmid et al., 2004) with an open lexicon, but the lexicon is only licensed for non-commercial use, and no scalable workflow is in place to maintain and extend it.
To address these issues, we present a tool to automatically extract a German morphological lexicon from Wiktionary. Wiktionary is open, permissively licensed, and has a respectable size, with about 48 000 noun stems and 5500 verb stems at the time of writing. In addition, the crowd-sourced architecture of Wiktionary and its active community ensure that the lexicon will be updated to include new word forms and to reflect future changes in orthography.
The result of our extraction tool is a morphological lexicon that is compatible with the SMOR grammar, and can thus be compiled into a finite-state morphological analyser. Finite-state morphological analysers are important for processing morphologically productive languages such as German, and SMOR has been used to improve NLP tasks such as parsing (Seeker et al., 2010; Sennrich et al., 2013) and statistical machine translation (Fritzinger and Fraser, 2010; Williams and Koehn, 2011).
Original language | English |
---|---|
Title of host publication | Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) |
Place of Publication | Reykjavik, Iceland |
Publisher | European Language Resources Association (ELRA) |
Number of pages | 5 |
Publication status | Published - 1 May 2014 |