Context Sensitive Neural Lemmatization with Lematus

Toms Bergmanis, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The main motivation for developing context-sensitive lemmatizers is to improve performance on unseen and ambiguous words. Yet previous systems have not carefully evaluated whether the use of context actually helps in these cases. We introduce Lematus, a lemmatizer based on a standard encoder-decoder architecture, which incorporates character-level sentence context. We evaluate its lemmatization accuracy across 20 languages in both a full data setting and a lower-resource setting with 10k training examples in each language. In both settings, we show that including context significantly improves results against a context-free version of the model. Context helps more for ambiguous words than for unseen words, though the latter have a greater effect on overall performance differences between languages. We also compare to three previous context-sensitive lemmatization systems, which all use pre-extracted edit trees as well as hand-selected features and/or additional sources of information such as tagged training data. Without using any of these, our context-sensitive model outperforms the best competitor system (Lemming) in the full-data setting, and performs on par in the lower-resource setting.

Original language: English
Title of host publication: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Place of Publication: New Orleans, Louisiana
Publisher: Association for Computational Linguistics (ACL)
Pages: 1391-1400
Number of pages: 10
Publication status: Published - 6 Jun 2018
Event: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Hyatt Regency New Orleans Hotel, New Orleans, United States
Duration: 1 Jun 2018 to 6 Jun 2018
http://naacl2018.org/

Conference

Conference: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Abbreviated title: NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 to 6/06/18
