A Neural Model for Part-of-Speech Tagging in Historical Texts

Christian Hardmeier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Historical texts are challenging for natural language processing because they differ linguistically from modern texts and because of their lack of orthographical and grammatical standardisation. We use a character-level neural network to build a part-of-speech (POS) tagger that can process historical data directly without requiring a separate spelling normalisation stage. Its performance in a Swedish verb identification and a German POS tagging task is similar to that of a two-stage model. We analyse the performance of this tagger and a more traditional baseline system, discuss some of the remaining problems for tagging historical data and suggest how the flexibility of our neural tagger could be exploited to address diachronic divergences in morphology and syntax in early modern Swedish with the help of data from closely related languages.

Original language: English
Title of host publication: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Place of publication: Osaka, Japan
Publisher: The COLING 2016 Organizing Committee
Pages: 922–931
Number of pages: 10
ISBN (Electronic): 978-4-87974-702-0
Publication status: Published 16 Dec 2016
Event: 26th International Conference on Computational Linguistics, Osaka, Japan
Duration: 11 Dec 2016 – 16 Dec 2016
http://coling2016.anlp.jp/

Conference

Conference: 26th International Conference on Computational Linguistics
Abbreviated title: COLING 2016
Country: Japan
City: Osaka
Period: 11/12/16 – 16/12/16