Abstract / Description of output
We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice—i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a naïve baseline system. We show that the neural models generalize well to unseen words in tests on five languages; nevertheless, they provide no clear benefit over the naïve baseline for downstream POS tagging of an English historical collection. We conclude that future work should include more rigorous evaluation, including both intrinsic and extrinsic measures where possible.
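As an illustration of the kind of intrinsic comparison the abstract describes, the sketch below scores a naïve baseline separately on tokens seen and unseen in training. The abstract does not define the baseline, so this assumes a simple memorization lookup that leaves unseen tokens unchanged. The function names and the toy word pairs are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def train_memorization_baseline(pairs):
    """Map each historical token to its most frequent normalization in training."""
    counts = defaultdict(Counter)
    for historical, modern in pairs:
        counts[historical][modern] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def normalize(lexicon, token):
    # Assumption of this sketch: tokens never seen in training are left unchanged.
    return lexicon.get(token, token)


def accuracy_by_seen_status(lexicon, test_pairs):
    """Return accuracy separately for test tokens seen and unseen in training."""
    correct = {"seen": 0, "unseen": 0}
    total = {"seen": 0, "unseen": 0}
    for historical, modern in test_pairs:
        key = "seen" if historical in lexicon else "unseen"
        total[key] += 1
        correct[key] += normalize(lexicon, historical) == modern
    # None marks a category with no test tokens, to avoid dividing by zero.
    return {k: (correct[k] / total[k] if total[k] else None) for k in total}


# Toy (historical, modern) pairs, invented for illustration only.
train = [("vnto", "unto"), ("vnto", "unto"), ("loue", "love"), ("the", "the")]
test = [("vnto", "unto"), ("loue", "love"), ("haue", "have"), ("and", "and")]

lexicon = train_memorization_baseline(train)
scores = accuracy_by_seen_status(lexicon, test)
print(scores)
```

Reporting the two figures separately, rather than a single overall accuracy, is one way to see whether a system generalizes beyond the words it has memorized.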
Original language | English |
---|---|
Title of host publication | 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
Place of Publication | New Orleans, Louisiana |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 720-725 |
Number of pages | 6 |
Publication status | Published - 6 Jun 2018 |
Event | 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Hyatt Regency New Orleans Hotel, New Orleans, United States. Duration: 1 Jun 2018 → 6 Jun 2018. http://naacl2018.org/ |
Conference
Conference | 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
---|---|
Abbreviated title | NAACL HLT 2018 |
Country/Territory | United States |
City | New Orleans |
Period | 1/06/18 → 6/06/18 |