Abstract / Description of output
This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth-century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data.
Original language | English |
---|---|
Title of host publication | Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora |
Place of Publication | Marseille, France |
Publisher | European Language Resources Association (ELRA) |
Pages | 24–30 |
Number of pages | 7 |
ISBN (Electronic) | 979-10-95546-61-0 |
Publication status | Published - 16 May 2020 |
Event | 8th Workshop on the Challenges in the Management of Large Corpora - Duration: 16 May 2020 → 16 May 2020 http://corpora.ids-mannheim.de/cmlc-2020.html |
Workshop
Workshop | 8th Workshop on the Challenges in the Management of Large Corpora |
---|---|
Abbreviated title | CMLC-8 |
Period | 16/05/20 → 16/05/20 |
Internet address | http://corpora.ids-mannheim.de/cmlc-2020.html |
Keywords / Materials (for Non-textual outputs)
- text mining
- geoparsing
- historical text
- Gazetteers of Scotland
- distributed queries
- Apache Spark
- digital tools