Geoparsing the Historical Gazetteers of Scotland: Accurately Computing Location in Mass Digitised Texts

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth-century Scottish geographical dictionaries and describe preliminary results obtained when processing this data.
Original language: English
Title of host publication: Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Place of Publication: Marseille, France
Publisher: European Language Resources Association (ELRA)
Pages: 24–30
Number of pages: 7
ISBN (Electronic): 979-10-95546-61-0
Publication status: Published - 16 May 2020
Event: 8th Workshop on the Challenges in the Management of Large Corpora
Duration: 16 May 2020 → 16 May 2020
http://corpora.ids-mannheim.de/cmlc-2020.html

Workshop

Workshop: 8th Workshop on the Challenges in the Management of Large Corpora
Abbreviated title: CMLC-8
Period: 16/05/20 → 16/05/20

Keywords

  • text mining
  • geoparsing
  • historical text
  • Gazetteers of Scotland
  • distributed queries
  • Apache Spark
  • digital tools
