Transformer based named entity recognition for place name extraction from unstructured text

Cillian Berragan*, Alex Singleton, Alessia Calafiore, Jeremy Morley

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Place names embedded in online natural language text present a useful source of geographic information. Despite this, many methods for the extraction of place names from text use pre-trained models that were not explicitly designed for this task. Our paper builds five custom Named Entity Recognition (NER) models and evaluates them against three popular pre-built models for place name extraction. The models are evaluated on a set of manually annotated Wikipedia articles using the F1 score metric. Our best performing model achieves an F1 score of 0.939, compared with 0.730 for the best performing pre-built model. Our model is then used to extract all place names from Wikipedia articles in Great Britain, demonstrating the ability to more accurately capture unknown place names from volunteered sources of online geographic information.
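As an illustration of the evaluation metric mentioned in the abstract, the following is a minimal sketch of an entity-level F1 calculation. It assumes exact-match scoring over (start, end, label) spans; the abstract does not say whether the paper scores at the token or the entity level, and the function name `f1_score`, the `PLACE` label, and the example spans are hypothetical, not taken from the paper.

```python
def f1_score(gold, predicted):
    """Entity-level F1 from two collections of (start, end, label) spans.

    Assumption: a prediction counts as correct only if its span and
    label exactly match a gold annotation.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: three gold place names, three predictions,
# two of which match a gold span exactly.
gold = {(0, 6, "PLACE"), (24, 33, "PLACE"), (40, 47, "PLACE")}
pred = {(0, 6, "PLACE"), (24, 33, "PLACE"), (50, 55, "PLACE")}
score = f1_score(gold, pred)
```

Here precision and recall are both 2/3, so the F1 score is also 2/3. Scores such as the 0.939 and 0.730 reported above would come from applying the same kind of calculation across the whole annotated test set.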

Original language: English
Pages (from-to): 1-20
Number of pages: 20
Journal: International Journal of Geographical Information Science
Early online date: 17 Oct 2022
Publication status: E-pub ahead of print - 17 Oct 2022

Keywords

  • named entity recognition
  • natural language processing
  • place name extraction
  • volunteered geographic information
