Mapping Great Britain's semantic footprints through a large language model analysis of Reddit comments

Cillian Berragan*, Alex Singleton, Alessia Calafiore, Jeremy Morley

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Observed regional variation in geotagged social media text is often attributed to dialects, where features in language are assumed to exhibit region-specific properties. While dialects are seen as a key component in defining the identity of regions, there are a multitude of other geographic properties that may be captured within natural language text. In our work, we consider locational mentions that are directly embedded within comments on the social media website Reddit, providing a range of associated semantic information, and enabling deeper representations between locations to be captured. Using a large corpus of geoparsed Reddit comments from UK-related local discussion subreddits, we first extract embedded semantic information using a large language model, aggregated into local authority districts, representing the semantic footprint of these regions. These footprints broadly exhibit spatial autocorrelation, with clusters that conform with the national borders of Wales and Scotland. London, Wales, and Scotland also demonstrate notably different semantic footprints compared with the rest of Great Britain.
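The aggregation and spatial autocorrelation steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how comment-level representations are pooled or which autocorrelation statistic is used, so this example uses simple mean pooling and global Moran's I, a common choice. The district names, vectors, and adjacency weights are toy values, and the function names `mean_pool` and `morans_i` are hypothetical.

```python
from collections import defaultdict


def mean_pool(comment_vectors):
    """Average comment-level vectors into one footprint per district."""
    grouped = defaultdict(list)
    for district, vector in comment_vectors:
        grouped[district].append(vector)
    return {
        district: [sum(col) / len(vectors) for col in zip(*vectors)]
        for district, vectors in grouped.items()
    }


def morans_i(values, weights):
    """Global Moran's I for one value per region.

    values  -- dict mapping region -> number
    weights -- dict mapping (region_i, region_j) -> spatial weight
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {region: v - mean for region, v in values.items()}
    w_sum = sum(weights.values())
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    den = sum(d * d for d in dev.values())
    return (n / w_sum) * (num / den)


# Toy input (assumed): (district, embedding) pairs standing in for
# LLM-derived vectors of geoparsed comments.
comments = [
    ("A", [1.0, 0.0]), ("A", [0.8, 0.2]),
    ("B", [0.7, 0.3]),
    ("C", [0.2, 0.8]), ("C", [0.4, 0.6]),
    ("D", [0.1, 0.9]),
]
footprints = mean_pool(comments)

# Toy adjacency (assumed): four districts in a row, A-B-C-D,
# with symmetric binary contiguity weights.
neighbours = [("A", "B"), ("B", "C"), ("C", "D")]
weights = {}
for i, j in neighbours:
    weights[(i, j)] = 1.0
    weights[(j, i)] = 1.0

# Moran's I is defined on scalars, so test the first embedding dimension.
first_dim = {district: vec[0] for district, vec in footprints.items()}
score = morans_i(first_dim, weights)
print(f"Moran's I on dimension 0: {score:.3f}")
```

A positive score indicates that neighbouring districts have similar values, which is the sense in which footprints "exhibit spatial autocorrelation". A real analysis would run over full embedding vectors and actual local authority district boundaries rather than this one-dimensional toy case.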
Original language: English
Article number: 102121
Pages (from-to): 1-12
Number of pages: 12
Journal: Computers, Environment and Urban Systems
Volume: 110
Early online date: 26 Apr 2024
Publication status: Published - Jun 2024

Keywords / Materials (for Non-textual outputs)

  • Natural Language Processing
  • semantics
  • social media
  • vernacular geography
