Abstract / Description of output

The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited, which undermines the development of robust policy interventions to improve local health and well-being. News media provide social and community information that may be useful in health studies. Here we propose a methodology for characterising neighbourhoods using local news articles. More specifically, we show how Natural Language Processing (NLP) can unlock further information about neighbourhoods by analysing, geoparsing and clustering news articles. Our work is novel because it combines street-level geoparsing tailored to the locality with clustering of full news articles, enabling a more detailed examination of neighbourhood characteristics. We evaluate our outputs and show, through converging qualitative and quantitative evidence, that the themes we extract from news articles are sensible and reflect many characteristics of the real world. This is significant because it allows us to better understand the effects of neighbourhoods on health. Our findings on neighbourhood characterisation using news data will support a new generation of place-based research that examines a wider set of spatial processes and how they affect health, opening up new avenues for epidemiological research.
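As a rough illustration of what street-level geoparsing against a local gazetteer involves, the Python sketch below matches place names from a small, hypothetical list of Edinburgh street names in article text and indexes articles by the places they mention. It is a toy under stated assumptions, not the authors' pipeline: the gazetteer contents, the function names `build_pattern`, `geoparse` and `group_by_place`, and the use of plain regular-expression matching are all illustrative choices. A real geoparser must also disambiguate names (the same street name can occur in several towns) and resolve them to coordinates before any clustering step.

```python
import re
from collections import defaultdict


def build_pattern(gazetteer):
    """Compile one regex that matches any gazetteer name as a whole phrase.

    Hypothetical helper. Names are sorted longest-first so that, at a given
    position, "Leith Walk" is tried before the shorter overlapping "Leith"
    (Python's re tries alternatives left to right).
    """
    names = sorted(gazetteer, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")


def geoparse(text, pattern):
    """Return (place_name, start, end) for each gazetteer match in text."""
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(text)]


def group_by_place(articles, pattern):
    """Map each matched place name to the set of article ids mentioning it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for name, _, _ in geoparse(text, pattern):
            index[name].add(article_id)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical toy gazetteer and articles, for illustration only.
    pattern = build_pattern(["Leith", "Leith Walk", "Princes Street"])
    articles = {
        "a1": "New cafe opens on Leith Walk",
        "a2": "Leith harbour plans approved",
        "a3": "Parade closes Princes Street",
    }
    print(geoparse("Roadworks on Leith Walk and Princes Street.", pattern))
    print(group_by_place(articles, pattern))
```

In a fuller pipeline, the place-to-article index would be one input to clustering, for example by grouping article texts linked to nearby streets before extracting themes; that step is omitted here.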

Original language: English
Article number: 103910
Pages (from-to): 1-23
Number of pages: 23
Journal: Information Processing and Management
Volume: 62
Issue number: 1
Early online date: 10 Oct 2024
Publication status: E-pub ahead of print (10 Oct 2024)

Keywords

  • clustering
  • Edinburgh
  • geoparsing
  • natural language processing
  • neighbourhood characteristics

Title: Perceptions of Edinburgh: Capturing neighbourhood characteristics by clustering geoparsed local news
