A comparison of rule-based and supervised machine learning approaches for record linkage of Italian historical data

Saverio Minardi* (Lead Author), Suzie Greco, Nicola Barban

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Parish and civil records are crucial sources for reconstructing historical socio-demographic processes. However, their analysis presents significant challenges, particularly the need to digitize data and link life events across documents that lack formal identifiers. With the growing availability of digitized records, the development and evaluation of automated linkage techniques have become increasingly important. This study compares rule-based and supervised machine learning approaches for linking birth and death records derived from crowdsourced transcriptions of Italian parish and civil registers. Using a set of hand-linked data as a benchmark, we assess the performance of both approaches in terms of precision and recall, under standard conditions and in scenarios where key disambiguating information is missing. Our findings suggest that the machine learning approach outperforms the rule-based method both under standard conditions and when information is incomplete, making it the preferred option when training data are available. Nonetheless, the rule-based method can still achieve high precision when configured with sufficiently strict matching thresholds. While the focus of this exercise is on linking birth and death records, the procedures can be adapted to a wide range of historical reconstruction projects based on names and dates.
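
The precision and recall evaluation against a hand-linked benchmark can be illustrated with a minimal sketch. It assumes that each link is a (birth record ID, death record ID) pair; the identifiers, the function name `precision_recall`, and the toy data are hypothetical and are not taken from the study.

```python
def precision_recall(predicted_links, true_links):
    """Compare predicted links with a hand-linked benchmark.

    Both arguments are iterables of (birth_id, death_id) pairs.
    This pair representation is an assumption made for illustration.
    """
    predicted = set(predicted_links)
    truth = set(true_links)
    true_positives = len(predicted & truth)
    # Precision: share of predicted links that appear in the benchmark.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of benchmark links that the method recovered.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four hand-linked pairs, three predicted pairs.
hand_linked = {("b1", "d1"), ("b2", "d2"), ("b3", "d3"), ("b4", "d4")}
predicted = {("b1", "d1"), ("b2", "d2"), ("b3", "d9")}

precision, recall = precision_recall(predicted, hand_linked)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case two of the three predicted links are correct and two of the four benchmark links are found. A stricter matching threshold would typically remove doubtful predictions such as the third pair, raising precision at the possible cost of recall.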
Original language: English
Pages (from-to): 28-46
Journal: Historical Life Course Studies
Volume: 15
Publication status: Published - 3 Jun 2025

Keywords / Materials (for Non-textual outputs)

  • record linkage
  • parish records
  • historical demography
