Evaluating LLMs' Potential to Identify Rare Patient Identifiers in Patient Health Records

Matúš Falis, Franz Gruber, Samuel McInerney, Arlene Casey

Research output: Contribution to journal › Article › peer-review

Abstract

This study explores the utility of Large Language Models (LLMs) to support finding rare patient record details that could make a patient identifiable. Whilst most research has focused on what we call direct patient identifiers, indirect patient identifiers are not widely addressed. Our evaluation of patient records with mentions of indirect risks predicted by our LLM shows the potential to find these risks automatically. However, many risks highlighted were false positives or did not constitute identifiable risk. More work is needed to understand how we can harness the potential of LLMs as part of our de-identification pipelines for patient health records. Better de-identification of health records is important for safely improving data access and advancing research without compromising confidentiality.

Original language: English
Pages (from-to): 874-875
Number of pages: 2
Journal: Studies in health technology and informatics
Volume: 327
DOIs
Publication status: Published - 15 May 2025

Keywords / Materials (for Non-textual outputs)

  • Electronic Health Records/organization & administration
  • Humans
  • Natural Language Processing
  • Confidentiality
  • Patient Identification Systems/methods
  • Data Anonymization
  • Programming Languages
