Abstract
This study explores how Large Language Models (LLMs) can help find rare details in patient records that could make a patient identifiable. Whilst most research has focused on what we call direct patient identifiers, indirect patient identifiers have not been widely addressed. Our evaluation of patient records containing mentions of indirect risks predicted by our LLM shows the potential to find these risks automatically. However, many of the highlighted risks were false positives or did not constitute a risk of identification. More work is needed to understand how we can harness the potential of LLMs as part of our de-identification pipelines for patient health records. Better de-identification of health records is important for safely improving data access and advancing research without compromising confidentiality.
| Original language | English |
|---|---|
| Pages (from-to) | 874-875 |
| Number of pages | 2 |
| Journal | Studies in health technology and informatics |
| Volume | 327 |
| Publication status | Published - 15 May 2025 |
Keywords / Materials (for Non-textual outputs)
- Electronic Health Records/organization & administration
- Humans
- Natural Language Processing
- Confidentiality
- Patient Identification Systems/methods
- Data Anonymization
- Programming Languages
Fingerprint
Dive into the research topics of 'Evaluating LLMs' Potential to Identify Rare Patient Identifiers in Patient Health Records'. Together they form a unique fingerprint.