Predicting incident dementia in community-dwelling older adults using primary and secondary care data from electronic health records

Konstantin Georgiev*, Yiqing Wang, Andrew Conkie, Annie Sinclair, Vyron Christodoulou, Saleh Seyedzadeh, Malcolm Price, Ann Wales, Nicholas L Mills, Susan Deborah Shenkin, Joanne McPeake, Jacques D Fleuriot, Atul Anand

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Predicting risk of future dementia is essential for primary prevention strategies, particularly in the era of novel immunotherapies. However, few studies have developed population-level prediction models using existing routine healthcare data. In this longitudinal retrospective cohort study, we predicted incident dementia using primary and secondary care health records at 5, 10 and 13 years in 144 113 Scottish older adults who were dementia-free prior to 1 April 2009. Gradient-boosting (XGBoost) prediction models were trained on two feature subsets: data-driven (using all 171 extracted variables) and clinically supervised (22 curated variables). We used a random-stratified internal validation set to rank top predictors in each model, assessing performance stratified by age and socioeconomic deprivation. Predictions were stratified into 10 equally sized risk deciles and ranked by response rate. Over 13 years of follow-up, 11 143 (8%) patients developed dementia. The data-driven models achieved marginally better precision-recall area-under-the-curve scores of 0.18, 0.26 and 0.30 compared to clinically supervised models with scores of 0.17, 0.27 and 0.29 for incident dementia at 5, 10 and 13 years, respectively. The clinically supervised model achieved comparable specificity [0.88, 95% confidence interval (CI) 0.87–0.88] and sensitivity (0.55, 95% CI 0.53–0.57) to the data-driven model for prediction at 13 years. The most important model features were age, deprivation and frailty, measured by a modified electronic frailty index excluding known cognitive deficits. Model precision was consistent across socioeconomic deprivation quintiles but lower in younger-onset (<70 years) dementia cases. At 13 years, dementia was diagnosed in 32% of the population classified as highest risk, with 40% of individuals in this group below the age of 80.
Personalized estimates of future dementia risk from routinely collected healthcare data could influence risk factor modification and help to target brain imaging and novel immunotherapies in selected individuals with pre-symptomatic disease.
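
The decile ranking described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `risk_decile_table`, the tie handling and the synthetic data are all assumptions made for illustration, and "response rate" is taken here to mean the observed proportion of cases within each risk group.

```python
import random


def risk_decile_table(scores, outcomes, n_bins=10):
    """Rank individuals by predicted risk, split them into n_bins groups of
    near-equal size, and return the observed case rate in each group.

    Group 1 holds the highest predicted risks. Hypothetical helper, not
    taken from the paper.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    # Sort from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    table = []
    for b in range(n_bins):
        # Integer boundaries keep group sizes within one of each other.
        group = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        cases = sum(y for _, y in group)
        rate = cases / len(group) if group else float("nan")
        table.append({"decile": b + 1, "n": len(group),
                      "cases": cases, "response_rate": rate})
    return table


# Synthetic example only: outcomes are drawn so that higher scores carry
# higher risk. These numbers have no relation to the study data.
rng = random.Random(0)
scores = [rng.random() for _ in range(1000)]
outcomes = [1 if rng.random() < 0.3 * s else 0 for s in scores]

table = risk_decile_table(scores, outcomes)
for row in table:
    print(row["decile"], row["n"], row["cases"], round(row["response_rate"], 3))
```

Under this reading, the 32% figure reported at 13 years corresponds to the `response_rate` of the top group.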
Original language: English
Article number: fcae469
Number of pages: 14
Journal: Brain Communications
Volume: 7
Issue number: 1
Publication status: Published - 24 Dec 2024

Keywords / Materials (for Non-textual outputs)

  • primary prevention
  • health services
  • machine learning
  • geriatric care
  • risk identification
