Abstract
Background and aims: The Estonian Biobank (EstBB) comprises more than 200,000 volunteer Estonian adults with extensive phenotype and genotype data linked to electronic health records (EHRs). These linkages enable stroke case identification using ICD-10-coded health data. We evaluated the accuracy of coded data in identifying ischaemic stroke and its subtypes.
Methods: We identified all 1283 participants in the EstBB dataset with ≥1 ischaemic stroke code (I63) as the primary diagnosis in their EHRs. We developed a segmentation algorithm that groups individual stroke codes into distinct stroke episodes, and randomly selected 300 episodes, including recurrences, for validation. We mapped ischaemic stroke subtype codes to TOAST subtypes: I63.4 to cardioembolism (CE), I63.5 to small vessel disease (SVD), and I63.3 and I63.0 to large-artery atherosclerosis (LVD). Two neurologists reviewed the EHRs for each episode and established reference diagnoses for 277 episodes; 23 episodes were excluded because the medical records for that specific episode were incomplete. We calculated the positive predictive value (PPV) for all ischaemic stroke codes and for each subtype.
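The episode segmentation and subtype mapping described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not state how codes are grouped into episodes, so the 28-day gap rule (`gap_days`), the `CodeRecord` type and the function names are hypothetical. Only the code-to-TOAST mapping is taken from the Methods.

```python
from dataclasses import dataclass
from datetime import date

# Code-to-TOAST mapping as stated in the Methods; other I63 codes stay unmapped.
TOAST_MAP = {
    "I63.4": "CE",   # cardioembolism
    "I63.5": "SVD",  # small vessel disease
    "I63.3": "LVD",  # large-artery atherosclerosis
    "I63.0": "LVD",  # large-artery atherosclerosis
}


@dataclass(frozen=True)
class CodeRecord:
    """One I63 primary-diagnosis code in a participant's EHR (hypothetical type)."""
    participant_id: str
    code: str
    recorded_on: date


def segment_episodes(records: list[CodeRecord], gap_days: int = 28) -> list[list[CodeRecord]]:
    """Group each participant's codes into episodes.

    HYPOTHETICAL RULE: a new episode starts when more than `gap_days` days separate
    a code from the previous code of the same participant. The study's actual rule
    is not given in the abstract.
    """
    by_participant: dict[str, list[CodeRecord]] = {}
    for rec in records:
        by_participant.setdefault(rec.participant_id, []).append(rec)

    episodes: list[list[CodeRecord]] = []
    for pid in sorted(by_participant):
        current: list[CodeRecord] = []
        for rec in sorted(by_participant[pid], key=lambda r: r.recorded_on):
            # Close the current episode if the gap to the previous code is too long.
            if current and (rec.recorded_on - current[-1].recorded_on).days > gap_days:
                episodes.append(current)
                current = []
            current.append(rec)
        if current:
            episodes.append(current)
    return episodes


def toast_subtypes(episode: list[CodeRecord]) -> set[str]:
    """Return the TOAST subtypes implied by the subtype codes in one episode."""
    return {TOAST_MAP[rec.code] for rec in episode if rec.code in TOAST_MAP}


# Illustrative (invented) records for two participants.
records = [
    CodeRecord("A", "I63.4", date(2020, 1, 1)),
    CodeRecord("A", "I63.9", date(2020, 1, 10)),  # 9-day gap: same episode
    CodeRecord("A", "I63.5", date(2021, 3, 1)),   # long gap: recurrence
    CodeRecord("B", "I63.0", date(2019, 5, 5)),
]
for ep in segment_episodes(records):
    print(ep[0].participant_id, len(ep), toast_subtypes(ep))
```

A fixed time gap is only one plausible way to separate a recurrence from repeated coding of the same event; a real implementation might also use admission and discharge dates.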
Results: The PPV was 87% (95% CI: 82%–91%) for any stroke and 79% (95% CI: 74%–84%) for ischaemic stroke. Codes mapped to TOAST subtypes had low PPVs, ranging from 7% (LVD) to 62% (CE), with wide confidence intervals. However, a neurologist confirmed a cardioembolic source in 84% of cases with a CE subtype code.
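A PPV is the share of coded episodes confirmed by the reference diagnosis, and its 95% CI can be sketched as below. The abstract does not say which interval method was used, so the Wilson score interval here is an assumption, and `ppv_wilson` and the example counts are hypothetical rather than the study's data.

```python
import math


def ppv_wilson(confirmed: int, coded: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (PPV, lower, upper) using the Wilson score interval (assumed method).

    confirmed: coded episodes confirmed by the neurologists' reference diagnosis
    coded:     all validated episodes carrying the code
    """
    if coded <= 0:
        raise ValueError("coded must be positive")
    if not 0 <= confirmed <= coded:
        raise ValueError("confirmed must lie between 0 and coded")
    p = confirmed / coded
    denom = 1 + z**2 / coded
    centre = (p + z**2 / (2 * coded)) / denom
    half = z * math.sqrt(p * (1 - p) / coded + z**2 / (4 * coded**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 45 of 50 coded episodes confirmed.
ppv, lower, upper = ppv_wilson(45, 50)
print(f"PPV {ppv:.0%} (95% CI: {lower:.0%}-{upper:.0%})")
```

The Wilson interval is used in this sketch because it stays within [0, 1] and behaves better than the simple Wald interval when counts are small, as they are for the less common subtype codes.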
Discussion: Linked coded data identify ischaemic stroke cases in EstBB with accuracy sufficient for many research studies. Among the TOAST subtypes, only a cardioembolic source is coded accurately enough.
| Original language | English |
|---|---|
| Title of host publication | XXIII Nordic Stroke Society Congress 15.-17.09.2025 Tartu, Estonia Abstract Book |
| Pages | 20 |
| Number of pages | 1 |
| Publication status | Published - 16 Sept 2025 |
| Event | Nordic Stroke Society Congress, Tartu, Estonia. Duration: 16 Sept 2025 → 17 Sept 2025. Conference number: XXIII. https://nordicstroke2025.publicon.ee/userfiles/nordic_stroke/NordicStroke_abstract_book_2025.pdf |
Conference
| Conference | Nordic Stroke Society Congress |
|---|---|
| Country/Territory | Estonia |
| City | Tartu |
| Period | 16/09/25 → 17/09/25 |
| Internet address | https://nordicstroke2025.publicon.ee/userfiles/nordic_stroke/NordicStroke_abstract_book_2025.pdf |