Edinburgh Research Explorer

A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

Research output: Contribution to journal, Article

Open Access permissions

Open

Documents

    Rights statement: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

    Final published version, 1 MB, PDF-document

    Licence: Creative Commons: Attribution (CC-BY)

Original language: English
Journal: BMC Medical Informatics and Decision Making
Early online date: 9 Sep 2019
Publication status: E-pub ahead of print - 9 Sep 2019

Abstract

BACKGROUND: Manual coding of phenotypes in brain radiology reports is time-consuming. We developed a natural language processing (NLP) algorithm to automatically identify brain imaging phenotypes in radiology reports produced in routine clinical practice in the UK National Health Service (NHS).
METHODS: We used anonymized text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and further evaluated it in the reports from the regional hospital.
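To illustrate what "rule-based" can mean in this setting, the following is a toy sketch of keyword matching with simple negation handling. It is not the authors' algorithm: the phenotype labels, regular expressions and negation cues are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword patterns; the published algorithm's rules are not given here.
PHENOTYPE_PATTERNS = {
    "ischaemic_stroke": re.compile(r"\b(infarct\w*|ischaemic stroke)\b", re.IGNORECASE),
    "haemorrhagic_stroke": re.compile(
        r"\b(intracerebral|parenchymal) haemorrhage\b", re.IGNORECASE
    ),
    "atrophy": re.compile(r"\batrophy\b", re.IGNORECASE),
}

# Hypothetical negation cues; a cue counts only if it precedes the match
# within the same sentence.
NEGATION = re.compile(r"\b(no|not|without)\b", re.IGNORECASE)


def tag_report(text):
    """Return the set of phenotype labels asserted (not negated) in a report."""
    found = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for label, pattern in PHENOTYPE_PATTERNS.items():
            match = pattern.search(sentence)  # first mention in the sentence only
            if match and not NEGATION.search(sentence[: match.start()]):
                found.add(label)
    return found


example = (
    "There is an old infarct in the left basal ganglia. "
    "No intracerebral haemorrhage. Mild generalised atrophy."
)
labels = tag_report(example)
print(sorted(labels))
```

A real system would need far richer rules (anatomical location, time course such as "old" versus "acute", uncertainty phrases), which is why expert-annotated reports are needed to develop and test it.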
RESULTS: Agreement between expert readers was excellent (Cohen's κ = 0.93) in both datasets. In the final test dataset of unseen regional hospital reports (n = 700), the algorithm performed very well for a report of any ischaemic stroke [sensitivity 89% (95% CI: 81-94); positive predictive value (PPV) 85% (95% CI: 76-90); specificity 100% (95% CI: 99-100)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80-99); PPV 72% (95% CI: 55-84); specificity 100% (95% CI: 99-100)]; brain tumours [sensitivity 96% (95% CI: 87-99); PPV 84% (95% CI: 73-91); specificity 100% (95% CI: 99-100)]; and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, the commonest findings were atrophy (n = 28,757, 26%), small vessel disease (n = 15,015, 14%) and old, deep ischaemic strokes (n = 10,636, 10%).
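For readers less familiar with the statistics reported above, the sketch below shows how sensitivity, PPV and specificity with confidence intervals, and Cohen's κ, can be computed. The abstract does not state which interval method was used; the Wilson score interval here is an assumption, and all counts and labels in the example are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates and Wilson intervals from a 2x2 confusion matrix."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: algorithm output versus expert reference standard.
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=590)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.1%} (95% CI: {low:.1%} to {high:.1%})")

# Hypothetical labels from two readers (1 = phenotype present, 0 = absent).
kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
print(f"kappa = {kappa:.2f}")
```

Note that PPV depends on how common the phenotype is in the test set, so a rare phenotype can show high sensitivity and specificity yet a lower PPV, as with haemorrhagic stroke above.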
CONCLUSIONS: An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.

ID: 110377904