Edinburgh Research Explorer

Automated data capture from free-text radiology reports to enhance accuracy of hospital inpatient stroke codes

Research output: Contribution to journal › Article

  • Robert W. V. Flynn
  • Thomas M. Macdonald
  • Nicola Schembri
  • Gordon D. Murray
  • Alexander S. F. Doney


Original language: English
Pages (from-to): 843-847
Number of pages: 5
Journal: Pharmacoepidemiology and Drug Safety
Volume: 19
Issue number: 8
Publication status: Published - Aug 2010

Abstract

Purpose Much potentially useful clinical information for pharmacoepidemiological research is contained in unstructured free-text documents and is not readily available for analysis. Routine health data such as Scottish Morbidity Records (SMR01) frequently use generic 'stroke' codes that do not specify stroke type. Free-text Computerised Radiology Information System (CRIS) reports have the potential to provide this missing detail. We aimed to increase the number of stroke-type-specific diagnoses by augmenting SMR01 with data derived from CRIS reports and to assess the accuracy of this methodology.

Methods SMR01 codes describing first-ever-stroke admissions in Tayside, Scotland from 1994 to 2005 were linked to CRIS CT-brain scan reports occurring within 14 days of admission. Software was developed to parse the text and extract details of stroke type using keyword matching. An algorithm was iteratively developed to differentiate intracerebral haemorrhage (ICH) from ischaemic stroke (IS) against a training set of reports with pathophysiologically precise SMR01 codes. This algorithm was then applied to CRIS reports associated with generic SMR01 codes. To establish the accuracy of the algorithm, a sample of 150 ICH and 150 IS reports was independently classified by a stroke physician.
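The abstract does not publish the keyword lists or matching rules, so the following is only a minimal sketch of what keyword matching over a free-text report might look like. The term lists, the negation cues, and the names `ICH_TERMS`, `IS_TERMS`, `has_positive_mention` and `classify_report` are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical keyword stems; the study's actual terms are not given in the abstract.
ICH_TERMS = ("haemorrhage", "hemorrhage", "haematoma", "hematoma")
IS_TERMS = ("infarct", "ischaemi", "ischemi")

# Crude negation cues (an assumption): a term is ignored when one of these
# words appears anywhere earlier in the same sentence.
NEGATION = re.compile(r"\b(no|not|without)\b")


def has_positive_mention(sentence, terms):
    """Return True if any term occurs in the sentence with no earlier negation cue."""
    for term in terms:
        for match in re.finditer(re.escape(term), sentence):
            if not NEGATION.search(sentence[:match.start()]):
                return True
    return False


def classify_report(text):
    """Label a report 'ICH', 'IS', or 'unclassified' from keyword evidence alone."""
    # Split into rough sentences so a negation only affects its own sentence.
    sentences = re.split(r"[.;\n]+", text.lower())
    ich = any(has_positive_mention(s, ICH_TERMS) for s in sentences)
    ischaemic = any(has_positive_mention(s, IS_TERMS) for s in sentences)
    if ich and not ischaemic:
        return "ICH"
    if ischaemic and not ich:
        return "IS"
    # Either no evidence, or conflicting evidence for both types.
    return "unclassified"


print(classify_report("Acute left MCA territory infarct. No evidence of haemorrhage."))
print(classify_report("Large right basal ganglia haematoma with mass effect."))
print(classify_report("Normal study."))
```

In this sketch, reports with conflicting or absent evidence are deliberately left unclassified rather than forced into a category. An iteratively developed algorithm of the kind described would refine such rules against the training set.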

Results There were 8419 SMR01-coded first-ever strokes. The proportion of patients with pathophysiologically clear diagnoses doubled from 2745 (32.6%) to 5614 (66.7%). The positive predictive value was 94.7% (95% CI 89.8-97.3) for IS and 76.7% (95% CI 69.3-82.7) for haemorrhagic stroke.
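As a worked illustration of how a positive predictive value and its confidence interval can be computed from a validation sample, the sketch below uses the Wilson score interval. Two assumptions are made here that the abstract does not state: the counts of correctly classified reports (142 of 150 for IS and 115 of 150 for haemorrhagic stroke) are inferred from the reported percentages and sample sizes, and the Wilson method is chosen because it reproduces the reported intervals under those counts. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts, inferred from the reported PPVs and the sample size of 150.
for label, correct in (("IS", 142), ("haemorrhagic stroke", 115)):
    low, high = wilson_interval(correct, 150)
    ppv = 100 * correct / 150
    print(f"{label}: PPV {ppv:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f})")
```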

Conclusions A free-text processing approach was acceptably accurate at identifying IS, but not ICH. This approach could be adapted to other studies where radiology reports may be informative.

Research areas

  • stroke, cerebral haemorrhage, brain infarction, natural language processing, radiology information systems, medical records

ID: 2748379