Abstract

Using natural language processing, it is possible to extract structured information from raw text in the Electronic Health Record (EHR) with reasonably high accuracy. However, accurately distinguishing between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or are only hypothesised, meaning a disease can be mentioned in a report without being reported as present. This makes tasks such as document classification and summarisation more difficult.

We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports) that was developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans.

These models are compared with two existing rule-based models: pyConText [Harkema et al., 2009], a Python implementation of a generalisation of NegEx, and NegBio [Peng et al., 2017], which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods.
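To make the idea of trigger-based negation detection concrete, the following is a minimal sketch in the spirit of NegEx-style systems. It is not the implementation of EdIE-R-Neg, pyConText or NegBio: the trigger list, the scope terminators, the five-token window and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical lexicons for illustration only; real NegEx-style systems
# use much larger, curated trigger lists.
NEGATION_TRIGGERS = ["no evidence of", "no", "without", "not"]
SCOPE_TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed maximum number of tokens a trigger can govern


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, term):
    """Return True if `term` lies inside the forward scope of a trigger."""
    tokens = tokenize(sentence)
    term_tokens = tokenize(term)
    n = len(term_tokens)
    # Every token position at which the term occurs in the sentence.
    positions = [
        i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_tokens
    ]
    for trigger in NEGATION_TRIGGERS:
        trigger_tokens = trigger.split()
        m = len(trigger_tokens)
        for i in range(len(tokens) - m + 1):
            if tokens[i:i + m] != trigger_tokens:
                continue
            start = i + m
            end = min(len(tokens), start + WINDOW)
            # A terminator such as "but" closes the negation scope early.
            for j in range(start, end):
                if tokens[j] in SCOPE_TERMINATORS:
                    end = j
                    break
            if any(start <= p and p + n <= end for p in positions):
                return True
    return False
```

For example, `is_negated("No evidence of acute haemorrhage.", "haemorrhage")` is true under these assumptions, while `is_negated("There is an old infarct but no haemorrhage.", "infarct")` is false because the trigger follows the term. A sketch like this ignores hypothesised mentions and backward-scoping triggers, which the systems compared here have to handle.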

EdIE-R-Neg achieved the highest F1 score, particularly on the test set of the Tayside dataset, from which no development data was used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size.
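For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard calculation from counts of true positives, false positives and false negatives. The function name and the counts in the usage note are illustrative assumptions, not figures from this study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Compute F1 from raw counts; returns 0.0 when it is undefined."""
    predicted = true_pos + false_pos  # mentions the system labelled as negated
    actual = true_pos + false_neg     # mentions annotated as negated
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts of 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.8, so the F1 score is also 0.8.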

The performance gap between the machine learning models and EdIE-R-Neg on the Tayside test set was reduced by adding Tayside development data to the ESS training set, demonstrating the adaptability of the neural network models.
Original language: English
Pages (from-to): 203-224
Number of pages: 22
Journal: Natural Language Engineering
Volume: 27
Issue number: 2
Early online date: 18 Nov 2020
Publication status: Published, 1 Mar 2021

Keywords

  • Machine Learning
  • Natural Language Processing for Biomedical Texts
  • Corpus annotation
  • information exploitation
  • Text Data Mining

Fingerprint Dive into the research topics of 'Comparison of Rule-based and Neural Network Models for Negation Detection in Radiology Reports'. Together they form a unique fingerprint.
