SemEHR: surfacing semantic data from clinical notes in electronic health records for tailored care, trial recruitment, and clinical research

Honghan Wu, Giulia Toti, Katherine I Morley, Zina Ibrahim, Amos Folarin, Ismail Kartoglu, Richard Jackson, Asha Agrawal, Clive Stringer, Darren Gale, Genevieve M Gorrell, Angus Roberts, Matthew Broadbent, Robert Stewart, Richard J B Dobson

Research output: Contribution to journal › Meeting abstract › peer-review

Abstract / Description of output

Deriving structured data from unstructured clinical notes in electronic health records (EHRs) requires natural language processing and clinical expertise, which is often costly and frequently a one-off investment. We implemented SemEHR, a semantic search system that reduces the expertise and effort required in this context. We aimed to use it to characterise and select patients for projects such as the UK Department of Health 100,000 Genomes Project.
Built upon the off-the-shelf toolkits Bio-YODIE and CogStack, SemEHR integrates heterogeneous EHR documents and identifies contextualised (negation, temporality, and experiencer) mentions of a wide range of biomedical concepts drawn from terminologies including SNOMED CT, ICD-10, LOINC, and Drug Ontology. Text mining and semantics techniques are incorporated to derive a longitudinal patient panorama, combining structured profiles and unstructured records, available through semantic search interfaces.
We deployed SemEHR in various UK hospital EHRs, including the South London and Maudsley NHS Foundation Trust, where 46 million concept mentions were identified from 18 million documents. In a liver disease study, SemEHR identified 94 of 100 manually annotated hepatitis C positive patients. In an HIV study, SemEHR identified 21 of 23 true positives in a 1000-patient cohort. At King's College Hospital, SemEHR is being used to recruit patients into the 100,000 Genomes Project, where ontological associations are integrated to match recruitment criteria and populate complex phenotype models. A preliminary evaluation suggests that the tool can validate previously submitted cases and is very fast in searching phenotypes.
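The two study figures above can be read as recall (sensitivity): the share of known positive patients that the system found. The sketch below only reproduces that arithmetic from the counts stated in the abstract. The function name `recall` is an illustrative assumption, and the abstract reports no false-positive counts, so precision cannot be derived from it.

```python
def recall(found: int, total_positive: int) -> float:
    """Fraction of known positive patients that were identified."""
    if total_positive <= 0:
        raise ValueError("total_positive must be greater than zero")
    if not 0 <= found <= total_positive:
        raise ValueError("found must lie between 0 and total_positive")
    return found / total_positive


# Counts taken directly from the abstract.
hep_c_recall = recall(94, 100)  # liver disease study
hiv_recall = recall(21, 23)     # HIV study

print(f"Hepatitis C recall: {hep_c_recall:.1%}")  # 94.0%
print(f"HIV recall: {hiv_recall:.1%}")            # 91.3%
```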
Using SemEHR, a query such as “find patients with a family history of hepatitis C”, which previously might have required the user to have natural language processing expertise, becomes a simple search: SemEHR retrieves a relevant patient cohort, populates patient-level summaries, and provides a link to each mention in the original source. Results and feedback from multiple studies have demonstrated its efficiency: in some cases, work that previously took weeks or months can be done within minutes.
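To illustrate how contextualised mentions could turn such a query into a simple filter, here is a minimal sketch. It is not SemEHR's actual data model or API: the `Mention` class, its field names, the context values ("patient"/"other", "recent"/"historical"), and the concept label `hepatitis_c` are all assumptions made for illustration, and the records are invented sample data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One concept mention with its context (hypothetical schema)."""
    patient_id: str
    doc_id: str
    concept: str      # placeholder label, not a real terminology code
    negated: bool     # True if the text denies the concept
    temporality: str  # assumed values: "recent" or "historical"
    experiencer: str  # assumed values: "patient" or "other"
    start: int        # character offset of the mention in the document
    end: int


def family_history_cohort(mentions, concept):
    """Group affirmed mentions experienced by someone other than the patient.

    Returns {patient_id: [(doc_id, start, end), ...]} so that each hit
    can be traced back to its position in the source document.
    """
    cohort = defaultdict(list)
    for m in mentions:
        if m.concept == concept and not m.negated and m.experiencer == "other":
            cohort[m.patient_id].append((m.doc_id, m.start, m.end))
    return dict(cohort)


# Invented sample data for demonstration only.
sample = [
    Mention("p1", "d1", "hepatitis_c", False, "historical", "other", 40, 51),
    Mention("p2", "d2", "hepatitis_c", False, "recent", "patient", 12, 23),
    Mention("p3", "d3", "hepatitis_c", True, "recent", "other", 5, 16),
]

result = family_history_cohort(sample, "hepatitis_c")
print(result)
```

Only `p1` qualifies here: `p2` has the condition personally, and the mention for `p3` is negated. Keeping the document identifier and offsets with each hit mirrors the abstract's point that every result links back to the mention in the original source.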
Original language: English
Pages (from-to): S97
Journal: The Lancet
Publication status: Published - 1 Nov 2017

