Edinburgh Research Explorer

UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER

Research output: Contribution to journal, Article

  • Spiros Denaxas
  • Arturo Gonzalez-Izquierdo
  • Kenan Direk
  • Natalie K Fitzpatrick
  • Ghazaleh Fatemifar
  • Amitava Banerjee
  • Richard J B Dobson
  • Laurence J Howe
  • Valerie Kuan
  • R Tom Lumbers
  • Laura Pasea
  • Riyaz S Patel
  • Anoop D Shah
  • Aroon D Hingorani
  • Harry Hemingway
  • Catherine Sudlow


Open Access permissions




    Rights statement: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

    Final published version, 613 KB, PDF document

    Licence: Creative Commons: Attribution (CC-BY)

Original language: English
Pages (from-to): 1545–1559
Journal: Journal of the American Medical Informatics Association : JAMIA
Issue number: 12
Early online date: 22 Jul 2019
Publication status: E-pub ahead of print - 22 Jul 2019


OBJECTIVE: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHRs in the United Kingdom, with applications for translational research.

MATERIALS AND METHODS: We implemented a rule-based phenotyping framework with up to 6 validation approaches. We applied the framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure), medication information, and coded diagnoses, symptoms, procedures, and referrals, recorded using 5 controlled clinical terminologies: (1) Read (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases, Ninth Revision and (3) Tenth Revision (secondary care diagnoses and cause of mortality), (4) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (5) DM+D prescription codes.
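As an illustration only, a rule-based phenotype over linked sources can be sketched as a set of code lists, one per terminology, applied to every coded event for a patient. The sketch below is not the CALIBER implementation: the record layout, the source and terminology labels, the function names, and the Read entry are all hypothetical. The only real code used is the ICD-10 category I21 (acute myocardial infarction).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEvent:
    """One coded record from any linked source (hypothetical layout)."""
    patient_id: int
    source: str       # assumed labels: "primary_care", "hospital", "death"
    terminology: str  # assumed labels: "read", "icd10", "opcs4"
    code: str
    event_date: date


# Hypothetical rule set: terminology -> tuple of code prefixes.
# "I21" is the ICD-10 category for acute myocardial infarction;
# the Read entry is a placeholder, not a real Read code.
PHENOTYPE_RULES = {
    "icd10": ("I21",),
    "read": ("READ_PLACEHOLDER",),
}


def matches(event, rules):
    """Return True if the event's code starts with a listed prefix."""
    prefixes = rules.get(event.terminology, ())
    return event.code.startswith(prefixes)


def earliest_phenotype_dates(events, rules):
    """Map each patient to the date and source of their first matching event."""
    first_seen = {}
    for event in events:
        if not matches(event, rules):
            continue
        current = first_seen.get(event.patient_id)
        if current is None or event.event_date < current.event_date:
            first_seen[event.patient_id] = event
    return {pid: (e.event_date, e.source) for pid, e in first_seen.items()}
```

Keeping one code list per terminology means the same phenotype definition can be evaluated against primary care, hospital, and death records without translating codes between systems, at the cost of curating each list separately.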

RESULTS: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provided up to 6 validation approaches for each. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications.

CONCLUSIONS: We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research.

ID: 103707083