Improved profiling and classification of the breast cancer transcriptome

Research output: Thesis › Doctoral Thesis

Abstract

Breast cancer, diagnosed in over 55,000 women annually in the UK, is increasingly recognised as a highly diverse disease. These differences in underlying biology require robust tumour characterisation to determine the likely prognosis and most appropriate treatment. High-throughput gene expression quantification has highlighted the complex molecular heterogeneity of breast cancer and has contributed to recent improvements in clinical patient stratification. Difficulties in the accuracy, sensitivity and reproducibility of molecular classifiers persist, and there remains a requirement for robustly derived, fully categorised and reproducible molecular subtypes of clinical significance. This thesis outlines the identification of reproducible, potentially clinically significant molecular subtypes. A robust, two-step unsupervised clustering and molecular profiling pipeline was developed and implemented on two of the largest gene expression datasets. METABRIC BeadArray Discovery (n=997) and Validation (n=989) data was optimally batch-corrected using a novel approach that preserves vital biological signal more effectively than standard techniques. TCGA RNA-Seq (n=989) data was pre-processed appropriately for unsupervised analysis, and gene features were consolidated with the batch-corrected METABRIC dataset. Each dataset was divided into sample cohorts representing the approximate major disease branches of ER+ and ER- tumours. A custom software pipeline was built in R to perform multi-algorithm consensus clustering in each sample cohort of both datasets. Five statistically robust clusters were identified in the ER+ cohorts and four in the ER- cohorts. These clusters were shown to be strongly underpinned by known cancer hallmark processes and were profiled as potentially clinically significant molecular subtypes.
The considered pre-processing steps required to optimise complex gene expression data for effective unsupervised analysis are clearly demonstrated. The molecular insight gained from this multi-step analysis, representative of underlying tumour biology, is extensively detailed. Furthermore, a prototype molecular classifier for assigning unseen tumours to the nine derived subtypes was developed and evaluated. These new molecular subtypes contribute to our understanding of breast cancer molecular heterogeneity and have the potential to assist with tailored therapeutic decisions. The work contained herein constitutes a significant contribution towards improved understanding of breast cancer, and contains relevant methodological insights to inform future work on transcriptomics in this and adjacent research areas.
Original language: English
Qualification: Ph.D.
Awarding Institution
  • University of Edinburgh
Supervisors/Advisors
  • Simpson, Ian, Supervisor
  • Langdon, Simon, Supervisor
Award date: 17 Jul 2024
Publication status: Published - 8 Jul 2024