SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

Magdalena Navarro, Ian Simpson

Research output: Other contribution

Abstract

Motivation: Autism spectrum disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one that is gaining popularity is the combination of gene expression and clinical genetic data. For ASD, the SFARI-gene database comprises lists of curated genes in which presumed causative mutations have been identified in patients. In order to predict novel candidate SFARI-genes we built classification models combining differential gene expression data for ASD patients and unaffected individuals with a gene's status in the SFARI-gene list. Results: SFARI-genes were not found to be significantly associated with differential gene expression patterns, nor were they enriched in gene co-expression network modules that had a strong correlation with ASD diagnosis. However, network analysis and machine learning models that incorporate information from the whole gene co-expression network were able to predict novel candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We found a statistically significant bias related to the absolute level of gene expression for existing SFARI genes and their scores. It is essential that this bias be taken into account when studies interpret ASD gene expression data at gene, module and whole-network levels. Availability: Source code is available from GitHub (https://doi.org/10.5281/zenodo.4463693) and the accompanying data from The University of Edinburgh DataStore (https://doi.org/10.7488/ds/2980)
Original languageEnglish
PublisherbioRxiv, at Cold Spring Harbor Laboratory
DOIs
Publication statusPublished - 1 Feb 2021

Fingerprint

Dive into the research topics of 'SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data'. Together they form a unique fingerprint.

Cite this