TY - CONF
T1 - Genomic prediction for complex traits following feature selection: results from Bayes C and genomic best linear unbiased prediction (G-BLUP)
AU - Bermingham, Mairead
AU - Pong-Wong, Ricardo
AU - Wilson, Jim
AU - Spiliopoulou, Athina
AU - Hayward, Caroline
AU - Rudan, Igor
AU - Campbell, Harry
AU - Wright, Alan
AU - Agakov, Felix
AU - Navarro, Pau
AU - Haley, Chris
PY - 2014/4/1
Y1 - 2014/4/1
N2 - Genome-wide association studies (GWAS) have identified thousands of SNPs associated with health-related traits, and thus provide a source of information about useful predictors for these traits. Best practices for implementing genomic prediction approaches with these high-dimensional GWAS data have yet to be determined. One important issue is feature selection (i.e. selection of SNPs exhibiting non-redundant information), which could reduce model complexity and computational requirements. In this study we investigated the effect of supervised feature selection on the performance of two widely used prediction methods: Bayes C and genomic best linear unbiased prediction (G-BLUP). We explored prediction of the complex traits height, high-density lipoprotein (HDL) and body mass index (BMI) within a population of 2,186 Croatian individuals and into a replication population of 810 UK individuals (ORCADES). Using all 263,357 markers, Bayes C and G-BLUP had similar prediction accuracy across all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the ORCADES data. Although Bayes C outperformed G-BLUP in the prediction of HDL (which is influenced by fewer quantitative trait loci than BMI and height) into the ORCADES data, it was more than 3,000 times slower computationally than G-BLUP. However, the application of supervised feature selection allowed G-BLUP to achieve predictive performance equivalent to that of Bayes C with greatly reduced computational effort. Feature selection in the G-BLUP framework therefore provides a flexible and more efficient alternative to the computationally expensive Bayes C for all traits considered in this study.
M3 - Paper
T2 - 42nd European Mathematical Genetics Meeting (EMGM)
Y2 - 1 April 2014 through 2 April 2014
ER -