Application of a genomic model for high-dimensional chemometric analysis

Xia Shen*, Ying Li, Lars Ronnegard, Peter Uden, Orjan Carlborg

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The rapid development of new technologies for large-scale analysis of genetic variation in the genomes of individuals and populations has presented statistical geneticists with a grand challenge: to develop efficient methods for identifying the small proportion of all identified genetic polymorphisms that have effects on traits of interest. To address such a "large p small n" problem, we have developed a heteroscedastic effects model (HEM) that has been shown to be powerful in high-throughput genetic analyses. Here, we describe how this whole-genome model can also be utilized in chemometric analysis. As a proof of concept, we use HEM to predict analyte concentrations in silage using Fourier transform infrared spectroscopy signals. The results show that HEM often outperforms the classic methods and, in addition, offers a substantial computational advantage in the analysis of such high-dimensional data. The results thus show the value of taking an interdisciplinary approach to chemometric analysis and indicate that large-scale genomic models can be a promising new approach for chemometric analysis that deserves further evaluation by experts in the field. The software used for our analyses is freely available as an R package at http://cran.r-project.org/web/packages/bigRR/. Copyright (C) 2014 John Wiley & Sons, Ltd.
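As a rough illustration of the generalized ridge regression idea named in the keywords, the sketch below solves a ridge problem in which each predictor has its own penalty. It is not the authors' HEM implementation and does not use the bigRR package: the function name `generalized_ridge`, the simulated data, and the penalty values are all assumptions chosen for illustration, and the way HEM actually derives predictor-specific shrinkage is not reproduced here.

```python
import numpy as np

def generalized_ridge(X, y, penalties):
    """Solve (X'X + diag(penalties)) b = X'y for b.

    `penalties` holds one non-negative shrinkage value per predictor;
    a constant vector gives ordinary ridge regression.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.diag(np.asarray(penalties, dtype=float))
    return np.linalg.solve(X.T @ X + D, X.T @ y)

# Simulated "large p, small n" data (purely illustrative).
rng = np.random.default_rng(0)
n, p = 20, 200
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:3] = [2.0, -1.5, 1.0]  # only three predictors carry signal
y = X @ true_b + 0.1 * rng.standard_normal(n)

# Ordinary ridge: the same penalty for every predictor.
common = generalized_ridge(X, y, np.full(p, 1.0))

# Heteroscedastic shrinkage with hypothetical, hand-picked penalties:
# light shrinkage on the first three predictors, heavy on the rest.
penalties = np.full(p, 10.0)
penalties[:3] = 0.01
weighted = generalized_ridge(X, y, penalties)
```

In this sketch the penalties are set by hand only to show the mechanics; in practice they would have to be estimated from the data, which is the part a model such as HEM is designed to handle.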

Original language: English
Pages (from-to): 548-557
Number of pages: 10
Journal: Journal of Chemometrics
Volume: 28
Issue number: 7
Early online date: 25 Mar 2014
Publication status: Published - 15 Jul 2014

Keywords

  • genomics
  • chemometrics
  • heteroscedastic effects model
  • generalized ridge regression
  • high-dimensional data
  • GENERALIZED LINEAR-MODELS
  • RIDGE-REGRESSION
  • NONORTHOGONAL PROBLEMS
  • IN-VITRO
  • PROTEIN
  • PLS
