The rapid development of new technologies for large-scale analysis of genetic variation in the genomes of individuals and populations has presented statistical geneticists with a grand challenge: to develop efficient methods for identifying the small proportion of all identified genetic polymorphisms that affect traits of interest. To address this "large p small n" problem, we have developed a heteroscedastic effects model (HEM) that has been shown to be powerful in high-throughput genetic analyses. Here, we describe how this whole-genome model can also be utilized in chemometric analysis. As a proof of concept, we use HEM to predict analyte concentrations in silage from Fourier transform infrared spectroscopy signals. The results show that HEM often outperforms the classic methods and, in addition, offers a substantial computational advantage in the analysis of such high-dimensional data. The results thus show the value of taking an interdisciplinary approach to chemometric analysis and indicate that large-scale genomic models are a promising new approach for chemometric analysis that deserves further evaluation by experts in the field. The software used for our analyses is freely available as an R package at http://cran.r-project.org/web/packages/bigRR/. Copyright (C) 2014 John Wiley & Sons, Ltd.
- heteroscedastic effects model
- generalized ridge regression
- high-dimensional data
- generalized linear models
- nonorthogonal problems
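
The "large p small n" setting and the generalized ridge regression keyword can be illustrated with a minimal sketch. It is written in plain Python and does not reproduce the API or the exact estimator of the bigRR R package named in the abstract. The function names (`solve`, `generalized_ridge`, `two_step_fit`) are hypothetical, and the reweighting rule in `two_step_fit` is an illustrative assumption, not the published HEM formula. What the sketch does show is the dual form b = D X' (X D X' + I)^{-1} y with D = diag(1/lambda_j), which gives each predictor its own shrinkage while solving only an n x n system. This is one plausible source of the computational advantage when p is much larger than n.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def generalized_ridge(X, y, lam):
    """Minimise ||y - X b||^2 + sum_j lam[j] * b[j]^2 (all lam[j] > 0).

    Dual form: b = D X' (X D X' + I)^{-1} y with D = diag(1 / lam),
    so the system solved is n x n even when p >> n.
    """
    n, p = len(X), len(X[0])
    d = [1.0 / l for l in lam]
    k = [[sum(X[i][j] * d[j] * X[h][j] for j in range(p)) + (1.0 if i == h else 0.0)
          for h in range(n)] for i in range(n)]
    alpha = solve(k, y)
    return [d[j] * sum(X[i][j] * alpha[i] for i in range(n)) for j in range(p)]


def two_step_fit(X, y, lam0=1.0, eps=1e-8):
    """Two-pass fit: common shrinkage first, then predictor-specific shrinkage.

    ASSUMPTION: the reweighting below (less shrinkage for predictors with
    larger first-pass effects) is a simplified illustration, not the
    rule used by HEM or bigRR.
    """
    p = len(X[0])
    b0 = generalized_ridge(X, y, [lam0] * p)
    mean_sq = sum(v * v for v in b0) / p
    lam = [lam0 * (mean_sq + eps) / (v * v + eps) for v in b0]
    return generalized_ridge(X, y, lam)


# Toy data with n = 3 samples and p = 5 predictors (made-up numbers).
X = [[1.0, 0.5, -0.2, 0.0, 0.3],
     [0.1, -1.0, 0.4, 0.8, 0.0],
     [0.0, 0.2, 1.0, -0.5, 0.6]]
y = [1.0, -0.5, 0.8]
effects = two_step_fit(X, y)
```

In chemometric use, the rows of `X` would be spectra and the columns wavenumber channels. A real analysis would also need centring, an intercept, and a data-driven choice of `lam0`, all of which this sketch omits.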