## Abstract

Background: The dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me.

Results: The simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added.

Conclusions: A small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added.
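
The eigenvalue truncation described in the Results can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a VanRaden-style GRM built from centered gene content, which the abstract does not specify, and it uses randomly generated genotypes of unrelated animals, so the counts it prints will not match the percentages reported above. The function names `build_grm`, `n_eigenvalues_for`, and `truncate_grm` are hypothetical, and the GBLUP step itself is not shown.

```python
import numpy as np


def build_grm(Z):
    """GRM from a 0/1/2 gene content matrix (animals x markers).

    Assumption: VanRaden-style scaling with observed allele frequencies.
    """
    p = Z.mean(axis=0) / 2.0              # observed allele frequencies
    W = Z - 2.0 * p                       # center each marker column
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def n_eigenvalues_for(G, fraction):
    """Smallest number of largest eigenvalues whose sum reaches
    `fraction` of the total (the trace of G)."""
    eigvals = np.linalg.eigvalsh(G)[::-1]     # sort descending
    eigvals = np.clip(eigvals, 0.0, None)     # drop tiny negative round-off
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, fraction)) + 1
    return min(k, len(eigvals))


def truncate_grm(G, k):
    """Rebuild G keeping the k largest eigenvalues and their eigenvectors;
    all remaining eigenvalues are set to zero."""
    eigvals, eigvecs = np.linalg.eigh(G)      # ascending order
    eigvals = eigvals.copy()
    eigvals[: len(eigvals) - k] = 0.0
    return (eigvecs * eigvals) @ eigvecs.T


# Hypothetical toy data: 50 unrelated animals, 500 markers.
rng = np.random.default_rng(0)
freq = rng.uniform(0.1, 0.9, size=500)
Z = rng.binomial(2, freq, size=(50, 500))
G = build_grm(Z)

for frac in (0.10, 0.50, 0.98):
    print(f"{frac:.0%} of variation: {n_eigenvalues_for(G, frac)} eigenvalues")
```

A truncated matrix such as `truncate_grm(G, k)` is singular whenever `k` is smaller than the number of animals, so a GBLUP built on it would need a formulation that does not require a plain inverse of the GRM.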

Original language | English |
---|---|
Journal | Genetics Selection Evolution |
DOIs | |
Publication status | Published - 12 Dec 2019 |