Abstract: We are entering the era of mega-scale genomics, which poses computational challenges for standard genomic evaluation models because of their cubic computational complexity. A number of scalable genomic evaluation models have been proposed, such as the APY model, in which genotyped animals are randomly partitioned into core and noncore subsets. While the APY model is a good approximation of the full standard model, the random partitioning can make results unstable, possibly affecting accuracy or even reranking individuals. In this contribution, we present alternative optimised constructions of the core subset and show how to use them to update the core subset as new data arrive. We compared constructions that were (1) random; (2) optimised based on the values of the diagonal elements of the genomic relationship matrix; (3) optimised via random sampling with weights from (2); and (4) optimised using a conditional sequential sampling algorithm. We compared the proposed constructions with the GBLUP setting and assessed their effect on the accuracy and continuous ranked probability score (CRPS) of predictions. To understand the different constructions, we visualised the core subsets using the non-linear dimension reduction technique UMAP (uniform manifold approximation and projection). While the accuracy and CRPS of the proposed core subset constructions were mainly governed by the size of the core subset, the optimisation reduced variation compared with the standard random sampling construction. In addition to addressing the challenges caused by random sampling, the sequential sampling algorithm was equally accurate when applied to the reduced-rank genotype matrix instead of the full one, and was easily extended as new data arrived.
Furthermore, there is an indication that the sequential sampling strategy captures the fine-scale population structure (e.g., paternal half-sib families in our study), as visualised by UMAP, by spreading the core individuals across the given genotype space. We are further exploring the benefits of the proposed core subset constructions in non-homogeneous populations and in populations with an unbalanced structure.
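The four core subset constructions compared above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the VanRaden-style form of the genomic relationship matrix, the choice to favour animals with larger diagonal values in (2) and (3), and the use of a greedy pivoted-Cholesky selection as a stand-in for the conditional sequential sampling algorithm in (4). All function names are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """Assumed VanRaden-style G from an (animals x markers) matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                # estimated allele frequencies
    Z = M - 2.0 * p                         # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def core_random(G, k, rng):
    """(1) Random core subset of size k."""
    return rng.choice(G.shape[0], size=k, replace=False)

def core_top_diagonal(G, k):
    """(2) Assumption: take the k animals with the largest diagonal of G."""
    return np.argsort(np.diag(G))[::-1][:k]

def core_weighted(G, k, rng):
    """(3) Random sampling with weights proportional to the diagonal of G."""
    w = np.diag(G)
    return rng.choice(G.shape[0], size=k, replace=False, p=w / w.sum())

def core_sequential(G, k):
    """(4) Stand-in, not the authors' algorithm: greedily pick the animal with
    the largest variance conditional on the animals already chosen
    (this is a pivoted Cholesky factorisation of G)."""
    n = G.shape[0]
    d = np.diag(G).astype(float).copy()     # current conditional variances
    L = np.zeros((n, k))
    chosen = []
    for j in range(k):
        i = int(np.argmax(d))
        chosen.append(i)
        L[:, j] = (G[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(d[i])
        d -= L[:, j] ** 2                   # condition on the new core animal
        d[chosen] = -np.inf                 # never pick the same animal twice
    return np.array(chosen)

# Toy usage on simulated genotypes (not real data).
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(30, 200)).astype(float)
G = genomic_relationship(M)
cores = {
    "random": core_random(G, 8, rng),
    "top_diagonal": core_top_diagonal(G, 8),
    "weighted": core_weighted(G, 8, rng),
    "sequential": core_sequential(G, 8),
}
```

In this sketch only constructions (1) and (3) depend on the random seed, which mirrors the motivation in the abstract: the deterministic constructions return the same core subset on every run for a given genotype matrix.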
|Publication status||Published - 14 Jun 2022|
|Event||Centre for Statistics (CFS) Annual Conference - James Clerk Maxwell Building (JCMB), King’s Buildings, EH9 3FD, Edinburgh, United Kingdom|
Duration: 14 Jun 2022 → …