Optimised core subset construction for the APY model

Ivan Pocrnic, Finn Lindgren, Daniel Tolhurst, WO Herring, Gregor Gorjanc

Research output: Contribution to conference (Poster)

Abstract / Description of output

Abstract: We are entering the era of mega-scale genomics, which causes computational problems for standard genomic evaluation models because their computational cost grows cubically with the number of genotyped animals. A number of scalable genomic evaluation models have been proposed, such as the APY (algorithm for proven and young) model, in which genotyped animals are randomly partitioned into core and noncore subsets. While the APY model is a good approximation of the full standard model, random partitioning can make the results unstable, possibly affecting accuracy or even reranking individuals. In this contribution, we present alternative, optimised constructions of the core subset and show how to use them to update the core subset as new data arrive. We compared constructions that were (1) random; (2) optimised based on the values of the diagonal elements of the genomic relationship matrix; (3) optimised via random sampling with weights from (2); and (4) optimised using a conditional sequential sampling algorithm. We compared the proposed constructions with standard GBLUP and assessed their effect on the accuracy and the continuous ranked probability score (CRPS) of predictions. To understand the different constructions, we visualised the core subsets using UMAP (uniform manifold approximation and projection), a non-linear dimension reduction technique. While the accuracy and CRPS of the proposed core subset constructions were mainly governed by the size of the core subset, the optimisation reduced variation compared with the standard random sampling construction. In addition to addressing the challenges caused by random sampling, the sequential sampling algorithm was equally accurate when applied to the reduced-rank genotype matrix instead of the full one, and was easily extended as new data arrived.
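To show how a chosen core subset enters the model, the Python sketch below builds the standard APY expression for the inverse genomic relationship matrix, in which noncore animals are treated as conditionally independent given the core. It is an illustration under stated assumptions, not the authors' code: the function name apy_inverse is hypothetical, the result is held as a dense matrix for clarity (practical implementations exploit the diagonal noncore block), and G is a toy matrix built from simulated genotypes.

```python
import numpy as np

def apy_inverse(G, core):
    """Hypothetical helper: dense APY inverse of a genomic relationship matrix G.

    Uses G_apy^-1 = [[Gcc^-1, 0], [0, 0]] + [[-Gcc^-1 Gcn], [I]] M^-1 [[-Gnc Gcc^-1, I]],
    where M is diagonal with m_i = g_ii - g_ic Gcc^-1 g_ci (conditional variance
    of noncore animal i given the core). Returned in the original animal order.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                     # regression of noncore on core
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)  # diagonal of M
    Minv = np.diag(1.0 / m)
    Ginv = np.zeros_like(G, dtype=float)
    Ginv[np.ix_(core, core)] = Gcc_inv + P @ Minv @ P.T
    Ginv[np.ix_(core, noncore)] = -P @ Minv
    Ginv[np.ix_(noncore, core)] = -Minv @ P.T
    Ginv[np.ix_(noncore, noncore)] = Minv
    return Ginv

# Toy example: 8 animals, 50 simulated markers, core = animals 0, 2 and 5.
rng = np.random.default_rng(1)
Z = rng.standard_normal((8, 50))
G = Z @ Z.T / 50 + 0.01 * np.eye(8)    # toy positive definite relationship matrix
core = [0, 2, 5]
Ginv_apy = apy_inverse(G, core)
G_apy = np.linalg.inv(Ginv_apy)        # the relationship matrix implied by APY
```

In this sketch the implied matrix reproduces the core-core and core-noncore blocks and the diagonal of G exactly, so only relationships among noncore animals are approximated; which animals end up in the core therefore determines what is approximated, one way to see why the partition can matter.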
Furthermore, the UMAP visualisations indicate that the sequential sampling strategy captures fine-scale population structure (e.g., paternal half-sib families in our study), spreading the core individuals across the genotype space. We are further exploring the benefits of the proposed core subset constructions in non-homogeneous populations and in populations with unbalanced structure.
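The four core constructions compared above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: all function names are hypothetical; construction (2) assumes animals with the largest diagonal values are preferred, since the abstract does not state the direction; and construction (4) is interpreted as greedy selection of the animal with the largest genomic variance conditional on the current core (a pivoted Cholesky decomposition), which is one plausible reading of "conditional sequential sampling". The init argument illustrates how a sequential scheme can keep an existing core and extend it when new genotyped animals are appended.

```python
import numpy as np

def random_core(n, k, rng):
    """(1) Random core: k of n animals drawn uniformly without replacement."""
    return np.sort(rng.choice(n, size=k, replace=False))

def diagonal_core(G, k):
    """(2) Deterministic core: the k animals with the largest diagonal of G
    (assumed direction; the abstract does not specify largest vs smallest)."""
    return np.sort(np.argsort(-np.diag(G))[:k])

def weighted_core(G, k, rng):
    """(3) Random core with inclusion probabilities proportional to diag(G)."""
    d = np.diag(G).astype(float)
    return np.sort(rng.choice(len(d), size=k, replace=False, p=d / d.sum()))

def sequential_core(G, k, init=(), tol=1e-10):
    """(4) Greedy conditional sequential selection (pivoted Cholesky).

    Each step picks the animal whose genomic value has the largest variance
    conditional on the animals already in the core. The first len(init) picks
    are forced to `init` (an existing core), so k must be >= len(init).
    """
    n = G.shape[0]
    cond_var = np.diag(G).astype(float).copy()   # conditional variances given the core
    L = np.zeros((n, k))                         # partial Cholesky factor
    core = []
    for j in range(k):
        i = int(init[j]) if j < len(init) else int(np.argmax(cond_var))
        if cond_var[i] <= tol:
            break  # the chosen animal adds no new information (rank reached)
        core.append(i)
        L[:, j] = (G[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(cond_var[i])
        cond_var -= L[:, j] ** 2                 # condition everyone on animal i
        cond_var[core] = 0.0                     # core animals are fully explained
    return core

# Toy data: 30 animals x 200 simulated markers.
rng = np.random.default_rng(0)
Z = rng.standard_normal((30, 200))
G = Z @ Z.T / Z.shape[1]
core_seq = sequential_core(G, 5)

# New animals arrive: append 10 of them and extend the existing core to 8.
Z_all = np.vstack([Z, rng.standard_normal((10, 200))])
G_all = Z_all @ Z_all.T / Z_all.shape[1]
core_ext = sequential_core(G_all, 8, init=core_seq)
```

In this sketch, passing the previous core as init leaves it unchanged and only adds animals that are poorly explained by it, which is one way the sequential construction can be updated with new data without re-drawing the whole core.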
Original language: English
Publication status: Published, 14 Jun 2022
Event: Centre for Statistics (CFS) Annual Conference, James Clerk Maxwell Building (JCMB), King’s Buildings, EH9 3FD, Edinburgh, United Kingdom
Duration: 14 Jun 2022 → …

