Edinburgh Research Explorer

Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations

Research output: Contribution to journal (Article)

Open Access permissions: Open

Documents

    Final published version, 1.89 MB, PDF document

    Licence: Creative Commons: Attribution (CC-BY)

Original language: English
Journal: Genetics Selection Evolution
Publication status: Published - 6 Apr 2020

Abstract

Background: The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings.
Methods: We used data from four pig populations of different size (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced (most of them at 1× or 2×, and 37–92 individuals per population, totalling 284, at 15–30×). We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We simulated data that mimicked the sequencing strategy used in the real populations to quantify the factors that affected the individual-wise and variant-wise imputation accuracies using regression trees.
Results: Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, due to the lack of marker array data for themselves and their ancestors. The main factors that determined the individual-wise imputation accuracy were the genotyping status, the availability of marker array data for immediate ancestors, and the degree of connectedness to the rest of the population, but sequencing coverage of the relatives had no effect. The main factors that determined variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. Results were validated with the empirical observations.
Conclusions: We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful strategy for generating whole-genome sequence data with high accuracy in large pedigreed populations where only a small fraction of individuals (2%) had been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants.
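
The abstract reports accuracy as individual-wise and variant-wise dosage correlations. The sketch below is an illustration and not the authors' code. It assumes accuracy is a plain Pearson correlation between true and imputed allele dosages, taken across variants for each individual and across individuals for each variant. The function names and the toy data are hypothetical, and any correction for allele frequency that the paper may apply is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant vector has no variance, so the correlation is undefined.
        return None
    return sxy / sqrt(sxx * syy)


def individual_wise_accuracy(true_dosages, imputed_dosages):
    """One correlation per individual (row), computed across variants."""
    return [pearson(t, i) for t, i in zip(true_dosages, imputed_dosages)]


def variant_wise_accuracy(true_dosages, imputed_dosages):
    """One correlation per variant (column), computed across individuals."""
    true_cols = list(zip(*true_dosages))
    imputed_cols = list(zip(*imputed_dosages))
    return [pearson(t, i) for t, i in zip(true_cols, imputed_cols)]


# Hypothetical toy data: rows are individuals, columns are variants.
# True dosages are 0, 1 or 2; imputed dosages are real-valued.
true_dosages = [
    [0, 1, 2, 1],
    [2, 1, 0, 0],
    [1, 1, 2, 0],
]
imputed_dosages = [
    [0.1, 0.9, 1.8, 1.2],
    [1.9, 1.0, 0.2, 0.1],
    [1.0, 1.0, 1.0, 1.0],  # uninformative imputation: constant dosage
]

per_individual = individual_wise_accuracy(true_dosages, imputed_dosages)
per_variant = variant_wise_accuracy(true_dosages, imputed_dosages)
```

In a leave-one-out design like the one described, the true dosages would come from the high-coverage sequence data of the individual whose sequence data was removed before imputation. Correlations that are undefined, such as for a monomorphic variant or a constant imputed vector, are returned as None here so that they can be excluded from a summary such as the median.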

ID: 141703265