Abstract
Background:
This paper describes a heuristic method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Low-coverage sequencing assembles high-coverage sequence information for every individual by accumulating data from the genome segments that they share with many other individuals into consensus haplotypes. Deriving the consensus haplotypes accurately is critical for achieving high phasing and imputation accuracy. To enable accurate phasing and imputation of sequence information for the whole population, we allocate the available sequencing resources among individuals with existing phased genomic data by targeting the sequencing coverage of their haplotypes.
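As an illustration of how coverage accumulates on shared haplotypes, the following minimal Python sketch sums per-individual coverage over the haplotypes each individual carries. The function name `haplotype_coverage`, the data layout, and the even split of an individual's coverage between its haplotype copies are assumptions made for this example; they are not taken from AlphaSeqOpt.

```python
from collections import defaultdict


def haplotype_coverage(carriers, coverage):
    """Accumulate sequencing coverage per haplotype over its carriers.

    carriers: dict mapping individual -> list of haplotype ids it carries
              at one genome segment (two entries for a diploid individual)
    coverage: dict mapping individual -> sequencing coverage (x)
    """
    total = defaultdict(float)
    for ind, haps in carriers.items():
        if not haps:
            continue
        # Assumption: an individual's reads are split evenly between its copies.
        per_copy = coverage.get(ind, 0.0) / len(haps)
        for hap in haps:
            total[hap] += per_copy
    return dict(total)


# Hypothetical toy data: haplotype "A" is shared by three individuals.
carriers = {"ind1": ["A", "B"], "ind2": ["A", "C"], "ind3": ["A", "A"]}
coverage = {"ind1": 2.0, "ind2": 2.0, "ind3": 1.0}
hap_cov = haplotype_coverage(carriers, coverage)
```

In this toy case the shared haplotype "A" accumulates more coverage than any single individual received, while the rare haplotypes "B" and "C" stay at the coverage of one copy, which is the imbalance that haplotype-targeted allocation aims to reduce.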
Results:
Our method, called AlphaSeqOpt, prioritizes haplotypes using a score function that is based on the frequency of the haplotypes in the sequencing set relative to the target coverage. AlphaSeqOpt has two steps: (1) selection of an initial set of individuals by iteratively choosing the individuals that have the maximum score conditional on the current set, and (2) refinement of the set through several rounds of exchanges of individuals. AlphaSeqOpt is effective at distributing a fixed amount of sequencing resources evenly across haplotypes, which reduces the proportion of haplotypes that are sequenced below the target coverage. AlphaSeqOpt can provide a greater proportion of haplotypes sequenced at the target coverage while sequencing fewer individuals than other methods that use a score function based on haplotype frequencies in the population. A refinement of the initially selected set can provide a larger, more diverse set with more unique individuals, which is beneficial in the context of low-coverage sequencing. We extend the method with an approach for filtering rare haplotypes based on their flanking haplotypes, so that only those that are likely to derive from a recombination event are targeted.
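The two steps described above can be sketched in Python as a greedy selection followed by exchange rounds. This is a simplified illustration, not the published AlphaSeqOpt implementation: the capped-count score in `set_score`, the function names, the fixed number of individuals `n`, and the toy data are all assumptions standing in for details the abstract does not give.

```python
from collections import Counter


def set_score(selected, carriers, target):
    """Stand-in score: each haplotype counts up to `target` copies in the set."""
    counts = Counter(h for ind in selected for h in carriers[ind])
    return sum(min(c, target) for c in counts.values())


def greedy_select(carriers, n, target):
    """Step 1: repeatedly add the individual with the best score given the current set."""
    selected = []
    remaining = sorted(carriers)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < n:
        best = max(remaining,
                   key=lambda ind: set_score(selected + [ind], carriers, target))
        selected.append(best)
        remaining.remove(best)
    return selected


def refine(selected, carriers, target, rounds=3):
    """Step 2: exchange selected individuals for unselected ones when the score improves."""
    selected = list(selected)
    for _ in range(rounds):
        improved = False
        for i in range(len(selected)):
            current = set_score(selected, carriers, target)
            outside = sorted(set(carriers) - set(selected))
            for cand in outside:
                trial = selected[:i] + [cand] + selected[i + 1:]
                if set_score(trial, carriers, target) > current:
                    selected, improved = trial, True
                    break
        if not improved:
            break  # no exchange helped in this round, so stop early
    return selected


# Hypothetical toy data: haplotype ids carried by each individual.
carriers = {
    "ind1": ["A", "B"],
    "ind2": ["A", "A"],
    "ind3": ["B", "C"],
    "ind4": ["C", "D"],
}
initial = greedy_select(carriers, n=2, target=1)
final = refine(initial, carriers, target=1)
```

Capping each haplotype's contribution at the target means that individuals carrying already well-covered haplotypes add little to the score, so selection drifts toward individuals whose haplotypes are still below the target. The exchange step only accepts swaps that strictly increase the score, so the refined set never scores lower than the initial one.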
Conclusions:
We present a method for allocating sequencing resources so that a greater proportion of haplotypes are sequenced at a coverage that is sufficiently high for population-based imputation with low-coverage sequencing. The haplotype score function, the refinement step, and the new approach for filtering rare haplotypes make AlphaSeqOpt more effective for that purpose than previously reported methods for reducing sequencing redundancy.
Original language | English |
---|---|
Article number | 78 |
Journal | Genetics Selection Evolution |
Volume | 49 |
Issue number | 1 |
DOIs | |
Publication status | Published - 25 Oct 2017 |
Title | A method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals |

Projects
10 Finished

- Analysis of quantitative genetic traits in a huge data set
  Hickey, J. & Hill, D.
  1/05/16 → 30/04/19
  Project: Research
- Methods and tools to explore genomic data in animal breeding workshop
  Hickey, J.
  UK central government bodies/local authorities, health and hospital authorities
  1/04/16 → 31/03/17
  Project: Research
- Precision Breeding: Broilers from Sequence to Consequence
  Hickey, J. & Woolliams, J.
  1/11/15 → 31/10/18
  Project: Research