Abstract
BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly
demonstrated its utility in genetic studies. Inherent in this technology is
considerable heterogeneity of target coverage, and this is expected to
systematically impact our sensitivity to detect genuine polymorphisms. To fully
interpret the polymorphisms identified in a genetic study, it is often essential
both to detect polymorphisms and to understand where, and with what probability,
real polymorphisms may have been missed.
RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of
gold-standard single nucleotide variant (SNV) genotype calls for each sample, we
developed an empirical model relating the read depth at a polymorphic site to the
probability of calling the correct genotype at that site. We find that measured
sensitivity in SNV detection is substantially worse than that predicted by the
naive expectation of binomial sampling. This calibrated model allows us to
produce single-nucleotide-resolution SNV sensitivity estimates that can be
merged to give summary sensitivity measures for any partition of the
target sequences (nucleotide, exon, gene, pathway, exome). These metrics are
directly comparable between platforms and can be combined between samples to give
"power estimates" for an entire study. We estimate that a local read depth of 13X
is required to detect the alleles and genotype of a heterozygous SNV 95% of the
time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X,
commonly used for rare disease exome sequencing studies, we predict that 5-15% of
heterozygous and 1-4% of homozygous SNVs in the targeted regions will be missed.
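For comparison with the empirical figures above, the "naive expectation of binomial sampling" can be made concrete. The following is a minimal sketch, not the authors' calibrated model: it assumes alternate-allele reads at a heterozygous site follow Binomial(d, 0.5) at depth d, and it uses a hypothetical calling rule (`min_reads_per_allele`, default 2) under which a heterozygote is called only when each allele is seen at least that many times. Sequencing error, mapping bias and allelic imbalance are ignored. The functions `naive_het_sensitivity`, `min_depth_for` and `region_sensitivity` are illustrative names, not part of any published tool.

```python
from math import comb


def naive_het_sensitivity(depth: int, min_reads_per_allele: int = 2) -> float:
    """Naive binomial probability of calling a heterozygous SNV at a given depth.

    Assumes (hypothetically) alt-allele reads X ~ Binomial(depth, 0.5) and that a
    het call needs at least `min_reads_per_allele` reads of each allele, i.e.
    k <= X <= depth - k. Ignores sequencing error and allelic bias.
    """
    k = min_reads_per_allele
    if depth < 2 * k:
        return 0.0  # cannot see k reads of both alleles
    favourable = sum(comb(depth, x) for x in range(k, depth - k + 1))
    return favourable / 2**depth


def min_depth_for(target: float, min_reads_per_allele: int = 2,
                  max_depth: int = 1000) -> int:
    """Smallest depth whose naive het sensitivity reaches `target`."""
    for depth in range(max_depth + 1):
        if naive_het_sensitivity(depth, min_reads_per_allele) >= target:
            return depth
    raise ValueError("target not reached within max_depth")


def region_sensitivity(depths: list[int], min_reads_per_allele: int = 2) -> float:
    """Mean per-base sensitivity over a region (exon, gene, exome, ...).

    Assumes a het SNV is equally likely at every targeted base, so the mean is
    the expected fraction of such SNVs that would be called.
    """
    if not depths:
        raise ValueError("region has no bases")
    return sum(naive_het_sensitivity(d, min_reads_per_allele)
               for d in depths) / len(depths)


if __name__ == "__main__":
    print(min_depth_for(0.95))  # → 9
    print(region_sensitivity([20, 20, 5, 0]))
```

Under these assumptions the naive model reaches 95% heterozygous sensitivity at 9X, lower than the 13X the abstract reports from real data, which illustrates why an empirically calibrated depth-to-sensitivity curve is needed. Swapping in such a curve for `naive_het_sensitivity` would leave the per-base averaging in `region_sensitivity` unchanged.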
CONCLUSIONS: Non-reference alleles in the heterozygous state have a high chance
of being missed when commonly applied read coverage thresholds are used, despite
the widely held assumption that polymorphism detection is good at these
coverage levels. Such alleles are likely to be of functional importance in
population-based studies of rare diseases, in studies of somatic mutations in
cancer, and in explaining the "missing heritability" of quantitative traits.
Original language | English |
---|---|
Article number | 195 |
Journal | BMC Bioinformatics |
Volume | 14 |
Issue number | 195 |
Publication status | Published - 18 Jun 2013 |
Title | Quantifying single nucleotide variant detection sensitivity in exome sequencing |