Abstract
Many applications of high-throughput sequencing rely on the availability of an accurate reference genome. Variant calling often produces large data sets that cannot realistically be validated and that likely contain large numbers of false positives. Errors in the reference assembly increase the number of false positives. While resources are available to aid in filtering variants from human data, resources for other species are limited, and strict filtering techniques must be employed that are more likely to exclude true positives. This work assesses the accuracy of the pig reference genome (Sscrofa10.2) using Illumina whole genome sequencing (WGS) reads from the Duroc sow on whose genome the assembly was based. Indicators of structural variation, including high regional coverage, unexpected insert sizes, improper pairing and homozygous variants, were used to identify low quality (LQ) regions of the assembly. Low coverage (LC) regions were also identified and analyzed separately. The LQ regions covered 13.85% of the genome, the LC regions covered 26.6% of the genome, and combined (LQLC) they covered 33.07% of the genome. Over half of dbSNP variants were located in the LQLC regions. Excluding variants in the LQ, LC or LQLC regions from future analyses will help reduce the number of false-positive variant calls. Researchers using WGS data should be aware of the current pig reference genome's draft status and that, for many regions, it does not give an accurate representation of the original Duroc sow's genome. It is likely that similar inaccuracies increase false positives in other species with draft assemblies.
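The exclusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the BED-style 0-based half-open coordinates, and the toy regions and variants below are all assumptions made for illustration. It merges flagged intervals per chromosome and drops any variant whose position falls inside one.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome and merge them.

    Coordinates are assumed to be 0-based and half-open (BED-style).
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_flagged_region(index, chrom, pos):
    """Return True if the 0-based position lies inside a flagged interval."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < merged[i][1]


def filter_variants(variants, index):
    """Keep only (chrom, pos) variants outside every flagged interval."""
    return [v for v in variants if not in_flagged_region(index, v[0], v[1])]


# Hypothetical toy data, not taken from Sscrofa10.2.
lq_regions = [("chr1", 100, 200), ("chr1", 150, 300)]
lc_regions = [("chr1", 500, 600), ("chr2", 0, 50)]
lqlc_index = build_index(lq_regions + lc_regions)

variants = [("chr1", 99), ("chr1", 100), ("chr1", 299),
            ("chr1", 300), ("chr2", 10), ("chr3", 5)]
kept = filter_variants(variants, lqlc_index)
print(kept)
```

On the reported figures: if LQLC is read as the union of the LQ and LC regions, measured against the same genome length, the percentages imply that roughly 13.85 + 26.6 − 33.07 = 7.38% of the genome is flagged as both LQ and LC. That reading is an inference from the abstract, not a figure it states.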
Original language | English |
---|---|
Publication status | Published - 9 Jan 2016 |
Event | Plant and Animal Genome XXIV - Town and Country Hotel, San Diego, United States. Duration: 8 Jan 2016 → 13 Jan 2016 |
Conference
Conference | Plant and Animal Genome XXIV |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 8/01/16 → 13/01/16 |
Keywords / Materials (for Non-textual outputs)
- pig
- reference genome
- genome sequence
Fingerprint
Dive into the research topics of 'Identifying low-confidence regions in the Sscrofa10.2 reference genome'. Together they form a unique fingerprint.
Projects
- 3 Finished
- Exome sequencing to inform pig breeding and the development of biomedical models
  1/10/14 → 30/09/18
  Project: Research
- ISP1: Analysis and prediction in complex animal systems
  Tenesa, A., Archibald, A., Beard, P., Bishop, S., Bronsvoort, M., Burt, D., Freeman, T., Haley, C., Hocking, P., Houston, R., Hume, D., Joshi, A., Law, A., Michoel, T., Summers, K., Vernimmen, D., Watson, M., Wiener, P., Wilson, A., Woolliams, J., Ait-Ali, T., Barnett, M., Carlisle, A., Finlayson, H., Haga, I., Karavolos, M., Matika, O., Paterson, T., Paton, B., Pong-Wong, R., Robert, C. & Robertson, G.
  1/04/12 → 31/03/17
  Project: Research