Understanding the evolution and diversity of viral pathogens using next generation sequencing technologies

Project Details

Description

The overall aim of our research proposal is to develop an integrated computational framework for addressing questions pertaining to the nature and dynamics of genomic diversity of viral pathogens using next generation sequence data. Our specific objectives are:
Objective 1. Development of a computational module (package 1) for the generation of probabilistic genetic diversity profiles from next generation sequence data:
1.1 Develop a model that describes the non-uniformly distributed error generated by next generation sequencing platforms.
1.2 Develop a model that describes genetic diversity in the context of specific viral 'species' from which the data is derived.
1.3 Combine the models created in objectives 1.1 and 1.2 in order to produce a system for the assembly of genome alignments that incorporates expectation of sequencing error as well as genome and region-specific characteristics.
1.4 Train and test the system using available datasets for specific viral populations.
Objective 2. Development of downstream modules (package 2) for usage on the probabilistic diversity profiles obtained from objective 1 with specific emphasis on temporally sampled data. These will include:
2.1 Polymorphism frequency analyses.
2.2 Inference of phylogenetic relationships.
2.3 Prediction of co-infection occurrence and estimation of distinct lineages.
Objective 3. Development of a user-friendly desktop software environment that integrates packages 1 and 2.
Objective 4. Application of the software to published data sets and to data sets provided by collaborators.
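
To make the idea behind objectives 1.1 and 2.1 more concrete, the following is a minimal sketch, not the project's actual method. It assumes a single fixed per-base error rate (the project itself aims to model non-uniform error) and asks how likely it is that the observed number of variant reads at a site arose from sequencing error alone, using a binomial tail probability. The function names, the `alpha` threshold and the "significant in at least two time points" rule are all hypothetical choices made for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def error_tail_probability(variant_reads, depth, error_rate):
    """P(X >= variant_reads) for X ~ Binomial(depth, error_rate).

    Assumes 0 < error_rate < 1 and a uniform error rate, which is a
    simplification made for this sketch.
    """
    if variant_reads <= 0:
        return 1.0
    total = 0.0
    for k in range(variant_reads, depth + 1):
        total += math.exp(log_binom_pmf(k, depth, error_rate))
    return min(total, 1.0)


def persistent_variant(counts, depths, error_rate, alpha=0.01, min_timepoints=2):
    """Hypothetical rule: keep a variant if it exceeds the error expectation
    in at least `min_timepoints` temporally sampled data sets."""
    significant = sum(
        1 for c, d in zip(counts, depths)
        if error_tail_probability(c, d, error_rate) < alpha
    )
    return significant >= min_timepoints


# Variant read counts and coverage at one site across three time points
# (made-up numbers for illustration only).
print(persistent_variant([50, 60, 0], [1000, 1000, 1000], error_rate=0.001))
print(persistent_variant([2, 0, 1], [1000, 1000, 1000], error_rate=0.001))
```

A platform-specific or position-specific error model, as described in objective 1.1, would replace the single `error_rate` argument with a rate that depends on the site and sequencing context.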

Layman's description

We have developed a software framework for the comparison of next generation sequencing (NGS) data derived from multiple viral data sets. A particular focus is on characterising genetic diversity, i.e., the depth of variation. The framework has been used to compare the platform-dependent error rates present in data generated on both the 454 Life Sciences and Illumina platforms. We have also compared Ion Torrent, PacBio, Illumina and 454 data sets and demonstrated that all platforms have utility for detecting drug resistance. Our software was particularly focused on data sets for which temporal samples exist, and we demonstrated its use with HIV data. We also developed a statistical approach that uses the signal in temporal data sets to discriminate real low-frequency variation from sequencing error.
Status: Finished
Effective start/end date: 1/06/10 to 30/09/13

Funding

  • BBSRC: £124,111.00
