The application of next-generation sequencing technologies and bioinformatic scripts to identify high-frequency SNPs distributed throughout the peach genome is described. Three peach genomes were sequenced using Roche 454 and Illumina/Solexa technologies to obtain long contigs for alignment to the draft 'Lovell' peach sequence as well as sufficient depth of coverage for 'in silico' SNP discovery.
The sequences were aligned to the 'Lovell' peach genome released on April 1, 2010 by the International Peach Genome Initiative (IPGI). 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle' were sequenced to add SNPs segregating in two breeding populations, Pop DF ('Dr. Davis' × 'F8, 1-42') and Pop DG ('Dr. Davis' × 'Georgia Belle'). Roche 454 sequencing produced 980,000 total reads with 236 Mb of sequence for 'Dr. Davis' and 735,000 total reads with 172 Mb of sequence for 'F8, 1-42'. Paired-end Illumina/Solexa sequencing (84 bp × 84 bp) yielded 25.5, 21.4 and 25.5 million sequences for 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle', respectively. BWA and SAMtools were used for alignment of raw reads and SNP detection, with custom Perl scripts for SNP filtering. Velvet's Columbus module was used for sequence assembly. Comparison of aligned and overlapping sequences from both Roche 454 and Illumina/Solexa resulted in the selection of 6654 high-quality SNPs for 'Dr. Davis' vs. 'F8, 1-42' and 'Georgia Belle', distributed on the eight major peach genome scaffolds defined in the 'Lovell' assembly.
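The SNP filtering and cross-platform comparison can be illustrated with a minimal Python sketch. The filtering criteria of the custom scripts are not given here, so the depth and quality thresholds, the `SnpCall` record and the scaffold names used with it are assumptions chosen for illustration only. The comparison of the two platforms is modeled simply as keeping calls that pass the filter in both the Roche 454 and the Illumina/Solexa call sets.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SnpCall(NamedTuple):
    """One candidate SNP call (hypothetical record layout)."""
    scaffold: str    # reference scaffold name
    position: int    # 1-based position on the scaffold
    ref: str         # reference ('Lovell') base
    alt: str         # alternate base seen in the sequenced genotype
    depth: int       # number of reads covering the position
    quality: float   # Phred-scaled SNP quality


# Illustrative thresholds only; these are assumptions, not the
# values used in the study.
MIN_DEPTH = 4
MIN_QUALITY = 20.0

SnpKey = Tuple[str, int, str, str]


def passes_filter(call: SnpCall,
                  min_depth: int = MIN_DEPTH,
                  min_quality: float = MIN_QUALITY) -> bool:
    """Keep a call only if it has enough read depth and quality."""
    return call.depth >= min_depth and call.quality >= min_quality


def concordant_snps(calls_454: Iterable[SnpCall],
                    calls_illumina: Iterable[SnpCall]) -> List[SnpKey]:
    """Return sorted SNP keys that pass the filter on both platforms."""
    def keyed(calls: Iterable[SnpCall]) -> set:
        return {(c.scaffold, c.position, c.ref, c.alt)
                for c in calls if passes_filter(c)}

    return sorted(keyed(calls_454) & keyed(calls_illumina))
```

Requiring the same scaffold, position and alleles on both platforms is a deliberately strict choice in this sketch: a site where the two technologies disagree on the alternate base is dropped rather than resolved.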
The eight scaffolds contained about 215-225 Mb of peach genomic sequence, with approximately one SNP per 40,000 bases. All sequences from Roche 454 and Illumina/Solexa have been submitted to the NCBI Short Read Archive database for public use. SNPs have been deposited in the NCBI SNP database.
- Computational Biology
- Genome, Plant
- Polymorphism, Single Nucleotide