TY - JOUR
T1 - Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes
AU - Ockendon, Nina F.
AU - O'Connell, Lauren A.
AU - Bush, Stephen J.
AU - Monzón-Sandoval, Jimena
AU - Barnes, Holly
AU - Székely, Tamás
AU - Hofmann, Hans A.
AU - Dorus, Steve
AU - Urrutia, Araxi O.
PY - 2015/10/14
Y1 - 2015/10/14
AB - Next-generation sequencing methods, such as RNA-seq, have permitted the exploration of gene expression in a range of organisms which have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA-seq annotation methods using reference genomes from related species have yet to be robustly characterized. Here we conduct a comprehensive power analysis employing RNA-seq data from Drosophila melanogaster in conjunction with 11 additional genomes from related Drosophila species to compare annotation methods and quantify the impact of evolutionary divergence between transcriptome and the reference genome. Our analyses demonstrate that, regardless of the level of sequence divergence, direct genome mapping (DGM), where transcript short reads are aligned directly to the reference genome, significantly outperforms the widely used de novo and guided assembly-based methods in both the quantity and accuracy of gene detection. Our analysis also reveals that DGM recovers a more representative profile of Gene Ontology functional categories, which are often used to interpret emergent patterns in genomewide expression analyses. Lastly, analysis of available primate RNA-seq data demonstrates the applicability of our observations across diverse taxa. Our quantification of annotation accuracy and reduced gene detection associated with sequence divergence thus provides empirically derived guidelines for the design of future gene expression studies in species without sequenced genomes.
KW - Drosophila
KW - Gene ontology
KW - Nonmodel species
KW - Primate
KW - RNA-seq
KW - Transcriptome assembly
UR - http://www.scopus.com/inward/record.url?scp=84949667753&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.12465
DO - 10.1111/1755-0998.12465
M3 - Article
VL - 16
SP - 446
EP - 458
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
SN - 1755-098X
IS - 2
ER -