Improving the annotation of the Heterorhabditis bacteriophora genome

Florence Mclean, Duncan Berger, Dominik R Laetsch, Hillel T Schwartz, Mark Blaxter

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Background
Genome assembly and annotation remain exacting tasks. As the tools available for these tasks improve, it is useful to return to data produced with earlier techniques to assess their credibility and correctness. The entomopathogenic nematode Heterorhabditis bacteriophora is widely used to control insect pests in horticulture. The genome sequence for this species was reported to encode an unusually high proportion of unique proteins and a paucity of secreted proteins compared to other related nematodes.

Findings
We revisited the H. bacteriophora genome assembly and gene predictions to determine whether these unusual characteristics were biological or methodological in origin. We mapped an independent resequencing dataset to the genome and used the blobtools pipeline to identify potential contaminants. While present (0.2% of the genome span, 0.4% of predicted proteins), assembly contamination was not significant.

Conclusions
Re-prediction of the gene set using BRAKER1 and published transcriptome data generated a predicted proteome that was very different from the published one. The new gene set had a much reduced complement of unique proteins, better completeness values that were in line with other related species’ genomes, and an increased number of proteins predicted to be secreted. It is thus likely that methodological issues drove the apparent uniqueness of the initial H. bacteriophora genome annotation and that similar contamination and misannotation issues affect other published genome assemblies.
Original language: English
Journal: GigaScience
Volume: 7
Issue number: 4
Early online date: 2 Apr 2018
Publication status: E-pub ahead of print - 2 Apr 2018
