TY - JOUR
T1 - Transcript- and annotation-guided genome assembly of the European starling
AU - Stuart, Katarina
AU - Edwards, Richard
AU - Cheng, Yuanyuan
AU - Warren, Wesley
AU - Burt, Dave
AU - Sherwin, William
AU - Hofmeister, Natalie
AU - Werner, Scott
AU - Ball, Gregory F
AU - Bateson, Melissa
AU - Brandley, Matthew
AU - Buchanan, Katherine L
AU - Cassey, Phillip
AU - Clayton, David F
AU - De Meyer, Tim
AU - Meddle, Simone
AU - Rollins, Lee
N1 - Funding Information:
We thank nonauthor members of the Starling Genome Consortium for their support of this project, including Wim Vanden Berghe. Thank you to Stella Loke, Annabel Whibley, and Mark Richardson for their guidance on Nanopore sequencing and analysis. Thank you to Daniel Selechnik for assistance with RNA extractions. Art credit (Figure 6 illustration) to Megan Bishop. Richard J. Edwards was funded by the Australian Research Council (LP160100610 and LP18010072). David W. Burt and Yuanyuan Cheng acknowledge grant funding from the Human Sciences Frontier Programme (grant RGP0030/2015). Simone L. Meddle acknowledges Roslin Institute Strategic Grant funding from the UK Biotechnology and Biological Sciences Research Council (BB/P013759/1). Lee A. Rollins was supported by a Scientia Fellowship from UNSW. Finally, we thank the four anonymous reviewers whose guidance has greatly improved this manuscript. Open access publishing facilitated by University of New South Wales, as part of the Wiley ‐ University of New South Wales agreement via the Council of Australian University Librarians.
Publisher Copyright:
© 2022 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2022/11
Y1 - 2022/11
AB - The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second, North American, short-read genome assembly (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterisation. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 6,222 scaffolds (7.6 Mb scaffold N50, 94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Species-specific transcript mapping and gene annotation revealed good gene-level assembly and high functional completeness. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counter-intuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. This work expands our knowledge of avian genomes and the available toolkit for assessing and improving genome quality. The new genomic resources presented will facilitate further global genomic and transcriptomic analysis on this ecologically important species.
KW - Sturnus vulgaris
KW - full-length transcripts
KW - genome annotation
KW - genome assembly
KW - genome assessment
U2 - 10.1111/1755-0998.13679
DO - 10.1111/1755-0998.13679
M3 - Article
C2 - 35763352
SN - 1755-098X
VL - 22
SP - 3141
EP - 3160
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 8
ER -