Edinburgh Research Explorer

Preprocessed gene trees from "Xenolog Classification"



Publisher: Edinburgh DataShare
Date made available: 14 Oct 2016


alltreesNewick.zip: 13623 preprocessed trees in Newick format. The start of each filename gives the orthologous group number. Tip labels in trees consist of the protein accession, followed by an underscore, then the three-letter abbreviation for the taxon. Orthologous group numbers and taxon abbreviations are as in Darby et al. (2016, "Xenolog classification", Bioinformatics, submitted) and Latysheva et al. (2012, doi:10.1093/bioinformatics/bts008).

alltreesNotung.zip: 13623 preprocessed trees in Notung format. The start of each filename gives the orthologous group number. Tips are labelled as in alltreesNewick.zip.
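The tip-label convention described above (protein accession, underscore, three-letter taxon abbreviation) can be unpacked with a few lines of Python. The following is only a minimal sketch, not part of the dataset: the function names and the example tree are hypothetical, the accessions and taxon codes in it are invented for illustration, and it assumes unquoted labels and trees with at least two tips. Because some accessions may themselves contain underscores, the label is split on the last underscore only.

```python
import re


def leaf_labels(newick: str) -> list:
    """Return tip labels from a Newick string.

    Assumption: labels are unquoted, so a tip label is any token that
    directly follows '(' or ','. Internal node labels (for example
    bootstrap values) follow ')' and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def split_tip_label(label: str) -> tuple:
    """Split 'ACCESSION_tax' into (accession, taxon abbreviation).

    Splits on the LAST underscore, since the accession itself may
    contain underscores.
    """
    accession, taxon = label.rsplit("_", 1)
    return accession, taxon


# Hypothetical example tree; accessions and taxon codes are made up.
example = "((ACC00001_aaa:0.1,ACC00002_bbb:0.2)150:0.05,YP_000003_ccc:0.3);"
tips = [split_tip_label(label) for label in leaf_labels(example)]
```

For real analyses a dedicated tree library is likely a safer choice than a regular expression, since it will handle quoting and comments in the Newick files.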


Preprocessed gene trees from 49 taxa of Cyanobacteria and 16 Proteobacteria used by Darby et al. (2017, Bioinformatics 33:640-649; doi:10.1093/bioinformatics/btw686) are provided here. These consist of trees for 13623 of the gene families of Latysheva et al. (2012, doi:10.1093/bioinformatics/bts008). Using Notung- (Stolzer et al. 2012, doi:10.1093/bioinformatics/bts386), unrooted trees with bootstrap support (out of 200; doi:10.7488/ds/1485) were preprocessed with the following steps: (1) root with the DTL (duplication, transfer, loss) model, --costdup 3 --costtrans 2.5 --costloss 2; (2) rearrange with the DL (duplication, loss) model, --costdup 3 --costloss 2 --threshold 90%; (3) reroot with the DTL model, --costdup 3 --costtrans 2.5 --costloss 2. Both zip files contain the output from this pipeline. The Newick format trees give the tree topology after this process. The Notung format trees also include the reconciliation and other metadata. The Notung format is described in detail in Appendix A (File Formats) of the Notung manual (http://www.cs.cmu.edu/~durand/Notung).
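To make the cost parameters above concrete, the sketch below scores a reconciliation as a weighted sum of its inferred events, using the duplication, transfer and loss costs quoted in the pipeline description. This is an illustration under stated assumptions rather than Notung's own code: the class and function names are hypothetical, the event counts in the example are invented, and it assumes the score is simply each event count multiplied by its cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventCosts:
    """Per-event costs for a reconciliation model (hypothetical helper)."""
    dup: float
    loss: float
    transfer: Optional[float] = None  # None means transfers are not modelled (DL)


# Costs as quoted in the pipeline description.
DTL = EventCosts(dup=3, transfer=2.5, loss=2)  # steps (1) and (3)
DL = EventCosts(dup=3, loss=2)                 # step (2)


def reconciliation_cost(costs: EventCosts, dups: int, losses: int,
                        transfers: int = 0) -> float:
    """Weighted sum of event counts; assumes a simple additive score."""
    if transfers and costs.transfer is None:
        raise ValueError("this model does not allow transfers")
    transfer_cost = 0 if costs.transfer is None else costs.transfer * transfers
    return costs.dup * dups + costs.loss * losses + transfer_cost


# Invented event counts, for illustration only.
dtl_score = reconciliation_cost(DTL, dups=1, losses=2, transfers=1)  # 3 + 4 + 2.5
dl_score = reconciliation_cost(DL, dups=1, losses=2)                 # 3 + 4
```

With these costs a transfer (2.5) is cheaper than a duplication (3), so under the DTL model a single transfer is preferred over a duplication whenever the two explanations otherwise require the same losses.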

Data Citation

Darby, Charlotte; Stolzer, Maureen; Ropp, Patrick; Barker, Daniel; Durand, Dannie. (2016). Preprocessed gene trees from "Xenolog Classification", [dataset]. http://dx.doi.org/10.7488/ds/1503.

ID: 30211189