Assembly and validation of conserved long non-coding RNAs in the ruminant transcriptome

  • Stephen Bush (Creator)
  • Charity Muriuki (Creator)
  • Mary McCulloch (Creator)
  • Iseabail Farquhar (Creator)
  • Emily Clark (Creator)
  • David Hume (Creator)



mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity and/or only at specific developmental stages. This dataset demonstrates that few lncRNAs are fully captured by biological replicates of the same RNA-seq library.

In a transcriptional atlas of the domestic sheep, 31 diverse tissues/cell types were sampled in each of 6 adults (3 females and 3 males, all unrelated virgin animals approximately 2 years of age). By taking a common subset of 31 tissues per individual, each of the 6 adults (f1, f2, f3, m1, m2 and m3) was represented by ~0.75 billion reads. In a typical lncRNA assembly pipeline, the transcripts assembled from each individual's read alignments are merged to maximise the number of candidate gene models (using, for instance, StringTie --merge). With n = 6 adults (and ~0.75 billion reads per adult), there are 2^n - 1 = 63 possible non-empty combinations of data for which GTFs can be made with StringTie --merge. This dataset comprises those GTFs.
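The count of 63 follows from enumerating every non-empty subset of the six adults: C(6,1) + C(6,2) + ... + C(6,6) = 6 + 15 + 20 + 15 + 6 + 1 = 63. The Python sketch below enumerates those subsets and builds one StringTie --merge command per subset. It is an illustration under stated assumptions, not the authors' pipeline: the per-individual input names (`f1.gtf`, ...) and the merged output names are hypothetical, and only the `--merge` and `-o` options of StringTie are used.

```python
from itertools import combinations

# The six adults, labelled as in the dataset description.
INDIVIDUALS = ["f1", "f2", "f3", "m1", "m2", "m3"]


def nonempty_subsets(items):
    """Yield every non-empty combination of items, from size 1 up to all of them."""
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield combo


def merge_command(combo):
    """Build a StringTie --merge command line for one combination of individuals.

    Hypothetical naming scheme: each individual's assembly is assumed to be
    '<id>.gtf', and the merged output is '<id1>_<id2>..._merged.gtf'.
    """
    inputs = [f"{ind}.gtf" for ind in combo]
    output = "_".join(combo) + "_merged.gtf"
    return ["stringtie", "--merge", "-o", output] + inputs


combos = list(nonempty_subsets(INDIVIDUALS))
print(len(combos))  # 63, i.e. 2**6 - 1
print(" ".join(merge_command(combos[0])))   # single individual: f1 only
print(" ".join(merge_command(combos[-1])))  # all six individuals merged
```

Each printed command could be passed to `subprocess.run` if the assumed GTF files existed. Building the argument list rather than a shell string avoids quoting problems with file names.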

Data Citation

Bush, Stephen; Muriuki, Charity; McCulloch, Mary E. B.; Farquhar, Iseabail L.; Clark, Emily L.; Hume, David A. (2018). Assembly and validation of conserved long non-coding RNAs in the ruminant transcriptome, [dataset]. Roslin Institute. University of Edinburgh.
Date made available: 8 Jan 2018
Publisher: Edinburgh DataShare
