Abstract
The advent of next-generation sequencing technologies, and of RNA sequencing (RNA-seq) in particular, has expanded our knowledge of the transcriptional capacity of human and other animal genomes. Recent RNA-seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts arising both within known protein-coding genes and in the regions between them. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations that their expression is restricted to particular cell types and developmental time points. A variety of sequencing protocols are currently available for identifying lncRNAs, including RNA polymerase II occupancy, chromatin state maps and, the focus of this review, deep RNA sequencing. In addition, there are numerous analytical methods available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-seq data. Here we review current methods for identifying lncRNAs using large-scale sequencing data from RNA-seq experiments and highlight analytical considerations that are required when undertaking such projects.
Original language | English |
---|---|
Pages (from-to) | 50-59 |
Number of pages | 10 |
Journal | Methods |
Volume | 63 |
Issue number | 1 |
DOIs | |
Publication status | Published - 1 Sep 2013 |
Keywords
- Base Sequence
- Chromatin
- High-Throughput Nucleotide Sequencing
- Humans
- RNA Polymerase II
- RNA, Long Noncoding
- Transcription, Genetic