TY - UNPB
T1 - Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment
AU - Schurch, Nicholas J.
AU - Schofield, Pieta
AU - Gierliński, Marek
AU - Cole, Christian
AU - Sherstnev, Alexander
AU - Singh, Vijender
AU - Wrobel, Nicola
AU - Gharbi, Karim
AU - Simpson, Gordon G.
AU - Owen-Hughes, Tom
AU - Blaxter, Mark
AU - Barton, Geoffrey J.
N1 - 21 pages and 4 figures in the main text; 9 supplementary figures attached to the PDF. Revised to correct a minor error in the abstract.
PY - 2015/5/8
Y1 - 2015/5/8
AB - An RNA-seq experiment with 48 biological replicates in each of 2 conditions was performed to determine the number of biological replicates ($n_r$) required, and to identify the most effective statistical analysis tools for detecting differential gene expression (DGE). When $n_r=3$, seven of the nine tools evaluated give true positive rates (TPR) of only 20 to 40 percent. For high fold-change genes ($|\log_{2}(FC)|>2$) the TPR is $>85$ percent. Two tools performed poorly, over- or under-predicting the number of differentially expressed (DE) genes. Increasing replication gives a large increase in TPR when considering all DE genes, but only a small increase for high fold-change genes. Achieving a TPR $>85$ percent across all fold-changes requires $n_r>20$. For future RNA-seq experiments, these results suggest $n_r>6$, rising to $n_r>12$ when identifying DGE irrespective of fold-change is important. For $6<n_r<12$, superior TPR makes edgeR the leading tool tested. For $n_r\ge12$, minimizing false positives is more important and DESeq outperforms the other tools.
KW - q-bio.GN
M3 - Working paper
BT - Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment
PB - ArXiv
ER -