How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in A. thaliana

Kimon Froussios, Nicholas Schurch, Katarzyna Mackinnon, Marek Gierlinski, Celine Duc, Gordon Simpson (Lead / Corresponding author), Geoffrey Barton (Lead / Corresponding author)

Research output: Contribution to journal › Article

Abstract

Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae (Gierlinski et al. 2015; Schurch et al. 2016).

Results: We show that, in agreement with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than with either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution.

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English
Journal: Bioinformatics
Early online date: 6 Feb 2019
DOI: 10.1093/bioinformatics/btz089
Publication status: E-pub ahead of print - 6 Feb 2019

Fingerprint

Differential Expression
Eukaryota
RNA
Gene Expression
Arabidopsis thaliana
Arabidopsis
Binomial Distribution
Negative Binomial Distribution
Normal Distribution
Saccharomyces cerevisiae
Yeast
Log Normal Distribution
Bioinformatics
Computational Biology
False Positive
Transcriptome
Statistical Property

Cite this

@article{c233b57cd1ea4908979447522343e37f,
title = "How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in A. thaliana",
abstract = "Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae (Gierlinski et al. 2015; Schurch et al. 2016). Results: We show that, in agreement with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than with either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Kimon Froussios and Nicholas Schurch and Katarzyna Mackinnon and Marek Gierlinski and Celine Duc and Gordon Simpson and Geoffrey Barton",
note = "This work has been supported by the BBSRC grants BB/H002286/1, BB/J00247X/1, BB/M010066/1 and BB/M004155/1.",
year = "2019",
month = "2",
day = "6",
doi = "10.1093/bioinformatics/btz089",
language = "English",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",

}

TY  - JOUR
T1  - How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in A. thaliana
AU  - Froussios, Kimon
AU  - Schurch, Nicholas
AU  - Mackinnon, Katarzyna
AU  - Gierlinski, Marek
AU  - Duc, Celine
AU  - Simpson, Gordon
AU  - Barton, Geoffrey
N1  - This work has been supported by the BBSRC grants BB/H002286/1, BB/J00247X/1, BB/M010066/1 and BB/M004155/1.
PY  - 2019/2/6
Y1  - 2019/2/6
N2  - Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae (Gierlinski et al. 2015; Schurch et al. 2016). Results: We show that, in agreement with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than with either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Supplementary information: Supplementary data are available at Bioinformatics online.
AB  - Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, Differential Gene Expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae (Gierlinski et al. 2015; Schurch et al. 2016). Results: We show that, in agreement with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than with either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Supplementary information: Supplementary data are available at Bioinformatics online.
U2  - 10.1093/bioinformatics/btz089
DO  - 10.1093/bioinformatics/btz089
M3  - Article
C2  - 30726870
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
ER  - 