Detection and Mitigation of Spurious Antisense RNA-seq Reads with RoSA

Research output: Contribution to journal › Article

Abstract

Motivation: Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the original RNA in the data, and so can be used to identify where antisense transcription may be implicated in regulating gene expression. However, our analysis of 199 strand-specific RNA-Seq experiments reveals that spurious antisense reads are often present in these datasets at levels greater than 1% of sense gene expression levels. Furthermore, these levels can vary substantially even between replicates in the same experiment, potentially disrupting any downstream analysis if the incorrectly assigned antisense counts dominate the set of genes with high antisense transcription levels. Currently, no tools exist to detect or correct for this spurious antisense signal.

Results: Our tool, RoSA (Removal of Spurious Antisense), detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. It uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors. We demonstrate the utility of our tool for filtering out spurious antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment.
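The idea of a correction factor can be illustrated with a minimal sketch. It is not RoSA's implementation: the `GeneCounts` record, the function names, and the formulas are hypothetical. It assumes (1) that ERCC spike-ins carry no genuine antisense transcription, so their antisense-to-sense ratio estimates a global spurious rate, and (2) that antisense reads following a gene's own sense splice junctions are spurious copies of sense reads, giving a gene-specific rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneCounts:
    """Hypothetical per-feature read counts for one sample."""
    sense: int              # reads aligned to the feature's own strand
    antisense: int          # reads aligned to the opposite strand
    sense_spliced: int      # sense reads spanning the feature's splice junctions
    antisense_spliced: int  # antisense reads spanning those same junctions


def global_factor_from_spikeins(spikeins: List[GeneCounts]) -> float:
    """Assumed global spurious-antisense rate: total antisense reads divided
    by total sense reads over all spike-ins (0.0 if there is no sense signal)."""
    sense = sum(s.sense for s in spikeins)
    antisense = sum(s.antisense for s in spikeins)
    return antisense / sense if sense else 0.0


def gene_factor_from_splicing(g: GeneCounts) -> float:
    """Assumed gene-specific rate: antisense reads that share the sense
    splice junctions, relative to sense reads spanning those junctions."""
    return g.antisense_spliced / g.sense_spliced if g.sense_spliced else 0.0


def corrected_antisense(g: GeneCounts, factor: float) -> float:
    """Subtract the expected spurious antisense count (factor * sense reads),
    flooring the result at zero."""
    return max(0.0, g.antisense - factor * g.sense)
```

For example, with 18 antisense reads against 1,500 sense reads across the spike-ins, the global factor is 0.012, so a gene with 2,000 sense and 60 antisense reads keeps about 36 antisense reads after correction. The floor at zero reflects the assumption that any antisense signal below the expected spurious level is indistinguishable from noise.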

Original language: English
Article number: 819
Number of pages: 7
Journal: F1000 Research
Volume: 8
DOI: 10.1101/425900
Publication status: E-pub ahead of print - 10 Jun 2019

Cite this

@article{6b9f18e24dee419e8a5fe4b218a7040e,
title = "Detection and Mitigation of Spurious Antisense RNA-seq Reads with RoSA",
abstract = "Motivation: Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the original RNA in the data, and so can be used to identify where antisense transcription may be implicated in regulating gene expression. However, our analysis of 199 strand-specific RNA-Seq experiments reveals that spurious antisense reads are often present in these datasets at levels greater than 1{\%} of sense gene expression levels. Furthermore, these levels can vary substantially even between replicates in the same experiment, potentially disrupting any downstream analysis, if the incorrectly assigned antisense counts dominate the set of genes with high antisense transcription levels. Currently, no tools exist to detect or correct for this spurious antisense signal. Results: Our tool, RoSA (Removal of Spurious Antisense), detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. It uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors. We demonstrate the utility of our tool to filter out spurious antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment.",
author = "Kira Mourao and Nicholas Schurch and Radoslaw Lukoszek and Kimon Froussios and Katarzyna Mackinnon and Celine Duc and Gordon Simpson and Geoffrey Barton",
note = "This work has been supported by the Biotechnology and Biological Sciences Research Council [BB/M004155/1, BB/M010066/1] to G.J.B. and G.G.S.",
year = "2019",
month = "6",
day = "10",
doi = "10.1101/425900",
language = "English",
volume = "8",
journal = "F1000 Research",
issn = "2046-1402",
publisher = "F1000Research",

}

TY - JOUR

T1 - Detection and Mitigation of Spurious Antisense RNA-seq Reads with RoSA

AU - Mourao, Kira

AU - Schurch, Nicholas

AU - Lukoszek, Radoslaw

AU - Froussios, Kimon

AU - Mackinnon, Katarzyna

AU - Duc, Celine

AU - Simpson, Gordon

AU - Barton, Geoffrey

N1 - This work has been supported by the Biotechnology and Biological Sciences Research Council [BB/M004155/1, BB/M010066/1] to G.J.B. and G.G.S.

PY - 2019/6/10

Y1 - 2019/6/10

AB - Motivation: Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the original RNA in the data, and so can be used to identify where antisense transcription may be implicated in regulating gene expression. However, our analysis of 199 strand-specific RNA-Seq experiments reveals that spurious antisense reads are often present in these datasets at levels greater than 1% of sense gene expression levels. Furthermore, these levels can vary substantially even between replicates in the same experiment, potentially disrupting any downstream analysis, if the incorrectly assigned antisense counts dominate the set of genes with high antisense transcription levels. Currently, no tools exist to detect or correct for this spurious antisense signal. Results: Our tool, RoSA (Removal of Spurious Antisense), detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. It uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors. We demonstrate the utility of our tool to filter out spurious antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment.

U2 - 10.1101/425900

DO - 10.1101/425900

M3 - Article

VL - 8

JO - F1000 Research

JF - F1000 Research

SN - 2046-1402

M1 - 819

ER -