Abstract
Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and for differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies capture transcript structures with improved integrity, including alternative splicing events and transcription start and polyadenylation sites. However, their accuracy is significantly affected by sequencing errors, mRNA degradation, and incomplete cDNA synthesis.
Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq, with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature for distinguishing correct splice junctions from false ones, which are then removed. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts arising from degradation. Analysis of an Arabidopsis cold-response RNA-seq time series demonstrates that AtRTD3 is a major improvement over existing transcriptomes: it provides higher resolution of transcript expression profiling and identifies cold-induced differential usage of transcription start and polyadenylation sites.
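The abstract does not spell out how mismatch profiles are computed or thresholded, so the following is only an illustration of the general idea, not the AtRTD3 pipeline. This minimal Python sketch assumes each read has already been aligned without gaps to a fixed reference window spanning a candidate splice junction (upstream exon bases followed by downstream exon bases). It tabulates the per-position fraction of mismatching reads and flags a junction when mismatches pile up next to it, as would happen if reads were misaligned across a false junction. The input format, the function names `mismatch_profile` and `is_suspect_junction`, and the `near=5` and `max_rate=0.2` cut-offs are hypothetical choices.

```python
from typing import List, Sequence


def mismatch_profile(read_windows: Sequence[str], ref_window: str) -> List[float]:
    """Return, for each window position, the fraction of reads that mismatch the reference.

    Hypothetical input: every read window is a gapless alignment of exactly
    len(ref_window) bases around one candidate splice junction.
    """
    n = len(ref_window)
    counts = [0] * n
    for read in read_windows:
        if len(read) != n:
            raise ValueError("read window length must equal reference window length")
        for i, (base, ref_base) in enumerate(zip(read, ref_window)):
            if base != ref_base:
                counts[i] += 1
    if not read_windows:
        return [0.0] * n
    return [c / len(read_windows) for c in counts]


def is_suspect_junction(profile: Sequence[float], junction_index: int,
                        near: int = 5, max_rate: float = 0.2) -> bool:
    """Flag a junction whose mismatch rate exceeds max_rate within `near` bases of it.

    junction_index is the window position of the first downstream-exon base,
    so the upstream exon occupies positions [0, junction_index).
    The near/max_rate defaults are illustrative assumptions only.
    """
    start = max(0, junction_index - near)
    end = min(len(profile), junction_index + near)
    return max(profile[start:end], default=0.0) > max_rate


# Usage: a 10 nt window with the junction between positions 4 and 5.
ref = "AAAAACCCCC"
clean = mismatch_profile(["AAAAACCCCC"] * 4, ref)
shifted = mismatch_profile(["AAAAGCCCCC"] * 3 + ["AAAAACCCCC"], ref)
print(is_suspect_junction(clean, 5))    # False: no mismatches near the junction
print(is_suspect_junction(shifted, 5))  # True: 75% of reads mismatch 1 nt upstream
```

A real filter would also need to handle indels, sequencing-error background rates and read support, which this sketch deliberately leaves out.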
Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analyses of RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis in any species.
| Original language | English |
|---|---|
| Article number | 149 |
| Number of pages | 37 |
| Journal | Genome Biology |
| Volume | 23 |
| DOIs | |
| Publication status | Published - 7 Jul 2022 |
Keywords
- Arabidopsis
- Iso-seq
- Reference transcript dataset
- Splice junction
- Transcription start and end sites
- Alternative splicing
- Alternative polyadenylation
Projects
- 3 Finished
- A Reference Transcript Database for Improved Analysis of RNA-seq Data from Barley
Brown, J. (Investigator) & Waugh, R. (Investigator)
Biotechnology and Biological Sciences Research Council
1/10/18 → 30/09/20
Project: Research
- 16 ERA-CAPS Barley Yield Associated Networks (Joint with IPK Gatersleben as lead and University of Minnesota)
Waugh, R. (Investigator)
Biotechnology and Biological Sciences Research Council
13/08/18 → 30/06/22
Project: Research
- Dynamic Re-programming of the Cold Transcriptome in Arabidopsis (Joint with JHI)
Brown, J. (Investigator)
Biotechnology and Biological Sciences Research Council
1/04/17 → 31/07/20
Project: Research
Research output
- 81 Citations
- 1 Other contribution
- A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
Zhang, R., 6 Jun 2022, Zenodo.
Research output: Other contribution
Open Access
Datasets
- A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis (Accession: PRJNA755474, ID: 755474)
Zhang, R. (Creator), Kuo, R. I. (Creator), Coulter, M. (Creator), Calixto, C. (Creator), Entizne, J. (Creator) & Guo, W. (Creator), National Institutes of Health, 2022
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA755474
Dataset
- Additional file 1 of A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
Zhang, R. (Creator), Kuo, R. (Creator), Coulter, M. (Creator), Calixto, C. P. G. (Creator), Entizne, J. C. (Creator), Guo, W. (Creator), Marquez, Y. (Creator), Milne, L. (Creator), Riegler, S. (Creator), Matsui, A. (Creator), Tanaka, M. (Creator), Harvey, S. (Creator), Gao, Y. (Creator), Wießner-Kroh, T. (Creator), Paniagua, A. (Creator), Crespi, M. (Creator), Denby, K. (Creator), Hur, A. B. (Creator), Huq, E. (Creator), Jantsch, M. (Creator), Jarmolowski, A. (Creator), Koester, T. (Creator), Laubinger, S. (Creator), Li, Q. Q. (Creator), Gu, L. (Creator), Seki, M. (Creator), Staiger, D. (Creator), Sunkar, R. (Creator), Szweykowska-Kulinska, Z. (Creator), Tu, S.-L. (Creator), Wachter, A. (Creator), Waugh, R. (Creator), Xiong, L. (Creator), Zhang, X.-N. (Creator), Conesa, A. (Creator), Reddy, A. S. N. (Creator), Barta, A. (Creator), Kalyna, M. (Creator) & Brown, J. W. S. (Creator), figshare, 8 Jul 2022
DOI: 10.6084/m9.figshare.20267296
Dataset
- Arabidopsis Thaliana Reference Transcript Dataset 3 (AtRTD3)
Zhang, R. (Creator), Kuo, R. I. (Creator), Coulter, M. (Creator), Calixto, C. (Creator), Entizne, J. (Creator) & Guo, W. (Creator), Information & Computational Sciences - James Hutton Institute, 21 Feb 2022
Dataset