BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq

Paulo Rapazote-Flores, Micha Bayer, Linda Milne, Claus-Dieter Mayer, John Fuller, Wenbin Guo, Pete E Hedley, Jenny Morris, Claire Halpin, Jason Kam, Sarah M McKim, Monika Zwirek, M Cristina Casao, Abdellah Barakate, Miriam Schreiber, Gordon Stephen, Runxuan Zhang, John W S Brown, Robbie Waugh, Craig G Simpson (Lead / Corresponding author)

Research output: Contribution to journal (Article)


Abstract

BACKGROUND: The time required to analyse RNA-seq data varies considerably because computational assembly, quantification of gene expression and splicing analysis are performed as discrete steps. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.

RESULTS: A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts - BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, and transcripts were assembled and aligned to the barley cv. Morex reference genome (Mascher et al. Nature; 544: 427-433, 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al. Plant Physiol; 156: 20-28, 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in the 5' and 3' UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data from five barley organs/tissues. This analysis identified 20,972 significantly differentially expressed genes, 2791 differentially alternatively spliced genes and 2768 transcripts with differential transcript usage.

CONCLUSION: A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.
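
As an illustration of the kind of summary that transcript-level quantification makes possible, the following minimal Python sketch sums per-transcript abundances to gene level and computes each transcript's share of its gene's expression, the quantity that underlies differential transcript usage. The function name, the identifiers and the TPM values are hypothetical and are not taken from BaRTv1.0 or from the paper's pipeline; only the gene and transcript totals in the last line come from the abstract.

```python
from collections import defaultdict


def summarise_isoforms(tpm_by_transcript, gene_of):
    """Return (gene-level TPM, per-transcript usage fraction).

    tpm_by_transcript: dict mapping transcript ID -> abundance (TPM)
    gene_of:           dict mapping transcript ID -> gene ID
    """
    # Gene-level expression is the sum of its transcripts' abundances.
    gene_tpm = defaultdict(float)
    for tx, tpm in tpm_by_transcript.items():
        gene_tpm[gene_of[tx]] += tpm

    # Usage fraction: a transcript's share of its gene's total abundance.
    usage = {}
    for tx, tpm in tpm_by_transcript.items():
        total = gene_tpm[gene_of[tx]]
        usage[tx] = tpm / total if total > 0 else 0.0
    return dict(gene_tpm), usage


# Hypothetical identifiers and abundances, for illustration only.
gene_of = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
tpm = {"geneA.1": 30.0, "geneA.2": 10.0, "geneB.1": 5.0}
gene_tpm, usage = summarise_isoforms(tpm, gene_of)

# Mean transcripts per gene, from the totals reported in the abstract.
transcripts_per_gene = 177_240 / 60_444
```

With the totals reported above, `transcripts_per_gene` comes to roughly 2.9 transcripts per gene on average.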

Original language: English
Article number: 968
Number of pages: 17
Journal: BMC Genomics
Volume: 20
Issue number: 1
DOIs: https://doi.org/10.1186/s12864-019-6243-7
Publication status: Published - 11 Dec 2019

Fingerprint

Hordeum
Transcriptome
RNA
Genes
Protein Isoforms
Gene Expression
Benchmarking
Recombinant DNA
Datasets
Salmon
5' Untranslated Regions
3' Untranslated Regions
Complementary DNA
Genome
Databases
Polymerase Chain Reaction

Cite this

Rapazote-Flores, P., Bayer, M., Milne, L., Mayer, C-D., Fuller, J., Guo, W., ... Simpson, C. G. (2019). BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq. BMC Genomics, 20(1), [968]. https://doi.org/10.1186/s12864-019-6243-7
@article{4b4091d1413f464d96e8a89389e22e3f,
title = "BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq",
abstract = "BACKGROUND: The time required to analyse RNA-seq data varies considerably because computational assembly, quantification of gene expression and splicing analysis are performed as discrete steps. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants. RESULTS: A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts - BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, and transcripts were assembled and aligned to the barley cv. Morex reference genome (Mascher et al. Nature; 544: 427-433, 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al. Plant Physiol; 156: 20-28, 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in the 5' and 3' UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data from five barley organs/tissues. This analysis identified 20,972 significantly differentially expressed genes, 2791 differentially alternatively spliced genes and 2768 transcripts with differential transcript usage. CONCLUSION: A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.",
author = "Paulo Rapazote-Flores and Micha Bayer and Linda Milne and Claus-Dieter Mayer and John Fuller and Wenbin Guo and Hedley, {Pete E} and Jenny Morris and Claire Halpin and Jason Kam and McKim, {Sarah M} and Monika Zwirek and Casao, {M Cristina} and Abdellah Barakate and Miriam Schreiber and Gordon Stephen and Runxuan Zhang and Brown, {John W S} and Robbie Waugh and Simpson, {Craig G}",
year = "2019",
month = "12",
day = "11",
doi = "10.1186/s12864-019-6243-7",
language = "English",
volume = "20",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "Springer Verlag",
number = "1",

}


TY - JOUR

T1 - BaRTv1.0

T2 - an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq

AU - Rapazote-Flores, Paulo

AU - Bayer, Micha

AU - Milne, Linda

AU - Mayer, Claus-Dieter

AU - Fuller, John

AU - Guo, Wenbin

AU - Hedley, Pete E

AU - Morris, Jenny

AU - Halpin, Claire

AU - Kam, Jason

AU - McKim, Sarah M

AU - Zwirek, Monika

AU - Casao, M Cristina

AU - Barakate, Abdellah

AU - Schreiber, Miriam

AU - Stephen, Gordon

AU - Zhang, Runxuan

AU - Brown, John W S

AU - Waugh, Robbie

AU - Simpson, Craig G

PY - 2019/12/11

Y1 - 2019/12/11

AB - BACKGROUND: The time required to analyse RNA-seq data varies considerably because computational assembly, quantification of gene expression and splicing analysis are performed as discrete steps. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants. RESULTS: A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts - BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, and transcripts were assembled and aligned to the barley cv. Morex reference genome (Mascher et al. Nature; 544: 427-433, 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al. Plant Physiol; 156: 20-28, 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in the 5' and 3' UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data from five barley organs/tissues. This analysis identified 20,972 significantly differentially expressed genes, 2791 differentially alternatively spliced genes and 2768 transcripts with differential transcript usage. CONCLUSION: A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.

U2 - 10.1186/s12864-019-6243-7

DO - 10.1186/s12864-019-6243-7

M3 - Article

C2 - 31829136

VL - 20

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 968

ER -