Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

Nicholas J. Schurch, Christian Cole, Alexander Sherstnev, Junfang Song, Celine Duc, Kate G. Storey, W. H. Irwin McLean, Sara J. Brown, Gordon G. Simpson, Geoffrey J. Barton (Lead / Corresponding author)

    Research output: Contribution to journalArticle

    12 Citations (Scopus)
    306 Downloads (Pure)

    Abstract

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data.
    Original languageEnglish
    Article numbere94270
    Number of pages20
    JournalPLoS ONE
    Volume9
    Issue number4
    DOIs
    Publication statusPublished - Apr 2014

    Fingerprint

    RNA Sequence Analysis
    Expressed Sequence Tags
    3' Untranslated Regions
    3' untranslated regions
    sequence analysis
    Genes
    Genome
    RNA
    loci
    genome
    Polyadenylation
    genes
    Introns
    exons
    introns
    Chickens
    Exons
    chickens
    Technology
    mutation

    Cite this

    @article{da18390fc7474a3887c6830f06a60459,
    title = "Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs",
    abstract = "The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data.",
    author = "Schurch, {Nicholas J.} and Christian Cole and Alexander Sherstnev and Junfang Song and Celine Duc and Storey, {Kate G.} and McLean, {W. H. Irwin} and Brown, {Sara J.} and Simpson, {Gordon G.} and Barton, {Geoffrey J.}",
    year = "2014",
    month = "4",
    doi = "10.1371/journal.pone.0094270",
    language = "English",
    volume = "9",
    journal = "PLoS ONE",
    issn = "1932-6203",
    publisher = "Public Library of Science",
    number = "4",

    }

    TY - JOUR

    T1 - Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    AU - Schurch, Nicholas J.

    AU - Cole, Christian

    AU - Sherstnev, Alexander

    AU - Song, Junfang

    AU - Duc, Celine

    AU - Storey, Kate G.

    AU - McLean, W. H. Irwin

    AU - Brown, Sara J.

    AU - Simpson, Gordon G.

    AU - Barton, Geoffrey J.

    PY - 2014/4

    Y1 - 2014/4

    N2 - The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data.

    AB - The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data.

    U2 - 10.1371/journal.pone.0094270

    DO - 10.1371/journal.pone.0094270

    M3 - Article

    C2 - 24722185

    VL - 9

    JO - PLoS ONE

    JF - PLoS ONE

    SN - 1932-6203

    IS - 4

    M1 - e94270

    ER -