Improved annotation with de novo transcriptome assembly in four social amoeba species

Reema Singh, Hajara M. Lawal, Christina Schilde, Gernot Glöckner, Geoffrey J. Barton, Pauline Schaap, Christian Cole (Lead / Corresponding author)

Research output: Contribution to journal › Article

Abstract

BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species.

RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models, including changes to hundreds or thousands of protein products. Putative novel genes were also identified, and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum.

CONCLUSIONS: In taking a whole-transcriptome approach to genome annotation with empirical data, we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

Original language: English
Article number: 120
Pages (from-to): 1-17
Number of pages: 17
Journal: BMC Genomics
Volume: 18
DOI: 10.1186/s12864-017-3505-0
Publication status: Published - 31 Jan 2017

Fingerprint

Amoeba
Transcriptome
Genome
RNA Sequence Analysis
Molecular Sequence Annotation
Genes
Benchmarking
Protein Isoforms
Proteins

Keywords

  • Dictyostelia
  • Social amoeba
  • De novo
  • Transcriptome assembly
  • RNA-seq

Cite this

@article{620e0f68d1694c58b10ac181270c65ea,
title = "Improved annotation with de novo transcriptome assembly in four social amoeba species",
abstract = "BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50{\%} of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. CONCLUSIONS: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.",
keywords = "Dictyostelia, Social amoeba, De novo, Transcriptome assembly, RNA-seq",
author = "Reema Singh and Lawal, {Hajara M.} and Christina Schilde and Gernot Gl{\"o}ckner and Barton, {Geoffrey J.} and Pauline Schaap and Christian Cole",
note = "RS, CS, HLM and PS are funded by BBSRC grant BB/K000799/1 and Wellcome Trust grant 100293/Z/12/Z. The GSU was funded under the Wellcome Trust Strategic Award 098439/Z/12/Z.",
year = "2017",
month = "1",
day = "31",
doi = "10.1186/s12864-017-3505-0",
language = "English",
volume = "18",
pages = "1--17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "Springer Verlag",
}

Improved annotation with de novo transcriptome assembly in four social amoeba species. / Singh, Reema; Lawal, Hajara M.; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J.; Schaap, Pauline; Cole, Christian (Lead / Corresponding author).

In: BMC Genomics, Vol. 18, 120, 31.01.2017, p. 1-17.


TY  - JOUR
T1  - Improved annotation with de novo transcriptome assembly in four social amoeba species
AU  - Singh, Reema
AU  - Lawal, Hajara M.
AU  - Schilde, Christina
AU  - Glöckner, Gernot
AU  - Barton, Geoffrey J.
AU  - Schaap, Pauline
AU  - Cole, Christian
N1  - RS, CS, HLM and PS are funded by BBSRC grant BB/K000799/1 and Wellcome Trust grant 100293/Z/12/Z. The GSU was funded under the Wellcome Trust Strategic Award 098439/Z/12/Z.
PY  - 2017/1/31
Y1  - 2017/1/31
N2  - BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. CONCLUSIONS: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.
AB  - BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. CONCLUSIONS: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.
KW  - Dictyostelia
KW  - Social amoeba
KW  - De novo
KW  - Transcriptome assembly
KW  - RNA-seq
U2  - 10.1186/s12864-017-3505-0
DO  - 10.1186/s12864-017-3505-0
M3  - Article
C2  - 28143409
VL  - 18
SP  - 1
EP  - 17
JO  - BMC Genomics
JF  - BMC Genomics
SN  - 1471-2164
M1  - 120
ER  -