TY - CHAP
T1 - Euglena gracilis Genome and Transcriptome
T2 - Organelles, Nuclear Genome Assembly Strategies and Initial Features
AU - Ebenezer, ThankGod Echezona
AU - Carrington, Mark
AU - Lebert, Michael
AU - Kelly, Steven
AU - Field, Mark C.
N1 - © Springer International Publishing AG 2017
PY - 2017/4/21
Y1 - 2017/4/21
N2 - Euglena gracilis is a major component of the aquatic ecosystem and, together with closely related species, is ubiquitous worldwide. Euglenoids are an important group of protists that possess a secondarily acquired plastid and are relatives of the Kinetoplastidae, which themselves have global impact as disease agents. To understand the biology of E. gracilis, as well as to provide further insight into the evolution and origins of the Kinetoplastidae, we embarked on sequencing the nuclear genome; the plastid and mitochondrial genomes are already in the public domain. Earlier studies suggested an extensive nuclear DNA content, likely with a high degree of repetitive sequence, together with significant extrachromosomal elements. To produce a list of coding sequences we have combined transcriptome data from both published and new sources, as well as embarked on de novo sequencing using a combination of 454, Illumina paired-end libraries and long PacBio reads. Preliminary analysis suggests a surprisingly large genome approaching 2 Gbp, with a highly fragmented architecture and extensive repeat composition. Over 80% of the RNAseq reads from E. gracilis map to the assembled genome sequence, which is comparable with the well-assembled genomes of T. brucei and T. cruzi. In order to achieve this level of assembly we employed multiple informatics pipelines, which are discussed here. Finally, as a preliminary view of the genome architecture, we discuss the tubulin and calmodulin genes, which highlight potential novel splicing mechanisms.
AB - Euglena gracilis is a major component of the aquatic ecosystem and, together with closely related species, is ubiquitous worldwide. Euglenoids are an important group of protists that possess a secondarily acquired plastid and are relatives of the Kinetoplastidae, which themselves have global impact as disease agents. To understand the biology of E. gracilis, as well as to provide further insight into the evolution and origins of the Kinetoplastidae, we embarked on sequencing the nuclear genome; the plastid and mitochondrial genomes are already in the public domain. Earlier studies suggested an extensive nuclear DNA content, likely with a high degree of repetitive sequence, together with significant extrachromosomal elements. To produce a list of coding sequences we have combined transcriptome data from both published and new sources, as well as embarked on de novo sequencing using a combination of 454, Illumina paired-end libraries and long PacBio reads. Preliminary analysis suggests a surprisingly large genome approaching 2 Gbp, with a highly fragmented architecture and extensive repeat composition. Over 80% of the RNAseq reads from E. gracilis map to the assembled genome sequence, which is comparable with the well-assembled genomes of T. brucei and T. cruzi. In order to achieve this level of assembly we employed multiple informatics pipelines, which are discussed here. Finally, as a preliminary view of the genome architecture, we discuss the tubulin and calmodulin genes, which highlight potential novel splicing mechanisms.
KW - Euglena
KW - Next generation sequencing
KW - Genome assembly
KW - Tubulin
KW - Genome architecture
KW - Splicing
KW - Secondary endosymbiosis
U2 - 10.1007/978-3-319-54910-1_7
DO - 10.1007/978-3-319-54910-1_7
M3 - Chapter (peer-reviewed)
C2 - 28429320
SN - 9783319549088
T3 - Advances in Experimental Medicine and Biology
SP - 125
EP - 140
BT - Euglena
A2 - Schwartzbach, Steven D.
A2 - Shigeoka, Shigeru
PB - Springer International Publishing
CY - Switzerland
ER -