A critical comparison of technologies for a plant genome sequencing project

Pirita Paajanen, George Kettleborough, Elena Lopez-Girona, Michael Giolai, Darren Heavens, David Baker, Ashleigh Lister, Gail Wilde, Ingo Hein, Iain Macaulay, Glenn J. Bryan, Matthew D. Clark (Lead / Corresponding author)

Research output: Contribution to journal › Article

Abstract

A high-quality genome sequence of your model organism is an essential starting point for many studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimises their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on the human genome. However, plant genomes can be much larger and more repetitive than the human genome, and plant biology makes it difficult to obtain high-quality DNA free from contaminants. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than those for vertebrates. Here we compare Illumina short reads, PacBio long reads, 10x Genomics linked reads, Dovetail Hi-C and BioNano Genomics optical maps, singly and combined, in producing high-quality, long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as for their DNA requirements, compute requirements and sequencing costs. We expect our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.
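The abstract compares assemblies partly on assembly statistics but does not say which ones. A common contiguity statistic is N50: the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. The Python sketch below is a minimal illustration under that standard definition, not the authors' benchmarking pipeline; the function name `n50` and the toy contig lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    # Sort longest first so the cumulative sum reaches half the total
    # using as few sequences as possible.
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid rounding half of an odd total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # only reached for invalid (negative) lengths


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(n50(toy_contigs))  # 8: 10 + 9 + 8 = 27, half of the 54 total
```

Because N50 is computed relative to the assembly's own total length, an assembly that simply omits hard, repetitive regions can still report a high N50; variants such as NG50, which use an estimated genome size as the denominator instead, are often reported alongside it.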

Language: English
Journal: GigaScience
Early online date: 9 Jan 2019
DOI: 10.1093/gigascience/giy163
Publication status: E-pub ahead of print, 9 Jan 2019

Keywords

  • assembly
  • long reads
  • short reads
  • optical mapping
  • Pacific Biosciences
  • PacBio
  • 10x Genomics

Cite this

Paajanen, P., Kettleborough, G., Lopez-Girona, E., Giolai, M., Heavens, D., Baker, D., ... Clark, M. D. (2019). A critical comparison of technologies for a plant genome sequencing project. GigaScience. https://doi.org/10.1093/gigascience/giy163
Paajanen, Pirita; Kettleborough, George; Lopez-Girona, Elena; Giolai, Michael; Heavens, Darren; Baker, David; Lister, Ashleigh; Wilde, Gail; Hein, Ingo; Macaulay, Iain; Bryan, Glenn J.; Clark, Matthew D. / A critical comparison of technologies for a plant genome sequencing project. In: GigaScience. 2019.
@article{86cfedc9ccf44c86b96f1db7b85aac7b,
title = "A critical comparison of technologies for a plant genome sequencing project",
keywords = "assembly, long reads, short reads, optical mapping, Pacific Biosciences, PacBio, 10x Genomics",
author = "Pirita Paajanen and George Kettleborough and Elena Lopez-Girona and Michael Giolai and Darren Heavens and David Baker and Ashleigh Lister and Gail Wilde and Ingo Hein and Iain Macaulay and Bryan, {Glenn J.} and Clark, {Matthew D.}",
note = "We thank Lawrence Percival-Alwyn and Walter Verweij for their assistance in library preparation and analysis, and Michael Bevan for critical reading of this manuscript. This work was funded by BBSRC project grants BB/K019325/1 and BB/K019090/1. This work was strategically funded by the BBSRC Core Strategic Programme Grant BB/CSP17270/1 at the Earlham Institute. High-throughput sequencing and library construction were delivered via the BBSRC National Capability in Genomics (BB/CCG1720/1) at the Earlham Institute (EI, formerly The Genome Analysis Centre, Norwich) by members of the Platforms and Pipelines Group.",
year = "2019",
month = "1",
day = "9",
doi = "10.1093/gigascience/giy163",
language = "English",
journal = "GigaScience",
issn = "2047-217X",
publisher = "Oxford University Press",

}

Paajanen, P, Kettleborough, G, Lopez-Girona, E, Giolai, M, Heavens, D, Baker, D, Lister, A, Wilde, G, Hein, I, Macaulay, I, Bryan, GJ & Clark, MD 2019, 'A critical comparison of technologies for a plant genome sequencing project', GigaScience. https://doi.org/10.1093/gigascience/giy163

A critical comparison of technologies for a plant genome sequencing project. / Paajanen, Pirita; Kettleborough, George; Lopez-Girona, Elena; Giolai, Michael; Heavens, Darren; Baker, David; Lister, Ashleigh; Wilde, Gail; Hein, Ingo; Macaulay, Iain; Bryan, Glenn J.; Clark, Matthew D. (Lead / Corresponding author).

In: GigaScience, 09.01.2019.

TY  - JOUR
T1  - A critical comparison of technologies for a plant genome sequencing project
AU  - Paajanen, Pirita
AU  - Kettleborough, George
AU  - Lopez-Girona, Elena
AU  - Giolai, Michael
AU  - Heavens, Darren
AU  - Baker, David
AU  - Lister, Ashleigh
AU  - Wilde, Gail
AU  - Hein, Ingo
AU  - Macaulay, Iain
AU  - Bryan, Glenn J.
AU  - Clark, Matthew D.
PY  - 2019/1/9
Y1  - 2019/1/9
KW  - assembly
KW  - long reads
KW  - short reads
KW  - optical mapping
KW  - Pacific Biosciences
KW  - PacBio
KW  - 10x Genomics
UR  - https://discovery.dundee.ac.uk/en/publications/86cfedc9-ccf4-4c86-b96f-1db7b85aac7b
U2  - 10.1093/gigascience/giy163
DO  - 10.1093/gigascience/giy163
M3  - Article
JO  - GigaScience
T2  - GigaScience
JF  - GigaScience
SN  - 2047-217X
ER  -

Paajanen P, Kettleborough G, Lopez-Girona E, Giolai M, Heavens D, Baker D et al. A critical comparison of technologies for a plant genome sequencing project. GigaScience. 2019 Jan 9. https://doi.org/10.1093/gigascience/giy163