A critical comparison of technologies for a plant genome sequencing project

Pirita Paajanen, George Kettleborough, Elena Lopez-Girona, Michael Giolai, Darren Heavens, David Baker, Ashleigh Lister, Gail Wilde, Ingo Hein, Iain Macaulay, Glenn J. Bryan, Matthew D. Clark (Lead / Corresponding author)

Research output: Contribution to journal > Article

Abstract

BACKGROUND: A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates.

RESULTS: Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs.

CONCLUSIONS: The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.

Original language: English
Journal: Giga Science
Volume: 8
Issue number: 3
Early online date: 9 Jan 2019
DOI: https://doi.org/10.1093/gigascience/giy163
Publication status: Published - 9 Jan 2019

Fingerprint

Plant Genome
Genes
Genome
Technology
Benchmarking
Human Genome
Genomics
Microbial Genome
Solanum
DNA
Solanum tuberosum
Biochemistry
Vertebrates
Clone Cells
Costs and Cost Analysis
Statistics
Impurities

Keywords

  • assembly
  • long reads
  • short reads
  • optical mapping
  • Pacific Biosciences
  • PacBio
  • 10x Genomics

Cite this

Paajanen, P., Kettleborough, G., Lopez-Girona, E., Giolai, M., Heavens, D., Baker, D., ... Clark, M. D. (2019). A critical comparison of technologies for a plant genome sequencing project. Giga Science, 8(3). https://doi.org/10.1093/gigascience/giy163
Paajanen, Pirita ; Kettleborough, George ; Lopez-Girona, Elena ; Giolai, Michael ; Heavens, Darren ; Baker, David ; Lister, Ashleigh ; Wilde, Gail ; Hein, Ingo ; Macaulay, Iain ; Bryan, Glenn J. ; Clark, Matthew D. / A critical comparison of technologies for a plant genome sequencing project. In: Giga Science. 2019 ; Vol. 8, No. 3.
@article{86cfedc9ccf44c86b96f1db7b85aac7b,
title = "A critical comparison of technologies for a plant genome sequencing project",
abstract = "BACKGROUND: A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. RESULTS: Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs. CONCLUSIONS: The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.",
keywords = "assembly, long reads, short reads, optical mapping, Pacific Biosciences, PacBio, 10x Genomics",
author = "Pirita Paajanen and George Kettleborough and Elena Lopez-Girona and Michael Giolai and Darren Heavens and David Baker and Ashleigh Lister and Gail Wilde and Ingo Hein and Iain Macaulay and Bryan, {Glenn J.} and Clark, {Matthew D.}",
note = "We thank Lawrence Percival-Alwyn and Walter Verweij for their assistance in library preparation and analysis, and Michael Bevan for critical reading of this manuscript. This work was funded with BBSRC project grants (BB/K019325/1) and (BB/K019090/1). This work was strategically funded by the BBSRC, Core Strategic Programme Grant (BB/CSP17270/1) at the Earlham Institute. High-throughput sequencing and library construction was delivered via the BBSRC National Capability in Genomics (BB/CCG1720/1) at the Earlham Institute (EI, formerly The Genome Analysis Centre, Norwich), by members of the Platforms and Pipelines Group.",
year = "2019",
month = "1",
day = "9",
doi = "10.1093/gigascience/giy163",
language = "English",
volume = "8",
journal = "Giga Science",
issn = "2047-217X",
publisher = "Oxford University Press",
number = "3",

}

Paajanen, P, Kettleborough, G, Lopez-Girona, E, Giolai, M, Heavens, D, Baker, D, Lister, A, Wilde, G, Hein, I, Macaulay, I, Bryan, GJ & Clark, MD 2019, 'A critical comparison of technologies for a plant genome sequencing project', Giga Science, vol. 8, no. 3. https://doi.org/10.1093/gigascience/giy163

A critical comparison of technologies for a plant genome sequencing project. / Paajanen, Pirita; Kettleborough, George; Lopez-Girona, Elena; Giolai, Michael; Heavens, Darren; Baker, David; Lister, Ashleigh; Wilde, Gail; Hein, Ingo; Macaulay, Iain; Bryan, Glenn J.; Clark, Matthew D. (Lead / Corresponding author).

In: Giga Science, Vol. 8, No. 3, 09.01.2019.

TY - JOUR

T1 - A critical comparison of technologies for a plant genome sequencing project

AU - Paajanen, Pirita

AU - Kettleborough, George

AU - Lopez-Girona, Elena

AU - Giolai, Michael

AU - Heavens, Darren

AU - Baker, David

AU - Lister, Ashleigh

AU - Wilde, Gail

AU - Hein, Ingo

AU - Macaulay, Iain

AU - Bryan, Glenn J.

AU - Clark, Matthew D.

N1 - We thank Lawrence Percival-Alwyn and Walter Verweij for their assistance in library preparation and analysis, and Michael Bevan for critical reading of this manuscript. This work was funded with BBSRC project grants (BB/K019325/1) and (BB/K019090/1). This work was strategically funded by the BBSRC, Core Strategic Programme Grant (BB/CSP17270/1) at the Earlham Institute. High-throughput sequencing and library construction was delivered via the BBSRC National Capability in Genomics (BB/CCG1720/1) at the Earlham Institute (EI, formerly The Genome Analysis Centre, Norwich), by members of the Platforms and Pipelines Group.

PY - 2019/1/9

Y1 - 2019/1/9

AB - BACKGROUND: A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. RESULTS: Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs. CONCLUSIONS: The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.

KW - assembly

KW - long reads

KW - short reads

KW - optical mapping

KW - Pacific Biosciences

KW - PacBio

KW - 10x Genomics

UR - http://www.scopus.com/inward/record.url?scp=85063272046&partnerID=8YFLogxK

U2 - 10.1093/gigascience/giy163

DO - 10.1093/gigascience/giy163

M3 - Article

VL - 8

JO - Giga Science

JF - Giga Science

SN - 2047-217X

IS - 3

ER -

Paajanen P, Kettleborough G, Lopez-Girona E, Giolai M, Heavens D, Baker D et al. A critical comparison of technologies for a plant genome sequencing project. Giga Science. 2019 Jan 9;8(3). https://doi.org/10.1093/gigascience/giy163