Inevitability and containment of replication errors for eukaryotic genome lengths spanning Megabase to Gigabase

Mohammed Al Mamun, Luca Albergante, Alberto Moreno, Jamie T. Carrington, John Blow, Timothy J. Newman (Lead / Corresponding author)

Research output: Contribution to journal › Article

11 Citations (Scopus)
259 Downloads (Pure)

Abstract

The replication of DNA is initiated at particular sites on the genome called replication origins (ROs). Understanding the constraints that regulate the distribution of ROs across different organisms is fundamental for quantifying the degree of replication errors and their downstream consequences. Using a simple probabilistic model we generate a set of predictions on the extreme sensitivity of error rates to the distribution of ROs, and how this distribution must therefore be tuned for genomes of vastly different sizes. As genome size changes from Megabases to Gigabases we predict that regularity of RO spacing is lost, that large gaps between ROs dominate error rates but are heavily constrained by the mean stalling distance of replication forks, and that for genomes spanning ~100 Megabases to ~10 Gigabases errors become increasingly inevitable but their number remains very small (three or less). Our theory predicts that the number of errors becomes significantly higher for genome sizes greater than ~10 Gigabases. We test these predictions against datasets in yeast, Arabidopsis, Drosophila and human, and also through direct experimentation on two different human cell lines. Agreement of theoretical predictions with experiment and datasets is found in all cases, resulting in a picture of great simplicity, whereby the density and positioning of ROs explain the replication error rates for the entire range of eukaryotes for which data is available. The theory highlights three domains of error rates: negligible (yeast), tolerable (metazoan) and high (some plants), with the human genome at the extreme end of the middle domain.
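The abstract does not give the model's equations, so what follows is only a toy sketch of how a Poisson-type argument could link inter-origin gap length to replication failure. It rests on assumptions that are not taken from the paper: stall events within a gap are treated as a Poisson process with mean `gap_bp / mean_stall_bp`, a gap is counted as an error only when two or more stalls occur in it (one for each converging fork), and gaps are treated as independent. The function names and the numerical values are hypothetical.

```python
import math


def p_gap_fails(gap_bp: float, mean_stall_bp: float) -> float:
    """Probability that one inter-origin gap is left incompletely replicated.

    Assumption (not from the source): the number of stall events in the gap
    is Poisson with mean x = gap_bp / mean_stall_bp, and the gap fails only
    if at least two stalls occur, so P(fail) = 1 - P(0 stalls) - P(1 stall).
    """
    x = gap_bp / mean_stall_bp
    return 1.0 - math.exp(-x) * (1.0 + x)


def expected_errors(gaps_bp, mean_stall_bp: float) -> float:
    """Expected number of failed gaps, treating gaps as independent (assumption)."""
    return sum(p_gap_fails(g, mean_stall_bp) for g in gaps_bp)


if __name__ == "__main__":
    # Hypothetical mean stalling distance, chosen only for illustration.
    MEAN_STALL_BP = 1e7

    # Two hypothetical origin layouts covering the same 1 Mb of genome.
    regular = [1e5] * 10               # evenly spaced origins
    clustered = [1e4] * 9 + [9.1e5]    # same total length, one large gap

    print("regular spacing :", expected_errors(regular, MEAN_STALL_BP))
    print("one large gap   :", expected_errors(clustered, MEAN_STALL_BP))
```

Under these assumptions the failure probability grows roughly as the square of the gap length when gaps are short relative to the mean stalling distance, so a single large gap contributes more to the expected error count than many small gaps of the same total length. This is in line with the abstract's statement that large gaps between ROs dominate error rates, though the paper's actual model may differ in detail.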
Original language: English
Pages (from-to): E5765-E5774
Number of pages: 10
Journal: Proceedings of the National Academy of Sciences
Volume: 113
Issue number: 39
Early online date: 14 Sep 2016
DOIs: 10.1073/pnas.1603241113
Publication status: Published - 27 Sep 2016

Fingerprint

Replication Origin
Genome
Genome Size
Yeasts
Statistical Models
Human Genome
Eukaryota
DNA Replication
Arabidopsis
Drosophila
Cell Line

Keywords

  • eukaryotes
  • genome length
  • replication error
  • Poisson distribution
  • mathematical modeling

Cite this

Al Mamun, Mohammed; Albergante, Luca; Moreno, Alberto; Carrington, Jamie T.; Blow, John; Newman, Timothy J. / Inevitability and containment of replication errors for eukaryotic genome lengths spanning Megabase to Gigabase. In: Proceedings of the National Academy of Sciences. 2016; Vol. 113, No. 39. pp. E5765-E5774.
@article{0115d9e6746b49ac8f4addec8ae5725e,
title = "Inevitability and containment of replication errors for eukaryotic genome lengths spanning Megabase to Gigabase",
abstract = "The replication of DNA is initiated at particular sites on the genome called replication origins (ROs). Understanding the constraints that regulate the distribution of ROs across different organisms is fundamental for quantifying the degree of replication errors and their downstream consequences. Using a simple probabilistic model we generate a set of predictions on the extreme sensitivity of error rates to the distribution of ROs, and how this distribution must therefore be tuned for genomes of vastly different sizes. As genome size changes from Megabases to Gigabases we predict that regularity of RO spacing is lost, that large gaps between ROs dominate error rates but are heavily constrained by the mean stalling distance of replication forks, and that for genomes spanning ~100 Megabases to ~10 Gigabases errors become increasingly inevitable but their number remains very small (three or less). Our theory predicts that the number of errors becomes significantly higher for genome sizes greater than ~10 Gigabases. We test these predictions against datasets in yeast, Arabidopsis, Drosophila and human, and also through direct experimentation on two different human cell lines. Agreement of theoretical predictions with experiment and datasets is found in all cases, resulting in a picture of great simplicity, whereby the density and positioning of ROs explain the replication error rates for the entire range of eukaryotes for which data is available. The theory highlights three domains of error rates: negligible (yeast), tolerable (metazoan) and high (some plants), with the human genome at the extreme end of the middle domain.",
keywords = "eukaryotes, genome length, replication error, Poisson distribution, mathematical modeling",
author = "{Al Mamun}, Mohammed and Albergante, Luca and Moreno, Alberto and Carrington, {Jamie T.} and Blow, John and Newman, {Timothy J.}",
note = "Funding for this research was provided by: National Institutes of Health (U54 CA143682); Wellcome Trust (WT096598MA); Cancer Research UK (C303/A14301).",
year = "2016",
month = "9",
day = "27",
doi = "10.1073/pnas.1603241113",
language = "English",
volume = "113",
pages = "E5765--E5774",
journal = "Proceedings of the National Academy of Sciences",
issn = "0027-8424",
publisher = "National Academy of Sciences",
number = "39"
}

Inevitability and containment of replication errors for eukaryotic genome lengths spanning Megabase to Gigabase. / Al Mamun, Mohammed; Albergante, Luca; Moreno, Alberto; Carrington, Jamie T.; Blow, John; Newman, Timothy J. (Lead / Corresponding author).

In: Proceedings of the National Academy of Sciences, Vol. 113, No. 39, 27.09.2016, p. E5765-E5774.


TY - JOUR

T1 - Inevitability and containment of replication errors for eukaryotic genome lengths spanning Megabase to Gigabase

AU - Al Mamun, Mohammed

AU - Albergante, Luca

AU - Moreno, Alberto

AU - Carrington, Jamie T.

AU - Blow, John

AU - Newman, Timothy J.

N1 - Funding for this research was provided by: National Institutes of Health (U54 CA143682); Wellcome Trust (WT096598MA); Cancer Research UK (C303/A14301).

PY - 2016/9/27

Y1 - 2016/9/27

AB - The replication of DNA is initiated at particular sites on the genome called replication origins (ROs). Understanding the constraints that regulate the distribution of ROs across different organisms is fundamental for quantifying the degree of replication errors and their downstream consequences. Using a simple probabilistic model we generate a set of predictions on the extreme sensitivity of error rates to the distribution of ROs, and how this distribution must therefore be tuned for genomes of vastly different sizes. As genome size changes from Megabases to Gigabases we predict that regularity of RO spacing is lost, that large gaps between ROs dominate error rates but are heavily constrained by the mean stalling distance of replication forks, and that for genomes spanning ~100 Megabases to ~10 Gigabases errors become increasingly inevitable but their number remains very small (three or less). Our theory predicts that the number of errors becomes significantly higher for genome sizes greater than ~10 Gigabases. We test these predictions against datasets in yeast, Arabidopsis, Drosophila and human, and also through direct experimentation on two different human cell lines. Agreement of theoretical predictions with experiment and datasets is found in all cases, resulting in a picture of great simplicity, whereby the density and positioning of ROs explain the replication error rates for the entire range of eukaryotes for which data is available. The theory highlights three domains of error rates: negligible (yeast), tolerable (metazoan) and high (some plants), with the human genome at the extreme end of the middle domain.

KW - eukaryotes

KW - genome length

KW - replication error

KW - Poisson distribution

KW - mathematical modeling

U2 - 10.1073/pnas.1603241113

DO - 10.1073/pnas.1603241113

M3 - Article

C2 - 27630194

VL - 113

SP - E5765

EP - E5774

JO - Proceedings of the National Academy of Sciences

JF - Proceedings of the National Academy of Sciences

SN - 0027-8424

IS - 39

ER -