Probabilistic Modelling of Replication Fidelity in Eukaryotic Genomes

  • Mohammed Al Mamun

    Student thesis: Doctoral Thesis (Doctor of Philosophy)


    Eukaryotic DNA replication comprises a complex array of molecular activities, compounded by the pressure to replicate faithfully in order to maintain genetic and genomic integrity. The constraints governing DNA replication are of fundamental importance for understanding the degree of replication error and the strategies that organisms employ to counter the threats such errors pose to replication fidelity. We apply a simple conceptual model, formalized using probability theory and statistics, to discern the fundamental pressures and constraints that optimise complete DNA replication in genomes of different size scales (10 Megabases to 10 Gigabases), spanning the whole of Eukaryota. We show in yeasts (genome size ~10 Megabases) that replication origins (sites on DNA where replication can be initiated) are biased towards equal spacing along the genome, that the largest gap between adjacent origins is smaller than expected by chance, and that origins are placed very close to the telomeric ends, all of which minimize the replication errors arising from occasional irreversible failures of replication forks. Replication origin mapping data from five different yeasts conform to all of these predictions. We derive an estimate of ~5.8×10⁻⁸ for the fork stalling rate per nucleotide, the one unknown parameter in our theory, which agrees with previous experimental estimates. We show in higher eukaryotes (genome size 100 Megabases to 10 Gigabases) that the bias towards equal origin spacing is absent, that larger origin gaps contribute more to the errors while the permissible origin separations are restricted by the rate of fork stalling per nucleotide, and that in the larger genomes (> 100 Megabases) errors become increasingly inevitable, although the net number of events remains low and follows a Poisson distribution with a small mean. We show that in very large genomes, e.g. the human genome, the larger gaps that contribute most to the error are distributed as a power law, which spreads the risk of damage from the error, and that post-replicative error-correction mechanisms are necessary to contain the inevitable errors. Replication origin mapping data from yeast, Arabidopsis, Drosophila and human cell lines, as well as experimental observations of post-replicative error markers, validate these predictions. We show that replication errors can be quantified from the nucleosome-scale minimum inter-origin distance permissible under the known DNA structure, and we propose a universal replication constant that is maintained across all eukaryotes independent of their architectural complexity. We show that this molecular biological constant relates genome length to the developmental robustness of organisms, and this is confirmed by early embryonic mortality rates from different organisms. The good agreement between the biological data and the model predictions in all cases suggests that our model efficiently captures the biological complexity involved in containing errors in the DNA replication process. Conceptually, the model thus shows how simple ideas can help us make sense of complex biology and of the continuously growing body of biological detail.
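    The link between origin spacing and replication error can be illustrated with a minimal sketch. It rests on an assumption that the abstract does not state: a gap between adjacent origins is left unreplicated only when both converging forks stall irreversibly before meeting, and each fork stalls independently at a constant rate per nucleotide. Under that assumption the failure probability for a gap of N nucleotides is 1 - exp(-pN)(1 + pN), roughly (pN)²/2 when pN is small. The stall rate is the value quoted in the abstract; the function names and gap lists are hypothetical.

```python
import math

# Per-nucleotide fork stalling rate quoted in the abstract.
STALL_RATE = 5.8e-8


def double_stall_probability(gap_nt, stall_rate=STALL_RATE):
    """Probability that a gap of gap_nt nucleotides is left unreplicated.

    Assumption (not from the abstract): each of the two converging forks
    travels an independent, exponentially distributed distance before
    stalling, and the gap fails only if the two distances sum to less
    than the gap length.
    """
    x = stall_rate * gap_nt
    return 1.0 - math.exp(-x) * (1.0 + x)


def expected_errors(gaps_nt, stall_rate=STALL_RATE):
    """Expected number of unreplicated gaps, summed over all inter-origin gaps."""
    return sum(double_stall_probability(g, stall_rate) for g in gaps_nt)


# Hypothetical yeast-sized layouts: 300 gaps covering 12 Megabases in total.
equal_gaps = [40_000] * 300
uneven_gaps = [20_000] * 150 + [60_000] * 150

if __name__ == "__main__":
    print("equal spacing :", expected_errors(equal_gaps))
    print("uneven spacing:", expected_errors(uneven_gaps))
```

    Because the failure probability grows roughly with the square of the gap length, the evenly spaced layout yields fewer expected errors than the uneven one for the same genome length and number of origins, which is in line with the bias towards equal spacing reported for yeasts.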
    Date of Award: 2016
    Original language: English
    Supervisor: Timothy Newman


    • Probability
    • Eukaryotes
    • Genome
    • Replication error
    • Development
