Bioinformatics Analysis of Whole-Exome Sequencing Data for the Identification of Nuclear and Chloroplast Diversity in Barley

  • Stylianos Kyriakidis

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

Barley (Hordeum vulgare) is one of the most important crops worldwide, not only for its annual production and its economic contribution to the food and drink industries, but also for its ability to adapt to diverse and extreme environments, which provides an enormous pool of genetic variation that can be exploited in future crop improvement projects. Previous projections indicate that production will need to increase by 1.6% per year until 2050 to meet predicted population growth, and this in the face of climate change, which is expected to reduce crop productivity both worldwide and in Europe.

Over the last decade, sequencing methodologies have improved enormously, increasing our ability to identify genetic variants that can affect heritable phenotypes. However, sequencing whole genomes is still expensive and time-consuming compared with targeted sequencing. In this study, targeted exome sequencing data from a large set of diverse, geo-referenced barley germplasm were used to study nuclear and chloroplast genetic diversity in relation to geography and environmental factors. To assess the performance of two of the most widely used variant calling pipelines, Bowtie2/FreeBayes and BWA/GATK, I called variants on a small subset of samples with each pipeline and validated them against an independent set of iSelect SNP-chip data. BWA/GATK showed higher accuracy, so I used it for the rest of this thesis.
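Validating pipeline calls against SNP-chip data comes down to comparing genotypes at the sites both assays cover. The sketch below computes a simple genotype concordance, assuming genotypes are encoded as alternate-allele counts (0, 1, 2) keyed by (sample, SNP). The function name, encoding and toy values are hypothetical illustrations, not the thesis's actual metric or data.

```python
def genotype_concordance(calls, chip):
    """Fraction of shared (sample, SNP) sites where the sequencing genotype
    matches the SNP-chip genotype. Genotypes are alt-allele counts (0, 1, 2);
    sites missing from either call set (absent or None) are skipped."""
    shared = 0
    matches = 0
    for key, chip_gt in chip.items():
        seq_gt = calls.get(key)
        if seq_gt is None or chip_gt is None:
            continue  # site not genotyped by both assays
        shared += 1
        if seq_gt == chip_gt:
            matches += 1
    return matches / shared if shared else float("nan")


# Toy values only (hypothetical samples and SNP IDs, not thesis results).
chip = {("s1", "snpA"): 0, ("s1", "snpB"): 2, ("s2", "snpA"): 1, ("s2", "snpB"): None}
pipeline_x = {("s1", "snpA"): 0, ("s1", "snpB"): 2, ("s2", "snpA"): 1}
pipeline_y = {("s1", "snpA"): 0, ("s1", "snpB"): 1, ("s2", "snpA"): 1}

print(genotype_concordance(pipeline_x, chip))  # 3 of 3 shared sites agree
print(genotype_concordance(pipeline_y, chip))  # 2 of 3 shared sites agree
```

A fuller comparison would also report how many chip sites each pipeline called at all, since concordance alone ignores missed variants.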

Flowering time in barley is synchronised with external stimuli such as day length and temperature through a flowering-time pathway that involves both flowering and circadian clock genes. For a detailed analysis of molecular diversity, I selected 19 known flowering-related genes and transcription factors that control different aspects of flowering development and that had recently been identified as homologs of well-characterised genes in Arabidopsis. My analysis showed that many of these genes contain extensive sequence variation and that both single- and multi-gene haplotypes exhibit strong geographical structuring. Furthermore, previously identified causal SNPs for days to heading were validated and investigated as a potential cause of this geographical structuring. I observed a strong association between certain SNPs and latitude.
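The abstract does not state which test was used for the SNP–latitude association. As one simple illustration only, the sketch below computes a Pearson correlation between per-accession alternate-allele dosage at a single SNP and the accession's latitude. Function name and toy values are hypothetical, and a plain correlation does not correct for population structure, which a real association analysis would need to address.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values: alt-allele dosage at one SNP (barley is largely inbred, so
# mostly 0 or 2) and latitude in degrees for four hypothetical accessions.
dosage = [0, 0, 2, 2]
latitude = [10.0, 20.0, 40.0, 50.0]
print(round(pearson_r(dosage, latitude), 4))
```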

Chloroplast (cp) genomes can be used to understand plant diversity and evolution. In this study, I mapped raw whole-exome sequence data from 351 barley accessions against the “Morex” chloroplast genome, aiming for robust variant calling across the whole length of the chloroplast genome. The data revealed that whole-exome sequence reads can produce a higher number of high-quality variants than previously published methods using cpSSRs and RFLPs. By using these variants to assess diversity and genetic structure in wild and landrace accessions, I identified two clusters of landraces, one from Ethiopia and one from the Fertile Crescent, that are genetically distant from the other landraces in our collection. Both could be useful sources of alleles for studying diversity and adaptation in barley germplasm.
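Because the chloroplast genome is effectively haploid, each accession carries a single allele per variant site, and a basic starting point for assessing genetic distance between accessions is the number of differing sites per pair. The sketch below illustrates that idea under stated assumptions: haplotypes are strings of alleles at the same ordered variant sites, "N" marks a missing call, and accession names and sequences are hypothetical. The abstract does not specify the distance measure or clustering method actually used.

```python
from itertools import combinations


def pairwise_differences(haplotypes):
    """Count differing variant sites for every pair of haploid chloroplast
    haplotypes. Sites where either haplotype is 'N' (missing) are ignored."""
    dist = {}
    for (name_a, hap_a), (name_b, hap_b) in combinations(haplotypes.items(), 2):
        if len(hap_a) != len(hap_b):
            raise ValueError("haplotypes must cover the same variant sites")
        dist[(name_a, name_b)] = sum(
            1 for x, y in zip(hap_a, hap_b) if x != y and "N" not in (x, y)
        )
    return dist


# Toy haplotypes at four hypothetical variant sites.
haps = {"wild1": "ACGT", "landrace1": "ACGA", "landrace2": "TCNA"}
for pair, d in pairwise_differences(haps).items():
    print(pair, d)
```

A distance matrix like this could then feed a clustering or ordination step to look for groups such as the two distinct landrace clusters described above.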
Date of Award: 2018
Original language: English
Sponsors: European Union
Supervisors: Robbie Waugh & Joanne Russell
