Towards population-scale proteomics to study molecular phenotypes in health and disease

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

Proteomics has reached a turning point where datasets are now approaching population scale, an advance made possible by faster and more sensitive instruments, improved sample handling and robust software tools. The population-scale era creates the opportunity to further study the molecular differences across cell lines, cell types or tissues, as well as phenotypes of health and disease in human patients. In this thesis we use different methodologies to perform large-scale proteomic analyses, including a data-dependent acquisition (DDA) dataset using tandem mass tags (TMT) and a label-free data-independent acquisition (DIA) dataset. The two studies examined different systems, human induced pluripotent stem cells (hiPSCs) and peripheral blood neutrophils derived from both healthy and diseased individuals, and provide new biological insights, including new disease-specific biomarkers and novel drug targets.

On the technical side, this work revealed novel considerations and limitations of using TMT for large-scale proteomic experiments, providing a framework to improve future studies and minimise the potential issues involved. Furthermore, the TMT-based hiPSC dataset revealed novel protein-level effects caused by the erosion of X chromosome inactivation (XCI) in healthy female hiPSCs. The data show that the erosion of XCI increases the abundance not just of X chromosome proteins, but of over 2,000 proteins derived from all other chromosomes, thereby significantly increasing the protein content in these eroded female lines compared to male and non-eroded female lines.

The DIA-based large-scale proteomic characterisation of neutrophils derived from control and COVID-19 patients revealed the effects of SARS-CoV-2 infection on human neutrophils during the early infection and recovery phases. It highlighted a core signature present at the early infection timepoints in COVID-19 patients, as well as some transient and some persistent changes in the neutrophil proteome caused by COVID-19. The study also highlighted the potential for patient stratification and precision medicine, as it detected important proteomic changes that were present only in patients with critically severe COVID-19, and pointed to treatment options that could help improve clinical trajectories for these patients. Furthermore, the data provided molecular insights into the delayed recovery state seen in a subset of COVID-19 patients, as this patient group displayed a dysfunctional neutrophil phenotype reminiscent of what is seen in chronic diseases such as COPD.

Finally, this thesis explores the value of web applications (web apps) for visualising and exploring proteomic data. Proteomic datasets are growing in size and complexity, so finding ways to share and visualise these complex data in an intuitive format for non-specialist users has become an important goal. As a solution we built two web apps, the Encyclopedia of Proteome Dynamics (EPD) and the Immunological Proteome Resource (ImmPRes), to ensure open access to the raw and processed proteomic data, as well as easy visualisation and exploration via graphical interfaces. ImmPRes provides access to the biggest collection of leukocyte proteomic data, integrating data from innate and adaptive cells as well as multiple datasets characterising different T cell populations and the signalling pathways that modulate their functions, all contained within a simple web application that is easily accessible to all immunologists.
Date of Award: 2023
Original language: English
Sponsors: Wellcome Trust
Supervisor: Doreen Cantrell (Supervisor) & Angus Lamond (Supervisor)

Keywords

  • proteomics
  • immunology
  • stem cells
  • mass spectrometry
  • T cells
  • neutrophils
  • DIA
  • DDA
  • web applications
  • ImmPRes
  • data sharing
