Computational methods for the characterisation and evaluation of protein-ligand binding sites

  • Javier Sanchez Utges

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

Fragment screening is used for hit identification in drug discovery, but it is often unclear which binding sites are functionally relevant. Here, data from 37 fragment screening experiments are analysed. A method to group ligands into binding sites is introduced, and the resulting sites are clustered by their solvent accessibility. This identified 293 ligand sites, grouped into four clusters. C1 includes buried, conserved, missense-depleted sites and is enriched in known functional sites. C4 comprises accessible, divergent, missense-enriched sites and is depleted in functional sites.

This approach is extended to the entire PDB, resulting in the LIGYSIS dataset, accessible through a new web server. LIGYSIS-web hosts a database of 65,000 protein-ligand binding sites across 25,000 proteins. LIGYSIS sites are defined by aggregating unique relevant protein-ligand interfaces across multiple structures. Additionally, users can upload structures for analysis, visualisation and download. Results are displayed in LIGYSIS-web, a Python Flask web application.

Finally, the human component of LIGYSIS, comprising 6800 binding sites across 2775 proteins, is used to perform the largest benchmark of ligand site prediction to date. Thirteen canonical methods and fifteen novel variants are evaluated using 14 metrics. Additionally, LIGYSIS is compared to datasets such as PDBbind and MOAD and shown to be superior, as it considers non-redundant interfaces across biological assemblies. Re-scored fpocket predictions present the highest recall (60%). The detrimental effect of redundant predictions on performance, as well as the beneficial impact of stronger pocket scoring schemes, are demonstrated. To conclude, top-N+2 recall is proposed as the universal benchmark metric, and authors are encouraged to share their benchmark code for reproducibility.
Date of Award: 2025
Original language: English
Awarding Institution
  • University of Dundee
Sponsors: BBSRC EASTBIO DTP Studentship
Supervisors: Geoffrey Barton (Supervisor) & Ulrich Zachariae (Supervisor)

Keywords

  • Protein-ligand binding site
  • Evolutionary conservation
  • Missense variation
  • Solvent accessibility
  • Web design
  • Ligand site prediction
  • Benchmark
  • Fragment screening
  • Functional classification
