Machine learning for novel therapeutic target identification and prioritisation.

  • David Narganes Carlon

Student thesis: Doctoral Thesis (Doctor of Philosophy)

Abstract

Therapeutic target identification and prioritisation is one of the first and most important steps in the drug discovery pipeline. In this step, scientists try to find a biological molecule, the target, whose modulation by a compound or drug alleviates the effects of a disease. This thesis presents three approaches that use multiple machine-learning algorithms to generate a genome-wide ranking of therapeutic targets. In the first results chapter, Trendy Genes implements a pipeline for named entity recognition of genes (the targets) and diseases in the biomedical literature. The novelty of the method is that it combines a graph representation of the biomedical literature with natural language processing, achieving 86% recall and high precision when compared against other databases. Recurrent neural networks were then trained on the publication dynamics of biomedical entities to prioritise as targets those genes whose number of publications was unexpectedly high, i.e. not predicted by the model. The second results chapter presents language models trained on historical data and validated retrospectively, testing how accurately they prioritise therapeutic targets that are more likely to enter clinical trials; they achieve a precision of 6% in the top 200 hypotheses, outperforming state-of-the-art methods. The final results chapter describes the clinical link prediction knowledge graph, which integrates more than 50 data sources relevant to prioritising therapeutic targets. Finally, graph machine learning and link prediction models were used to build recommendation systems that propose novel links between genes (the targets) and diseases in the graph, achieving a precision of 5% in the top 200 hypotheses.
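Two of the results in this abstract are reported as precision among the top 200 hypotheses. The following is a minimal sketch of that metric, precision@k = |top-k hypotheses ∩ validated hypotheses| / k. It rests on a few assumptions: a hypothesis is a (gene, disease) pair, and the validated set contains pairs confirmed after the training cut-off, for example targets that later entered clinical trials. The function name `precision_at_k` and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], validated: Set[Hashable], k: int = 200) -> float:
    """Fraction of the top-k ranked hypotheses that appear in the validated set.

    Assumption: the denominator is always k, so a ranking shorter than k
    is penalised as if the missing slots were misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for hypothesis in top_k if hypothesis in validated)
    return hits / k


# Hypothetical toy data: 300 ranked (gene, disease) pairs, ordered best first.
ranked = [(f"GENE{i}", "DISEASE_X") for i in range(300)]
# 12 of the top 200 pairs are marked as validated (ranks 0, 17, 34, ..., 187).
validated = {ranked[i] for i in range(0, 200, 17)}

print(precision_at_k(ranked, validated, k=200))  # 0.06
```

Under this definition, a precision of 6% at k = 200 corresponds to 12 validated hypotheses among the top 200, and 5% corresponds to 10.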
Date of Award: 2022
Original language: English
Supervisors: Ewan Pearson, Daniel Crowther & Rory McCrimmon
