Infectious diseases pose a multitude of threats to populations in both the developed and developing world, from the emergence of drug-resistant bacteria and new pathogens to the ancient killers among the neglected tropical diseases. Yet a common problem unites all infectious diseases: how can new drugs be identified cost-effectively? The arrival of high-throughput, low-cost sequencing starkly illustrates the nature of the challenge: the genome sequence of any pathogen can now be determined in a few days, yet the availability of complete pathogen genomes has not led to the anticipated wave of new therapies. One reason for this failure might be that previous efforts to select the best targets from the genome have not taken into account information on the properties of associated small-molecule ligands.
To improve the exploitation of genomic information in the discovery of drug targets for new anti-infective agents, a modular informatics framework is described that enables the large-scale comparative analysis of pathogen and host genomes. Specifically, new methods to predict essential genes, identify druggable domains and predict selectivity are presented that have advantages over current approaches.
The proposed method to predict essentiality is benchmarked against whole-genome essentiality datasets and applied in practice to the analysis of a diverse range of species, including the bacterial pathogen Pseudomonas aeruginosa and the eukaryotic parasites Trypanosoma brucei, Trypanosoma cruzi, Leishmania braziliensis, Leishmania infantum, Leishmania major and Schistosoma mansoni.
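As an illustration of what benchmarking against a whole-genome essentiality dataset can involve, the following is a minimal sketch that scores a set of predicted essential genes against an experimentally derived reference set. It is not the method developed in this work; the function name `benchmark_essentiality`, the gene identifiers and the choice of precision and recall as metrics are all assumptions made for illustration.

```python
def benchmark_essentiality(predicted, reference, all_genes):
    """Compare predicted essential genes with a reference essentiality set.

    All three arguments are collections of gene identifiers (hypothetical
    here). Genes outside `all_genes` are ignored so that both sets are
    scored over the same genome.
    """
    all_genes = set(all_genes)
    predicted = set(predicted) & all_genes
    reference = set(reference) & all_genes

    tp = len(predicted & reference)   # predicted essential, confirmed essential
    fp = len(predicted - reference)   # predicted essential, not in reference
    fn = len(reference - predicted)   # essential in reference, missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy example with made-up gene identifiers.
genome = {"g1", "g2", "g3", "g4", "g5"}
result = benchmark_essentiality(
    predicted={"g1", "g2", "g3"},
    reference={"g2", "g3", "g4"},
    all_genes=genome,
)
```

Reporting precision and recall separately, rather than a single accuracy figure, is a common choice when essential genes are a minority of the genome.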
To identify druggable and selective targets, a domain-based approach to mining genomes is developed. The domain-family-based approach enables the determination of "binding site signatures" in the primary amino acid sequences, which in turn enables the identification and comparison of specific binding modes for both active/orthosteric site and allosteric site ligands. Information in the binding site signatures is used to train and validate a Bayesian model to predict a compound's selectivity between members of a domain family, whether from within a single genome or from multiple species.
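The idea of a binding site signature feeding a Bayesian selectivity model can be sketched as follows. This is a simplified illustration under stated assumptions, not the model developed in this work: it assumes that domain sequences are already aligned, that the binding site is given as a list of alignment columns, and that selectivity is predicted by a Bernoulli naive Bayes classifier over per-position mismatches. The sequences, column indices, training labels and all function and class names are hypothetical.

```python
import math


def signature(aligned_seq, site_columns):
    """Extract the residues at the binding site columns of an aligned sequence."""
    return "".join(aligned_seq[i] for i in site_columns)


def mismatch_features(sig_a, sig_b):
    """Binary features: 1 where two signatures differ at a position, else 0."""
    return [int(a != b) for a, b in zip(sig_a, sig_b)]


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p = {}  # per-class probability that each feature equals 1
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x):
        def score(c):
            s = self.log_prior[c]
            for xj, pj in zip(x, self.p[c]):
                s += math.log(pj if xj else 1.0 - pj)
            return s
        return max(self.classes, key=score)


# Hypothetical aligned domain sequences and binding site columns.
site_columns = [1, 4, 6]
host_sig = signature("MKVALHDE", site_columns)
pathogen_sig = signature("MRVALHEE", site_columns)
features = mismatch_features(host_sig, pathogen_sig)

# Made-up training data: mismatch vectors for domain pairs, labelled
# 1 if a compound was selective between the pair and 0 otherwise.
X_train = [[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
y_train = [1, 1, 0, 0]
model = BernoulliNaiveBayes().fit(X_train, y_train)
prediction = model.predict(features)
```

Because the features are defined on alignment columns rather than on whole sequences, the same representation applies whether the two domains come from one genome or from different species.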
Andrew Hopkins (Supervisor)