Abstract
Over the past decade, knowledge of the human genome has grown exponentially. While identifying individual genes and their protein products is crucial, understanding how these entities exist within the context of other molecules within the cell provides valuable insight into their functional significance. In particular, mapping the intricate web of interactions between proteins (the ‘interactome’) allows for an understanding of the roles of individual proteins within specific cellular processes and of the potentially negative implications when these processes cannot occur. At present, approximately 40,000 binary protein-protein interactions have been identified in human through low- and high-throughput, lab-based experiments; however, this number represents only a fraction of the estimated 600,000 protein-protein interactions thought to occur. Given the high number of potential protein-protein pairings, experimentally testing each possible interaction is a time-consuming and near-impossible task. As a result, several computational methods have been developed to predict probable interactions for experimental verification.
Previously, our group developed PIPs, a predictor of protein-protein interactions in human based on a naive Bayesian framework that has undergone two version releases (Scott et al., 2007; McDowall, 2011). In this thesis, a third version of PIPs, PIPs v. 3.0, is described. In addition to an update of the included data, PIPs v. 3.0 contains a new network analysis component, the TransMCL (Z) module, that combines the previously separate Transitive module (and associated EOCT predictor) introduced in version 1.0 and the Cluster module (and associated EOCM predictor) introduced in version 2.0. This new module has allowed the two previously separate PIPs predictors to be merged into one method (the EOCZ predictor).
In total, the new EOCZ predictor identifies over 500,000 significant interactions, made up of those predicted by the EOCT and EOCM predictors individually as well as a new set of interactions.
Additionally, this thesis describes the development of PIP’NN, a new protein-protein interaction predictor built on a neural network framework with the data incorporated into PIPs. Overall, PIP’NN performs slightly better than the three PIPs predictors on multiple blind tests of varying sizes. PIP’NN identifies interactions predicted by the three PIPs methods as well as a set of new interactions. As a result, PIP’NN can stand on its own as a new predictor of human protein-protein interactions or be used in conjunction with PIPs to further narrow down the set of predicted interactions.
Finally, this thesis describes the practical implementation of PIPs and PIP’NN through collaborations with two groups within the University of Dundee that have identified sets of potential interactions of interest for experimental confirmation. While these interactions have yet to be confirmed, both studies offer a proof of concept of how the predictors can be incorporated into lab-based interaction identification protocols. Additionally, the new PIPs web server will allow outside groups access to the updated PIPs prediction database.
Overall, the work described in this thesis has built upon previous work both within and outside of the University of Dundee to further the identification of novel protein-protein interactions in human and increase the understanding of the human interactome.
Date of Award | 2013 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Geoffrey Barton (Supervisor) |