Graduate Thesis 2010
PIPE3: A Massively Parallel Protein-Protein Interaction Prediction Engine
By Andrew Schoenrock
Summer 2010
A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Computer Science
Ottawa-Carleton Institute for Computer Science
School of Computer Science
Carleton University
Supervisor: Frank Dehne
ABSTRACT
Protein-protein interactions play a key role in many human diseases. A detailed knowledge of protein-protein interactions will accelerate the drug discovery process as well as reduce overall drug development costs. Biologists have a variety of methods to determine protein-protein interactions in the laboratory; however, these approaches are generally expensive, labour intensive, and time consuming, and they have high error rates. The Protein-protein Interaction Prediction Engine (PIPE) is a computational approach to predicting protein-protein interactions. The first version of PIPE showed that the hypothesis underlying PIPE had merit. The second version, PIPE2, was a vast improvement over the first and showed that a prediction over all possible yeast protein pairs was feasible, but it still left room for major improvements. The two major limitations of the PIPE2 implementation were that it used memory inefficiently and that it did not scale well because it lacked a true parallel architecture.
This thesis presents PIPE3, a new version of PIPE designed to overcome the shortcomings of the earlier versions. A new parallel architecture consisting of a mixed master/slave and all-slaves model was designed and implemented to address these issues. This approach exploits parallelism on two levels: first by running multiple PIPE threads on a given cluster node, and second by running many of these processes on different cluster nodes, all fed work on demand. PIPE3's parallel architecture performed well overall. The all-slaves portion of the architecture produced a significant speedup, and the master/slave portion produced a linear speedup as more cluster nodes were used. PIPE3 was then used to produce some significant scientific results. These include proteome-wide scans of the organisms Caenorhabditis elegans and Homo sapiens, which also resulted in the largest threaded calculation ever run at the High Performance Computing Virtual Laboratory (HPCVL).
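The two-level, on-demand scheme described in the abstract can be illustrated with a small Python sketch. This is not the thesis's implementation: threads stand in for cluster nodes, `score_pair` is a hypothetical placeholder rather than PIPE's scoring method, and the names `node_worker` and `run`, the chunk size, and the worker counts are all assumptions made for illustration.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def score_pair(a, b):
    # Hypothetical placeholder score (shared residue letters); NOT PIPE's method.
    return len(set(a) & set(b))


def node_worker(work_queue, results, lock, threads_per_node):
    # One simulated cluster node. It asks for a chunk only when it is idle
    # (on-demand distribution), then all of its local threads share that
    # chunk as equals (the all-slaves level).
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        while True:
            chunk = work_queue.get()
            if chunk is None:  # sentinel: the master has no more work
                return
            scores = list(pool.map(lambda pair: score_pair(*pair), chunk))
            with lock:
                results.update(zip(chunk, scores))


def run(proteins, nodes=3, threads_per_node=2, chunk_size=2):
    # Master: enumerate all protein pairs and hand them out in chunks.
    pairs = list(combinations(proteins, 2))
    work_queue = queue.Queue()
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(
            target=node_worker,
            args=(work_queue, results, lock, threads_per_node),
        )
        for _ in range(nodes)
    ]
    for w in workers:
        w.start()
    for i in range(0, len(pairs), chunk_size):
        work_queue.put(pairs[i:i + chunk_size])
    for _ in workers:
        work_queue.put(None)  # one stop signal per simulated node
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run(["MKV", "KVA", "AAA", "MAA"]))
```

Because each simulated node pulls its next chunk only after finishing the previous one, faster nodes naturally take on more work, which is the usual motivation for on-demand distribution over a fixed up-front partition.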
THESIS DOWNLOAD [ TH_mcs_2010_schoenrock_0023.pdf ]