THESIS
2012
xi, 148 p. : ill. ; 30 cm
Abstract
Protein interactions constitute a crucial part of cell metabolism. In particular,
protein-protein, DNA-protein and RNA-protein interactions play a critical role in
the whole cellular system of transcription-expression-regulation. Machine Learning
is used to predict protein interactions through empirical models: simulations are
a viable alternative to experiments, which cost time and resources and require
expensive equipment and trained biologists. In this work, we provide novel
Machine Learning tools for protein-protein interaction prediction, DNA- and RNA-protein
interaction prediction, and quantitative protein-protein affinity prediction.
The proposed protein-protein interaction classifier is based on an ensemble of
classifiers. The algorithm extends the concept of similarity from pairs of proteins
to pairs of protein pairs, thereby allowing a k-Nearest Neighbor machinery to be
applied to instances composed of protein pairs. By combining Support Vector
Machines with this novel pairwise approach, our classifier overcomes the weaknesses
of the individual algorithms composing it. It represents a novel method for
protein-protein interaction prediction. Compared to previously published works
trained on the very same data, it provides more accurate results.
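
The pair-of-pairs idea above can be illustrated with a minimal sketch. It is not the classifier developed in the thesis: the k-mer Jaccard similarity, the max-over-orderings rule for comparing two pairs, and all function names are assumptions chosen only to show how a k-Nearest Neighbor vote can operate on instances that are themselves protein pairs.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]  # a candidate interaction: two protein sequences


def kmer_set(seq: str, k: int = 2) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def protein_similarity(a: str, b: str, k: int = 2) -> float:
    """Toy protein similarity (assumption): Jaccard index of k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


def pair_similarity(p: Pair, q: Pair,
                    sim: Callable[[str, str], float] = protein_similarity) -> float:
    """Similarity between two protein pairs (assumed rule).

    A pair is unordered, so both alignments of the members are scored
    and the better one is kept.
    """
    (a, b), (c, d) = p, q
    direct = sim(a, c) * sim(b, d)
    swapped = sim(a, d) * sim(b, c)
    return max(direct, swapped)


def knn_predict(query: Pair, training: List[Tuple[Pair, int]], k: int = 3) -> int:
    """Majority vote among the k training pairs most similar to the query."""
    ranked = sorted(training,
                    key=lambda item: pair_similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 = interacting, 0 = non-interacting.
training = [
    (("AAAA", "CCCC"), 1),
    (("AAAT", "CCCG"), 1),
    (("GGGG", "TTTT"), 0),
    (("GGGA", "TTTC"), 0),
    (("AAAC", "CCCA"), 1),
]
prediction = knn_predict(("CCCC", "AAAA"), training, k=3)
```

Scoring both orderings keeps the pair similarity symmetric, so a query pair matches a training pair regardless of which protein is listed first.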
The proposed approach to DNA- and RNA-protein interaction classification
overcomes the limitations of other techniques by combining a pair perspective with
both sequential and structural elements in the feature encoding. Thanks to the pair
perspective, multiple targets can be embedded in the models. As a consequence,
our method is capable of generalization: it predicts the interaction of any
given DNA- or RNA-protein pair. In fact, it is also used to infer new putative
interactions in model organisms that are not used for training. To gain better
insight into the prediction mechanisms, we explore the effects of four different
techniques for assembling negative samples and how different features influence the
predictions.
A quantitative affinity prediction method is presented as well. The method is
tailored to the Dscam protein-protein affinity prediction problem. It represents the
first attempt to model the Dscam protein self-binding machinery, whose mechanisms
have been described in Molecular Biology before but never mathematically modeled.
Based on a restricted data sample of 89 instances, the model provides predictions
for all 19008 possible self-binding Dscam isoforms, while feature ranking makes it
possible to investigate the evolution of the Drosophila Dscam binding machinery. By
comparing predicted affinities of present-day and ancient Dscam proteins, we
demonstrate that Dscam evolved more efficient self-binding isoforms.