THESIS
2012
1 volume (unpaged) : illustrations ; 30 cm
Abstract
We implemented a GPU-powered k-centers algorithm for clustering the conformations of biomolecules. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested it on four protein molecular dynamics (MD) datasets, ranging from the small Alanine Dipeptide to the 370-residue Maltose Binding Protein. It can group 250,000 conformations of the Maltose Binding Protein into 4000 clusters within 40 seconds. To achieve this, we parallelize the code on the GPU and exploit the triangle inequality of metric spaces. Furthermore, the running time of the algorithm scales linearly with the number of cluster centers. We also provide a mathematical argument exploring the properties of the triangle inequality in higher dimensions. Finally, using Alanine Dipeptide as an example, we show a strong correlation between the cluster populations produced by the k-centers algorithm and the underlying density.
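The role of the triangle inequality in k-centers clustering can be illustrated with a minimal CPU sketch in Python. It is not the thesis's GPU code: it assumes the standard farthest-first formulation of k-centers, uses the Euclidean distance (`math.dist`) as a stand-in for a conformational metric such as RMSD, and the names `k_centers`, `assign`, `d_near`, and `skipped` are hypothetical. When a new center c is added, a point x currently assigned to center a cannot be closer to c than to a if d(a, c) >= 2 d(x, a), so the distance d(x, c) need not be computed.

```python
import math


def k_centers(points, k, dist=math.dist):
    """Farthest-first k-centers with triangle-inequality pruning (illustrative sketch)."""
    n = len(points)
    centers = [0]                 # assumption: the first point seeds the first center
    assign = [0] * n              # position in `centers` of each point's nearest center
    d_near = [dist(p, points[0]) for p in points]
    skipped = 0                   # number of distance evaluations avoided by pruning

    while len(centers) < k:
        # The next center is the point farthest from its current center.
        new = max(range(n), key=d_near.__getitem__)
        if d_near[new] == 0.0:
            break                 # every point already coincides with a center
        # Distances from the new center to each existing center.
        d_cc = [dist(points[new], points[c]) for c in centers]
        centers.append(new)
        new_pos = len(centers) - 1

        for i in range(n):
            # Triangle inequality: d(x, new) >= d(a, new) - d(x, a).
            # If d(a, new) >= 2 * d(x, a), then d(x, new) >= d(x, a): skip x.
            if d_cc[assign[i]] >= 2.0 * d_near[i]:
                skipped += 1
                continue
            d = dist(points[i], points[new])
            if d < d_near[i]:
                d_near[i] = d
                assign[i] = new_pos

    return centers, assign, d_near, skipped
```

Calling `k_centers(points, k)` on a list of coordinate tuples returns the indices of the chosen centers, each point's assigned center, its distance to that center, and the count of skipped distance evaluations. Each added center costs one pass over the points, which is consistent with a running time linear in the number of centers.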
In addition, in collaboration with members of the group, we developed an easy-to-use, general-purpose MD engine for generating datasets for other types of clustering algorithms. The engine is robust and primarily serves to generate datasets in 2D. It can simulate the system of interest in different ensembles, such as NVT and NVE.
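As a rough illustration of what an NVE simulation in 2D involves, the following Python sketch integrates a single particle with the velocity Verlet scheme. It is a generic example and makes no claim about how the engine described above is implemented: the integrator choice, the harmonic test potential, and the names `velocity_verlet`, `harmonic_force`, and `total_energy` are all assumptions. An NVT simulation would additionally require a thermostat, which is not shown.

```python
def harmonic_force(x, y, k=1.0):
    """Force of a 2D harmonic well U = 0.5 * k * (x^2 + y^2) (test potential, an assumption)."""
    return -k * x, -k * y


def total_energy(pos, vel, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the harmonic test potential."""
    (x, y), (vx, vy) = pos, vel
    return 0.5 * mass * (vx * vx + vy * vy) + 0.5 * k * (x * x + y * y)


def velocity_verlet(pos, vel, force, mass, dt, n_steps):
    """Propagate one particle in 2D at constant energy (NVE) with velocity Verlet."""
    x, y = pos
    vx, vy = vel
    fx, fy = force(x, y)
    for _ in range(n_steps):
        # Half-step velocity update with the current force.
        vx += 0.5 * dt * fx / mass
        vy += 0.5 * dt * fy / mass
        # Full-step position update.
        x += dt * vx
        y += dt * vy
        # Recompute the force at the new position, then finish the velocity update.
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx / mass
        vy += 0.5 * dt * fy / mass
    return (x, y), (vx, vy)
```

For example, `velocity_verlet((1.0, 0.0), (0.0, 0.5), harmonic_force, 1.0, 0.01, 10000)` returns the final position and velocity. Comparing `total_energy` before and after the run is a simple check that the energy stays approximately constant, as expected in the NVE ensemble.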