Developing structural clustering algorithms for analyzing molecular dynamics trajectories

HKUST Electronic Theses

by Song Liu

THESIS 2020

M.Phil. Chemistry

vii, 84 pages : illustrations (chiefly color) ; 30 cm

Abstract

Understanding the structure and dynamics of biosystems is crucial in computational biochemistry. In recent decades, molecular dynamics (MD) simulations have been widely used as a powerful tool to investigate biological molecules and their mechanisms. Because MD simulations normally generate high-dimensional data that are hard to visualize and comprehend directly, advanced numerical techniques such as unsupervised learning need to be adopted; for example, clustering algorithms have been widely used in recent years to build Markov state models (MSMs). The key benefit of clustering techniques is their ability to reduce the dimensionality of MD data without prior knowledge of the structural details or dynamic mechanisms. However, existing clustering algorithms can only be performed at a single, given resolution of the conformational space or the protein free energy landscape. Consequently, a two-step splitting-and-lumping scheme has been widely adopted in MSMs to find metastable states that may appear at different levels of the free energy landscape: the conformational space is first clustered (split) into microstates, which are then lumped together. This two-step scheme often provides limited insight into the free energy landscape, particularly its hierarchical structure. Improved clustering algorithms are therefore required to generate metastable states across different timescales and to study the hierarchical structure of the free energy landscape. In this thesis, I introduce a new density-based clustering algorithm, Multi-Level DBSCAN (ML-DBSCAN), which combines clustering results at different resolution levels to obtain the hierarchical structure of the free energy landscape and identify metastable conformational states. We show that ML-DBSCAN can efficiently characterize the free energy landscape from MD simulations of four different peptide systems.
I also developed a software package for data clustering, Hong Kong Data Miner (HKDataMiner), which is particularly suited to MD simulation trajectories. In addition to standard clustering algorithms, HKDataMiner implements our new clustering algorithms, namely a GPU implementation of ML-DBSCAN and the APLoD clustering algorithm. Finally, I contributed to the development of a traveling-salesman-based automated path searching method (TAPS) for locating the minimum free energy paths (MFEPs) between two conformational states. Using two peptide systems, we show that TAPS is 5-8 times faster in computation time than the string method at locating MFEPs.
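
The idea of clustering the same data at several density resolutions and then relating the results can be illustrated with a small sketch. This is not the ML-DBSCAN algorithm from the thesis and not HKDataMiner code; it is a plain, brute-force DBSCAN run at two radii on made-up 2D points. The function names (dbscan, parent_of), the eps values, min_pts, and the toy data are all assumptions chosen for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 means noise)."""
    n = len(points)
    # Brute-force neighbor lists; each point counts as its own neighbor.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels

def parent_of(fine, coarse):
    """Map each fine-level cluster to the coarse-level cluster(s) holding its points."""
    parents = {}
    for f, c in zip(fine, coarse):
        if f != -1 and c != -1:
            parents.setdefault(f, set()).add(c)
    return parents

# Toy data (hypothetical): two nearby groups and one distant group.
points = [(0, 0), (1, 0), (2, 0),
          (6, 0), (7, 0), (8, 0),
          (50, 0), (51, 0), (52, 0)]

# Coarse resolution first (large eps), then fine resolution (small eps).
levels = {eps: dbscan(points, eps, min_pts=3) for eps in (5, 1.5)}

print(levels[5])    # [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(levels[1.5])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(parent_of(levels[1.5], levels[5]))  # {0: {0}, 1: {0}, 2: {1}}
```

In this toy example the two nearby groups merge into one cluster at the coarse radius and separate at the fine radius, so the parent map gives a two-level hierarchy. How ML-DBSCAN actually chooses and combines its resolution levels is described in the thesis itself, not here.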

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Chemistry
Authors: Liu, Song
Subjects: Molecular dynamics Mathematical models Cluster analysis
Language: English
Call number: Thesis CHEM 2020 Liu
DOI: 10.14711/thesis-991012893266803412
