THESIS
2020
vii, 84 pages : illustrations (chiefly color) ; 30 cm
Abstract
Understanding the structure and dynamics of biosystems is crucial in the study of
computational biochemistry. Molecular dynamics (MD) simulations have been widely used as a powerful tool to
investigate biological molecules and mechanisms in recent decades. As MD simulations
normally generate high-dimensional data that are very hard to visualize and directly
comprehend, advanced numerical techniques, such as unsupervised learning, need to be
adopted, e.g., the clustering algorithms that have been widely used in recent years in Markov state
models (MSMs). The key benefit of clustering techniques is their ability to reduce the
dimensionality of MD data without prior knowledge of the structural details or dynamic
mechanisms. However, existing clustering algorithms can only be performed at a single given resolution
of the conformational space or the protein free energy landscape. Consequently, a two-step
splitting-and-lumping scheme has been widely adopted in MSMs to find the metastable states
that may appear at different levels of the free energy landscape by first clustering or splitting
the conformational space into microstates and then lumping them together. However, this two-step
scheme often provides limited insights into the free energy landscape, particularly its
hierarchical structures. Therefore, improved clustering algorithms are required to generate the
metastable states across different timescales and to study the hierarchical structure of the free
energy landscape. In this thesis, I introduced a new density-based clustering algorithm, the
Multi-Level DBSCAN (ML-DBSCAN), which combines clustering results at different
resolution levels to obtain the hierarchical structure of the free energy landscape and identify
metastable conformational states. We show that ML-DBSCAN can efficiently obtain the hierarchical structure of the free energy
landscape from MD simulations of four different peptide systems. I also developed a software
package for data clustering: Hong Kong Data Miner (HKDataMiner), which is particularly
suited for MD simulation trajectories. In addition to standard clustering algorithms,
HKDataMiner implements our new clustering algorithms, i.e., the GPU
implementation of ML-DBSCAN and the APLoD clustering algorithm. Finally, I contributed to
the development of a traveling salesman-based automated path searching method (TAPS) to
locate the minimum free energy paths (MFEPs) between two conformational states. Using two
peptide systems, we show that the TAPS method is 5-8 times faster
than the string method at locating MFEPs.
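
The idea of combining density-based clustering results at several resolution levels can be illustrated with a small sketch. This is not the thesis's ML-DBSCAN or its GPU implementation; it is a plain, brute-force DBSCAN run at two assumed neighborhood radii on made-up one-dimensional data, with the hypothetical helpers `multi_level` and `parent_map` showing how fine-level clusters nest inside coarse-level ones.

```python
from collections import deque
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    n = len(points)
    # Brute-force neighbor lists; each point counts as its own neighbor.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # expand only through core points
                queue.extend(neighbors[j])
    return labels

def multi_level(points, eps_levels, min_pts):
    """Hypothetical helper: run DBSCAN at each radius, coarse to fine."""
    return {eps: dbscan(points, eps, min_pts)
            for eps in sorted(eps_levels, reverse=True)}

def parent_map(fine, coarse):
    """Hypothetical helper: map each fine cluster to the first coarse cluster seen for it."""
    parents = {}
    for f, c in zip(fine, coarse):
        if f != -1 and c != -1:
            parents.setdefault(f, c)
    return parents

# Toy data (made up): two nearby groups and one distant group.
POINTS = [(0,), (1,), (2,), (6,), (7,), (8,), (50,), (51,), (52,)]

levels = multi_level(POINTS, eps_levels=[1.5, 5.0], min_pts=2)
hierarchy = parent_map(levels[1.5], levels[5.0])
print(levels)
print(hierarchy)
```

At the coarse radius the two nearby groups merge into one cluster, while at the fine radius they separate, so the parent map records a two-level hierarchy. On real MD data the points would be conformations in some feature space, and the radii and `min_pts` would have to be chosen for the system at hand.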