THESIS
2018
xvi, 143 pages : color illustrations ; 30 cm
Abstract
With the rapid accumulation of proteomics data from laboratories worldwide, opportunities now
exist to leverage this data to build high-quality spectral libraries that assist peptide identification.
Traditional library building in proteomics, however, is error-prone and not amenable to
incremental updates, making it unsustainable at today’s data volumes. Spectral clustering has
been proposed as a more reliable way to build spectral libraries than conventional methods
because it can retain unidentified spectra and correct identification errors. In
this study, we propose a novel spectral clustering method that uses DBSCAN with
Euclidean-style distance metrics, which incorporate both spectral similarity and precursor mass-to-charge ratio (m/z) differences.
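The abstract does not give the exact form of the distance metrics, so the following is only a minimal sketch of one plausible Euclidean-style distance, under stated assumptions: spectra are already binned into integer m/z bins, spectral similarity is the cosine (normalized dot product), and the precursor m/z difference is divided by a hypothetical tolerance `mz_scale`. The function names and the scaling are illustrative, not taken from the thesis.

```python
import math
from typing import Dict

# Assumption: a spectrum is a dict mapping an integer m/z bin to its intensity.
Spectrum = Dict[int, float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Normalized dot product of two binned spectra (0..1 for non-negative intensities)."""
    dot = sum(intensity * b[mz] for mz, intensity in a.items() if mz in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as completely dissimilar
    return dot / (norm_a * norm_b)


def combined_distance(a: Spectrum, mz_a: float, b: Spectrum, mz_b: float,
                      mz_scale: float = 0.01) -> float:
    """Hypothetical Euclidean-style distance over two axes.

    Axis 1: spectral dissimilarity, 1 - cosine similarity.
    Axis 2: precursor m/z difference divided by mz_scale (a hypothetical
    tolerance), so a difference of exactly mz_scale contributes 1.0.
    """
    spectral_term = 1.0 - cosine_similarity(a, b)
    precursor_term = abs(mz_a - mz_b) / mz_scale
    return math.hypot(spectral_term, precursor_term)
```

With this form, two identical spectra at the same precursor m/z are at distance (close to) zero, spectra sharing no peaks at the same precursor m/z are at distance 1.0, and a precursor offset grows the distance along the second axis. A single DBSCAN Eps threshold then bounds both kinds of difference at once, which is one reason a combined metric can be convenient.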
The proposed method can either refine existing clusters or directly cluster raw data
using a GPU-accelerated algorithm. Each existing cluster was subjected to DBSCAN to
identify its core and separate loosely connected nodes. For each distance metric, the distance
threshold (Eps) was optimized. With this method, higher-quality clusters could be built than
the original clusters, which were built by considering only pairwise spectral similarity. Both
internal and external measures were used to assess the quality of the refined clusters. The clusters
constructed by the DBSCAN method were less likely to contain spectra with different peptide
identifications, and lower-quality spectra were removed because of their lower similarity
and fewer connections to the cluster cores.
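The refinement step can be sketched as follows. This is a minimal sketch under stated assumptions: the input is a precomputed, symmetric pairwise distance matrix for the spectra of one existing cluster (for example, a distance that combines spectral similarity and precursor m/z difference). The helpers `dbscan_precomputed`, `refine_cluster`, and `purity` are hypothetical. Keeping only the largest DBSCAN sub-cluster as the "core" is an assumption, and purity (the share of spectra carrying the majority peptide identification) is just one common external measure. The abstract does not name the measures the thesis actually used.

```python
from collections import Counter, deque
from typing import List, Sequence, Tuple

NOISE = -1  # label for points that are neither core points nor reachable from one


def dbscan_precomputed(dist: Sequence[Sequence[float]], eps: float,
                       min_pts: int) -> List[int]:
    """Plain DBSCAN over a precomputed symmetric distance matrix."""
    n = len(dist)
    # The eps-neighborhood of each point includes the point itself.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Breadth-first expansion from an unassigned core point.
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels


def refine_cluster(dist: Sequence[Sequence[float]], eps: float,
                   min_pts: int) -> Tuple[List[int], List[int]]:
    """Hypothetical refinement: keep the largest DBSCAN sub-cluster, split off the rest."""
    labels = dbscan_precomputed(dist, eps, min_pts)
    counts = Counter(label for label in labels if label != NOISE)
    if not counts:
        return [], list(range(len(labels)))  # no core found: everything is separated
    core_label = counts.most_common(1)[0][0]
    kept = [i for i, label in enumerate(labels) if label == core_label]
    removed = [i for i, label in enumerate(labels) if label != core_label]
    return kept, removed


def purity(peptide_ids: Sequence[str]) -> float:
    """External measure: fraction of spectra carrying the most common peptide ID."""
    if not peptide_ids:
        return 0.0
    return Counter(peptide_ids).most_common(1)[0][1] / len(peptide_ids)
```

For example, in a five-spectrum cluster where three spectra are mutually close and two are far from everything, `refine_cluster(dist, eps=0.2, min_pts=3)` keeps the three close spectra and separates the other two. If one of the separated spectra carried a different peptide identification, purity rises for the kept set. Sweeping `eps` over a grid and comparing purity with the number of retained spectra is one way to explore the threshold, but the abstract does not state which optimization criterion the thesis used.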