THESIS
2017
xviii, 105 pages : illustrations (some color) ; 30 cm
Abstract
Identification of peptide tandem mass spectra by spectral library searching is well-suited for proteomics experiments that aim at re-detecting and quantifying previously observed peptides, since it is more accurate, faster and simpler than conventional sequence database searching. Currently, building a library typically involves grouping spectra by the identifications assigned by sequence database searching, merging spectra with the same identification into consensus spectra, and filtering out questionable spectra; unidentified spectra are discarded. Here we propose a novel method for identification-independent library building by spectral clustering, to improve the fidelity of the process and to make use of unidentified spectra. Unlike previously proposed methods, we aim to make clustering billions of spectra feasible on low-cost home computers, so that our method can be used by smaller research groups that do not have ready access to expensive high-performance computing hardware. To this end, Graphics Processing Units (GPUs) are used to accelerate the computation of pairwise spectral similarities. Our method also does not merge spectra until clustering is complete, thereby preserving accurate cluster structures that enable cluster refinement based on connectivity measures. We show that our method produces fewer mixed clusters than the state-of-the-art spectrum clustering method MS-Cluster.
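The core computation described above, scoring every pair of spectra and grouping similar ones without merging them, can be sketched in plain Python. This is a minimal CPU-only sketch under stated assumptions, not the thesis's implementation: spectra are assumed to be lists of (m/z, intensity) peaks, binned at a hypothetical 1.0 m/z bin width and compared by cosine similarity, and clusters are taken as connected components of the graph whose edges have similarity at or above a hypothetical threshold of 0.7. The names `bin_spectrum`, `cosine` and `cluster_spectra` are illustrative. The nested pairwise loop is the part a GPU would parallelise.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Turn (m/z, intensity) peaks into a fixed-length intensity vector.

    Assumption: peaks in the same m/z bin are summed; peaks outside
    [0, max_mz) are dropped. bin_width and max_mz are hypothetical values.
    """
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = math.floor(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def cluster_spectra(spectra, threshold=0.7):
    """Group spectra into connected components of a similarity graph.

    Every pair is scored (the quadratic step a GPU would accelerate); an
    edge is kept when similarity >= threshold (hypothetical cut-off).
    Clusters are returned as lists of member indices, not merged into
    consensus spectra, and the kept edges are returned so the cluster
    structure stays available for later refinement.
    """
    vectors = [bin_spectrum(s) for s in spectra]
    parent = list(range(len(spectra)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for i, j in combinations(range(len(spectra)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            edges.append((i, j, sim))
            parent[find(i)] = find(j)  # union the two components

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values()), edges


if __name__ == "__main__":
    example = [
        [(100.2, 10.0), (200.5, 50.0), (300.1, 20.0)],
        [(100.4, 12.0), (200.3, 48.0), (300.6, 18.0)],
        [(150.0, 30.0), (250.0, 40.0)],
    ]
    clusters, edges = cluster_spectra(example)
    print(clusters)  # [[0, 1], [2]]
```

Here the first two spectra share peaks in the same bins and end up in one cluster, while the third shares no bins with either and stays alone. Returning member indices plus the retained edges, rather than consensus spectra, keeps the graph available for refinement. Scoring all pairs is quadratic, so at the scale of billions of spectra a real pipeline would also need to restrict which pairs are compared, for example to spectra with similar precursor mass; this sketch omits that.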