THESIS
2018
xvii, 75 pages : illustrations ; 30 cm
Abstract
Mass spectrometry (MS) is currently the mainstream technique for analyzing protein
samples. As a fundamental task in tandem mass spectrometry experiments, peptide
identification plays an essential role in providing sequence information for protein analysis.
Although traditional peptide identification algorithms are mature, they fail to provide
satisfactory results when post-translational modifications (PTMs) and cross-linking
techniques are taken into consideration, because of the heavy computation involved. Existing
methods specialized in PTM identification and cross-linked peptide identification suffer
from the following issues:
(1) In PTM identification, the number of PTMs that can be specified for a search is limited.
(2) In cross-linked peptide identification, only a few tools can finish the search
within an acceptable period of time when a large database is used.
In this thesis, a simulation-based study on spectrum-based and tag-based open search
for peptide identification with PTMs is proposed to tackle the first issue. In addition, a
novel linear-time algorithm for cross-linked peptide identification is presented to tackle
the second. The simulation-based study investigates both spectrum-based and tag-based
open search methods to identify core peptides, i.e., the peptide backbones without PTM
annotations. It demonstrates the performance trends of these methods across mass
spectra of varying quality, and provides an analytical model for predicting their performance.
Feasible regions in which the methods can be used reliably are then derived.
The novel linear-time algorithm for cross-linked peptide identification solves several
remaining issues in existing methods. The new algorithm is implemented in a tool
called Xolik. It provides precise numerical computations while achieving linear time
complexity. Experiments using synthetic and empirical datasets show that it outperforms
existing tools in terms of running time and statistical power. Theoretical proofs of the
correctness and time complexity of the algorithm are also provided in the thesis.