THESIS
2010
x, 58 p. : col. ill. ; 30 cm
Abstract
Tandem mass spectrometry is the dominant proteomics technology for identifying proteins in a mixture. Modern mass spectrometers can produce a large number of tandem mass spectra in a relatively short time. Unfortunately, almost every spectrum contains a significant amount of noise, introduced by contamination or other experimental artifacts. Computational analysis of thousands of noise-contaminated spectra is therefore a major challenge in proteomics research. Noise peaks not only waste time in sequence database searching and space in data storage but, more critically, increase the number of false positives and false negatives when peptides and proteins are interpreted from the spectra.
Strategies to de-noise spectra aim to retain signal peaks while removing noise peaks. On average, up to 74% of the peaks in tandem mass spectra are noise. It is therefore appealing to apply a noise-filtering algorithm before assigning peptides to spectra, as well as when spectra are archived in spectral libraries. A common strategy is to specify a threshold on the intensity of each peak: peaks with intensity below that threshold are considered noise and discarded. Another simple method applies a rank cut-off criterion. For example, all peaks in a given tandem mass spectrum are ranked by intensity, and the top 50 peaks are assumed to be signals. These simple filters take only intensity into consideration and neglect other useful hidden characteristics of peptide MS/MS spectra, such as the locations of peaks and the relationships between pairs of peaks. In addition, because the signal density, i.e. the fraction of peaks that are signals, varies widely among spectra, a constant rank cut-off is not optimal for de-noising spectra.
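The two intensity-based filters described above can be illustrated with a minimal Python sketch. The peak representation (a list of (m/z, intensity) tuples), the function names, and the example values are assumptions made for illustration only; they do not come from the thesis.

```python
from typing import List, Tuple

# Assumed representation: a peak is an (m/z, intensity) pair.
Peak = Tuple[float, float]


def filter_by_threshold(peaks: List[Peak], threshold: float) -> List[Peak]:
    """Keep peaks whose intensity is at or above a fixed threshold."""
    return [p for p in peaks if p[1] >= threshold]


def filter_by_rank(peaks: List[Peak], top_n: int) -> List[Peak]:
    """Keep the top_n most intense peaks, returned in m/z order."""
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    return sorted(kept, key=lambda p: p[0])


# Made-up spectrum, for illustration only.
spectrum = [(175.1, 820.0), (201.3, 15.0), (262.2, 430.0),
            (310.9, 9.0), (375.2, 610.0)]

above_threshold = filter_by_threshold(spectrum, 100.0)
top_two = filter_by_rank(spectrum, 2)
```

Both functions look only at intensity, and `filter_by_rank` keeps the same number of peaks regardless of how many true signals a spectrum contains, which is the limitation noted above.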
Here we propose a Bayesian machine-learning approach that assigns to each peak in a spectrum a probability of being a signal, based on characteristics of the peak that are overlooked by intensity-based filtering methods. The cut-off criterion is determined according to this estimated probability. Therefore, spectra with different signal densities can be de-noised at a controlled number of signal peaks. The conditional probabilities are learned from a training set, in which signal and noise peaks are independently partitioned by reproducibility. Our model confirms, and quantifies, well-known qualitative behavior of peptide fragmentation.
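The abstract does not specify the form of the Bayesian model, so the following is only a rough sketch of the general idea using a naive Bayes classifier over binary peak features, which is an assumption and not necessarily the thesis's method. The two features (for instance, "a complementary peak exists" and "a neutral-loss partner exists"), the training examples, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical binary features per peak, e.g.
# (has_complementary_peak, has_neutral_loss_partner).
Features = Tuple[bool, ...]
Model = Tuple[float, Dict[bool, List[float]]]


def train(examples: List[Tuple[Features, bool]]) -> Model:
    """Learn P(signal) and P(feature_i is True | class) from labeled peaks.

    Each example is (features, is_signal). Laplace smoothing keeps every
    conditional probability strictly between 0 and 1.
    """
    n_features = len(examples[0][0])
    class_counts = Counter(label for _, label in examples)
    true_counts = {True: [0] * n_features, False: [0] * n_features}
    for feats, label in examples:
        for i, value in enumerate(feats):
            if value:
                true_counts[label][i] += 1
    prior_signal = class_counts[True] / len(examples)
    cond = {
        label: [(true_counts[label][i] + 1) / (class_counts[label] + 2)
                for i in range(n_features)]
        for label in (True, False)
    }
    return prior_signal, cond


def posterior_signal(feats: Features, model: Model) -> float:
    """Return P(signal | features) under the naive independence assumption."""
    prior_signal, cond = model
    joint = {True: prior_signal, False: 1.0 - prior_signal}
    for label in (True, False):
        for i, value in enumerate(feats):
            p = cond[label][i]
            joint[label] *= p if value else 1.0 - p
    return joint[True] / (joint[True] + joint[False])


# Made-up training set: (features, is_signal).
training = [
    ((True, True), True), ((True, False), True), ((False, True), True),
    ((False, False), False), ((False, False), False), ((True, False), False),
]
model = train(training)
p_both = posterior_signal((True, True), model)
p_none = posterior_signal((False, False), model)
```

Peaks would then be kept when their estimated probability exceeds a chosen cut-off, so the number of retained peaks can vary from spectrum to spectrum instead of being fixed in advance.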