THESIS
2013
xii, 62 pages : illustrations (some color) ; 30 cm
Abstract
Liquid chromatography coupled to mass spectrometry (LC‐MS) is the dominant
technological platform for proteomics. An LC‐MS analysis of a complex biological
sample can be visualized as a “map” of which the positional coordinates are the
mass‐to‐charge ratio (m/z) and chromatographic retention time (RT) of the chemical
species profiled. Label‐free quantitative proteomics requires the alignment and
comparison of multiple LC‐MS maps to ascertain the reproducibility of experiments or
reveal proteome changes under different conditions. The main challenge in this task
lies in correcting retention time shifts, which are inevitable even on the same
instrument and under the same elution conditions. For large‐scale studies, multiple
instruments or multi‐week experiments are often required, which exacerbates the
problem. Similar, but not identical, LC instruments and settings can cause peptides to
elute in a different order, violating the key assumption of many state‐of‐the‐art
alignment tools. We present a new graph‐based time alignment algorithm that can
align these less similar LC‐MS maps, which cannot be effectively handled by existing
methods.
We developed WBMatch, which is based on an efficient weighted bipartite matching algorithm from graph theory, to align different LC‐MS maps. Instead of finding warping
functions to correct the variations in the retention time dimension, WBMatch directly
tries to find a peak‐to‐peak mapping that maximizes a similarity function between two
LC‐MS maps. The similarity function is a combination of m/z and retention time
deviations of all aligned peaks. In order to handle large retention time shifts between
LC‐MS experiments conducted on different instruments or with different settings, an additional
step of locally weighted scatterplot smoothing is performed prior to WBMatch, forming
a new method called LWBMatch. For validation, we defined the ground truth for
alignment success based on MS/MS identifications from sequence searching. We
showed that our method outperforms several existing tools in terms of precision and
recall, and is capable of aligning maps from different instruments and settings.
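
As an illustration of the peak‐to‐peak matching idea described above, the following Python sketch aligns two small peak lists by solving an assignment problem. It is not the WBMatch implementation: the similarity function, the tolerances `mz_tol` and `rt_tol`, the dummy "unmatched" columns, and all function names are assumptions made for this example, and the solver is a textbook Hungarian method.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.
    Returns, for each row, the index of its assigned column."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:                        # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def similarity(a, b, mz_tol, rt_tol):
    """Assumed score in [0, 1] from m/z and RT deviations; None if out of tolerance."""
    dmz, drt = abs(a[0] - b[0]), abs(a[1] - b[1])
    if dmz > mz_tol or drt > rt_tol:
        return None
    return 1.0 - 0.5 * (dmz / mz_tol + drt / rt_tol)


def align_maps(map_a, map_b, mz_tol=0.01, rt_tol=1.0):
    """Return (index_a, index_b) pairs that maximize the total similarity."""
    n, m = len(map_a), len(map_b)
    if n == 0 or m == 0:
        return []
    cost = []
    for a in map_a:
        row = []
        for b in map_b:
            s = similarity(a, b, mz_tol, rt_tol)
            row.append(1.0 if s is None else -s)  # disallowed pairs cost more than "unmatched"
        row.extend([0.0] * n)                     # dummy columns: leave the peak unmatched
        cost.append(row)
    pairs = []
    for i, j in enumerate(hungarian(cost)):
        if j < m and similarity(map_a[i], map_b[j], mz_tol, rt_tol) is not None:
            pairs.append((i, j))
    return pairs


# Peaks are (m/z, retention time) tuples; the values are made up for illustration.
map_a = [(500.25, 10.0), (600.30, 20.0), (700.40, 30.0)]
map_b = [(600.30, 20.5), (500.25, 10.4), (800.00, 40.0)]
pairs = align_maps(map_a, map_b)
```

Because the matching works on individual peaks rather than on a monotonic warping function, it does not require the two maps to share the same elution order.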
We also applied WBMatch in MSSearch, a software application for building and searching
chemical fingerprint databases. We built a fingerprint database using
LC‐MS profiles of secondary metabolites of marine bacteria. An unknown bacterial
sample can then be searched against the database for the closest match in terms of
their secondary metabolite repertoires. The searching process consists of simple
pairwise comparisons of LC‐MS maps, conducted by the efficient WBMatch. We
showed that the LC‐MS profiles of the secondary metabolites of the same bacteria are
sufficiently reproducible (> 90% for technical replicates and > 65% for biological
replicates), and that the secondary metabolite profiles of different marine bacteria
variants within the same species differ enough to separate them.
More practically, MSSearch provides the functionality to search for individual peak
matches, enabling one to screen unknown bacteria for their abilities to produce certain
useful compounds.
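
The database search described above can be sketched as a loop of pairwise comparisons followed by ranking. The example below is a simplified stand-in, not MSSearch itself: it uses a greedy tolerance‐based matcher in place of WBMatch, and the score (matched peaks divided by the size of the larger map), the tolerances, and the strain names are assumptions made for illustration.

```python
def count_matches(query, ref, mz_tol=0.01, rt_tol=1.0):
    """Greedy one-to-one matching of (m/z, RT) peaks within tolerance.
    A stand-in for WBMatch, used here only to keep the sketch short."""
    candidates = []
    for i, (mz_q, rt_q) in enumerate(query):
        for j, (mz_r, rt_r) in enumerate(ref):
            dmz, drt = abs(mz_q - mz_r), abs(rt_q - rt_r)
            if dmz <= mz_tol and drt <= rt_tol:
                candidates.append((dmz / mz_tol + drt / rt_tol, i, j))
    used_q, used_r, matched = set(), set(), 0
    for _, i, j in sorted(candidates):        # closest pairs first
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            matched += 1
    return matched


def search(query, database, mz_tol=0.01, rt_tol=1.0):
    """Rank database entries by the assumed similarity score, best first."""
    results = []
    for name, ref in database.items():
        larger = max(len(query), len(ref))
        score = count_matches(query, ref, mz_tol, rt_tol) / larger if larger else 0.0
        results.append((name, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints; the peak values are made up for illustration.
database = {
    "strain_A": [(300.1, 5.0), (450.2, 12.0), (610.3, 20.0)],
    "strain_B": [(300.1, 5.1), (520.4, 14.0)],
}
query = [(300.1, 5.2), (450.2, 12.3), (610.3, 19.8)]
ranked = search(query, database)
```

Searching for an individual compound corresponds to a query containing a single peak.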