THESIS
2020
xv, 174 pages : illustrations ; 30 cm
Abstract
In shotgun proteomics, reliable peptide identification depends on false discovery rate (FDR)
estimation, which ideally keeps the fraction of false positive peptide identifications among those
deemed correct at a pre-specified level. Current statistical validation tools for FDR control are
dominated by heavily data-dependent approaches whose assumptions and settings are restrictive or
purely empirical, and therefore lack theoretical grounding, which often leads to inaccurate FDR
estimates. This study proposes a new approach to FDR estimation, called the Common Decoy
Distribution (CDD), which uses a fixed empirical null score distribution based on a
single-spectrum statistical confidence measure (the E-value).
Separate CDDs constructed for spectra of different charge states were shown to be negligibly
sensitive to changes in noise levels and to the presence of unexpected post-translational
modifications (PTMs) in the analyzed spectra. An optimized implementation of the CDD idea in the
PeptideProphet framework (PP-CDD) yielded satisfactorily accurate and precise FDR estimates for
different sets of validation files, performing on par with the PeptideProphet variants studied and
much better than Percolator. The precision of FDR estimation by PP-CDD was noticeably sensitive
to the precursor mass tolerance applied at the sequence database search stage, which indicates
that accurate determination of E-values for all analyzed peptide-spectrum matches is of paramount
importance for the optimal performance of a CDD-based tool for FDR control.
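
As a rough illustration of how a fixed empirical null score distribution can yield an FDR estimate, the sketch below counts how many matches a null sample would be expected to place above a score threshold and divides by the number of matches actually accepted. It is a generic empirical-null calculation, not the PP-CDD implementation described in the thesis. The function names, the `null_proportion` parameter, and the convention that higher scores are better (for example, a negative log-transformed E-value) are all assumptions made for this example.

```python
from bisect import bisect_left


def null_tail_fraction(sorted_null_scores, threshold):
    """Fraction of null scores at or above the threshold.

    Assumes sorted_null_scores is sorted in ascending order and non-empty.
    """
    first_at_or_above = bisect_left(sorted_null_scores, threshold)
    return (len(sorted_null_scores) - first_at_or_above) / len(sorted_null_scores)


def estimate_fdr(target_scores, null_scores, threshold, null_proportion=1.0):
    """Estimate the FDR among target scores at or above the threshold.

    Hypothetical helper: the expected number of false positives is taken as
    null_proportion * (number of targets) * P_null(score >= threshold).
    null_proportion=1.0 is a conservative assumption (every match treated
    as potentially incorrect), not a value taken from the thesis.
    """
    accepted = sum(1 for score in target_scores if score >= threshold)
    if accepted == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    tail = null_tail_fraction(sorted(null_scores), threshold)
    expected_false = null_proportion * len(target_scores) * tail
    return min(1.0, expected_false / accepted)
```

Because the null sample is fixed in advance rather than regenerated for each data set, the same `null_scores` list could be reused across searches in this sketch; only `target_scores` would change.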