De novo peptide identification via tandem mass spectrometry and integer linear optimization. 2007

Peter A DiMaggio and Christodoulos A Floudas
Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.

This article presents a novel methodology for the automated de novo identification of peptides via integer linear optimization (also referred to as integer linear programming, or ILP) and tandem mass spectrometry. The features of the mathematical model are described, and examples illustrate the key concepts of the proposed approach. A variety of challenging peptide identification problems, accompanied by a comparative study with five state-of-the-art methods, are examined to illustrate the method's ability to address (a) residue-dependent fragmentation properties that result in missing ion peaks and (b) the variability of resolution across mass analyzers. A preprocessing algorithm identifies important m/z values in the tandem mass spectrum. Missing peaks, which arise from residue-dependent fragmentation characteristics, are handled by a two-stage algorithmic framework. A cross-correlation approach then resolves missing amino acid assignments and selects the most probable peptide by comparing the theoretical spectra of the candidate sequences generated in the ILP sequencing stages with the experimental tandem mass spectrum. The proposed de novo method, denoted PILOT, is compared with existing popular methods (Lutefisk, PEAKS, PepNovo, EigenMS, and NovoHMM) on a set of spectra from QTOF and ion trap instruments.
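The final step described in the abstract, ranking candidate sequences by comparing their theoretical spectra with the experimental spectrum, can be illustrated with a minimal sketch. This is not PILOT's actual scoring function, which the abstract does not specify. It assumes a simplified SEQUEST-style cross-correlation: singly charged b- and y-ions only, unit-intensity theoretical peaks, 1 Da bins, and a background correction averaged over lags of up to 75 bins. The function names (`theoretical_ions`, `binned`, `xcorr`, `rank_candidates`) and all parameter values are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_ions(peptide):
    """Return singly charged b- and y-ion m/z values (b-ions first)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1)
        prefix += m
        ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1)
        suffix += m
        ions.append(suffix + WATER + PROTON)
    return ions


def binned(mz_values, intensities=None, width=1.0):
    """Map m/z values to integer bins, keeping the largest intensity per bin."""
    spectrum = defaultdict(float)
    for i, mz in enumerate(mz_values):
        inten = 1.0 if intensities is None else intensities[i]
        b = int(round(mz / width))
        spectrum[b] = max(spectrum[b], inten)
    return spectrum


def xcorr(experimental, theoretical, max_lag=75):
    """Dot product at zero lag minus the mean dot product over nonzero lags."""
    def dot(lag):
        return sum(v * experimental.get(b + lag, 0.0)
                   for b, v in theoretical.items())

    background = sum(dot(lag) for lag in range(-max_lag, max_lag + 1)
                     if lag != 0) / (2 * max_lag)
    return dot(0) - background


def rank_candidates(peaks, candidates):
    """Rank candidate peptides by score; peaks is a list of (m/z, intensity)."""
    exp = binned([p[0] for p in peaks], [p[1] for p in peaks])
    scored = [(xcorr(exp, binned(theoretical_ions(c))), c) for c in candidates]
    return sorted(scored, reverse=True)
```

Calling `rank_candidates(peaks, ["PEPTIDE", "PEPTIED"])` with a list of `(m/z, intensity)` pairs returns `(score, sequence)` tuples sorted from best to worst. Subtracting the lagged background penalizes candidates whose matches could be explained by dense, unrelated peaks rather than by true alignment with the spectrum.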

MeSH Terms

UI: D010455
MeSH Term: Peptides
Description: Members of the class of compounds composed of AMINO ACIDS joined together by peptide bonds between adjacent amino acids into linear, branched or cyclical structures. OLIGOPEPTIDES are composed of approximately 2-12 amino acids. Polypeptides are composed of approximately 13 or more amino acids. PROTEINS are considered to be larger versions of peptides that can form into complex structures such as ENZYMES and RECEPTORS.
Entries: Peptide; Polypeptide; Polypeptides

UI: D011382
MeSH Term: Programming, Linear
Description: A technique of operations research for solving certain kinds of problems involving many variables where a best value or set of best values is to be found. It is most likely to be feasible when the quantity to be optimized, sometimes called the objective function, can be stated as a mathematical expression in terms of the various activities within the system, and when this expression is simply proportional to the measure of the activities, i.e., is linear, and when all the restrictions are also linear. It is different from computer programming, although problems using linear programming techniques may be programmed on a computer.
Entries: Linear Programming

UI: D012680
MeSH Term: Sensitivity and Specificity
Description: Binary classification measures to assess test results. Sensitivity or recall rate is the proportion of true positives. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed)
Entries: Specificity; Sensitivity; Specificity and Sensitivity

UI: D053719
MeSH Term: Tandem Mass Spectrometry
Description: A mass spectrometry technique using two (MS/MS) or more mass analyzers. With two in tandem, the precursor ions are mass-selected by a first mass analyzer, and focused into a collision region where they are then fragmented into product ions which are then characterized by a second mass analyzer. A variety of techniques are used to separate the compounds, ionize them, and introduce them to the first mass analyzer. For example, in GC-MS/MS, GAS CHROMATOGRAPHY-MASS SPECTROMETRY is involved in separating relatively small compounds by GAS CHROMATOGRAPHY prior to injecting them into an ionization chamber for the mass selection.
Entries: Mass Spectrometry-Mass Spectrometry; Mass Spectrometry Mass Spectrometry; Mass Spectrometry, Tandem

UI: D019992
MeSH Term: Databases as Topic
Description: Works on organized collections of records, standardized in format and content, that are stored in any of a variety of computer-readable modes.
Entries: Data Banks as Topic; Data Bases as Topic; Databanks as Topic
