MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry. 2007

Lijuan Mo, Debojyoti Dutta, Yunhu Wan, and Ting Chen
Department of Biology, Department of Mathematics, University of Southern California, Los Angeles, California 90089, USA.

Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to de novo peptide sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated from both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, with only 6-10 parameters to be trained, avoids the problem of overfitting and allows MSNovo to be adapted easily to other machines and data sets. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared to existing programs, MSNovo predicts peptides as well as sequence tags with higher accuracy, which is important for applications that search protein databases using the de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data. We also show that the performance of MSNovo improves significantly on high-resolution data. Supporting Information, executable files, and data sets can be found at http://msms.usc.edu/supplementary/msnovo.
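The idea of a dynamic program over a mass array can be illustrated with a toy sketch. This is not MSNovo's actual algorithm or scoring function; it assumes integer nominal residue masses for a small subset of amino acids, and the names `RESIDUE_MASS`, `best_peptide`, and `prefix_score` are hypothetical. Each array index is a prefix mass, each cell holds the best score of any residue sequence reaching that mass, and backtracking from the total mass recovers the best-scoring peptide.

```python
# Toy mass-array dynamic program (illustrative assumption, not MSNovo itself).
# Nominal integer residue masses for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "L": 113, "N": 114, "D": 115, "E": 129, "F": 147,
}


def best_peptide(total_mass, prefix_score):
    """Return (peptide, score) maximizing the summed prefix-mass scores.

    total_mass   -- integer sum of residue masses of the unknown peptide
    prefix_score -- dict mapping a prefix mass to a score (for example,
                    evidence that a fragment peak supports that mass)
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (total_mass + 1)   # best[m]: best score reaching mass m
    back = [None] * (total_mass + 1)      # back[m]: last residue on that path
    best[0] = 0.0                         # the empty prefix is always reachable

    for m in range(1, total_mass + 1):
        for aa, w in RESIDUE_MASS.items():
            prev = m - w
            if prev >= 0 and best[prev] > neg_inf:
                cand = best[prev] + prefix_score.get(m, 0.0)
                if cand > best[m]:
                    best[m] = cand
                    back[m] = aa

    if best[total_mass] == neg_inf:
        return None, neg_inf              # no residue combination fits

    # Walk the back-pointers from the total mass down to zero.
    residues = []
    m = total_mass
    while m > 0:
        residues.append(back[m])
        m -= RESIDUE_MASS[back[m]]
    return "".join(reversed(residues)), best[total_mass]


# Peptide G-A-S has prefix masses 57 and 128 and total residue mass 215.
peptide, score = best_peptide(215, {57: 1.0, 128: 1.0})
print(peptide, score)
```

In this sketch every reachable cell implicitly represents all residue sequences with that prefix mass, which is why a single pass over the array suffices to find the best-scoring one. A real scoring function would be probabilistic and would account for multiple ion types and charge states, as the abstract describes.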

UI MeSH Term Description Entries
D008970 Molecular Weight The sum of the weight of all the atoms in a molecule. Molecular Weights,Weight, Molecular,Weights, Molecular
D010455 Peptides Members of the class of compounds composed of AMINO ACIDS joined together by peptide bonds between adjacent amino acids into linear, branched or cyclical structures. OLIGOPEPTIDES are composed of approximately 2-12 amino acids. Polypeptides are composed of approximately 13 or more amino acids. PROTEINS are considered to be larger versions of peptides that can form into complex structures such as ENZYMES and RECEPTORS. Peptide,Polypeptide,Polypeptides
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D012680 Sensitivity and Specificity Binary classification measures to assess test results. Sensitivity or recall rate is the proportion of true positives. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed) Specificity,Sensitivity,Specificity and Sensitivity
D015203 Reproducibility of Results The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results. Reliability and Validity,Reliability of Result,Reproducibility Of Result,Reproducibility of Finding,Validity of Result,Validity of Results,Face Validity,Reliability (Epidemiology),Reliability of Results,Reproducibility of Findings,Test-Retest Reliability,Validity (Epidemiology),Finding Reproducibilities,Finding Reproducibility,Of Result, Reproducibility,Of Results, Reproducibility,Reliabilities, Test-Retest,Reliability, Test-Retest,Result Reliabilities,Result Reliability,Result Validities,Result Validity,Result, Reproducibility Of,Results, Reproducibility Of,Test Retest Reliability,Validity and Reliability,Validity, Face
D016208 Databases, Factual Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references. Databanks, Factual,Data Banks, Factual,Data Bases, Factual,Data Bank, Factual,Data Base, Factual,Databank, Factual,Database, Factual,Factual Data Bank,Factual Data Banks,Factual Data Base,Factual Data Bases,Factual Databank,Factual Databanks,Factual Database,Factual Databases
D017421 Sequence Analysis A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information. Sequence Determination,Analysis, Sequence,Determination, Sequence,Determinations, Sequence,Sequence Determinations,Analyses, Sequence,Sequence Analyses
D053719 Tandem Mass Spectrometry A mass spectrometry technique using two (MS/MS) or more mass analyzers. With two in tandem, the precursor ions are mass-selected by a first mass analyzer, and focused into a collision region where they are then fragmented into product ions which are then characterized by a second mass analyzer. A variety of techniques are used to separate the compounds, ionize them, and introduce them to the first mass analyzer. For example, in GC-MS/MS, GAS CHROMATOGRAPHY-MASS SPECTROMETRY is involved in separating relatively small compounds by GAS CHROMATOGRAPHY prior to injecting them into an ionization chamber for the mass selection. Mass Spectrometry-Mass Spectrometry,Mass Spectrometry Mass Spectrometry,Mass Spectrometry, Tandem
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
