Statistical data processing in clinical proteomics. 2008

Suzanne Smit, and Huub C J Hoefsloot, and Age K Smilde
Swammerdam Institute for Life Sciences, Universiteit van Amsterdam - Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands. ssmit@science.uva.nl

This review discusses data analysis strategies for the discovery of biomarkers in clinical proteomics. Proteomics studies produce large amounts of data, characterized by few samples on which many variables are measured. A wealth of classification methods exists for extracting information from the data. Feature selection plays an important role in reducing the dimensionality of the data prior to classification and in discovering biomarker leads. The question of which classification strategy works best is as yet unanswered. Validation is a crucial step in moving biomarker leads towards clinical use. Here we only discuss statistical validation, recognizing that biological and clinical validation are of utmost importance. First, there is the need for validated model selection to develop a generalized classifier that predicts new samples correctly. A cross-validation loop that is wrapped around the model development procedure assesses the performance using unseen data. The significance of the model should be tested; we use permutations of the data for comparison with uninformative data. This procedure also tests the correctness of the performance validation. Preferably, a new set of samples is measured to test the classifier and rule out results specific to a machine, analyst, laboratory or the first set of samples. This is not yet standard practice. We present a modular framework that combines feature selection, classification, biomarker discovery and statistical validation; these data analysis aspects are all discussed in this review. The feature selection, classification and biomarker discovery modules can be incorporated or omitted according to the preference of the researcher. The validation modules, however, should not be optional. In each module, the researcher can select from a wide range of methods, since there is not one unique way that leads to the correct model and proper validation. We discuss many possibilities for feature selection, classification and biomarker discovery. For validation we advise a combination of cross-validation and permutation testing, a validation strategy supported in the literature.
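
The combination of cross-validation and permutation testing described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the review: the synthetic data, the ranking of variables by difference in class means, the nearest-centroid classifier, and all function names are hypothetical stand-ins. The point it tries to show is structural: feature selection happens inside each cross-validation fold, and the permutation test reruns the whole procedure on shuffled class labels.

```python
import random

def select_features(X, y, k):
    """Rank variables by absolute difference in class means; return top-k indices."""
    n_vars = len(X[0])
    scores = []
    for j in range(n_vars):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]

def fit_centroids(X, y, feats):
    """Class centroids on the selected variables (illustrative classifier)."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents

def predict(cents, row, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return min((0, 1), key=dist)

def cv_accuracy(X, y, k_feats=5, n_folds=5):
    """Cross-validation wrapped around the full model development procedure."""
    n = len(X)
    correct = 0
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Feature selection uses training samples only, so test samples stay unseen.
        feats = select_features(X_tr, y_tr, k_feats)
        cents = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in test)
    return correct / n

def permutation_test(X, y, n_perm=50, seed=0):
    """Compare the observed CV accuracy with accuracies on label-permuted data."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y)
    at_least_as_good = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # breaks any link between variables and class
        if cv_accuracy(X, y_perm) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)

# Synthetic example (assumption): 40 samples, 100 variables, 5 informative.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(2.0 * lab if j < 5 else 0.0, 1.0) for j in range(100)]
     for lab in y]

observed, p_value = permutation_test(X, y)
print(f"CV accuracy: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

Because the permuted runs go through the same selection and classification steps, they also act as a check on the validation itself: if accuracies on permuted labels sit well above chance, information is probably leaking from the test samples into model development.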

MeSH Terms

D003627 Data Interpretation, Statistical
Description: Application of statistical procedures to analyze specific observed or assumed facts from a particular study.
Entry terms: Data Analysis, Statistical; Data Interpretations, Statistical; Interpretation, Statistical Data; Statistical Data Analysis; Statistical Data Interpretation; Analyses, Statistical Data; Analysis, Statistical Data; Data Analyses, Statistical; Interpretations, Statistical Data; Statistical Data Analyses; Statistical Data Interpretations

D040901 Proteomics
Description: The systematic study of the complete complement of proteins (PROTEOME) of organisms.
Entry terms: Peptidomics
