Benchmark study of feature selection strategies for multi-omics data. 2022

Yingxia Li, Ulrich Mansmann, Shangming Du, and Roman Hornung
Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377, Munich, Germany. yingxiali@ibe.med.uni-muenchen.de.

BACKGROUND In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies focused on feature selection methods for omics data, but to our knowledge, none compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics.
RESULTS The results suggested that, first, the chosen number of selected features affects the predictive performance for many feature selection methods but not all. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but for some methods, concurrent selection took more time. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when considering only a few selected features. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods.
CONCLUSIONS We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, where, however, mRMR is considerably more computationally costly.
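
The evaluation protocol described in the abstract (feature selection followed by classification, scored with accuracy, AUC, and Brier score under repeated fivefold cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the toy data, the univariate filter score, the nearest-centroid classifier, the stratified fold assignment, the three repeats, and the "k features per data type" rule are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. The methods actually compared (mRMR, random forest permutation importance, the Lasso, SVMs, random forests) would in practice come from a dedicated library.

```python
import math
import random
from statistics import mean, pstdev

def accuracy(y, p):
    """Share of samples whose thresholded prediction (p >= 0.5) matches the label."""
    return mean(float((pi >= 0.5) == (yi == 1)) for yi, pi in zip(y, p))

def brier(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return mean((pi - yi) ** 2 for yi, pi in zip(y, p))

def auc(y, p):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def filter_scores(X, y):
    """Hypothetical univariate filter: |difference of class means| / overall SD."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        m1 = mean(v for v, yi in zip(col, y) if yi == 1)
        m0 = mean(v for v, yi in zip(col, y) if yi == 0)
        scores.append(abs(m1 - m0) / (pstdev(col) or 1.0))
    return scores

def select(X, y, k, blocks=None):
    """Top-k features from all data types at once (blocks=None) or top-k per data type."""
    s = filter_scores(X, y)
    groups = blocks if blocks is not None else [range(len(s))]
    chosen = []
    for g in groups:
        chosen += sorted(g, key=lambda j: -s[j])[:k]
    return chosen

def fit_predict(Xtr, ytr, Xte, feats):
    """Stand-in classifier (nearest centroid); returns a score in [0, 1] for class 1."""
    c1 = [mean(r[j] for r, yi in zip(Xtr, ytr) if yi == 1) for j in feats]
    c0 = [mean(r[j] for r, yi in zip(Xtr, ytr) if yi == 0) for j in feats]
    probs = []
    for r in Xte:
        d1 = sum((r[j] - a) ** 2 for j, a in zip(feats, c1))
        d0 = sum((r[j] - b) ** 2 for j, b in zip(feats, c0))
        z = max(-50.0, min(50.0, (d1 - d0) / 2))  # clamp to avoid overflow in exp
        probs.append(1.0 / (1.0 + math.exp(z)))   # closer to class-1 centroid -> higher
    return probs

def stratified_folds(y, n_folds, rng):
    """Shuffle indices within each class and deal them round-robin into folds."""
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def repeated_cv(X, y, k, blocks=None, n_folds=5, n_repeats=3, seed=0):
    """Repeated fivefold CV; features are re-selected on each training split only."""
    rng = random.Random(seed)
    results = {"accuracy": [], "auc": [], "brier": []}
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_folds, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
            Xte, yte = [X[i] for i in test_idx], [y[i] for i in test_idx]
            feats = select(Xtr, ytr, k, blocks)
            p = fit_predict(Xtr, ytr, Xte, feats)
            results["accuracy"].append(accuracy(yte, p))
            results["auc"].append(auc(yte, p))
            results["brier"].append(brier(yte, p))
    return {name: mean(vals) for name, vals in results.items()}

def toy_data(n=60, p_per_block=20, seed=1):
    """Two made-up 'omics' blocks; only the first feature of each block carries signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * yi if j % p_per_block == 0 else 0.0, 1.0)
          for j in range(2 * p_per_block)] for yi in y]
    blocks = [range(0, p_per_block), range(p_per_block, 2 * p_per_block)]
    return X, y, blocks

X, y, blocks = toy_data()
concurrent = repeated_cv(X, y, k=4)                 # select from all data types at once
by_type = repeated_cv(X, y, k=2, blocks=blocks)     # select separately per data type
print("concurrent:", concurrent)
print("by data type:", by_type)
```

Selecting features inside each training split, rather than once on the full data, keeps the held-out fold from influencing which features are chosen, so the three metrics are not optimistically biased by the selection step.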

UI | MeSH Term | Description | Entries
D009369 | Neoplasms | New abnormal growth of tissue. Malignant neoplasms show a greater degree of anaplasia and have the properties of invasion and metastasis, compared to benign neoplasms. | Benign Neoplasm,Cancer,Malignant Neoplasm,Tumor,Tumors,Benign Neoplasms,Malignancy,Malignant Neoplasms,Neoplasia,Neoplasm,Neoplasms, Benign,Cancers,Malignancies,Neoplasias,Neoplasm, Benign,Neoplasm, Malignant,Neoplasms, Malignant
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D060388 | Support Vector Machine | SUPERVISED MACHINE LEARNING algorithm which learns to assign labels to objects from a set of training examples. Examples are learning to recognize fraudulent credit card activity by examining hundreds or thousands of fraudulent and non-fraudulent credit card activity, or learning to make disease diagnosis or prognosis based on automatic classification of microarray gene expression profiles drawn from hundreds or thousands of samples. | Support Vector Network,Machine, Support Vector,Machines, Support Vector,Network, Support Vector,Networks, Support Vector,Support Vector Machines,Support Vector Networks,Vector Machine, Support,Vector Machines, Support,Vector Network, Support,Vector Networks, Support
D019985 | Benchmarking | Method of measuring performance against established standards of best practice. | Benchmarking, Health Care,Benchmarks,Best Practice Analysis,Metrics,Benchmark,Benchmarking, Healthcare,Analysis, Best Practice,Health Care Benchmarking,Healthcare Benchmarking
