Model-free prediction test with application to genomics data. 2022

Zhanrui Cai, Jing Lei, and Kathryn Roeder
Department of Statistics, Iowa State University, Ames, IA 50011.

Testing the significance of predictors in a regression model is one of the most important topics in statistics. The problem is especially difficult when no parametric assumptions are placed on the data. This paper tests the null hypothesis that, given confounding variables Z, X does not significantly contribute to the prediction of Y in a model-free setting, where X and Z may be high-dimensional. We propose a general framework that first fits nonparametric machine learning regression algorithms on [Formula: see text] and [Formula: see text], then compares the prediction power of the two models. The proposed method allows us to leverage the strength of the most powerful regression algorithms developed in the modern machine learning community. The P value for the test can be easily obtained by permutation. In simulations, we find that the proposed method is more powerful than existing methods. The proposed method allows us to draw biologically meaningful conclusions from two gene expression data analyses without strong distributional assumptions: 1) testing the prediction power of sequencing RNA for the proteins in cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) data and 2) identifying spatially variable genes in spatially resolved transcriptomics data.
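The two-model comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' procedure: ordinary least squares stands in for an arbitrary machine learning regressor, the data are split once into training and held-out halves, and the P value comes from a paired sign-flip permutation of the held-out squared-error differences. The names `fit_predict` and `prediction_test` are hypothetical.

```python
import numpy as np


def fit_predict(train_features, train_y, test_features):
    """Least-squares stand-in for any regression algorithm (assumption)."""
    design = np.column_stack([np.ones(len(train_features)), train_features])
    coef, *_ = np.linalg.lstsq(design, train_y, rcond=None)
    test_design = np.column_stack([np.ones(len(test_features)), test_features])
    return test_design @ coef


def prediction_test(X, Y, Z, n_perm=2000, seed=0):
    """Compare held-out squared error of Y ~ Z against Y ~ (X, Z).

    Returns the mean loss difference and a one-sided sign-flip
    permutation P value (a simplification chosen for this sketch).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    train, test = idx[: len(Y) // 2], idx[len(Y) // 2:]
    XZ = np.column_stack([X, Z])

    pred_reduced = fit_predict(Z[train], Y[train], Z[test])
    pred_full = fit_predict(XZ[train], Y[train], XZ[test])

    # Positive values mean the model that also uses X predicts better.
    diff = (Y[test] - pred_reduced) ** 2 - (Y[test] - pred_full) ** 2
    stat = diff.mean()

    # Under "no gain in prediction", the sign of each difference is
    # treated as exchangeable; flip signs to build the null distribution.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null_stats = (signs * diff).mean(axis=1)
    p_value = (1 + np.sum(null_stats >= stat)) / (n_perm + 1)
    return stat, p_value


# Synthetic example in which X genuinely helps predict Y given Z.
rng = np.random.default_rng(1)
Z = rng.normal(size=(400, 2))
X = rng.normal(size=(400, 1))
Y = Z @ np.array([1.0, -1.0]) + 2.0 * X[:, 0] + rng.normal(size=400)
stat, p_value = prediction_test(X, Y, Z)
print(f"mean loss difference = {stat:.3f}, P value = {p_value:.4f}")
```

Swapping `fit_predict` for a nonparametric learner would bring the sketch closer to the model-free setting the abstract describes; the sign-flip step assumes the loss differences are symmetric about zero under the null, which is a stronger condition than the abstract states.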

UI | MeSH Term | Description | Entries
D012044 | Regression Analysis | Procedures for finding the mathematical function which best describes the relationship between a dependent variable and one or more independent variables. In linear regression (see LINEAR MODELS) the relationship is constrained to be a straight line and LEAST-SQUARES ANALYSIS is used to determine the best fit. In logistic regression (see LOGISTIC MODELS) the dependent variable is qualitative rather than continuously variable and LIKELIHOOD FUNCTIONS are used to find the best relationship. In multiple regression, the dependent variable is considered to depend on more than a single independent variable. | Regression Diagnostics; Statistical Regression; Analysis, Regression; Analyses, Regression; Diagnostics, Regression; Regression Analyses; Regression, Statistical; Regressions, Statistical; Statistical Regressions
D000069550 | Machine Learning | A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. | Transfer Learning; Learning, Machine; Learning, Transfer
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D059467 | Transcriptome | The pattern of GENE EXPRESSION at the level of genetic transcription in a specific organism or under specific circumstances in specific cells. | Transcriptomes; Gene Expression Profiles; Gene Expression Signatures; Transcriptome Profiles; Expression Profile, Gene; Expression Profiles, Gene; Expression Signature, Gene; Expression Signatures, Gene; Gene Expression Profile; Gene Expression Signature; Profile, Gene Expression; Profile, Transcriptome; Profiles, Gene Expression; Profiles, Transcriptome; Signature, Gene Expression; Signatures, Gene Expression; Transcriptome Profile
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics; Structural Genomics; Comparative Genomics; Genomics, Comparative; Genomics, Functional; Genomics, Structural
