Diagnosing and Handling Common Violations of Missing at Random. 2023

Feng Ji, Sophia Rabe-Hesketh, and Anders Skrondal
University of California, Berkeley, USA; University of Toronto.

Ignorable likelihood (IL) approaches are often used to handle missing data when estimating a multivariate model, such as a structural equation model. In this case, the likelihood is based on all available data, and no model is specified for the missing data mechanism. Inference proceeds via maximum likelihood or Bayesian methods, including multiple imputation without auxiliary variables. Such IL approaches are valid under a missing at random (MAR) assumption. Rabe-Hesketh and Skrondal (Ignoring non-ignorable missingness. Presidential Address at the International Meeting of the Psychometric Society, Beijing, China, 2015; Psychometrika, 2023) consider a violation of MAR where a variable A can affect missingness of another variable B also when A is not observed. They show that this case can be handled by discarding more data before proceeding with IL approaches. This data-deletion approach is similar to the sequential estimation of Mohan et al. (in: Advances in neural information processing systems, 2013) based on their ordered factorization theorem but is preferable for parametric models. Which kind of data-deletion or ordered factorization to employ depends on the nature of the MAR violation. In this article, we therefore propose two diagnostic tests, a likelihood-ratio test for a heteroscedastic regression model and a kernel conditional independence test. We also develop a test-based estimator that first uses diagnostic tests to determine which MAR violation appears to be present and then proceeds with the corresponding data-deletion estimator. Simulations show that the test-based estimator outperforms IL when the missing data problem is severe and performs similarly otherwise.
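The first of the two diagnostic tests named above, a likelihood-ratio test for a heteroscedastic regression model, can be illustrated with a minimal sketch. The abstract does not give the exact specification, so everything below is an assumption made for illustration: a single covariate `x`, a linear mean `b0 + b1 * x`, and a residual variance that is allowed to differ by a hypothetical binary missingness indicator `r` of some other variable. The function names (`weighted_line_fit`, `lr_test_heteroscedastic`, `simulate`) are likewise hypothetical, and the article's actual test may be set up differently.

```python
import math
import random


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def gaussian_loglik(x, y, b0, b1, var):
    """Normal log-likelihood with observation-specific variances."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (yi - b0 - b1 * xi) ** 2 / v)
        for xi, yi, v in zip(x, y, var)
    )


def lr_test_heteroscedastic(x, y, r, max_iter=500, tol=1e-10):
    """LR test of H0: residual variance of y given x is the same for r=0 and r=1.

    Assumes r is coded 0/1 and both groups are non-empty (illustrative setup).
    """
    n = len(y)

    # Null model: ordinary least squares with one common residual variance.
    b0, b1 = weighted_line_fit(x, y, [1.0] * n)
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / n
    ll0 = gaussian_loglik(x, y, b0, b1, [s2] * n)

    # Alternative model: alternate between weighted least squares for the mean
    # and group-specific variance updates until the log-likelihood stabilizes.
    var = [s2] * n
    ll1 = ll0
    for _ in range(max_iter):
        b0, b1 = weighted_line_fit(x, y, [1.0 / v for v in var])
        group_var = {}
        for k in (0, 1):
            sq = [(yi - b0 - b1 * xi) ** 2
                  for xi, yi, ri in zip(x, y, r) if ri == k]
            group_var[k] = sum(sq) / len(sq)
        var = [group_var[ri] for ri in r]
        new_ll = gaussian_loglik(x, y, b0, b1, var)
        converged = abs(new_ll - ll1) < tol
        ll1 = new_ll
        if converged:
            break

    lr = max(0.0, 2.0 * (ll1 - ll0))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


def simulate(n, sd_when_r1, seed):
    """Toy data: residual sd is 1 when r=0 and sd_when_r1 when r=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    r = [rng.randint(0, 1) for _ in range(n)]
    y = [1.0 + 0.5 * xi + rng.gauss(0, sd_when_r1 if ri else 1.0)
         for xi, ri in zip(x, r)]
    return x, y, r


x, y, r = simulate(400, 3.0, seed=1)
lr, p = lr_test_heteroscedastic(x, y, r)
print(f"LR = {lr:.2f}, p = {p:.3g}")
```

In this sketch a small p-value indicates that the residual variance differs between the two groups defined by `r`. How such a result maps onto a particular MAR violation, and onto the choice of data-deletion estimator, is determined by the article itself and is not reproduced here.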

UI | MeSH Term | Description | Entries
D008962 | Models, Theoretical | Theoretical representations that simulate the behavior or activity of systems, processes, or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Experimental Model; Experimental Models; Mathematical Model; Model, Experimental; Models (Theoretical); Models, Experimental; Models, Theoretic; Theoretical Study; Mathematical Models; Model (Theoretical); Model, Mathematical; Model, Theoretical; Models, Mathematical; Studies, Theoretical; Study, Theoretical; Theoretical Model; Theoretical Models; Theoretical Studies
D011594 | Psychometrics | Assessment of psychological variables by the application of mathematical procedures. | Psychometric
D001499 | Bayes Theorem | A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result. | Bayesian Analysis; Bayesian Estimation; Bayesian Forecast; Bayesian Method; Bayesian Prediction; Analysis, Bayesian; Bayesian Approach; Approach, Bayesian; Approachs, Bayesian; Bayesian Approachs; Estimation, Bayesian; Forecast, Bayesian; Method, Bayesian; Prediction, Bayesian; Theorem, Bayes
D015233 | Models, Statistical | Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. | Probabilistic Models; Statistical Models; Two-Parameter Models; Model, Statistical; Models, Binomial; Models, Polynomial; Statistical Model; Binomial Model; Binomial Models; Model, Binomial; Model, Polynomial; Model, Probabilistic; Model, Two-Parameter; Models, Probabilistic; Models, Two-Parameter; Polynomial Model; Polynomial Models; Probabilistic Model; Two Parameter Models; Two-Parameter Model
D016013 | Likelihood Functions | Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters. | Likelihood Ratio Test; Maximum Likelihood Estimates; Estimate, Maximum Likelihood; Estimates, Maximum Likelihood; Function, Likelihood; Functions, Likelihood; Likelihood Function; Maximum Likelihood Estimate; Test, Likelihood Ratio
