Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. 2000

E W Steyerberg, M J Eijkemans, F E Harrell, J D Habbema
Center for Clinical Decision Sciences, Department of Public Health, Erasmus University, Rotterdam, The Netherlands. steyerberg@mgz.fgg.eur.nl

Logistic regression analysis may well be used to develop a prognostic model for a dichotomous outcome. Especially when limited data are available, it is difficult to determine an appropriate selection of covariables for inclusion in such models. Also, predictions may be improved by applying some sort of shrinkage in the estimation of regression coefficients. In this study we compare the performance of several selection and shrinkage methods in small data sets of patients with acute myocardial infarction, where we aim to predict 30-day mortality. Selection methods included backward stepwise selection with significance levels alpha of 0.01, 0.05, 0.157 (the AIC criterion) or 0.50, and the use of qualitative external information on the sign of regression coefficients in the model. Estimation methods included standard maximum likelihood, the use of a linear shrinkage factor, penalized maximum likelihood, the Lasso, or quantitative external information on univariable regression coefficients. We found that stepwise selection with a low alpha (for example, 0.05) led to a relatively poor model performance, when evaluated on independent data. Substantially better performance was obtained with full models with a limited number of important predictors, where regression coefficients were reduced with any of the shrinkage methods. Incorporation of external information for selection and estimation improved the stability and quality of the prognostic models. We therefore recommend shrinkage methods in full models including prespecified predictors and incorporation of external information, when prognostic models are constructed in small data sets.
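
Two quantities mentioned in the abstract can be illustrated with a short Python sketch. First, the significance level of 0.157 matches the AIC criterion because AIC retains a single-parameter predictor when its chi-square statistic (1 degree of freedom) exceeds 2, and the upper-tail probability at 2 is about 0.157. Second, a linear shrinkage factor multiplies all regression coefficients by one common number below 1. The abstract does not say how the factor was estimated; the sketch assumes the common heuristic (model chi-square minus degrees of freedom) divided by model chi-square, and the function names and example numbers are hypothetical.

```python
import math


def alpha_for_chi2_threshold(threshold: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 df.

    For 1 df, P(X > t) = erfc(sqrt(t / 2)).
    """
    return math.erfc(math.sqrt(threshold / 2.0))


def heuristic_shrinkage_factor(model_chi2: float, df: int) -> float:
    """Assumed heuristic: (model chi2 - df) / model chi2, floored at 0."""
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    return max(0.0, (model_chi2 - df) / model_chi2)


def shrink_coefficients(coefs: list[float], factor: float) -> list[float]:
    """Multiply every regression coefficient by the same shrinkage factor."""
    return [factor * b for b in coefs]


# AIC corresponds to a chi-square threshold of 2 for one parameter.
aic_alpha = alpha_for_chi2_threshold(2.0)

# Hypothetical numbers for illustration only: model chi2 of 40 with 8 predictors.
factor = heuristic_shrinkage_factor(model_chi2=40.0, df=8)
shrunk = shrink_coefficients([0.5, -1.0], factor)

print(f"alpha equivalent to AIC: {aic_alpha:.3f}")
print(f"shrinkage factor: {factor:.2f}")
print(f"shrunk coefficients: {shrunk}")
```

The sketch leaves out the intercept, which is usually re-estimated after the other coefficients are shrunk. Penalized maximum likelihood and the Lasso shrink coefficients during estimation rather than afterwards and are not shown here.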

UI MeSH Term Description Entries
D008297 Male Males
D009203 Myocardial Infarction NECROSIS of the MYOCARDIUM caused by an obstruction of the blood supply to the heart (CORONARY CIRCULATION). Cardiovascular Stroke,Heart Attack,Myocardial Infarct,Cardiovascular Strokes,Heart Attacks,Infarct, Myocardial,Infarction, Myocardial,Infarctions, Myocardial,Infarcts, Myocardial,Myocardial Infarctions,Myocardial Infarcts,Stroke, Cardiovascular,Strokes, Cardiovascular
D011379 Prognosis A prediction of the probable outcome of a disease based on an individual's condition and the usual course of the disease as seen in similar situations. Prognostic Factor,Prognostic Factors,Factor, Prognostic,Factors, Prognostic,Prognoses
D012044 Regression Analysis Procedures for finding the mathematical function which best describes the relationship between a dependent variable and one or more independent variables. In linear regression (see LINEAR MODELS) the relationship is constrained to be a straight line and LEAST-SQUARES ANALYSIS is used to determine the best fit. In logistic regression (see LOGISTIC MODELS) the dependent variable is qualitative rather than continuously variable and LIKELIHOOD FUNCTIONS are used to find the best relationship. In multiple regression, the dependent variable is considered to depend on more than a single independent variable. Regression Diagnostics,Statistical Regression,Analysis, Regression,Analyses, Regression,Diagnostics, Regression,Regression Analyses,Regression, Statistical,Regressions, Statistical,Statistical Regressions
D005260 Female Females
D005544 Forecasting The prediction or projection of the nature of future problems or existing conditions based upon the extrapolation or interpretation of existing scientific data or by the application of scientific methodology. Futurology,Projections and Predictions,Future,Predictions and Projections
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000367 Age Factors Age as a constituent element or influence contributing to the production of a result. It may be applicable to the cause or the effect of a circumstance. It is used with human or animal concepts but should be differentiated from AGING, a physiological process, and TIME FACTORS which refers only to the passage of time. Age Reporting,Age Factor,Factor, Age,Factors, Age
D000368 Aged A person 65 years of age or older. For a person older than 79 years, AGED, 80 AND OVER is available. Elderly
D012307 Risk Factors An aspect of personal behavior or lifestyle, environmental exposure, inborn or inherited characteristic, which, based on epidemiological evidence, is known to be associated with a health-related condition considered important to prevent. Health Correlates,Risk Factor Scores,Risk Scores,Social Risk Factors,Population at Risk,Populations at Risk,Correlates, Health,Factor, Risk,Factor, Social Risk,Factors, Social Risk,Risk Factor,Risk Factor Score,Risk Factor, Social,Risk Factors, Social,Risk Score,Score, Risk,Score, Risk Factor,Social Risk Factor
