Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations. 2011

David I Warton
School of Mathematics and Statistics and Evolution and Ecology Research Centre, The University of New South Wales, NSW 2052, Australia. David.Warton@unsw.edu.au

A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an application important to thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds the number of clusters, n > K) and over-dispersion relative to the Poisson distribution. Standard GEE methods cannot be applied in this setting, primarily because sandwich estimators become numerically unstable as n increases. We propose instead using a regularized sandwich estimator that assumes a common correlation matrix R and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary (), and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.
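To illustrate why shrinkage helps when n > K, here is a minimal sketch, not the author's implementation. It assumes the regularized estimate takes the convex-combination form R_reg = gamma * R_sample + (1 - gamma) * R_working, with a working-independence (identity) working correlation matrix. The function names (`sample_correlation`, `shrink_correlation`, `is_positive_definite`), the residual values, and the fixed gamma = 0.5 are all hypothetical. In practice the shrinkage parameter would be estimated from the data rather than set by hand. With K = 3 clusters and n = 5 responses, the sample correlation matrix has rank at most K - 1 = 2, so it is singular. The shrunken matrix is positive definite for any gamma < 1.

```python
import math


def sample_correlation(resid):
    """Sample correlation matrix of a K x n residual matrix
    (rows = clusters, columns = responses within a cluster)."""
    K, n = len(resid), len(resid[0])
    means = [sum(row[j] for row in resid) / K for j in range(n)]
    centred = [[row[j] - means[j] for j in range(n)] for row in resid]
    cov = [[sum(c[i] * c[j] for c in centred) / (K - 1) for j in range(n)]
           for i in range(n)]
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def shrink_correlation(R_sample, R_working, gamma):
    """Assumed ridge-type shrinkage: gamma * R_sample + (1 - gamma) * R_working."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    n = len(R_sample)
    return [[gamma * R_sample[i][j] + (1.0 - gamma) * R_working[i][j]
             for j in range(n)] for i in range(n)]


def is_positive_definite(A, tol=1e-10):
    """Attempt a Cholesky factorisation; return False if any pivot is <= tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= tol:
                    return False
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return True


# Hypothetical residuals: K = 3 clusters (rows), n = 5 responses (columns).
resid = [
    [1.0, 2.0, 0.5, 3.0, 1.0],
    [0.0, 1.0, 2.0, 1.0, 4.0],
    [2.0, 0.0, 1.0, 2.0, 3.0],
]
n = len(resid[0])
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

R_hat = sample_correlation(resid)
R_reg = shrink_correlation(R_hat, identity, gamma=0.5)
print(is_positive_definite(R_hat))  # singular because n > K - 1
print(is_positive_definite(R_reg))  # shrinkage restores positive definiteness
```

Shrinking toward the working correlation matrix keeps the unit diagonal and bounds every eigenvalue below by (1 - gamma) when R_working is the identity. This is what makes the matrix safe to invert inside a sandwich estimator.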
