Power and sample size estimation in microarray studies. 2010

Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA.

BACKGROUND Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (pi1) of the true differentially expression genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes. RESULTS A sample size estimate based on the common formulation, to achieve the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of sample size problem is inadequate because the probability of detecting the specified sensitivity can be lower than 50%. On the other hand, the needed sample size calculated by the proposed permutation method will ensure detecting at least the desired sensitivity with 95% probability. The method is shown to perform well for a real example dataset using a small pilot dataset with 4-6 samples per group. CONCLUSIONS We recommend that the sample size problem should be formulated to detect a specified proportion of differentially expressed genes with 95% probability. This formulation ensures finding the desired proportion of true positives with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into consideration and works well using only a small pilot dataset.

UI MeSH Term Description Entries
D005189 False Positive Reactions Positive test results in subjects who do not possess the attribute for which the test is conducted. The labeling of healthy persons as diseased when screening in the detection of disease. (Last, A Dictionary of Epidemiology, 2d ed) False Positive Reaction,Positive Reaction, False,Positive Reactions, False,Reaction, False Positive,Reactions, False Positive
D018401 Sample Size The number of units (persons, animals, patients, specified circumstances, etc.) in a population to be studied. The sample size should be big enough to have a high likelihood of detecting a true difference between two groups. (From Wassertheil-Smoller, Biostatistics and Epidemiology, 1990, p95) Sample Sizes,Size, Sample,Sizes, Sample
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
D020411 Oligonucleotide Array Sequence Analysis Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING. DNA Microarrays,Gene Expression Microarray Analysis,Oligonucleotide Arrays,cDNA Microarrays,DNA Arrays,DNA Chips,DNA Microchips,Gene Chips,Oligodeoxyribonucleotide Array Sequence Analysis,Oligonucleotide Microarrays,Sequence Analysis, Oligonucleotide Array,cDNA Arrays,Array, DNA,Array, Oligonucleotide,Array, cDNA,Arrays, DNA,Arrays, Oligonucleotide,Arrays, cDNA,Chip, DNA,Chip, Gene,Chips, DNA,Chips, Gene,DNA Array,DNA Chip,DNA Microarray,DNA Microchip,Gene Chip,Microarray, DNA,Microarray, Oligonucleotide,Microarray, cDNA,Microarrays, DNA,Microarrays, Oligonucleotide,Microarrays, cDNA,Microchip, DNA,Microchips, DNA,Oligonucleotide Array,Oligonucleotide Microarray,cDNA Array,cDNA Microarray
D020869 Gene Expression Profiling The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell. Gene Expression Analysis,Gene Expression Pattern Analysis,Transcript Expression Analysis,Transcriptome Profiling,Transcriptomics,mRNA Differential Display,Gene Expression Monitoring,Transcriptome Analysis,Analyses, Gene Expression,Analyses, Transcript Expression,Analyses, Transcriptome,Analysis, Gene Expression,Analysis, Transcript Expression,Analysis, Transcriptome,Differential Display, mRNA,Differential Displays, mRNA,Expression Analyses, Gene,Expression Analysis, Gene,Gene Expression Analyses,Gene Expression Monitorings,Gene Expression Profilings,Monitoring, Gene Expression,Monitorings, Gene Expression,Profiling, Gene Expression,Profiling, Transcriptome,Profilings, Gene Expression,Profilings, Transcriptome,Transcript Expression Analyses,Transcriptome Analyses,Transcriptome Profilings,mRNA Differential Displays

Related Publications

Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
January 2012, Journal of biopharmaceutical statistics,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
December 2002, Statistics in medicine,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
January 2012, Journal of human reproductive sciences,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
December 2003, Physiological genomics,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
August 2014, Bioinformatics (Oxford, England),
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
January 2015, Journal of human reproductive sciences,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
March 2007, Journal of preventive medicine and public health = Yebang Uihakhoe chi,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
January 2022, Statistical applications in genetics and molecular biology,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
January 2016, BMC genomics,
Wei-Jiun Lin, and Huey-Miin Hsueh, and James J Chen
September 2003, Emergency medicine journal : EMJ,
Copied contents to your clipboard!