Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements (2011)

Emma J Cooke, and Richard S Savage, and Paul D W Kirk, and Robert Darkins, and David L Wild
Systems Biology Centre, University of Warwick, Coventry, UK.

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.

RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.

CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies.
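The abstract describes modelling each cluster's time series with Gaussian process regression, which handles non-uniformly sampled time points and replicates without interpolation. The following is a minimal pure-Python sketch of that idea, not the BHC package's implementation. It assumes a squared-exponential covariance with fixed, hypothetical hyperparameters (`signal_var`, `length_scale`, `noise_var`), treats replicates as repeated time points, and models a cluster as noisy observations of one shared latent function. It omits the outlier mixture likelihood, the empirical Bayes noise prior and the Dirichlet-process prior weights that full Bayesian hierarchical clustering uses when deciding whether to merge two clusters. All function names are illustrative.

```python
import math

def se_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two (possibly non-uniform) time points."""
    return signal_var * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gp_log_marginal(times, y, signal_var=1.0, length_scale=1.0, noise_var=0.1):
    """log N(y | 0, K + noise_var * I) for observations y at the given times.

    Replicates are passed as repeated entries in `times`: they share the same
    latent function value and differ only through the diagonal noise term.
    """
    n = len(times)
    cov = [[se_kernel(times[i], times[j], signal_var, length_scale)
            + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves low @ z = y, so z.z = y^T cov^{-1} y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    half_log_det = sum(math.log(low[i][i]) for i in range(n))  # 0.5 * log|cov|
    return -0.5 * quad - half_log_det - 0.5 * n * math.log(2.0 * math.pi)

# Hypothetical toy data: non-uniform sampling with two replicates at t = 4.
times = [0.0, 1.0, 4.0, 4.0, 9.0]
gene_a = [0.1, 0.8, 1.0, 0.9, -0.2]
gene_b = [0.0, 0.7, 1.1, 1.0, -0.1]

# Log Bayes factor for "both genes share one latent profile" versus
# "each gene has its own profile"; larger values favour merging.
separate = gp_log_marginal(times, gene_a) + gp_log_marginal(times, gene_b)
merged = gp_log_marginal(times + times, gene_a + gene_b)
print(merged - separate)
```

In a full hierarchical clustering run, a score of this kind would be computed for candidate pairs of clusters and combined with the prior merge probability. In practice the hyperparameters would be learned rather than fixed.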
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.

MeSH terms (UI, term, description, entry terms)
D008954 Models, Biological Theoretical representations that simulate the behavior or activity of biological processes or diseases. For disease models in living animals, DISEASE MODELS, ANIMAL is available. Biological models include the use of mathematical equations, computers, and other electronic equipment. Biological Model,Biological Models,Model, Biological,Models, Biologic,Biologic Model,Biologic Models,Model, Biologic
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D001499 Bayes Theorem A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result. Bayesian Analysis,Bayesian Estimation,Bayesian Forecast,Bayesian Method,Bayesian Prediction,Analysis, Bayesian,Bayesian Approach,Approach, Bayesian,Approachs, Bayesian,Bayesian Approachs,Estimation, Bayesian,Forecast, Bayesian,Method, Bayesian,Prediction, Bayesian,Theorem, Bayes
D012441 Saccharomyces cerevisiae A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast. The dried form is used as a dietary supplement. Baker's Yeast,Brewer's Yeast,Candida robusta,S. cerevisiae,Saccharomyces capensis,Saccharomyces italicus,Saccharomyces oviformis,Saccharomyces uvarum var. melibiosus,Yeast, Baker's,Yeast, Brewer's,Baker Yeast,S cerevisiae,Baker's Yeasts,Yeast, Baker
D016000 Cluster Analysis A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
D016011 Normal Distribution Continuous frequency distribution of infinite range. Its properties are as follows: 1, continuous, symmetrical distribution with both tails extending to infinity; 2, arithmetic mean, mode, and median identical; and 3, shape completely determined by the mean and standard deviation. Gaussian Distribution,Distribution, Gaussian,Distribution, Normal,Distributions, Normal,Normal Distributions
D020411 Oligonucleotide Array Sequence Analysis Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING. DNA Microarrays,Gene Expression Microarray Analysis,Oligonucleotide Arrays,cDNA Microarrays,DNA Arrays,DNA Chips,DNA Microchips,Gene Chips,Oligodeoxyribonucleotide Array Sequence Analysis,Oligonucleotide Microarrays,Sequence Analysis, Oligonucleotide Array,cDNA Arrays,Array, DNA,Array, Oligonucleotide,Array, cDNA,Arrays, DNA,Arrays, Oligonucleotide,Arrays, cDNA,Chip, DNA,Chip, Gene,Chips, DNA,Chips, Gene,DNA Array,DNA Chip,DNA Microarray,DNA Microchip,Gene Chip,Microarray, DNA,Microarray, Oligonucleotide,Microarray, cDNA,Microarrays, DNA,Microarrays, Oligonucleotide,Microarrays, cDNA,Microchip, DNA,Microchips, DNA,Oligonucleotide Array,Oligonucleotide Microarray,cDNA Array,cDNA Microarray
D020869 Gene Expression Profiling The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell. Gene Expression Analysis,Gene Expression Pattern Analysis,Transcript Expression Analysis,Transcriptome Profiling,Transcriptomics,mRNA Differential Display,Gene Expression Monitoring,Transcriptome Analysis,Analyses, Gene Expression,Analyses, Transcript Expression,Analyses, Transcriptome,Analysis, Gene Expression,Analysis, Transcript Expression,Analysis, Transcriptome,Differential Display, mRNA,Differential Displays, mRNA,Expression Analyses, Gene,Expression Analysis, Gene,Gene Expression Analyses,Gene Expression Monitorings,Gene Expression Profilings,Monitoring, Gene Expression,Monitorings, Gene Expression,Profiling, Gene Expression,Profiling, Transcriptome,Profilings, Gene Expression,Profilings, Transcriptome,Transcript Expression Analyses,Transcriptome Analyses,Transcriptome Profilings,mRNA Differential Displays
