A Bayesian nonparametric approach for comparing clustering structures in EST libraries. 2008

Antonio Lijoi, Ramsés H Mena, and Igor Prünster
Department of Economics and Quantitative Methods, University of Pavia, Pavia, Italy.

Inference for Expressed Sequence Tags (ESTs) data is considered. We focus on evaluating the redundancy of a cDNA library and, more importantly, on comparing different libraries on the basis of their clustering structure. The numerical results we obtain allow us to assess the effect of an error correction procedure for EST data and to study the compatibility of single EST libraries with respect to merged ones. The proposed method is based on a Bayesian nonparametric approach that allows one to understand the clustering mechanism that generates the observed data. As a specific nonparametric model we use the two-parameter Poisson-Dirichlet (PD) process. The PD process represents a tractable nonparametric prior which is a natural candidate for modeling data arising from discrete distributions. It allows prediction and testing in order to analyze the clustering structure featured by the data. We show how a full Bayesian analysis can be performed and describe the corresponding computational algorithm.
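
To make the clustering mechanism of the two-parameter PD process concrete, the following is a minimal Python sketch of its standard sequential predictive rule: given n observations grouped into k clusters, the next observation opens a new cluster with probability (theta + k*sigma)/(theta + n) and otherwise joins existing cluster j with probability (n_j - sigma)/(theta + n). The parameter names sigma and theta, the function names, and the example values are assumptions made for illustration; this is not the computational algorithm described in the paper, only a simulation of the prior's clustering behavior.

```python
import random


def new_cluster_probability(n, k, sigma, theta):
    """Probability that observation n+1 opens a new cluster, given n
    observations already grouped into k clusters, under PD(sigma, theta).

    Parameter names are illustrative; requires 0 <= sigma < 1, theta > -sigma.
    """
    if not (0.0 <= sigma < 1.0) or theta <= -sigma:
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    if n == 0:
        return 1.0  # the first observation always starts the first cluster
    return (theta + k * sigma) / (theta + n)


def sample_cluster_sizes(n, sigma, theta, seed=0):
    """Simulate cluster sizes for n observations by applying the
    predictive rule one observation at a time."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        k = len(sizes)
        if rng.random() < new_cluster_probability(i, k, sigma, theta):
            sizes.append(1)  # a previously unseen cluster (e.g. a new gene)
        else:
            # join existing cluster j with weight proportional to (n_j - sigma)
            weights = [s - sigma for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for demonstration.
    sizes = sample_cluster_sizes(1000, sigma=0.5, theta=1.0)
    print("clusters:", len(sizes), "largest:", max(sizes))
    print("P(new cluster next):",
          new_cluster_probability(1000, len(sizes), 0.5, 1.0))
```

In the EST setting, a cluster would correspond to a distinct gene and the new-cluster probability to the chance that the next sequenced tag reveals a gene not yet observed, so a low value suggests a more redundant library. In a full analysis sigma and theta would be estimated or given priors rather than fixed as here.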

UI MeSH Term Description Entries
D008969 Molecular Sequence Data Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. Sequence Data, Molecular,Molecular Sequencing Data,Data, Molecular Sequence,Data, Molecular Sequencing,Sequencing Data, Molecular
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D001483 Base Sequence The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D001499 Bayes Theorem A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result. Bayesian Analysis,Bayesian Estimation,Bayesian Forecast,Bayesian Method,Bayesian Prediction,Analysis, Bayesian,Bayesian Approach,Approach, Bayesian,Approachs, Bayesian,Bayesian Approachs,Estimation, Bayesian,Forecast, Bayesian,Method, Bayesian,Prediction, Bayesian,Theorem, Bayes
D015723 Gene Library A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences. DNA Library,cDNA Library,DNA Libraries,Gene Libraries,Libraries, DNA,Libraries, Gene,Libraries, cDNA,Library, DNA,Library, Gene,Library, cDNA,cDNA Libraries
D016000 Cluster Analysis A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D020224 Expressed Sequence Tags Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived. ESTs,Expressed Sequence Tag,Sequence Tag, Expressed,Sequence Tags, Expressed,Tag, Expressed Sequence,Tags, Expressed Sequence
