Probabilistic inference of transcription factor binding from multiple data sources. 2008

Harri Lähdesmäki, Alistair G Rust, and Ilya Shmulevich
Institute for Systems Biology, Seattle, Washington, United States of America.

An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict transcription factor (TF) binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and thus naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood method and a Bayesian method; the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
A test data set, a web tool, source code, and supplementary data are available at http://www.probtf.org.
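To make concrete the idea of estimating a binding probability for a whole promoter, as opposed to thresholding a scan score, the following is a minimal sketch and not the authors' implementation. It assumes a simplified model that the abstract does not specify: at most one binding site per sequence, a position weight matrix (PWM) for the motif, a zeroth-order background model, and optional per-position weights standing in for additional evidence such as conservation. The function name, parameters, and example values are all hypothetical.

```python
import math


def binding_probability(seq, pwm, background, prior_bind=0.5, position_weights=None):
    """Posterior probability that `seq` contains one motif site.

    Assumed (hypothetical) model: either every base comes from the
    background, or exactly one site drawn from the PWM starts at some
    position, chosen according to `position_weights`.
    """
    m = len(pwm)
    n_pos = len(seq) - m + 1
    if n_pos <= 0:
        return 0.0  # sequence too short to hold a site

    # Position weights act as a prior over site locations, e.g. higher
    # weight where another data source suggests regulatory activity.
    if position_weights is None:
        position_weights = [1.0] * n_pos
    if len(position_weights) != n_pos:
        raise ValueError("need one weight per candidate start position")
    total = sum(position_weights)
    weights = [w / total for w in position_weights]

    # Weighted average of motif-vs-background likelihood ratios.
    evidence = 0.0
    for i in range(n_pos):
        log_lr = 0.0
        for j in range(m):
            base = seq[i + j]
            log_lr += math.log(pwm[j][base] / background[base])
        evidence += weights[i] * math.exp(log_lr)

    # Bayes' rule over the two hypotheses (site present vs. absent).
    numerator = prior_bind * evidence
    return numerator / (numerator + (1.0 - prior_bind))


# Illustrative values only: a 3-bp motif with consensus ACG.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
p_with_site = binding_probability("TTTACGTTT", pwm, background, prior_bind=0.1)
p_without_site = binding_probability("TTTTTTTTT", pwm, background, prior_bind=0.1)
```

In this sketch the output is a probability that can be passed on to downstream probabilistic models, and extra evidence enters only through the position weights; the framework described in the abstract is more general (multiple motifs, multiple evidence sources, and a Bayesian variant sampled by MCMC).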

MeSH Terms (UI, MeSH Term, Description, Entries)

D008390 Markov Chains
Description: A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system.
Entries: Markov Process; Markov Chain; Chain, Markov; Chains, Markov; Markov Processes; Process, Markov; Processes, Markov

D009010 Monte Carlo Method
Description: In statistics, a technique for numerically approximating the solution of a mathematical problem by studying the distribution of some random variable, often generated by a computer. The name alludes to the randomness characteristic of the games of chance played at the gambling casinos in Monte Carlo. (From Random House Unabridged Dictionary, 2d ed, 1993)
Entries: Method, Monte Carlo

D011336 Probability
Description: The study of chance processes or the relative frequency characterizing a chance process.
Entries: Probabilities

D011485 Protein Binding
Description: The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments.
Entries: Plasma Protein Binding Capacity; Binding, Protein

D001499 Bayes Theorem
Description: A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
Entries: Bayesian Analysis; Bayesian Estimation; Bayesian Forecast; Bayesian Method; Bayesian Prediction; Analysis, Bayesian; Bayesian Approach; Approach, Bayesian; Approachs, Bayesian; Bayesian Approachs; Estimation, Bayesian; Forecast, Bayesian; Method, Bayesian; Prediction, Bayesian; Theorem, Bayes

D014157 Transcription Factors
Description: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process.
Entries: Transcription Factor; Factor, Transcription; Factors, Transcription
