Modeling promoter grammars with evolving hidden Markov models. 2008

Kyoung-Jae Won, Albin Sandelin, Troels Torben Marstrand, and Anders Krogh
The Bioinformatics Centre, Department of Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark.

BACKGROUND: Describing and modeling biological features of eukaryotic promoters remains an important and challenging problem within computational biology. The promoters of higher eukaryotes in particular display a wide variation in regulatory features, which are difficult to model. Often several factors are involved in the regulation of a set of co-regulated genes. If so, promoters can be modeled with connected regulatory features, where the network of connections is characteristic of a particular mode of regulation.

RESULTS: With the goal of automatically deciphering such regulatory structures, we present a method that iteratively evolves an ensemble of regulatory grammars using a hidden Markov model (HMM) architecture composed of interconnected blocks representing transcription factor binding sites (TFBSs) and background regions of promoter sequences. The ensemble approach reduces the risk of overfitting and generally improves performance. We apply this method to identify TFBSs and to classify promoters preferentially expressed in macrophages, where it outperforms other methods due to the increased predictive power given by the grammar.

AVAILABILITY: The software and the datasets are available from http://modem.ucsd.edu/won/eHMM.tar.gz
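
To make the block architecture mentioned in the abstract more concrete, the following is a minimal sketch, not the authors' eHMM software. It assumes one uniform background state and a single left-to-right TFBS block with one state per motif position, and it uses Viterbi decoding to locate the block in a sequence. The function names, the consensus motif, and the probabilities (`p_enter`, `match`) are illustrative assumptions; the evolution of the grammar and the ensemble step described in the abstract are not shown.

```python
import math

ALPHABET = "ACGT"


def build_block_hmm(motif, p_enter=0.05, match=0.85):
    """Build (start, trans, emit) for background state 0 plus one TFBS block.

    States 1..k form a left-to-right block, one state per motif position.
    All parameter values are illustrative assumptions.
    """
    k = len(motif)
    n = k + 1
    mismatch = (1.0 - match) / 3.0
    emit = [{c: 0.25 for c in ALPHABET}]  # background emits uniformly
    for base in motif:  # each block state prefers its consensus base
        emit.append({c: (match if c == base else mismatch) for c in ALPHABET})
    trans = [[0.0] * n for _ in range(n)]
    trans[0][0] = 1.0 - p_enter  # stay in background
    trans[0][1] = p_enter        # enter the block
    for i in range(1, k):
        trans[i][i + 1] = 1.0    # walk through the block
    trans[k][0] = 1.0            # leave the block, back to background
    start = [1.0 - p_enter, p_enter] + [0.0] * (k - 1)
    return start, trans, emit


def _log(x):
    return math.log(x) if x > 0 else float("-inf")


def viterbi(seq, start, trans, emit):
    """Return the most probable state path for seq (log-space Viterbi)."""
    n = len(start)
    score = [_log(start[s]) + _log(emit[s][seq[0]]) for s in range(n)]
    back = []
    for ch in seq[1:]:
        new, ptr = [], []
        for t in range(n):
            best = max(range(n), key=lambda s: score[s] + _log(trans[s][t]))
            new.append(score[best] + _log(trans[best][t]) + _log(emit[t][ch]))
            ptr.append(best)
        score = new
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def motif_sites(path):
    """0-based offsets where the path enters the TFBS block (state 1)."""
    return [i for i, s in enumerate(path) if s == 1]


model = build_block_hmm("TGACTCA")  # hypothetical consensus motif
seq = "CCCCCCCC" + "TGACTCA" + "GGGGGGGG"
print(motif_sites(viterbi(seq, *model)))  # [8]
```

A grammar in the sense of the abstract would connect several such blocks and background regions; here a single block is enough to show how a decoded state path marks a putative binding site.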

UI | MeSH Term | Description | Entries
D008390 | Markov Chains | A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. | Markov Process; Markov Chain; Chain, Markov; Chains, Markov; Markov Processes; Process, Markov; Processes, Markov
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models; Genetic Model; Model, Genetic
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D011401 | Promoter Regions, Genetic | DNA sequences which are recognized (directly or indirectly) and bound by a DNA-dependent RNA polymerase during the initiation of transcription. Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes. | rRNA Promoter; Early Promoters, Genetic; Late Promoters, Genetic; Middle Promoters, Genetic; Promoter Regions; Promoter, Genetic; Promotor Regions; Promotor, Genetic; Pseudopromoter, Genetic; Early Promoter, Genetic; Genetic Late Promoter; Genetic Middle Promoters; Genetic Promoter; Genetic Promoter Region; Genetic Promoter Regions; Genetic Promoters; Genetic Promotor; Genetic Promotors; Genetic Pseudopromoter; Genetic Pseudopromoters; Late Promoter, Genetic; Middle Promoter, Genetic; Promoter Region; Promoter Region, Genetic; Promoter, Genetic Early; Promoter, rRNA; Promoters, Genetic; Promoters, Genetic Middle; Promoters, rRNA; Promotor Region; Promotors, Genetic; Pseudopromoters, Genetic; Region, Genetic Promoter; Region, Promoter; Region, Promotor; Regions, Genetic Promoter; Regions, Promoter; Regions, Promotor; rRNA Promoters
D011485 | Protein Binding | The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments. | Plasma Protein Binding Capacity; Binding, Protein
D003198 | Computer Simulation | Computer-based representation of physical systems and phenomena such as chemical processes. | Computational Modeling; Computational Modelling; Computer Models; In silico Modeling; In silico Models; In silico Simulation; Models, Computer; Computerized Models; Computer Model; Computer Simulations; Computerized Model; In silico Model; Model, Computer; Model, Computerized; Model, In silico; Modeling, Computational; Modeling, In silico; Modelling, Computational; Simulation, Computer; Simulation, In silico; Simulations, Computer
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence; Nucleotide Sequence; RNA Sequence; DNA Sequences; Base Sequences; Nucleotide Sequences; RNA Sequences; Sequence, Base; Sequence, DNA; Sequence, Nucleotide; Sequence, RNA; Sequences, Base; Sequences, DNA; Sequences, Nucleotide; Sequences, RNA
D001665 | Binding Sites | The parts of a macromolecule that directly participate in its specific combination with another molecule. | Combining Site; Binding Site; Combining Sites; Site, Binding; Site, Combining; Sites, Binding; Sites, Combining
D012660 | Semantics | The relationships between symbols and their meanings. | Semantic
D014157 | Transcription Factors | Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process. | Transcription Factor; Factor, Transcription; Factors, Transcription
