Recognition models to predict DNA-binding specificities of homeodomain proteins. 2012

Ryan G Christensen, Metewo Selase Enuameh, Marcus B Noyes, Michael H Brodsky, Scot A Wolfe, and Gary D Stormo
Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA.

BACKGROUND Recognition models for protein-DNA interactions, which allow the specificity of a DNA-binding domain to be predicted from its sequence alone or to be altered through rational design, have long been a goal of computational biology. Some progress has been made in constructing useful models, especially for C2H2 zinc finger proteins, but the problem remains challenging, with ample room for improvement. For most families of transcription factors, the best available methods use k-nearest neighbor (KNN) algorithms, which predict a protein's specificity as the average of the specificities of the k most similar proteins whose specificities have been defined. Homeodomain (HD) proteins are the second most abundant family of transcription factors in most metazoan genomes, after zinc fingers; consequently, an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes.

RESULTS Using extensive experimental data, we tested several machine learning approaches and found that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models predict specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors; http://stormo.wustl.edu/PreMoTF), that uses an RF-based model to predict position frequency matrices from protein sequence.
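The KNN baseline described above predicts a protein's specificity by averaging the specificities of its most similar characterized relatives. The following is a minimal sketch of that baseline, not of the support vector machine or random forest models the study reports. It assumes that each protein is represented by a pre-aligned residue string of fixed length (for example, a chosen set of DNA-contacting positions), that similarity is plain fraction identity, and that each specificity is a position frequency matrix (PFM) of the same width over A, C, G, T. The function names `identity` and `knn_predict_pfm` and the toy sequences are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"
# Hypothetical PFM representation: one row per motif position, each row
# holding the frequencies of A, C, G, T at that position.
PFM = List[List[float]]


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length residue strings agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict_pfm(query: str, training: Dict[str, PFM], k: int = 3) -> PFM:
    """Predict a PFM for `query` as the unweighted mean of the PFMs of its k
    most similar training proteins (ties keep the dictionary's order)."""
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda seq: identity(query, seq), reverse=True)
    neighbors = ranked[: min(k, len(ranked))]
    width = len(training[neighbors[0]])
    if any(len(training[s]) != width for s in neighbors):
        raise ValueError("neighbor PFMs must have the same width")
    # Average each base frequency at each position across the neighbors.
    return [
        [sum(training[s][pos][b] for s in neighbors) / len(neighbors)
         for b in range(len(BASES))]
        for pos in range(width)
    ]


if __name__ == "__main__":
    # Toy, made-up residue strings and 2-position PFMs for illustration only.
    training = {
        "IQNM": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
        "IKNM": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
        "VSHA": [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
    }
    print(knn_predict_pfm("IQNA", training, k=2))
    # → [[0.5, 0.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

Here the two nearest neighbors of `IQNA` are `IQNM` (identity 0.75) and `IKNM` (0.5), so their PFMs are averaged. Uniform averaging is the simplest choice. A common KNN variant weights each neighbor by its similarity instead.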

MeSH terms (each line: UI, term, description, and entry terms)
D004247 DNA: A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine). Entry terms: DNA, Double-Stranded; Deoxyribonucleic Acid; ds-DNA; DNA, Double Stranded; Double-Stranded DNA; ds DNA
D004330 Drosophila: A genus of small, two-winged flies containing approximately 900 described species. These organisms are the most extensively studied of all genera from the standpoint of genetics and cytology. Entry terms: Fruit Fly, Drosophila; Drosophila Fruit Flies; Drosophila Fruit Fly; Drosophilas; Flies, Drosophila Fruit; Fly, Drosophila Fruit; Fruit Flies, Drosophila
D006801 Humans: Members of the species Homo sapiens. Entry terms: Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000465 Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Entry terms: Algorithm
D000595 Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. Entry terms: Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D000818 Animals: Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. Entry terms: Animal; Metazoa; Animalia
D001185 Artificial Intelligence: Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. Entry terms: AI (Artificial Intelligence); Computer Reasoning; Computer Vision Systems; Knowledge Acquisition (Computer); Knowledge Representation (Computer); Machine Intelligence; Computational Intelligence; Acquisition, Knowledge (Computer); Computer Vision System; Intelligence, Artificial; Intelligence, Computational; Intelligence, Machine; Knowledge Representations (Computer); Reasoning, Computer; Representation, Knowledge (Computer); System, Computer Vision; Systems, Computer Vision; Vision System, Computer; Vision Systems, Computer
D001665 Binding Sites: The parts of a macromolecule that directly participate in its specific combination with another molecule. Entry terms: Combining Site; Binding Site; Combining Sites; Site, Binding; Site, Combining; Sites, Binding; Sites, Combining
D014157 Transcription Factors: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process. Entry terms: Transcription Factor; Factor, Transcription; Factors, Transcription
D015233 Models, Statistical: Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. Entry terms: Probabilistic Models; Statistical Models; Two-Parameter Models; Model, Statistical; Models, Binomial; Models, Polynomial; Statistical Model; Binomial Model; Binomial Models; Model, Binomial; Model, Polynomial; Model, Probabilistic; Model, Two-Parameter; Models, Probabilistic; Models, Two-Parameter; Polynomial Model; Polynomial Models; Probabilistic Model; Two Parameter Models; Two-Parameter Model
