A novel method for improved accuracy of transcription factor binding site prediction. 2018

Abdullah M Khamis, Olaa Motwalli, Romina Oliva, Boris R Jankovic, Yulia A Medvedeva, Haitham Ashoor, Magbubah Essack, Xin Gao, and Vladimir B Bajic
King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.

Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, a 1.54-, 1.96- and 5.19-fold reduction of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that the PWM models for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented as a web tool and as stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF.
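
For orientation, the PWM-based baseline that the abstract contrasts DRAF with can be sketched as a simple log-odds scan. This is a minimal illustration of generic PWM scoring, not the DRAF method and not code from HOCOMOCO, TRANSFAC or DeepBind. The count matrix, the uniform background, the pseudocount, the threshold and the function names build_pwm and scan are all assumptions made for the example.

```python
import math

# Hypothetical toy count matrix, one dict per motif position.
# Illustrative values only; not taken from any real TF model.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed smoothing to avoid log(0)


def build_pwm(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = build_pwm(COUNTS)
hits = scan("TTACGTGGACGTAA", pwm, threshold=4.0)
for start, score in hits:
    print(start, round(score, 3))
```

In a scan of this kind the threshold sets the trade-off: lowering it recovers more true sites but also admits more windows that merely resemble the motif, which is the false positive problem the abstract describes.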

UI MeSH Term Description Entries
D004247 DNA A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine). DNA, Double-Stranded,Deoxyribonucleic Acid,ds-DNA,DNA, Double Stranded,Double-Stranded DNA,ds DNA
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000069550 Machine Learning A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. Transfer Learning,Learning, Machine,Learning, Transfer
D001665 Binding Sites The parts of a macromolecule that directly participate in its specific combination with another molecule. Combining Site,Binding Site,Combining Sites,Site, Binding,Site, Combining,Sites, Binding,Sites, Combining
D014157 Transcription Factors Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process. Transcription Factor,Factor, Transcription,Factors, Transcription
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D047369 Chromatin Immunoprecipitation A technique for identifying specific DNA sequences that are bound, in vivo, to proteins of interest. It involves formaldehyde fixation of CHROMATIN to crosslink the DNA-BINDING PROTEINS to the DNA. After shearing the DNA into small fragments, specific DNA-protein complexes are isolated by immunoprecipitation with protein-specific ANTIBODIES. Then, the DNA isolated from the complex can be identified by PCR amplification and sequencing. Immunoprecipitation, Chromatin
D056510 Position-Specific Scoring Matrices Tabular numerical representations of sequence motifs displaying their variability as likelihood values for each possible residue at each position in a sequence. Position-specific scoring matrices (PSSMs) are calculated from position frequency matrices. Position-Specific Weight Matrices,Position Frequency Matrices,Position Weight Matrices,Position Weight Matrix,Position-Specific Scoring Matrix,Position-Specific Weight Matrix,Sequence Logo,Sequence Logos,Frequency Matrices, Position,Logo, Sequence,Logos, Sequence,Matrices, Position Frequency,Matrices, Position Weight,Matrices, Position-Specific Scoring,Matrices, Position-Specific Weight,Matrix, Position Weight,Matrix, Position-Specific Scoring,Matrix, Position-Specific Weight,Position Specific Scoring Matrices,Position Specific Scoring Matrix,Position Specific Weight Matrices,Position Specific Weight Matrix,Scoring Matrices, Position-Specific,Scoring Matrix, Position-Specific,Weight Matrices, Position,Weight Matrices, Position-Specific,Weight Matrix, Position,Weight Matrix, Position-Specific
