Predicting CTCF-mediated chromatin loops using CTCF-MP. 2018

Ruochi Zhang, Yuchuan Wang, Yang Yang, Yang Zhang, and Jian Ma
Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

The three-dimensional organization of chromosomes within the cell nucleus is highly regulated. CCCTC-binding factor (CTCF) is an important architectural protein that mediates long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop. In this article, we directly ask whether sequence-based features (other than the motif itself) are important for establishing CTCF-mediated chromatin loops and, if so, which ones. We found that motif conservation measured by 'branch-of-origin', which accounts for motif turnover in evolution, is an important feature. We developed a new machine learning algorithm called CTCF-MP, based on word2vec, to demonstrate that sequence-based features alone can predict whether a pair of convergent CTCF motifs will form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP makes highly accurate predictions of whether a convergent CTCF motif pair will form a loop, both within a single cell type and across different cell types. Our work represents an important step toward understanding the sequence determinants that may guide the formation of complex chromatin architectures. The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP. Supplementary data are available at Bioinformatics online.
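To make two ideas from the abstract concrete, the following is a minimal sketch, not the CTCF-MP implementation. It shows (1) a check for convergent orientation, where the upstream motif lies on the forward strand and the downstream motif on the reverse strand, and (2) splitting a DNA sequence into overlapping k-mer "words", which is a common way to prepare sequence for a word2vec-style embedding. The `Motif` record, the function names, and the default `k = 6` are assumptions made for illustration and do not come from the paper or its repository.

```python
from typing import List, NamedTuple


class Motif(NamedTuple):
    """A CTCF motif occurrence (hypothetical record, not a CTCF-MP type)."""
    chrom: str
    start: int
    strand: str  # "+" (forward) or "-" (reverse)


def is_convergent(a: Motif, b: Motif) -> bool:
    """Return True if the two motifs point toward each other.

    Convergent means the upstream motif is on "+" and the downstream
    motif is on "-". Motifs on different chromosomes, or at the same
    start position, are never treated as a convergent pair here.
    """
    if a.chrom != b.chrom or a.start == b.start:
        return False
    upstream, downstream = sorted((a, b), key=lambda m: m.start)
    return upstream.strand == "+" and downstream.strand == "-"


def kmer_tokens(seq: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mer "words".

    k = 6 is an assumed default for illustration only. Sequences
    shorter than k yield an empty list.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    left = Motif("chr1", 100_000, "+")
    right = Motif("chr1", 250_000, "-")
    print(is_convergent(left, right))
    print(kmer_tokens("ccctcacgt", k=6))
```

The resulting token lists could then be passed to any word2vec implementation to learn k-mer embeddings; which embedding settings and downstream classifier CTCF-MP actually uses is documented in the paper and the linked repository, not here.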

UI | MeSH Term | Description | Entries
D002843 | Chromatin | The material of CHROMOSOMES. It is a complex of DNA; HISTONES; and nonhistone proteins (CHROMOSOMAL PROTEINS, NON-HISTONE) found within the nucleus of a cell. | Chromatins
D002877 | Chromosomes, Human | Very long DNA molecules and associated proteins, HISTONES, and non-histone chromosomal proteins (CHROMOSOMAL PROTEINS, NON-HISTONE). Normally 46 chromosomes, including two sex chromosomes are found in the nucleus of human cells. They carry the hereditary information of the individual. | Chromosome, Human; Human Chromosome; Human Chromosomes
D006367 | HeLa Cells | The first continuously cultured human malignant CELL LINE, derived from the cervical carcinoma of Henrietta Lacks. These cells are used for, among other things, VIRUS CULTIVATION and PRECLINICAL DRUG EVALUATION assays. | Cell, HeLa; Cells, HeLa; HeLa Cell
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000076246 | CCCTC-Binding Factor | A repressor protein with poly(ADP)-ribose binding activity that binds CHROMATIN and DNA; its structure consisting of 11 CYS2-HIS2 ZINC FINGERS allows it to recognize many different DNA target sites. It functions as a repressor by binding to INSULATOR ELEMENTS and preventing interaction between promoters and nearby enhancers and silencers. It plays a critical role in EPIGENETIC PROCESSES, including GENOMIC IMPRINTING. | CTCF Protein; DNA-Binding Protein CTCF; CCCTC Binding Factor; CTCF, DNA-Binding Protein; DNA Binding Protein CTCF
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs; Computer Software; Open Source Software; Software Engineering; Software Tools; Computer Applications Software; Computer Programs and Programming; Computer Software Applications; Application, Computer Software; Applications Software, Computer; Applications Softwares, Computer; Applications, Computer Software; Computer Applications Softwares; Computer Program; Computer Software Application; Engineering, Software; Open Source Softwares; Program, Computer; Programs, Computer; Software Application, Computer; Software Applications, Computer; Software Tool; Software, Computer; Software, Computer Applications; Software, Open Source; Softwares, Computer Applications; Softwares, Open Source; Source Software, Open; Source Softwares, Open; Tool, Software; Tools, Software
D017422 | Sequence Analysis, DNA | A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. | DNA Sequence Analysis; Sequence Determination, DNA; Analysis, DNA Sequence; DNA Sequence Determination; DNA Sequence Determinations; DNA Sequencing; Determination, DNA Sequence; Determinations, DNA Sequence; Sequence Determinations, DNA; Analyses, DNA Sequence; DNA Sequence Analyses; Sequence Analyses, DNA; Sequencing, DNA
D047369 | Chromatin Immunoprecipitation | A technique for identifying specific DNA sequences that are bound, in vivo, to proteins of interest. It involves formaldehyde fixation of CHROMATIN to crosslink the DNA-BINDING PROTEINS to the DNA. After shearing the DNA into small fragments, specific DNA-protein complexes are isolated by immunoprecipitation with protein-specific ANTIBODIES. Then, the DNA isolated from the complex can be identified by PCR amplification and sequencing. | Immunoprecipitation, Chromatin
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics; Structural Genomics; Comparative Genomics; Genomics, Comparative; Genomics, Functional; Genomics, Structural
