Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data. 2006

Seon-Young Kim and YongSung Kim
Human Genomics Laboratory, Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, 52 Eoeun-dong, Yuseong-gu, Daejeon 305-333, Korea. kimsy@kribb.re.kr

BACKGROUND: A complete understanding of the regulatory mechanisms of gene expression is the next important issue in genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies used yeast, which has much simpler regulatory networks than human and for which abundant genome-wide binding data and gene expression data are available under diverse conditions. Studies of genome-wide transcriptional networks in human currently lag behind those in yeast.

RESULTS: We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores obtained by applying gene set analysis to gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of TFBS gene sets and the expression of individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements, as well as the prediction of numerous putative TFBSs of many genes, which will constitute a good starting point for further experiments. Using the Z scores of TFBS gene sets produced better predictions than using the mRNA levels of the transcription factors themselves, suggesting that the Z scores better represent the diverse mechanisms that change the activity of transcription factors in the cell. In addition, cis-regulatory modules (combinations of co-acting TFBSs) were readily identified by our analysis.

CONCLUSIONS: By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding some of the important transcriptional regulatory elements of human genes.
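
The core idea of the RESULTS paragraph (score each TFBS gene set per condition, then correlate that score profile with a single gene's expression profile) can be illustrated with a minimal sketch. The abstract does not state the exact Z score formula, so the parametric form used here (set mean minus overall mean, scaled by the square root of the set size over the overall standard deviation) is an assumption, as are the function names, the dictionary-based input layout, and the use of Pearson correlation.

```python
import math
from statistics import mean, pstdev


def gene_set_z(values, members):
    """Z score of one TFBS gene set in one condition (assumed parametric form).

    values  : dict mapping gene -> expression value (e.g. log fold change)
    members : genes whose promoters carry the TFBS (hypothetical input)
    """
    all_vals = list(values.values())
    mu, sigma = mean(all_vals), pstdev(all_vals)
    set_vals = [values[g] for g in members if g in values]
    if not set_vals or sigma == 0:
        return 0.0  # no information in this condition
    return (mean(set_vals) - mu) * math.sqrt(len(set_vals)) / sigma


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def tfbs_gene_correlation(conditions, members, gene):
    """Correlate a TFBS gene-set Z profile with one gene across conditions.

    conditions : list of dicts (one per condition), each gene -> value
    """
    z_profile = [gene_set_z(c, members) for c in conditions]
    gene_profile = [c[gene] for c in conditions]
    return pearson(z_profile, gene_profile)
```

In this sketch a gene whose expression tracks the Z score profile of a TFBS gene set across conditions gets a correlation near 1 and would be a candidate target of that TFBS; assessing significance (for example by permutation) is left out.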

UI | MeSH Term | Description | Entries
D011401 | Promoter Regions, Genetic | DNA sequences which are recognized (directly or indirectly) and bound by a DNA-dependent RNA polymerase during the initiation of transcription. Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes. | rRNA Promoter,Early Promoters, Genetic,Late Promoters, Genetic,Middle Promoters, Genetic,Promoter Regions,Promoter, Genetic,Promotor Regions,Promotor, Genetic,Pseudopromoter, Genetic,Early Promoter, Genetic,Genetic Late Promoter,Genetic Middle Promoters,Genetic Promoter,Genetic Promoter Region,Genetic Promoter Regions,Genetic Promoters,Genetic Promotor,Genetic Promotors,Genetic Pseudopromoter,Genetic Pseudopromoters,Late Promoter, Genetic,Middle Promoter, Genetic,Promoter Region,Promoter Region, Genetic,Promoter, Genetic Early,Promoter, rRNA,Promoters, Genetic,Promoters, Genetic Middle,Promoters, rRNA,Promotor Region,Promotors, Genetic,Pseudopromoters, Genetic,Region, Genetic Promoter,Region, Promoter,Region, Promotor,Regions, Genetic Promoter,Regions, Promoter,Regions, Promotor,rRNA Promoters
D003627 | Data Interpretation, Statistical | Application of statistical procedures to analyze specific observed or assumed facts from a particular study. | Data Analysis, Statistical,Data Interpretations, Statistical,Interpretation, Statistical Data,Statistical Data Analysis,Statistical Data Interpretation,Analyses, Statistical Data,Analysis, Statistical Data,Data Analyses, Statistical,Interpretations, Statistical Data,Statistical Data Analyses,Statistical Data Interpretations
D005786 | Gene Expression Regulation | Any of the processes by which nuclear, cytoplasmic, or intercellular factors influence the differential control (induction or repression) of gene action at the level of transcription or translation. | Gene Action Regulation,Regulation of Gene Expression,Expression Regulation, Gene,Regulation, Gene Action,Regulation, Gene Expression
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D001665 | Binding Sites | The parts of a macromolecule that directly participate in its specific combination with another molecule. | Combining Site,Binding Site,Combining Sites,Site, Binding,Site, Combining,Sites, Binding,Sites, Combining
D012333 | RNA, Messenger | RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm. | Messenger RNA,Messenger RNA, Polyadenylated,Poly(A) Tail,Poly(A)+ RNA,Poly(A)+ mRNA,RNA, Messenger, Polyadenylated,RNA, Polyadenylated,mRNA,mRNA, Non-Polyadenylated,mRNA, Polyadenylated,Non-Polyadenylated mRNA,Poly(A) RNA,Polyadenylated mRNA,Non Polyadenylated mRNA,Polyadenylated Messenger RNA,Polyadenylated RNA,RNA, Polyadenylated Messenger,mRNA, Non Polyadenylated
D014157 | Transcription Factors | Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process. | Transcription Factor,Factor, Transcription,Factors, Transcription
D014158 | Transcription, Genetic | The biosynthesis of RNA carried out on a template of DNA. The biosynthesis of DNA from an RNA template is called REVERSE TRANSCRIPTION. | Genetic Transcription
D015894 | Genome, Human | The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs. | Human Genome,Genomes, Human,Human Genomes
