SMASHing regulatory sites in DNA by human-mouse sequence comparisons. 2003

Mihaela Zavolan, Nicholas D Socci, Nikolaus Rajewsky, and Terry Gaasterland
Laboratory for Computational Genomics, The Rockefeller University, New York, NY 10021, USA. mihaela@genomes.rockefeller.edu

Regulatory sequence elements provide important clues to understanding and predicting gene expression. Although the binding sites for hundreds of transcription factors are known, there has been no systematic attempt to incorporate this information in the annotation of the human genome. Cross-species sequence comparisons are critical to a meaningful annotation of regulatory elements because these elements generally reside in conserved non-coding regions. To take advantage of the recently completed drafts of the mouse and human genomes for annotating transcription factor binding sites, we developed SMASH, a computational pipeline that identifies thousands of orthologous human/mouse proteins, maps them to genomic sequences, extracts and compares upstream regions, and annotates putative regulatory elements in conserved, non-coding, upstream regions. Our current dataset consists of approximately 2,500 human/mouse gene pairs. Transcription start sites were estimated by mapping quasi-full-length cDNA sequences. SMASH uses a novel probabilistic method to identify putative conserved binding sites that takes into account the competition between transcription factors for binding DNA. SMASH presents the results via a genome browser web interface that displays the predicted regulatory information together with the current annotations for the human genome. Our results are validated by comparison to previously published experimental data. SMASH results compare favorably to other existing computational approaches.
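
The idea of annotating sites only inside conserved upstream regions can be illustrated with a minimal sketch. This is not the SMASH method: the abstract describes a probabilistic model with competition between transcription factors, whereas the sketch below assumes a pre-computed pairwise alignment, a fixed-size sliding identity window, and exact matching of a single consensus motif. The function names, the window size, and the identity threshold are all hypothetical choices made for illustration.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.85):
    """Return merged (start, end) alignment-column ranges in which at least
    min_identity of the columns in a sliding window are identical.
    Gap columns ('-') never count as matches. Illustrative only."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    merged = []
    for start in range(len(aln_a) - window + 1):
        end = start + window
        matches = sum(
            1
            for x, y in zip(aln_a[start:end], aln_b[start:end])
            if x == y and x != "-"
        )
        if matches / window < min_identity:
            continue
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous conserved range: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def conserved_sites(aln_a, aln_b, motif, window=10, min_identity=0.85):
    """Return alignment columns where motif occurs, ungapped and at the same
    columns, in both sequences and lies entirely inside a conserved range."""
    k = len(motif)
    sites = []
    for start, end in conserved_windows(aln_a, aln_b, window, min_identity):
        for i in range(start, end - k + 1):
            if aln_a[i:i + k] == motif and aln_b[i:i + k] == motif:
                sites.append(i)
    return sites


# Toy aligned upstream fragments (made-up sequences, not real data).
human = "GGCTATAAAGGCACGT-TTA"
mouse = "GGCTATAAAGGCTTGTCTAA"
print(conserved_windows(human, mouse))
print(conserved_sites(human, mouse, "TATAAA"))
```

A realistic pipeline would replace the exact string match with a scoring model for each factor and would work in genomic rather than alignment coordinates; the sketch only shows how conservation can be used as a filter before site annotation.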

UI | MeSH Term | Description | Entries
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D012045 | Regulatory Sequences, Nucleic Acid | Nucleic acid sequences involved in regulating the expression of genes. | Nucleic Acid Regulatory Sequences; Regulatory Regions, Nucleic Acid (Genetics); Region, Regulatory; Regions, Regulatory; Regulator Regions, Nucleic Acid; Regulatory Region; Regulatory Regions
D002874 | Chromosome Mapping | Any method used for determining the location of and relative distances between genes on a chromosome. | Gene Mapping; Linkage Mapping; Genome Mapping; Chromosome Mappings; Gene Mappings; Genome Mappings; Linkage Mappings; Mapping, Chromosome; Mapping, Gene; Mapping, Genome; Mapping, Linkage; Mappings, Chromosome; Mappings, Gene; Mappings, Genome; Mappings, Linkage
D005786 | Gene Expression Regulation | Any of the processes by which nuclear, cytoplasmic, or intercellular factors influence the differential control (induction or repression) of gene action at the level of transcription or translation. | Gene Action Regulation; Regulation of Gene Expression; Expression Regulation, Gene; Regulation, Gene Action; Regulation, Gene Expression
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence; Nucleotide Sequence; RNA Sequence; DNA Sequences; Base Sequences; Nucleotide Sequences; RNA Sequences; Sequence, Base; Sequence, DNA; Sequence, Nucleotide; Sequence, RNA; Sequences, Base; Sequences, DNA; Sequences, Nucleotide; Sequences, RNA
D012689 | Sequence Homology, Nucleic Acid | The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule. Sequence homology is an indication of the genetic relatedness of different organisms and gene function. | Base Sequence Homology; Homologous Sequences, Nucleic Acid; Homologs, Nucleic Acid Sequence; Homology, Base Sequence; Homology, Nucleic Acid Sequence; Nucleic Acid Sequence Homologs; Nucleic Acid Sequence Homology; Sequence Homology, Base; Base Sequence Homologies; Homologies, Base Sequence; Sequence Homologies, Base
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs; Computer Software; Open Source Software; Software Engineering; Software Tools; Computer Applications Software; Computer Programs and Programming; Computer Software Applications; Application, Computer Software; Applications Software, Computer; Applications Softwares, Computer; Applications, Computer Software; Computer Applications Softwares; Computer Program; Computer Software Application; Engineering, Software; Open Source Softwares; Program, Computer; Programs, Computer; Software Application, Computer; Software Applications, Computer; Software Tool; Software, Computer; Software, Computer Applications; Software, Open Source; Softwares, Computer Applications; Softwares, Open Source; Source Software, Open; Source Softwares, Open; Tool, Software; Tools, Software
D013045 | Species Specificity | The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species. | Species Specificities; Specificities, Species; Specificity, Species
