Clustering of RNA secondary structures with application to messenger RNAs. 2006

Ye Ding, Chi Yu Chan, and Charles E Lawrence
Wadsworth Center, New York State Department of Health, Center for Medical Science, 150 New Scotland Avenue, Albany, NY 12208, USA. yding@wadsworth.org

There is growing evidence of translational gene regulation at the mRNA level, and of the important roles of RNA secondary structure in these regulatory processes. Because mRNAs likely exist in a population of structures, the popular free energy minimization approach may not be well suited to prediction of mRNA structures in studies of post-transcriptional regulation. Here, we describe an alternative procedure for the characterization of mRNA structures, in which structures sampled from the Boltzmann-weighted ensemble of RNA secondary structures are clustered. Based on a random sample of full-length human mRNAs, we find that the minimum free energy (MFE) structure often poorly represents the Boltzmann ensemble, that the ensemble often contains multiple structural clusters, and that the centroids of a small number of structural clusters more effectively characterize the ensemble. We show that cluster-level characteristics and statistics are statistically reproducible. In a comparison between mRNAs and structural RNAs, similarity is observed for the number of clusters and the energy gap between the MFE structure and the sampled ensemble. However, for structural RNAs, there are more high-frequency base-pairs in both the Boltzmann ensemble and the clusters, and the clusters are more compact. The clustering features have been incorporated into the Sfold software package for nucleic acid folding and design.
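The idea of clustering sampled structures and summarizing each cluster by a centroid can be illustrated with a minimal sketch. The abstract does not state the distance measure, the clustering algorithm, or the centroid definition, so everything below is an assumption made for illustration: structures are given in dot-bracket notation, distance is the base-pair distance (the number of pairs present in exactly one of two structures), clustering is a naive average-linkage agglomerative procedure with a hypothetical distance threshold, and a centroid is taken to be the set of base pairs found in more than half of a cluster's members. The five input structures are hand-written toy examples, not samples drawn from a Boltzmann ensemble.

```python
from itertools import combinations


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return frozenset(pairs)


def bp_distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(a ^ b)


def average_distance(c1, c2):
    """Mean base-pair distance between members of two clusters."""
    total = sum(bp_distance(x, y) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(structures, threshold):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[s] for s in structures]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_distance(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def centroid(members):
    """Base pairs that occur in more than half of the given structures."""
    counts = {}
    for s in members:
        for p in s:
            counts[p] = counts.get(p, 0) + 1
    return frozenset(p for p, c in counts.items() if c > len(members) / 2)


# Toy "sample" (assumed data): two visibly different folds of a 12-nt sequence.
sample = [
    "((((....))))",
    "(((......)))",
    "((((....))))",
    "((..))((..))",
    "((..))(....)",
]
structures = [base_pairs(s) for s in sample]
clusters = cluster(structures, threshold=3)

for k, members in enumerate(clusters, start=1):
    print(f"cluster {k}: size={len(members)}, "
          f"centroid pairs={sorted(centroid(members))}")
```

In this toy case a single centroid computed over all five structures would contain only base pairs from the first fold, which hints at why one representative structure can describe a multi-cluster ensemble poorly, while one centroid per cluster retains both folds.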

UI | MeSH Term | Description | Entries
D009690 | Nucleic Acid Conformation | The spatial arrangement of the atoms of a nucleic acid or polynucleotide that results in its characteristic 3-dimensional shape. | DNA Conformation,RNA Conformation,Conformation, DNA,Conformation, Nucleic Acid,Conformation, RNA,Conformations, DNA,Conformations, Nucleic Acid,Conformations, RNA,DNA Conformations,Nucleic Acid Conformations,RNA Conformations
D009711 | Nucleotides | The monomeric units from which DNA or RNA polymers are constructed. They consist of a purine or pyrimidine base, a pentose sugar, and a phosphate group. (From King & Stansfield, A Dictionary of Genetics, 4th ed) | Nucleotide
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D012333 | RNA, Messenger | RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm. | Messenger RNA,Messenger RNA, Polyadenylated,Poly(A) Tail,Poly(A)+ RNA,Poly(A)+ mRNA,RNA, Messenger, Polyadenylated,RNA, Polyadenylated,mRNA,mRNA, Non-Polyadenylated,mRNA, Polyadenylated,Non-Polyadenylated mRNA,Poly(A) RNA,Polyadenylated mRNA,Non Polyadenylated mRNA,Polyadenylated Messenger RNA,Polyadenylated RNA,RNA, Polyadenylated Messenger,mRNA, Non Polyadenylated
D013223 | Statistics as Topic | Works about the science and art of collecting, summarizing, and analyzing data that are subject to random variation. | Area Analysis,Estimation Technics,Estimation Techniques,Indirect Estimation Technics,Indirect Estimation Techniques,Multiple Classification Analysis,Service Statistics,Statistical Study,Statistics, Service,Tables and Charts as Topic,Analyses, Area,Analyses, Multiple Classification,Area Analyses,Classification Analyses, Multiple,Classification Analysis, Multiple,Estimation Technic, Indirect,Estimation Technics, Indirect,Estimation Technique,Estimation Technique, Indirect,Estimation Techniques, Indirect,Indirect Estimation Technic,Indirect Estimation Technique,Multiple Classification Analyses,Statistical Studies,Studies, Statistical,Study, Statistical,Technic, Indirect Estimation,Technics, Estimation,Technics, Indirect Estimation,Technique, Estimation,Technique, Indirect Estimation,Techniques, Estimation,Techniques, Indirect Estimation
D013816 | Thermodynamics | A rigorously mathematical analysis of energy relationships (heat, work, temperature, and equilibrium). It describes systems whose states are determined by thermal parameters, such as temperature, in addition to mechanical and electromagnetic parameters. (From Hawley's Condensed Chemical Dictionary, 12th ed) | Thermodynamic
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
D016189 | Dystrophin | A muscle protein localized in surface membranes which is the product of the Duchenne/Becker muscular dystrophy gene. Individuals with Duchenne muscular dystrophy usually lack dystrophin completely while those with Becker muscular dystrophy have dystrophin of an altered size. It shares features with other cytoskeletal proteins such as SPECTRIN and alpha-actinin but the precise function of dystrophin is not clear. One possible role might be to preserve the integrity and alignment of the plasma membrane to the myofibrils during muscle contraction and relaxation. MW 400 kDa. |
D017423 | Sequence Analysis, RNA | A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE. | RNA Sequence Analysis,Sequence Determination, RNA,Analysis, RNA Sequence,Determination, RNA Sequence,Determinations, RNA Sequence,RNA Sequence Determination,RNA Sequence Determinations,RNA Sequencing,Sequence Determinations, RNA,Analyses, RNA Sequence,RNA Sequence Analyses,Sequence Analyses, RNA,Sequencing, RNA
D020029 | Base Pairing | Pairing of purine and pyrimidine bases by HYDROGEN BONDING in double-stranded DNA or RNA. | Base Pair,Base Pairs,Base Pairings
