Biomaterial Database

Combining sequence and structural profiles for protein solvent accessibility prediction. 2008

Rajkumar Bondugula, and Dong Xu

Digital Biology Laboratory, 110 C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA. raj@bioanalysis.org

Solvent accessibility is an important structural feature for a protein. We propose a new method for solvent accessibility prediction that uses known structure and sequence information more efficiently. We first estimate the relative solvent accessibility of the query protein using fuzzy mean operator from the solvent accessibilities of known structure fragments that have similar sequences to the query protein. We then integrate the estimated solvent accessibility and the position specific scoring matrix of the query protein using a neural network. We tested our method on a large data set consisting of 3386 non-redundant proteins. The comparison with other methods show slightly improved prediction accuracies with our method. The resulting system does need not be re-trained when new data is available. We incorporated our method into the MUPRED system, which is available as a web server at http://digbio.missouri.edu/mupred.

UI	MeSH Term	Description	Entries
D008956	Models, Chemical	Theoretical representations that simulate the behavior or activity of chemical processes or phenomena; includes the use of mathematical equations, computers, and other electronic equipment.	Chemical Models,Chemical Model,Model, Chemical
D008958	Models, Molecular	Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.	Molecular Models,Model, Molecular,Molecular Model
D008969	Molecular Sequence Data	Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.	Sequence Data, Molecular,Molecular Sequencing Data,Data, Molecular Sequence,Data, Molecular Sequencing,Sequencing Data, Molecular
D011485	Protein Binding	The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments.	Plasma Protein Binding Capacity,Binding, Protein
D011487	Protein Conformation	The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. PROTEIN STRUCTURE, QUATERNARY describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain).	Conformation, Protein,Conformations, Protein,Protein Conformations
D011506	Proteins	Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.	Gene Products, Protein,Gene Proteins,Protein,Protein Gene Products,Proteins, Gene
D003198	Computer Simulation	Computer-based representation of physical systems and phenomena such as chemical processes.	Computational Modeling,Computational Modelling,Computer Models,In silico Modeling,In silico Models,In silico Simulation,Models, Computer,Computerized Models,Computer Model,Computer Simulations,Computerized Model,In silico Model,Model, Computer,Model, Computerized,Model, In silico,Modeling, Computational,Modeling, In silico,Modelling, Computational,Simulation, Computer,Simulation, In silico,Simulations, Computer
D000595	Amino Acid Sequence	The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.	Protein Structure, Primary,Amino Acid Sequences,Sequence, Amino Acid,Sequences, Amino Acid,Primary Protein Structure,Primary Protein Structures,Protein Structures, Primary,Structure, Primary Protein,Structures, Primary Protein
D001665	Binding Sites	The parts of a macromolecule that directly participate in its specific combination with another molecule.	Combining Site,Binding Site,Combining Sites,Site, Binding,Site, Combining,Sites, Binding,Sites, Combining
D012997	Solvents	Liquids that dissolve other substances (solutes), generally solids, without any change in chemical composition, as, water containing sugar. (Grant & Hackh's Chemical Dictionary, 5th ed)	Solvent