Sann: solvent accessibility prediction of proteins by nearest neighbor method. 2012

Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
Center for In Silico Protein Science, Korea Institute for Advanced Study, Korea.

We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The Solvent accessibility database is constructed from 5717 proteins extracted from PISCES culling server with the cutoff of 25% sequence identities. Using optimal parameters, the prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%), 65.1% (three-state prediction with the thresholds of 9 and 36%), and the Pearson correlation coefficient (between the predicted and true RSA's for continuous prediction) of 0.676 are achieved An independent benchmark test was performed with the CASP8 targets where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two state prediction with the threshold of 25%), 67.58% (three-state prediction), and the Pearson correlation coefficient of 0.727 (for continuous prediction) with mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/.

UI MeSH Term Description Entries
D008956 Models, Chemical Theoretical representations that simulate the behavior or activity of chemical processes or phenomena; includes the use of mathematical equations, computers, and other electronic equipment. Chemical Models,Chemical Model,Model, Chemical
D008958 Models, Molecular Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures. Molecular Models,Model, Molecular,Molecular Model
D008969 Molecular Sequence Data Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. Sequence Data, Molecular,Molecular Sequencing Data,Data, Molecular Sequence,Data, Molecular Sequencing,Sequencing Data, Molecular
D011506 Proteins Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. Gene Products, Protein,Gene Proteins,Protein,Protein Gene Products,Proteins, Gene
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D000595 Amino Acid Sequence The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. Protein Structure, Primary,Amino Acid Sequences,Sequence, Amino Acid,Sequences, Amino Acid,Primary Protein Structure,Primary Protein Structures,Protein Structures, Primary,Structure, Primary Protein,Structures, Primary Protein
D001185 Artificial Intelligence Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. AI (Artificial Intelligence),Computer Reasoning,Computer Vision Systems,Knowledge Acquisition (Computer),Knowledge Representation (Computer),Machine Intelligence,Computational Intelligence,Acquisition, Knowledge (Computer),Computer Vision System,Intelligence, Artificial,Intelligence, Computational,Intelligence, Machine,Knowledge Representations (Computer),Reasoning, Computer,Representation, Knowledge (Computer),System, Computer Vision,Systems, Computer Vision,Vision System, Computer,Vision Systems, Computer
D012997 Solvents Liquids that dissolve other substances (solutes), generally solids, without any change in chemical composition, as, water containing sugar. (Grant & Hackh's Chemical Dictionary, 5th ed) Solvent
D020539 Sequence Analysis, Protein A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence. Amino Acid Sequence Analysis,Peptide Sequence Analysis,Protein Sequence Analysis,Sequence Determination, Protein,Amino Acid Sequence Analyses,Amino Acid Sequence Determination,Amino Acid Sequence Determinations,Amino Acid Sequencing,Peptide Sequence Determination,Protein Sequencing,Sequence Analyses, Amino Acid,Sequence Analysis, Amino Acid,Sequence Analysis, Peptide,Sequence Determination, Amino Acid,Sequence Determinations, Amino Acid,Acid Sequencing, Amino,Analyses, Peptide Sequence,Analyses, Protein Sequence,Analysis, Peptide Sequence,Analysis, Protein Sequence,Peptide Sequence Analyses,Peptide Sequence Determinations,Protein Sequence Analyses,Protein Sequence Determination,Protein Sequence Determinations,Sequence Analyses, Peptide,Sequence Analyses, Protein,Sequence Determination, Peptide,Sequence Determinations, Peptide,Sequence Determinations, Protein,Sequencing, Amino Acid,Sequencing, Protein
D030562 Databases, Protein Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties. Amino Acid Sequence Databases,Databases, Amino Acid Sequence,Protein Databases,Protein Sequence Databases,SWISS-PROT,Protein Structure Databases,SwissProt,Database, Protein,Database, Protein Sequence,Database, Protein Structure,Databases, Protein Sequence,Databases, Protein Structure,Protein Database,Protein Sequence Database,Protein Structure Database,SWISS PROT,Sequence Database, Protein,Sequence Databases, Protein,Structure Database, Protein,Structure Databases, Protein

Related Publications

Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
June 2005, Bioinformatics (Oxford, England),
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
October 2004, Biopolymers,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
January 2004, Bioinformatics (Oxford, England),
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
June 2008, Amino acids,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
July 2016, Proceedings of the National Academy of Sciences of the United States of America,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
August 1993, Journal of molecular biology,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
April 2005, Journal of computational biology : a journal of computational molecular cell biology,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
December 2022, Scientific reports,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
November 2006, Biophysical chemistry,
Keehyoung Joo, and Sung Jong Lee, and Jooyoung Lee
May 2002, Proteins,
Copied contents to your clipboard!