Predicting alpha-helix and beta-strand segments of globular proteins. 1994

V V Solovyev, and A A Salamov
Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA.

All current methods of protein secondary structure prediction are based on evaluating the state of individual residues. Although the best of them reach an accuracy of approximately 60-70%, for reliable prediction of tertiary structure it is more useful to predict the approximate locations of alpha-helix and beta-strand segments, especially long ones. We have developed a simple secondary structure prediction method oriented toward locating secondary structure segments. The method uses linear discriminant analysis to assign a particular type of secondary structure to segments of a given amino acid sequence, taking into account the amino acid composition of the segment interior as well as of its terminal and adjacent regions. Four linear discriminant functions were constructed, one each for recognizing short and long alpha-helix segments and short and long beta-strand segments. These functions combine three characteristics: the hydrophobic moment, and the segment's singlet and pair preferences for an alpha-helix or beta-strand. The last two are calculated by summing the preference parameters of the single residues and residue pairs located in a segment and its adjacent regions. The final program, SSP, predicts all potential alpha-helices and beta-strands and resolves some of the possible overlaps between them. Overall three-state (alpha, beta, coil) prediction gives approximately 65.1% correctly predicted residues on 126 non-homologous proteins in a jackknife test. Analysis of the results shows high accuracy for long secondary structure segments: approximately 89% of alpha-helices of length > 8 and approximately 71% of beta-strands of length > 6 are correctly located, with probabilities of correct prediction of 0.82 and 0.78 respectively. (ABSTRACT TRUNCATED AT 250 WORDS)
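The abstract describes SSP only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' program. It computes the three characteristics for one candidate segment and combines them with a two-class Fisher linear discriminant, then shows how a three-state residue accuracy (Q3, the kind of figure behind the 65.1%) is computed. Assumptions: the Kyte-Doolittle hydrophobicity scale and 100 degrees per residue for the hydrophobic moment, a flank width of 2, pairs taken at separation 1, and the preference tables passed in are all hypothetical placeholders. The paper derived its preference parameters and discriminant weights from training data that the abstract does not give.

```python
import math

# Kyte-Doolittle hydrophobicity values, used only for illustration:
# the abstract does not say which scale SSP uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0):
    """Eisenberg-style moment; 100 degrees per residue models an ideal alpha-helix."""
    delta = math.radians(angle_deg)
    s = sum(KD[a] * math.sin(i * delta) for i, a in enumerate(segment))
    c = sum(KD[a] * math.cos(i * delta) for i, a in enumerate(segment))
    return math.hypot(s, c)


def singlet_score(seq, start, end, prefs, flank=2):
    """Sum per-residue preferences over seq[start:end] plus `flank` residues on each side."""
    lo, hi = max(0, start - flank), min(len(seq), end + flank)
    return sum(prefs.get(a, 0.0) for a in seq[lo:hi])


def pair_score(seq, start, end, pair_prefs, gap=1):
    """Sum preferences of residue pairs (i, i + gap) lying inside seq[start:end]."""
    return sum(pair_prefs.get(seq[i] + seq[i + gap], 0.0) for i in range(start, end - gap))


def segment_features(seq, start, end, prefs, pair_prefs):
    """Feature vector [hydrophobic moment, singlet score, pair score] for one candidate segment."""
    return [
        hydrophobic_moment(seq[start:end]),
        singlet_score(seq, start, end, prefs),
        pair_score(seq, start, end, pair_prefs),
    ]


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_fisher_lda(pos, neg, ridge=1e-6):
    """Two-class Fisher discriminant: w = Sw^-1 (mean_pos - mean_neg), bias at the midpoint."""
    d = len(pos[0])

    def mean(xs):
        return [sum(x[j] for x in xs) / len(xs) for j in range(d)]

    mp, mn = mean(pos), mean(neg)
    # Pooled within-class scatter, with a small ridge so it stays invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for xs, mu in ((pos, mp), (neg, mn)):
        for x in xs:
            for i in range(d):
                for j in range(d):
                    sw[i][j] += (x[i] - mu[i]) * (x[j] - mu[j])
    w = _solve(sw, [p - q for p, q in zip(mp, mn)])
    w0 = -0.5 * sum(wi * (p + q) for wi, p, q in zip(w, mp, mn))
    return w, w0


def discriminant(w, w0, x):
    """Positive score: assign the segment to the positive class (e.g. 'long helix')."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

Given feature vectors from labelled segments of one class (say, long helices) as `pos` and from other segments as `neg`, `fit_fisher_lda` yields one discriminant function. SSP used four such functions (short and long helices, short and long strands), followed by an overlap-resolution step that the abstract does not describe, so that step is omitted here.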

MeSH terms (UI, term: description)
D008958 Models, Molecular: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.
D008969 Molecular Sequence Data: Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.
D011506 Proteins: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.
D003198 Computer Simulation: Computer-based representation of physical systems and phenomena such as chemical processes.
D000465 Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
D000595 Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.
D012984 Software: Sequential operating programs and data which instruct the functioning of a digital computer.
D015394 Molecular Structure: The location of the atoms, groups or ions relative to one another in a molecule, as well as the number, type and location of covalent bonds.
D016002 Discriminant Analysis: A statistical analytic technique used with discrete dependent variables, concerned with separating sets of observed values and allocating new values. It is sometimes used instead of regression analysis.
D016415 Sequence Alignment: The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
