Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. 2012

Hua Zhang, Tuo Zhang, Jianzhao Gao, Jishou Ruan, Shiyi Shen, and Lukasz Kurgan
School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, People's Republic of China.

Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, which involves at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information on amino acid composition, predicted secondary structure, and predicted solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than those of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
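As an illustration of two quantities named in the abstract, the following minimal Python sketch computes the MCC from binary confusion-matrix counts and the content of a residue type in a given exposure state (for example, exposed Trp or buried Leu). The function names, the per-residue exposure string, and its 'e'/'b' labels are assumptions made for this example; the abstract does not give the exact feature definitions used by FOKIT, so this should not be read as the authors' implementation.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Here one folding type (say MS) is treated as the positive class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def state_content(sequence: str, exposure: str, residue: str, state: str) -> float:
    """Fraction of chain positions that hold `residue` in exposure `state`.

    `exposure` is a hypothetical per-residue string of predicted labels,
    'e' for exposed and 'b' for buried, aligned with `sequence`.
    """
    if len(sequence) != len(exposure):
        raise ValueError("sequence and exposure must have the same length")
    if not sequence:
        return 0.0
    hits = sum(
        1 for aa, ex in zip(sequence, exposure) if aa == residue and ex == state
    )
    return hits / len(sequence)


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    print(mcc(tp=10, tn=10, fp=0, fn=0))
    print(state_content("WLAW", "ebbe", residue="W", state="e"))
    print(state_content("WLAW", "ebbe", residue="L", state="b"))
```

Values such as these could serve as inputs to a logistic regression model, with the MCC used to score its out-of-sample predictions. The toy sequence and counts above are invented for the example.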

UI | MeSH Term | Description | Entries
D007700 | Kinetics | The rate dynamics in chemical or physical systems. |
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein,Gene Proteins,Protein,Protein Gene Products,Proteins, Gene
D012997 | Solvents | Liquids that dissolve other substances (solutes), generally solids, without any change in chemical composition, as, water containing sugar. (Grant & Hackh's Chemical Dictionary, 5th ed) | Solvent
D016015 | Logistic Models | Statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable. A common application is in epidemiology for estimating an individual's risk (probability of a disease) as a function of a given risk factor. | Logistic Regression,Logit Models,Models, Logistic,Logistic Model,Logistic Regressions,Logit Model,Model, Logistic,Model, Logit,Models, Logit,Regression, Logistic,Regressions, Logistic
D017433 | Protein Structure, Secondary | The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to ALPHA-HELICES; BETA-STRANDS (which align to form BETA-SHEETS), or other types of coils. This is the first folding level of protein conformation. | Secondary Protein Structure,Protein Structures, Secondary,Secondary Protein Structures,Structure, Secondary Protein,Structures, Secondary Protein
D017510 | Protein Folding | Processes involved in the formation of TERTIARY PROTEIN STRUCTURE. | Protein Folding, Globular,Folding, Globular Protein,Folding, Protein,Foldings, Globular Protein,Foldings, Protein,Globular Protein Folding,Globular Protein Foldings,Protein Foldings,Protein Foldings, Globular
D020539 | Sequence Analysis, Protein | A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence. | Amino Acid Sequence Analysis,Peptide Sequence Analysis,Protein Sequence Analysis,Sequence Determination, Protein,Amino Acid Sequence Analyses,Amino Acid Sequence Determination,Amino Acid Sequence Determinations,Amino Acid Sequencing,Peptide Sequence Determination,Protein Sequencing,Sequence Analyses, Amino Acid,Sequence Analysis, Amino Acid,Sequence Analysis, Peptide,Sequence Determination, Amino Acid,Sequence Determinations, Amino Acid,Acid Sequencing, Amino,Analyses, Peptide Sequence,Analyses, Protein Sequence,Analysis, Peptide Sequence,Analysis, Protein Sequence,Peptide Sequence Analyses,Peptide Sequence Determinations,Protein Sequence Analyses,Protein Sequence Determination,Protein Sequence Determinations,Sequence Analyses, Peptide,Sequence Analyses, Protein,Sequence Determination, Peptide,Sequence Determinations, Peptide,Sequence Determinations, Protein,Sequencing, Amino Acid,Sequencing, Protein
D030562 | Databases, Protein | Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties. | Amino Acid Sequence Databases,Databases, Amino Acid Sequence,Protein Databases,Protein Sequence Databases,SWISS-PROT,Protein Structure Databases,SwissProt,Database, Protein,Database, Protein Sequence,Database, Protein Structure,Databases, Protein Sequence,Databases, Protein Structure,Protein Database,Protein Sequence Database,Protein Structure Database,SWISS PROT,Sequence Database, Protein,Sequence Databases, Protein,Structure Database, Protein,Structure Databases, Protein
