Optimal clustering for detecting near-native conformations in protein docking. 2005

Dima Kozakov, Karl H Clodfelter, Sandor Vajda, and Carlos J Camacho
Department of Biomedical Engineering, Boston University, Massachusetts, USA.

Clustering is one of the most powerful tools in computational biology. The conventional wisdom is that events that occur in clusters are probably not random. In protein docking, the underlying principle is that clustering occurs because long-range electrostatic and/or desolvation forces steer the proteins to a low free-energy attractor at the binding region. Something similar occurs in the docking of small molecules, although in this case shorter-range van der Waals forces play a more critical role. Building on these observations, we have developed two different clustering strategies that predict docked conformations from the clustering properties of a uniform sampling of low free-energy protein-protein and protein-small molecule complexes. We report significant improvements in the automated prediction and discrimination of docked conformations when cluster size and consensus are used as ranking criteria. We show that the success of clustering depends on identifying the appropriate clustering radius of the system. The clustering radius for protein-protein complexes is consistent with the range of the electrostatics and desolvation free energies (i.e., between 4 and 9 Angstroms); for protein-small molecule docking, the radius is set by van der Waals interactions (i.e., approximately 2 Angstroms). Without any a priori information, a simple analysis of the histogram of pairwise distances between the docked conformations can evaluate the clustering properties of the data set. Clustering is observed when the histogram is bimodal. Data clustering is optimal if one chooses the clustering radius to be the minimum after the first peak of the bimodal distribution. We show that using this optimal radius further improves the discrimination of near-native complex structures.
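The radius selection described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, 2D points with Euclidean distance stand in for docked poses and their pairwise RMSD, and the greedy "largest neighborhood first" clustering is an assumed strategy, since the abstract does not say which of the two clustering strategies works this way.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return {(i, j): distance} for i < j (Euclidean stand-in for pose RMSD)."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def histogram(values, bin_width):
    """Count values in fixed-width bins starting at 0."""
    counts = [0] * (int(max(values) / bin_width) + 1)
    for v in values:
        counts[int(v / bin_width)] += 1
    return counts


def optimal_radius(counts, bin_width):
    """Center of the first minimum after the first peak, or None when no
    later bin rises above that minimum (histogram is not bimodal)."""
    i = 0
    # Climb to the first peak.
    while i + 1 < len(counts) and counts[i + 1] >= counts[i]:
        i += 1
    # Descend while the counts keep strictly falling.
    while i + 1 < len(counts) and counts[i + 1] < counts[i]:
        i += 1
    if not any(c > counts[i] for c in counts[i + 1:]):
        return None
    return (i + 0.5) * bin_width


def rank_clusters(n, dist, radius):
    """Assumed greedy scheme: repeatedly pick the pose with the most remaining
    neighbors within `radius`. Clusters come out largest first."""
    neighbors = {i: {i} for i in range(n)}
    for (i, j), d in dist.items():
        if d <= radius:
            neighbors[i].add(j)
            neighbors[j].add(i)
    remaining = set(range(n))
    clusters = []
    while remaining:
        center = max(sorted(remaining),
                     key=lambda k: len(neighbors[k] & remaining))
        members = neighbors[center] & remaining
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Toy data: a tight group of five poses, a smaller group of three, one outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
          (20, 0), (21, 0), (20, 1),
          (10, 15)]
dist = pairwise_distances(points)
counts = histogram(list(dist.values()), bin_width=1.0)
radius = optimal_radius(counts, bin_width=1.0)
clusters = rank_clusters(len(points), dist, radius)
for center, members in clusters:
    print(center, len(members), members)
```

On real docking output the histogram is noisy, so some smoothing or a coarser bin width would probably be needed before locating the minimum; the sketch omits that step.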

UI | MeSH Term | Description | Entries
D008956 | Models, Chemical | Theoretical representations that simulate the behavior or activity of chemical processes or phenomena; includes the use of mathematical equations, computers, and other electronic equipment. | Chemical Models; Chemical Model; Model, Chemical
D008958 | Models, Molecular | Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures. | Molecular Models; Model, Molecular; Molecular Model
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D011485 | Protein Binding | The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments. | Plasma Protein Binding Capacity; Binding, Protein
D011487 | Protein Conformation | The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. PROTEIN STRUCTURE, QUATERNARY describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain). | Conformation, Protein; Conformations, Protein; Protein Conformations
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene
D003198 | Computer Simulation | Computer-based representation of physical systems and phenomena such as chemical processes. | Computational Modeling; Computational Modelling; Computer Models; In silico Modeling; In silico Models; In silico Simulation; Models, Computer; Computerized Models; Computer Model; Computer Simulations; Computerized Model; In silico Model; Model, Computer; Model, Computerized; Model, In silico; Modeling, Computational; Modeling, In silico; Modelling, Computational; Simulation, Computer; Simulation, In silico; Simulations, Computer
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D001665 | Binding Sites | The parts of a macromolecule that directly participate in its specific combination with another molecule. | Combining Site; Binding Site; Combining Sites; Site, Binding; Site, Combining; Sites, Binding; Sites, Combining
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering; Analyses, Cluster; Analysis, Cluster; Cluster Analyses; Clusterings
