Biomaterial Database

Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction. 2022

Fusong Ju, Jianwei Zhu, Qi Zhang, Guozheng Wei, Shiwei Sun, Wei-Mou Zheng, and Dongbo Bu

Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignment (MSA) for residue mutations and correlations, as this information specifies protein tertiary structure. Widely used prediction approaches usually transform the MSA into intermediate models, such as a position-specific scoring matrix or a profile hidden Markov model. These intermediate models, however, cannot fully represent the residue mutations and correlations carried by the MSA; hence, an effective way to exploit MSAs directly is highly desirable. Here, we report a novel sequence set network (called Seq-SetNet) that directly and effectively exploits MSA for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements: (i) an encoding module that takes a component homologue in the MSA as input and encodes residue mutations and correlations into context-specific features for each residue; and (ii) an aggregation module that aggregates the features extracted from all component homologues, which are further transformed into structural properties for residues of the query protein. Because Seq-SetNet encodes each homologous protein individually, it can account for both insertions and deletions, as well as long-distance correlations among residues, and thus represents more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. By using symmetric aggregation functions, Seq-SetNet processes the homologous proteins as a sequence set, making its prediction results invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predict secondary structure and torsion angles of residues with improved accuracy and efficiency. The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet. Supplementary data are available at Bioinformatics online.
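The 'encoding and aggregation' strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written `encode_homologue` function is a hypothetical stand-in for the learned encoding module, the three toy features are assumptions chosen for readability, and element-wise `max` is just one example of a symmetric aggregation function. Pairing each homologue with the query sequence is also an assumption of this sketch.

```python
from typing import Callable, Iterable, List

def encode_homologue(query: str, homologue: str) -> List[List[float]]:
    """Toy stand-in for a learned encoder (hypothetical, hand-written).

    Returns one feature vector per aligned position:
    [matches query, is a gap, match fraction in a +/-1 residue window].
    """
    assert len(query) == len(homologue), "sequences must be aligned"
    n = len(query)
    match = [1.0 if q == h and h != "-" else 0.0
             for q, h in zip(query, homologue)]
    features = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        context = sum(match[lo:hi]) / (hi - lo)  # local context feature
        gap = 1.0 if homologue[i] == "-" else 0.0
        features.append([match[i], gap, context])
    return features

def aggregate(encoded: List[List[List[float]]],
              reduce: Callable[[Iterable[float]], float] = max) -> List[List[float]]:
    """Combine per-homologue features with a symmetric, element-wise reduction."""
    n_pos, n_feat = len(encoded[0]), len(encoded[0][0])
    return [[reduce(e[i][k] for e in encoded) for k in range(n_feat)]
            for i in range(n_pos)]

def seq_set_features(query: str, msa: List[str]) -> List[List[float]]:
    """Encode every homologue independently, then aggregate over the set."""
    return aggregate([encode_homologue(query, h) for h in msa])

# Toy alignment (made-up sequences, for illustration only).
query = "MKTAYIAK"
msa = ["MKTAYIAK", "MKSAY-AK", "MRTA-IGK"]

features = seq_set_features(query, msa)
reordered = seq_set_features(query, list(reversed(msa)))

# A symmetric aggregation makes the result independent of homologue order.
assert features == reordered
```

Because `max` ignores the order of its inputs, shuffling the homologues leaves the aggregated per-residue features unchanged, which is the property the abstract attributes to symmetric aggregation. In a trained model, the aggregated features would then be passed to a further network that outputs structural properties such as secondary structure classes.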

UI	MeSH Term	Description	Entries
D011506	Proteins	Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.	Gene Products, Protein,Gene Proteins,Protein,Protein Gene Products,Proteins, Gene
D000465	Algorithms	A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.	Algorithm
D012984	Software	Sequential operating programs and data which instruct the functioning of a digital computer.	Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D016415	Sequence Alignment	The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.	Sequence Homology Determination,Determination, Sequence Homology,Alignment, Sequence,Alignments, Sequence,Determinations, Sequence Homology,Sequence Alignments,Sequence Homology Determinations
D017433	Protein Structure, Secondary	The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to ALPHA-HELICES; BETA-STRANDS (which align to form BETA-SHEETS), or other types of coils. This is the first folding level of protein conformation.	Secondary Protein Structure,Protein Structures, Secondary,Secondary Protein Structures,Structure, Secondary Protein,Structures, Secondary Protein
D056510	Position-Specific Scoring Matrices	Tabular numerical representations of sequence motifs displaying their variability as likelihood values for each possible residue at each position in a sequence. Position-specific scoring matrices (PSSMs) are calculated from position frequency matrices.	Position-Specific Weight Matrices,Position Frequency Matrices,Position Weight Matrices,Position Weight Matrix,Position-Specific Scoring Matrix,Position-Specific Weight Matrix,Sequence Logo,Sequence Logos,Frequency Matrices, Position,Logo, Sequence,Logos, Sequence,Matrices, Position Frequency,Matrices, Position Weight,Matrices, Position-Specific Scoring,Matrices, Position-Specific Weight,Matrix, Position Weight,Matrix, Position-Specific Scoring,Matrix, Position-Specific Weight,Position Specific Scoring Matrices,Position Specific Scoring Matrix,Position Specific Weight Matrices,Position Specific Weight Matrix,Scoring Matrices, Position-Specific,Scoring Matrix, Position-Specific,Weight Matrices, Position,Weight Matrices, Position-Specific,Weight Matrix, Position,Weight Matrix, Position-Specific
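As a hedged illustration of the position-specific scoring matrix entry above, the sketch below derives a position frequency matrix from a toy alignment and converts it to log-odds scores. The uniform background distribution, the pseudocount of 1, and the two-column toy alignments are assumptions made for the example, not values from the source. It also shows why such a column-wise summary cannot capture correlations between positions: two alignments with different residue pairings produce the same matrix.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequency_matrix(msa: List[str]) -> List[Dict[str, int]]:
    """Count each amino acid in each alignment column (gaps are ignored)."""
    length = len(msa[0])
    assert all(len(seq) == length for seq in msa), "sequences must be aligned"
    matrix = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        matrix.append({aa: counts.get(aa, 0) for aa in AMINO_ACIDS})
    return matrix

def log_odds_pssm(pfm: List[Dict[str, int]],
                  pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Convert counts to log2 odds against a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(column)
        pssm.append({aa: math.log2(((count + pseudocount) / total) / background)
                     for aa, count in column.items()})
    return pssm

# Two toy alignments with identical columns but different residue pairings.
coupled = ["AK", "AK", "DE", "DE"]    # A always pairs with K, D with E
uncoupled = ["AE", "AK", "DK", "DE"]  # same column contents, pairing broken

pfm_coupled = position_frequency_matrix(coupled)
pfm_uncoupled = position_frequency_matrix(uncoupled)
pssm = log_odds_pssm(pfm_coupled)

# The column-wise summary cannot tell the two alignments apart.
assert pfm_coupled == pfm_uncoupled
```

Observed residues receive positive scores and unobserved residues negative ones, but the pairing between the two columns is lost once the alignment is reduced to per-column counts.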

Related Publications

Protein multiple sequence alignment benchmarking through secondary structure prediction.

May 2017, Bioinformatics (Oxford, England)

Integrating protein secondary structure prediction and multiple sequence alignment.

August 2004, Current protein & peptide science

Application of multiple sequence alignment profiles to improve protein secondary structure prediction.

August 2000, Proteins

The limits of protein secondary structure prediction accuracy from multiple sequence alignment.

December 1993, Journal of molecular biology

Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction.

January 2006, IEEE/ACM transactions on computational biology and bioinformatics

Algorithms for multiple protein structure alignment and structure-derived multiple sequence alignment.

January 2008, Methods in molecular biology (Clifton, N.J.)

Computational methods for protein secondary structure prediction using multiple sequence alignments.

November 2000, Current protein & peptide science

Identification of functional residues and secondary structure from protein multiple sequence alignment.

January 1996, Methods in enzymology

Evaluation and improvement of multiple sequence methods for protein secondary structure prediction.

March 1999, Proteins

Seq-InSite: sequence supersedes structure for protein interaction site prediction.

January 2024, Bioinformatics (Oxford, England)
