Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction. 2022

Fusong Ju, Jianwei Zhu, Qi Zhang, Guozheng Wei, Shiwei Sun, Wei-Mou Zheng, and Dongbo Bu
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignment (MSA) for residue mutations and correlations, as this information specifies protein tertiary structure. Widely used prediction approaches usually transform the MSA into an intermediate model, such as a position-specific scoring matrix or a profile hidden Markov model. These intermediate models, however, cannot fully represent the residue mutations and correlations carried by the MSA; hence, an effective way to exploit MSAs directly is highly desirable. Here, we report a novel sequence set network (called Seq-SetNet) to directly and effectively exploit MSA for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements: (i) an encoding module that takes a component homologue in the MSA as input and encodes residue mutations and correlations into context-specific features for each residue; and (ii) an aggregation module that aggregates the features extracted from all component homologues, which are then transformed into structural properties for the residues of the query protein. Because Seq-SetNet encodes each homologue protein individually, it can account for both insertions and deletions, as well as long-distance correlations among residues, and thus represents more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. By using symmetric aggregation functions, Seq-SetNet processes the homologue proteins as a sequence set, making its prediction results invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predict the secondary structure and torsion angles of residues with improved accuracy and efficiency. The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet. Supplementary data are available at Bioinformatics online.
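
The 'encoding and aggregation' strategy described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the one-hot window encoder, the random weights, the function names (`encode`, `aggregate`, `seq_set_features`) and the toy sequences are all assumptions made for illustration, and element-wise max is used only as one example of a symmetric aggregation function. The point it shows is that when every homologue is encoded with the same shared encoder and the results are combined by a symmetric function, the output does not depend on the order of the homologues.

```python
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residue letters plus the gap symbol


def one_hot(ch):
    # Unknown symbols map to an all-zero vector.
    return [1.0 if ch == a else 0.0 for a in AMINO]


def make_weights(n_in, n_out, seed=0):
    # Hypothetical stand-in for learned encoder parameters.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def encode(query, homologue, weights, window=1):
    """Encode ONE aligned homologue into per-residue features.

    Each position sees a small window of (query, homologue) residue pairs,
    followed by a linear map and a ReLU. The same weights are shared by
    all homologues.
    """
    length = len(query)
    feats = []
    for i in range(length):
        x = []
        for j in range(i - window, i + window + 1):
            if 0 <= j < length:
                x += one_hot(query[j]) + one_hot(homologue[j])
            else:
                x += [0.0] * (2 * len(AMINO))  # zero padding at the ends
        feats.append([max(0.0, sum(w * v for w, v in zip(row, x)))
                      for row in weights])
    return feats


def aggregate(encoded):
    """Symmetric aggregation: element-wise max over all homologues."""
    length, dim = len(encoded[0]), len(encoded[0][0])
    return [[max(e[i][d] for e in encoded) for d in range(dim)]
            for i in range(length)]


def seq_set_features(query, msa, weights):
    return aggregate([encode(query, h, weights) for h in msa])


# Toy alignment (made-up sequences, all aligned to the query length).
query = "MKTAYIA"
msa = ["MKTAYIA", "MRTA-IA", "MKSAYLA"]
weights = make_weights(n_in=3 * 2 * len(AMINO), n_out=4)

features = seq_set_features(query, msa, weights)
shuffled = seq_set_features(query, list(reversed(msa)), weights)
assert features == shuffled  # order of homologues does not matter
```

In a real model the aggregated per-residue features would be passed to further layers that predict structural properties such as secondary structure classes; that stage is omitted here.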

UI | MeSH Term | Description | Entries
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs; Computer Software; Open Source Software; Software Engineering; Software Tools; Computer Applications Software; Computer Programs and Programming; Computer Software Applications; Application, Computer Software; Applications Software, Computer; Applications Softwares, Computer; Applications, Computer Software; Computer Applications Softwares; Computer Program; Computer Software Application; Engineering, Software; Open Source Softwares; Program, Computer; Programs, Computer; Software Application, Computer; Software Applications, Computer; Software Tool; Software, Computer; Software, Computer Applications; Software, Open Source; Softwares, Computer Applications; Softwares, Open Source; Source Software, Open; Source Softwares, Open; Tool, Software; Tools, Software
D016415 | Sequence Alignment | The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms. | Sequence Homology Determination; Determination, Sequence Homology; Alignment, Sequence; Alignments, Sequence; Determinations, Sequence Homology; Sequence Alignments; Sequence Homology Determinations
D017433 | Protein Structure, Secondary | The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to ALPHA-HELICES; BETA-STRANDS (which align to form BETA-SHEETS), or other types of coils. This is the first folding level of protein conformation. | Secondary Protein Structure; Protein Structures, Secondary; Secondary Protein Structures; Structure, Secondary Protein; Structures, Secondary Protein
D056510 | Position-Specific Scoring Matrices | Tabular numerical representations of sequence motifs displaying their variability as likelihood values for each possible residue at each position in a sequence. Position-specific scoring matrices (PSSMs) are calculated from position frequency matrices. | Position-Specific Weight Matrices; Position Frequency Matrices; Position Weight Matrices; Position Weight Matrix; Position-Specific Scoring Matrix; Position-Specific Weight Matrix; Sequence Logo; Sequence Logos; Frequency Matrices, Position; Logo, Sequence; Logos, Sequence; Matrices, Position Frequency; Matrices, Position Weight; Matrices, Position-Specific Scoring; Matrices, Position-Specific Weight; Matrix, Position Weight; Matrix, Position-Specific Scoring; Matrix, Position-Specific Weight; Position Specific Scoring Matrices; Position Specific Scoring Matrix; Position Specific Weight Matrices; Position Specific Weight Matrix; Scoring Matrices, Position-Specific; Scoring Matrix, Position-Specific; Weight Matrices, Position; Weight Matrices, Position-Specific; Weight Matrix, Position; Weight Matrix, Position-Specific
