Protein design and variant prediction using autoregressive generative models. 2021

Jung-Eun Shin, Adam J Riesselman, Aaron W Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew C Kruse, Debora S Marks
Department of Systems Biology, Harvard Medical School, Boston, MA, USA.

The ability to design functional sequences and predict the effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information, but they are inadequate for important applications where multiple sequence alignments are not robust. These applications include predicting the effects of indels and of variants in disordered proteins, and designing proteins such as antibodies, whose highly variable complementarity-determining regions are difficult to align. We introduce a deep generative model adapted from natural language processing for the prediction and design of diverse functional sequences without the need for alignments. The model achieves state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10⁵-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model to generalize to regions of sequence space traditionally considered beyond the reach of prediction and design.
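The core idea of an alignment-free autoregressive model can be sketched in a few lines. An autoregressive model factorizes a sequence's probability residue by residue, log p(x) = Σ_i log p(x_i | x_<i). This lets any two sequences be compared by likelihood even when they differ in length, and it also lets new sequences be sampled one residue at a time. The sketch below is illustrative only. It substitutes a toy order-k Markov model with add-one smoothing for the paper's deep neural network. The class name `ToyAutoregressiveModel`, the boundary tokens `^`/`$`, the context length `k`, the pseudocount, and the example sequences are all hypothetical. Scoring a variant as log p(variant) − log p(wild type) is a common convention for likelihood-based variant-effect prediction, not necessarily the authors' exact procedure.

```python
import math
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical boundary tokens: START pads the context, END lets sequence length be modeled.
START, END = "^", "$"


class ToyAutoregressiveModel:
    """Order-k Markov stand-in (assumption) for a neural autoregressive protein model."""

    def __init__(self, k=2, pseudocount=1.0):
        self.k = k
        self.pseudocount = pseudocount
        self.counts = defaultdict(lambda: defaultdict(float))
        self.vocab = AMINO_ACIDS + END  # tokens the model can emit

    def _contexts(self, seq):
        # Yield (previous k tokens, next token) pairs, including the final END token.
        padded = START * self.k + seq + END
        for i in range(self.k, len(padded)):
            yield padded[i - self.k:i], padded[i]

    def fit(self, sequences):
        # Count next-token occurrences per context; no alignment of the inputs is required.
        for seq in sequences:
            for context, token in self._contexts(seq):
                self.counts[context][token] += 1.0
        return self

    def log_prob_next(self, context, token):
        # Add-one (Laplace) smoothed conditional probability p(token | context).
        row = self.counts[context]
        total = sum(row.values()) + self.pseudocount * len(self.vocab)
        return math.log((row[token] + self.pseudocount) / total)

    def log_likelihood(self, seq):
        # Chain rule, truncated to k residues of context: sum_i log p(x_i | x_{i-k..i-1}).
        return sum(self.log_prob_next(c, t) for c, t in self._contexts(seq))

    def score_variant(self, wild_type, variant):
        # Log-likelihood ratio; works for missense changes and indels alike.
        return self.log_likelihood(variant) - self.log_likelihood(wild_type)

    def sample(self, rng, max_len=50):
        # Generate a new sequence residue by residue until END (or max_len) is reached.
        context, out = START * self.k, []
        while len(out) < max_len:
            weights = [math.exp(self.log_prob_next(context, t)) for t in self.vocab]
            token = rng.choices(self.vocab, weights=weights)[0]
            if token == END:
                break
            out.append(token)
            context = (context + token)[-self.k:]
        return "".join(out)


if __name__ == "__main__":
    family = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAK", "MKTAYIAR"]  # hypothetical training set
    model = ToyAutoregressiveModel(k=2).fit(family)
    wild_type = "MKTAYIAK"
    print("missense:", model.score_variant(wild_type, "MKTWYIAK"))
    print("deletion:", model.score_variant(wild_type, "MKTYIAK"))
    print("sampled:", model.sample(random.Random(0)))
```

Negative scores mean the model finds the variant less likely than the wild type. Because scoring never consults an alignment, the deletion is handled the same way as the substitution. A real model would replace the count table with a learned network conditioned on the full preceding sequence.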

MeSH terms (UI, term: description; entry terms):
D009154 Mutation: Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations. Entry terms: Mutations.
D010641 Phenotype: The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment. Entry terms: Phenotypes.
D011506 Proteins: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. Entry terms: Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene.
D005838 Genotype: The genetic constitution of the individual, comprising the ALLELES present at each GENETIC LOCUS. Entry terms: Genogroup; Genogroups; Genotypes.
D006801 Humans: Members of the species Homo sapiens. Entry terms: Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man.
D000465 Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Entry terms: Algorithm.
D000595 Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. Entry terms: Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein.
D000906 Antibodies: Immunoglobulin molecules having a specific amino acid sequence by virtue of which they interact only with the ANTIGEN (or a very similar shape) that induced their synthesis in cells of the lymphoid series (especially PLASMA CELLS).
D000941 Antigens: Substances that are recognized by the immune system and induce an immune reaction. Entry terms: Antigen.
D015202 Protein Engineering: Procedures by which protein structure and function are changed or created in vitro by altering existing or synthesizing new structural genes that direct the synthesis of proteins with sought-after properties. Such procedures may include the design of MOLECULAR MODELS of proteins using COMPUTER GRAPHICS or other molecular modeling techniques; site-specific mutagenesis (MUTAGENESIS, SITE-SPECIFIC) of existing genes; and DIRECTED MOLECULAR EVOLUTION techniques to create new genes. Entry terms: Genetic Engineering of Proteins; Genetic Engineering, Protein; Proteins, Genetic Engineering; Engineering, Protein; Engineering, Protein Genetic; Protein Genetic Engineering.
