Transmembrane segment prediction from protein sequence data (1993)

S M Weiss, D M Cohen, and N Indurkhya
Department of Computer Science, Rutgers University, New Brunswick, New Jersey 08903, USA.

We consider the automated identification of transmembrane domains in membrane protein sequences. We examined 324 proteins (containing 1585 segments), representing every protein in the PIR database that carries the transmembrane domain feature annotation. Machine learning techniques were used to evaluate the efficacy of alternative hydrophobicity measures and windowing techniques. We describe a simpler measure of hydrophobicity and a new variable window size concept, and we demonstrate that these techniques are superior to some previous techniques in minimizing the segment error rate. Using these new techniques, we describe an algorithm that has a 7.9% segment error rate on the sampled proteins while classifying 16.7% of the amino acid residues as transmembrane.
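The general idea of windowed hydrophobicity scoring can be sketched as follows. This is an illustrative baseline only, not the algorithm of the paper: the abstract does not specify the simpler hydrophobicity measure or the variable window size scheme, so the sketch assumes the widely used Kyte-Doolittle hydropathy scale and a fixed window. The function names, the window of 19 residues, the threshold of 1.6, and the minimum segment length of 15 are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper's own
# "simpler measure" is not given in the abstract).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy in a window centered on each residue.

    Windows are truncated at the sequence ends. Unknown residue
    codes (e.g. X) are scored as 0.0, which is an assumption.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    half = window // 2
    means = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        means.append(sum(scores[lo:hi]) / (hi - lo))
    return means


def predict_segments(seq, window=19, threshold=1.6, min_len=15):
    """Return (start, end) half-open index pairs of candidate segments.

    A candidate segment is a run of at least min_len consecutive
    residues whose windowed mean is at or above the threshold.
    """
    means = window_means(seq, window)
    segments = []
    start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, m in enumerate(means + [float("-inf")]):
        if m >= threshold and start is None:
            start = i
        elif m < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    # Synthetic sequence: a hydrophobic stretch flanked by charged residues.
    toy = "D" * 20 + "L" * 25 + "K" * 20
    print(predict_segments(toy))
```

Because the windowed mean blends in the flanking residues, the predicted run is somewhat shorter than the hydrophobic stretch itself. The choice of window size drives this trade-off between smoothing and boundary accuracy, which is the kind of effect a windowing study such as this one would evaluate.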

UI | MeSH Term | Description | Entries
D008565 | Membrane Proteins | Proteins which are found in membranes including cellular and intracellular membranes. They consist of two types, peripheral and integral proteins. They include most membrane-associated enzymes, antigenic proteins, transport proteins, and drug, hormone, and lectin receptors. | Cell Membrane Protein; Cell Membrane Proteins; Cell Surface Protein; Cell Surface Proteins; Integral Membrane Proteins; Membrane-Associated Protein; Surface Protein; Surface Proteins; Integral Membrane Protein; Membrane Protein; Membrane-Associated Proteins; Membrane Associated Protein; Membrane Associated Proteins; Membrane Protein, Cell; Membrane Protein, Integral; Membrane Proteins, Integral; Protein, Cell Membrane; Protein, Cell Surface; Protein, Integral Membrane; Protein, Membrane; Protein, Membrane-Associated; Protein, Surface; Proteins, Cell Membrane; Proteins, Cell Surface; Proteins, Integral Membrane; Proteins, Membrane; Proteins, Membrane-Associated; Proteins, Surface; Surface Protein, Cell
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D005544 | Forecasting | The prediction or projection of the nature of future problems or existing conditions based upon the extrapolation or interpretation of existing scientific data or by the application of scientific methodology. | Futurology; Projections and Predictions; Future; Predictions and Projections
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D000596 | Amino Acids | Organic compounds that generally contain an amino (-NH2) and a carboxyl (-COOH) group. Twenty alpha-amino acids are the subunits which are polymerized to form proteins. | Amino Acid; Acid, Amino; Acids, Amino
D001185 | Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | AI (Artificial Intelligence); Computer Reasoning; Computer Vision Systems; Knowledge Acquisition (Computer); Knowledge Representation (Computer); Machine Intelligence; Computational Intelligence; Acquisition, Knowledge (Computer); Computer Vision System; Intelligence, Artificial; Intelligence, Computational; Intelligence, Machine; Knowledge Representations (Computer); Reasoning, Computer; Representation, Knowledge (Computer); System, Computer Vision; Systems, Computer Vision; Vision System, Computer; Vision Systems, Computer
D015203 | Reproducibility of Results | The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results. | Reliability and Validity; Reliability of Result; Reproducibility Of Result; Reproducibility of Finding; Validity of Result; Validity of Results; Face Validity; Reliability (Epidemiology); Reliability of Results; Reproducibility of Findings; Test-Retest Reliability; Validity (Epidemiology); Finding Reproducibilities; Finding Reproducibility; Of Result, Reproducibility; Of Results, Reproducibility; Reliabilities, Test-Retest; Reliability, Test-Retest; Result Reliabilities; Result Reliability; Result Validities; Result Validity; Result, Reproducibility Of; Results, Reproducibility Of; Test Retest Reliability; Validity and Reliability; Validity, Face
D017421 | Sequence Analysis | A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information. | Sequence Determination; Analysis, Sequence; Determination, Sequence; Determinations, Sequence; Sequence Determinations; Analyses, Sequence; Sequence Analyses
