Limits of homology detection by pairwise sequence comparison. 2001

R Spang and M Vingron
Deutsches Krebsforschungszentrum, Theoretische Bioinformatik, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.

BACKGROUND Noise in database searches resulting from random sequence similarities increases as the databases grow, and they are expanding rapidly. These noise problems are not a technical shortcoming of the database search programs but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments.

RESULTS We have investigated noise levels in database searches based on pairwise alignment. The noise levels of 38 releases of the SwissProt database display perfect logarithmic growth with the total length of the databases. Clustering real biological sequences reduces noise levels, but the effect is marginal.
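The abstract's central claim, that random-similarity noise grows logarithmically with total database length, is consistent with Karlin-Altschul statistics for ungapped local alignment: the expected best score of a purely random hit scales roughly as ln(K·m·N)/λ for a query of length m and a database of total length N (K and λ depend on the scoring system). Any score threshold meant to keep the false-positive rate fixed therefore has to rise with ln N. The Python sketch below illustrates the kind of simulation experiment the abstract mentions. It is not the authors' protocol. The uniform residue distribution, the toy scores (+4 match, -1 mismatch, -4 per gap position), the sequence lengths, and the helper names `sw_max_score`, `noise_levels` and `fit_log` are all assumptions made for illustration. "Noise level" is taken here to mean the mean best score of a random query against a random database, which may differ from the paper's definition.

```python
import math
import random

# Uniform background over the 20 standard amino acids (assumption: real
# protein databases such as SwissProt have non-uniform composition).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def sw_max_score(a, b, match=4, mismatch=-1, gap=-4):
    """Best local alignment score of a vs b (Smith-Waterman, linear gaps).

    The toy scores are illustrative, not a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def noise_levels(db_sizes, query_len=50, seq_len=50, n_queries=5, seed=0):
    """Mean best random-hit score for nested random databases.

    Returns (total residues, mean best score) per database size, in
    increasing order of size. Smaller databases are prefixes of larger ones.
    """
    rng = random.Random(seed)

    def rand_seq(n):
        return "".join(rng.choice(ALPHABET) for _ in range(n))

    database = [rand_seq(seq_len) for _ in range(max(db_sizes))]
    queries = [rand_seq(query_len) for _ in range(n_queries)]
    # scores[q][k]: best local score of query q against database entry k.
    scores = [[sw_max_score(q, s) for s in database] for q in queries]
    results = []
    for size in sorted(db_sizes):
        best_per_query = [max(row[:size]) for row in scores]
        results.append((size * seq_len, sum(best_per_query) / n_queries))
    return results


def fit_log(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    mx, my = sum(lx) / len(lx), sum(ys) / len(ys)
    b = sum((u - mx) * (y - my) for u, y in zip(lx, ys)) / sum(
        (u - mx) ** 2 for u in lx
    )
    return my - b * mx, b


if __name__ == "__main__":
    results = noise_levels([8, 16, 32, 64, 128], n_queries=3)
    for total, level in results:
        print(f"{total:6d} residues: mean best random score {level:.2f}")
    a, b = fit_log([t for t, _ in results], [lvl for _, lvl in results])
    print(f"fit: score ~ {a:.2f} + {b:.2f} * ln(N)")
```

Because the simulated databases are nested, the mean best random score can only stay the same or rise as N grows. The fitted slope `b` is the quantity to compare against a logarithmic trend. With only a few random queries the estimates are noisy, so a serious experiment would use many more queries, realistic residue frequencies and a proper substitution matrix.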

UI | MeSH Term | Description | Entry Terms
D008432 | Mathematical Computing | Computer-assisted interpretation and analysis of various mathematical functions related to a particular problem. | Statistical Computing; Computing, Statistical; Mathematic Computing; Statistical Programs, Computer Based; Computing, Mathematic; Computing, Mathematical; Computings, Mathematic; Computings, Mathematical; Computings, Statistical; Mathematic Computings; Mathematical Computings; Statistical Computings
D011506 | Proteins | Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein. | Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene
D003198 | Computer Simulation | Computer-based representation of physical systems and phenomena such as chemical processes. | Computational Modeling; Computational Modelling; Computer Models; In silico Modeling; In silico Models; In silico Simulation; Models, Computer; Computerized Models; Computer Model; Computer Simulations; Computerized Model; In silico Model; Model, Computer; Model, Computerized; Model, In silico; Modeling, Computational; Modeling, In silico; Modelling, Computational; Simulation, Computer; Simulation, In silico; Simulations, Computer
D012689 | Sequence Homology, Nucleic Acid | The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule. Sequence homology is an indication of the genetic relatedness of different organisms and gene function. | Base Sequence Homology; Homologous Sequences, Nucleic Acid; Homologs, Nucleic Acid Sequence; Homology, Base Sequence; Homology, Nucleic Acid Sequence; Nucleic Acid Sequence Homologs; Nucleic Acid Sequence Homology; Sequence Homology, Base; Base Sequence Homologies; Homologies, Base Sequence; Sequence Homologies, Base
D015233 | Models, Statistical | Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. | Probabilistic Models; Statistical Models; Two-Parameter Models; Model, Statistical; Models, Binomial; Models, Polynomial; Statistical Model; Binomial Model; Binomial Models; Model, Binomial; Model, Polynomial; Model, Probabilistic; Model, Two-Parameter; Models, Probabilistic; Models, Two-Parameter; Polynomial Model; Polynomial Models; Probabilistic Model; Two Parameter Models; Two-Parameter Model
D016208 | Databases, Factual | Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references. | Databanks, Factual; Data Banks, Factual; Data Bases, Factual; Data Bank, Factual; Data Base, Factual; Databank, Factual; Database, Factual; Factual Data Bank; Factual Data Banks; Factual Data Base; Factual Data Bases; Factual Databank; Factual Databanks; Factual Database; Factual Databases
D016415 | Sequence Alignment | The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms. | Sequence Homology Determination; Determination, Sequence Homology; Alignment, Sequence; Alignments, Sequence; Determinations, Sequence Homology; Sequence Alignments; Sequence Homology Determinations
