An expert system for processing sequence homology data. 1994

E L Sonnhammer and R Durbin
Sanger Centre, Hinxton, Cambridge, UK.

Database search tools such as BLAST and FASTA generate prohibitively large amounts of information when used to find homologies for large numbers of sequences. To automate most of the decisions a trained sequence analyst would make, a rule-based expert system was developed and combined with an algorithm that avoids non-informative matches caused by biased residue composition. The results the system finds relevant are presented concisely and clearly, so that homology can be assessed with minimum effort. The expert system, HSPcrunch, was implemented to process the output of the programs in the BLAST suite. HSPcrunch embodies rules for detecting distant similarities when pairs of weak matches are consistent with a larger gapped alignment, i.e. when BLAST has broken a longer gapped alignment into smaller ungapped ones. In this way, more distant similarities can be detected with few or no additional spurious matches. The rules for how small the gaps must be to be considered significant were derived empirically. Currently, a set of rules is used that operates on two scoring levels: one for very weak matches with very small gaps, and one for medium-weak matches with slightly larger gaps. This set of rules proved robust in most cases and gives high-fidelity separation between real homologies and spurious matches. One of the most important rules for reducing the amount of output is to limit the number of overlapping matches to the same region of the query sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
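The two kinds of rule described in the abstract can be illustrated with a minimal sketch. This is not HSPcrunch's actual implementation: the `HSP` record, the function names, the score and gap thresholds in `TIERS`, and the `max_cover` default are all hypothetical, since the abstract states only that the real values were derived empirically. The sketch assumes that two ungapped matches are "consistent with a larger gapped alignment" when they appear in the same order on both sequences and the gaps between them are small enough for their score tier.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HSP:
    """One ungapped match; coordinates are 1-based and inclusive (assumed)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


# Hypothetical tiers: (minimum score of the weaker HSP, largest gap allowed).
# The abstract gives no numbers; these values are placeholders only.
TIERS = [
    (35.0, 10),  # very weak matches: only very small gaps accepted
    (50.0, 30),  # medium-weak matches: slightly larger gaps accepted
]


def gaps_between(a: HSP, b: HSP) -> Optional[Tuple[int, int]]:
    """Return (query_gap, subject_gap) if b follows a on both sequences."""
    q_gap = b.q_start - a.q_end - 1
    s_gap = b.s_start - a.s_end - 1
    if q_gap < 0 or s_gap < 0:
        return None  # overlapping or out of order: not one gapped alignment
    return q_gap, s_gap


def consistent_pair(a: HSP, b: HSP) -> bool:
    """True if two weak HSPs could be pieces of one larger gapped alignment."""
    first, second = sorted((a, b), key=lambda h: h.q_start)
    gaps = gaps_between(first, second)
    if gaps is None:
        return False
    weakest = min(a.score, b.score)
    largest_gap = max(gaps)
    return any(weakest >= min_score and largest_gap <= max_gap
               for min_score, max_gap in TIERS)


def limit_overlaps(hsps: List[HSP], max_cover: int = 2) -> List[HSP]:
    """Keep the best-scoring HSPs so that no query position already covered
    max_cover times receives another match."""
    kept: List[HSP] = []
    for h in sorted(hsps, key=lambda h: h.score, reverse=True):
        depth = max(sum(k.q_start <= p <= k.q_end for k in kept)
                    for p in range(h.q_start, h.q_end + 1))
        if depth < max_cover:
            kept.append(h)
    return kept
```

With these placeholder thresholds, `consistent_pair(HSP(1, 20, 1, 20, 36), HSP(25, 40, 26, 41, 37))` accepts the pair because the largest gap is 5 residues. The same scores separated by a 25-residue gap are rejected unless both matches reach the medium-weak tier.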

UI | MeSH Term | Description | Entries
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular; Molecular Sequencing Data; Data, Molecular Sequence; Data, Molecular Sequencing; Sequencing Data, Molecular
D005103 | Expert Systems | Computer programs based on knowledge developed from consultation with experts on a problem, and the processing and/or formalizing of this knowledge using these programs in such a manner that the problems may be solved. | Expert System; System, Expert; Systems, Expert
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary; Amino Acid Sequences; Sequence, Amino Acid; Sequences, Amino Acid; Primary Protein Structure; Primary Protein Structures; Protein Structures, Primary; Structure, Primary Protein; Structures, Primary Protein
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs; Computer Software; Open Source Software; Software Engineering; Software Tools; Computer Applications Software; Computer Programs and Programming; Computer Software Applications; Application, Computer Software; Applications Software, Computer; Applications Softwares, Computer; Applications, Computer Software; Computer Applications Softwares; Computer Program; Computer Software Application; Engineering, Software; Open Source Softwares; Program, Computer; Programs, Computer; Software Application, Computer; Software Applications, Computer; Software Tool; Software, Computer; Software, Computer Applications; Software, Open Source; Softwares, Computer Applications; Softwares, Open Source; Source Software, Open; Source Softwares, Open; Tool, Software; Tools, Software
D017385 | Sequence Homology | The degree of similarity between sequences. Studies of AMINO ACID SEQUENCE HOMOLOGY and NUCLEIC ACID SEQUENCE HOMOLOGY provide useful information about the genetic relatedness of genes, gene products, and species. | Homologous Sequences; Homologs, Sequence; Sequence Homologs; Homolog, Sequence; Homologies, Sequence; Homologous Sequence; Homology, Sequence; Sequence Homolog; Sequence Homologies; Sequence, Homologous; Sequences, Homologous
D017421 | Sequence Analysis | A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information. | Sequence Determination; Analysis, Sequence; Determination, Sequence; Determinations, Sequence; Sequence Determinations; Analyses, Sequence; Sequence Analyses
