Combining evidence using p-values: application to sequence homology searches. 1998

T L Bailey and M Gribskov
San Diego Supercomputer Center, CA 92186-9784, USA.

BACKGROUND: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.

RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
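The abstract does not reproduce the formula, so the following is only a minimal sketch of the idea, not the published QFAST algorithm. It assumes each p-value is independent and uniformly distributed on (0, 1) under the null hypothesis, in which case the product p of n such values satisfies the standard closed form P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!. The function name `product_pvalue` is hypothetical and chosen for illustration.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes each input is independent and Uniform(0, 1) under the null.
    Uses the closed form  P(product <= p) = p * sum_{i<n} (-ln p)^i / i!.
    This is an illustrative sketch, not the authors' QFAST code.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # the product is zero, so its tail probability is zero

    # Work in log space so a product of many small p-values does not underflow.
    x = -sum(math.log(p) for p in pvalues)  # x = -ln(product) >= 0

    # Accumulate sum_{i=0}^{n-1} x^i / i! incrementally.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    return math.exp(-x) * total


if __name__ == "__main__":
    # Three motif matches, each only mildly significant on its own.
    print(product_pvalue([0.05, 0.05, 0.05]))
```

With a single p-value the function returns that value unchanged, and sorting sequences by the returned value (smallest first) corresponds to the ranking described in the abstract.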

MeSH Terms

UI: D008432
MeSH Term: Mathematical Computing
Description: Computer-assisted interpretation and analysis of various mathematical functions related to a particular problem.
Entries: Statistical Computing; Computing, Statistical; Mathematic Computing; Statistical Programs, Computer Based; Computing, Mathematic; Computing, Mathematical; Computings, Mathematic; Computings, Mathematical; Computings, Statistical; Mathematic Computings; Mathematical Computings; Statistical Computings

UI: D011506
MeSH Term: Proteins
Description: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.
Entries: Gene Products, Protein; Gene Proteins; Protein; Protein Gene Products; Proteins, Gene

UI: D004247
MeSH Term: DNA
Description: A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine).
Entries: DNA, Double-Stranded; Deoxyribonucleic Acid; ds-DNA; DNA, Double Stranded; Double-Stranded DNA; ds DNA

UI: D000465
MeSH Term: Algorithms
Description: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
Entries: Algorithm

UI: D012689
MeSH Term: Sequence Homology, Nucleic Acid
Description: The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule. Sequence homology is an indication of the genetic relatedness of different organisms and gene function.
Entries: Base Sequence Homology; Homologous Sequences, Nucleic Acid; Homologs, Nucleic Acid Sequence; Homology, Base Sequence; Homology, Nucleic Acid Sequence; Nucleic Acid Sequence Homologs; Nucleic Acid Sequence Homology; Sequence Homology, Base; Base Sequence Homologies; Homologies, Base Sequence; Sequence Homologies, Base

UI: D017386
MeSH Term: Sequence Homology, Amino Acid
Description: The degree of similarity between sequences of amino acids. This information is useful for analyzing the genetic relatedness of proteins and species.
Entries: Homologous Sequences, Amino Acid; Amino Acid Sequence Homology; Homologs, Amino Acid Sequence; Homologs, Protein Sequence; Homology, Protein Sequence; Protein Sequence Homologs; Protein Sequence Homology; Sequence Homology, Protein; Homolog, Protein Sequence; Homologies, Protein Sequence; Protein Sequence Homolog; Protein Sequence Homologies; Sequence Homolog, Protein; Sequence Homologies, Protein; Sequence Homologs, Protein
