Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data (2012)

Alexandros Iliadis, Dimitris Anastassiou, and Xiaodong Wang
Center for Computational Biology and Bioinformatics and Department of Electrical Engineering, Columbia University, New York, NY, USA.

BACKGROUND Typically, the first phase of a genome-wide association study (GWAS) includes genotyping hundreds of individuals and validating the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reducing the overall cost of such a study. Knowledge of haplotype structure can complement single-locus analyses with additional information. Several methods have been proposed for estimating population haplotype frequencies from pooled DNA data.

RESULTS We introduce a technique for estimating population haplotype frequencies from pooled DNA samples, focusing on datasets with a small number of individuals per pool (2 or 3) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and markers. Our algorithm improves on the competing methods in both accuracy and computation time for large numbers of markers, while performing comparably for smaller numbers of markers. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool.

CONCLUSIONS Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method shows superior performance on datasets with large numbers of markers and could be the method of choice for haplotype frequency estimation in such datasets.
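To make the estimation problem concrete, the following is a minimal sketch of the classical expectation-maximization (EM) formulation for pooled data. It is not the TDSPool tree-based deterministic sampling algorithm. It assumes biallelic markers coded 0/1, noise-free per-marker allele counts for each pool, and haplotypes drawn independently from the population; the function names `compatible_configs` and `em_pooled_haplotypes` are hypothetical. Because it enumerates every haplotype assignment compatible with each pool's counts, its cost grows exponentially with the number of markers and with pool size, which illustrates why datasets with many markers are the hard case.

```python
import itertools
from collections import defaultdict


def compatible_configs(counts, n_haps):
    """Enumerate ordered tuples of n_haps haplotypes (0/1 tuples) whose
    per-marker sums of allele 1 equal the observed pool counts."""
    n_markers = len(counts)
    haplotypes = list(itertools.product((0, 1), repeat=n_markers))
    target = tuple(counts)
    configs = []
    for combo in itertools.product(haplotypes, repeat=n_haps):
        sums = tuple(sum(h[m] for h in combo) for m in range(n_markers))
        if sums == target:
            configs.append(combo)
    return configs


def em_pooled_haplotypes(pools, n_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies from pooled allele counts.

    pools: list of (counts, n_individuals); counts[m] is the number of
    copies of allele 1 at marker m among the pool's 2 * n_individuals
    haplotypes (assumed to be measured without error).
    """
    n_markers = len(pools[0][0])
    haplotypes = list(itertools.product((0, 1), repeat=n_markers))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    configs = [compatible_configs(c, 2 * n) for c, n in pools]
    if any(not cfg for cfg in configs):
        raise ValueError("a pool's counts are inconsistent with its size")
    total_haps = sum(2 * n for _, n in pools)

    for _ in range(n_iter):
        # E-step: expected copies of each haplotype, summed over pools
        expected = defaultdict(float)
        for pool_configs in configs:
            weights = []
            for combo in pool_configs:
                w = 1.0
                for h in combo:
                    w *= freqs[h]  # haplotypes assumed drawn independently
                weights.append(w)
            z = sum(weights)
            for combo, w in zip(pool_configs, weights):
                for h in combo:
                    expected[h] += w / z
        # M-step: turn expected counts into frequencies
        new_freqs = {h: expected[h] / total_haps for h in haplotypes}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Three hypothetical pools of 2 individuals (4 haplotypes), 2 markers.
    pools = [((1, 2), 2), ((3, 2), 2), ((0, 1), 2)]
    est = em_pooled_haplotypes(pools)
    for h, f in sorted(est.items(), key=lambda kv: -kv[1]):
        print(h, round(f, 3))
```

Every compatible configuration of a pool carries exactly that pool's allele counts, so each EM iteration preserves the pooled allele frequencies at every marker; the data constrain only how alleles are combined into haplotypes. As with EM in general, the result can depend on the starting point when the likelihood has several local optima.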

UI | MeSH Term | Description | Entries
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models; Genetic Model; Model, Genetic
D004247 | DNA | A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine). | DNA, Double-Stranded; Deoxyribonucleic Acid; ds-DNA; DNA, Double Stranded; Double-Stranded DNA; ds DNA
D005787 | Gene Frequency | The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION. | Allele Frequency; Genetic Equilibrium; Equilibrium, Genetic; Allele Frequencies; Frequencies, Allele; Frequencies, Gene; Frequency, Allele; Frequency, Gene; Gene Frequencies
D005788 | Gene Pool | The total genetic information possessed by the reproductive members of a POPULATION of sexually reproducing organisms. | Gene Pools; Pool, Gene; Pools, Gene
D005819 | Genetic Markers | A phenotypically recognizable genetic trait which can be used to identify a genetic locus, a linkage group, or a recombination event. | Chromosome Markers; DNA Markers; Markers, DNA; Markers, Genetic; Genetic Marker; Marker, Genetic; Chromosome Marker; DNA Marker; Marker, Chromosome; Marker, DNA; Markers, Chromosome
D006239 | Haplotypes | The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. | Haplotype
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D055106 | Genome-Wide Association Study | An analysis comparing the allele frequencies of all available (or a whole GENOME representative set of) polymorphic markers to identify gene candidates or quantitative trait loci associated with a specific organism trait or specific disease or condition. | Genome Wide Association Analysis; Genome Wide Association Study; GWA Study; Genome Wide Association Scan; Genome Wide Association Studies; Whole Genome Association Analysis; Whole Genome Association Study; Association Studies, Genome-Wide; Association Study, Genome-Wide; GWA Studies; Genome-Wide Association Studies; Studies, GWA; Studies, Genome-Wide Association; Study, GWA; Study, Genome-Wide Association
D030541 | Databases, Genetic | Databases devoted to knowledge about specific genes and gene products. | Genetic Databases; Genetic Sequence Databases; OMIM; Online Mendelian Inheritance In Man; Genetic Data Banks; Genetic Data Bases; Genetic Databanks; Genetic Information Databases; Bank, Genetic Data; Banks, Genetic Data; Data Bank, Genetic; Data Banks, Genetic; Data Base, Genetic; Data Bases, Genetic; Databank, Genetic; Databanks, Genetic; Database, Genetic; Database, Genetic Information; Database, Genetic Sequence; Databases, Genetic Information; Databases, Genetic Sequence; Genetic Data Bank; Genetic Data Base; Genetic Databank; Genetic Database; Genetic Information Database; Genetic Sequence Database; Information Database, Genetic; Information Databases, Genetic; Sequence Database, Genetic; Sequence Databases, Genetic
