Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures. 2008

Xin Li, and Jing Li
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.

We study the haplotype inference problem from pedigree data under the zero recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem in a rigorous mathematical manner by formulating genotype constraints as a linear system of inheritance variables. We then utilize disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check consistency of constraints. On a tree pedigree without missing data, our algorithm can output a general solution as well as the number of total specific solutions in a nearly linear time O (mn x alpha(n)), where m is the number of loci, n is the number of individuals and alpha is the inverse Ackermann function, which is a further improvement over existing ones. We also extend the idea to looped pedigrees and pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it can correctly identify all 0-recombinant solutions with great efficiency. Comparisons with other two popular algorithms show that the proposed algorithm achieves 10 to 10(5)-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidences on the complexity bounds suggested by theoretical analysis.

UI MeSH Term Description Entries
D008957 Models, Genetic Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. Genetic Models,Genetic Model,Model, Genetic
D008969 Molecular Sequence Data Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. Sequence Data, Molecular,Molecular Sequencing Data,Data, Molecular Sequence,Data, Molecular Sequencing,Sequencing Data, Molecular
D010375 Pedigree The record of descent or ancestry, particularly of a particular condition or trait, indicating individual family members, their relationships, and their status with respect to the trait or condition. Family Tree,Genealogical Tree,Genealogic Tree,Genetic Identity,Identity, Genetic,Family Trees,Genealogic Trees,Genealogical Trees,Genetic Identities,Identities, Genetic,Tree, Family,Tree, Genealogic,Tree, Genealogical,Trees, Family,Trees, Genealogic,Trees, Genealogical
D002874 Chromosome Mapping Any method used for determining the location of and relative distances between genes on a chromosome. Gene Mapping,Linkage Mapping,Genome Mapping,Chromosome Mappings,Gene Mappings,Genome Mappings,Linkage Mappings,Mapping, Chromosome,Mapping, Gene,Mapping, Genome,Mapping, Linkage,Mappings, Chromosome,Mappings, Gene,Mappings, Genome,Mappings, Linkage
D003198 Computer Simulation Computer-based representation of physical systems and phenomena such as chemical processes. Computational Modeling,Computational Modelling,Computer Models,In silico Modeling,In silico Models,In silico Simulation,Models, Computer,Computerized Models,Computer Model,Computer Simulations,Computerized Model,In silico Model,Model, Computer,Model, Computerized,Model, In silico,Modeling, Computational,Modeling, In silico,Modelling, Computational,Simulation, Computer,Simulation, In silico,Simulations, Computer
D006239 Haplotypes The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. Haplotype
D001483 Base Sequence The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D016014 Linear Models Statistical models in which the value of a parameter for a given value of a factor is assumed to be equal to a + bx, where a and b are constants. The models predict a linear regression. Linear Regression,Log-Linear Models,Models, Linear,Linear Model,Linear Regressions,Log Linear Models,Log-Linear Model,Model, Linear,Model, Log-Linear,Models, Log-Linear,Regression, Linear,Regressions, Linear
D016247 Information Storage and Retrieval Organized activities related to the storage, location, search, and retrieval of information. Information Retrieval,Data Files,Data Linkage,Data Retrieval,Data Storage,Data Storage and Retrieval,Information Extraction,Information Storage,Machine-Readable Data Files,Data File,Data File, Machine-Readable,Data Files, Machine-Readable,Extraction, Information,Files, Machine-Readable Data,Information Extractions,Machine Readable Data Files,Machine-Readable Data File,Retrieval, Data,Storage, Data
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA

Related Publications

Xin Li, and Jing Li
March 2010, Journal of computational biology : a journal of computational molecular cell biology,
Xin Li, and Jing Li
January 2012, IEEE/ACM transactions on computational biology and bioinformatics,
Xin Li, and Jing Li
August 2004, Genome research,
Xin Li, and Jing Li
May 2011, BMC proceedings,
Xin Li, and Jing Li
January 2019, Journal of biopharmaceutical statistics,
Xin Li, and Jing Li
June 2010, Bioinformatics (Oxford, England),
Xin Li, and Jing Li
January 2012, PLoS genetics,
Copied contents to your clipboard!