High rates of phasing errors in highly polymorphic species with low levels of linkage disequilibrium. 2016

Marek Bukowicki, Susanne U Franssen, and Christian Schlötterer
Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Wien, Veterinärplatz 1, Austria.

Short read sequencing of diploid individuals does not permit the direct inference of the sequence on each of the two homologous chromosomes. Although various phasing software packages exist, they were primarily tailored for and tested on human data, which differ from other species in factors that influence phasing, such as SNP density, levels of linkage disequilibrium (LD) and sample sizes. Although phasing is becoming increasingly popular for other species, its reliability in non-human data has not been sufficiently evaluated. We scrutinized the phasing accuracy for Drosophila melanogaster, a species with high polymorphism levels and reduced LD relative to humans. We phased two D. melanogaster populations and compared the results to the known haplotypes. Performance increased with the size of the reference panel and was highest when the reference panel and the phased individuals were from the same population. Full genomic SNP data and inclusion of sequence read information also improved phasing. Although humans and Drosophila had similar switch error rates between polymorphic sites, the distances between switch errors were much shorter in Drosophila, with only fragments <300-1500 bp being correctly phased with ≥95% confidence. This suggests that the higher SNP density cannot compensate for the higher recombination rate in D. melanogaster. Furthermore, we show that populations that have gone through demographic events such as bottlenecks can be phased with higher accuracy. Our results highlight that statistically phased data are particularly error prone in species with large population sizes or populations lacking suitable reference panels.
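The two accuracy measures named in the abstract, the switch error rate between polymorphic sites and the distance between switch errors, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0/1 allele encoding, and the toy positions are assumptions made for illustration. It assumes one haplotype of a diploid individual is compared at heterozygous sites only, so the inferred haplotype either matches the true one or carries the complementary allele at each site.

```python
def switch_errors(true_hap, inferred_hap):
    """Return (switch_error_rate, switch_indices) for one haplotype.

    true_hap / inferred_hap: 0/1 alleles at heterozygous sites (hypothetical encoding).
    A switch error is counted whenever the inferred haplotype changes from
    matching the true haplotype to mismatching it (or back) between adjacent sites.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    switch_idx = [k for k in range(1, len(agree)) if agree[k] != agree[k - 1]]
    n_pairs = len(agree) - 1  # number of adjacent site pairs that could switch
    rate = len(switch_idx) / n_pairs if n_pairs > 0 else 0.0
    return rate, switch_idx


def fragment_lengths(positions, switch_idx):
    """Lengths (bp) of the correctly phased runs between consecutive switch errors."""
    bounds = [0] + list(switch_idx) + [len(positions)]
    return [
        positions[end - 1] - positions[start]
        for start, end in zip(bounds[:-1], bounds[1:])
        if end > start
    ]


# Toy data (made up): six heterozygous sites and their coordinates in bp.
true_hap = [0, 1, 1, 0, 1, 0]
inferred_hap = [0, 1, 0, 1, 0, 0]
positions = [100, 250, 400, 900, 1000, 1600]

rate, idx = switch_errors(true_hap, inferred_hap)
print(rate, idx, fragment_lengths(positions, idx))
```

Two data sets can share the same per-site switch error rate while differing in fragment length: when heterozygous sites are more densely spaced, the same number of switches per site pair leaves shorter correctly phased stretches in base pairs.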

UI MeSH Term Description Entries
D003951 Diagnostic Errors Incorrect or incomplete diagnoses following clinical or technical diagnostic procedures. Diagnostic Blind Spots,Errors, Diagnostic,Misdiagnosis,Blind Spot, Diagnostic,Blind Spots, Diagnostic,Diagnostic Blind Spot,Diagnostic Error,Error, Diagnostic,Misdiagnoses
D004331 Drosophila melanogaster A species of fruit fly frequently used in genetics because of the large size of its chromosomes. D. melanogaster,Drosophila melanogasters,melanogaster, Drosophila
D005828 Genetics, Population The discipline studying genetic composition of populations and effects of factors such as GENETIC SELECTION, population size, MUTATION, migration, and GENETIC DRIFT on the frequencies of various GENOTYPES and PHENOTYPES using a variety of GENETIC TECHNIQUES. Population Genetics
D006239 Haplotypes The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. Haplotype
D000818 Animals Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. Animal,Metazoa,Animalia
D014644 Genetic Variation Genotypic differences observed among individuals in a population. Genetic Diversity,Variation, Genetic,Diversity, Genetic,Diversities, Genetic,Genetic Diversities,Genetic Variations,Variations, Genetic
D015810 Linkage Disequilibrium Nonrandom association of linked genes. This is the tendency of the alleles of two separate but already linked loci to be found together more frequently than would be expected by chance alone. Disequilibrium, Linkage,Disequilibriums, Linkage,Linkage Disequilibriums
D056808 Biostatistics The application of STATISTICS to biological systems and organisms involving the retrieval or collection, analysis, reduction, and interpretation of qualitative and quantitative data. Biological Statistics,Biological Statistic,Statistic, Biological,Statistics, Biological
D060005 Genotyping Techniques Methods used to determine individuals' specific ALLELES or SNPS (single nucleotide polymorphisms). Genotype Assignment Methodology,Genotype Calling Methods,Genotype Determination Methods,Assignment Methodologies, Genotype,Assignment Methodology, Genotype,Calling Method, Genotype,Calling Methods, Genotype,Determination Method, Genotype,Determination Methods, Genotype,Genotype Assignment Methodologies,Genotype Calling Method,Genotype Determination Method,Genotyping Technique,Method, Genotype Calling,Method, Genotype Determination,Methodologies, Genotype Assignment,Methodology, Genotype Assignment,Methods, Genotype Calling,Methods, Genotype Determination,Technique, Genotyping,Techniques, Genotyping
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
