SALAI-Net: species-agnostic local ancestry inference network (2022)

Benet Oriol Sabat, Daniel Mas Montserrat, Xavier Giro-I-Nieto, and Alexander G Ioannidis
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona 08034, Spain.

Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications, including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes, or even ancestry groups, and must be retrained for each new setting. Such methods can also lack interpretability, which is important in each of these applications. We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (it is species-agnostic) and requires only haplotype data, with no other biological parameters. Inspired by identity-by-descent (IBD) methods, SALAI-Net estimates population labels for each segment of DNA using a reference-matching approach, which makes the technique interpretable and fast. We benchmark our models on human whole-genome data and test their ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in balanced accuracy while generalizing across settings, species, and datasets. It is also up to two orders of magnitude faster and uses considerably less RAM than competing methods.

We provide an open-source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. The data are publicly available from: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
Supplementary data are available from Bioinformatics online.
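The reference-matching idea described in the abstract can be illustrated with a minimal sketch. It is not the SALAI-Net implementation: it assumes biallelic haplotypes coded 0/1, fixed-size non-overlapping SNP windows, a per-window similarity equal to the mean of the ±1-encoded product of query and reference, a per-population max over reference haplotypes, and an optional moving-average smoother. All function names and parameters (`window`, `radius`) and the toy population labels are hypothetical.

```python
from typing import Dict, List, Sequence


def to_signed(haplotype: Sequence[int]) -> List[int]:
    """Map biallelic alleles coded 0/1 to -1/+1 so that a product is +1 on a match."""
    return [1 if allele == 1 else -1 for allele in haplotype]


def window_similarity(
    query: Sequence[int],
    references: Sequence[Sequence[int]],
    labels: Sequence[str],
    window: int,
) -> List[Dict[str, float]]:
    """For each non-overlapping SNP window, return the best match score per population.

    One reference's score in one window is the mean of query*reference over the
    window's SNPs (+1.0 = identical, -1.0 = every allele differs). Keeping only the
    best-matching reference per population loosely mimics searching for a shared
    (IBD-like) segment instead of averaging over the whole panel.
    """
    q = to_signed(query)
    refs = [to_signed(r) for r in references]
    scores: List[Dict[str, float]] = []
    for start in range(0, len(q), window):
        end = min(start + window, len(q))
        best: Dict[str, float] = {}
        for ref, pop in zip(refs, labels):
            s = sum(q[i] * ref[i] for i in range(start, end)) / (end - start)
            best[pop] = max(s, best.get(pop, -1.0))
        scores.append(best)
    return scores


def smooth(scores: List[Dict[str, float]], radius: int) -> List[Dict[str, float]]:
    """Average each window's population scores with up to `radius` neighbours per side."""
    if not scores:
        return []
    pops = sorted(scores[0])
    out: List[Dict[str, float]] = []
    for w in range(len(scores)):
        lo, hi = max(0, w - radius), min(len(scores), w + radius + 1)
        out.append({p: sum(scores[j][p] for j in range(lo, hi)) / (hi - lo) for p in pops})
    return out


def infer_local_ancestry(query, references, labels, window=4, radius=0) -> List[str]:
    """Assign each window the population with the highest (optionally smoothed) score."""
    scores = smooth(window_similarity(query, references, labels, window), radius)
    # Iterating over sorted keys breaks ties alphabetically, for determinism.
    return [max(sorted(s), key=s.get) for s in scores]


refs = [[0] * 12, [1] * 12]  # toy panel: one haplotype per hypothetical population
labels = ["pop_A", "pop_B"]
query = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0]
print(infer_local_ancestry(query, refs, labels, window=4, radius=0))  # ['pop_A', 'pop_B', 'pop_A']
print(infer_local_ancestry(query, refs, labels, window=4, radius=1))  # ['pop_A', 'pop_A', 'pop_A']
```

In this toy run the middle window of the query matches pop_B better, so the unsmoothed call labels it pop_B; with `radius=1` the smoother averages it with its pop_A neighbours and relabels it pop_A, which is the kind of short, isolated switch a smoothing step is meant to suppress. Taking the best single reference per population, rather than an average over the panel, is what ties the score to matching a specific shared segment.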

UI | MeSH Term | Description | Entry Terms
D004285 | Dogs | The domestic dog, Canis familiaris, comprising about 400 breeds, of the carnivore family CANIDAE. They are worldwide in distribution and live in association with people. (Walker's Mammals of the World, 5th ed, p1065) | Canis familiaris; Dog
D006239 | Haplotypes | The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. | Haplotype
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal; Metazoa; Animalia
D055106 | Genome-Wide Association Study | An analysis comparing the allele frequencies of all available (or a whole GENOME representative set of) polymorphic markers to identify gene candidates or quantitative trait loci associated with a specific organism trait or specific disease or condition. | Genome Wide Association Analysis; Genome Wide Association Study; GWA Study; Genome Wide Association Scan; Genome Wide Association Studies; Whole Genome Association Analysis; Whole Genome Association Study; Association Studies, Genome-Wide; Association Study, Genome-Wide; GWA Studies; Genome-Wide Association Studies; Studies, GWA; Studies, Genome-Wide Association; Study, GWA; Study, Genome-Wide Association