SigHunt: horizontal gene transfer finder optimized for eukaryotic genomes. 2014

Kamil S Jaron, Jiří C Moravec, and Natália Martínková
Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic.

Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct use of these methods unreliable, so labour-intensive phylogenetic searches are used instead. We present a surrogate method that investigates the nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of the 4-mer frequency. We score the number of 4-mer frequencies in the sliding window that deviate from the credibility interval of their local genomic density using a newly developed discrete interval accumulative score (DIAS). To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition.

Availability: Source code and scripts, implemented in C and R and platform-independent, are freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt.

Contact: 376090@mail.muni.cz or martinkova@ivb.cz.
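
The signature-and-scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the SigHunt implementation (which is in C and R), and several things here are assumptions made for illustration: the function names `window_signature`, `sliding_signatures` and `outlier_counts` are hypothetical, the local interval is taken from empirical quantiles of neighbouring windows rather than from a kernel density estimate, and the score is a plain count of out-of-interval 4-mers, which only approximates the spirit of DIAS. Selection of informative 4-mers is not modelled.

```python
from itertools import product

# All 256 tetranucleotides over the DNA alphabet, with a lookup index.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(KMERS)}


def window_signature(seq):
    """Return the 256-element vector of 4-mer relative frequencies in seq."""
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip 4-mers with N or other ambiguity codes
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(KMERS)


def sliding_signatures(seq, window, step):
    """Signatures of windows of length `window`, shifted by `step` bases."""
    seq = seq.upper()
    return [window_signature(seq[s:s + window])
            for s in range(0, len(seq) - window + 1, step)]


def outlier_counts(signatures, flank, lower=0.025, upper=0.975):
    """Simplified stand-in for DIAS (assumption, not the published score):
    for each window, count the 4-mers whose frequency falls outside the
    empirical [lower, upper] quantile range of up to `flank` neighbouring
    windows on each side."""
    scores = []
    for w, sig in enumerate(signatures):
        start, stop = max(0, w - flank), min(len(signatures), w + flank + 1)
        neighbours = [signatures[j] for j in range(start, stop) if j != w]
        score = 0
        if neighbours:
            for k in range(len(KMERS)):
                values = sorted(n[k] for n in neighbours)
                low = values[int(lower * (len(values) - 1))]
                high = values[int(upper * (len(values) - 1))]
                if sig[k] < low or sig[k] > high:
                    score += 1
        scores.append(score)
    return scores


# Toy example: a uniform "host" sequence with a compositionally distinct insert.
toy = "ACGT" * 125 + "GGCC" * 25 + "ACGT" * 125
scores = outlier_counts(sliding_signatures(toy, window=100, step=100), flank=3)
print(scores.index(max(scores)))  # prints 5, the window holding the insert
```

In this toy input the inserted block is the only window whose 4-mer frequencies leave the range seen in its neighbours, so it receives the highest score. Real genomes are far noisier, which is why the abstract describes a kernel density estimate and a selection of informative 4-mers rather than raw quantiles over all 256.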

UI MeSH Term Description Entries
D010802 Phylogeny The relationships of groups of organisms as reflected by their genetic makeup. Community Phylogenetics,Molecular Phylogenetics,Phylogenetic Analyses,Phylogenetic Analysis,Phylogenetic Clustering,Phylogenetic Comparative Analysis,Phylogenetic Comparative Methods,Phylogenetic Distance,Phylogenetic Generalized Least Squares,Phylogenetic Groups,Phylogenetic Incongruence,Phylogenetic Inference,Phylogenetic Networks,Phylogenetic Reconstruction,Phylogenetic Relatedness,Phylogenetic Relationships,Phylogenetic Signal,Phylogenetic Structure,Phylogenetic Tree,Phylogenetic Trees,Phylogenomics,Analyse, Phylogenetic,Analysis, Phylogenetic,Analysis, Phylogenetic Comparative,Clustering, Phylogenetic,Community Phylogenetic,Comparative Analysis, Phylogenetic,Comparative Method, Phylogenetic,Distance, Phylogenetic,Group, Phylogenetic,Incongruence, Phylogenetic,Inference, Phylogenetic,Method, Phylogenetic Comparative,Molecular Phylogenetic,Network, Phylogenetic,Phylogenetic Analyse,Phylogenetic Clusterings,Phylogenetic Comparative Analyses,Phylogenetic Comparative Method,Phylogenetic Distances,Phylogenetic Group,Phylogenetic Incongruences,Phylogenetic Inferences,Phylogenetic Network,Phylogenetic Reconstructions,Phylogenetic Relatednesses,Phylogenetic Relationship,Phylogenetic Signals,Phylogenetic Structures,Phylogenetic, Community,Phylogenetic, Molecular,Phylogenies,Phylogenomic,Reconstruction, Phylogenetic,Relatedness, Phylogenetic,Relationship, Phylogenetic,Signal, Phylogenetic,Structure, Phylogenetic,Tree, Phylogenetic
D001482 Base Composition The relative amounts of the PURINES and PYRIMIDINES in a nucleic acid. Base Ratio,G+C Composition,Guanine + Cytosine Composition,G+C Content,GC Composition,GC Content,Guanine + Cytosine Content,Base Compositions,Base Ratios,Composition, Base,Composition, G+C,Composition, GC,Compositions, Base,Compositions, G+C,Compositions, GC,Content, G+C,Content, GC,Contents, G+C,Contents, GC,G+C Compositions,G+C Contents,GC Compositions,GC Contents,Ratio, Base,Ratios, Base
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D044404 Genomic Islands Distinct units in some bacterial, bacteriophage or plasmid GENOMES that are types of MOBILE GENETIC ELEMENTS. Encoded in them are a variety of fitness conferring genes, such as VIRULENCE FACTORS (in "pathogenicity islands or islets"), ANTIBIOTIC RESISTANCE genes, or genes required for SYMBIOSIS (in "symbiosis islands or islets"). They range in size from 10 - 500 kilobases, and their GC CONTENT and CODON usage differ from the rest of the genome. They typically contain an INTEGRASE gene, although in some cases this gene has been deleted resulting in "anchored genomic islands". Pathogenicity Islands,Anchored Genomic Islands,Genomic Islets,Pathogenicity Islets,Symbiosis Islands,Symbiosis Islets,Anchored Genomic Island,Genomic Island,Genomic Island, Anchored,Genomic Islands, Anchored,Genomic Islet,Island, Anchored Genomic,Island, Genomic,Island, Pathogenicity,Island, Symbiosis,Islands, Anchored Genomic,Islands, Genomic,Islands, Pathogenicity,Islands, Symbiosis,Islet, Genomic,Islet, Pathogenicity,Islet, Symbiosis,Islets, Genomic,Islets, Pathogenicity,Islets, Symbiosis,Pathogenicity Island,Pathogenicity Islet,Symbiosis Island,Symbiosis Islet
D056890 Eukaryota One of the three domains of life (the others being BACTERIA and ARCHAEA), also called Eukarya. These are organisms whose cells are enclosed in membranes and possess a nucleus. They comprise almost all multicellular and many unicellular organisms, and are traditionally divided into groups (sometimes called kingdoms) including ANIMALS; PLANTS; FUNGI; and various algae and other taxa that were previously part of the old kingdom Protista. Eukaryotes,Eucarya,Eukarya,Eukaryotas,Eukaryote
D019295 Computational Biology A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
D022761 Gene Transfer, Horizontal The naturally occurring transmission of genetic information between organisms, related or unrelated, circumventing parent-to-offspring transmission. Horizontal gene transfer may occur via a variety of naturally occurring processes such as GENETIC CONJUGATION; GENETIC TRANSDUCTION; and TRANSFECTION. It may result in a change of the recipient organism's genetic composition (TRANSFORMATION, GENETIC). Gene Transfer, Lateral,Horizontal Gene Transfer,Lateral Gene Transfer,Recombination, Interspecies,Recombination, Interspecific,Gene Transfers, Lateral,Interspecies Recombination,Interspecific Recombination,Lateral Gene Transfers
D023281 Genomics The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural
