digIS: towards detecting distant and putative novel insertion sequence elements in prokaryotic genomes. 2021

Janka Puterová and Tomáš Martínek
IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, Bozetechova 2, 612 66, Brno, Czechia.

BACKGROUND: Insertion sequence elements (IS elements) are the smallest and most abundant mobile elements in prokaryotic genomes. They have been shown to play a significant role in genome organization and evolution. To better understand their function in the host genome, it is desirable to have an effective detection and annotation tool. This need becomes even more pressing given the rapidly growing volume of genomic and metagenomic data. The existing tools for IS element detection and annotation are usually based on sequence similarity comparison against a database of known IS families. Thus, they have limited ability to discover distant and putative novel IS elements.

RESULTS: In this paper, we present digIS, a software tool based on profile hidden Markov models assembled from catalytic domains of transposases. It shows very good performance in detecting known IS elements when tested on datasets with manually curated annotation. The main contribution of digIS is its ability to detect distant and putative novel IS elements while maintaining a moderate level of false positives. In this category it outperforms existing tools, especially when tested on large datasets of archaeal and bacterial genomes.

CONCLUSIONS: We provide digIS, a software tool using a novel approach based on manually curated profile hidden Markov models, which is able to detect distant and putative novel IS elements. Although digIS can find known IS elements as well, we expect it to be used primarily by scientists interested in finding novel IS elements. The tool is available at https://github.com/janka2012/digIS.
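To give a feel for why profile-based search can reach more distant sequences than a comparison against a single known sequence, here is a minimal Python sketch. It is not digIS code and does not use the digIS models: it is a deliberately simplified, match-state-only profile (a position-specific log-odds matrix), whereas a real profile hidden Markov model also has insert and delete states with transition probabilities. The toy alignment, the uniform background, the pseudocount, and the function names `build_profile` and `best_hit` are all assumptions made for illustration.

```python
import math

# Hypothetical toy alignment of a short conserved motif (NOT real transposase data).
ALIGNMENT = ["DDE", "DDE", "DEE", "DDE"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # uniform background frequency, an assumption


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column (match states only)."""
    profile = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for residue in ALPHABET:
            prob = (column.count(residue) + pseudocount) / total
            scores[residue] = math.log2(prob / BACKGROUND)
        profile.append(scores)
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence and return (score, start) of the best window.

    Assumes the sequence contains only the 20 standard amino-acid letters.
    """
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[residue] for column, residue in zip(profile, window))
        best = max(best, (score, start))
    return best


profile = build_profile(ALIGNMENT)
score, start = best_hit(profile, "MKLADDEGAC")
print(f"best window starts at {start} with log-odds score {score:.2f}")
```

Because each column stores a score for every residue, a window that differs from all training sequences at a tolerant position can still score well, which is the intuition behind using profiles for distant homologs.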

UI | MeSH Term | Description | Entries
D011387 | Prokaryotic Cells | Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region. | Cell, Prokaryotic,Cells, Prokaryotic,Prokaryotic Cell
D004251 | DNA Transposable Elements | Discrete segments of DNA which can excise and reintegrate to another site in the genome. Most are inactive, i.e., have not been found to exist outside the integrated state. DNA transposable elements include bacterial IS (insertion sequence) elements, Tn elements, the maize controlling elements Ac and Ds, Drosophila P, gypsy, and pogo elements, the human Tigger elements and the Tc and mariner elements which are found throughout the animal kingdom. | DNA Insertion Elements,DNA Transposons,IS Elements,Insertion Sequence Elements,Tn Elements,Transposable Elements,Elements, Insertion Sequence,Sequence Elements, Insertion,DNA Insertion Element,DNA Transposable Element,DNA Transposon,Element, DNA Insertion,Element, DNA Transposable,Element, IS,Element, Insertion Sequence,Element, Tn,Element, Transposable,Elements, DNA Insertion,Elements, DNA Transposable,Elements, IS,Elements, Tn,Elements, Transposable,IS Element,Insertion Element, DNA,Insertion Elements, DNA,Insertion Sequence Element,Sequence Element, Insertion,Tn Element,Transposable Element,Transposable Element, DNA,Transposable Elements, DNA,Transposon, DNA,Transposons, DNA
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D016680 | Genome, Bacterial | The genetic complement of a BACTERIA as represented in its DNA. | Bacterial Genome,Bacterial Genomes,Genomes, Bacterial
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural
