PhaGenus: genus-level classification of bacteriophages using a Transformer model. 2023

Jiaojiao Guan, Cheng Peng, Jiayu Shang, Xubo Tang, and Yanni Sun
Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong (SAR), China.

Bacteriophages (phages for short), which prey on and replicate within bacterial cells, play a significant role in modulating microbial communities and hold potential for combating antibiotic resistance. Advances in high-throughput sequencing technology have greatly accelerated the discovery of phages. However, the taxonomic classification of assembled phage contigs still faces several challenges, including high genetic diversity, the lack of a stable taxonomy system, and limited knowledge of phage annotations. Despite extensive efforts, existing tools have not yet achieved an optimal balance between prediction rate and accuracy. In this work, we develop a learning-based model named PhaGenus, which conducts genus-level taxonomic classification for phage contigs. PhaGenus uses a Transformer model to learn the associations between protein clusters and supports the classification of up to 508 genera. We tested PhaGenus on four datasets in different scenarios. The experimental results show that PhaGenus outperforms state-of-the-art methods on low-similarity datasets, achieving an improvement of at least 13.7%. Additionally, PhaGenus is highly effective at identifying previously uncharacterized genera that are not represented in reference databases, with an improvement of 8.52%. Analysis of the infant gut and GOV2.0 datasets demonstrates that PhaGenus can classify more contigs with higher accuracy.
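The trade-off between prediction rate and accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that prediction rate is the fraction of contigs that receive a genus label and that accuracy is the fraction of labeled contigs whose label is correct; the abstract does not give the exact definitions. The function name `rate_and_accuracy`, the confidence threshold, and the toy genus labels are hypothetical and are not taken from PhaGenus.

```python
from typing import Optional, Sequence, Tuple


def rate_and_accuracy(
    predictions: Sequence[Tuple[str, float]],
    truths: Sequence[str],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Return (prediction_rate, accuracy) at a confidence threshold.

    predictions holds one (predicted_genus, confidence) pair per contig.
    Both metric definitions are assumptions made for this sketch.
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one contig is required")

    # Keep only contigs whose confidence reaches the threshold;
    # the rest are treated as "not predicted".
    kept = [
        (genus, truth)
        for (genus, score), truth in zip(predictions, truths)
        if score >= threshold
    ]
    prediction_rate = len(kept) / len(truths)
    if not kept:
        # Accuracy is undefined when nothing is predicted.
        return prediction_rate, None

    correct = sum(1 for genus, truth in kept if genus == truth)
    return prediction_rate, correct / len(kept)


# Made-up toy data: four contigs with hypothetical genus labels.
preds = [("genus_A", 0.95), ("genus_B", 0.60), ("genus_A", 0.40), ("genus_C", 0.85)]
truth = ["genus_A", "genus_C", "genus_A", "genus_C"]

for t in (0.0, 0.5, 0.9):
    print(t, rate_and_accuracy(preds, truth, t))
```

On this toy data, raising the threshold labels fewer contigs but makes the remaining labels more reliable, which is the balance a classifier has to strike.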

UI | MeSH Term | Description | Entries
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D001435 | Bacteriophages | Viruses whose hosts are bacterial cells. | Phages; Bacteriophage; Phage
D059014 | High-Throughput Nucleotide Sequencing | Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc. | High-Throughput Sequencing; Illumina Sequencing; Ion Proton Sequencing; Ion Torrent Sequencing; Next-Generation Sequencing; Deep Sequencing; High-Throughput DNA Sequencing; High-Throughput RNA Sequencing; Massively-Parallel Sequencing; Pyrosequencing; DNA Sequencing, High-Throughput; High Throughput DNA Sequencing; High Throughput Nucleotide Sequencing; High Throughput RNA Sequencing; High Throughput Sequencing; Massively Parallel Sequencing; Next Generation Sequencing; Nucleotide Sequencing, High-Throughput; RNA Sequencing, High-Throughput; Sequencing, Deep; Sequencing, High-Throughput; Sequencing, High-Throughput DNA; Sequencing, High-Throughput Nucleotide; Sequencing, High-Throughput RNA; Sequencing, Illumina; Sequencing, Ion Proton; Sequencing, Ion Torrent; Sequencing, Massively-Parallel; Sequencing, Next-Generation
D064307 | Microbiota | The full collection of microbes (bacteria, fungi, virus, etc.) that naturally exist within a particular biological niche such as an organism, soil, a body of water, etc. | Human Microbiome; Microbiome; Microbiome, Human; Microbial Community; Microbial Community Composition; Microbial Community Structure; Community Composition, Microbial; Community Structure, Microbial; Community, Microbial; Composition, Microbial Community; Human Microbiomes; Microbial Communities; Microbial Community Compositions; Microbial Community Structures; Microbiomes; Microbiotas
