Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses. 2022

Chuanyi Zhang, Palash Sashittal, Michael Xiang, Yichi Zhang, Ayesha Kazi, and Mohammed El-Kebir
Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as at the 5' end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. To solve this problem, we introduce CORSID, which includes a web-based visualization tool for exploring the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods on coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.

UI | MeSH Term | Description | Entries
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D012333 | RNA, Messenger | RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm. | Messenger RNA; Messenger RNA, Polyadenylated; Poly(A) Tail; Poly(A)+ RNA; Poly(A)+ mRNA; RNA, Messenger, Polyadenylated; RNA, Polyadenylated; mRNA; mRNA, Non-Polyadenylated; mRNA, Polyadenylated; Non-Polyadenylated mRNA; Poly(A) RNA; Polyadenylated mRNA; Non Polyadenylated mRNA; Polyadenylated Messenger RNA; Polyadenylated RNA; RNA, Polyadenylated Messenger; mRNA, Non Polyadenylated
D012367 | RNA, Viral | Ribonucleic acid that makes up the genetic material of viruses. | Viral RNA
D014158 | Transcription, Genetic | The biosynthesis of RNA carried out on a template of DNA. The biosynthesis of DNA from an RNA template is called REVERSE TRANSCRIPTION. | Genetic Transcription
D017934 | Coronavirus | A member of CORONAVIRIDAE which causes respiratory or gastrointestinal disease in a variety of vertebrates. | Coronavirus, Rabbit; Coronaviruses; Rabbit Coronavirus; Coronaviruses, Rabbit; Rabbit Coronaviruses
D018352 | Coronavirus Infections | Virus diseases caused by the CORONAVIRUS genus. Some specifics include transmissible enteritis of turkeys (ENTERITIS, TRANSMISSIBLE, OF TURKEYS); FELINE INFECTIOUS PERITONITIS; and transmissible gastroenteritis of swine (GASTROENTERITIS, TRANSMISSIBLE, OF SWINE). | Infections, Coronavirus; MERS (Middle East Respiratory Syndrome); Middle East Respiratory Syndrome; Coronavirus Infection; Infection, Coronavirus
