Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data (2014)

Chen Suo, Stefano Calza, Agus Salim, and Yudi Pawitan
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Department of Molecular and Translational Medicine, University of Brescia, Italy; and Department of Mathematics and Statistics, La Trobe University, Australia.

BACKGROUND RNA-sequencing technologies provide a powerful tool for expression analysis at the gene and isoform level, but accurate estimation of isoform abundance remains a challenge. The standard assumption of uniform read intensity yields biased estimates when the read intensity is in fact non-uniform. The difficulty is that, without strong assumptions, the read intensity pattern is not identifiable from the data of a single sample.

RESULTS We develop a joint statistical model that estimates gene isoform expression while accounting for non-uniform, isoform-specific read distributions. The main challenge is the large number of isoform-specific read distributions, potentially as many as there are splice variants in the genome. A statistical regularization via a smoothing penalty is imposed to control the estimation, and, for identifiability, the method pools information across samples for the same region. We develop a fast and robust computational procedure based on the iteratively weighted least-squares algorithm, and apply it to simulated data and to two real RNA-Seq datasets with reverse transcription-polymerase chain reaction validation. Empirical tests show that our model estimates isoform-level expression more precisely than existing methods.

METHODS We have implemented our method in an R package called Sequgio as a pipeline for fast processing of RNA-Seq data.
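To make the combination of a smoothing penalty and iteratively weighted least squares concrete, here is a minimal Python/NumPy sketch. It is not the Sequgio implementation and covers only a simplified case: a single-isoform region whose read counts y_ij (sample i, position bin j) are assumed to be Poisson with mean N_i exp(f_j), where N_i is the sample's total count in the region and f is a log read-intensity profile shared by all samples. The profile is penalized by a second-order difference penalty of weight `lam`. The function names, the binning, and the penalty form are illustrative assumptions.

```python
import numpy as np


def second_diff_matrix(n_bins: int) -> np.ndarray:
    """Return the (n_bins - 2) x n_bins second-order difference operator D."""
    return np.diff(np.eye(n_bins), n=2, axis=0)


def fit_read_profile(counts, lam: float = 10.0, max_iter: int = 100,
                     tol: float = 1e-8) -> np.ndarray:
    """Estimate a smooth positional read-intensity profile shared by all samples.

    Assumed model (illustrative, not Sequgio's):
        counts[i, j] ~ Poisson(N_i * exp(f_j)),
    where N_i is sample i's total count in the region (a fixed offset)
    and f is penalized by lam * ||D f||^2 / 2.
    Returns exp(f) normalized to sum to 1.
    """
    y = np.asarray(counts, dtype=float)
    _, n_bins = y.shape
    totals = y.sum(axis=1)
    if np.any(totals <= 0):
        raise ValueError("every sample needs at least one read in the region")
    offset = np.log(totals)
    D = second_diff_matrix(n_bins)
    penalty = lam * D.T @ D

    # Start from the pooled empirical profile (0.5 avoids log(0)).
    f = np.log((y.sum(axis=0) + 0.5) / totals.sum())
    for _ in range(max_iter):
        mu = np.exp(offset[:, None] + f[None, :])  # fitted means, samples x bins
        w = mu.sum(axis=0)                          # IWLS weights summed over samples
        # Weighted working response summed over samples: sum_i mu_ij * z_ij,
        # where z_ij = f_j + (y_ij - mu_ij) / mu_ij under the log link.
        rhs = w * f + (y - mu).sum(axis=0)
        # Penalized weighted least-squares step (a Newton step for this model).
        f_new = np.linalg.solve(np.diag(w) + penalty, rhs)
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break

    profile = np.exp(f)
    return profile / profile.sum()


if __name__ == "__main__":
    # Simulated example: intensity rising toward one end of the region.
    rng = np.random.default_rng(0)
    true_profile = np.exp(np.linspace(0.0, 1.5, 20))
    true_profile /= true_profile.sum()
    depths = np.array([500, 800, 1000, 1200, 2000])
    counts = rng.poisson(depths[:, None] * true_profile)
    print(np.round(fit_read_profile(counts, lam=10.0), 3))
```

Fitting all samples at once mirrors the idea of pooling information across samples, and a larger `lam` pulls the log profile toward a straight line. The joint model described above additionally estimates the abundances of several overlapping isoforms together with their read distributions, which this sketch omits.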

MeSH terms: Liver (D008099); Brain (D001921); Computer Simulation (D003198); Humans (D006801); Algorithms (D000465); Animals (D000818); RNA, Messenger (D012333); Models, Statistical (D015233); Alternative Splicing (D017398); Sequence Analysis, RNA (D017423)
