An information-theoretic approach to single cell sequencing analysis. 2023

Michael J Casey, Jörg Fliege, Rubén J Sánchez-García, and Ben D MacArthur
Mathematical Sciences, University of Southampton, Southampton, UK.

BACKGROUND: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information.

RESULTS: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into the component attributable to differential expression between cell types (inter-cluster heterogeneity) and the remainder (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types.

CONCLUSIONS: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster components, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.
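The abstract does not state the formula behind the heterogeneity measure, so the following is only a minimal sketch under a stated assumption: that the heterogeneity of a gene is the Kullback-Leibler divergence of its transcript distribution across cells from the uniform distribution. Under that assumption, the split into inter- and intra-cluster parts follows from the standard grouping property of the KL divergence. The function names `heterogeneity` and `decompose` are hypothetical and are not taken from the authors' R package.

```python
import math


def heterogeneity(counts):
    """KL divergence (in nats) from uniform of one gene's transcript
    distribution across cells. ASSUMPTION: this stands in for the
    paper's heterogeneity measure, which the abstract does not spell out."""
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0
    # Cells with zero counts contribute nothing (0 * log 0 is taken as 0).
    return sum((x / total) * math.log(n * x / total) for x in counts if x > 0)


def decompose(counts, labels):
    """Split the total heterogeneity into (inter_cluster, intra_cluster)
    using the grouping property of the KL divergence."""
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0, 0.0
    clusters = {}
    for x, label in zip(counts, labels):
        clusters.setdefault(label, []).append(x)
    inter = 0.0
    intra = 0.0
    for members in clusters.values():
        mass = sum(members) / total  # fraction of transcripts in this cluster
        size = len(members) / n      # fraction of cells in this cluster
        if mass > 0:
            inter += mass * math.log(mass / size)
            intra += mass * heterogeneity(members)
    return inter, intra


# Toy data: one gene measured in eight cells, with two hypothetical clusters.
counts = [5, 3, 4, 4, 0, 1, 0, 1]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
inter, intra = decompose(counts, labels)
print(heterogeneity(counts), inter + intra)  # the two values agree up to rounding
```

In this sketch, a gene expressed identically in every cell of each cluster has zero intra-cluster heterogeneity, which matches the abstract's description of cell types as groups of cells that are statistically equivalent in their expression patterns.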

UI | MeSH Term | Description | Entries
D000081246 | RNA-Seq | High-throughput nucleotide sequencing techniques developed for determining and analyzing the composition of the TRANSCRIPTOME of a sample. | Whole Transcriptome Shotgun Sequencing
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering; Analyses, Cluster; Analysis, Cluster; Cluster Analyses; Clusterings
D017423 | Sequence Analysis, RNA | A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE. | RNA Sequence Analysis; Sequence Determination, RNA; Analysis, RNA Sequence; Determination, RNA Sequence; Determinations, RNA Sequence; RNA Sequence Determination; RNA Sequence Determinations; RNA Sequencing; Sequence Determinations, RNA; Analyses, RNA Sequence; RNA Sequence Analyses; Sequence Analyses, RNA; Sequencing, RNA
D059010 | Single-Cell Analysis | Assaying the products of or monitoring various biochemical processes and reactions in an individual cell. | Analyses, Single-Cell; Analysis, Single-Cell; Single Cell Analysis; Single-Cell Analyses
D020869 | Gene Expression Profiling | The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell. | Gene Expression Analysis; Gene Expression Pattern Analysis; Transcript Expression Analysis; Transcriptome Profiling; Transcriptomics; mRNA Differential Display; Gene Expression Monitoring; Transcriptome Analysis; Analyses, Gene Expression; Analyses, Transcript Expression; Analyses, Transcriptome; Analysis, Gene Expression; Analysis, Transcript Expression; Analysis, Transcriptome; Differential Display, mRNA; Differential Displays, mRNA; Expression Analyses, Gene; Expression Analysis, Gene; Gene Expression Analyses; Gene Expression Monitorings; Gene Expression Profilings; Monitoring, Gene Expression; Monitorings, Gene Expression; Profiling, Gene Expression; Profiling, Transcriptome; Profilings, Gene Expression; Profilings, Transcriptome; Transcript Expression Analyses; Transcriptome Analyses; Transcriptome Profilings; mRNA Differential Displays
