Statistical approach for selection of biologically informative genes. 2018

Samarendra Das, and Anil Rai, and D C Mishra, and Shesh N Rai
Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India; Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India. Electronic address: samarendra.das@louisville.edu.

Selection of informative genes from high-dimensional gene expression data has emerged as an important research area in genomics. Many gene selection techniques proposed so far are based on either a relevancy or a redundancy measure. Further, the performance of these techniques has been judged by post-selection classification accuracy, computed by a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. We propose a statistical approach, Boot-MRMR, based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficiency criteria: Gene Set Enrichment with QTL (GSEQ) and a biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique against 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes that are more biologically relevant. The proposed technique was also found to be competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that, under the multiple criteria decision-making setup, the proposed technique performed best for informative gene selection among the available alternatives. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR.
This study provides a practical guide for choosing statistical techniques to select informative genes from high-dimensional expression data in breeding and systems biology studies.
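
The abstract does not spell out how relevance, redundancy, and resampling are combined in Boot-MRMR. As a rough illustration of the general idea only, the Python sketch below scores each gene by a relevance measure divided by a redundancy measure and then counts how often a gene lands in the top ranks across bootstrap resamples. Everything specific here is an assumption for illustration: the F-statistic as relevance, mean absolute Pearson correlation as redundancy, the quotient as the composite, the stratified resampling, and the function names, none of which belong to the BootMRMR package API.

```python
import numpy as np


def f_statistic(X, y):
    """One-way ANOVA F-statistic for each gene (rows of X) across classes in y."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=1)
    ss_between = np.zeros(X.shape[0])
    ss_within = np.zeros(X.shape[0])
    for c in classes:
        Xc = X[:, y == c]
        class_mean = Xc.mean(axis=1)
        ss_between += Xc.shape[1] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean[:, None]) ** 2).sum(axis=1)
    df_between = len(classes) - 1
    df_within = X.shape[1] - len(classes)
    return (ss_between / df_between) / (ss_within / df_within + 1e-12)


def mrmr_scores(X, y):
    """Composite score: relevance / redundancy (an assumed combination rule)."""
    relevance = f_statistic(X, y)
    # Redundancy: mean absolute correlation of a gene with all other genes.
    corr = np.abs(np.nan_to_num(np.corrcoef(X)))
    np.fill_diagonal(corr, 0.0)
    redundancy = corr.sum(axis=1) / (X.shape[0] - 1)
    return relevance / (redundancy + 1e-12)


def bootstrap_selection_frequency(X, y, top_k=3, n_boot=50, seed=0):
    """Fraction of bootstrap replicates in which each gene ranks in the top_k."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[0])
    for _ in range(n_boot):
        # Stratified resampling keeps every class present in each replicate.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=np.sum(y == c), replace=True)
            for c in np.unique(y)
        ])
        scores = mrmr_scores(X[:, idx], y[idx])
        counts[np.argsort(scores)[::-1][:top_k]] += 1
    return counts / n_boot


# Synthetic example: 30 genes x 20 subjects, genes 0-2 shifted in class 1.
demo_rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = demo_rng.normal(size=(30, 20))
X[:3, y == 1] += 4.0
freq = bootstrap_selection_frequency(X, y, top_k=3)
top = np.argsort(freq)[::-1][:3]
```

Ranking genes by selection frequency rather than by a single score is what the resampling adds: a gene that scores well only in one particular sample of subjects is less likely to be retained.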

UI | MeSH Term | Description | Entries
D003627 | Data Interpretation, Statistical | Application of statistical procedures to analyze specific observed or assumed facts from a particular study. | Data Analysis, Statistical,Data Interpretations, Statistical,Interpretation, Statistical Data,Statistical Data Analysis,Statistical Data Interpretation,Analyses, Statistical Data,Analysis, Statistical Data,Data Analyses, Statistical,Interpretations, Statistical Data,Statistical Data Analyses,Statistical Data Interpretations
D005796 | Genes | A category of nucleic acid sequences that function as units of heredity and which code for the basic instructions for the development, reproduction, and maintenance of organisms. | Cistron,Gene,Genetic Materials,Cistrons,Genetic Material,Material, Genetic,Materials, Genetic
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D015894 | Genome, Human | The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs. | Human Genome,Genomes, Human,Human Genomes
D018401 | Sample Size | The number of units (persons, animals, patients, specified circumstances, etc.) in a population to be studied. The sample size should be big enough to have a high likelihood of detecting a true difference between two groups. (From Wassertheil-Smoller, Biostatistics and Epidemiology, 1990, p95) | Sample Sizes,Size, Sample,Sizes, Sample
D019295 | Computational Biology | A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets. | Bioinformatics,Molecular Biology, Computational,Bio-Informatics,Biology, Computational,Computational Molecular Biology,Bio Informatics,Bio-Informatic,Bioinformatic,Biologies, Computational Molecular,Biology, Computational Molecular,Computational Molecular Biologies,Molecular Biologies, Computational
D020411 | Oligonucleotide Array Sequence Analysis | Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING. | DNA Microarrays,Gene Expression Microarray Analysis,Oligonucleotide Arrays,cDNA Microarrays,DNA Arrays,DNA Chips,DNA Microchips,Gene Chips,Oligodeoxyribonucleotide Array Sequence Analysis,Oligonucleotide Microarrays,Sequence Analysis, Oligonucleotide Array,cDNA Arrays,Array, DNA,Array, Oligonucleotide,Array, cDNA,Arrays, DNA,Arrays, Oligonucleotide,Arrays, cDNA,Chip, DNA,Chip, Gene,Chips, DNA,Chips, Gene,DNA Array,DNA Chip,DNA Microarray,DNA Microchip,Gene Chip,Microarray, DNA,Microarray, Oligonucleotide,Microarray, cDNA,Microarrays, DNA,Microarrays, Oligonucleotide,Microarrays, cDNA,Microchip, DNA,Microchips, DNA,Oligonucleotide Array,Oligonucleotide Microarray,cDNA Array,cDNA Microarray
D020869 | Gene Expression Profiling | The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell. | Gene Expression Analysis,Gene Expression Pattern Analysis,Transcript Expression Analysis,Transcriptome Profiling,Transcriptomics,mRNA Differential Display,Gene Expression Monitoring,Transcriptome Analysis,Analyses, Gene Expression,Analyses, Transcript Expression,Analyses, Transcriptome,Analysis, Gene Expression,Analysis, Transcript Expression,Analysis, Transcriptome,Differential Display, mRNA,Differential Displays, mRNA,Expression Analyses, Gene,Expression Analysis, Gene,Gene Expression Analyses,Gene Expression Monitorings,Gene Expression Profilings,Monitoring, Gene Expression,Monitorings, Gene Expression,Profiling, Gene Expression,Profiling, Transcriptome,Profilings, Gene Expression,Profilings, Transcriptome,Transcript Expression Analyses,Transcriptome Analyses,Transcriptome Profilings,mRNA Differential Displays
D023281 | Genomics | The systematic study of the complete DNA sequences (GENOME) of organisms. Included is construction of complete genetic, physical, and transcript maps, and the analysis of this structural genomic information on a global scale such as in GENOME WIDE ASSOCIATION STUDIES. | Functional Genomics,Structural Genomics,Comparative Genomics,Genomics, Comparative,Genomics, Functional,Genomics, Structural
