Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study. 2020

Theodoulos Rodosthenous, Vahid Shahrezaei and Marina Evangelou
Department of Mathematics, Imperial College London, London SW7 2AZ, UK.

Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between the collected datasets and the complex trait of interest is either to analyse each OMICS dataset separately from the rest or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets, instead of analysing them separately, improves both our understanding of the relationships between them and the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. Sparse canonical correlation analysis (sCCA), the sparse variant of canonical correlation analysis (CCA), is a promising one: it penalizes the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. Over recent years, a number of approaches for implementing sCCA have been proposed; they differ in their objective functions, in the iterative algorithms used to obtain the sparse latent variables and in the assumptions they make about the original datasets. Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. These methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the relationships between datasets, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were also extended to allow for multiple (more than two) datasets, with the trait included as one of the input datasets.
Both strategies improved on conventional predictive models that include one or multiple datasets. Code is available at https://github.com/theorod93/sCCA. Supplementary data are available at Bioinformatics online.
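
To make the idea of sparse latent variables concrete, the following is a minimal Python sketch of an alternating soft-thresholding scheme in the spirit of penalized matrix decomposition. It is not the authors' implementation (see the repository above for that), and it rests on several simplifying assumptions: the within-dataset covariance matrices are treated as identity, only the first pair of canonical vectors is computed, and sparsity is controlled by a threshold set relative to the largest entry rather than by an explicit L1 bound. All function names, parameter values and the simulated data are hypothetical.

```python
import numpy as np

def standardize(M):
    """Centre each column and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def soft_threshold(a, delta):
    """Elementwise soft-thresholding operator S(a, delta)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def sparse_unit(a, frac):
    """Soft-threshold a at frac * max|a| (0 <= frac < 1), then rescale to unit L2 norm.
    Assumes a is not all zeros, so the largest entry always survives."""
    s = soft_threshold(a, frac * np.abs(a).max())
    return s / np.linalg.norm(s)

def sparse_cca(X, Y, frac_u=0.3, frac_v=0.3, n_iter=50, tol=1e-6):
    """First pair of sparse canonical vectors (u, v) for standardized X (n x p), Y (n x q)."""
    K = X.T @ Y                                      # cross-product matrix
    v = np.linalg.svd(K, full_matrices=False)[2][0]  # start from leading right singular vector
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u_new = sparse_unit(K @ v, frac_u)           # update u with v fixed
        v_new = sparse_unit(K.T @ u_new, frac_v)     # update v with u fixed
        converged = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v

def simulate_two_views(n=200, p=100, q=80, n_signal=5, noise=0.3, seed=0):
    """Toy data: one latent factor drives the first n_signal columns of X and Y, and the trait."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, :n_signal] = z[:, None] + noise * X[:, :n_signal]
    Y[:, :n_signal] = z[:, None] + noise * Y[:, :n_signal]
    trait = z + noise * rng.standard_normal(n)
    return standardize(X), standardize(Y), trait

# Supervised use: learn (u, v) on a training split, then regress the trait
# on the two canonical variates with ordinary least squares.
# (For brevity the data are standardized before splitting.)
X, Y, trait = simulate_two_views()
train_idx, test_idx = np.arange(150), np.arange(150, 200)
u, v = sparse_cca(X[train_idx], Y[train_idx])

def design(idx):
    """Design matrix: intercept plus the canonical variates of both datasets."""
    return np.column_stack([np.ones(len(idx)), X[idx] @ u, Y[idx] @ v])

beta, *_ = np.linalg.lstsq(design(train_idx), trait[train_idx], rcond=None)
pred = design(test_idx) @ beta

print("non-zero weights in u:", np.count_nonzero(u))
print("canonical correlation (train):", np.corrcoef(X[train_idx] @ u, Y[train_idx] @ v)[0, 1])
print("correlation of prediction with trait (test):", np.corrcoef(pred, trait[test_idx])[0, 1])
```

The relative threshold is a convenience that keeps at least one weight non-zero in each update; a closer rendering of the penalized matrix decomposition formulation would instead search for the threshold that satisfies a chosen L1 bound on each canonical vector.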

UI | MeSH Term | Description | Entries
D010641 | Phenotype | The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment. | Phenotypes
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D015999 | Multivariate Analysis | A set of techniques used when variation in several variables are studied simultaneously. In statistics, multivariate analysis is interpreted as any analytic method that allows simultaneous study of two or more dependent variables. | Analysis, Multivariate; Multivariate Analyses
D020412 | Multifactorial Inheritance | A pattern of inheritance of a trait that includes the contributions from more than one gene. | Oligogenic Inheritance; Polygenic Inheritance; Polygenic Traits; Complex Inheritance; Complex Traits; Multigenic Inheritance; Multigenic Traits; Oligogenic Traits; Polygenic Characters; Character, Polygenic; Characters, Polygenic; Complex Trait; Inheritance, Complex; Inheritance, Multifactorial; Inheritance, Multigenic; Inheritance, Oligogenic; Inheritance, Polygenic; Multigenic Trait; Oligogenic Trait; Polygenic Character; Polygenic Trait; Trait, Complex; Trait, Multigenic; Trait, Oligogenic; Trait, Polygenic; Traits, Complex; Traits, Multigenic; Traits, Oligogenic; Traits, Polygenic
