Iterative weighting of multiblock data in the orthogonal partial least squares framework (2014)

Julien Boccard, and Douglas N Rutledge
AgroParisTech, UMR 1145 Ingénierie Procédés Aliments, 16, rue Claude Bernard, F-75005 Paris, France; INRA, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France.

The integration of multiple data sources has become pivotal for the comprehensive assessment of complex systems. This new paradigm requires the ability to separate common and redundant information from specific and complementary information during the joint analysis of several data blocks. However, the problems inherent in analysing single tables are amplified in multiblock datasets, so finding the relationships between data layers of increasing complexity is a challenging task. In the present work, an algorithm, ComDim-OPLS, is proposed for the supervised analysis of multiblock data structures. It combines the interpretability of the orthogonal partial least squares (OPLS) framework with the ability of common component and specific weights analysis (CCSWA) to weight each data table individually, in order to capture its specificities and efficiently handle the different sources of Y-orthogonal variation. Three applications are presented for illustration. The first is a quantitative structure-activity relationship study aiming to predict the binding affinity of flavonoids toward P-glycoprotein from physicochemical properties. The second concerns the integration of several groups of sensory attributes for the overall quality assessment of a series of red wines. The third highlights the ability of the method to combine very large heterogeneous data blocks from Omics experiments in systems biology. Results were compared with those of the reference multiblock partial least squares (MBPLS) method to assess the predictive ability and model interpretability of the proposed algorithm. In all cases, ComDim-OPLS proved to be a relevant data mining strategy for the simultaneous analysis of multiblock structures, accounting for the specific sources of variation in each dataset and balancing predictive and descriptive purposes.
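The abstract names CCSWA as the ingredient that weights each data table individually. To make that idea concrete, the code below is a minimal, pure-Python sketch of one CCSWA (ComDim) common dimension as it is commonly described. Each block X_k is column-centred and scaled to unit Frobenius norm, and its sample association matrix W_k = X_k X_k^T is formed. The algorithm then alternates between the common component q (leading eigenvector of the salience-weighted sum of the W_k) and the block saliences lambda_k = q^T W_k q. This is not the ComDim-OPLS algorithm of the paper: the supervised OPLS part (separating Y-predictive from Y-orthogonal variation) is omitted. All function names, the power-iteration solver and the toy data are hypothetical choices made for illustration.

```python
# Hypothetical sketch of one CCSWA/ComDim common dimension (NOT ComDim-OPLS itself).
from __future__ import annotations

import math
import random


def center_and_scale(block: list[list[float]]) -> list[list[float]]:
    """Column-centre a block (n x p_k) and scale it to unit Frobenius norm,
    so that blocks with many variables do not dominate by size alone."""
    n, p = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in block]
    norm = math.sqrt(sum(v * v for row in centred for v in row))
    return [[v / norm for v in row] for row in centred]


def association(block: list[list[float]]) -> list[list[float]]:
    """n x n sample association matrix W_k = X_k X_k^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in block] for ri in block]


def quad_form(mat: list[list[float]], vec: list[float]) -> float:
    """Quadratic form q^T W q."""
    n = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))


def dominant_eigvec(mat: list[list[float]], iters: int = 500, seed: int = 0) -> list[float]:
    """Unit-norm leading eigenvector of a symmetric PSD matrix by power iteration.
    A random start is used because centred data put the all-ones vector in the null space."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(mat))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector orthogonal to the range of mat; keep it as is
        v = [x / norm for x in w]
    return v


def ccswa_dimension(blocks: list[list[list[float]]], tol: float = 1e-10, max_iter: int = 200):
    """Alternate between the common component q (leading eigenvector of
    sum_k lambda_k W_k) and the saliences lambda_k = q^T W_k q until the
    saliences stop changing. Returns (q, saliences)."""
    ws = [association(b) for b in blocks]
    n = len(ws[0])
    saliences = [1.0] * len(ws)  # start with equal block weights
    q = [0.0] * n
    for _ in range(max_iter):
        global_w = [[sum(lam * w[i][j] for lam, w in zip(saliences, ws))
                     for j in range(n)] for i in range(n)]
        q = dominant_eigvec(global_w)
        updated = [quad_form(w, q) for w in ws]
        change = max(abs(a - b) for a, b in zip(updated, saliences))
        saliences = updated
        if change < tol:
            break
    return q, saliences


def deflate(block: list[list[float]], q: list[float]) -> list[list[float]]:
    """Remove the common component from a block: X_k <- (I - q q^T) X_k."""
    proj = [sum(q[i] * row[j] for i, row in enumerate(block)) for j in range(len(block[0]))]
    return [[row[j] - q[i] * proj[j] for j in range(len(proj))] for i, row in enumerate(block)]


# Toy usage (hypothetical data): blocks 1 and 2 are driven by t = [-3, -1, 1, 3],
# block 3 by the pattern [1, -1, -1, 1], which is orthogonal to t.
raw_blocks = [
    [[7, 4], [9, 8], [11, 12], [13, 16]],
    [[-3, 3], [-1, 1], [1, -1], [3, -3]],
    [[6], [4], [4], [6]],
]
blocks = [center_and_scale(b) for b in raw_blocks]
q, saliences = ccswa_dimension(blocks)
print("saliences:", [round(s, 4) for s in saliences])  # close to 1, 1 and 0
```

On this toy data the first two blocks share the dominant pattern and should receive saliences close to 1, while the third block should receive a salience close to 0. The sign of q is arbitrary, as for any eigenvector. In this sketch, further common dimensions would be obtained by deflating every block with `deflate` and calling `ccswa_dimension` again.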

UI MeSH Term Description Entries
D005419 Flavonoids A group of phenyl benzopyrans named for having structures like FLAVONES. 2-Phenyl-Benzopyran,2-Phenyl-Chromene,Bioflavonoid,Bioflavonoids,Flavonoid,2-Phenyl-Benzopyrans,2-Phenyl-Chromenes,2 Phenyl Benzopyran,2 Phenyl Benzopyrans,2 Phenyl Chromene,2 Phenyl Chromenes
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D014920 Wine Fermented juice of fresh grapes or of other fruit or plant products used as a beverage. Wines
D016018 Least-Squares Analysis A principle of estimation in which the estimates of a set of parameters in a statistical model are those quantities minimizing the sum of squared differences between the observed values of a dependent variable and the values predicted by the model. Rietveld Refinement,Analysis, Least-Squares,Least Squares,Analyses, Least-Squares,Analysis, Least Squares,Least Squares Analysis,Least-Squares Analyses,Refinement, Rietveld
D016208 Databases, Factual Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references. Databanks, Factual,Data Banks, Factual,Data Bases, Factual,Data Bank, Factual,Data Base, Factual,Databank, Factual,Database, Factual,Factual Data Bank,Factual Data Banks,Factual Data Base,Factual Data Bases,Factual Databank,Factual Databanks,Factual Database,Factual Databases
D045744 Cell Line, Tumor A cell line derived from cultured tumor cells. Tumor Cell Line,Cell Lines, Tumor,Line, Tumor Cell,Lines, Tumor Cell,Tumor Cell Lines
D049490 Systems Biology Comprehensive, methodical analysis of complex biological systems by monitoring responses to perturbations of biological processes. Large scale, computerized collection and analysis of the data are used to develop and test models of biological systems. Biology, Systems
D057225 Data Mining Use of sophisticated analysis tools to sort through, organize, examine, and combine large sets of information. Text Mining,Mining, Data,Mining, Text
D020168 ATP Binding Cassette Transporter, Subfamily B, Member 1 A 170-kDa transmembrane glycoprotein from the superfamily of ATP-BINDING CASSETTE TRANSPORTERS. It serves as an ATP-dependent efflux pump for a variety of chemicals, including many ANTINEOPLASTIC AGENTS. Overexpression of this glycoprotein is associated with multidrug resistance (see DRUG RESISTANCE, MULTIPLE). ATP-Dependent Translocase ABCB1,MDR1 Protein,MDR1B Protein,Multidrug Resistance Protein 1,P-Glycoprotein,P-Glycoprotein 1,ABCB1 Protein,ATP Binding Cassette Transporter, Sub-Family B, Member 1,ATP-Binding Cassette, Sub-Family B, Member 1,CD243 Antigen,PGY-1 Protein,1, P-Glycoprotein,ABCB1, ATP-Dependent Translocase,ATP Dependent Translocase ABCB1,Antigen, CD243,P Glycoprotein,P Glycoprotein 1,PGY 1 Protein,Protein, MDR1B,Translocase ABCB1, ATP-Dependent
