Michel Tenenhaus

PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis.

Presentation Transcript


  1. PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis Michel Tenenhaus

  2. Sensory analysis of 21 Loire red wines, a famous example of Jérôme Pagès. Four blocks of variables: X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting. Illustrative variables: 3 appellations, 4 soils.

  3. PCA of each block: correlation loadings (two dimensions are kept for three blocks, one dimension for the fourth). Are the first components of the blocks positively correlated? Same question for the second components.
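The question can be checked numerically. The sketch below runs a PCA on each standardized block and computes correlation loadings; the function `block_pca` and the data are hypothetical (synthetic stand-ins, not the wine data). Because each block's PCA fixes a component only up to its sign, nothing forces the first components of two blocks to be positively correlated.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance (1/n convention)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def block_pca(X, m):
    """First m principal components of a standardized block and its correlation loadings."""
    Z = standardize(X)
    n = Z.shape[0]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :m] * s[:m]                   # PC_l = Z v_l = s_l u_l
    # For standardized data, Cor(x_h, PC_l) = v_hl * s_l / sqrt(n)
    loadings = Vt[:m].T * s[:m] / np.sqrt(n)
    return scores, loadings

# Synthetic stand-ins for two blocks measured on the same 21 observations (hypothetical)
rng = np.random.default_rng(0)
X1 = rng.normal(size=(21, 5))
X2 = X1[:, :3] + rng.normal(size=(21, 3))
F1, P1 = block_pca(X1, 2)
F2, P2 = block_pca(X2, 2)
# PCA fixes each component only up to sign, so this correlation may come out negative
r = np.corrcoef(F1[:, 0], F2[:, 0])[0, 1]
print(f"Cor(first PC of X1, first PC of X2) = {r:+.2f}")
```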

  4. Model 1: XLSTAT-PLSPM with mode PCA, on the variables more correlated to PC1 than to PC2. PCA optimizes the outer model; PLS-mode B optimizes the inner model; RGCCA is a compromise between PCA and PLS-mode B.

  5. Model 1 (PCA of each block): all loadings are significant except one.

  6. Model 1 (PCA of each block): PCA is very stable. All weights are significant.

  7. Multi-block analysis is a factor analysis of tables: it looks for factors (LVs, scores, components) subject to the constraints that • they explain their own block well, and/or • same-order factors are well (positively) correlated (to improve interpretation). PLS-mode B: $F_{1h}, \dots, F_{Jh}$ optimize the inner model. PCA: $F_{j1}, \dots, F_{jm_j}$ optimize the outer model. RGCCA gives a compromise between these two objectives.

  8. PLS-mode B and RGCCA for multi-block data analysis • Inner model: connections between LVs. • Outer model: connections between MVs and their LVs. • Maximizing correlations for the inner model: PLS-mode B (H. Wold, 1982, and Hanafi, 2007). However, each block must have more observations than variables. • Maximizing correlations for the inner model and explained variances for the outer model: Regularized Generalized Canonical Correlation Analysis (A. & M. Tenenhaus, 2011). There is no constraint on block dimensions when the "shrinkage constants" are positive. • PLS-mode B is a special case of RGCCA.

  9. PLS-mode B: H. Wold (1982) described a monotone convergent algorithm related to the PLS-mode B optimization problem (proof by Hanafi, 2007).
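The criterion itself did not survive extraction. In the RGCCA literature the PLS-mode B problem is usually written in the form below (an assumed standard form, for $J$ blocks $X_j$ and the design coefficients $c_{jk}$ used on the following slides), with $g(x) = x$ for the Horst scheme, $g(x) = |x|$ for the centroid scheme and $g(x) = x^2$ for the factorial scheme:

```latex
\max_{a_1,\dots,a_J} \; \sum_{j \neq k} c_{jk}\, g\big(\operatorname{Cor}(X_j a_j,\, X_k a_k)\big)
\quad \text{subject to} \quad \operatorname{Var}(X_j a_j) = 1, \qquad j = 1,\dots,J
```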

  10. Wold's algorithm: PLS-mode B. Initial step: choose outer weights $a_j$. Outer component (summarizes the block): $y_j = X_j a_j$. Inner component (takes into account relations between blocks), built with inner weights $e_{jk}$: • Horst: $e_{jk} = c_{jk}$ • Centroid: $e_{jk} = c_{jk}\,\operatorname{sign}(\operatorname{Cor}(y_k, y_j))$ • Factorial: $e_{jk} = c_{jk}\operatorname{Cor}(y_k, y_j)$, where $c_{jk} = 1$ if blocks are connected, 0 otherwise. Iterate until convergence of the criterion (Hanafi, 2007). Limitation: $n_j > p_j$.
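A minimal NumPy sketch of these iterations, assuming centered and scaled blocks, one component per block and block-by-block (Gauss-Seidel) updating. The function, its parameters and the synthetic data are hypothetical, and this is not XLSTAT's implementation. The mode B outer step is a least-squares regression of the inner component on $X_j$, which is why $X_j^\top X_j$ must be invertible (more observations than variables).

```python
import numpy as np

def center_scale(X):
    """Center each column and scale it to unit variance (1/n convention)."""
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)

SCHEMES = {"horst": lambda x: x, "centroid": np.abs, "factorial": np.square}

def pls_mode_b(blocks, C, scheme="factorial", tol=1e-10, max_iter=500):
    """Sketch of PLS-mode B iterations with Gauss-Seidel (block-by-block) updates.

    blocks: list of centered/scaled (n, p_j) arrays, with n > p_j so X_j'X_j is invertible.
    C: symmetric (J, J) 0/1 matrix, c_jk = 1 if blocks j and k are connected.
    """
    n, J = blocks[0].shape[0], len(blocks)
    g = SCHEMES[scheme]
    # Initial step: a_j = (1, ..., 1), rescaled so that Var(X_j a_j) = 1
    A = [np.ones(X.shape[1]) for X in blocks]
    A = [a / (X @ a).std() for X, a in zip(blocks, A)]
    Y = [X @ a for X, a in zip(blocks, A)]

    def criterion():
        # Var(y_j) = 1 for every block, so y_j'y_k / n is Cor(y_j, y_k)
        return sum(C[j, k] * g(Y[j] @ Y[k] / n)
                   for j in range(J) for k in range(J) if k != j)

    history = [criterion()]
    for _ in range(max_iter):
        for j, X in enumerate(blocks):
            cor = np.array([Y[k] @ Y[j] / n for k in range(J)])
            e = {"horst": C[j] * 1.0,
                 "centroid": C[j] * np.sign(cor),
                 "factorial": C[j] * cor}[scheme]
            e[j] = 0.0
            z = sum(e[k] * Y[k] for k in range(J))     # inner component z_j
            a = np.linalg.solve(X.T @ X, X.T @ z)      # mode B: regress z_j on X_j
            A[j] = a / (X @ a).std()                   # enforce Var(X_j a_j) = 1
            Y[j] = X @ A[j]
        history.append(criterion())
        if history[-1] - history[-2] < tol:
            break
    return A, Y, history

# Usage on synthetic data: three blocks driven by one shared latent factor (hypothetical)
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
blocks = [center_scale(latent @ rng.normal(size=(1, p)) + rng.normal(size=(50, p)))
          for p in (3, 4, 5)]
C = np.ones((3, 3)) - np.eye(3)   # every pair of blocks is connected
A, Y, history = pls_mode_b(blocks, C, scheme="factorial")
```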

  11. Model 2: optimizing the inner model (with XLSTAT). PLS-mode B with the centroid scheme ($p_j < n_j$) is equivalent to a one-step average two-block CCA (inner model).

  12. Model 3: optimizing the inner model (with XLSTAT). PLS-mode B with the factorial scheme ($p_j < n_j$) is equivalent to a one-step average two-block CCA (inner model).

  13. Model 3

  14. Model 3: PLS-mode B is very unstable.

  15. Conclusion • Many weights are not significant! • If you want both the butter (good correlations for the inner and outer models) and the money for the butter (significant weights), you must switch to Regularized Generalized Canonical Correlation Analysis (RGCCA).

  16. Regularized generalized CCA: a monotone convergent algorithm related to the RGCCA optimization problem is proposed (A. & M. Tenenhaus, 2011).
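The RGCCA criterion was also lost in extraction. It is usually stated as below (an assumed standard form following A. & M. Tenenhaus, 2011), with shrinkage constants $\tau_j$ and with $g(x) = x$ (Horst), $g(x) = |x|$ (centroid) or $g(x) = x^2$ (factorial). Setting $\tau_j = 0$ for all blocks gives $\operatorname{Var}(X_j a_j) = 1$, so covariances become correlations and the PLS-mode B criterion is recovered.

```latex
\max_{a_1,\dots,a_J} \; \sum_{j \neq k} c_{jk}\, g\big(\operatorname{Cov}(X_j a_j,\, X_k a_k)\big)
\quad \text{subject to} \quad \tau_j \lVert a_j \rVert^2 + (1-\tau_j)\operatorname{Var}(X_j a_j) = 1,
\qquad 0 \le \tau_j \le 1,\; j = 1,\dots,J
```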

  17. The PLS algorithm for RGCCA. Initial step: choose outer weights $a_j$. Outer component (summarizes the block): $y_j = X_j a_j$. Inner component (takes into account relations between blocks), built with inner weights $e_{jk}$: • Horst: $e_{jk} = c_{jk}$ • Centroid: $e_{jk} = c_{jk}\,\operatorname{sign}(\operatorname{Cor}(y_k, y_j))$ • Factorial: $e_{jk} = c_{jk}\operatorname{Cov}(y_k, y_j)$, where $c_{jk} = 1$ if blocks are connected, 0 otherwise. Iterate until convergence of the criterion. $n_j$ can be $\le p_j$ for $\tau_j > 0$.
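The same loop with the shrinkage constraint can be sketched in NumPy as follows. It assumes centered blocks, one component per block and Gauss-Seidel updates; the names `rgcca` and `SCHEMES` and the synthetic data are hypothetical. Each outer update rescales $a_j$ so that $\tau_j \lVert a_j \rVert^2 + (1-\tau_j)\operatorname{Var}(X_j a_j) = 1$, and the matrix to invert is positive definite as soon as $\tau_j > 0$, even when a block has more variables than observations.

```python
import numpy as np

SCHEMES = {"horst": lambda x: x, "centroid": np.abs, "factorial": np.square}

def rgcca(blocks, C, tau, scheme="factorial", tol=1e-10, max_iter=500):
    """Sketch of a PLS-type algorithm for RGCCA, one component per block.

    blocks: list of centered (n, p_j) arrays; p_j may exceed n when tau_j > 0.
    C: symmetric (J, J) 0/1 design matrix.
    tau: shrinkage constants in [0, 1] (tau_j = 0: mode B, tau_j = 1: mode A).
    """
    n, J = blocks[0].shape[0], len(blocks)
    g = SCHEMES[scheme]

    def cov(u, v):
        return u @ v / n   # blocks are centered, so this is the (1/n) covariance

    # Constraint a_j' M_j a_j = 1 with M_j = tau_j I + (1 - tau_j) X_j'X_j / n
    M = [t * np.eye(X.shape[1]) + (1 - t) * X.T @ X / n for X, t in zip(blocks, tau)]
    A = [np.ones(X.shape[1]) for X in blocks]
    A = [a / np.sqrt(a @ Mj @ a) for a, Mj in zip(A, M)]   # initial step
    Y = [X @ a for X, a in zip(blocks, A)]

    def criterion():
        return sum(C[j, k] * g(cov(Y[j], Y[k]))
                   for j in range(J) for k in range(J) if k != j)

    history = [criterion()]
    for _ in range(max_iter):
        for j, X in enumerate(blocks):
            c = np.array([cov(Y[k], Y[j]) for k in range(J)])
            e = {"horst": C[j] * 1.0,
                 "centroid": C[j] * np.sign(c),
                 "factorial": C[j] * c}[scheme]
            e[j] = 0.0
            z = sum(e[k] * Y[k] for k in range(J))   # inner component z_j
            u = np.linalg.solve(M[j], X.T @ z)       # M_j^{-1} X_j' z_j
            A[j] = u / np.sqrt((X.T @ z) @ u)        # rescale so a_j' M_j a_j = 1
            Y[j] = X @ A[j]
        history.append(criterion())
        if history[-1] - history[-2] < tol:
            break
    return A, Y, history

# Usage on synthetic data (hypothetical): the second block has more variables than observations
rng = np.random.default_rng(1)
n = 20
latent = rng.normal(size=(n, 1))
blocks = []
for p in (5, 30, 8):
    X = latent @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
    blocks.append(X - X.mean(axis=0))
C = np.ones((3, 3)) - np.eye(3)
tau = [1.0, 1.0, 0.5]
A, Y, history = rgcca(blocks, C, tau)
```

With every `tau` entry set to 0 (and more observations than variables in each block), the update reduces to the mode B regression step, since the constraint then reads $\operatorname{Var}(X_j a_j) = 1$.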

  18. All $\tau_j = 0$: RGCCA = PLS-mode B. Initial step: choose outer weights $a_j$. Outer component (summarizes the block): $y_j = X_j a_j$. Inner component (takes into account relations between blocks), built with inner weights $e_{jk}$: • Horst: $e_{jk} = c_{jk}$ • Centroid: $e_{jk} = c_{jk}\,\operatorname{sign}(\operatorname{Cor}(y_k, y_j))$ • Factorial: $e_{jk} = c_{jk}\operatorname{Cor}(y_k, y_j)$, where $c_{jk} = 1$ if blocks are connected, 0 otherwise. Iterate until convergence of the criterion.

  19. All $\tau_j = 1$: RGCCA mode A. Initial step: choose outer weights $a_j$. Outer component (summarizes the block): $y_j = X_j a_j$. Inner component (takes into account relations between blocks), built with inner weights $e_{jk}$: • Horst: $e_{jk} = c_{jk}$ • Centroid: $e_{jk} = c_{jk}\,\operatorname{sign}(\operatorname{Cor}(y_k, y_j))$ • Factorial: $e_{jk} = c_{jk}\operatorname{Cov}(y_k, y_j)$, where $c_{jk} = 1$ if blocks are connected, 0 otherwise. Iterate until convergence of the criterion. $n_j$ can be $\le p_j$.
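The two special cases can be read off the RGCCA outer-weight update. Assuming the standard update of A. & M. Tenenhaus (2011), with $n$ observations and inner component $z_j = \sum_{k \neq j} e_{jk} y_k$:

```latex
a_j = \frac{M_j^{-1} X_j^\top z_j}{\sqrt{z_j^\top X_j M_j^{-1} X_j^\top z_j}},
\qquad M_j = \tau_j I + (1-\tau_j)\,\frac{1}{n} X_j^\top X_j,
\qquad z_j = \sum_{k \neq j} e_{jk}\, y_k

\tau_j = 1:\quad a_j = \frac{X_j^\top z_j}{\lVert X_j^\top z_j \rVert}
\qquad\qquad
\tau_j = 0:\quad a_j \propto \big(X_j^\top X_j\big)^{-1} X_j^\top z_j \ \text{ with } \operatorname{Var}(X_j a_j) = 1
```

With $\tau_j = 1$ no matrix inversion is needed and the weights are proportional to the covariances $\operatorname{Cov}(x_{jh}, z_j)$, which is why $n_j \le p_j$ is allowed in mode A.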

  20. Model 4: RGCCA, factorial scheme, mode A (one-step average two-block PLS regression). The latent variables have been standardized afterwards.

  21. Model 4: all loadings are significant.

  22. Model 4: RGCCA mode A is very stable. All weights are also significant.

  23. Model comparison (Schäfer & Strimmer, 2005). R code: Arthur T.
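The R code cited on this slide is not reproduced here. As an assumption about how a per-block shrinkage constant $\tau_j$ can be chosen, the sketch below computes the analytic shrinkage intensity of Schäfer and Strimmer (2005) for shrinking a block's correlation matrix toward the identity. `shrinkage_tau` is a hypothetical name and the data are synthetic.

```python
import numpy as np

def shrinkage_tau(X):
    """Analytic shrinkage intensity toward the identity for the correlation matrix of X
    (assumed Schafer and Strimmer 2005 formula for correlations), clipped to [0, 1]."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    W = Z[:, :, None] * Z[:, None, :]        # w_kij = z_ki * z_kj, shape (n, p, p)
    W_bar = W.mean(axis=0)
    R = n / (n - 1) * W_bar                  # sample correlations r_ij
    var_R = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)   # estimated Var(r_ij)
    off = ~np.eye(p, dtype=bool)             # off-diagonal entries only
    lam = var_R[off].sum() / (R[off] ** 2).sum()
    return float(np.clip(lam, 0.0, 1.0))

# Usage on hypothetical blocks of 21 observations
rng = np.random.default_rng(3)
factor = rng.normal(size=(21, 1))
X_corr = factor @ np.ones((1, 10)) + 0.1 * rng.normal(size=(21, 10))   # strongly correlated
X_indep = rng.normal(size=(21, 10))                                    # nearly independent
tau_corr, tau_indep = shrinkage_tau(X_corr), shrinkage_tau(X_indep)
```

Blocks of highly correlated variables tend to get a small $\tau_j$ (closer to mode B), while blocks of nearly independent variables get $\tau_j$ close to 1 (mode A).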

  24. AVE outer and AVE inner as functions of $\tau$, with the same $\tau$ for all blocks (mode A: $\tau = 1$; mode B: $\tau = 0$). Mode A favors the outer model. Mode B favors the inner model.
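The AVE formulas did not survive extraction. The sketch assumes the usual definitions: AVE(outer) is the mean of $\operatorname{Cor}^2(x_{jh}, y_j)$ over all manifest variables, and AVE(inner) is the mean of $\operatorname{Cor}^2(y_j, y_k)$ over connected pairs $j < k$. The function names and data are hypothetical.

```python
import numpy as np

def ave_outer(blocks, Y):
    """Mean of Cor^2(x_jh, y_j) over all manifest variables of all blocks."""
    sq = [np.corrcoef(X.T, y)[-1, :-1] ** 2 for X, y in zip(blocks, Y)]
    return float(np.concatenate(sq).mean())

def ave_inner(Y, C):
    """Mean of Cor^2(y_j, y_k) over connected pairs j < k."""
    J = len(Y)
    pairs = [(j, k) for j in range(J) for k in range(j + 1, J) if C[j, k]]
    return float(np.mean([np.corrcoef(Y[j], Y[k])[0, 1] ** 2 for j, k in pairs]))

# Hypothetical example: two synthetic blocks, block means used as components
rng = np.random.default_rng(2)
blocks = [rng.normal(size=(21, p)) for p in (3, 4)]
Y = [X.mean(axis=1) for X in blocks]
C = np.array([[0, 1], [1, 0]])
print(f"AVE outer = {ave_outer(blocks, Y):.3f}, AVE inner = {ave_inner(Y, C):.3f}")
```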

  25. Hierarchical model for wine data: Model 5 (RGCCA, factorial scheme, mode A), dimension 1: one-step hierarchical PLS regression. The 2nd-order block "Global" contains all the MVs of the 1st-order blocks.

  26. Hierarchical model for wine data: Model 6 (RGCCA, factorial scheme, mode A for the initial blocks, mode B for the global block): one-step hierarchical redundancy analysis. The 2nd-order block "Global" contains all the MVs of the 1st-order blocks. This method has been proposed independently at least three times: • Covariance criterion (J. D. Carroll, 1968) • Consensus PCA (S. Wold et al., 1987) • Multiple co-inertia analysis (Chessel & Hanafi, 1996)
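The hierarchical models change only the design: a 2nd-order block made of all MVs is appended and connected to each 1st-order block, and the mode of each block is set through its $\tau_j$. A minimal sketch of that set-up, with a hypothetical helper `hierarchical_design` and synthetic block sizes; it assumes the 1st-order blocks are connected only to the global block. The resulting blocks, design matrix and $\tau$ vector would then be passed to an RGCCA routine (not shown).

```python
import numpy as np

def hierarchical_design(blocks):
    """Append a 2nd-order 'Global' block (all MVs side by side) connected to every 1st-order block."""
    J = len(blocks)
    all_blocks = list(blocks) + [np.hstack(blocks)]
    C = np.zeros((J + 1, J + 1), dtype=int)
    C[:J, J] = 1   # each 1st-order block is connected to the global block
    C[J, :J] = 1   # symmetric design; 1st-order blocks are (assumed) not connected to each other
    return all_blocks, C

# Hypothetical shapes: 21 observations, four 1st-order blocks
rng = np.random.default_rng(4)
blocks = [rng.normal(size=(21, p)) for p in (2, 3, 4, 3)]
all_blocks, C = hierarchical_design(blocks)
tau_model5 = [1.0] * 5              # mode A for every block
tau_model6 = [1.0] * 4 + [0.0]      # mode A for 1st-order blocks, mode B for the global block
```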

  27. Hierarchical model for wine data: Model 7 (RGCCA, factorial scheme, mode A), dimension 2. The block View is given up. The 2nd-order block "Global" contains all the MVs of the 1st-order blocks.

  28. Mapping of the correlations with the global components

  29. Wine visualization in the global component space: wines marked by appellation.

  30. Wine visualization in the global component space: wines marked by soil (DAM = Dampierre-sur-Loire). Plot annotation: GOOD QUALITY.

  31. Cuvée Lisagathe 1995: "A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish; good length and a lot of potential." DECANTER (May 1997), DECANTER AWARD *****: outstanding quality, a virtually perfect example.

  32. References

  33. Final conclusion: all the proof of a pudding is in the eating, but it will taste even better if you know the cooking.