
(OPLS-DA) Strategy

DESCRIPTION

Chemometrics for metabolomics



Presentation Transcript


  1. Analytica Chimica Acta 769 (2013) 30–39. Contents lists available at SciVerse ScienceDirect. Analytica Chimica Acta, journal homepage: www.elsevier.com/locate/aca

     A consensus orthogonal partial least squares discriminant analysis (OPLS-DA) strategy for multiblock Omics data fusion

     Julien Boccard, Douglas N. Rutledge∗
     Laboratoire de Chimie Analytique, AgroParisTech, Paris, France

     Highlights
     - Omics studies generate massive data obtained from different analytical devices.
     - Extracting knowledge from these multiple blocks is challenging.
     - A generic methodology for Omics data fusion is proposed.

     Article history: Received 27 August 2012. Received in revised form 28 November 2012. Accepted 14 January 2013. Available online 21 January 2013.

     Keywords: Omics; Metabolomics; Data fusion; Multiblock model; Consensus OPLS-DA

     Abstract
     Omics approaches have proven their value to provide a broad monitoring of biological systems. However, as no single analytical technique is sufficient to reveal the full biochemical content of complex biological matrices or biofluids, the fusion of information from several data sources has become a decisive issue. Omics studies generate an increasing amount of massive data obtained from different analytical devices. These data are usually high dimensional and extracting knowledge from these multiple blocks is challenging. Appropriate tools are therefore needed to handle these datasets suitably. For that purpose, a generic methodology is proposed by combining the strengths of established data analysis strategies, i.e. multiple kernel learning and OPLS-DA, to offer an efficient tool for the fusion of Omics data obtained from multiple sources. Three real case studies are proposed to assess the potential of the method. A first example illustrates the fusion of mass spectrometry-based metabolomic data acquired in both negative and positive electrospray ionisation modes, from leaf samples of the model plant Arabidopsis thaliana. A second dataset involves the classification of wine grape varieties based on polyphenolic extracts analysed by two-dimensional heteronuclear magnetic resonance spectroscopy. A third case study underlines the ability of the method to combine heterogeneous data from systems biology with the analysis of publicly available data related to NCI-60 cancer cell lines from different tissue origins, which include metabolomics, transcriptomics and proteomics. The fusion of Omics data from different sources is expected to provide a more complete view of biological systems. The proposed method was demonstrated as a relevant and widely applicable alternative to handle efficiently the inherent characteristics of multiple Omics data, such as very large numbers of noisy collinear variables.
     © 2013 Elsevier B.V. All rights reserved.

     1. Introduction

     Omics approaches have become a key tool for many research fields including disease diagnosis (e.g. cancer and diabetes), drug metabolism, natural product discovery and toxicology [1]. The untargeted monitoring of thousands of low weight molecules,

     ∗ Corresponding author at: Laboratoire de Chimie Analytique, AgroParisTech, 16 rue Claude Bernard, 75231 Paris, France. Tel.: +33 1 44 08 16 47. E-mail address: rutledge@agroparistech.fr (D.N. Rutledge).
     0003-2670/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.aca.2013.01.022

  2. transcripts or proteins allows the characterisation of multivariate profiles involving a large number of related features, to describe a biological reality. The extensive data collection of such a data-driven methodology gives rise to new challenges regarding data modelling and knowledge discovery.

     Most Omics experimental setups aim at the comparison of samples from a control and from a case group (e.g. disease or treatment). The goal of such differential analysis is therefore to build a model able to distinguish the classes of observations and to provide a meaningful interpretation of the observed differences. As Omics data are highly multivariate (k variables ≫ n observations), predictive models based on latent variables such as partial least squares regression (PLS) are particularly well suited to handle these data structures for discriminant analysis [2,3]. However, PLS scores and loadings become rotated when strong systematic variations unrelated to the response are present in the data. The interpretation of the models then becomes more difficult. The OPLS algorithm [4] is an extension of the PLS regression method which integrates an orthogonal signal correction filter [5] to distinguish the variations in the data that are useful for the prediction of a quantitative response from the variations that are orthogonal to the prediction. Its discriminant analysis counterpart (OPLS-DA) was demonstrated as a powerful tool for the analysis of qualitative data structures, while prediction results are equivalent to classification using standard PLS-DA [6]. The main advantage of OPLS-DA is an easier interpretation of the models, as it focuses on the predictive, i.e. discriminant, information that is summarised in the predictive component(s). The corresponding predictive scores and loading vectors are therefore less subject to orthogonal variation. Variation that is unrelated to the class response is described in the orthogonal component(s). The latter can be useful to highlight unexpected systematic variability related to an experimental bias or to biological variations [4]. A kernel reformulation of the OPLS algorithm (KOPLS) was proposed recently to extend its applicability to non-linear relationships [7].

     Despite the wealth of data generated by modern analytical platforms, the analysis of a single dataset may be limited and insufficient to provide a holistic picture of the phenomenon under study. A growing number of studies take advantage of the combination of several data sources to account for complementary information and provide a more global picture of biological systems. Extracting the relevant information from multiple data tables among the overwhelming amount of data has become a decisive issue.

     Different strategies can be applied to associate data originating from several sources. High-level data fusion involves the independent processing and modelling of the data tables and the combination of the results to provide a meaningful synthesis [8]. On the other hand, when a direct link exists between the samples of each data block, other alternatives are provided. In the multiblock situation, it is assumed that the same samples are assessed by different analytical protocols to generate the data blocks. In that context, the horizontal concatenation of data matrices is a straightforward solution to integrate information from multiple sources. As it mixes variables of different origins to build a summary table, such a data fusion approach is coined low-level. Standard multivariate analysis methods can be applied to the concatenated data [9]. The major drawback of this approach is the generation of data structures with a prohibitive number of variables, aggravating the curse of dimensionality. This becomes a particularly serious issue in the case of Omics data analysis when performing computer-intensive procedures, such as cross-validation or bootstrapping.

     Several multiblock strategies have been developed for the simultaneous analysis of multiple data tables, e.g. signals from different analytical platforms describing the same observations, but not necessarily with the same number of variables. These methods search for directions of similar sample distributions in the multidimensional spaces defined by each table, i.e. common components. The contribution of each individual block is expressed as a measure of the extracted variability for each common dimension. The multiblock PLS algorithm (MBPLS) [10], a well-established supervised method for multiblock modelling, was applied for the integration of MS-based metabolomic data [11] and very recently reported as an appropriate predictive method for relating metabolomic and proteomic data [12]. Additionally, a hierarchical OPLS modelling strategy was proposed for analysing multiple blocks of spectral data [13] and the O2PLS algorithm was implemented to relate transcript and metabolite data blocks [14]. Additionally, multiple kernel learning (MKL) approaches were developed to combine several data sources using kernel matrices [15,16] and recently applied in the context of metabolomics [17]. The present approach proposes to take advantage of the easy interpretability of OPLS-DA modelling with a dimensionality reduction and data fusion step. Resampling experiments constitute very useful tools to ensure model validity and evaluate confidence intervals or robust estimates of computed parameters [18]. The proposed methodology is an extension of the MKL approach to OPLS-DA, based on the extraction of consensus components via the decomposition of the small summary data table computed as a weighted sum of the X·Xᵀ product matrices. It provides the advantage of improved prediction ability, while allowing computer-intensive resampling, such as permutation tests and bootstrapping, without prohibitive needs regarding memory resources and computation time. This methodology was applied to three real case studies with Omics multiblock datasets, and predictions were compared to the reference MBPLS method, to highlight the method's advantages in this context.

     2. Description of the Consensus OPLS-DA data modelling strategy

     The presented Consensus OPLS-DA multiblock data modelling strategy proposes to combine the principle of the kernel implementation of the OPLS algorithm with a data fusion procedure for the simultaneous evaluation of multiple data blocks in the OPLS-DA modelling framework. It has to be mentioned that kernel-based reformulations of the NIPALS algorithm were initially developed to enable a more compact representation of the initial data table based on linear transformations [19,20]. Rantalainen et al. demonstrated that the KOPLS algorithm preserves the properties of the original OPLS method regarding the interpretation of score and loading vectors [7]. The OPLS model is defined as follows:

     $$ X = t_p p_p^{T} + t_o p_o^{T} + E, \qquad Y = t_p q_p^{T} + F \tag{1} $$

     where t_p is the Y-predictive score matrix, p_p is the Y-predictive loading matrix for X, t_o is the Y-orthogonal score matrix, p_o is the Y-orthogonal loading matrix for X, q_p is the Y-predictive loading matrix for Y, and E and F are the residual matrices for X and Y, respectively.

     Similarly to kernel approaches, several methods from the multiblock framework rely on an association matrix that reflects similarities among the rows of a given block X, i.e. the configuration of the observations. In the linear case, this matrix is given by the X·Xᵀ product, which corresponds to the linear kernel of the centred X matrix. The fairness between blocks is ensured by normalising the matrices according to a scaling factor, e.g. their norm. As a multiblock model could be dominated by the largest matrices for numerical reasons, block scaling is useful to account for differences regarding the sizes of the data tables. Similarly, X·Xᵀ product matrices may take larger values in case of larger initial matrices. In the proposed strategy, the scaling of association matrices ensures

  3. the fairness between blocks to find an underlying structure that is present in all data blocks.

     In the MKL framework, the combination of multiple datasets is then carried out by computing a weighted sum of the respective association matrix from each block to determine a consensus space optimising the prediction accuracy. This weight optimisation step constitutes therefore a key point of MKL, and major improvements of the prediction ability and/or sparsity can be obtained with adapted strategies, including L1 and L2-norm penalisation [21]. In the proposed method, block weighting is implemented using the modified RV-coefficient developed by Smilde et al., adjusted to the range [0,1]. This index characterises the relationship between high-dimensional datasets by a measure of common information [22]. RV-coefficients are computed between individual data blocks and the Y response to be predicted to orientate the consensus kernel towards better predictive ability, as blocks bearing more predictive information will have a stronger impact on the model. Details about the RV-coefficient are provided as Supplementary material S1. The computation of the consensus kernel is then carried out as follows:

     $$ K = \sum_{a} RV_{X_a Y} \, K_a \tag{2} $$

     It has however to be noted that, unlike typical MKL, no strictly speaking optimisation of the block weights is carried out to provide an optimal prediction power. Actually, in most Omics studies, the value of multivariate models resides more in their biological and chemical relevance than in their absolute predictive performance. More specifically, the examination of OPLS models is expected to provide explicit knowledge about the phenomena under study based on predictive component(s), but also additional information about systematic variations summarised in orthogonal component(s). If the prediction performance is related to the former, the latter cannot be optimised according to the same criterion, and prediction-optimised weighting schemes could even be detrimental to interpretation.

     The model estimation is then made on the basis of the new calculated summary data table. From a kernel point of view, the feature space corresponds to the weighted sum of the linear kernels calculated independently for each data block. It has to be noted that simple mathematical operations such as multiplication and addition preserve the kernel properties and produce valid kernel matrices. Such an approach is particularly well suited for the analysis of high dimensional data tables from Omics studies with only few observations. It allows data fusion without worsening the curse of dimensionality and has computationally beneficial properties. It should be stressed that the number of data blocks is not limited and this strategy can be applied to any number of distinct data sources with any kind of relations between the tables. If block weights are all set to one, this approach leads to a simple solution that is equal to the analysis of the individually normalised, concatenated datasets [23]. The KOPLS-DA algorithm is then applied to the consensus kernel to determine the underlying predictive and orthogonal dimensions common to all data tables with respect to the Y class response. Cross-validation is performed to evaluate the optimal model size and avoid overfitting. The Discriminant Q² index (DQ²) is used to assess the model fit as it does not penalise class predictions beyond the class label value [24]. Details about the DQ² metric are provided as Supplementary material S2. The outputs of the model are similar to standard OPLS but the corresponding t_p and t_o scores matrices constitute a compromise between the data tables. An overview of the Consensus OPLS-DA workflow is depicted in Fig. 1.

     As X·Xᵀ product kernels are linear combinations of the initial data tables, the model outputs are directly related to the measured variables. The modelling results can therefore be displayed and interpreted with respect to the original variables in the usual way. The loadings of the initial variables are computed for a given latent variable i and for a data block a according to the RV-coefficient of the block with respect to Y, the scores vector for the component i and the pre-treated data block a, as follows:

     $$ p_{ai} = RV_{X_a Y} \, \frac{X_a^{T} t_i}{t_i^{T} t_i} \tag{3} $$

     It has to be noted that such loadings back-calculation is limited to the linear case and other strategies based on pseudo-samples prediction were developed to extend the assessment of variables loadings to non-linear kernels [25].

     In the multiblock framework, the contribution of each data block to a given latent variable can be computed from the scores of the component and the X·Xᵀ product kernel matrix in the following way:

     $$ C_{ai} = t_i^{T} K_a \, t_i \tag{4} $$

     3. Experimental

     The proposed multiblock data analysis approach was developed under the MATLAB® 7 environment (The MathWorks, Natick, USA). Consensus OPLS-DA models were computed with combinations of toolboxes and in-house functions. Modified RV-coefficients were computed with the publicly available MATLAB m-file from Smilde et al. [22]. KOPLS-DA was assessed with routines implemented in the KOPLS open source package [26]. For each model, cross-validation was performed and the DQ² index was calculated with the script provided by Westerhuis et al. [24] to assess the model fit. MBPLS regression with deflation on the super scores was computed with the Multi-block Toolbox for MATLAB v.02 [27].

     4. Real case studies – results and discussion

     4.1. Arabidopsis thaliana (dataset 1)

     4.1.1. Dataset 1 – characteristics and modelling

     The first dataset illustrates the potential of the proposed approach to assess the metabolic effects of wounding in leaves from wild type specimens of the model plant A. thaliana (ecotype Col-0) and a lesion mimic mutant that exhibits spontaneous spreading cell death, i.e. accelerated cell death acd2-2 [28]. A. thaliana specimens (ecotype Col-0) were grown, harvested and extracted according to a protocol described elsewhere [29]. Ultra-high pressure liquid chromatography time-of-flight mass spectrometry (UHPLC-TOF/MS) experiments were achieved with a rapid chromatographic gradient to obtain a short analysis time of less than 7 min allowing high sample throughput. The MS electrospray interface was operated in negative mode (ESI−) and positive mode (ESI+) in the m/z range 100–1000 during separate experiments. The processing of the raw data was performed independently for each ionisation mode with the MarkerLynx™ software, as described in [30]. Two data tables of 24 observations were obtained with three classes of plant samples, namely wild type specimens (Ctrl, n = 8), wounded wild type plants harvested after 90 minutes of incubation (Wnd90, n = 8) and acd2-2 mutant plants (acd2-2, n = 8). The integration of the two blocks of MS-based metabolomic data, i.e. ESI− (686 variables) and ESI+ (1149 variables), was investigated to provide an extended analytical coverage of the biochemical diversity characterising the leaves samples. Each data block was autoscaled and the Consensus OPLS-DA strategy was applied for data fusion. As the control group constituted a common reference for the comparison of the three situations, two discriminant models were computed to distinguish the metabolic patterns of (i) control and wild type wounded plants

  4. Fig. 1. The Consensus OPLS-DA workflow.

     Fig. 2. Consensus OPLS-DA score plots (A) and block contributions (B) for the Arabidopsis dataset. Samples from the Ctrl class are symbolised by white circles (○), observations from the Wnd90 class by black squares (■) and plants from the acd2-2 class by black triangles (▲).
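
The kernel-fusion part of the workflow in Fig. 1 (centred linear kernels, block normalisation, RV-based weighting and the weighted sum of Eq. (2)) together with the block contribution of Eq. (4) can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB implementation. Several choices are assumptions made only for illustration: the Frobenius norm as the scaling factor, the rescaling (RV + 1) / 2 as the adjustment of the modified RV-coefficient to [0,1], the toy data, and the score vector `t`, which stands in for a KOPLS-DA predictive score that is not computed here. All function names are hypothetical.

```python
import math

def center_columns(X):
    """Subtract the column means from a data block given as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def linear_kernel(X):
    """X·X^T product of a block: an n x n association matrix."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]

def frobenius(K):
    return math.sqrt(sum(v * v for row in K for v in row))

def modified_rv(K1, K2):
    """Modified RV: cosine between the off-diagonal parts of two n x n matrices."""
    n = len(K1)
    num = s1 = s2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                num += K1[i][j] * K2[i][j]
                s1 += K1[i][j] ** 2
                s2 += K2[i][j] ** 2
    return num / math.sqrt(s1 * s2)

def consensus_kernel(blocks, Y):
    """Weighted sum of normalised block kernels, as in Eq. (2)."""
    n = len(Y)
    KY = linear_kernel(center_columns(Y))
    K = [[0.0] * n for _ in range(n)]
    weights, kernels = [], []
    for X in blocks:
        Ka = linear_kernel(center_columns(X))
        norm = frobenius(Ka)                     # assumed scaling factor
        Ka = [[v / norm for v in row] for row in Ka]
        w = (modified_rv(Ka, KY) + 1.0) / 2.0    # assumed adjustment to [0, 1]
        weights.append(w)
        kernels.append(Ka)
        for i in range(n):
            for j in range(n):
                K[i][j] += w * Ka[i][j]
    return K, weights, kernels

def block_contribution(t, Ka):
    """Eq. (4): t^T K_a t for one score vector t and one block kernel K_a."""
    return sum(t[i] * Ka[i][j] * t[j] for i in range(len(t)) for j in range(len(t)))

# Toy data: 4 observations, 2 classes, 2 blocks with different numbers of variables.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]                      # dummy class matrix
block1 = [[1.0, 0.2], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.3]]
block2 = [[0.3, -1.0, 0.5], [-0.4, 0.8, 0.1], [0.2, 0.9, -0.6], [-0.1, -0.7, 0.0]]

K, weights, kernels = consensus_kernel([block1, block2], Y)
t = [1.0, 1.0, -1.0, -1.0]    # hypothetical score vector, for illustration only
print("block weights:", [round(w, 3) for w in weights])
print("block contributions:", [round(block_contribution(t, Ka), 3) for Ka in kernels])
```

Whatever the number of variables in each block, the fused matrix `K` stays n × n, which is why resampling on the consensus kernel remains cheap.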

  5. (Ctrl vs. Wnd90, model 1) and (ii) control and mutant specimens (Ctrl vs. acd2-2, model 2). In both cases, a model with one predictive and one orthogonal latent variable was evaluated as the best model based on the DQ² value computed during leave-one-out cross validation (LOOCV), i.e. DQ² = 0.831 for model 1 and DQ² = 0.996 for model 2. Satisfactory partitions of the experimental groups were obtained and the contributions of the blocks to the models indicated a preponderance of the ESI− block in the predictive component of model 1 (ESI− 60.9% and ESI+ 39.1%) and a well-balanced contribution of both tables for model 2 (ESI− 50.1% and ESI+ 49.9%). Both score plot and block contributions are presented in Fig. 2.

     As cross-validation may lead to overly optimistic results, a series of 10³ permutation tests were carried out for each model by mixing randomly the original Y class response. The dummy models were evaluated according to LOOCV and the corresponding DQ² values were collected. The histograms of these values are presented for each model as Supplementary material S3. In both cases, the true model DQ² value was clearly distinguished and statistically different from the random models collection (model 1: p < 0.001, mean = −0.337, standard deviation (SD) = 0.323, n = 1000; model 2: p < 0.001, mean = −0.304, SD = 0.229, n = 1000), providing additional confidence for model validity. Such a computer-intensive technique was easily performed with the proposed method but would be prohibitive in terms of computing time and resources with traditional approaches due to the high dimensionality of the data.

     A comparison of the prediction results with the reference MBPLS method was then performed. The MBPLS algorithm is a simultaneous component method which has proven its usefulness in various applications [31]. For both models, the Y predicted values obtained for each observation during LOOCV were compared to those obtained with the MBPLS algorithm computed for the same pre-processed data, a model of identical size and a similar validation procedure. DQ² prediction accuracy indices of 0.816 and 0.995 were obtained for model 1 and model 2, respectively. These results highlighted a slight improvement of the proposed approach based on the weighted combination of linear kernels in terms of response prediction and its advantages over MBPLS and data concatenation. A summary of prediction performance indices and time requirements with respect to data size is provided for all models in Table 1.

     Moreover, PLS scores and loadings become rotated when a strong structure uncorrelated to the Y response is present in X, which makes the interpretation more difficult. This is not the case for the proposed Consensus OPLS-DA strategy, as it takes advantage of the interpretation ease of the OPLS framework by separating the Y-predictive variability from the orthogonal variability. This allows the implementation of OPLS diagnostic tools, such as the Shared and Unique Structure (SUS) plot, a data representation which aims at the evaluation of the contributions of variables to different classification models by the comparison with a common reference [32]. As the Ctrl class of Arabidopsis specimens can play this role, the correlation vector corr(t_p, X_a) was computed between each data block X_a and the predictive component of each model t_p to generate correlation loadings and build a SUS-plot for further interpretation.

     Fig. 3. Joint SUS-plot with four highlighted areas: (1) increase in both Wnd90 and acd2-2, (2) increase in acd2-2 only, (3) increase in Wnd90 only and (4) decrease in both Wnd90 and acd2-2. Variables detected in the ESI− mode are symbolised by a minus sign (−) and variables detected in the ESI+ mode by a plus sign (+).

     4.1.2. Dataset 1 – biological interpretation

     Correlation loadings from the ESI− and the ESI+ data blocks were combined to provide a joint SUS-plot (Fig. 3). The variable contributions were analysed with respect to their position and four areas of interest were highlighted. The upper right region (area 1) corresponded to compounds with increased levels in both Wnd90 and acd2-2 when compared to Ctrl. Several ions detected in the ESI− mode could be related to compounds from the oxylipin family, known to be involved in the plant defence reaction. These compounds include jasmonic acid (JA) [M−H]− m/z 209.1156 at 2.40 min, hydroxylated JA (HO-JA) [M−H]− m/z 225.1105 at 1.35 min and 3-oxo-2-(2Z-pentenyl) cyclopentane-1-butyric acid (OPC-4) [M−H]− m/z 237.1495 at 2.95 min. Moreover, some ions were detected in both ionisation modes, such as jasmonoyl-isoleucine (JA-Ile) [M−H]− m/z 322.2007 at 3.03 min and [M+H]+ m/z 324.2173 at 3.02 min, hydroxy JA-Ile (HO-JA-Ile) [M−H]− m/z 338.1963 at 2.03 min and [M+H]+ m/z 340.2104 at 2.03 min, and carboxy JA-Ile (HOOC-JA-Ile) [M−H]− m/z 352.1771 at 2.01 min and [M+H]+ m/z 354.1946 at 2.02 min.

     The upper centre part of the SUS-plot (area 2) corresponded to compounds increased or detected uniquely in the acd2-2 phenotype. Relevant ions in the ESI− mode were tentatively identified as dinor-oxophytodienoic acid (dn-OPDA) [M−H]− m/z 263.1647 at 3.28 min, indol-3-ylmethyl-ascorbate [M−H]− m/z 304.0788 at 1.37 min and neoascorbigen [M−H]− m/z 334.0926 at 2.02 min. Additionally, complementary information was obtained from the

     Table 1. Prediction performance indices, data size and time requirements for MBPLS and Consensus OPLS.

     Dataset      Model               Dataset size   Kernel size    Size gain   MBPLS    MBPLS      Consensus   Consensus       Time gain
                                      (data units)   (data units)   factor      DQ²      time (s)   OPLS DQ²    OPLS time (s)   factor
     Arabidopsis  Ctrl vs. Wnd90      29,360         256            115         0.816    2.5        0.831       0.08            >30
     Arabidopsis  Ctrl vs. acd2-2     29,360         256            115         0.995    2.5        0.996       0.08            >30
     Red wines    Grape varieties     21,060,000     6561           3210        0.847a   NC         0.876       10.5            NC
     NCI-60       NSCL vs. MELA       226,236        289            783         0.699a   NC         0.723       0.6             NC

     NC = not computable due to memory requirements.
     a MBPLS was not computable and the values were calculated with equal weights of one for each block.
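
The size columns of Table 1 follow from simple arithmetic: a concatenated data table holds n × (total number of variables) values, whereas the consensus kernel always holds n × n values. The short sketch below reproduces the size gain factors. The function name is hypothetical, and the value n = 16 for each two-class Arabidopsis model is an inference (two classes of 8 plants, consistent with 29,360 / 1835 = 16) and is not stated explicitly in the table.

```python
def size_gain(n_obs, n_vars_per_block):
    """Return (data size, kernel size, gain factor) for a multiblock dataset."""
    data_units = n_obs * sum(n_vars_per_block)   # concatenated table: n x total variables
    kernel_units = n_obs * n_obs                 # consensus kernel: n x n
    return data_units, kernel_units, data_units / kernel_units

# Arabidopsis models: 16 observations (assumed 8 + 8), ESI- 686 and ESI+ 1149 variables.
data_units, kernel_units, gain = size_gain(16, [686, 1149])
print(data_units, kernel_units, round(gain))

# For the two other datasets only the sizes reported in Table 1 are used.
reported = {"Red wines": (21_060_000, 6561), "NCI-60": (226_236, 289)}
gains = {name: round(data / kernel) for name, (data, kernel) in reported.items()}
print(gains)
```

The gain grows with the number of variables per observation, which is why the red wine NMR dataset benefits most from the kernel representation.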

  6. 35 J.Boccard, D.N. Rutledge /Analytica Chimica Acta 769 (2013) 30– 39 [M+H]+m/z ESI+ sulforaphan 8-(methylsulfinyl)-octane N-hydroxy-l-tryptophan phytodienoic i.e. 3.94 Theright only compounds the [M+HCOO]−m/z OPC-4-Glc dn-OPDA 2.72 whose analyses. Finally, and the detected and Taken metabolic and pathway direct hormone), strain reaction, OPDA strongly ously thaliana only are OPC-4. of accumulation the a indole was tern. sinapoyl pathway. caceae esters sistent response meaningful observed Putative comparison ous nuclear tion and targeted From between isation metabolic drawn. expected cal mode, camalexin 201.0442 at 2.30 min, by tion can Additionally, two mentary investigation compounds are valid than multiple analytical sources. Inthe particular case of the integra- i.e. [M+H]+m/z 178.0334 at1.39 min, 1-isothiocyanato- of ESI+ and ESI− data blocks, information about molecular ions [M+H]+m/z 234.0951 at2.95 min and be derived from the comparison ofthe two ionisation modes. [M+H]+m/z 220.0813 at2.61 min. Oxo- while some signals are expected tobe common tothe acid (OPDA) was detected inboth ionisation modes, data blocks, the remaining would be specific and bring comple- [M+H]+m/z [M−H]−m/z 291.1931 at3.96 min and 293.2097 at information, as illustrated inthis example. The integrative min. of analytical data matrices highlights therefore the part of the plot (area 3) corresponding toions increased that are measured incommon, aswell as those that inthe Wnd90 specimens was poorly populated but relevant measured by only one method. Finally, itismore likely tofind could be detected inthe negative mode. They included and reliable information when multiple data sources agree formiate adducts of OPC-4 glucoside conjugate (OPC-4-Glc) when relying on asingle source. 445.2053 at 2.41 min and putative hydroxylated (HO-OPC-4-Glc) [M+HCOO]−m/z 461.2028 at 2.20, and 4.2. Wine grape varieties (dataset 2) glucoside (dn-OPDA-Glc) [M+HCOO]−m/z 471.223 at min. 
The two latter compounds are possibly novel oxylipins 4.2.1. Dataset The extracts (Saint-Émilion) NMR widespread However, associated These pigments, ing composition ers variety, The sauvignon types (vintages), to types of connectivity–gradient-accelerated performed [38]. nolic removed two table cates, 27 (1Hdimension) Alternatively, block measured DA A wine puted grape model obtained ing cross-validation ple 103random DQ2value those SD the DQ2=0.847 ting prediction 2–characteristics and modelling definite identification will require additional spectroscopic second example involves the analysis of polyphenolic from aseries of 27 red wines from Château Cheval Blanc (1H–13C) compounds with decreased levels inboth the wound based on two-dimensional heteronuclear the mutant phenotype were located inthe bottom left part of spectroscopy. Polyphenols are bioactive compounds that are plot (area 4). Relevant compounds could be associated tothe innature and display a number of biological activities. ions, such as sinapic acid [M−H]−m/z 223.059 at 1.62 min subtle structural variations and relative ratio are closely sinapoyl malate [M−H]−m/z 339.0664 at1.63 min. with the botanical origin and the plant’s environment. together, these results provided acoherent view of the compounds, derived from grape tannins and anthocyanin pathways involved inthe defence response towounding are related tokey sensory properties of red wines, includ- the lesion mimic mutant. Inboth cases, the jasmonate colour, taste and mouthfeel. The analysis of the polyphenolic acd2-2 was strongly activated with increased levels of JA, its istherefore expected tohighlight relevant biomark- precursor OPC-4, HO-JA, JA-Ile (the most potent bioactive for quality assessment and authentication inrelation togrape HO-JA-Ile, and HOOC-JA-Ile [33].However, the production area and vintage. 
acd2-2 constituted an even more extreme phenotype of the defence response, as massive amounts of JA were synthesised. Additionally, OPDA and dn-OPDA levels, two precursors of JA biosynthesis, were also increased in the acd2-2 phenotype. This fact was previously reported as a consequence of hypersensitive response in A. thaliana [34]. Polar glucosylated metabolites that were detected in wounded plants, namely OPC-4-Glc [35] and HO-OPC-4-Glc, were possibly associated to an inactivation or elimination process of JA. This suggests a loss of regulation that allows the spreading of lesions to the whole leaf in the mutant plants. Camalexin was also observed in the acd2-2 mutant strain, with the detection of the compound itself and N-hydroxy-L-tryptophan, a metabolite related to its biosynthesis. A strong activation of the glucosinolate pathways constitutes another hypothesis that is suggested by the results obtained for the acd2-2 metabolic patterns. On the other hand, the marked decreases of sinapic acid and sinapoylmalate might indicate a disruption of the phenylpropanoid pathway. Under normal conditions, A. thaliana and other Brassicaceae accumulate UV protective compounds, such as sinapic acid and sinapoylmalate in leaves. Globally, the results were consistent with the current biological knowledge about the wound response [36] and the acd2-2 strain phenotype [37] and provided insights into the metabolic networks involved in these phenomena.

The identities of metabolites were partially based on the comparison of exact mass and retention time values from previous studies, including MS fragmentation patterns [35] and capillary nuclear magnetic resonance (CapNMR) [33]. A definitive identification of the other relevant compounds remains however challenging and these preliminary results constitute a starting point for further investigations.

From an analytical point of view, even if partial redundancy between data matrices was observed, the combination of both ionisation modes allowed a more complete picture of the complex events occurring in leaves from Arabidopsis plants to be obtained. Even if no major improvements in terms of prediction are expected in the case of redundant datasets, chemical or biological knowledge can be gained by the combination of data provided […]

The dataset included three grape varieties, namely cabernet sauvignon (CS), cabernet franc (CF) and merlot noir (M); three types of soil, i.e. clay, sand and gravel; and three years of harvest, namely 1997–1999. Each combination was investigated to explore the complete experimental space (3 varieties × 3 soil types × 3 vintages). An extraction of the total phenolic content of each wine was achieved and heteronuclear multiple bonding correlation spectroscopy (HMBC–GAS) was performed according to the experimental protocol described in […]. NMR parameters were optimised for the analysis of phenolic secondary metabolites. Uninformative spectral regions were removed and the data resolution was reduced by averaging every […] points along both dimensions, generating a 413 × 625 data matrix for each observation. As three independent analytical replicates, including extraction and NMR, were measured for each of the 27 wines, a multiway data structure of 81 samples × 413 variables (13C dimension) × 625 variables was obtained.

Such a multiway tensor can be considered as a multiblock dataset with a matrix of 81 observations and 625 variables across 413 data blocks. The proposed Consensus OPLS-DA approach is therefore suited to handle this type of data structure. A log transform was applied for the pretreatment of the 81 NMR spectra. A Consensus OPLS-DA classification model was computed to highlight specific polyphenolic patterns according to the grape varieties of the three classes corresponding to CS, CF and M. A model with two predictive and two orthogonal latent variables was selected as the best model based on the DQ² value computed during cross validation (DQ² = 0.876). In that case, a leave-three-out procedure was performed to account for sample analytical replicates. Permutation tests were carried out with random models to assess the model validity. The true model was clearly separated and statistically different from those obtained with the random models (p < 0.001, mean = −0.124, SD = 0.086, n = 1000). Due to the high dimensionality of the dataset, the MBPLS model was not computable but a prediction accuracy of […] could be calculated for concatenated matrices by setting an equal weight of one to each block, highlighting the improved prediction ability of the proposed method.
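The permutation test reported for the wine model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_fn` is a hypothetical stand-in for whatever figure of merit is recomputed for each random model (a cross-validated DQ² in the text), `mean_gap` is a toy statistic used only to make the example runnable, and the add-one empirical p-value is one common convention that the source does not specify.

```python
import numpy as np


def permutation_test(score_fn, X, y, n_perm=1000, seed=0):
    """Compare the score of the true labels with scores of shuffled labels.

    score_fn(X, y) -> float is a hypothetical placeholder for the model
    quality statistic (the paper uses a cross-validated DQ2 value).
    """
    rng = np.random.default_rng(seed)
    true_score = score_fn(X, y)
    # Each "random model" is scored on labels permuted across observations.
    null_scores = np.array(
        [score_fn(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    # Empirical one-sided p-value with an add-one correction
    # (an assumption; the source only reports p < 0.001).
    p_value = (1 + np.sum(null_scores >= true_score)) / (n_perm + 1)
    return true_score, null_scores, p_value


def mean_gap(X, y):
    """Toy statistic (assumption): class-mean gap on the first variable."""
    return abs(X[y == 1, 0].mean() - X[y == 0, 0].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 15)
    X = rng.normal(0.0, 0.1, size=(30, 4))
    X[:, 0] += 5.0 * y  # strong class effect on the first variable
    score, null, p = permutation_test(mean_gap, X, y, n_perm=200)
    print(f"true score = {score:.2f}, null mean = {null.mean():.2f}, p = {p:.4f}")
```

With analytical replicates, as in the wine dataset, labels would more sensibly be permuted per wine rather than per spectrum so that replicates stay together; that refinement is left out of the sketch.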

  7. Fig. 4. Consensus OPLS-DA predictive tp1 vs. tp2 (A) and orthogonal to1 vs. to2 (B) score plots. (A) Observations from the CS class are symbolised by black diamonds (♦), samples from the CF class by black crosses (×) and wines from the M class by black circles (●). (B) Samples from the Clay soil type are symbolised by white triangles (△), wines from the Sand group by white diamonds (◇) and observations from the Gravel soil type by black stars (*).

The distribution of the wine samples was assessed by two score plots, namely a predictive score plot based on the two predictive components (tp1 vs. tp2) and an orthogonal score plot based on the orthogonal ones (to1 vs. to2). In both score plots, analytical replicates were tightly clustered, confirming the repeatability of the experimental protocol. As expected, a clear partition of the three grape varieties was observed on the predictive score plot (Fig. 4A). Interestingly, the orthogonal score plot provided additional information with a rough separation of the soil types (Fig. 4B). This fact illustrates the ability of the proposed method to distinguish predictive from orthogonal variations that exist in multiblock datasets, based on the OPLS-DA framework. It also has to be emphasised that orthogonal latent variables may provide relevant additional information about samples variability, useful for classification purposes.

4.2.2. Dataset 2 – biological interpretation

Red wines possess a high content of phenolic compounds and these secondary metabolites were found to be discriminant for samples classification according to the grape varieties. The loadings of each block were back-calculated and concatenated to generate a 413 × 625 loading matrix for each latent variable. Such a map representation of the loadings allows interpreting the contributions of the variables directly in the original variable space with respect to the 1H and 13C dimensions. The loading maps computed for the two predictive components pp1 and pp2 are provided as Supplementary material S4. The most relevant areas were the aromatic region (chemical shifts of ∼7 ppm for 1H and 120–150 ppm for 13C) and the alkyl area (chemical shifts of 0–2 ppm for 1H and 20–40 ppm for 13C). Numerous discriminant signals were detected and relevant NMR patterns related to spin systems, such as doublets and triplets, could be highlighted. The identification of individual polyphenolic compounds from the spectral descriptors is however beyond the scope of this manuscript.

This second example highlighted the ability of the proposed Consensus OPLS-DA methodology to handle efficiently multiway data structures as a special case of multiblock data fusion, when the number of variables is identical for all data blocks. The genetic background of the grape had an influence on polyphenolic profiles, but the type of soil was also responsible for major variations that were extracted in the orthogonal components. NMR spectral variations related to the year of harvest were not observed in the model but are also likely to occur. Additional prediction models with classes based on the soil types or the vintages may be helpful to highlight phenolic patterns related to these specific aspects.

4.3. NCI-60 cancer cell lines (dataset 3)

4.3.1. Dataset 3 – characteristics and modelling

The third example refers to a selection of data from a publicly available repository of the National Cancer Institute, i.e. the NCI-60 dataset, which includes gene expression analysis as well as data from metabolomics and proteomics experiments [39]. It provides experimental data obtained from 60 human cancer cell lines derived from nine tissue origins, such as breast, colon, lung, ovary, blood and skin. These cell lines constitute key models for in vitro cancer research and they are used for extensive anti-cancer drug screening. Two classes were selected in this study, namely cell lines from non-small-cell lung carcinoma (NSCLC, n = 9) and melanoma (MELA, n = 8). Three data sources were chosen, i.e. transcriptomics data (Affymetrix U95A chip, 12,626 variables), proteomics data (2D gel electrophoresis, Western blots and reverse-phase protein lysate microarrays, 330 variables), and metabolomics data (LC and GC–MS, 352 variables). The combination of these data sources was expected to provide a global profiling of the cell lines in an integrative systems biology perspective. The Consensus OPLS-DA strategy was applied for the differential analysis of the two selected tumour origins by the simultaneous analysis of the three blocks of data. Autoscaling was performed as data pre-processing. A model with one predictive and one orthogonal latent variable was evaluated as the best model according to DQ² estimated by LOOCV (DQ² = 0.723) and a clear partition of the classes was obtained. The block contributions of the predictive latent variable indicated the specific importance of the proteomic block (39.1%), the transcriptomic data (33.3%) and the metabolomic table (27.6%). The score plot and the contributions of the NCI-60 model are presented in Fig. 5. Permutation tests were done with 10³ replicates to test model validity and the true model DQ² value was clearly distinguished and statistically different from the random models (p < 0.001, mean = −0.295, SD = 0.229, n = 1000). Due to the high dimensionality of the dataset, the MBPLS model was not computable but the prediction accuracy computed with equal weights of one was DQ² = 0.699, indicating a lower prediction power compared to Consensus OPLS-DA.
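Because the three NCI-60 blocks differ widely in width (12,626, 330 and 352 variables), it can help to see how blocks of unequal size may be brought to a common n × n representation before fusion. The sketch below is an illustration under assumptions, not the published algorithm: it autoscales each block, forms the linear kernel X·Xᵀ, scales each kernel to unit Frobenius norm (a choice made here for illustration) and sums the kernels with equal weights. The function names are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Centre each column and scale it to unit variance."""
    Xc = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return Xc / sd


def consensus_linear_kernel(blocks, weights=None):
    """Weighted sum of Frobenius-normalised linear kernels (illustrative)."""
    if weights is None:
        weights = np.ones(len(blocks))  # equal weights: an assumption
    n = blocks[0].shape[0]
    K = np.zeros((n, n))
    for w, X in zip(weights, blocks):
        Xs = autoscale(np.asarray(X, dtype=float))
        Ki = Xs @ Xs.T                    # linear kernel, n x n
        K += w * Ki / np.linalg.norm(Ki)  # Frobenius norm is the default
    return K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 17 observations (9 + 8 cell lines); block widths shrunk for the demo.
    blocks = [rng.normal(size=(17, p)) for p in (60, 8, 12)]
    K = consensus_linear_kernel(blocks)
    print(K.shape)
```

Whatever the number of variables per block, every kernel has the same 17 × 17 shape, so the cost of the downstream model depends on the number of observations rather than on the 12,626 transcript variables.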

  8. Fig. 5. Consensus OPLS-DA score plot (A) and contributions (B) for the NCI-60 model. Samples from the NSCLC class are symbolised by black crosses (×) and observations from the MELA class by black diamonds (♦).

Fig. 6. Loadings Q–Q plots of sample data vs. standard normal distribution for the metabolomic (A), the transcriptomic (B) and the proteomic data block (C). Grey area indicates selected variables for further biological interpretation.

Individual loadings of each block were back-calculated for the predictive latent variable, to detect metabolite, protein or transcript level differences between NSCLC and MELA cell lines. A Q–Q plot was assessed to compare the loadings of each block to a normal distribution. The majority of the variables' loadings were normally distributed, while extreme loading values were expected to correspond to relevant biomarkers. Such a plot allowed threshold values to be defined for the selection of the most promising variables by a simple visual examination of the loadings (Fig. 6).

4.3.2. Dataset 3 – biological interpretation

It should be emphasised that the majority of the variables of the three data blocks were identified genes, metabolites or proteins. This information is of crucial importance for subsequent model interpretation. As the variables of multivariate profiles have to be considered together rather than individually, subsets of relevant variables detected on the Q–Q plots according to their contribution to the model were further processed by ontological analysis. Such an approach aims to relate a relevant biological context to a particular set of up- or down-regulated identified metabolites, proteins or genes. For that purpose, on-line bioinformatics tools were used and each method was performed with default parameters.

A metabolite set enrichment analysis was performed with the MetaboAnalyst v2.0 web server [40]. The metabolic pattern related to the melanoma cell lines indicated an altered methionine metabolism, abnormal protein biosynthesis, dysregulation of the malate–aspartate shuttle and modifications related to ammonia recycling and urea cycle, the citric acid cycle and glutathione metabolism. On the other hand, several metabolic pathways could be associated with the NSCLC cell lines, i.e. pentose phosphate pathway, fructose and mannose degradation, galactose metabolism, tryptophan metabolism, and purine/pyrimidine metabolism.

Separate gene and protein enrichment analyses were performed for the differentially expressed variables from the transcriptomic and proteomic data blocks, respectively, with the Database for Annotation, Visualisation and Integrated Discovery v6.7 (DAVID) [41]. Melanoma cell lines were characterised by gene transcripts responsible for melanocyte differentiation, pigmentation, protein transport, melanin metabolic process, melanosome and mesenchymal cell development. On the other hand, the NSCLC class was associated with regulation of cell death, protein kinase cascade, Notch signalling, regulation of cell morphogenesis and nucleotide binding. Protein set enrichment analysis linked melanoma cell lines to biological processes such as protein kinase phosphorylation, serine/threonine protein kinase, ErbB signalling pathway, regulation of cell proliferation and regulation of cell cycle. The NSCLC phenotype was associated with negative regulation of cell death, regulation of epithelial cell proliferation, tyrosine protein kinase and EGF signalling pathway.

In a global perspective, these results provide a broad overview of the biochemical events occurring in cancer cells according to their tissue origin, towards integrative biology. The distinction of the two classes of tumour was achieved on the basis of transcriptomics, metabolomics and proteomics, allowing the integration of different layers of molecular information and providing a consistent picture with the current biological knowledge. Sound results were obtained and relevant biological pathways could be evidenced in both cases.
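The loading selection described at the start of this slide (comparison of each block's loadings with a normal distribution, then thresholding of the extreme values) can be sketched as below. This is a rough illustration under assumptions: the paper sets thresholds by visual examination of the Q–Q plots, whereas the numeric `cutoff` here is a hypothetical stand-in, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
from statistics import NormalDist

import numpy as np


def qq_points(loadings):
    """Return (theoretical normal quantiles, sorted standardised loadings)."""
    loadings = np.asarray(loadings, dtype=float)
    z = (loadings - loadings.mean()) / loadings.std(ddof=1)
    n = z.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (assumption)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, np.sort(z)


def select_extreme(loadings, cutoff=2.5):
    """Indices of variables whose standardised loading exceeds the cutoff.

    The cutoff is hypothetical; in the paper it is read off the Q-Q plot.
    """
    loadings = np.asarray(loadings, dtype=float)
    z = (loadings - loadings.mean()) / loadings.std(ddof=1)
    return np.flatnonzero(np.abs(z) > cutoff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    loadings = rng.normal(size=300)
    loadings[5], loadings[42] = 10.0, -10.0  # two artificial "biomarkers"
    theo, sample = qq_points(loadings)
    print(select_extreme(loadings))
```

Plotting `sample` against `theo` gives the Q–Q plot; points that leave the diagonal at either end correspond to the candidate variables passed on to enrichment analysis.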

  9. Fig. 7. Frequency distribution plot of the Y predicted values from 10³ bootstrap resampling experiments. The NSCLC and MELA populations showed slightly overlapping distributions.

4.3.3. Classification robustness – bootstrap resampling

Finally, as an accurate classification of the tumour origin could be helpful to improve clinical diagnostics and subsequent treatment, the robustness of the model predictive ability was assessed by a bootstrap procedure. Such an approach is an alternative to inference based on parametric assumptions that provides empirical estimates of computed parameters. A series of 10³ resampled dataset replicates, with observations randomly taken with replacement from the original dataset, were computed and cross-validated. A predicted value of 0 corresponded to a sample of the NSCLC class, while a value of 1 was associated with MELA cell lines. The resulting distributions of the predicted responses were helpful to evaluate the overlap of predictions from the two populations. A frequency distribution plot of the predicted Y values from both classes is provided in Fig. 7. According to these results, a predicted Y value below 0.25 can be associated with cells from NSCLC origin, while a value beyond 0.65 relates to melanoma cells. The intermediate area (0.25–0.65) cannot exclude the (unlikely) occurrence of wrong predictions if a threshold of 0.5 is applied for class attribution (NSCLC < 0.5 and MELA > 0.5). This zone corresponds to about 20% of the predicted values. Bootstrap resampling experiments provided therefore valuable information about the distribution and the robustness of the response prediction. However, it has to be noted that the small sample size presented some limitations.

5. Conclusion

This work proposes a relevant methodology for Omics data fusion based on a multiblock strategy and OPLS-DA. It is simple in its implementation, computationally rational and it maintains the OPLS modelling advantages, including interpretability based on the partitioning of systematic Y-predictive and orthogonal variation. A consensus space from multiple data blocks can be computed to integrate complementary or shared pieces of information into a single model. This strategy is parsimonious in computer requirements and therefore well-adapted to the massive datasets that are common in Omics studies. The prediction results are improved compared to those obtained with the established MBPLS algorithm and data concatenation, while computer-intensive permutation tests and bootstrap resampling can be easily and quickly achieved. However, the proposed strategy is not purely dedicated to prediction but also to the description of the different sources of variation at hand. Indeed, the lack of optimisation of the block weights constitutes a limitation of the proposed method in terms of predictive power. Moreover, the directions of variations summarised by orthogonal latent variables constitute an average over all data blocks. Further developments will be undertaken to develop a weighting scheme accounting for these specific variations.

The method was implemented for three real cases examples. A first dataset from the metabolomics context was investigated to highlight its ability to integrate data from complementary analytical conditions in MS (ESI− and ESI+). A global monitoring of the regulatory events occurring in A. thaliana leaves after wounding and in the acd2-2 mutant strain was obtained. Known biomarkers could be detected and the fusion of the two data sources was helpful to achieve a coherent biological interpretation. A second example, based on spectroscopic data obtained from two-dimensional (1H–13C) heteronuclear NMR, illustrated the aptitude of the method to separate the predictive and the orthogonal variability from multiway data structures efficiently. Red wines phenolic extracts were reliably classified according to grape variety and additional information was obtained regarding the influence of the soil type on the polyphenolic profiles. A third dataset was used to illustrate the power of the Consensus OPLS-DA strategy to handle heterogeneous data tables from transcriptomics, proteomics and metabolomics in the context of integrative biology. Cancer cell lines from two tissue origins, i.e. NSCLC and melanoma, were selected from the publicly available NCI-60 dataset for a differential analysis. The combination of the three data sources provided an overall picture of the characteristic patterns of each tumour type. Significant metabolic pathways and biological processes were found and the prediction ability of the model was confirmed by bootstrap resampling calculations.

As kernel-based methods constitute promising tools for the fusion of heterogeneous data types, the presented methodology is expected to have a broad field of applications for integrative approaches involving several data sources. Despite being exclusively applied to discriminant analysis in the present study, it is expected to be also relevant for quantitative regression purposes. Additionally, the product X·Xᵀ matrix was applied to preserve linearity but non-linear kernels (e.g. polynomial or Gaussian) can be integrated in the modelling strategy. In that case, adequate strategies such as pseudo-samples prediction will be mandatory to assess variables contributions. Promising perspectives are foreseen with the incorporation of prior biological or chemical knowledge through the kernel function. Finally, the Consensus OPLS-DA method presented in this work can be applied to any experimental setup involving several data sources to characterise a given set of observations.

Acknowledgements

The Swiss Foundation for Grants in Biology and Medicine, and Novartis are thanked for supporting this work (grant PASMP3-140064 to JB). We thank Gaétan Glauser and Jean-Luc Wolfender for the UHPLC-TOF/MS metabolomic data and their helpful comments regarding the Arabidopsis dataset.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.aca.2013.01.022.

References

[1] A.R. Joyce, B.O. Palsson, The model organism as a system: integrating 'omics' datasets, Nat. Rev. Mol. Cell Biol. 7 (2006) 198–210.
[2] P. Jonsson, S.J. Bruce, T. Moritz, J. Trygg, M. Sjostrom, R. Plumb, J. Granger, E. Maibaum, J.K. Nicholson, E. Holmes, H. Antti, Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets, Analyst 130 (2005) 701–707.
[3] M. Perez-Enciso, M. Tenenhaus, Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach, Hum. Genet. 112 (2003) 581–592.

  10. [4] J. Trygg, S. Wold, Orthogonal projections to latent structures (O-PLS), J. Chemom. 16 (2002) 119–128.
[5] S. Wold, H. Antti, F. Lindgren, J. Ohman, Orthogonal signal correction of near-infrared spectra, Chemom. Intell. Lab. Syst. 44 (1998) 175–185.
[6] M. Bylesjö, M. Rantalainen, O. Cloarec, J.K. Nicholson, E. Holmes, J. Trygg, OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification, J. Chemom. 20 (2006) 341–351.
[7] M. Rantalainen, M. Bylesjö, O. Cloarec, J.K. Nicholson, E. Holmes, J. Trygg, Kernel-based orthogonal projections to latent structures (K-OPLS), J. Chemom. 21 (2007) 376–385.
[8] T.G. Doeswijk, A.K. Smilde, J.A. Hageman, J.A. Westerhuis, F.A. van Eeuwijk, On the increase of predictive performance with high-level data fusion, Anal. Chim. Acta 705 (2011) 41–47.
[9] A.K. Smilde, J.A. Westerhuis, S. de Jong, A framework for sequential multiblock component methods, J. Chemom. 17 (2003) 323–337.
[10] L.E. Wangen, B.R. Kowalski, A multiblock partial least squares algorithm for investigating complex chemical systems, J. Chemom. 3 (1989) 3–20.
[11] A.K. Smilde, M.J. van der Werf, S. Bijlsma, B.J. van der Werff-van-der Vat, R.H. Jellema, Fusion of mass spectrometry-based metabolomics data, Anal. Chem. 77 (2005) 6729–6736.
[12] T. Moyon, F. Le Marec, E.M. Qannari, E. Vigneau, A. Le Plain, F. Courant, J.P. Antignac, P. Parnet, M.C. Alexandre-Gouabau, Statistical strategies for relating metabolomics and proteomics data: a real case study in nutrition research area, Metabolomics (2012) 1–12.
[13] L. Eriksson, M. Toft, E. Johansson, S. Wold, J. Trygg, Separating Y-predictive and Y-orthogonal variation in multi-block spectral data, J. Chemom. 20 (2006) 352–361.
[14] M. Bylesjö, D. Eriksson, M. Kusano, T. Moritz, J. Trygg, Data integration in plant biology: the O2PLS method for combined modeling of transcript and metabolite data, Plant J. 52 (2007) 1181–1191.
[15] F. Bach, G. Lanckriet, M. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in: Proceedings of the Twenty-first International Conference on Machine Learning, ACM, Banff, Alberta, Canada, 2004, p. 6.
[16] G. Lanckriet, N. Cristianini, P. Bartlett, L. Ghaoui, M. Jordan, Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res. 5 (2004) 27–72.
[17] A. Smolinska, L. Blanchet, L. Coulier, K.A.M. Ampt, T. Luider, R.Q. Hintzen, S.S. Wijmenga, L.M.C. Buydens, Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis, PLoS ONE 7 (2012) e38163.
[18] M.E. Timmerman, H.A.L. Kiers, A.K. Smilde, E. Ceulemans, J. Stouten, Bootstrap confidence intervals in multi-level simultaneous component analysis, Br. J. Math. Stat. Psychol. 62 (2009) 299–318.
[19] F. Lindgren, P. Geladi, S. Wold, The kernel algorithm for PLS, J. Chemom. 7 (1993) 45–59.
[20] S. Rännar, F. Lindgren, P. Geladi, S. Wold, A PLS kernel algorithm for data sets with many variables and fewer objects. 1. Theory and algorithm, J. Chemom. 8 (1994) 111–125.
[21] S. Yu, L.C. Tranchevent, Y. Moreau, Kernel-based data fusion for machine learning: methods and applications in bioinformatics and text mining, Springer, Heidelberg, Germany, 2011.
[22] A.K. Smilde, H.A.L. Kiers, S. Bijlsma, C.M. Rubingh, M.J. van Erk, Matrix correlations for high-dimensional data: the modified RV-coefficient, Bioinformatics 25 (2009) 401–405.
[23] K. Van Deun, A.K. Smilde, M.J. van der Werf, H.A.L. Kiers, I. Van Mechelen, A structured overview of simultaneous component based data integration, BMC Bioinform. 10 (2009).
[24] J.A. Westerhuis, E.J.J. van Velzen, H.C.J. Hoefsloot, A.K. Smilde, Discriminant Q(2) (DQ(2)) for improved discrimination in PLSDA models, Metabolomics 4 (2008) 293–296.
[25] P.W.T. Krooshof, B. Üstün, G.J. Postma, L.M.C. Buydens, Visualization and recovery of the (bio)chemical interesting variables in data analysis with support vector machine classification, Anal. Chem. 82 (2010) 7000–7007.
[26] M. Bylesjö, M. Rantalainen, J.K. Nicholson, E. Holmes, J. Trygg, K-OPLS package: kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space, BMC Bioinform. 9 (2008).
[27] F. van den Berg, Multi-block Toolbox for MATLAB, 2004.
[28] J.T. Greenberg, A.L. Guo, D.F. Klessig, F.M. Ausubel, Programmed cell-death in plants – a pathogen-triggered response activated coordinately with multiple defense functions, Cell 77 (1994) 551–563.
[29] G. Glauser, D. Guillarme, E. Grata, J. Boccard, A. Thiocone, P.A. Carrupt, J.L. Veuthey, S. Rudaz, J.L. Wolfender, Optimized liquid chromatography-mass spectrometry approach for the isolation of minor stress biomarkers in plant extracts and their identification by capillary nuclear magnetic resonance, J. Chromatogr. A 1180 (2008) 90–98.
[30] J. Boccard, A. Kalousis, M. Hilario, P. Lanteri, M. Hanafi, G. Mazerolles, J.L. Wolfender, P.A. Carrupt, S. Rudaz, Standard machine learning algorithms applied to UPLC-TOF/MS metabolic fingerprinting for the discovery of wound biomarkers in Arabidopsis thaliana, Chemom. Intell. Lab. Syst. 104 (2010) 20–27.
[31] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, J. Chemom. 12 (1998) 301–321.
[32] S. Wiklund, E. Johansson, L. Sjostrom, E.J. Mellerowicz, U. Edlund, J.P. Shockcor, J. Gottfries, T. Moritz, J. Trygg, Visualization of GC/TOF-MS-based metabolomics data for identification of biochemically interesting compounds using OPLS class models, Anal. Chem. 80 (2008) 115–122.
[33] G. Glauser, E. Grata, L. Dubugnon, S. Rudaz, E.E. Farmer, J.L. Wolfender, Spatial and temporal dynamics of jasmonate synthesis and accumulation in Arabidopsis in response to wounding, J. Biol. Chem. 283 (2008) 16400–16407.
[34] M.X. Andersson, M. Hamberg, O. Kourtchenko, A. Brunnstrom, K.L. McPhail, W.H. Gerwick, C. Gobel, I. Feussner, M. Ellerstrom, Oxylipin profiling of the hypersensitive response in Arabidopsis thaliana – formation of a novel oxo-phytodienoic acid-containing galactolipid, arabidopside E, J. Biol. Chem. 281 (2006) 31528–31537.
[35] G. Glauser, J. Boccard, S. Rudaz, J.L. Wolfender, Mass spectrometry-based metabolomics oriented by correlation analysis for wound-induced molecule discovery: identification of a novel jasmonate glucoside, Phytochem. Anal. 21 (2010) 95–101.
[36] M. Erb, G. Glauser, Family business: multiple members of major phytohormone classes orchestrate plant stress responses, Chem. Eur. J. 16 (2010) 10280–10289.
[37] G.K. Pattanayak, S. Venkataramani, S. Hortensteiner, L. Kunz, B. Christ, M. Moulin, A.G. Smith, Y. Okamoto, H. Tamiaki, M. Sugishima, J.T. Greenberg, Accelerated cell death 2 suppresses mitochondrial oxidative bursts and modulates cell death in Arabidopsis, Plant J. 69 (2012) 589–600.
[38] S. Masoum, D. Jouan-Rimbaud Bouveresse, J. Vercauteren, M. Jalali-Heravi, D.N. Rutledge, Discrimination of wines based on 2D NMR spectra using learning vector quantization neural networks and partial least squares discriminant analysis, Anal. Chim. Acta 558 (2006) 144–149.
[39] R.H. Shoemaker, The NCI60 human tumour cell line anticancer drug screen, Nat. Rev. Cancer 6 (2006) 813–823.
[40] J.G. Xia, N. Psychogios, N. Young, D.S. Wishart, MetaboAnalyst: a web server for metabolomic data analysis and interpretation, Nucleic Acids Res. 37 (2009) W652–W660.
[41] D.W. Huang, B.T. Sherman, R.A. Lempicki, Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res. 37 (2009) 1–13.
