
John Kalivas, Josh Ottaway, Jeremy Farrell, Parviz Shahbazikah, Department of Chemistry

CONSENSUS MULTIVARIATE CALIBRATION OR MAINTENANCE WITHOUT REFERENCE SAMPLES USING TIKHONOV TYPE REGULARIZATION APPROACHES. John Kalivas, Josh Ottaway, Jeremy Farrell, Parviz Shahbazikah, Department of Chemistry, Idaho State University, Pocatello, Idaho 83209 USA.


Presentation Transcript


  1. CONSENSUS MULTIVARIATE CALIBRATION OR MAINTENANCE WITHOUT REFERENCE SAMPLES USING TIKHONOV TYPE REGULARIZATION APPROACHES
     John Kalivas, Josh Ottaway, Jeremy Farrell, Parviz Shahbazikah
     Department of Chemistry, Idaho State University, Pocatello, Idaho 83209 USA

  2. Outline
     • Multivariate calibration
     • Tikhonov regularization (TR)
     • TR calibration maintenance with reference samples to form full wavelength or sparse models
     • Selecting “a” model
     • Selecting a collection of models
     • Comparison to PLS
     • TR calibration or maintenance without reference samples
     • Examples with comparison to PLS
     • Summary
     • TR variant equations

  3. Spectral Multivariate Calibration
     • y = Xb
       y = m × 1 vector of analyte reference values for m calibration samples
       X = m × n matrix of spectra for n wavelengths
       b = n × 1 regression (model) vector
     • MLR solution requires m ≥ n (wavelength selection)
     • Biased regression solutions such as TR, RR (a TR variant), PLS, and PCR
     • Require meta-parameter (tuning parameter) selection

  4. Quantitation by Tikhonov Regularization (TR)
     • ‖·‖₂ = Euclidean vector 2-norm (vector magnitude or length)
     • General TR in 2-norm
     • Ridge regression (RR) when L = I
     • Depending on the calibration goal, L can have different forms
     • RR is regularized by using I and selecting η to minimize prediction errors (low bias) while simultaneously shrinking the model vector (low variance)
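A minimal sketch of the TR criterion, assuming it takes the common form min ‖Xb − y‖₂² + η²‖Lb‖₂² (the squared weighting η² and the function name `tikhonov` are assumptions, not taken from the slide). Stacking η·L under X turns the penalized problem into one ordinary least-squares problem; L = I gives ridge regression. The data are synthetic.

```python
import numpy as np

def tikhonov(X, y, eta, L=None):
    """Solve min ||Xb - y||^2 + eta^2 ||Lb||^2 as one stacked least-squares problem."""
    n = X.shape[1]
    L = np.eye(n) if L is None else L          # L = I gives ridge regression (RR)
    A = np.vstack([X, eta * L])                # penalty rows appended under the spectra
    rhs = np.concatenate([y, np.zeros(L.shape[0])])
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b

# Synthetic example: 10 calibration samples, 50 wavelengths (m < n).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
y = X @ rng.normal(size=50)

b_small = tikhonov(X, y, eta=0.1)    # weak regularization
b_large = tikhonov(X, y, eta=10.0)   # strong regularization shrinks the model vector
```

Increasing η trades a larger calibration residual for a shorter model vector, which is the bias/variance tradeoff the slide describes.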

  5. Selecting η
     • Cross-validation
     • L-curve graphic (can use with RMSEC)
     • Bias/variance can be assessed
     • Useful for putting RR, PLS, etc. on one plot for objective comparison
     [Figure: L-curve with regions labeled overfitting, best model, and underfitting]
     • C. L. Lawson et al., Solving Least-Squares Problems, Prentice-Hall (1974)
     • P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Press (1998)
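The L-curve idea can be sketched as follows, assuming the curve plots the model vector norm ‖b‖₂ against RMSEC over a grid of η values (the helper names `ridge` and `l_curve` and the synthetic data are illustrative assumptions). Models near the corner of the curve balance the two quantities.

```python
import numpy as np

def ridge(X, y, eta):
    """Ridge regression via stacked least squares: min ||Xb - y||^2 + eta^2 ||b||^2."""
    n = X.shape[1]
    A = np.vstack([X, eta * np.eye(n)])
    rhs = np.concatenate([y, np.zeros(n)])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def l_curve(X, y, etas):
    """Return one (||b||_2, RMSEC) pair per eta value."""
    points = []
    for eta in etas:
        b = ridge(X, y, eta)
        rmsec = np.sqrt(np.mean((X @ b - y) ** 2))
        points.append((np.linalg.norm(b), rmsec))
    return np.array(points)

# Synthetic calibration set: 12 samples, 40 wavelengths, slightly noisy references.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 40))
y = X @ rng.normal(size=40) + 0.05 * rng.normal(size=12)

etas = np.logspace(-3, 2, 30)
curve = l_curve(X, y, etas)   # column 0: model norm, column 1: RMSEC
```

Small η lies at the large-norm, small-error end of the curve (overfitting) and large η at the opposite end (underfitting).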

  6. Calibration Maintenance
     • Need primary model to function over time and/or under new secondary conditions
     • Prepare calibration samples to span all potential spectral variances
     • Not possible with seasonal or geographical effects in some data sets
     • Preprocess primary and secondary data to be robust to new conditions
     • Adjust spectra measured under new conditions to fit the primary model
     • Update the primary model to predict in the new conditions

  7. Calibration Maintenance with TR2
     • Updating an RR model requires a new penalty term
     • Minimize prediction errors for a few samples from new secondary conditions
       M = spectra from secondary conditions
       yM = analyte reference values
     • Avoid measuring many samples by tuning with λ
     • Local centering
     • Respectively mean center X, y, M, and yM
     • Validation spectra centered to M
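A minimal sketch of TR2 with local centering, assuming the criterion is min ‖Xb − y‖₂² + η²‖b‖₂² + λ²‖Mb − yM‖₂² (the squared weights and the names `tr2_fit` and `tr2_predict` are assumptions). X, y and M, yM are each centered to their own means, and validation spectra are centered to the mean of M, as listed on the slide. The data are synthetic.

```python
import numpy as np

def tr2_fit(X, y, M, yM, eta, lam):
    """TR2 model vector from stacked least squares with local centering."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()        # primary set centered to itself
    Mc, yMc = M - M.mean(axis=0), yM - yM.mean()     # standardization set centered to itself
    n = X.shape[1]
    A = np.vstack([Xc, eta * np.eye(n), lam * Mc])
    rhs = np.concatenate([yc, np.zeros(n), lam * yMc])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def tr2_predict(b, V, M, yM):
    """Predict validation spectra V after centering them to the mean of M."""
    return (V - M.mean(axis=0)) @ b + yM.mean()

# Synthetic primary set and a secondary condition simulated as a spectral offset.
rng = np.random.default_rng(2)
b_true = rng.normal(size=20)
X = rng.normal(size=(15, 20))
y = X @ b_true
offset = rng.normal(size=20)
M = rng.normal(size=(4, 20)) + offset      # a few secondary samples
yM = (M - offset) @ b_true
V = rng.normal(size=(6, 20)) + offset      # secondary validation spectra

b = tr2_fit(X, y, M, yM, eta=0.1, lam=10.0)
y_hat = tr2_predict(b, V, M, yM)
```

With λ = 0 the fit reduces to RR on the locally centered primary set; raising λ weights the few secondary samples more heavily without requiring more of them.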

  8. Pharmaceutical Example
     • M. Dyrby et al., Appl. Spectrosc. 56 (2002) 579-585
     • http://www.models.life.ku.dk/datasets; Dept. of Food Sciences, Univ. of Copenhagen
     • 310 escitalopram tablets measured in NIR from 7,400 to 10,507 cm⁻¹ at a resolution of 6 cm⁻¹ for 404 wavelengths
     • Four tablet types based on nominal weight: type 1, type 2, type 3, and type 4
     • Three tablet batches (production scale): laboratory, pilot, and full
     • 30 tablets for each batch and tablet type combination
     [Figure legend: Lab types 1 to 4; Full types 1 to 4]

  9. Objective
     • Using laboratory produced tablets as the primary calibration set
     • Determine active pharmaceutical ingredient (API) concentration in new tablets produced in full production (secondary condition)
       Primary Calibration Space: 30 random lab batch samples with 15 from types 1 and 2 each
       Secondary Calibration Space: 30 random full batch samples with 15 from types 1 and 2 each
       Standardization Set M: 4 random full batch samples with 2 from types 1 and 2 each
       Validation Space: Remaining 30 full batch types 1 and 2
     • Other batch type combinations studied

  10. Example Model Merit Landscapes
      [Figure: RMSEC and RMSEM landscapes versus η and λ]

  11. Model Merit Landscapes
      [Figure: RMSEC and RMSEM landscapes versus η and λ, annotated "Best local centered models" and "A tradeoff region"]
      • Convergence at small λ
      • Secondary conditions are not included in new model
      • Amounts to using primary RR with local centering where secondary validation samples are centered to the mean of M
      • Tradeoff region: prediction of primary degrades while the prediction of secondary improves

  12. Model Merit Landscapes
      [Figure: RMSEC and RMSEM landscapes versus η and λ, annotated "too large η" and "A tradeoff region"]
      • Tradeoff region: prediction of primary degrades while the prediction of secondary improves
      • Further tradeoffs
      • Tradeoff region between ‖b‖₂, RMSEC, and RMSEM
      • Can use an L-curve at a fixed λ value

  13. Model Merit Evaluations
      • Multiple merits can be used to assess tradeoff
      • Respective RMSEC and RMSEM landscapes for R², slope, and intercept
      • L-curves at selected η and λ values
      [Figure: L-curves at λ = 54.29; RMSEV and H landscapes versus η and λ]

  14. Model Updating Results
      Updating Primary Lab Batch Types 1 and 2 to Predict Secondary Full Batch Types 1 and 2
      • Updated primary models predict equivalently to the secondary model when predicting the secondary validation samples
      Lab and Full Batches Types 1 and 2 Self Predicting Using RR

  15. Model Vectors
      [Figure: model vectors for RR lab batch, TR2, and RR full batch; axis: Wavelength, cm⁻¹]

  16. Using PLS
      • PLS (and other methods) can also be used
      • With PLS, the PLS latent vectors (PLS factors) replace the η values
      [Equations: TR2 and PLS forms]

  17. PLS Model Merit Landscapes
      [Figure: RMSEC, RMSEM, and RMSEV landscapes versus PLS factors and λ]
      • Similar landscape trends
      • The discrete factor aspect of PLS can make it difficult to capture the underlying continuity of the landscapes

  18. PLS and TR2 Model Updating Results
      Updating Primary Lab Batch Types 1 and 2 to Predict Secondary Full Batch Types 1 and 2
      • PLS prediction equivalent to TR2
      • The discrete factor aspect of PLS can make it difficult to capture the underlying continuity of the landscapes

  19. Sparse TR Calibration Maintenance
      • TR2:
      • TR2b (sparse model):
      • L = diagonal matrix with l_ii = 1/|b_i|
      • I. F. Gorodnitsky, B. D. Rao, IEEE Transactions on Signal Processing 45 (1997) 600-616
      • TR2-1 (sparse model):
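One way to read the TR2b weighting is as an iteratively reweighted scheme in the spirit of the cited Gorodnitsky and Rao paper. This is an assumption; the slide text gives only L = diag(1/|b_i|). The sketch below assumes the criterion min ‖Xb − y‖₂² + η²‖Lb‖₂² + λ²‖Mb − yM‖₂², with L rebuilt from the previous iterate. Substituting b = Wq with W = diag(|b_i|) gives ‖Lb‖₂ = ‖q‖₂ and avoids dividing by zero. The name `tr2b_fit`, the starting vector, and the iteration count are hypothetical, and the data are assumed to be centered already.

```python
import numpy as np

def tr2b_fit(X, y, M, yM, eta, lam, b0, n_iter=20):
    """Iteratively reweighted TR2 with L = diag(1/|b_i|) from the previous iterate."""
    b = np.asarray(b0, dtype=float).copy()
    n = X.shape[1]
    for _ in range(n_iter):
        w = np.abs(b)                                   # diagonal of W = L^{-1}
        A = np.vstack([X * w, eta * np.eye(n), lam * (M * w)])
        rhs = np.concatenate([y, np.zeros(n), lam * yM])
        q = np.linalg.lstsq(A, rhs, rcond=None)[0]      # ordinary TR2 in the scaled variable q
        b = w * q                                       # map back: b = W q
    return b

# Synthetic data with a sparse underlying model vector.
rng = np.random.default_rng(3)
b_true = np.zeros(30)
b_true[[4, 11, 23]] = [1.0, -2.0, 1.5]
X = rng.normal(size=(10, 30))
y = X @ b_true
M = rng.normal(size=(3, 30))
yM = M @ b_true

b_sparse = tr2b_fit(X, y, M, yM, eta=0.1, lam=1.0, b0=np.ones(30))
```

Starting from a vector of ones makes the first pass identical to TR2. Later passes penalize small coefficients more heavily, and a coefficient that reaches exactly zero stays at zero.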

  20. TR2-1 Sparse Model Merit Landscapes
      [Figure: RMSEC, RMSEM, and RMSEV landscapes versus λ, with models ordered by increasing η]
      • Similar landscape trends
      • For small λ values, the η values are the same across λ
      • At greater λ values, the η values vary across λ

  21. TR2 and TR2-1 Model Updating Results
      Updating Primary Lab Batch Types 1 and 2 to Predict Secondary Full Batch Types 1 and 2
      • TR2-1 prediction results improve over TR2 and PLS
      [Figure: TR2, TR2-1, and PLS panels with cm⁻¹ axes]

  22. Other TR Sparse Maintenance Methods
      • TR2-1b (sparse models when L = I or L ≠ I):
      • TR1-2b (full wavelength when L = I):
      • TR1 (sparse models when L = I or L ≠ I):

  23. Other TR Applications
      Updating a primary model:
      • for extra virgin olive oil adulterant quantitation to a new geographical region (applicable to new seasons)
      • to a new temperature
      • formed on one instrument to work on another

  24. Summary
      • Only a few samples needed for M with appropriate weighting
      • Same samples measured in primary and secondary conditions are not needed
      • Avoids long term stability issue
      • PLS and other methods can be used
      • Discrete nature (PLS factors) can limit landscapes
      • Need to select a pair of tuning parameters for “a” model
      • Requires reference values for yM

  25. Consensus (Ensemble) Modeling
      • Samples predicted with a collection of models
      • Composite (fused) prediction is formed
      • Simple mean prediction used here
      • Typically form models by random sampling across calibration samples and/or variables
      • From collection, filter for model quality
      • Ideal models:
      • High degree of prediction accuracy
      • Small but noteworthy difference between selected models (model diversity)

  26. Consensus TR and PLS Modeling
      • Models formed from varying tuning parameter values
      • Plot predicted values against reference values for X, y and M, yM
      • Use respective R², slope, and intercept model merit values
      • Natural target values:
      • R² → 1
      • Slope → 1
      • Intercept → 0
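The filtering step can be sketched as below, assuming each candidate model is kept only if its predicted-versus-reference fit for both X, y and M, yM is close to the natural targets, with the consensus prediction taken as the simple mean over the kept models. The threshold values and the names `merits`, `is_good`, and `consensus` are hypothetical, and the data are synthetic and uncentered for brevity.

```python
import numpy as np

def merits(y_ref, y_pred):
    """R^2, slope, and intercept of predicted values regressed on reference values."""
    slope, intercept = np.polyfit(y_ref, y_pred, 1)
    r2 = np.corrcoef(y_ref, y_pred)[0, 1] ** 2
    return r2, slope, intercept

def is_good(y_ref, y_pred, r2_min=0.95, slope_tol=0.1, intercept_tol=0.1):
    """Check the merits against the targets R^2 -> 1, slope -> 1, intercept -> 0."""
    r2, slope, intercept = merits(y_ref, y_pred)
    return bool(r2 >= r2_min and abs(slope - 1) <= slope_tol and abs(intercept) <= intercept_tol)

def consensus(models, X, y, M, yM, V):
    """Mean prediction for V over the models that pass on both X, y and M, yM."""
    kept = [b for b in models if is_good(y, X @ b) and is_good(yM, M @ b)]
    if not kept:
        raise ValueError("no model passed the merit thresholds")
    return np.mean([V @ b for b in kept], axis=0), len(kept)

# Synthetic example: two near-ideal models and one badly scaled model.
rng = np.random.default_rng(5)
b_true = rng.normal(size=8)
X, M, V = rng.normal(size=(20, 8)), rng.normal(size=(4, 8)), rng.normal(size=(5, 8))
y, yM = X @ b_true, M @ b_true
models = [b_true, 1.02 * b_true, 0.5 * b_true]

y_consensus, n_kept = consensus(models, X, y, M, yM, V)
```

In practice the candidate models would come from a grid of tuning parameter values (η and λ, or PLS factors and λ) rather than from scaled copies of one vector.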

  27. TR and PLS Consensus Models (RMSEV)
      [Figure: RMSEV landscapes marking the selected models: 1 PLS model (factors versus λ), 348 TR2 models (η versus λ), and 628 TR2-1 models (models with increasing η versus λ)]
      • Fewer PLS models selected due to sharpness of landscapes from the discrete factor nature of PLS
      • Number of “good” models can be made to increase by reducing the increment sizes of η and λ

  28. Consensus Mean Model Updating Results
      Updating Primary Lab Batch Types 1 and 2 to Predict Secondary Full Batch Types 1 and 2
      • The one PLS model predicts best
      • PLS limited to discrete factors where TR allows 0 ≤ η < ∞ to more fully resolve the landscape

  29. Consensus Models and Correlations
      [Figure: TR2 (348 models) and TR2-1 (628 models) panels with cm⁻¹ axis]

  30. Summary
      • Only a few samples needed for M with appropriate weighting
      • Same samples measured in primary and secondary conditions are not needed
      • Avoids long term stability issue
      • Can select “a” model or a collection of models
      • Natural target values (thresholds) with model merits R², slope, and intercept for primary and secondary standardization sets
      • Work in progress
      • Requires reference values for yM

  31. Without Reference Samples
      • Beer’s law: x = y_a k_a + y_i k_i + m + n
        k_a = pure component (PC) analyte spectrum
        k_i = PC spectrum of ith interferent (drift, background, etc.)
        m = rest of the sample matrix
        n = spectral noise
      • Ideal situation: WHEN: THEN:
      • Cannot simultaneously satisfy 1, 2, and 3 to obtain 4

  32. Compromise PCTR2 Model
      N = spectra without analyte, e.g., k_i
      • Minimizing the sum requires a tradeoff between the three conditions
      • The closer the three conditions are met, the more likely the ideal result (4) is obtained
      • Updating the non-matrix-affected PC k_a to predict in current conditions (spanned by N)
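A minimal sketch of a PCTR2-style fit, assuming the criterion is min (k_aᵀb − y_a)² + η²‖b‖₂² + λ²‖Nb‖₂² with y_a = 1 for a unit-concentration pure component spectrum. This form, the squared weights, and the name `pctr2_fit` are assumptions based on the three conditions described here and on non-analyte spectra having reference values of 0. The spectra are synthetic and uncentered.

```python
import numpy as np

def pctr2_fit(k_a, N, eta, lam, y_a=1.0):
    """Stacked least squares: fit the PC analyte spectrum while predicting 0 for rows of N."""
    n = k_a.size
    A = np.vstack([k_a[None, :], eta * np.eye(n), lam * N])
    rhs = np.concatenate([[y_a], np.zeros(n), np.zeros(N.shape[0])])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

# Synthetic pure component spectra for the analyte and one interferent.
rng = np.random.default_rng(4)
k_a = np.abs(rng.normal(size=40))
k_i = np.abs(rng.normal(size=40))
N = np.outer([0.5, 1.0, 2.0], k_i)      # non-analyte spectra measured at current conditions

b = pctr2_fit(k_a, N, eta=1e-3, lam=1e3)

# A mixture obeying Beer's law with analyte amount 0.3 and interferent amount 0.7.
x_new = 0.3 * k_a + 0.7 * k_i
y_hat = x_new @ b
```

In this noise-free example the model vector is driven nearly orthogonal to the interferent spectrum while keeping a unit response to k_a, so the mixture prediction recovers the analyte amount without any analyte reference samples.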

  33. Sources of N
      • PC interferent spectra (reference values are 0)
      • Matrix-affected samples without the analyte (reference values are 0)
      • Constant analyte samples (reference values are 0 after spectra are mean centered)
      • Estimate using samples with reference values
      • Samples for N need to be measured at current conditions
      Goicoechea et al., Chemom. Intell. Lab. Syst. 56 (2001) 73-81

  34. Extra Virgin Olive Oil Adulteration
      • EVOO samples: Crete, Peloponnese, and Zakynthos
      • RR calibration: 56 samples spiked with 5, 10, and 15% (wt/wt) sunflower oil
      • Primary: PC sunflower oil, 1 sample
      • Secondary: EVOO, 25 samples
      • Validation: 22 spiked samples
      • Synchronous fluorescence spectra 270 to 340 nm at Δλ = 20 nm
      [Figure label: Zakynthos]

  35. Model Merit Landscapes
      [Figure: RMSEN, RMSEPC, and RMSEV landscapes versus η and λ]

  36. H Values
      [Figure: H values for RR with PCTR2 cal samples, PCTR2 at η = 9.1e3, and RR full cal, plotted against η and λ]

  37. Model Updating From PC Sunflower
      • Updated PC predicts better than a full calibration
      [Figure: predicted versus reference plots with fitted lines yi = 0.422xi + 0.048 and yi = 0.807xi − 0.0074]

  38. Temperature Data Set
      • Wülfert et al., Anal. Chem. 70 (1998) 1761-1767
      • http://www.models.life.ku.dk/datasets; Dept. of Food Sciences, Univ. of Copenhagen
      • Water, 2-propanol, ethanol (analyte)
      • 850 to 1049 nm at 1 nm intervals at 30, 40, 50, 60, and 70°C
      • Calibration: 13 mixtures from 0% to 67% at 30°C
      • Validation: 6 mixtures from 16% to 66% at 70°C
      • Primary: PC ethanol at 30°C
      • Non-analyte matrix (standardization set) N at 70°C
      • PC interferents water and 2-propanol (2 samples)
      • Blanks (3 samples)
      • Constant analyte (CA, 5 samples)

  39. PCTR2 Model Merit Landscapes
      [Figure: RMSEPC, RMSEN, and RMSEV landscapes versus η and λ]

  40. Model Updating From PC 30°C to 70°C
      Updated PC predicts as well as the secondary model when predicting the secondary validation samples

  41. PCTR2 and PLS Modeling Temperature
      Updating analyte PC at 30°C to 70°C using interferent PC and blanks
      PLS and PCTR2 predict similarly
      [Figure: PCTR2 RMSEV versus η and λ; PLS RMSEV versus factors and λ]

  42. PCTR2 Consensus Modeling Temperature
      • On-going work
      • Cannot use R², slope, and intercept for respective predicted values of the PC and N
      • Set thresholds for RMSEN, RMSEPC, and ‖b‖₂ based on preliminary inspection of landscapes
      • Tradeoff needed between ‖b‖₂, RMSEN, and RMSEPC
      • Can further filter based on predicted values
      • Majority vote
      • Remove outliers
      • Combine predicted value of analyte pure component sample with predicted non-analyte samples to obtain R², slope, and intercept

  43. PCTR Variants (Calibration or Maintenance) • No reference values • With current condition sample reference values • A combination of N and M • Replace with or to obtain sparse models

  44. Summary
      • PCTR2 calibrates (updates) to current conditions without reference samples
      • Only a few new samples needed
      • Can predict better than a full calibration
      • More focused to orthogonalize to the sample matrix
      • Requires PC analyte spectrum
      • Does not have to be matrix-affected
      • Requires non-analyte samples
      • Can be estimated with reference samples
      [Figure annotation: bias, variance]

  45. Other TR Variants

  46. Other On-Going Consensus Modeling
      • In addition to combining a set of models, can combine TR2, PLS, PCTR2, … sets of model predictions
      [Diagram: consensus TR2, TR2-1, PLS, and PCTR2 model sets feeding a final prediction]
