1 / 39

Dual data driven SIMCA as a one-class classifier

Dual data driven SIMCA as a one-class classifier. Alexey Pomerantsev ICP RAS. Target. Alternative. One-class classifier, e.g. SIMCA. Standard bi- variate normal distribution. Extremes and Outliers.  =0.01 γ =0.05. α is Extreme significance. γ is Outlier significance. Extreme plot. A.

Télécharger la présentation

Dual data driven SIMCA as a one-class classifier

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dual data driven SIMCA as a one-class classifier Alexey PomerantsevICP RAS WSC-9

  2. Target Alternative One-class classifier, e.g. SIMCA WSC-9

  3. Standard bi-variate normal distribution WSC-9

  4. Extremes and Outliers =0.01γ=0.05 αis Extreme significance γis Outlier significance WSC-9

  5. Extreme plot WSC-9

  6. A t EA I + = × J I I PA J J A X TA Principal Component Analysis Karl Pearson, 1901 WSC-9

  7. OD: distance to the model SD: distance within the model Scores & Orthogonal Distances WSC-9

  8. SD & OD distributions SD OD Westerhuis JA, Gurden SP, Smilde AK.Chemom. Intell. Lab. Syst. 2000; 51: 95–114 De Braekeleer K, et al. Chemom. Intell. Lab. Syst.1999; 46:103–116. Nomikos P, MacGregor JF. Technometrics 1995; 37: 41–59. Pomerantsev A. J. Chemometrics, 2008, 22 : 601-609 WSC-9

  9. hu = v u0= ? N = ? Data Driven SIMCA SD OD WSC-9

  10. Total Distance Scores distance (SD) Orthogonal distance (OD) Total distance (TD) WSC-9

  11. Tolerance Areas αis Extreme significance γis Outlier significance WSC-9

  12. Classical Data Driven (CDD) SIMCA Classical Method of Moments Given Then Where WSC-9

  13. Robust Data Driven (RDD) SIMCA ¼ ¼ u(1) ≤ u(2 )≤ .... ≤ u(I-1) ≤ u(I) ½ ½ Robust Method of Moments Given Then Where M=median(u) R=interquartile(u) WSC-9

  14. Dual Data Driven SIMCA Given X=TtP+E h=(h1,...., hI) v=(v1,...., vI) Then WSC-9

  15. Case study I. Simulated data with outliers The numbers of variables, J=3 The numbers of objects, I=100 The number of principal components, A=2The properties are:E() = 0, v11= v22 = v33 = 0.28, rank(V) = 2. The  component properties are: E() = 0, =0.05 (first 97 objects)E() = 0, =0.2 (last 3 objects) WSC-9

  16. SIMCA plots WSC-9

  17. REFERENCE & RDD-SIMCA WSC-9

  18. Totally in 10 data sets with outliers Expected WSC-9

  19. Case study II. Real world data with 2 groups Substance in the closed PE bags, 82 drums measured by NIR.Totally: 246 spectra Group G1: 200 objectsGroup G2: 46 objects ACA 642 (2009) 222-227 WSC-9

  20. Probe position effect WSC-9

  21. Extreme plots Expected number of extremes N=aI Clean subset G1 Contaminated dataset G1+G2 WSC-9

  22. Results of separation Subset G1 revealed Subset G2 revealed WSC-9

  23. Reference WSC-9

  24. Alternatives Target class One-class classification Type II error =1−Type I error WSC-9

  25. How to find β in case AC is known Target Alternative WSC-9

  26. Two-classes discrimination: plums & apples mesh size ? WSC-9

  27. Errors of Type I and Type II WSC-9

  28. Type II error β Alternative Target PCA PC2 PC1 χ'2 χ2 WSC-9

  29. Non-central chi-squared distribution chi-squared distribution non-central chi-squared distribution the noncentrality parameter WSC-9

  30. Calculation of β Total distance of Target class (TC) h0=? ,v0=?, Nh=?, Nv=? Total distance of Alternative class (AC) Type II error WSC-9

  31. Case study II. Real world data with 2 groups Substance in the closed PE bags, 82 drums measured by NIR.Totally: 246 spectra Group G1: 200 objectsGroup G2: 46 objects Type II error estimation WSC-9

  32. G2 = AC1 + AC2 + AC3 + AC4 WSC-9

  33. Total distance c distributions β α WSC-9

  34. Type II validation WSC-9

  35. Risk management givenα calculatedccrit foundβ givenβ foundα calculatedccrit WSC-9

  36. Conclusion 1 Extreme objects play an important role in data analysis. These objects should not be confused with outliers. The number of extremes should be compared to the expected number, coupled with the significance level . Clean dataset Contaminated dataset WSC-9

  37. Conclusion 2 Errors in decision making are inevitable. Reducing one error, we increase the other. The researcher's task is to find the balance of risks. Our approach provides such an opportunity.Examples will be presented in Oxana’s lecture. β α WSC-9

  38. Conclusion 3 The proposed Dual Data Driven PCA/SIMCA approach looks like a fine competitor to the pure classical and to the strictly robust methods. This technique has demonstrated a proper performance in the analysis of both regular and contaminated data sets. Clean dataset Contaminated dataset WSC-9

  39. Thank you for your attention A Lawyer’s Mistake WSC-9

More Related