
A Martingale Framework for Concept Change Detection in Time-Varying Data Stream Ho Shen-Shyang


Presentation Transcript


  1. A Martingale Framework for Concept Change Detection in Time-Varying Data Stream Ho Shen-Shyang sho@gmu.edu Department of Computer Science George Mason University

  2. Preview: • Problem: In a data streaming setting, data points are observed one by one. The concepts to be learned from the data stream may change infinitely often. • How do we detect the changes efficiently? • Other Topics: Concept Drift, Anomaly Detection, ... • Testing Exchangeability Online (Vovk et al., ICML 2003)

  3. Outline: • Background: Strangeness, Martingale, Exchangeability • Martingale Framework - Two Tests • Theoretical Justifications • Additional Theoretical Results • Experimental Results

  4. Strangeness Measure α (Saunders et al., IJCAI 1999): a score of how different a data point is from the rest. • Support Vector Machine: the value of the Lagrange multiplier or the distance from the hyperplane (we use the SVM Lagrange multiplier, computed with an incremental SVM (Cauwenberghs and Poggio, NIPS 2000)) • K-nearest-neighbor rule: A/B, where A is the sum of the distances from a point to its k nearest points with the same label and B is the sum of the distances from the point to its k nearest points with a different label

  5. Testing Exchangeability: Definitions • Let { Zi : 1 ≤ i < ∞ } be a sequence of random variables (r.v.). A finite sequence of r.v. Z1,...,Zn is exchangeable if the joint distribution p(Z1,...,Zn) is invariant under any permutation of the indices of the r.v. • A martingale is a sequence of r.v. { Mi : 0 ≤ i < ∞ } such that Mn is a measurable function of Z1,...,Zn for all n = 0, 1, ... (M0 is a constant, say 1) and the conditional expectation of Mn+1 given M1,...,Mn is equal to Mn, i.e. E(Mn+1 | M1,...,Mn) = Mn

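  6. Testing Exchangeability (Vovk et al., ICML 2003) The randomized p-value of the new point $z_n$ is $p_n = V(Z \cup \{z_n\}, \theta_n) = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n}$, where $\alpha_i$ is the strangeness of $z_i$ (i = 1, ..., n) and $\theta_n$ is drawn uniformly from [0,1]. The randomized power martingale is $M_n^{(\varepsilon)} = \prod_{i=1}^{n} \varepsilon \, p_i^{\varepsilon - 1}$, where ε in [0,1] (say 0.92) and M0 = 1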
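As a rough sketch of the two quantities on this slide, the code below computes a randomized p-value from a list of strangeness scores and applies one multiplicative update of the power martingale, M_n = ε p_n^(ε-1) M_(n-1). The formulas follow the randomized power martingale of Vovk et al. as understood here; the function names and the toy scores are assumptions.

```python
import random

def p_value(strangeness, theta):
    """Randomized p-value of the newest score, strangeness[-1].

    Counts scores strictly greater than the newest one, plus a theta-weighted
    count of ties (the newest score always ties with itself).
    """
    a_n = strangeness[-1]
    greater = sum(a > a_n for a in strangeness)
    equal = sum(a == a_n for a in strangeness)
    return (greater + theta * equal) / len(strangeness)

def power_martingale_update(m_prev, p, eps=0.92):
    """One step of the power martingale: M_n = eps * p_n**(eps - 1) * M_{n-1}."""
    return m_prev * eps * p ** (eps - 1)

# Toy run: the newest point is the strangest seen so far.
scores = [0.3, 0.5, 0.4, 2.0]
theta = 1.0 - random.random()        # uniform on (0, 1], avoids p = 0
p_n = p_value(scores, theta)         # small, since no score exceeds 2.0
m_n = power_martingale_update(1.0, p_n)
```

With ε = 0.92, a p-value of 1 multiplies the martingale by 0.92, while small p-values multiply it by a factor above 1, which is what lets a run of strange points push the martingale upward.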

  7. Performing Kolmogorov-Smirnov Test on the p-value distribution as data is observed one by one. Skewed p-value distribution: small p-values inflate the martingale values

  8. Martingale Framework: Test for Change Detection Consider the simple null hypothesis H0: “no concept change in the data stream” against the alternative hypothesis H1: “concept change occurs in the data stream”

  9. Martingale Framework: Test for Change Detection Martingale Test 1 (MT1) 0 < Mn(ε)< λ where λ is a positive number. One rejects the null hypothesis when Mn(ε) ≥ λ. Martingale Test 2 (MT2) 0 < | Mn(ε) - Mn-1(ε) |< t where t is a positive number. One rejects the null hypothesis when | Mn(ε) - Mn-1(ε) | ≥ t.
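The two tests can be sketched as a single loop over a stream of p-values. This is an illustrative sketch only: the function name `detect_changes`, the choice to reset the martingale to 1 after each detection, and the synthetic p-values are assumptions, and the power-martingale update ε p^(ε-1) is taken from Vovk et al. as understood here.

```python
def detect_changes(p_values, eps=0.92, lam=None, t=None):
    """Return 0-based indices at which the chosen martingale test rejects H0.

    Pass lam for Martingale Test 1 (reject when M_n >= lam) or
    t for Martingale Test 2 (reject when |M_n - M_{n-1}| >= t).
    """
    if (lam is None) == (t is None):
        raise ValueError("specify exactly one of lam (MT1) or t (MT2)")
    m_prev = 1.0
    detections = []
    for n, p in enumerate(p_values):
        m_new = m_prev * eps * p ** (eps - 1)
        if lam is not None:
            rejected = m_new >= lam
        else:
            rejected = abs(m_new - m_prev) >= t
        if rejected:
            detections.append(n)
            m_prev = 1.0          # assumed restart after a detected change
        else:
            m_prev = m_new
    return detections

# Synthetic stream: unremarkable p-values, then a run of very small ones.
stream = [0.5] * 10 + [0.01] * 20
mt1 = detect_changes(stream, lam=20.0)
mt2 = detect_changes(stream, t=3.5)
no_change = detect_changes([0.5] * 30, lam=20.0)
```

In this toy stream both tests fire only after the p-values turn small, and neither fires on the stream without small p-values.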

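  10. Justification for Martingale Test 1: Doob's Maximal Inequality Assuming that { Mi : 0 ≤ i < ∞ } is a nonnegative martingale, Doob's Maximal Inequality states that for any λ > 0 and 0 ≤ n < ∞, $\lambda \, P\left(\max_{k \le n} M_k \ge \lambda\right) \le E(M_n)$. Hence, if E(Mn) = E(M0) = 1, then $P\left(\max_{k \le n} M_k \ge \lambda\right) \le \frac{1}{\lambda}$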
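Doob's bound says that, when no change occurs, the martingale reaches λ with probability at most 1/λ, so λ = 20 caps the false alarm rate at 5%. The simulation below is a sanity check of that bound under stated assumptions: p-values are drawn independently and uniformly (their distribution under exchangeability), the martingale is the power martingale with update ε p^(ε-1), and the function name and default parameters are invented for the example.

```python
import random

def doob_false_alarm_rate(lam=20.0, eps=0.92, n_steps=200, n_runs=2000, seed=0):
    """Fraction of simulated no-change streams whose martingale reaches lam."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(n_runs):
        m = 1.0
        for _ in range(n_steps):
            p = 1.0 - rng.random()        # uniform on (0, 1], avoids p = 0
            m *= eps * p ** (eps - 1)
            if m >= lam:
                alarms += 1
                break
    return alarms / n_runs

rate = doob_false_alarm_rate()
bound = 1.0 / 20.0
```

The empirical rate should stay at or below the bound; the bound is typically loose, so the observed rate is usually much smaller than 1/λ.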

  11. Justification for Martingale Test 2: Hoeffding-Azuma Inequality Let c1, ..., cm be positive constants and let Y1, ..., Ym be a martingale difference sequence with |Yk| ≤ ck for each k. Then for any t ≥ 0, $P\left(\left|\sum_{k=1}^{m} Y_k\right| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 \sum_{k=1}^{m} c_k^2}\right)$. At each n, the martingale difference is bounded, and it is largest when pn is 1/n for the deterministic martingale (θn = 1 for all n)

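  12. Justification for Martingale Test 2: When m = 1, the Hoeffding-Azuma Inequality becomes $P\left(|Y_1| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 c_1^2}\right)$. Assuming that Mn-1(ε) = M0(ε) = 1 and using the update $M_n^{(\varepsilon)} = \varepsilon \, p_n^{\varepsilon - 1} M_{n-1}^{(\varepsilon)}$ with pn = 1/n, the bound on the difference is $c_1 = \varepsilon \, n^{1 - \varepsilon} - 1$, so $P\left(\left|M_n^{(\varepsilon)} - M_{n-1}^{(\varepsilon)}\right| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 \left(\varepsilon \, n^{1 - \varepsilon} - 1\right)^2}\right)$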
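To get a feel for the size of the Hoeffding-Azuma bound, the sketch below evaluates 2 exp(-t² / (2 Σ c_k²)) for given constants c_k. The general formula is the standard inequality; the single-step constant c = ε n^(1-ε) - 1 used in the example is an assumption derived from the power-martingale update with p_n = 1/n and M_(n-1) = 1, and the function name is hypothetical.

```python
import math

def hoeffding_azuma_bound(t, c):
    """Upper bound on P(|Y_1 + ... + Y_m| >= t) when |Y_k| <= c[k]."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * sum(ck ** 2 for ck in c)))

# Single-step case (m = 1) for the power martingale, under the stated assumption.
eps, n = 0.92, 1000
c1 = eps * n ** (1.0 - eps) - 1.0
bound_small_t = hoeffding_azuma_bound(0.5, [c1])
bound_large_t = hoeffding_azuma_bound(3.5, [c1])
```

The bound shrinks quickly as the threshold t grows, so a larger t trades detection speed for fewer false alarms.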

  13. Comparison:

  14. Some Theoretical Results for Martingale Test 1 (Ho & Wechsler, UAI 2005) • The martingale test based on Doob's Inequality is an approximation of the sequential probability ratio test: [equation not preserved], where α is the desired size (type I error) and β is the probability of a type II error • The mean delay time from the true change point is: [equation not preserved]

  15. Experiments • Precision = Number of Correct Detections / Number of Detections: the probability that a detection is actually correct • Recall = Number of Correct Detections / Number of True Changes: the probability that the system recognizes a true change • Delay time (for a detected change): the number of time units from a true change point to the detected change point, if any
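These three measures can be computed from a list of true change points and a list of detection times. The matching rule below (a true change is detected if some detection falls at or after it and before the next true change, and only the first such detection counts as correct) is an assumption, since the slide does not spell it out; the function name `evaluate` is also hypothetical.

```python
def evaluate(true_changes, detections):
    """Return (precision, recall, delays) under the assumed matching rule."""
    true_changes = sorted(true_changes)
    detections = sorted(detections)
    delays = []
    for i, change in enumerate(true_changes):
        next_change = true_changes[i + 1] if i + 1 < len(true_changes) else float("inf")
        hits = [d for d in detections if change <= d < next_change]
        if hits:
            delays.append(hits[0] - change)   # delay of the first detection
    correct = len(delays)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    return precision, recall, delays

# Three true changes; four detections, of which one is early and one is a repeat.
precision, recall, delays = evaluate([100, 200, 300], [50, 110, 120, 305])
```

In this toy case two of the three changes are detected, and two of the four detections are counted as correct.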

  16. Experimental Results: Synthetic Data Stream with noise (10-D Rotating Hyperplane) – Precision and Recall

  17. Experimental Results: Synthetic Data Stream – Mean and Median Delay Time

  18. Experimental Results: Numerical (WaveNorm & TwoNorm) and Categorical data streams (Nursery)

  19. Experimental Results: Multi-class data streams (Modified USPS data-set) Dataset: 10 classes, 256 dimensions, 7291 data points Data stream: 3 classes.

  20. Experimental Results: Multi-class data streams (Modified USPS data-set)

  21. Conclusions: • Our martingale approach is an efficient, one-pass incremental algorithm that • Does not require a sliding window on the data stream • Does not require monitoring the performance of a base classifier as data is streaming • Works well for high-dimensional, multi-class data streams • Is theoretically justified

  22. Conclusions/Future (Current) Work: • Previous work: Kifer et al. (VLDB 2004), Fan et al. (SDM 2004), Wald (1947), Page (1957), ... • Extension to unlabeled and one-class data streams • Applications: Keyframe Extraction, Anomaly Detection, Adaptive Classifier (Ho and Wechsler, IJCAI 2005) • Comparison using different classifiers (i.e. different strangeness measures, including weak classifiers) • Comparison with other change detection algorithms • http://cs.gmu.edu/~sho/research/change_detection.html Acknowledgement: Vladimir Vovk, Harry Wechsler.
