
Finding surprising patterns in a time series database in linear time and space

Eamonn Keogh, Stefano Lonardi, and Bill 'Yuan-chi' Chiu, University of California, Riverside, CA. KDD '02. 鍾宜珍, 黃介揚, 侯元忠, 張仲威.



Presentation Transcript


  1. Finding surprising patterns in a time series database in linear time and space • Eamonn Keogh, University of California, Riverside, CA • Stefano Lonardi, University of California, Riverside, CA • Bill 'Yuan-chi' Chiu, University of California, Riverside, CA • KDD '02 • 鍾宜珍, 黃介揚, 侯元忠, 張仲威

  2. Outline • Introduction • TARZAN algorithm • Experimental evaluation • Conclusion

  3. What is a time series? • A time series is a collection of observations made sequentially in time • Examples: weather patterns, commodity prices, economic activity

  4. What is a surprising pattern? • Surprising: a pattern that is not expected • In this paper: a pattern is surprising when its observed frequency departs from its expected frequency by more than we are willing to accept • Example: human electrocardiogram
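The surprise criterion above can be sketched as a comparison of observed and expected counts. This is a simplified illustration, not the paper's exact scoring: it assumes fixed-length (length-m) patterns, takes the expected count in the test string to be the reference count scaled by the number of length-m windows, and uses a hypothetical `threshold` parameter. According to the outline, TARZAN derives expected frequencies from a Markov model, which this sketch omits.

```python
from collections import Counter


def substring_counts(s: str, m: int) -> Counter:
    # Count every (overlapping) length-m window of s.
    return Counter(s[i:i + m] for i in range(len(s) - m + 1))


def surprising_patterns(test: str, reference: str, m: int, threshold: float) -> dict:
    """Return {pattern: score} for length-m patterns whose observed count in
    `test` departs from its expected count by more than `threshold`.

    Assumption (simplified, not TARZAN's exact rule): the expected count is
    the reference count scaled by the ratio of window counts.
    """
    observed = substring_counts(test, m)
    ref = substring_counts(reference, m)
    scale = (len(test) - m + 1) / (len(reference) - m + 1)
    scores = {}
    for w in set(observed) | set(ref):
        z = observed[w] - ref[w] * scale  # observed minus expected
        if abs(z) > threshold:
            scores[w] = z
    return scores


print(surprising_patterns("ababbb", "ababab", m=2, threshold=1.5))  # {'bb': 2.0}
```

Here "bb" never occurs in the reference but occurs twice in the test string, so it is the only pattern whose departure exceeds the threshold.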

  5. Introducing TARZAN

  6. Outline of TARZAN • discretize (abcaabb…) • build suffix tree • Markov chain • calculate expected frequency • compute scores • surprising pattern?

  7. Step 1: Discretizing the time series • discretize (abcaabb…)

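  8. Why? • A Markov chain is a discrete-time stochastic process over a finite set of states, so the real-valued series must first be converted into a string of symbols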

  9. How?

  10. Discretizing time series

  11. Discretizing time series

  12. Discretizing time series

  13. Discretizing time series

  14. Discretizing time series • 0.9, 0.9, 0.6, -1.1, -0.3 • After sorting • -1.1, -0.3, 0.6, 0.9, 0.9

  15. Discretizing time series

  16. Discretizing time series Aaacb

  17. Step 2: Build the suffix tree and the Markov model • build suffix tree • Markov chain • calculate expected frequency

  18. Suffix trees • Advantage: space-efficient (a suffix tree for a string of length n can be built in O(n) time and space)

  19. Suffix trees • The suffixes of "cocoa": cocoa, ocoa, coa, oa, a
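To show why indexing every suffix makes frequency counting easy, here is a naive suffix trie for "cocoa": after inserting each suffix, the number of occurrences of any substring is the count stored at the node reached by spelling it out. Note that this naive trie needs O(n²) time and space; the linear bound requires a compressed suffix tree built with, for example, Ukkonen's algorithm, which this sketch does not implement. The class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character.
    children: dict = field(default_factory=dict)
    # Number of suffixes passing through this node, i.e. how many times the
    # string spelled from the root to this node occurs in the text.
    count: int = 0


def build_suffix_trie(text: str) -> TrieNode:
    """Naive O(n^2) suffix trie: insert every suffix of `text`."""
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
            node.count += 1
    return root


def occurrences(root: TrieNode, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of a non-empty `pattern`."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return 0
        node = node.children[ch]
    return node.count


trie = build_suffix_trie("cocoa")
print(occurrences(trie, "co"), occurrences(trie, "coa"), occurrences(trie, "x"))  # 2 1 0
```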

  20. What is a Markov model?

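  21. Two-state Markov chain (states 0 and 1) with transition matrix $P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}$, where $p_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$ and each row sums to 1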

  22. What is the probability that one pig is born on day 1 (state 0) and two pigs are born on day 4 (state 1)?
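Day 1 to day 4 is three transitions, so given state 0 on day 1, the probability of state 1 on day 4 is the (0, 1) entry of P³. The slide's actual matrix values were not preserved, so the sketch below uses a hypothetical P purely for illustration. If the day-1 state is itself random, multiply the result by Pr(X₁ = 0).

```python
def mat_mul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def n_step(P, n):
    # Start from the identity matrix and multiply by P n times: P^n.
    result = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical transition matrix (the slide's values were not preserved).
P = [[0.7, 0.3],
     [0.4, 0.6]]
P3 = n_step(P, 3)  # day 1 -> day 4 is three transitions
print(round(P3[0][1], 3))  # 0.417
```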

  23. Back to the paper

  24. Transition matrix:

  25. We still don't know the true model • Estimate it with a maximum likelihood estimator • Let y be a substring of x and • then,
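The slide's estimator formulas were lost in extraction. As a hedged illustration of the idea, the sketch below fits a first-order Markov chain by maximum likelihood, p̂(b | a) = N(ab) / N(a followed by any symbol), and uses it to estimate the expected count of a longer pattern by starting from the observed count of its first pair and extending one symbol at a time. The first-order assumption and the function names are choices made here, not necessarily what the paper uses.

```python
from collections import Counter


def transition_mle(s: str) -> dict:
    # N(ab): counts of adjacent symbol pairs.
    pair_counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    # N(a.): how often each symbol is followed by another symbol.
    from_counts = Counter(s[:-1])
    # Maximum likelihood estimate p(b | a) = N(ab) / N(a.).
    return {(p[0], p[1]): c / from_counts[p[0]] for p, c in pair_counts.items()}


def expected_count(s: str, w: str) -> float:
    """Expected occurrences of w (len(w) >= 2) under a first-order chain fitted to s."""
    assert len(w) >= 2
    probs = transition_mle(s)
    pair_counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    expected = float(pair_counts[w[:2]])
    # Multiply in one estimated transition per additional symbol of w.
    for a, b in zip(w[1:], w[2:]):
        expected *= probs.get((a, b), 0.0)
    return expected


print(expected_count("abcaabb", "abc"))  # 1.0
```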

  26. Explaining the TARZAN algorithm

  27. Comparing the trees

  28. TARZAN algorithm

  29. Experiments

  30. Experiments • sensitivity • selectivity • comparison with TSA-tree and IMM

  31. Experiment 1: sensitivity • Can the anomaly be found?

  32. Experiment 2 • The power demand of a Dutch research facility • Input one full year of data and check whether the abnormal power demand on holidays is detected

  33. Experiment 3: selectivity • How? • Random walk data can contain any possible pattern • As the size goes to infinity, every pattern should repeat • IMM and TSA-tree are not considered • IMM: the self set becomes saturated, so nothing will be surprising • TSA-tree: does not learn from experience
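The slide does not say how the random walk data were generated. A minimal sketch, assuming Gaussian steps and a fixed seed for reproducibility, is shown below; the function name and parameters are illustrative.

```python
import random


def random_walk(n: int, seed: int = 0) -> list:
    """Random walk of length n starting at 0 with N(0, 1) steps (assumption)."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    values = [0.0]
    for _ in range(n - 1):
        values.append(values[-1] + rng.gauss(0.0, 1.0))
    return values


walk = random_walk(1000, seed=42)
print(len(walk))  # 1000
```

The idea on the slide is that, with enough such data, patterns keep recurring, so a selective detector should flag few of them as surprising.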

  34. Experiment 3

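  35. Conclusion • Pros • linear time and space • great sensitivity • no need to define in advance what counts as a surprising pattern • Cons • the shape of the tree cannot be controlled and may be unbalanced • selectivity was not compared with other algorithms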

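  36. Applications • Networking: traffic anomalies, detecting attacks • Biomedicine: detecting arrhythmia, observing abnormal brain-wave activity • Factory monitoring: whether power supply or conveyor speed is abnormal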

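  37. Discussion • How quickly are anomalies detected? • Is it suitable for real-time detection? • What actions follow once an anomaly is detected? • Can the anomaly be traced back to where it originated? • Can we tell what caused the anomaly?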

  38. References • C. Shahabi, X. Tian, and W. Zhao. TSA-tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data. In Proc. 12th International Conference on Scientific and Statistical Database Management, 2000. • D. Dasgupta and S. Forrest. Novelty Detection in Time Series Data using Ideas from Immunology. In Proc. of The International Conference on Intelligent Systems, 1999.

  39. Questions?
