
Mining of Frequent Patterns from Sensor Data



Presentation Transcript


  1. Mining of Frequent Patterns from Sensor Data Presented by: Ivy Tong Suk Man Supervisor: Dr. B C M Kao 20 August, 2003

  2. Outline • Motivation • Problem Definition • Algorithm • Apriori with data transformation • Interval-List Apriori • Experimental Results • Conclusion

  3. Motivation • [Timeline figure: reading is 25ºC at t=0, 27ºC at t=1, 28ºC at t=5, 26ºC at t=10] • Continuous items • reflect values from an entity that changes continuously in the external environment • Update → change of state of the real entity • E.g. temperature reading data • Initial temperature: 25ºC at t=0s • Sequence of updates: <timestamp, new_temp> <1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s, …> … • t=0s to 1s, 25ºC; t=1s to 5s, 27ºC; t=5s to 10s, 28ºC • What is the average temperature from t=0s to 10s? • Ans: (25×1 + 27×4 + 28×5)/10 = 27.3ºC
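The average above is a time-weighted mean over a piecewise-constant reading. A minimal Python sketch (not from the original slides; function and parameter names are illustrative):

```python
def time_weighted_average(initial_value, updates, t_end):
    """Average a piecewise-constant sensor reading over [0, t_end].

    updates: (timestamp, new_value) pairs, sorted by timestamp.
    """
    total = 0.0
    prev_t, prev_v = 0, initial_value
    for t, v in updates:
        if t >= t_end:
            break
        total += prev_v * (t - prev_t)   # value held on [prev_t, t)
        prev_t, prev_v = t, v
    total += prev_v * (t_end - prev_t)   # final segment up to t_end
    return total / t_end

# The slide's example: 25ºC at t=0, then <1s,27ºC>, <5s,28ºC>, <10s,26ºC>
print(time_weighted_average(25, [(1, 27), (5, 28), (10, 26)], 10))  # 27.3
```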

  4. Motivation • Time is a component in some applications • E.g. stock price quotes, network traffic data • “Sensors” are used to monitor some conditions, for example: • Prices of stocks: by getting quotations from a finance website • Weather: measuring temperature, humidity, air pressure, wind, etc. • We want to find correlations of the readings among a set of sensors • Goal: To mine association rules from sensor data

  5. Challenges • How different is it from mining association rules from market basket data? • Time component: when searching for association rules in market basket data, the time field is usually ignored, as there is no temporal correlation between transactions • Streaming data: data arrives continuously, possibly infinitely, and in large volume

  6. Notations • We have a set of sensors R = {r1, r2, …, rm} • Each sensor ri has a set of numerical states Vi • Assume binary states for all sensors • Vi = {0,1} ∀i s.t. ri ∈ R • Dataset D: a sequence of updates of sensor state in the form <ts, ri, vi> where ri ∈ R, vi ∈ Vi • ts: timestamp of the update • ri: sensor to be updated • vi: new value of the state of ri • For sensors with binary states • update in form <ts, ri>, as the new state can be inferred by toggling the old state

  7. Example • R = {A,B,C,D,E,F} • Initial states: all off • D: <1,A> <2,B> <4,D> <5,A> <6,E> <7,F> <8,E> <10,A> <11,F> <13,C> • [Timeline figure: A ON on [1,5) and from t=10; B ON from t=2; C ON from t=13; D ON from t=4; E ON on [6,8); F ON on [7,11)]

  8. More Notations • An association rule is a rule, satisfying certain support and confidence restrictions, in the form X → Y, where X ⊆ R, Y ⊆ R and X ∩ Y = ∅

  9. More Notations • Association rule X → Y has confidence c: in c% of the time when the sensors in X are ON (with state = 1), the sensors in Y are also ON • Association rule X → Y has support s: in s% of the total length of history, the sensors in X and Y are all ON

  10. More Notations • TLS(X) denotes the Total LifeSpan of X • Total length of time during which all the sensors in X are ON • T – total length of history • Sup(X) = TLS(X)/T, Conf(X → Y) = Sup(X ∪ Y) / Sup(X) • Example: T = 15s, TLS(A) = 9, TLS(AB) = 8 Sup(A) = 9/15 = 60% Sup(AB) = 8/15 = 53% Conf(A → B) = 8/9 = 89% • [Timeline figure: A ON on [1,5) and [10,15); B ON from t=2]
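Spelled out in code, the slide's numbers check out (a trivial sketch; values taken from the example above):

```python
T = 15                      # total length of history (seconds)
tls = {'A': 9, 'AB': 8}     # total lifespans read off the timelines

sup_A  = tls['A'] / T       # 0.60  -> 60%
sup_AB = tls['AB'] / T      # 0.533 -> 53%
conf   = sup_AB / sup_A     # 0.888 -> 89%, i.e. TLS(AB)/TLS(A) = 8/9
print(sup_A, sup_AB, conf)
```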

  11. Algorithm A • Transform & Apriori • Transform the sequence of updates into the form of market basket data (see the sketch below) • At each update • take a snapshot of the states of all sensors • output all sensors with state = ON as a transaction • attach Weight(transaction) = Lifespan(this update) = timestamp(next update) − timestamp(this update)
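A minimal sketch of this transformation, assuming updates arrive as sorted (timestamp, sensor) pairs and that each update toggles a binary sensor (helper names are mine, not from the slides):

```python
def transform_to_transactions(updates, t_end):
    """Algorithm A: turn a toggle-update stream into weighted transactions.

    Returns (transaction, weight) pairs, where weight is the lifespan of
    the snapshot, i.e. the time until the next update (or end of history).
    """
    state = set()            # sensors currently ON (all OFF initially)
    transactions = []
    for i, (ts, sensor) in enumerate(updates):
        state ^= {sensor}    # binary sensor: an update toggles its state
        next_ts = updates[i + 1][0] if i + 1 < len(updates) else t_end
        if state:            # skip the all-OFF snapshot
            transactions.append((frozenset(state), next_ts - ts))
    return transactions

D = [(1, 'A'), (2, 'B'), (4, 'D'), (5, 'A'), (6, 'E'),
     (7, 'F'), (8, 'E'), (10, 'A'), (11, 'F'), (13, 'C')]
for txn, w in transform_to_transactions(D, 15):
    print(sorted(txn), w)    # e.g. ['A'] 1, then ['A', 'B'] 2, ...
```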

  12.–17. Algorithm A – Example • Initial states: all off • D: <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>, <8,E>, <10,A>, <11,F>, <13,C>; end of history = 15s • [Animation: the transformed database D' is built one snapshot at a time, at timestamps 1, 2, 4, 5, 6, 7, 8, 10, 11, 13] • Transformed database D' (transaction, weight): {A}, 1; {A,B}, 2; {A,B,D}, 1; {B,D}, 1; {B,D,E}, 1; {B,D,E,F}, 1; {B,D,F}, 2; {A,B,D,F}, 1; {A,B,D}, 2; {A,B,C,D}, 2

  18. Algorithm A • Apply Apriori on the transformed dataset D' • Drawbacks: • A lot of redundancy • Adjacent transactions may be very similar, differing only by the one sensor whose state was updated

  19. Algorithm B • Interval-List Apriori • Uses an “interval-list” format • <X, interval1, interval2, interval3, …> where each intervali is an interval in which all sensors in X are ON • TLS(X) = Σi (intervali.h − intervali.l) • Example: <A, [1,5), [10,15)> TLS(A) = (5−1) + (15−10) = 9 • [Timeline figure: A ON on [1,5) and [10,15)]

  20. Algorithm B • Step 1: For each ri ∈ R, build a list of intervals in which ri is ON by scanning the sequence of updates (see the sketch below) • Calculate the TLS of each ri • If TLS(ri) ≥ min_sup, put ri into L1
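A sketch of Step 1 in Python (my function names; the slides give no code). One scan over the updates opens an interval when a sensor turns ON and closes it when the sensor turns OFF:

```python
from collections import defaultdict

def build_interval_lists(updates, t_end):
    """Build, for each binary sensor, its list of [start, end) ON-intervals."""
    lists = defaultdict(list)
    on_since = {}                     # sensor -> timestamp it turned ON
    for ts, sensor in updates:
        if sensor in on_since:        # was ON: this update turns it OFF
            lists[sensor].append((on_since.pop(sensor), ts))
        else:                         # was OFF: this update turns it ON
            on_since[sensor] = ts
    for sensor, start in on_since.items():
        lists[sensor].append((start, t_end))   # still ON at end of history
    return dict(lists)

def tls(interval_list):
    """Total lifespan: sum of the interval lengths."""
    return sum(h - l for l, h in interval_list)

D = [(1, 'A'), (2, 'B'), (4, 'D'), (5, 'A'), (6, 'E'),
     (7, 'F'), (8, 'E'), (10, 'A'), (11, 'F'), (13, 'C')]
lists = build_interval_lists(D, 15)
print(lists['A'], tls(lists['A']))   # [(1, 5), (10, 15)] 9
```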

  21. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, empty> • <B, empty> • <C, empty> • <D, empty> • <E, empty> • <F, empty>

  22. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, [1,?)> • <B, empty> • <C, empty> • <D, empty> • <E, empty> • <F, empty>

  23. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, [1,?)> • <B, [2,?)> • <C, empty> • <D, empty> • <E, empty> • <F, empty>

  24. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, [1,5)> • <B, [2,?)> • <C, empty> • <D, [4,?)> • <E, empty> • <F, empty>

  25. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, [1,5),[10,?)> • <B, [2,?)> • <C, [13,?)> • <D, [4,?)> • <E, [6,8)> • <F, [7,11)>

  26. Algorithm B – Example • Initial states: all off • D: <1,A>,<2,B>,<4,D>,<5,A>, <6,E>,<7,F>,<8,E>,<10,A>,<11,F>,<13,C> • <A, [1,5),[10,15)> • <B, [2,15)> • <C, [13,15)> • <D, [4,15)> • <E, [6,8)> • <F, [7,11)> End of history T =15s

  27. Algorithm B • Step 2: • Find all larger frequent sensor-sets • Similar to the Apriori Frequent Itemset Property • Any subset of a frequent sensor-set must be frequent • Method: • Generate candidates of size i+1 from frequent sensor-sets of size i • Approach used: join two size-i frequent sensor-sets that agree on i−1 sensors to obtain a sensor-set of size i+1 • May also prune candidates that have subsets that are not frequent • Count the support by merging (intersecting) the interval lists of the two size-i frequent sensor-sets • If sup ≥ min_sup, put into Li+1 • Repeat the process until the candidate set is empty

  28. Algorithm B • Example: • <A, [1,5), [10,15)> • <B, [2,15)> • <AB, [2,5), [10,15)> • [Timeline figure: A and B over the history, T = 15]
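The merge in this example is an ordered intersection of two interval lists. A linear-scan sketch (an assumed implementation, O(m+n) as claimed on the next slide):

```python
def intersect_interval_lists(xs, ys):
    """Intersect two sorted lists of disjoint [low, high) intervals."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo < hi:                   # current intervals overlap
            out.append((lo, hi))
        if xs[i][1] < ys[j][1]:       # advance the list that ends first
            i += 1
        else:
            j += 1
    return out

A = [(1, 5), (10, 15)]
B = [(2, 15)]
print(intersect_interval_lists(A, B))   # [(2, 5), (10, 15)] -> TLS(AB) = 8
```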

  29. Algorithm B (Example) • [Lattice figure, minimum support count = 3] • 1-sets: A (LS: 9), B (LS: 13), C (LS: 2), D (LS: 11), E (LS: 2), F (LS: 4) • 2-sets: AB (LS: 8), AD (LS: 6), AF (LS: 1), BD (LS: 11), BF (LS: 4) • 3-set: ABD (LS: 6)

  30. Algorithm B – Candidate Generation • When generating a candidate sensor-set C of size i from two size-(i−1) sensor-sets LA and LB (subsets of C), we also construct the interval list of C by intersecting the interval lists of LA and LB • Joining two interval lists (of lengths m and n) is a key step in our algorithm • A simple linear scan requires O(m+n) time • There are i different size-(i−1) subsets of C; which two do we pick?

  31. Algorithm B – Candidate Generation • Method 1: • Choose the two lists with the fewest number of intervals • ⇒ store the number of intervals for each sensor-set • Method 2: • Choose the two lists with the smallest count (TLS) • Intuitively, a shorter lifespan implies fewer intervals • Easier to implement • The lifespan is already available from checking whether the sensor-set is frequent (see the sketch below)
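Both heuristics reduce to sorting the size-(i−1) subsets by a cost key. A small sketch (a hypothetical helper, not from the slides):

```python
def pick_lists_to_join(subsets, interval_lists, method='tls'):
    """Pick the two size-(i-1) subsets whose interval lists to intersect.

    method='count': fewest intervals (Method 1);
    method='tls':   smallest total lifespan (Method 2).
    """
    if method == 'count':
        key = lambda s: len(interval_lists[s])      # number of intervals
    else:
        key = lambda s: sum(h - l for l, h in interval_lists[s])  # TLS
    cheapest = sorted(subsets, key=key)[:2]
    return cheapest[0], cheapest[1]
```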

  32. Experiments • Data generation • Simulate data generated by a set of n binary sensors • Make use of a standard market basket dataset • With n sensors, each of which can be either ON or OFF ⇒ 2^n possible combinations of sensor states • Assign a probability to each of the combinations

  33. Experiments – Data Gen • How to assign the probabilities? • Let N be the number of occurrences, in the market basket data, of the transaction that contains exactly the sensors that are ON • E.g. consider R = {A,B,C,D,E,F} • Suppose we want to assign a probability to the sensor state AC (only A and C are ON) • N is the number of transactions that contain exactly A and C and nothing else • Assign prob = N/|D|, where |D| is the size of the market basket dataset (see the sketch below) • Note: a sufficiently large market basket dataset is needed, so that combinations that occur very infrequently are not given ZERO probability
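A toy sketch of this assignment (illustrative names; assumes the basket fits in memory). Each transaction is reduced to its exact item set, and the probability of a sensor-state combination is its relative frequency:

```python
from collections import Counter

def state_probabilities(market_basket):
    """prob(state) = (#transactions containing exactly that item set) / |D|."""
    counts = Counter(frozenset(txn) for txn in market_basket)
    total = len(market_basket)
    return {state: n / total for state, n in counts.items()}

baskets = [{'A', 'C'}, {'A', 'C'}, {'B'}, {'A'}]   # toy market basket data
probs = state_probabilities(baskets)
print(probs[frozenset({'A', 'C'})])                # 0.5
# Combinations never seen in the basket get probability 0 -- hence the
# slide's note that a sufficiently large dataset is needed.
```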

  34. Experiments – Data Gen • Generating sensor-set data • Choose the initial state (at t=0s) • randomly, according to the probabilities assigned • or pick the combination with the highest assigned probability ⇒ first sensor-set state

  35. Experiments – Data Gen • What is the next set of sensor states? • For simplicity, in our model, only one sensor can be updated at a time • For any two adjacent updates, the sensor states at the two time instants differ by only one sensor ⇒ change only one sensor state ⇒ n possible combinations, obtained by toggling each of the n sensor states • We normalize the probabilities of the n combinations by their sum • Pick the next set of sensor states according to the normalized probabilities (see the sketch below) • Inter-arrival time of updates: exponential distribution
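A sketch of the transition step under these assumptions (my helper names; `probs` is the state-probability map from the previous sketch):

```python
import random

def next_state(current, sensors, probs):
    """Toggle exactly one sensor; choose among the n candidate states
    with the assigned probabilities normalized by their sum."""
    candidates = [current ^ frozenset({s}) for s in sensors]
    weights = [probs.get(c, 0.0) for c in candidates]
    if sum(weights) == 0:                 # no candidate ever observed
        return random.choice(candidates)  # fall back to a uniform pick
    return random.choices(candidates, weights=weights, k=1)[0]

def next_update_gap(mean_gap):
    """Inter-arrival time of updates: exponentially distributed."""
    return random.expovariate(1.0 / mean_gap)
```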

  36. Experiments • Market Basket Dataset • 8,000,000 transactions • 100 items • number of maximal potentially large itemsets: 2000 • average transaction length: 10 • average length of maximal large itemsets: 4 • length of the maximal large itemsets: 11 • minimum support: 0.05% • Algorithms: • Apriori: cached mode • IL-apriori: • (a) random join (IL-apriori) • (b) join by smallest lifespan (IL-apriori-S) • (c) join by fewest number of intervals (IL-apriori-C)

  37. Experiments - Results • Performance of the algorithms (larger support): • All IL-apriori algorithms outperform cached Apriori

  38. Experiments - Results • Performance (lower support): • More candidates ⇒ joining interval lists becomes expensive for IL-apriori

  39. Experiments - Results • More long frequent sensor-sets • Apriori has to match the candidates by searching through the DB • IL-apriori-C and IL-apriori-S save a lot of time when joining the lists

  40. Experiments - Results • Peak memory usage • Cached Apriori: stores the whole database • IL-apriori: stores a lot of interval lists when the number of candidates grows large

  41. Experiments - Results (min_sup = 0.02%) • Apriori is faster in the first 3 passes • Running time for IL-apriori drops sharply after that • Apriori has to scan over the whole database in every pass • IL-apriori (C/S) only needs to join relatively short interval lists in later passes

  42. Experiments - Results (min_sup = 0.02%) • Memory requirement for IL-apriori is a lot higher when there are more frequent sensor-set interval lists to join

  43. Experiments - Results (min_sup = 0.05%) • Runtime for all algorithms increases linearly with total number of transactions

  44. Experiments - Results (min_sup = 0.05%) • Memory required by all algorithms increases as the number of transactions increases • The rate of increase is faster for IL-apriori

  45. Conclusions • An interval-list method for mining sensor data is described • The two interval-list joining strategies are quite effective in reducing running time • Memory requirement is quite high • Future Work • Other methods for joining interval lists • Trade-off between time and space • Extending to the streaming case • Consider approaches other than the Lossy Counting algorithm (Manku and Motwani, VLDB ’02)

  46. Q&A
