Mining of Frequent Patterns from Sensor Data • Presented by: Ivy Tong Suk Man • Supervisor: Dr. B C M Kao • 20 August, 2003
Outline • Motivation • Problem Definition • Algorithms • Apriori with data transformation • Interval-List Apriori • Experimental Results • Conclusion
Motivation • Continuous items reflect values from an entity that changes continuously in the external environment • An update records a change of state of the real entity • E.g. temperature reading data • Initial temperature: 25ºC at t=0s • Sequence of updates <timestamp, new_temp>: <1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s,..> … • So the reading is 25ºC from t=0s to 1s, 27ºC from t=1s to 5s, and 28ºC from t=5s to 10s • What is the average temperature from t=0s to 10s? Ans: (25×1 + 27×4 + 28×5)/10 = 27.3ºC • [Figure: temperature timeline, 25ºC → 27ºC → 28ºC → 26ºC at t = 0, 1, 5, 10]
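A minimal Python sketch (not from the slides; the function name is mine) that reproduces this time-weighted average:

```python
def time_weighted_average(initial_value, updates, t_end):
    """updates: list of (timestamp, new_value) pairs, sorted by timestamp."""
    total, prev_t, prev_v = 0.0, 0, initial_value
    for t, v in updates:
        if t > t_end:
            break
        total += prev_v * (t - prev_t)    # value held over [prev_t, t)
        prev_t, prev_v = t, v
    total += prev_v * (t_end - prev_t)    # final segment up to t_end
    return total / t_end

# (25*1 + 27*4 + 28*5) / 10 = 27.3, as above
print(time_weighted_average(25, [(1, 27), (5, 28), (10, 26)], 10))
```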
Motivation • Time is a component in some applications, e.g. stock price quotes, network traffic data • “Sensors” are used to monitor conditions, for example: • Stock prices: by getting quotations from a finance website • Weather: measuring temperature, humidity, air pressure, wind, etc. • We want to find correlations among the readings of a set of sensors • Goal: to mine association rules from sensor data
Challenges • How different is this from mining association rules from market basket data? • Time component: when searching for association rules in market basket data, the time field is usually ignored, as there is no temporal correlation between transactions • Streaming data: data arrives continuously, possibly infinitely, and in large volume
Notations • We have a set of sensors R = {r1, r2, …, rm} • Each sensor ri has a set of numerical states Vi • Assume binary states for all sensors: Vi = {0,1} ∀i s.t. ri ∈ R • Dataset D: a sequence of updates of sensor states in the form <ts, ri, vi>, where ri ∈ R, vi ∈ Vi • ts: timestamp of the update • ri: sensor to be updated • vi: new value of the state of ri • For sensors with binary states, an update takes the form <ts, ri>, as the new state can be inferred by toggling the old state
Example • R = {A,B,C,D,E,F} • Initial states: all off • D: <1,A> <2,B> <4,D> <5,A> <6,E> <7,F> <8,E> <10,A> <11,F> <13,C> • [Figure: per-sensor timelines: A toggles at t = 1, 5, 10; B at t = 2; C at t = 13; D at t = 4; E at t = 6, 8; F at t = 7, 11]
More Notations • An association rule is a rule, satisfying certain support and confidence restrictions, of the form X → Y, where X ⊆ R, Y ⊆ R and X ∩ Y = ∅
More Notations • Association rule X → Y has confidence c: in c% of the time when the sensors in X are ON (with state = 1), the sensors in Y are also ON • Association rule X → Y has support s: in s% of the total length of history, the sensors in X and Y are all ON
More Notations • TLS(X) denotes the Total LifeSpan of X: the total length of time during which all sensors in X are ON • T: total length of history • Sup(X) = TLS(X)/T • Conf(X → Y) = Sup(X ∪ Y) / Sup(X) • Example: T = 15s, TLS(A) = 9, TLS(AB) = 8 • Sup(A) = 9/15 = 60% • Sup(AB) = 8/15 ≈ 53% • Conf(A → B) = 8/9 ≈ 89% • [Figure: timelines of A (ON during [1,5) and [10,15)) and B (ON from t = 2)]
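These definitions can be checked against the slide's numbers (a quick sketch; variable names are mine):

```python
T = 15                        # total length of history (seconds)
TLS_A, TLS_AB = 9, 8          # read off the timelines above

sup_A   = TLS_A / T           # 9/15  = 60%
sup_AB  = TLS_AB / T          # 8/15 ~= 53%
conf_AB = sup_AB / sup_A      # = TLS_AB / TLS_A = 8/9 ~= 89%
print(sup_A, sup_AB, conf_AB)
```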
Algorithm A • Transform & Apriori • Transform the sequence of updates into the form of market basket data • At each update, take a snapshot of the states of all sensors and output all sensors with state = ON as a transaction • Attach Weight(transaction) = Lifespan(this update) = timestamp(next update) − timestamp(this update)
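A minimal Python sketch of this transform (assuming binary toggle updates as defined earlier; function and variable names are mine, not from the slides):

```python
def transform(updates, sensors, t_end):
    """updates: sorted list of (timestamp, sensor) toggles; initial states all OFF.
    Returns a list of (transaction, weight) pairs, one per update."""
    state = {s: False for s in sensors}
    transactions = []
    for i, (ts, sensor) in enumerate(updates):
        state[sensor] = not state[sensor]              # apply the toggle update
        next_ts = updates[i + 1][0] if i + 1 < len(updates) else t_end
        snapshot = frozenset(s for s in sensors if state[s])
        transactions.append((snapshot, next_ts - ts))  # weight = snapshot lifespan
    return transactions

D = [(1,'A'), (2,'B'), (4,'D'), (5,'A'), (6,'E'),
     (7,'F'), (8,'E'), (10,'A'), (11,'F'), (13,'C')]
for t, w in transform(D, 'ABCDEF', t_end=15):
    print(sorted(t), w)    # e.g. ['A'] 1, then ['A', 'B'] 2, ...
```

Summing the weights of the transactions containing A gives TLS(A) = 9, matching the earlier example.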
Algorithm A - Example • Initial states: all off • D: <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>, <8,E>, <10,A>, <11,F>, <13,C> • End of history = 15s • Transformed database D′, one weighted transaction per update: {A} at timestamp 1 (weight 1), {A,B} at 2 (weight 2), {A,B,D} at 4 (weight 1), {B,D} at 5 (weight 1), {B,D,E} at 6 (weight 1), {B,D,E,F} at 7 (weight 1), {B,D,F} at 8 (weight 2), {A,B,D,F} at 10 (weight 1), {A,B,D} at 11 (weight 2), {A,B,C,D} at 13 (weight 2)
Algorithm A • Apply Apriori on the transformed dataset D′ • Drawbacks: • A lot of redundancy: adjacent transactions may be very similar, differing only by the one sensor whose state was updated
Algorithm B • Interval-List Apriori • Uses an “interval-list” format: <X, interval1, interval2, interval3, …>, where each intervali is an interval in which all sensors in X are ON • TLS(X) = Σi (intervali.h − intervali.l) • Example: <A, [1,5), [10,15)>, TLS(A) = (5−1) + (15−10) = 9 • [Figure: timeline of A, ON during [1,5) and [10,15)]
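TLS falls out of the interval list directly; a tiny sketch (names are mine):

```python
def tls(intervals):
    """TLS(X): sum of the half-open interval lengths in X's interval list."""
    return sum(hi - lo for lo, hi in intervals)

A = [(1, 5), (10, 15)]
print(tls(A))    # (5-1) + (15-10) = 9, matching the example above
```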
Algorithm B • Step 1: for each ri ∈ R, build a list of the intervals in which ri is ON by scanning the sequence of updates (see the sketch after the example below) • Calculate the TLS of each ri • If TLS(ri) ≥ min_sup, put ri into L1
Algorithm B – Example • Initial states: all off • D: <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>, <8,E>, <10,A>, <11,F>, <13,C> • Scanning the updates in order opens an interval when a sensor turns ON and closes it when the sensor turns OFF: • After <1,A>: <A, [1,?)>, all other lists empty • After <2,B>: <A, [1,?)>, <B, [2,?)> • After <5,A>: <A, [1,5)>, <B, [2,?)>, <D, [4,?)> • After <13,C>: <A, [1,5), [10,?)>, <B, [2,?)>, <C, [13,?)>, <D, [4,?)>, <E, [6,8)>, <F, [7,11)>
Algorithm B – Example • At the end of history T = 15s, intervals still open are closed at T: • <A, [1,5), [10,15)> • <B, [2,15)> • <C, [13,15)> • <D, [4,15)> • <E, [6,8)> • <F, [7,11)>
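A minimal sketch of Step 1 (names are mine): one scan over the toggle updates builds a per-sensor interval list, closing any interval still open at the end of history:

```python
def build_interval_lists(updates, sensors, T):
    """updates: sorted list of (timestamp, sensor) toggles; initial states all OFF."""
    lists = {s: [] for s in sensors}
    open_at = {}                              # sensor -> time it was switched ON
    for ts, s in updates:
        if s in open_at:                      # ON -> OFF: close the interval
            lists[s].append((open_at.pop(s), ts))
        else:                                 # OFF -> ON: open a new interval
            open_at[s] = ts
    for s, lo in open_at.items():             # close intervals still open at T
        lists[s].append((lo, T))
    return lists

D = [(1,'A'), (2,'B'), (4,'D'), (5,'A'), (6,'E'),
     (7,'F'), (8,'E'), (10,'A'), (11,'F'), (13,'C')]
print(build_interval_lists(D, 'ABCDEF', T=15))
# {'A': [(1,5),(10,15)], 'B': [(2,15)], 'C': [(13,15)],
#  'D': [(4,15)], 'E': [(6,8)], 'F': [(7,11)]}
```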
Algorithm B • Step 2: find all larger frequent sensor-sets • Similar to the Apriori frequent-itemset property: any subset of a frequent sensor-set must be frequent • Method: generate candidates of size i+1 from frequent sensor-sets of size i • Approach used: join two size-i frequent sensor-sets that agree on i−1 sensors to obtain a candidate of size i+1 • May also prune candidates that have an infrequent subset • Count the support by merging (intersecting) the interval lists of the two size-i frequent sensor-sets • If sup ≥ min_sup, put the candidate into Li+1 • Repeat until the candidate set is empty
Algorithm B • Example: • <A, [1,5), [10,15)> • <B, [2,15)> • intersecting gives <AB, [2,5), [10,15)> • [Figure: timelines of A and B, T = 15]
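A sketch of the interval-list merge used here (a simple two-pointer linear scan over the sorted lists; names are mine), reproducing the example:

```python
def intersect(a, b):
    """Intersect two sorted lists of half-open intervals in O(m+n) time."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])            # overlap, if any, is [lo, hi)
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] <= b[j][1]:                # advance the list that ends first
            i += 1
        else:
            j += 1
    return out

A = [(1, 5), (10, 15)]
B = [(2, 15)]
print(intersect(A, B))    # [(2, 5), (10, 15)], i.e. <AB, [2,5), [10,15)>
```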
Algorithm B (Example) • [Figure: candidate lattice, min support count = 3 • Size 1: A (LS 9), B (LS 13), C (LS 2), D (LS 11), E (LS 2), F (LS 4) • Size 2: AB (LS 8), AD (LS 6), AF (LS 1), BD (LS 11), BF (LS 4) • Size 3: ABD (LS 6)]
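Putting Step 2 together, a compressed sketch of the level-wise loop (my own simplification: it tries all pairs of size-i frequent sets and omits the subset-based pruning; tls and intersect are as in the adjacent sketches, repeated here for self-containment):

```python
from itertools import combinations

def tls(intervals):
    return sum(hi - lo for lo, hi in intervals)

def intersect(a, b):                              # two-pointer scan, see above
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return out

def il_apriori(L1, T, min_sup):
    """L1: {frozenset({sensor}): interval_list} for the frequent 1-sets."""
    frequent, level = dict(L1), dict(L1)
    while level:
        nxt = {}
        for (X, lx), (Y, ly) in combinations(level.items(), 2):
            C = X | Y
            if len(C) != len(X) + 1 or C in nxt:  # join: agree on all but one
                continue
            lc = intersect(lx, ly)                # support count by list merging
            if tls(lc) / T >= min_sup:
                nxt[C] = lc
        frequent.update(nxt)
        level = nxt
    return frequent

L1 = {frozenset('A'): [(1, 5), (10, 15)], frozenset('B'): [(2, 15)]}
print(il_apriori(L1, T=15, min_sup=0.2))          # adds AB: [(2,5), (10,15)]
```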
Algorithm B – Candidate Generation • When generating a candidate sensor-set C of size i from two size-(i−1) sensor-sets LA and LB (subsets of C), we also construct the interval list of C by intersecting the interval lists of LA and LB • Joining two interval lists (of lengths m and n) is a key step in our algorithm • A simple linear scan requires O(m+n) time • There are i different size-(i−1) subsets of C: which two should we pick?
Algorithm B – Candidate Generation • Method 1: choose the two lists with the fewest number of intervals • Requires storing the number of intervals for each sensor-set • Method 2: choose the two lists with the smallest count (TLS) • Intuitively, a shorter lifespan implies fewer intervals • Easier to implement: the lifespan is already available from checking whether the sensor-set is frequent
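Minimal sketches of the two selection heuristics (assuming the interval list of each size-(i−1) subset is at hand; names are mine):

```python
def pick_by_fewest_intervals(subsets, lists):     # Method 1
    """Pick the two subsets whose interval lists have the fewest intervals."""
    return sorted(subsets, key=lambda s: len(lists[s]))[:2]

def pick_by_smallest_tls(subsets, lists):         # Method 2
    """Pick the two subsets with the smallest total lifespan (TLS)."""
    return sorted(subsets, key=lambda s: sum(hi - lo for lo, hi in lists[s]))[:2]
```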
Experiments • Data generation • Simulate data generated by a set of n binary sensors • Make use of a standard market basket dataset • With n sensors, each of which can be either ON or OFF, there are 2^n possible combinations of sensor states • Assign a probability to each of the combinations
Experiments – Data Gen • How to assign the probabilities? • Let N be the number of occurrences in the market basket dataset of the transaction that contains exactly the sensors that are ON • E.g. consider R = {A,B,C,D,E,F}; suppose we want to assign a probability to the sensor state AC (only A and C are ON) • N is the number of transactions that contain exactly A and C and nothing else • Assign prob = N/|D|, where |D| is the size of the market basket dataset • Note: the market basket data must be sufficiently large, so that combinations that occur very infrequently are not given ZERO probability
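A sketch of this assignment (assuming the basket is given as a list of item sets; names are mine):

```python
from collections import Counter

def state_probabilities(basket):
    """prob(state) = fraction of transactions that contain exactly that item set."""
    counts = Counter(frozenset(t) for t in basket)
    return {state: n / len(basket) for state, n in counts.items()}

basket = [{'A', 'C'}, {'A', 'C'}, {'B'}, {'A'}]
print(state_probabilities(basket)[frozenset({'A', 'C'})])   # N/|D| = 2/4 = 0.5
```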
Experiments – Data Gen • Generating the sensor data • Choose the initial state (at t=0s): • Randomly, according to the probabilities assigned, or • Pick the combination with the highest assigned probability => first set of sensor states
Experiments – Data Gen • What is the next set of sensor states? • For simplicity, in our model only one sensor can be updated at a time • For any two adjacent updates, the sensor states at the two time instants differ by exactly one sensor => change only one sensor state => n possible combinations, obtained by toggling each of the n sensor states • We normalize the probabilities of the n combinations by their sum • Pick the next set of sensor states according to the normalized probabilities • Inter-arrival time of updates: exponential distribution
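A sketch of one generation step under these rules (names are mine; probs is the map from the previous sketch, and the uniform fallback for never-observed neighbour states is my addition):

```python
import random

def next_update(state, sensors, probs, rate):
    """state: frozenset of ON sensors. Returns (new_state, inter_arrival_time)."""
    candidates = [state ^ frozenset({s}) for s in sensors]  # toggle one sensor
    weights = [probs.get(c, 0.0) for c in candidates]
    if sum(weights) == 0:                 # no neighbour ever observed: fall back
        weights = [1.0] * len(candidates)
    # random.choices normalizes the weights, i.e. the slide's normalization step
    new_state = random.choices(candidates, weights=weights)[0]
    return new_state, random.expovariate(rate)   # exponential inter-arrival time
```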
Experiments • Market Basket Dataset • 8,000,000 transactions • 100 items • Number of maximal potentially large itemsets: 2000 • Average transaction length: 10 • Average length of maximal large itemsets: 4 • Length of the maximal large itemsets: 11 • Minimum support: 0.05% • Algorithms: • Apriori: cached mode • IL-apriori: (a) random join (IL-apriori), (b) join by smallest lifespan (IL-apriori-S), (c) join by fewest number of intervals (IL-apriori-C)
Experiments - Results • Performance of the algorithms (larger support): • All IL-apriori variants outperform cached Apriori
Experiments - Results • Performance (lower support): • More candidates => joining the interval lists becomes expensive for IL-apriori
Experiments - Results • With more long frequent sensor-sets: • Apriori has to match the candidates by searching through the DB • IL-apriori-C and IL-apriori-S save a lot of time in joining the lists
Experiments - Results • Peak memory usage: • Cached Apriori stores the whole database • IL-apriori stores many interval lists when the number of candidates grows large
Experiments - Results (min_sup = 0.02%) • Apriori is faster in the first 3 passes • Running time for IL-apriori drops sharply afterwards: • Apriori has to scan over the whole database in every pass • IL-apriori (C/S) only needs to join relatively short interval lists in the later passes
Experiments - Results (min_sup = 0.02%) • Memory requirement for IL-apriori is a lot higher when there are more frequent sensor-set interval lists to join
Experiments - Results (min_sup = 0.05%) • Runtime for all algorithms increases linearly with total number of transactions
Experiments - Results (min_sup = 0.05%) • Memory required by all algorithms increases as the number of transactions increases • The rate of increase is faster for IL-apriori
Conclusions • An interval-list method for mining frequent patterns from sensor data was described • The two interval-list joining strategies are quite effective in reducing running time • The memory requirement is quite high • Future Work • Other methods for joining interval lists • Trade-off between time and space • Extending to the streaming case • Considering approaches other than the Lossy Counting algorithm (G. S. Manku and R. Motwani, VLDB ’02)