Interestingness

Interestingness Measures - Lift
• Lift is a measure of dependent (correlated) events
• Lift(B, C) = c(B -> C) / s(C) = s(B ∪ C) / (s(B) × s(C))
• Lift(B, C) tells how B and C are correlated
• Lift(B, C) = 1 => B and C are independent
• Lift(B, C) > 1 => positively correlated
• Lift(B, C) < 1 => negatively correlated
• Lift is more telling than support (s) and confidence (c)

Lift Example

Lift Solution
• Lift(B, C) = (400/1000) / ((600/1000) × (750/1000)) = 0.89
• Lift(B, ^C) = (200/1000) / ((600/1000) × (250/1000)) = 1.33
• B and C are negatively correlated, since Lift(B, C) < 1
• B and ^C are positively correlated, since Lift(B, ^C) > 1

Lift Calculations
• s(B ∪ C) = 400/1000 = 2/5 = 0.4
• s(B) = 600/1000 = 3/5 = 0.6
• s(C) = 750/1000 = 3/4 = 0.75
• Lift(B, C) = 0.4 / (0.6 × 0.75) = 0.4 / 0.45 = 0.89
• s(B ∪ ^C) = 0.2
• s(B) = 0.6
• s(^C) = 0.25
• Lift(B, ^C) = 0.2 / (0.6 × 0.25) = 0.2 / 0.15 = 1.33
• Lift(^B, C) = ?
• Lift(^B, ^C) = ?
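
The lift arithmetic above can be checked with a short Python sketch. The four cell counts are the ones used in the worked solutions (400, 200, 350, 50 out of 1000 transactions); the names `counts`, `support` and `lift` are illustrative choices, not part of the slides. The same function also evaluates the two cases left open as "?".

```python
# Contingency counts from the worked example (1000 transactions in total).
counts = {
    ("B", "C"): 400,
    ("B", "^C"): 200,
    ("^B", "C"): 350,
    ("^B", "^C"): 50,
}
total = sum(counts.values())


def support(item):
    """Fraction of transactions whose cell label contains `item` (e.g. "B" or "^C")."""
    return sum(n for pair, n in counts.items() if item in pair) / total


def lift(x, y):
    """Lift(x, y) = s(x and y) / (s(x) * s(y))."""
    joint = counts[(x, y)] / total
    return joint / (support(x) * support(y))


lifts = {pair: lift(*pair) for pair in counts}
for (x, y), value in lifts.items():
    print(f"Lift({x}, {y}) = {value:.2f}")
```

The first two printed values should match the 0.89 and 1.33 derived by hand.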

Interestingness Measures - χ²
• Another measure to test correlated events: χ² (chi-square)
• χ² = Σ (Observed − Expected)² / Expected
• General rules
• χ² = 0 => independent
• χ² > 0 => correlated, either positively or negatively, so additional tests are needed to find the direction
• χ² is also more telling than support (s) and confidence (c)

χ² Example

χ² Solution
• χ² = (400 − 450)²/450 + (350 − 300)²/300 + (200 − 150)²/150 + (50 − 100)²/100 = 55.56
• χ² shows that B and C are correlated, because the result is > 0
• The expected value for B and C together is 450 but only 400 is observed, so B and C are negatively correlated
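
As a check on the χ² sum, here is a minimal Python sketch that derives the expected counts from the row and column totals and then applies Σ (Observed − Expected)² / Expected. The observed counts are the ones in the solution; the variable names are assumptions made for illustration.

```python
# Observed counts from the solution, keyed by (B or ^B, C or ^C).
observed = {
    ("B", "C"): 400,
    ("^B", "C"): 350,
    ("B", "^C"): 200,
    ("^B", "^C"): 50,
}
total = sum(observed.values())

# Marginal totals for B/^B and for C/^C.
b_totals, c_totals = {}, {}
for (b, c), n in observed.items():
    b_totals[b] = b_totals.get(b, 0) + n
    c_totals[c] = c_totals.get(c, 0) + n

# Expected count under independence: (row total * column total) / grand total.
expected = {(b, c): b_totals[b] * c_totals[c] / total for (b, c) in observed}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print(expected)
print(f"chi2 = {chi2:.2f}")  # chi2 = 55.56
```

Comparing `observed[("B", "C")]` with `expected[("B", "C")]` gives the direction of the correlation, which χ² alone does not.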

Are Lift and χ² Always Good?
• Null transactions: transactions that contain neither B nor C
• Let's examine another dataset D
• BC (100) is much rarer than B^C (1000) and ^BC (1000), but there are many ^B^C (100000)
• So B and C are unlikely to occur together!
• But Lift(B, C) = 8.44 >> 1 (strong positive correlation)
• χ² = 670: observed BC (100) >> expected value (11.85)
• Too many null transactions may "spoil the soup"!

χ² & Lift With Null Example
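
The figures quoted for dataset D can be reproduced with the sketch below, which takes the four cell counts and returns lift, χ² and the expected BC count. The function name `lift_and_chi2` is hypothetical. The second call is an assumed variation, not taken from the slides: it keeps the B and C counts and shrinks the null transactions to 100, to show how strongly both measures depend on that one cell.

```python
def lift_and_chi2(n_bc, n_b_notc, n_notb_c, n_null):
    """Return (lift, chi2, expected BC count) for a 2x2 contingency table."""
    total = n_bc + n_b_notc + n_notb_c + n_null
    n_b = n_bc + n_b_notc   # transactions containing B
    n_c = n_bc + n_notb_c   # transactions containing C
    lift = (n_bc / total) / ((n_b / total) * (n_c / total))
    observed = [n_bc, n_b_notc, n_notb_c, n_null]
    expected = [
        n_b * n_c / total,
        n_b * (total - n_c) / total,
        (total - n_b) * n_c / total,
        (total - n_b) * (total - n_c) / total,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return lift, chi2, expected[0]


# Dataset D: BC = 100, B^C = 1000, ^BC = 1000, ^B^C = 100000.
lift_d, chi2_d, expected_bc = lift_and_chi2(100, 1000, 1000, 100_000)
print(f"lift = {lift_d:.2f}, chi2 = {chi2_d:.0f}, expected BC = {expected_bc:.2f}")
# lift = 8.44, chi2 = 670, expected BC = 11.85

# Hypothetical variation: identical B/C counts, only 100 null transactions.
lift_few, chi2_few, _ = lift_and_chi2(100, 1000, 1000, 100)
print(f"lift with few nulls = {lift_few:.2f}")
```

With few null transactions the lift drops below 1, even though the relationship between B and C has not changed.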

Other Interestingness Algorithms
• Null invariance: the value does not change with the number of null transactions
• Null-invariant interestingness measures:
• AllConf(A, B)
• Jaccard(A, B)
• Cosine(A, B)
• Kulczynski(A, B)
• MaxConf(A, B)
• Not all null-invariant measures are created equal
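
The slides list these measures without formulas. The sketch below uses the definitions commonly given in data mining textbooks, which is an assumption about what the slides intend; the function name and argument names are illustrative. Note that the function takes only the counts for A, B and A-with-B, so the number of null transactions cannot affect the result.

```python
import math


def null_invariant_measures(n_a, n_b, n_ab):
    """Five null-invariant measures from occurrence counts.

    n_a, n_b: transactions containing A, containing B.
    n_ab: transactions containing both. The total transaction count is not needed.
    """
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "AllConf": n_ab / max(n_a, n_b),
        "Jaccard": n_ab / (n_a + n_b - n_ab),
        "Cosine": n_ab / math.sqrt(n_a * n_b),
        "Kulczynski": 0.5 * (p_a_given_b + p_b_given_a),
        "MaxConf": max(p_a_given_b, p_b_given_a),
    }


# Dataset D from the previous slide: 1100 transactions with B, 1100 with C, 100 with both.
measures = null_invariant_measures(n_a=1100, n_b=1100, n_ab=100)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

On dataset D all five values stay well below 0.5, in contrast to the lift of 8.44 computed with the null transactions included.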

Imbalance Ratio with Kulczynski
• Imbalance Ratio: measures the imbalance of two itemsets A and B in rule implications

Kulczynski
• Test: (P(B|A) + P(A|B))/2 < ε, where ε = 0.01
• A = milk, B = coffee
• Total: 1 billion transactions = 1,000,000,000
• A occurs 1 million times = 1,000,000
• B occurs 10 thousand times = 10,000
• A and B occur together one hundred times = 100
• s(A) = 10^6 / 10^9 = 10^-3 = 1/1,000
• s(B) = 10^4 / 10^9 = 10^-5 = 1/100,000
• s(A ∪ B) = 10^2 / 10^9 = 10^-7 = 1/10,000,000
• s(A) × s(B) = 10^-3 × 10^-5 = 10^-8

Kulczynski
• P(B|A) = P(A ∪ B) / P(A) = 10^2 / 10^6 = 10^-4
• P(A|B) = P(A ∪ B) / P(B) = 10^2 / 10^4 = 10^-2
• (P(B|A) + P(A|B))/2 = (10^-4 + 10^-2)/2 = 0.00505 < 0.01
• Therefore this is a negative pattern
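
The milk/coffee numbers can be run through a short Python sketch. The Kulczynski test and ε = 0.01 come from the slides. The imbalance ratio formula, |s(A) − s(B)| / (s(A) + s(B) − s(A ∪ B)), is the usual textbook definition and is an assumption here, since the slides name the measure without defining it. Lift is included only for comparison.

```python
# Counts from the milk (A) / coffee (B) example.
n_total = 1_000_000_000
n_a = 1_000_000      # transactions containing A
n_b = 10_000         # transactions containing B
n_ab = 100           # transactions containing both
epsilon = 0.01

p_b_given_a = n_ab / n_a
p_a_given_b = n_ab / n_b
kulc = 0.5 * (p_b_given_a + p_a_given_b)

# Assumed textbook definition of the imbalance ratio (not given on the slides).
# Written with counts; dividing every term by n_total gives the support form.
imbalance_ratio = abs(n_a - n_b) / (n_a + n_b - n_ab)

# Lift for comparison: s(A u B) / (s(A) * s(B)), rearranged to use counts.
lift = (n_ab * n_total) / (n_a * n_b)

print(f"Kulczynski = {kulc:.5f}, negative pattern: {kulc < epsilon}")
print(f"Imbalance ratio = {imbalance_ratio:.4f}")
print(f"Lift = {lift:.1f}")
```

Lift comes out at 10 because s(A ∪ B) = 10^-7 exceeds s(A) × s(B) = 10^-8, so lift alone would suggest a positive correlation, while the null-invariant Kulczynski value falls below ε.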
