## Mining Favorable Facets


**Mining Favorable Facets**

Raymond Chi-Wing Wong (the Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (the Chinese University of Hong Kong), Ke Wang (Simon Fraser University)

Prepared by Raymond Chi-Wing Wong. Presented by Raymond Chi-Wing Wong.

**Outline**

• Introduction
• Skyline
• Algorithm
• Empirical Study
• Conclusion

**1. Introduction**

Suppose we want to look for a vacation package among 3 packages.

• We want a cheaper package.
• We want a higher hotel-class.

Suppose we compare package a and package b. We know that package a is “better” than package b because:

• The price of package a is lower.
• The hotel-class of package a is higher.

Package a “dominates” package b. Thus, we do not need to consider package b.

Suppose we compare package a and package c. We know that:

• Package a has a cheaper price.
• Package c has a higher hotel-class.

We cannot determine whether package a is better than package c (i.e., package a dominates package c) or whether package c is better than package a (i.e., package c dominates package a). Package a is NOT dominated by any other package, and package c is NOT dominated by any other package. Thus, package a and package c are all of the “best” possible choices. We call package a and package c skyline points: points that are not dominated by any other point.

Now suppose we choose among 6 packages. We still want a cheaper package and a higher hotel-class, but how about hotel-group? Different customers may have different preferences on hotel-group.

• Suppose a customer has the preferences H < T < M. The skyline points are packages a and c.
• Suppose another customer has the preferences H < M < T. The skyline points are packages a, c and e.

In other words, different preferences give different skyline points.

Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers. What preferences make package f a skyline point?

| Customer | Preferences | Skyline points |
|---|---|---|
| Alice | T < M | {a, c} |
| Bob | No special preference | {a, c, e, f} |
| Chris | H < M | {a, c, e} |
| David | H < M < T | {a, c, e} |
| Emily | H < T < M | {a, c} |
| Fred | M < T | {a, c, e, f} |

For package f, Bob and Fred are the potential customers. Asking the same question for package e, Bob, Chris, David and Fred are the potential customers.

Problem: Given a package, find the preferences or conditions under which this package is a skyline point. These are its favorable facets.

We can solve the problem by a naive method: Lattice Search.

[Figure: lattice of preference sets. The root {} has SKY = {a, c, e, f}. The single-preference nodes are {T < M}, {T < H}, {H < M}, {H < T}, {M < T} and {M < H}; for example, {T < M} has SKY = {a, c}, {H < M} has SKY = {a, c, e} and {M < T} has SKY = {a, c, e, f}. Lower levels hold larger sets such as {T < M, H < M}, each annotated with its skyline.]

Consider package f. The preferences that qualify it as a skyline point include {}, {T < H}, {M < T}, {H < T} and {T < H, M < H}.

This approach has two disadvantages.

1. Computation is costly. We need to compute all skyline points for each possible preference.
2. It is difficult to interpret the results.
There are many preferences that qualify package f as a skyline point.

[Figure: the same lattice with a border for f, separating the preference sets under which f is a skyline point from those under which it is not.]

We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point. We say that each of “T < M” and “H < M” is a minimal disqualifying condition (MDC) of package f.

Problem: Given a package, find the minimal conditions under which this package is NOT a skyline point.

**3. Algorithm**

• How to find the MDCs of a point?

Point q is said to quasi-dominate point p if all attributes of point q (here, price and hotel-class) are NOT worse than those of point p. For example, package a quasi-dominates package f because:

1. Package a has a lower (or better) price than package f.
2. Package a has a higher (or better) hotel-class than package f.

If package a quasi-dominates package f, we define R_af as follows: R_af = {T < M}.

Two algorithms:

• MDC-O: Computing MDC On-the-fly
  • Does not store the MDCs of points
  • Computes the MDC of a given point on the fly
• MDC-M: A Materialization Method
  • Stores the MDCs of all points
• Indexing Method for Speed-up
  • R*-tree

**3.1 MDC-O: Computing MDC On-the-fly**

On-the-fly algorithm

• Given: a data point p
• Variable: MDC(p), the minimal disqualifying condition
• Algorithm
  • MDC(p) ← ∅
  • For each data point q which quasi-dominates p
    • If MDC(p) does not contain R_qp, insert R_qp into MDC(p)
  • Return MDC(p)

**3.2 MDC-M: A Materialization Method**

Materialization algorithm

• Variable: MDC(p), the minimal disqualifying condition
• Algorithm
  • For each data point p
    • MDC(p) ← ∅
    • For each data point q which quasi-dominates p
      • If MDC(p) does not contain R_qp, insert R_qp into MDC(p)
    • Store MDC(p)

Query algorithm

• Given: a data point p
• Algorithm: Return MDC(p)

**4. Empirical Study**

• Datasets
  • Synthetic Dataset
  • Real Dataset (from UCI)
    • Nursery Dataset
    • Automobile Dataset
• Default Values (Synthetic)
  • No. of tuples = 500K
  • No. of numeric dimensions = 3
  • No. of categorical dimensions = 1
  • No. of values in a nominal dimension = 20

Results:

• Without indexing
  • MDC-O: slowest search time
  • MDC-M: faster search time
  • Storage of MDC: 8MB
• With indexing
  • MDC-O and MDC-M: fast search time

Automobile dataset, three car models:

• A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
• A salesperson should promote this car to a customer who prefers Mitsubishi to others.
• A salesperson should promote this car to ANY customer.

**5. Conclusion**

• Skyline
• Favorable Facets
• Minimal Disqualifying Condition
• Algorithm
  • On-the-fly
  • Materialization
• Empirical Study

**Q&A**

• Poster Board
  • Title: Mining Favorable Facets
  • Date: Monday, 13th August
  • Place: Poster board carrying number 31

**3.3 Speedup**

[Figure: point p and the region between the origin 0 and p, with both axes labelled “a better value”. All points in this region (e.g., point q) quasi-dominate point p.]

• Build an R*-tree based on the totally-ordered attributes
• For each point p
  • MDC(p) ← ∅
  • Perform a range search from 0 to the value of dimension D of p, for each dimension D
  • For each point q found in the range search
    • Insert R_qp into MDC(p)
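
The on-the-fly procedure of Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The package values below are hypothetical (the slide's table is not part of this transcript) and were chosen only so that the results agree with the skylines quoted on the slides. The names `Package`, `quasi_dominates`, `r`, `mdc`, `chain` and `skyline` are made up for this sketch. A pair `(x, y)` stands for the slide notation `x < y`. The linear scan over all points stands in for the R*-tree range search of Section 3.3, and the final pruning step in `mdc` is an addition that the slide pseudocode does not spell out.

```python
from itertools import combinations
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    price: int        # lower is better
    hotel_class: int  # higher is better
    group: str        # nominal attribute with no fixed order


# Hypothetical values, picked only to agree with the skylines on the slides.
PACKAGES = [
    Package("a", 1600, 4, "T"),
    Package("b", 2400, 1, "T"),
    Package("c", 3000, 5, "H"),
    Package("d", 3600, 4, "H"),
    Package("e", 2400, 2, "M"),
    Package("f", 3000, 3, "M"),
]


def quasi_dominates(q, p):
    """True if q is not worse than p on price and on hotel-class."""
    return q.price <= p.price and q.hotel_class >= p.hotel_class


def r(q, p):
    """R_qp: the preference pairs q needs in order to dominate p."""
    if q.group == p.group:
        return frozenset()  # no preference needed at all
    return frozenset({(q.group, p.group)})


def mdc(p, points):
    """Collect R_qp for every q that quasi-dominates p (plain linear scan)."""
    conditions = {r(q, p) for q in points if q != p and quasi_dominates(q, p)}
    # Assumption beyond the slide: drop any condition that strictly
    # contains another one, so that only minimal conditions remain.
    return {c for c in conditions if not any(o < c for o in conditions)}


def chain(*order):
    """All pairs implied by a chain such as H < T < M."""
    return set(combinations(order, 2))


def skyline(points, prefs):
    """Names of the points with no disqualifying condition inside prefs."""
    return {p.name for p in points
            if not any(cond <= prefs for cond in mdc(p, points))}


if __name__ == "__main__":
    print(sorted(skyline(PACKAGES, set())))                 # ['a', 'c', 'e', 'f']
    print(sorted(skyline(PACKAGES, chain("H", "T", "M"))))  # ['a', 'c']
    print(sorted(skyline(PACKAGES, chain("H", "M", "T"))))  # ['a', 'c', 'e']
```

With these values, `mdc` of package f contains the two conditions {T < M} and {H < M}, matching the border shown for f. Once MDC(p) is known, answering "is p a skyline point for this customer?" reduces to a subset test, which is why the lattice of preference sets never has to be enumerated.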