
CPS 196.03: Information Management and Mining Constraint-based Mining, First programming project



Presentation Transcript


  1. CPS 196.03: Information Management and Mining Constraint-based Mining, First programming project

  2. Where we are headed • First programming project: • On constraint-based association rule mining • Month-long, demo and report, 15% of grade • Due: March 2 • Data warehousing • Multi-billion dollar industry, fast growing • Web data management and mining

  3. Constraint-based (Query-Directed) Mining • Let us start with an example • Sales(customer_id, item_id, date) • Lives_in(customer_id, city, state) • Items(item_id, group, price)

  4. Constraint-based (Query-Directed) Mining • Finding all the patterns in a database autonomously? Unrealistic! • The patterns could be too many and not focused • Data mining should be an interactive process • The user directs what is to be mined, using a data mining query language (or a graphical user interface) • Constraint-based mining • User flexibility: the user provides constraints on what is to be mined • System optimization: the system exploits such constraints for efficient mining

  5. Constraints in Data Mining • Knowledge type constraint • classification, association, etc. • Data constraint • find product pairs sold to Chicago customers in 2004 • Dimension/level constraint • in relevance to region, price, brand, customer category • Rule (or pattern) constraint • small sales (price < $10) trigger big sales (sum > $200) • Interestingness constraint • strong rules: min_support >= 3%, min_confidence >= 60%
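The interestingness constraint on strong rules can be made concrete with a small sketch. The function names and the four-transaction sample below are assumptions for illustration only; the thresholds default to the 3% and 60% values quoted on the slide.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """support(lhs and rhs together) / support(lhs); 0.0 if lhs never occurs."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    both = frozenset(lhs) | frozenset(rhs)
    return support(transactions, both) / lhs_support


def is_strong(transactions, lhs, rhs, min_support=0.03, min_confidence=0.60):
    """A rule lhs -> rhs is strong when it meets both thresholds."""
    both = frozenset(lhs) | frozenset(rhs)
    return (support(transactions, both) >= min_support
            and confidence(transactions, lhs, rhs) >= min_confidence)


# Hypothetical sample data (not taken from the slides).
SAMPLE = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

print(support(SAMPLE, {2, 5}))        # 0.75
print(is_strong(SAMPLE, {3}, {5}))    # True: support 0.5, confidence 2/3
print(is_strong(SAMPLE, {1}, {2}))    # False: confidence is only 0.5
```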

  6. Constrained Mining vs. Constraint-Based Search • Constrained mining vs. constraint-based search/reasoning • Both aim at reducing the search space • Constrained mining finds all patterns satisfying the constraints, whereas constraint-based search in AI or optimization finds some (or one) answer • Constrained mining vs. query processing in DBMS • Constrained pattern mining shares its philosophy with pushing selections deep into query processing

  7. Anti-Monotonicity in Constraint Pushing • Anti-monotonicity • When an itemset S violates the constraint, so does every superset of S • sum(S.Price) <= v is anti-monotone (for non-negative prices) • sum(S.Price) >= v is not anti-monotone • Example. C: range(S.profit) <= 15 is anti-monotone • Itemset ab violates C • So does every superset of ab • [Table: TDB (min_sup=2)]
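As a sanity check on the definition, the following sketch tests anti-monotonicity exhaustively on a small item universe. The `PROFIT` and `PRICE` values are hypothetical (the slide's table is not part of the transcript); the profits are chosen so that itemset ab violates range(S.profit) <= 15, as the slide states.

```python
from itertools import combinations

# Hypothetical item attributes, assumed for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
PRICE = {"a": 4, "b": 1, "c": 2, "d": 3}  # non-negative prices


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_anti_monotone(constraint, items):
    """True if no violating itemset has a satisfying proper superset."""
    sets = list(nonempty_subsets(items))
    violators = [s for s in sets if not constraint(s)]
    return all(not constraint(t) for s in violators for t in sets if s < t)


def range_le_15(itemset):
    values = [PROFIT[i] for i in itemset]
    return max(values) - min(values) <= 15


def sum_price_le_5(itemset):
    return sum(PRICE[i] for i in itemset) <= 5


def sum_price_ge_5(itemset):
    return sum(PRICE[i] for i in itemset) >= 5


print(is_anti_monotone(range_le_15, PROFIT))     # True
print(is_anti_monotone(sum_price_le_5, PRICE))   # True
print(is_anti_monotone(sum_price_ge_5, PRICE))   # False: {b} violates, {a, b} satisfies
```

The exhaustive check is exponential in the number of items, so it is only a way to build intuition, not something a miner would run.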

  8. Monotonicity for Constraint Pushing • Monotonicity • When an itemset S satisfies the constraint, so does every superset of S • sum(S.Price) >= v is monotone (for non-negative prices) • min(S.Price) <= v is monotone • Example. C: range(S.profit) >= 15 is monotone • Itemset ab satisfies C • So does every superset of ab • [Table: TDB (min_sup=2)]
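One practical consequence of monotonicity is that a constraint needs to be evaluated only until it is first satisfied; every superset inherits the result. The sketch below illustrates that idea on four items with hypothetical profit values (assumed so that ab satisfies range(S.profit) >= 15); `label_itemsets` is an illustrative helper, not part of any library.

```python
from itertools import combinations

# Hypothetical profits, assumed for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10}


def range_ge_15(itemset):
    values = [PROFIT[i] for i in itemset]
    return max(values) - min(values) >= 15


def label_itemsets(items, constraint):
    """Label every non-empty itemset level by level.

    The constraint is evaluated only when no immediate subset is already
    known to satisfy it, which is valid for monotone constraints.
    Returns (labels, number of constraint evaluations).
    """
    satisfied = {}
    evaluations = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            s = frozenset(combo)
            if any(satisfied.get(s - {i}, False) for i in s):
                satisfied[s] = True          # inherited from a subset
            else:
                satisfied[s] = constraint(s)
                evaluations += 1
    return satisfied, evaluations


labels, evaluations = label_itemsets(PROFIT, range_ge_15)
print(labels[frozenset("ab")])   # True
print(evaluations)               # 10 evaluations for 15 itemsets
```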

  9. Succinctness • Succinctness • Let A1 be the set of items satisfying a succinct constraint C; then any itemset S satisfying C is based on A1, i.e., S contains a subset belonging to A1 • Idea: whether an itemset S satisfies constraint C can be determined from the selection of items alone, without looking at the transaction database • min(S.Price) <= v is succinct • sum(S.Price) >= v is not succinct • Optimization: if C is succinct, C is pre-counting pushable
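To illustrate why a succinct constraint is pre-counting pushable, the sketch below enumerates the itemsets satisfying min(S.Price) <= v directly from the item selection A1, without touching any transactions. The price table is hypothetical (price equal to item id), and `succinct_satisfying_sets` is an illustrative name.

```python
from itertools import combinations

# Hypothetical prices: each item's price equals its id.
PRICE = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}


def subsets(items, min_size=0):
    """Yield every subset of `items` with at least `min_size` elements."""
    items = sorted(items)
    for k in range(min_size, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def succinct_satisfying_sets(price, v):
    """All itemsets with min(price) <= v, built from item selection alone.

    A1 holds the items priced at most v; a set satisfies the constraint
    exactly when it contains at least one item of A1.
    """
    a1 = {i for i, p in price.items() if p <= v}
    rest = set(price) - a1
    return {core | extra for core in subsets(a1, 1) for extra in subsets(rest)}


print(len(succinct_satisfying_sets(PRICE, 1)))   # 16: item 1 plus any subset of the rest
```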

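  10. The Apriori Algorithm: Example • [Figure: Database D is scanned to count candidates C1 and produce L1; C2 is generated from L1 and counted in a second scan to produce L2; C3 is generated from L2 and counted in a third scan to produce L3]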
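The level-wise loop (generate candidates, scan the database, keep the frequent ones) can be sketched as follows. This is a minimal, unoptimized version; the four-transaction database `DB` is hypothetical sample data, since the slide's table did not survive in the transcript.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} using level-wise search."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every single item that occurs in the database.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan D: count each candidate and keep those meeting min_sup (L_k).
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_sup:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: unite pairs from L_{k-1} that form a k-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical sample database with an absolute support threshold of 2.
DB = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(DB, 2)
print(result[frozenset({2, 3, 5})])   # 2
print(len(result))                    # 9 frequent itemsets
```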

  11. Naïve Algorithm: Apriori + Constraint • Constraint: Sum{S.price} < 5 • [Figure: Apriori run over Database D with candidate sets C1, C2, C3 and frequent sets L1, L2, L3]

  12. The Constrained Apriori Algorithm: Push an Anti-monotone Constraint Deep • Constraint: Sum{S.price} < 5 • [Figure: Apriori run over Database D with candidate sets C1, C2, C3 and frequent sets L1, L2, L3]
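The difference between the naive approach (mine everything, filter afterwards) and pushing an anti-monotone constraint into the candidate loop can be sketched as below. The database, the price table (price equal to item id), and the `pushed` parameter are assumptions for illustration; the function also reports how many candidates were counted against the database.

```python
from itertools import combinations

# Hypothetical data: price of each item equals its id.
PRICE = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
DB = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def sum_price_lt_5(itemset):
    return sum(PRICE[i] for i in itemset) < 5


def apriori(transactions, min_sup, pushed=None):
    """Return (frequent itemsets with counts, number of candidates counted).

    `pushed`, if given, must be an anti-monotone constraint: candidates
    violating it are dropped before counting, so none of their supersets
    is ever generated.
    """
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, counted, k = {}, 0, 1
    while candidates:
        if pushed is not None:
            candidates = {c for c in candidates if pushed(c)}
        counted += len(candidates)
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_sup:
                level[c] = n
        frequent.update(level)
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent, counted


# Naive: mine all frequent itemsets, then filter by the constraint.
all_frequent, n_naive = apriori(DB, 2)
naive_result = {s: n for s, n in all_frequent.items() if sum_price_lt_5(s)}

# Constrained: push the constraint into candidate generation.
pushed_result, n_pushed = apriori(DB, 2, pushed=sum_price_lt_5)

print(naive_result == pushed_result)   # True
print(n_naive, n_pushed)               # 12 6
```

Both runs return the same itemsets; on this sample the pushed version counts half as many candidates. The saving on real data depends on how selective the constraint is.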

  13. The Constrained Apriori Algorithm: Push a Succinct Constraint Deep • Constraint: min{S.price} <= 1 • [Figure: Apriori run over Database D with candidate sets C1, C2, C3 and frequent sets L1, L2, L3, carrying the annotation "not immediately to be used"]

  14. Converting “Tough” Constraints • Convert tough constraints into anti-monotone or monotone ones by properly ordering items • Examine C: avg(S.profit) >= 25 • Order items in value-descending order • <a, f, g, d, b, h, c, e> • If an itemset afb violates C • So do afbh and every afb* • It becomes anti-monotone! • [Table: TDB (min_sup=2)]

  15. Strongly Convertible Constraints • avg(X) >= 25 is convertible anti-monotone w.r.t. item value descending order R: <a, f, g, d, b, h, c, e> • If an itemset af violates a constraint C, so does every itemset with af as a prefix, such as afd • avg(X) >= 25 is convertible monotone w.r.t. item value ascending order R^-1: <e, c, h, b, d, g, f, a> • If an itemset d satisfies a constraint C, so do itemsets df and dfa, which have d as a prefix • Thus, avg(X) >= 25 is strongly convertible
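The prefix property behind convertibility can be checked exhaustively on a small example. The profit values below are hypothetical, chosen only so that sorting by descending value reproduces the order R given above; the helper names are illustrative. Checking one-item extensions is enough, since longer extensions follow by induction.

```python
from itertools import combinations

# Hypothetical profits consistent with the order R = <a, f, g, d, b, h, c, e>.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
R = sorted(PROFIT, key=PROFIT.get, reverse=True)   # value-descending order


def avg_ge_25(itemset):
    return sum(PROFIT[i] for i in itemset) / len(itemset) >= 25


def ordered_itemsets(order):
    """Yield every non-empty itemset as a tuple listed in `order`."""
    for k in range(1, len(order) + 1):
        yield from combinations(order, k)


def one_item_extensions(itemset, order):
    """Itemsets that have `itemset` as a prefix and one more item after it."""
    last = order.index(itemset[-1])
    return [itemset + (x,) for x in order[last + 1:]]


def convertible_anti_monotone(constraint, order):
    """A violating prefix never has a satisfying extension."""
    return all(not constraint(e)
               for s in ordered_itemsets(order) if not constraint(s)
               for e in one_item_extensions(s, order))


def convertible_monotone(constraint, order):
    """A satisfying prefix never has a violating extension."""
    return all(constraint(e)
               for s in ordered_itemsets(order) if constraint(s)
               for e in one_item_extensions(s, order))


print("".join(R))                                  # afgdbhce
print(convertible_anti_monotone(avg_ge_25, R))     # True  (descending order)
print(convertible_monotone(avg_ge_25, R[::-1]))    # True  (ascending order)
```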

  16. Can Apriori Handle Convertible Constraints? • A convertible constraint that is neither monotone, anti-monotone, nor succinct cannot be pushed deep into an Apriori mining algorithm • Within the level-wise framework, no direct pruning based on the constraint can be made • Itemset df violates constraint C: avg(X) >= 25 • Since adf satisfies C and Apriori needs df to assemble adf, df cannot be pruned • But the constraint can be pushed into the frequent-pattern growth framework!

  17. Mining With Convertible Constraints • C: avg(X) >= 25, min_sup=2 • List items in every transaction in value descending order R: <a, f, g, d, b, h, c, e> • C is convertible anti-monotone w.r.t. R • Scan TDB once • remove infrequent items • Item h is dropped • Itemsets a and f are good, … • Projection-based mining • Imposing an appropriate order on item projection • Many tough constraints can be converted into (anti-)monotone ones • [Table: TDB (min_sup=2)]
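A simplified prefix-growth miner shows how the order R lets the constraint prune the search. This is a sketch, not FP-growth itself: it grows prefixes depth-first and filters the matching transactions at each step. The transaction database and profit values are hypothetical, chosen to agree with the slide (order R, item h infrequent at min_sup=2).

```python
from collections import Counter

# Hypothetical data, assumed for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
TDB = ["abcdf", "bcdfgh", "acdef", "cefg"]


def mine_avg_ge(tdb, profit, min_sup, threshold):
    """Frequent itemsets with avg(profit) >= threshold, as {tuple in R order: count}."""
    order = sorted(profit, key=profit.get, reverse=True)      # order R
    counts = Counter(i for t in tdb for i in set(t))
    items = [i for i in order if counts[i] >= min_sup]        # one scan drops infrequent items
    results = {}

    def grow(prefix, total, tids, start):
        for pos in range(start, len(items)):
            item = items[pos]
            new_tids = [t for t in tids if item in t]
            if len(new_tids) < min_sup:
                continue                                      # infrequent: no extension can be frequent
            new_prefix = prefix + (item,)
            new_total = total + profit[item]
            if new_total / len(new_prefix) < threshold:
                continue                                      # later items are smaller: extensions also violate
            results[new_prefix] = len(new_tids)
            grow(new_prefix, new_total, new_tids, pos + 1)

    grow((), 0, [frozenset(t) for t in tdb], 0)
    return results


patterns = mine_avg_ge(TDB, PROFIT, min_sup=2, threshold=25)
for itemset, count in patterns.items():
    print("".join(itemset), count)
```

On this sample the miner reaches afd (the itemset adf listed in order R) through the prefixes a and af, even though df on its own violates the constraint; that is the case a level-wise Apriori cannot prune.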

  18. Recall • Traversal of Itemset Lattice

  19. Handling Multiple Constraints • Different constraints may require different or even conflicting item orderings • If there exists an order R s.t. both C1 and C2 are convertible w.r.t. R, then there is no conflict between the two convertible constraints

  20. What Constraints Are Convertible?

  21. Constraint-Based Mining—A General Picture

  22. A Classification of Constraints • [Diagram labels: Monotone, Antimonotone, Strongly convertible, Succinct, Convertible anti-monotone, Convertible monotone, Inconvertible]

  23. Visualization of Association Rules: Plane Graph

  24. Visualization of Association Rules: Rule Graph

  25. Visualization of Association Rules (SGI/MineSet 3.0)

  26. First Programming Project • Individual project, 15 points of the final grade • Sales(customer_id, item_id, item_group, item_price, purchase_date) • Will be provided as a file during the demo and for generating performance numbers for the project report • Task 1: 5 points • Interface to enter MIN_SUPPORT (% of customers) • Find frequent itemsets using Apriori (sets of item_id's) • Task 2: 5 points (Section 5.5 in the textbook) • Interface to enter two constraint types (e.g., SUM(item_price) op const) • Use the constraints in Apriori as effectively as possible; study and demonstrate the performance improvement • Task 3: 5 points • Extension of your choice. Examples include (i) association rules, (ii) complex constraints, (iii) sequential patterns, (iv) variants of Apriori, (v) FP-growth
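Because MIN_SUPPORT is expressed as a percentage of customers, the Sales rows first have to be grouped into one basket per customer. The sketch below assumes a comma-separated file with the five columns in the order listed above and no header row; the actual file format is not specified in the slides, and the function names and sample rows are hypothetical.

```python
import csv
import math
from collections import defaultdict


def load_customer_baskets(lines):
    """Group Sales rows into {customer_id: set of item_id} and {item_id: price}.

    Assumes each row is customer_id,item_id,item_group,item_price,purchase_date.
    """
    baskets = defaultdict(set)
    prices = {}
    for row in csv.reader(lines):
        customer_id, item_id, item_group, item_price, purchase_date = row
        baskets[customer_id].add(item_id)
        prices[item_id] = float(item_price)
    return dict(baskets), prices


def min_support_count(percent, n_customers):
    """Convert MIN_SUPPORT given in % of customers to an absolute customer count."""
    return math.ceil(percent * n_customers / 100)


# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    "c1,i1,food,2.50,2004-01-05",
    "c1,i2,food,1.00,2004-01-09",
    "c2,i1,food,2.50,2004-02-11",
    "c1,i1,food,2.50,2004-03-01",
]
baskets, prices = load_customer_baskets(SAMPLE_ROWS)
print(baskets)                                 # {'c1': {'i1', 'i2'}, 'c2': {'i1'}} (set order may vary)
print(min_support_count(60, len(baskets)))     # 2
```

Repeat purchases by the same customer collapse into one basket entry, which matches counting support per customer rather than per sale.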

  27. First Programming Project: Milestones • Feb 3: Project announced • Feb 17: Mid-project report due • Describe progress and planned extensions • Describe detailed algorithms for all three tasks • Feb 17: Sample data file will be provided for generating performance results for project report • March 2: Submit code, README file to run code, code documentation, and final project report • March 2-4: Project demos (random assignment) • March 6: Spring break. Second project announced

  28. Finalized Grading Criteria for Class • Homeworks: 15 points • Programming projects: 40 points • Midterm: 20 points • Note: Midterm is on Feb 19 (Thu) in class • Final: 25 Points
