Mining Frequent Patterns

Mining Frequent Patterns Without Candidate Generation. Jiawei Han, Jian Pei and Yiwen Yin, School of Computer Science, Simon Fraser University. Afsoon Yousefi, CS:332, March 24th, 2014. Inspired by Song Wang's slides.


Presentation Transcript


  1. Mining Frequent Patterns Without Candidate Generation • Jiawei Han, Jian Pei and Yiwen Yin, School of Computer Science, Simon Fraser University • Afsoon Yousefi, CS:332, March 24th, 2014 • Inspired by Song Wang's slides

  2. Outline • Problem of Mining Frequent Patterns • Review of Apriori • Frequent Pattern Tree: An Example; Design and Construction; Properties • Mining Frequent Patterns Using FP-Tree: An Example; Design and Construction; Properties • Algorithm Efficiency Properties • Performance Study • Future Work • Conclusion • Selected Questions

  4. Problem of Mining Frequent Patterns • Frequent pattern mining plays an essential role in mining associations. • Most previous studies adopt an Apriori-like approach. • This approach achieves good performance but suffers from the cost of candidate set generation and testing.

  6. Review of Apriori • Given the minimum support threshold: • Use the frequent (k-1)-itemsets to generate candidate frequent k-itemsets • Scan the database and count each candidate pattern • Keep the candidates that meet the threshold as the frequent k-itemsets
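The level-wise loop above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `apriori`, the use of absolute support counts, and the five-transaction database are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items to get the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Scan the database again and count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical database, not taken from the slides.
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
found = apriori(db, min_support=3)
print(sorted("".join(sorted(s)) for s in found))
```

Note that every level needs a full pass over the database and that the candidate set is materialized before counting. Those are the two costs the FP-tree approach aims to avoid.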

  7. Review of Apriori • The bottleneck of the Apriori-like method is candidate set generation and testing • How can we avoid generating a huge set of candidates? • A novel, compact data structure called the FP-tree • An FP-tree-based pattern fragment growth mining method • A divide-and-conquer search method for frequent itemset combinations

  10. Frequent Pattern Tree: An Example • A minimum support threshold is given • One scan of the DB identifies the set of frequent items • Items are sorted in descending frequency order • For convenience, the frequent items of each transaction are listed in this order
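The first scan can be sketched as below. The function name `order_frequent_items`, the four-transaction database, and the alphabetical tie-break between equally frequent items are assumptions for illustration. The slides do not say how ties are broken.

```python
from collections import Counter


def order_frequent_items(transactions, min_support):
    """First scan: find the frequent items, then rewrite each transaction as
    its frequent items sorted by descending frequency."""
    # Count each item once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    def rank(item):
        # Descending frequency; ties broken alphabetically (an assumption).
        return (-frequent[item], item)

    ordered = [
        sorted((i for i in set(t) if i in frequent), key=rank)
        for t in transactions
    ]
    return frequent, ordered


# Hypothetical database, not the one shown on the slide.
db = [["a", "b", "c"], ["b", "c"], ["b", "d"], ["a", "b"]]
frequent, ordered = order_frequent_items(db, min_support=2)
print(frequent)  # item d is dropped because it occurs only once
print(ordered)
```

Fixing one global order is what later lets transactions with common frequent items share a prefix in the tree.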

  11. Frequent Pattern Tree: An Example • One scan of the DB identifies the set of frequent items • Store the frequent items of each transaction in a tree • Create a “null” root • Scan the DB a second time • Add each transaction's ordered frequent items as a path • Share the existing path until a different item comes up • Then branch and create a sub-path • [Figure: FP-tree. root → f:4 → c:3 → a:3 → m:2 → p:2; a:3 → b:1 → m:1; f:4 → b:1; root → c:1 → b:1 → p:1]

  12. Frequent Pattern Tree: An Example • One scan of the DB identifies the set of frequent items • Store the frequent items of each transaction in a tree • To facilitate tree traversal, build an item header table • Nodes with the same item-name are linked through node-links • [Figure: the FP-tree from slide 11 together with its item header table]

  14. Frequent Pattern Tree: Design and Construction • The tree consists of: one root; a set of item prefix subtrees as the children of the root; a frequent-item header table • Each node in the tree has three fields: item-name; count; node-link • Each entry in the frequent-item header table consists of: item-name; head of node-link
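The structure described above can be sketched as follows. The class and function names are hypothetical, and the `parent` and `children` fields and the `tails` helper are implementation assumptions added on top of the three node fields listed on the slide. The input is the list of ordered frequent items per transaction, reconstructed here from the paths of the example tree (f-c-a-m-p twice, f-c-a-b-m, f-b, c-b-p).

```python
class FPNode:
    """One FP-tree node: item-name, count and node-link as on the slide,
    plus parent/children pointers (an implementation assumption)."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.node_link = None  # next node carrying the same item-name
        self.parent = parent
        self.children = {}


def build_fp_tree(ordered_transactions):
    """Second scan: insert each ordered transaction as a path from the root."""
    root = FPNode(None, None)  # the "null" root
    header = {}  # item-name -> head of its node-link chain
    tails = {}   # item-name -> last node in the chain (helper, an assumption)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # A different item comes up: branch and start a sub-path.
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1  # shared prefix: only the count is incremented
            node = child
    return root, header


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


# Ordered frequent items per transaction, reconstructed from the example tree.
transactions = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
root, header = build_fp_tree(transactions)
print("\n".join(show(root)))
```

Following `header["p"]` and then `node_link` visits every node labeled p, which is how the mining phase collects the paths in which an item participates.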

  16. Frequent Pattern Tree: Properties • Constructing the FP-tree • Needs exactly two scans of the DB • The first collects the set of frequent items • The second constructs the FP-tree • The cost of inserting a transaction Trans is O(|Trans|), where |Trans| is the number of frequent items in Trans • Completeness • Given the minimum support threshold, the FP-tree contains all the information needed for mining frequent patterns • Compactness • The size of the tree is bounded by the total number of occurrences of frequent items • The height of the tree is bounded by the maximum number of items in a transaction

  17. Frequent Pattern Tree: Properties • The frequent items of each transaction are kept in descending frequency order • An example with unordered itemsets • [Figure: the FP-tree built from ordered itemsets side by side with a tree built from unordered itemsets]

  19. Mining Frequent Patterns Using FP-tree • Examine the mining process starting from the bottom of the header table • For each item, collect all the patterns that its nodes participate in • Start from the item's head in the header table and follow the item's node-links

  21. Mining Frequent Patterns Using FP-tree: An Example • Node p (p:3) • FP-tree paths: <f:4, c:3, a:3, m:2, p:2>, <c:1, b:1, p:1> • Conditional pattern base: {(f:2, c:2, a:2, m:2), (c:1, b:1)} • Construct an FP-tree on this base • Keep only the frequent items • [Figure: the FP-tree from slide 11]

  22. Mining Frequent Patterns Using FP-tree: An Example • Node m (m:3) • FP-tree paths: <f:4, c:3, a:3, m:2>, <f:4, c:3, a:3, b:1, m:1> • Conditional pattern base: {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)} • Construct an FP-tree on this base • Keep only the frequent items • Create the tree • [Figure: the FP-tree from slide 11]

  23. Mining Frequent Patterns Using FP-tree: An Example • Node b (b:3) • FP-tree paths: <f:4, c:3, a:3, b:1>, <f:4, b:1>, <c:1, b:1> • Conditional pattern base: {(f:1, c:1, a:1), (f:1), (c:1)} • Construct an FP-tree on this base • Keep only the frequent items • Create the tree • [Figure: the FP-tree from slide 11]

  24. Mining Frequent Patterns Using FP-tree: An Example • Node a (a:3) • FP-tree paths: <f:4, c:3, a:3> • Conditional pattern base: {(f:3, c:3)} • Construct an FP-tree on this base • Keep only the frequent items • Create the tree • [Figure: the FP-tree from slide 11]

  25. Mining Frequent Patterns Using FP-tree: An Example • Node c (c:4) • FP-tree paths: <f:4, c:3>, <c:1> • Conditional pattern base: {(f:3)} • Construct an FP-tree on this base • Keep only the frequent items • Create the tree • [Figure: the FP-tree from slide 11]

  26. Mining Frequent Patterns Using FP-tree: An Example • Node f (f:4) • FP-tree paths: <f:4> • Conditional pattern base: {} (f has no prefix path) • Construct an FP-tree on this base • Keep only the frequent items • Create the tree (empty here) • [Figure: the FP-tree from slide 11]
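The conditional pattern bases in the example above can be reproduced with a short sketch. As a simplification, it reads the prefixes directly from the ordered transactions instead of following node-links in the tree. Both give the same result, because each tree node for an item corresponds to exactly one distinct prefix. The function name and the transaction list (reconstructed from the example tree's paths) are assumptions.

```python
from collections import Counter


def conditional_pattern_base(ordered_transactions, item):
    """Collect the prefix that precedes `item` in every transaction containing
    it, merging identical prefixes and summing their counts."""
    base = Counter()
    for transaction in ordered_transactions:
        if item in transaction:
            prefix = tuple(transaction[:transaction.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)


# Ordered frequent items per transaction, reconstructed from the example tree.
transactions = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]

# Bottom of the header table first, as in the walkthrough.
for item in "pmbacf":
    print(item, conditional_pattern_base(transactions, item))
```

For p this yields the prefix (f, c, a, m) with count 2 and (c, b) with count 1, matching slide 21. For f it yields an empty base.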

  28. Mining Frequent Patterns Using FP-tree: Design and construction

  30. Mining Frequent Patterns Using FP-tree: Properties • To calculate the frequent patterns containing a node in a path: • Only the prefix sub-path of that node in the path needs to be considered • Every node in that sub-path takes the same frequency count as the node itself • Suppose an FP-tree has a single path • The complete set of frequent patterns of the FP-tree can be generated by enumerating all the combinations of the sub-paths of that path • The support of each pattern is the minimum support of the items contained in that sub-path
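The single-path property can be illustrated with a short sketch. The function name is hypothetical. The path f:3, c:3, a:3 is what remains of m's conditional pattern base from slide 22 once item counts are summed and b (count 1) is dropped, assuming a minimum support of 3, a value that is not preserved in this transcript.

```python
from itertools import combinations


def patterns_from_single_path(path, suffix=()):
    """Enumerate every non-empty combination of nodes on a single path.
    `path` is a list of (item, count) pairs from the root downward; the
    support of a combination is the smallest count among its nodes."""
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = tuple(item for item, _ in combo) + tuple(suffix)
            patterns[items] = min(count for _, count in combo)
    return patterns


# Single path of m's conditional FP-tree (assumes minimum support 3).
m_path = [("f", 3), ("c", 3), ("a", 3)]
for pattern, support in patterns_from_single_path(m_path, suffix=("m",)).items():
    print("".join(pattern), support)
```

A path with n nodes yields 2^n - 1 patterns without any further scan or recursion, which is why a single-path tree ends that branch of the divide-and-conquer search.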

  32. Algorithm Efficiency Properties • The FP-tree is usually much smaller than the DB • The conditional FP-trees constructed during FP-growth are never bigger than the prefix sub-paths they are built from • Mining operations consist mainly of • Prefix count adjustment • Counting • Pattern fragment concatenation • This is much less costly than • Generating a very large number of candidate patterns • Testing each of them
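The three operations listed above can be seen in a simplified pattern-growth recursion. This sketch is not the tree-based FP-growth procedure. It recurses over conditional pattern bases held as plain lists of (prefix, count) pairs, which follows the same divide-and-conquer scheme without the compression an FP-tree provides. The function name, the minimum support of 3, and the transaction list (reconstructed from the example tree's paths) are assumptions.

```python
from collections import Counter


def pattern_growth(base, min_support, suffix=()):
    """Return {pattern tuple: support}. `base` is a list of
    (ordered item tuple, count) pairs that all share `suffix`."""
    # Counting: support of each item within this conditional base.
    counts = Counter()
    for items, count in base:
        for item in items:
            counts[item] += count

    patterns = {}
    for item, support in counts.items():
        if support < min_support:
            continue
        # Pattern fragment concatenation: grow the suffix by one item.
        pattern = (item,) + suffix
        patterns[pattern] = support
        # Prefix count adjustment: each prefix keeps the count of the
        # entry it was cut from.
        conditional = [
            (items[:items.index(item)], count)
            for items, count in base
            if item in items
        ]
        conditional = [(p, c) for p, c in conditional if p]
        patterns.update(pattern_growth(conditional, min_support, pattern))
    return patterns


# Ordered frequent items per transaction, reconstructed from the example tree.
transactions = [tuple("fcamp"), tuple("fcabm"), tuple("fb"), tuple("cbp"), tuple("fcamp")]
found = pattern_growth([(t, 1) for t in transactions], min_support=3)
for pattern in sorted(found, key=lambda p: (len(p), p)):
    print("".join(pattern), found[pattern])
```

No candidate itemset is ever generated and then tested against the database. Every pattern emitted is already known to be frequent within the base it was grown from.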

  34. Performance Study • Comparison of FP-growth with Apriori • Performed on • A 450 MHz Pentium PC • 128 MB of main memory • Microsoft Windows NT • Programs written in Microsoft Visual C++ 6.0 • Run time was measured as the interval between input and output • Two datasets

  35. Performance Study

  36. Performance Study

  37. Performance Study

  39. Future Work • Construction of FP-trees for projected databases • When the database is large, the FP-tree cannot be constructed in main memory • Partition the database into a set of projected databases • Construct an FP-tree for each projected database and mine it

  40. Future Work • Construction of a disk-resident FP-tree • Use a B+-tree structure to index the FP-tree • Split the tree based on the common prefix paths • Materialization of an FP-tree • Constructing an FP-tree needs two scans of the database • Materialize an FP-tree for frequent pattern mining • How to select a good minimum support threshold • Use a low threshold?

  42. Conclusion • Constructs a highly compact FP-tree • Usually substantially smaller than the original database • Applies a pattern growth method • Avoids costly candidate generation and testing • Applies a partitioning-based divide-and-conquer method • Dramatically reduces the size of the subsequent conditional FP-trees • Mines both short and long patterns efficiently in large databases

  44. Selected Questions • What are the components of an FP-tree? • One root • A set of item prefix subtrees as the children of the root • A frequent-item header table • How are the frequent patterns containing a node in a path calculated? • Only consider the prefix sub-path of that node in the path • The frequency count of every node in that sub-path is the same as the node's • Find all the combinations • Compare the efficiency of mining operations in FP-growth with Apriori • Mining operations consist mainly of prefix count adjustment, counting, and pattern fragment concatenation • This is much less costly than generating a very large number of candidate patterns and testing each of them
