

  1. Mining Frequent Patterns without Candidate Generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000. Jiawei Han, Jian Pei, and Yiwen Yin. June 23, 2000, DE Lab. 윤지영

  2. 1. Introduction
• The core of the Apriori algorithm: if any length-k pattern is not frequent in the database, its length-(k+1) super-patterns can never be frequent -> use the frequent (k-1)-itemsets to generate the candidate frequent k-itemsets (a sketch of this step follows)
• Huge candidate sets:
- 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets
- to discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
• Multiple scans of the database:
- needs (n + 1) scans, where n is the length of the longest pattern
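
As a concrete, hedged illustration of where the candidate explosion comes from, here is a minimal Python sketch of Apriori's join-and-prune candidate generation; it is not the paper's or Apriori's reference code, and the names (apriori_gen, freq) are invented for illustration.

from itertools import combinations

def apriori_gen(freq):
    # Join + prune: build candidate k-itemsets from frequent (k-1)-itemsets,
    # each represented as a sorted tuple of items.
    cands = set()
    for a in freq:
        for b in freq:
            u = tuple(sorted(set(a) | set(b)))
            # keep unions that grow the itemset by exactly one item and whose
            # every (k-1)-subset is itself frequent (the Apriori property)
            if len(u) == len(a) + 1 and all(s in freq for s in combinations(u, len(a))):
                cands.add(u)
    return cands

freq1 = {('a1',), ('a2',), ('a3',)}
print(sorted(apriori_gen(freq1)))  # the C(3,2) = 3 candidate 2-itemsets

With 10^4 frequent 1-itemsets the same join step emits on the order of C(10^4, 2) ≈ 5 * 10^7 candidate pairs, which is the blowup the slide refers to.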

  3. Our Approach: Mining Frequent Patterns Without Candidate Generation
• Construct the frequent-pattern tree (FP-tree), an extended prefix-tree structure
• Develop an FP-tree-based pattern-fragment-growth mining method:
- start from a frequent length-1 pattern (as an initial suffix pattern)
- construct its conditional FP-tree
- achieve pattern growth by concatenating the suffix pattern with patterns mined from its conditional FP-tree
• Use a partitioning-based, divide-and-conquer method

  4. 2. Frequent Pattern Tree (Design and Construction)
• <Example 1> Table 1 (ξ = 3), Fig. 1
[Fig. 1: the FP-tree built from Table 1, with header table f:4, c:4, a:3, b:3, m:3, p:3]

TID   Items bought                  (ordered) frequent items
100   {f, a, c, d, g, i, m, p}      {f, c, a, m, p}
200   {a, b, c, f, l, m, o}         {f, c, a, b, m}
300   {b, f, h, j, o}               {f, b}
400   {b, c, k, s, p}               {c, b, p}
500   {a, f, c, e, l, p, m, n}      {f, c, a, m, p}

• <Algorithm 1>
1. Scan DB once, find the frequent 1-itemsets (single-item patterns), and sort the frequent items in support-descending order
2. Scan DB again and construct the FP-tree (the two scans are sketched below)
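
The following Python sketch walks through the two scans of Algorithm 1 on the Table 1 transactions; it is an illustration under the stated ξ = 3, not the authors' code, and ties in the support-descending order are broken arbitrarily (so the order may differ from the paper's f, c, a, b, m, p).

from collections import Counter

transactions = [
    ['f','a','c','d','g','i','m','p'],
    ['a','b','c','f','l','m','o'],
    ['b','f','h','j','o'],
    ['b','c','k','s','p'],
    ['a','f','c','e','l','p','m','n'],
]
xi = 3  # minimum support threshold from Example 1

# Scan 1: count every item and keep those with support >= xi,
# fixing a support-descending order (the frequent-item list).
counts = Counter(i for t in transactions for i in t)
f_list = [i for i, c in counts.most_common() if c >= xi]
rank = {item: r for r, item in enumerate(f_list)}

# Scan 2 (per transaction): select the frequent items and sort them by the
# fixed order; these ordered lists are what get inserted into the FP-tree.
ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in transactions]
print(f_list)   # six items with support >= 3: f, c, a and b, m, p (tie order may vary)
print(ordered)  # matches the "(ordered) frequent items" column of Table 1 up to tie order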

  5. Definition of FP-tree
• Consists of:
- one root, labeled null
- a set of item-prefix subtrees as the children of the root
- a frequent-item header table
• Each node in an item-prefix subtree has three fields: item-name, count, and node-link
• Each entry in the frequent-item header table has two fields: item-name and head of node-link, which points to the first node in the FP-tree carrying that item-name
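
A minimal Python sketch of the structure just defined; FPNode and insert are invented names, and where the paper's header entry points at the first node of each item's node-link chain, this sketch prepends new nodes for simplicity (either chain end works for traversal).

class FPNode:
    # One FP-tree node: item-name, count, node-link, plus parent/children
    # pointers used later when walking prefix paths.
    def __init__(self, item, parent=None):
        self.item = item          # item-name (None for the root)
        self.count = 0            # count field
        self.node_link = None     # next node carrying the same item-name
        self.parent = parent
        self.children = {}        # item-name -> child FPNode

def insert(root, ordered_items, header):
    # Insert one ordered frequent-item list, maintaining the header table.
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
            child.node_link = header.get(item)  # link into item's chain
            header[item] = child
        child.count += 1
        node = child

root, header = FPNode(None), {}
for t in [('f','c','a','m','p'), ('f','c','a','b','m'), ('f','b'),
          ('c','b','p'), ('f','c','a','m','p')]:
    insert(root, t, header)
print(root.children['f'].count)  # 4, as on the f:4 node of Fig. 1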

  6. 2.2 Completeness and Compactness of FP-tree
• <Lemma 2.1> (Completeness)
- never breaks a long pattern of any transaction
- preserves complete information for frequent-pattern mining
• <Lemma 2.2> (Compactness)
- reduces irrelevant information: infrequent items are gone
- frequency-descending ordering: more frequent items are more likely to be shared
- never larger than the original database (not counting node-links and counts)

  7. 3. Mining Frequent Patterns Using the FP-tree
• General idea (divide-and-conquer):
- recursively grow frequent patterns along the paths of the FP-tree
• Method:
- for each item, construct its conditional pattern base, and then its conditional FP-tree
- repeat the process on each newly created conditional FP-tree
- stop when the resulting FP-tree is empty or contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)

  8. Conditional Pattern Base & Conditional FP-tree
• Construct a conditional pattern base for each node in the FP-tree
ex) For node m, the two prefix paths are <f:4, c:3, a:3> (ending in m:2) and <f:4, c:3, a:3, b:1> (ending in m:1); carrying m's counts along each path gives m's conditional pattern base => {(fca:2), (fcab:1)}
• Construct a conditional FP-tree from each conditional pattern base
ex) m's conditional FP-tree => {(f:3, c:3, a:3)} | m (the count arithmetic is sketched below)
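
The count arithmetic behind m's conditional FP-tree can be checked in a few lines of Python (a sketch over the base stated above, with ξ = 3):

from collections import Counter

# m's conditional pattern base: each prefix path carries m's count
base_m = [(('f','c','a'), 2), (('f','c','a','b'), 1)]
xi = 3

acc = Counter()
for path, cnt in base_m:
    for item in path:
        acc[item] += cnt
print(dict(acc))  # {'f': 3, 'c': 3, 'a': 3, 'b': 1}

# b falls below xi and is dropped, leaving the single path f:3, c:3, a:3,
# i.e. m's conditional FP-tree {(f:3, c:3, a:3)} | m
print({i: c for i, c in acc.items() if c >= xi})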

  9. Properties
<Property 3.1> (Node-link property) For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header table.
<Property 3.2> (Prefix-path property) To calculate the frequent patterns for a node ai in a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count should carry the same count as node ai.

  10. Step 1: From FP-tree to Conditional Pattern Base
[Fig.: the FP-tree of Fig. 1 with its header table, traversed via node-links]
• Start at the frequent-item header table of the FP-tree
• Traverse the FP-tree by following the node-links of each frequent item
• Accumulate all the transformed prefix paths of that item to form its conditional pattern base (a traversal sketch follows)

Conditional pattern bases:
item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
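
A hedged sketch of the traversal, reusing the FPNode fields (parent, node_link, count, item) from the earlier structure sketch; per Property 3.2, each emitted prefix path carries the count of the node it was reached from.

def conditional_pattern_base(item, header):
    # Follow item's node-link chain; for each node walk up to the root
    # and record the prefix path with that node's count.
    base = []
    node = header[item]            # head of item's node-link chain
    while node is not None:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
        node = node.node_link
    return base

# On the tree built in the earlier sketch:
# conditional_pattern_base('m', header)
#   -> [(('f','c','a','b'), 1), (('f','c','a'), 2)], i.e. fcab:1 and fca:2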

  11. Step 2: Construct Conditional FP-tree
• For each pattern base:
- accumulate the count of each item in the base
- construct the FP-tree over the frequent items of the pattern base
ex) conditional pattern base of "m": fca:2, fcab:1 -> m-conditional FP-tree: the single path f:3, c:3, a:3
All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
(a construction sketch follows)
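
A sketch of Step 2, representing the conditional FP-tree as nested dicts (item -> [count, children]) rather than full node objects; items whose accumulated count falls below ξ are dropped before insertion, and ties in the support ordering are broken by first appearance.

from collections import Counter

def conditional_fp_tree(base, xi):
    acc = Counter()
    for path, cnt in base:
        for item in path:
            acc[item] += cnt
    # support-descending order over the surviving items (stable in ties)
    seen = list(dict.fromkeys(i for path, _ in base for i in path))
    order = sorted((i for i in seen if acc[i] >= xi), key=lambda i: -acc[i])
    rank = {i: r for r, i in enumerate(order)}
    tree = {}
    for path, cnt in base:
        node = tree
        for item in sorted((i for i in path if i in rank), key=rank.get):
            entry = node.setdefault(item, [0, {}])
            entry[0] += cnt
            node = entry[1]
    return tree

base_m = [(('f','c','a'), 2), (('f','c','a','b'), 1)]
print(conditional_fp_tree(base_m, 3))
# {'f': [3, {'c': [3, {'a': [3, {}]}]}]} -- the single path f:3, c:3, a:3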

  12. Table 2. Conditional pattern bases and conditional FP-trees (ξ = 3)
item   conditional pattern base   conditional FP-tree
c      f:3                        {(f:3)} | c
a      fc:3                       {(f:3, c:3)} | a
b      fca:1, f:1, c:1            empty
m      fca:2, fcab:1              {(f:3, c:3, a:3)} | m
p      fcam:2, cb:1               {(c:3)} | p

  13. Lemma 3.1
<Lemma 3.1> (Fragment growth) Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then the support of α ∪ β in DB is equivalent to the support of β in B.
<Corollary 3.1> (Pattern growth) Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB if β is frequent in B.

  14. Step 3: Recursively Mine the Conditional FP-trees
m-conditional FP-tree: the single path f:3, c:3, a:3 (header table: f, c, a)
• cond. pattern base of "am": (fc:3) -> am-conditional FP-tree: f:3, c:3
• cond. pattern base of "cm": (f:3) -> cm-conditional FP-tree: f:3
• cond. pattern base of "cam": (f:3) -> cam-conditional FP-tree: f:3

  15. Lemma 3.2
<Lemma 3.2> (Single FP-tree path pattern generation) Suppose an FP-tree T has a single path P. The complete set of frequent patterns of T can be generated by enumerating all the combinations of the sub-paths of P, with the support of each pattern being the minimum support of the items contained in the sub-path.

  16. Step 4: Single FP-tree Path Generation
• Suppose an FP-tree T has a single path P
• The complete set of frequent patterns of T can be generated by enumerating all the combinations of the sub-paths of P (sketched below)
ex) m-conditional FP-tree: the single path f:3, c:3, a:3
All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
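
Lemma 3.2 reduces single-path mining to plain combinatorics; here is a short Python sketch (the suffix parameter stands for the already-accumulated pattern, e.g. m, so the sketch reproduces the list above except for m itself, which is emitted one recursion level earlier):

from itertools import combinations

def single_path_patterns(path, suffix=()):
    # Enumerate every combination of the (item, count) pairs on the path;
    # each pattern's support is the minimum count among its items.
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            items = tuple(i for i, _ in combo) + suffix
            patterns[items] = min(c for _, c in combo)
    return patterns

# m-conditional FP-tree: the single path f:3, c:3, a:3
print(single_path_patterns([('f', 3), ('c', 3), ('a', 3)], suffix=('m',)))
# fm, cm, am, fcm, fam, cam, fcam -- all with support 3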

  17. Algorithm 2 (FP-growth)
Procedure FP-growth(Tree, α) {
(1) if Tree contains a single path P
(2) then for each combination (denoted as β) of the nodes in the path P do
(3) generate pattern β ∪ α with support = minimum support of the nodes in β;
(4) else for each ai in the header of Tree do {
(5) generate pattern β = ai ∪ α with support = ai.support;
(6) construct β's conditional pattern base and then β's conditional FP-tree Treeβ;
(7) if Treeβ ≠ ∅
(8) then call FP-growth(Treeβ, β) } }
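
Below is a hedged executable rendering of this recursion in Python. To stay compact it represents each conditional FP-tree Treeβ by its conditional pattern base (a list of (prefix-path, count) pairs, never compressed into an actual tree) and omits the single-path shortcut of lines (1)-(3); under those simplifications it still yields the same patterns and supports on the Example 1 data.

from collections import Counter

def fp_growth(paths, xi, alpha=()):
    # paths: conditional pattern base standing in for Tree_alpha;
    # alpha: the suffix pattern grown so far.
    acc = Counter()
    for path, cnt in paths:
        for item in path:
            acc[item] += cnt
    results = {}
    for a_i in sorted((i for i, c in acc.items() if c >= xi), key=lambda i: acc[i]):
        beta = (a_i,) + alpha                 # pattern beta = {a_i} U alpha
        results[beta] = acc[a_i]              # support = a_i.support (Lemma 3.1)
        cond = []                             # beta's conditional pattern base
        for path, cnt in paths:
            if a_i in path:
                prefix = tuple(i for i in path[:path.index(a_i)] if acc[i] >= xi)
                if prefix:
                    cond.append((prefix, cnt))
        if cond:                              # Tree_beta not empty
            results.update(fp_growth(cond, xi, beta))
    return results

# ordered frequent-item lists of Table 1 (the output of Algorithm 1)
db = [('f','c','a','m','p'), ('f','c','a','b','m'), ('f','b'),
      ('c','b','p'), ('f','c','a','m','p')]
print(fp_growth([(t, 1) for t in db], xi=3))
# includes e.g. ('f','c','a','m'): 3 and ('c','p'): 3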

  18. FP-growth vs. Apriori: Scalability with the Support Threshold
Data set (D1): T25I20D10K / Data set (D2): T25I20D100K

  19. FP-growth vs. Apriori: Scalability with the Number of Transactions
Data set (D2): T25I20D100K (support threshold 1.5%)

  20. Run Time of FP-growth per Itemset vs. Support Threshold
Data set (D1): T25I20D10K / Data set (D2): T25I20D100K
