
Huffman Codes and Association Rules (II)



Presentation Transcript


  1. Lecture 15 Huffman Codes and Association Rules (II) Prof. Sin-Min Lee, Department of Computer Science

  2. Huffman Code Example • Given: A B C D E with frequencies 3 1 2 4 6 • Sorting the symbols in increasing order of frequency gives: B C A D E 1 2 3 4 6

  3. Huffman Code Example – Step 1 • Because B (1) and C (2) have the lowest frequencies, they are merged first. The combined node BC has weight 1 + 2 = 3

  4. Huffman Code Example – Step 2 • Re-sort the remaining nodes in increasing order again. This gives us: BC A D E 3 3 4 6

  5. Huffman Code Example – Step 3 • The next merge combines the two smallest nodes, BC (3) and A (3), into BCA with weight 6

  6. Huffman Code Example – Step 4 • From the initial BC A D E ordering, three nodes now remain: D (4), E (6), and the merged node BCA (6). Equivalent orderings: D E BCA (4 6 6), D E ABC (4 6 6), D ABC E (4 6 6), D BCA E (4 6 6)

  7. Huffman Code Example – Step 5 • Continuing from the previous step, merging D (4) with BCA (6) gives DBCA with weight 10, leaving E (6): E DBCA (6 10), equivalently DABC E (10 6)

  8. Huffman Code Example – Step 6 • The alternative labelings BCA D E (6 4 6) and ABC D E (6 4 6) lead to the same state, E DBCA (6 10). The final merge combines E (6) and DBCA (10) into the root, with total weight 16

  9. Huffman Code Example – Step 7 • Finally, assign a 1 to each right branch and a 0 to each left branch of the tree. Reading the path from the root to each leaf gives that symbol's code
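The merge-and-label procedure the slides walk through can be sketched in Python (my own illustration, not from the slides; the `huffman_codes` name and the dict-of-codes representation are assumptions):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes for a {symbol: frequency} dict.

    Keeps a min-heap of partial trees; repeatedly pops the two
    lightest trees (like the B + C merge in the slides), prefixes
    '0' on the left subtree and '1' on the right, and pushes the
    merged tree back until one tree remains.
    """
    tick = count()  # unique tie-breaker so equal weights never compare dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two smallest weights
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

# The slides' example: A B C D E with frequencies 3 1 2 4 6
codes = huffman_codes({"A": 3, "B": 1, "C": 2, "D": 4, "E": 6})
```

Following the merge order above (B+C, then +A, then D+E, then the root), A, D, and E end up with 2-bit codes while B and C get 3-bit codes.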

  10. Example • Items = {milk, coke, pepsi, beer, juice}. • Support threshold = 3 baskets. B1 = {m, c, b} B2 = {m, p, j} B3 = {m, b} B4 = {c, j} B5 = {m, p, b} B6 = {m, c, b, j} B7 = {c, b, j} B8 = {b, c} • Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
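As a sketch (my own Python illustration, not from the slides), the frequent itemsets on this slide can be recovered by brute-force counting of all 1- and 2-item subsets:

```python
from itertools import combinations
from collections import Counter

# The eight baskets from the slide
baskets = [{"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
           {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"}]
minsup = 3  # support threshold, in baskets

counts = Counter()
for basket in baskets:
    for k in (1, 2):  # count every 1- and 2-item subset of each basket
        for items in combinations(sorted(basket), k):
            counts[frozenset(items)] += 1

frequent = {s for s, n in counts.items() if n >= minsup}
```

This reproduces exactly the seven frequent itemsets listed: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.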

  11. Association Rules • Association rule R : Itemset1 => Itemset2 • Itemset1 and Itemset2 are disjoint, and Itemset2 is non-empty • Meaning: if a transaction includes Itemset1, then it also includes Itemset2 • Examples • A,B => E,C • A => B,C

  12. Example • B1 = {m, c, b} B2 = {m, p, j} B3 = {m, b} B4 = {c, j} B5 = {m, p, b} B6 = {m, c, b, j} B7 = {c, b, j} B8 = {b, c} • An association rule: {m, b} → c. Four baskets contain {m, b} (B1, B3, B5, B6); of these, two also contain c (B1, B6). • Confidence = 2/4 = 50%.
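The confidence computation on this slide is a one-liner in Python (a sketch of my own, not from the slides):

```python
# The eight baskets from the slide
baskets = [{"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
           {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"}]

# Confidence of {m, b} -> c: of the baskets containing {m, b},
# what fraction also contain c?
antecedent, consequent = {"m", "b"}, {"c"}
with_ante = [b for b in baskets if antecedent <= b]           # B1, B3, B5, B6
conf = sum(1 for b in with_ante if consequent <= b) / len(with_ante)
```

Here `<=` is Python's subset test for sets, so `with_ante` collects exactly the baskets that support the antecedent.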

  13. From Frequent Itemsets to Association Rules • Q: Given frequent set {A,B,E}, what are possible association rules? • A => B, E • A, B => E • A, E => B • B => A, E • B, E => A • E => A, B • __ => A,B,E (empty rule), or true => A,B,E

  14. Classification vs Association Rules • Classification rules: focus on one target field; specify a class in all cases; measure: accuracy • Association rules: many target fields; applicable only in some cases; measures: support, confidence, lift

  15. Rule Support and Confidence • Suppose R : I => J is an association rule • sup(R) = sup(I ∪ J) is the support count • the support of the itemset I ∪ J (all items in I or J) • conf(R) = sup(I ∪ J) / sup(I) is the confidence of R • the fraction of transactions containing I that also contain J • Association rules with minimum support and minimum confidence are sometimes called “strong” rules

  16. Association Rules Example: • Q: Given frequent set {A,B,E}, what association rules have minsup = 2 and minconf = 50%? A, B => E : conf = 2/4 = 50% A, E => B : conf = 2/2 = 100% B, E => A : conf = 2/2 = 100% E => A, B : conf = 2/2 = 100% Don’t qualify: A => B, E : conf = 2/6 = 33% < 50% B => A, E : conf = 2/7 ≈ 29% < 50% __ => A,B,E : conf = 2/9 ≈ 22% < 50%
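This rule enumeration can be sketched in Python (my own illustration; the support counts below are the ones implied by the slide's confidences over 9 transactions):

```python
from itertools import combinations

# Support counts implied by the slide (9 transactions total)
sup = {frozenset("ABE"): 2, frozenset("AB"): 4, frozenset("AE"): 2,
       frozenset("BE"): 2, frozenset("A"): 6, frozenset("B"): 7,
       frozenset("E"): 2, frozenset(): 9}
itemset, minconf = frozenset("ABE"), 0.5

rules = []
for k in range(len(itemset)):            # antecedents of size 0, 1, 2
    for ante in map(frozenset, combinations(sorted(itemset), k)):
        conf = sup[itemset] / sup[ante]  # conf(I => J) = sup(I ∪ J) / sup(I)
        if conf >= minconf:
            rules.append((set(ante), set(itemset - ante), conf))
```

Running this keeps exactly the four qualifying rules from the slide ({A,B} => E at 50%, and {A,E} => B, {B,E} => A, E => {A,B} at 100%) and drops the other three.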

  17. Find Strong Association Rules • A rule has the parameters minsup and minconf: • sup(R) >= minsup and conf (R) >= minconf • Problem: • Find all association rules with given minsup and minconf • First, find all frequent itemsets

  18. Finding Frequent Itemsets • Start by finding one-item sets (easy) • Q: How? • A: Simply count the frequencies of all items

  19. Finding itemsets: next level • Apriori algorithm (Agrawal & Srikant) • Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, … • If {A, B} is a frequent itemset, then {A} and {B} have to be frequent itemsets as well! • In general: if X is a frequent k-itemset, then all (k-1)-item subsets of X are also frequent • Compute candidate k-itemsets by merging (k-1)-itemsets
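The subset property on this slide is exactly what Apriori's candidate generation exploits: join pairs of frequent (k-1)-itemsets, then prune any candidate with an infrequent (k-1)-subset. A minimal sketch in Python (the function name and the sample L2 are my own, for illustration):

```python
from itertools import combinations

def apriori_gen(freq_km1, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    Join step: union pairs of (k-1)-itemsets that yield a k-itemset.
    Prune step: drop candidates with any infrequent (k-1)-subset,
    using the property that every subset of a frequent set is frequent.
    """
    cands = set()
    fs = list(freq_km1)
    for i in range(len(fs)):
        for j in range(i + 1, len(fs)):
            union = fs[i] | fs[j]
            if len(union) == k and all(
                frozenset(sub) in freq_km1
                for sub in combinations(sorted(union), k - 1)
            ):
                cands.add(union)
    return cands

# Hypothetical frequent pairs: {A,B}, {A,E}, {B,E}, {B,C}
L2 = {frozenset(p) for p in [("A", "B"), ("A", "E"), ("B", "E"), ("B", "C")]}
C3 = apriori_gen(L2, 3)
```

With this L2, only {A, B, E} survives: {A, B, C} is pruned because {A, C} is not frequent, and {B, C, E} because {C, E} is not.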

  20. Finding Association Rules • A typical question: “find all association rules with support ≥ s and confidence ≥ c.” • Note: the “support” of an association rule is the support of the set of items it mentions. • Hard part: finding the high-support (frequent) itemsets. • Checking the confidence of association rules involving those sets is relatively easy.

  21. Naïve Algorithm • A simple way to find frequent pairs: • Read the file once, counting the occurrences of each pair in main memory. • Expand each basket of n items into its n(n-1)/2 pairs. • Fails if the number of items squared exceeds main memory.
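The naïve pass can be sketched in Python (my own illustration, not from the slides):

```python
from itertools import combinations
from collections import Counter

def count_pairs(baskets):
    """One pass over the data: expand each basket of n items into its
    n(n-1)/2 pairs and count them in main memory.  Memory grows with
    the number of distinct pairs, which is what makes this naive."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pc = count_pairs([{"m", "c", "b"}, {"m", "b"}, {"c", "b"}])
```

Sorting each basket before taking combinations ensures a pair is always keyed the same way, e.g. `("b", "m")` rather than sometimes `("m", "b")`.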

  22. [Diagram: Apriori passes — candidate set C1 is filtered to frequent set L1, which constructs C2; C2 is filtered to L2, which constructs C3. First pass: C1 → L1; second pass: C2 → L2.]

  23. [Agrawal, Srikant 94] Fast Algorithms for Mining Association Rules, by Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center

  24. [Diagram: Apriori data flow — the database yields candidate sets C^1, C2, C^2, C3, C^3 and frequent sets L1, L2, L3 in alternating generate-and-count steps.]

  25. Dynamic Programming Approach • Requires proofs of the principle of optimality and of overlapping subproblems • Principle of optimality: the optimal solution for Lk includes the optimal solution of Lk-1 • Proof by contradiction • Overlapping subproblems: lemma — every subset of a frequent itemset is a frequent itemset • Proof by contradiction
