Mining Disjunctive Association Rules Using Genetic Programming

Michelle Lyman and Gary Lewandowski, Department of Mathematics and Computer Science, Xavier University. Celebration of Student Research, April 4, 2005.

Presentation Transcript


  1. Mining Disjunctive Association Rules Using Genetic Programming Michelle Lyman Gary Lewandowski Department of Mathematics and Computer Science Xavier University Celebration of Student Research April 4, 2005

  2. Genetic Algorithms
     • What are they? (Example: string evolution)
     • Target string: xavier (fitness = number of positions that match the target)
     • Generation 1: \iTksr fitness = 1
     • Generation 504: \aT:sr fitness = 2
     • Generation 1143: \aT:er fitness = 3
     • Generation 1498: \av:er fitness = 4
     • Generation 1701: xav:er fitness = 5
     • Generation 1857: xavier fitness = 6
     • Selection (Roulette Wheel)
     • Crossover
       • First string: xPki;&
       • Second string: \aT:er
       • Index: 3
       • Resulting strings: xPkier and \aT:;&
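The string-evolution example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the alphabet, population size, mutation rate, and the +1 added to every roulette-wheel weight (so zero-fitness strings can still be picked) are all assumptions. Fitness counts matching positions, which reproduces the fitness values listed on the slide, and `crossover(first, second, index)` keeps positions 0 to `index` of each parent and swaps the tails.

```python
import random
import string

TARGET = "xavier"
# Assumed alphabet: letters, digits, and punctuation (the slide's strings contain \ : ; &).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def fitness(candidate: str, target: str = TARGET) -> int:
    """Count the positions at which the candidate matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def roulette_select(population, weights, rng):
    """Spin the roulette wheel once: selection probability is proportional to weight."""
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(first: str, second: str, index: int):
    """Single-point crossover: keep positions 0..index of each parent, swap the tails."""
    cut = index + 1
    return first[:cut] + second[cut:], second[:cut] + first[cut:]


def mutate(s: str, rng, rate: float = 0.05) -> str:
    """Replace each character with a random one with probability `rate` (assumed value)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)


def evolve(pop_size: int = 100, max_generations: int = 20000, seed: int = 0):
    """Return (generation, best string); stops early once the target appears."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for generation in range(1, max_generations + 1):
        best = max(population, key=fitness)
        if best == TARGET:
            return generation, best
        # +1 keeps every weight positive, so the wheel never has zero total weight.
        weights = [fitness(s) + 1 for s in population]
        next_gen = []
        while len(next_gen) < pop_size:
            a = roulette_select(population, weights, rng)
            b = roulette_select(population, weights, rng)
            child1, child2 = crossover(a, b, rng.randrange(len(TARGET) - 1))
            next_gen += [mutate(child1, rng), mutate(child2, rng)]
        population = next_gen[:pop_size]
    return max_generations, max(population, key=fitness)
```

For example, `crossover("xPki;&", "\\aT:er", 3)` returns `("xPkier", "\\aT:;&")` (the backslash is escaped in Python source), matching the slide, and `fitness("\\iTksr")` is 1, matching generation 1. How many generations `evolve()` needs depends on the assumed parameters and seed.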

  3. Genetic Programming
     • Trees (figures: an arithmetic operation tree and a logical operation tree)
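In genetic programming the individuals are trees: internal nodes are operators and leaves are values or variables. The sketch below assumes a hypothetical encoding, where an internal node is a tuple `(op, left, right)` and a leaf is a number, a Boolean, or a variable name looked up in `env`; the operator table is illustrative only.

```python
import operator

# Hypothetical operator table: arithmetic operators plus logical and/or/xor.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "xor": lambda a, b: a != b,
}


def evaluate(tree, env):
    """Recursively evaluate a tree; string leaves are variables looked up in env."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree  # numeric or Boolean constant


arithmetic_tree = ("*", ("+", "x", 2), 3)       # (x + 2) * 3
logical_tree = ("and", "p", ("xor", "q", "r"))  # p and (q xor r)
```

With `x = 1`, `evaluate(arithmetic_tree, {"x": 1})` gives 9.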

  4. Mutation
     • Change an operator or a value (figure: original tree and mutated tree)
     • Replace an existing subtree with a new subtree (figure: original tree and mutated tree)
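The two kinds of mutation can be sketched on the same kind of tree. This assumes a hypothetical tuple encoding `(op, left, right)` and assumed operator and terminal sets; a node is addressed by its path of child indices (1 for left, 2 for right).

```python
import random

OPERATORS = ["+", "-", "*"]  # assumed function set
TERMINALS = ["x", 1, 2, 3]   # assumed terminal set


def node_paths(tree, path=()):
    """Yield the path (sequence of child indices 1 or 2) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from node_paths(tree[1], path + (1,))
        yield from node_paths(tree[2], path + (2,))


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(node)


def random_tree(rng, depth):
    """Grow a random tree; a leaf is chosen at depth 0 or with assumed probability 0.3."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(OPERATORS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def point_mutation(tree, rng):
    """Mutation 1: change one operator or one value; the tree keeps its shape."""
    path = rng.choice(list(node_paths(tree)))
    node = get_node(tree, path)
    if isinstance(node, tuple):
        new = (rng.choice(OPERATORS),) + node[1:]
    else:
        new = rng.choice(TERMINALS)
    return replace_node(tree, path, new)


def subtree_mutation(tree, rng, depth=2):
    """Mutation 2: replace a randomly chosen subtree with a newly generated one."""
    path = rng.choice(list(node_paths(tree)))
    return replace_node(tree, path, random_tree(rng, depth))
```

Point mutation never changes the number of nodes, while subtree mutation can grow or shrink the tree.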

  5. Crossover
     • (Figure: the original two trees and the resulting trees)
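Crossover in genetic programming is commonly done by choosing a point in each parent and exchanging the subtrees rooted there. Assuming that is what the figure depicts, and using a hypothetical tuple encoding `(op, left, right)`, a minimal sketch is:

```python
import random


def node_paths(tree, path=()):
    """Yield the path (child indices 1 or 2) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from node_paths(tree[1], path + (1,))
        yield from node_paths(tree[2], path + (2,))


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(node)


def swap_subtrees(a, b, path_a, path_b):
    """Exchange the subtree of a at path_a with the subtree of b at path_b."""
    sub_a, sub_b = get_node(a, path_a), get_node(b, path_b)
    return replace_node(a, path_a, sub_b), replace_node(b, path_b, sub_a)


def subtree_crossover(a, b, rng):
    """Pick a random crossover point in each parent and swap the subtrees there."""
    path_a = rng.choice(list(node_paths(a)))
    path_b = rng.choice(list(node_paths(b)))
    return swap_subtrees(a, b, path_a, path_b)
```

Swapping the left child of `("and", "loop", "iteration")` with the right child of `("or", "tree", "recursion")` yields `("and", "recursion", "iteration")` and `("or", "tree", "loop")`; the total node count of the two children always equals that of the two parents.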

  6. Association Rules
     • Conjunctive: If a customer buys turkey and cranberry sauce, then the customer also buys corn and pumpkin pie.
     • Regular: If a customer buys peanut butter, then the customer also buys jelly.
     • Disjunctive: If a customer buys ham or turkey, then the customer also buys wheat bread exclusive-or white bread.
     • If some condition a exists, then condition c exists.
       • a is the antecedent.
       • c is the consequent.
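The disjunctive example rule can be written as two predicates over a shopping basket. Representing a basket as a set of item names is an assumption made for illustration; note that the antecedent uses inclusive or while the consequent uses exclusive-or.

```python
from typing import AbstractSet


def antecedent(basket: AbstractSet[str]) -> bool:
    """'If a customer buys ham or turkey' (inclusive or)."""
    return "ham" in basket or "turkey" in basket


def consequent(basket: AbstractSet[str]) -> bool:
    """'... also buys wheat bread exclusive-or white bread' (exactly one of the two)."""
    return ("wheat bread" in basket) != ("white bread" in basket)
```

A basket with ham and white bread satisfies both sides; a basket with turkey and both breads satisfies the antecedent but not the consequent.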

  7. Card Sorting

  8. Rule Format
     • (Figure: antecedent tree "loop and iteration"; consequent tree "tree and recursion")
     • If a student groups loop and iteration, then the student also groups tree and recursion.
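Evaluating this rule on one card sort can be sketched as follows, assuming (for illustration) that a sort is a list of groups and each group is a set of card names. "Groups loop and iteration" is read as both cards appearing in the same group.

```python
from typing import List, Set

CardSort = List[Set[str]]  # assumed model: one sort = a list of groups of card names


def groups_together(card_sort: CardSort, *cards: str) -> bool:
    """True if a single group in the sort contains every one of the given cards."""
    return any(set(cards) <= group for group in card_sort)


def evaluate_rule(card_sort: CardSort):
    """Return (antecedent, consequent) for the slide's example rule."""
    antecedent = groups_together(card_sort, "loop", "iteration")
    consequent = groups_together(card_sort, "tree", "recursion")
    return antecedent, consequent
```

For the sort `[{"loop", "iteration"}, {"tree", "recursion"}]` both sides are true.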

  9. A Small Problem
     • Assume we want to build a tree for the rule: if there exists a group that contains recursion and tree, and there exists a group that contains loop and iteration, then there exists a group that contains function exclusive-or method, exclusive-or there exists a group that contains scope exclusive-or thread.
     • (Figure: antecedent tree and(and(recursion, tree), and(loop, iteration)); consequent tree xor(xor(function, method), xor(scope, thread)))
     • Another possible interpretation of this tree is: if there exists a group that contains recursion and tree and loop and iteration, then there exists a group that contains function exclusive-or method exclusive-or scope exclusive-or thread.
     • Solution: SAND, SXOR, SOR, GAND, and GXOR
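The operator names suggest, though the slide does not spell this out, that S-operators combine conditions across the whole sort while G-operators require their conditions to hold inside a single group. Under that assumed reading, and with a hypothetical tuple encoding, the two interpretations of the antecedent give different answers on the same sort:

```python
from typing import List, Set

# Assumed semantics: G-operators test one group; S-operators combine results over the whole sort.
G_OPS = {"GAND": lambda a, b: a and b, "GXOR": lambda a, b: a != b}
S_OPS = {
    "SAND": lambda a, b: a and b,
    "SOR": lambda a, b: a or b,
    "SXOR": lambda a, b: a != b,
}


def eval_group(tree, group: Set[str]) -> bool:
    """Evaluate a card or a G-operator subtree against a single group."""
    if isinstance(tree, str):
        return tree in group
    op, left, right = tree
    return G_OPS[op](eval_group(left, group), eval_group(right, group))


def eval_sort(tree, card_sort: List[Set[str]]) -> bool:
    """Evaluate a tree against a whole sort (a list of groups)."""
    if isinstance(tree, tuple) and tree[0] in S_OPS:
        op, left, right = tree
        return S_OPS[op](eval_sort(left, card_sort), eval_sort(right, card_sort))
    # A card or a G-operator subtree is true if at least one group satisfies it.
    return any(eval_group(tree, group) for group in card_sort)


# The antecedent under each interpretation.
separate_groups = ("SAND", ("GAND", "recursion", "tree"), ("GAND", "loop", "iteration"))
same_group = ("GAND", ("GAND", "recursion", "tree"), ("GAND", "loop", "iteration"))
```

For the sort `[{"recursion", "tree"}, {"loop", "iteration"}]`, `separate_groups` is true but `same_group` is false, because no single group holds all four cards.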

  10. Goals
     • To understand how students perceive and relate concepts
     • High support: Let the set of sorts be called D. Let the set of sorts that fulfill the antecedent of rule R be called A, and let the set of sorts that fulfill the consequent be called C. Then the support for R is given by
       $\mathrm{support}(R) = \frac{|A \cap C|}{|D|}$
     • High confidence: Using the same notation, the confidence for R is given by
       $\mathrm{confidence}(R) = \frac{|A \cap C|}{|A|}$
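Using the standard association-rule definitions, support is |A ∩ C| / |D| and confidence is |A ∩ C| / |A|. The sketch below computes both as fractions over a list of card sorts; the sort model (a list of sets of card names), the `groups_together` helper, and returning 0.0 for empty inputs are assumptions for illustration.

```python
from typing import Callable, List, Set, Tuple

CardSort = List[Set[str]]


def groups_together(card_sort: CardSort, *cards: str) -> bool:
    """True if a single group in the sort contains every one of the given cards."""
    return any(set(cards) <= group for group in card_sort)


def support_confidence(sorts: List[CardSort],
                       antecedent: Callable[[CardSort], bool],
                       consequent: Callable[[CardSort], bool]) -> Tuple[float, float]:
    """support = |A ∩ C| / |D|, confidence = |A ∩ C| / |A|."""
    D = range(len(sorts))
    A = {i for i in D if antecedent(sorts[i])}
    C = {i for i in D if consequent(sorts[i])}
    both = len(A & C)
    support = both / len(sorts) if sorts else 0.0
    confidence = both / len(A) if A else 0.0
    return support, confidence
```

With four sorts where two satisfy "loop and iteration together" and only one of those also satisfies "tree and recursion together", support is 0.25 and confidence is 0.5.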

  11. Goals (Continued)
     • A high number of cards used. Let the lower bound be called low, and the upper bound be called high. Using integer division, if the number of cards used in a tree is n, the score is computed from n, low, and high.
     • A large percentage of G-operators
     • A high balance:
       1. Card nodes have a balance of 1.
       2. G-operator nodes have a balance of 1 because each subtree must be true for the G-operator to evaluate to true.
       3. S-operator nodes have a balance computed from leftCount and rightCount, where leftCount is the number of times the left side was true and rightCount is the number of times the right side was true.
     • The balance for the rule is the minimum balance of the antecedent and the consequent.

  12. Fitness Function
     • Each of the following four factors is evaluated and can take on an integer value between 0 and 100:
       1. support
       2. confidence
       3. card points
       4. G-operator points
     • Each is weighted by a user-specified multiplier, and the values are multiplied together.
     • The result is then scaled by the balance factor, which is a value between 0 and 1.
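A literal reading of the fitness description can be sketched as follows. The function name, the default weights of 1, and the range checks are assumptions; the combination (weight each 0 to 100 factor, multiply, scale by a balance between 0 and 1) follows the slide.

```python
from math import prod


def rule_fitness(support: int, confidence: int, card_points: int, g_points: int,
                 balance: float, weights=(1, 1, 1, 1)) -> float:
    """Weight each 0-100 factor by its multiplier, multiply the results, scale by balance."""
    factors = (support, confidence, card_points, g_points)
    if any(not 0 <= f <= 100 for f in factors):
        raise ValueError("each factor must be between 0 and 100")
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be between 0 and 1")
    return balance * prod(w * f for w, f in zip(weights, factors))
```

Because the factors are multiplied, a zero in any one of them zeroes the fitness. Under this literal reading, the multipliers scale every rule's fitness by the same constant, so they do not change which rule ranks higher; a weighting that changes the trade-off (for example, using each weight as an exponent) would be a different design.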

  13. Data
     • 1044 student sorts
     • 158 educator sorts
     • Performance subsets:
       [1, 2): 37 people
       [2, 3): 140 people
       [3, 4): 447 people
       [4, 5): 223 people
       [5]: 97 people

  14. Results
     • Educators have more rules.
     • 90% Balance, 50% Confidence
     • Many sorts separate high-level concepts (e.g., abstraction, encapsulation) from low-level concepts (e.g., array, variable).
     • Concepts that often appear together:
       (procedure, function)
       (encapsulation, choice, decomposition, abstraction)
       (constant, variable, boolean, array)
       (tree, list)
       (loop, if-then-else)
       (choice, thread)
     • I don’t know.
