
Data Mining Chapter 3 Output: Knowledge Representation

Presentation Transcript


  1. Data Mining, Chapter 3, Output: Knowledge Representation. Kirk Scott

  2. A summary of ways of representing knowledge, the results of mining: • Rule sets • Decision trees • Regression equations • Clusters • Deciding what kind of output you want is the first step towards picking a mining algorithm

  3. 3.1 Tables

  4. Output can be in the form of tables • This is kind of lame • All they’re saying is that instances can be organized to form a lookup table for classification • The contact lens data can be viewed in this way • At the end they will consider another way in which the instance set itself is pretty much the result of mining
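A minimal sketch of what "output as a table" means in practice: the instance set itself, keyed on attribute values, serves as a lookup classifier. The attribute names and values below are hypothetical stand-ins, not the book's exact contact lens data.

```python
# A lookup-table "classifier": the stored instances themselves are the model.
# Attribute names/values are hypothetical placeholders, not the book's exact data.

table = {
    # (age, prescription, astigmatism, tear_rate) -> recommendation
    ("young", "myope", "no", "normal"): "soft",
    ("young", "myope", "yes", "normal"): "hard",
    ("presbyopic", "hypermetrope", "yes", "reduced"): "none",
}

def classify(instance):
    """Return the stored class for an exact attribute match, else None."""
    return table.get(instance)

print(classify(("young", "myope", "no", "normal")))         # soft
print(classify(("young", "hypermetrope", "no", "normal")))  # None: not covered
```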

  5. 3.2 Linear Models

  6. For problems with numeric attributes you can apply statistical methods • The computer performance example was given earlier • The methods will be covered in more detail in Chapter 4 • The statistical approach can be illustrated graphically

  7. Fitting a Line • This would be a linear equation relating cache size to computer performance • PRP = 37.06 + 2.47 CACH • This defines the straight line that best fits the instances in the data set • Figure 3.1, on the following overhead, shows both the data points and the line
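A minimal sketch of how such a line could be fitted by least squares with NumPy. The tiny (CACH, PRP) data set here is invented for illustration, so the fitted coefficients will not match the book's 37.06 and 2.47.

```python
import numpy as np

# Invented (CACH, PRP) pairs standing in for the CPU performance data.
cach = np.array([8, 16, 32, 64, 128], dtype=float)
prp = np.array([60, 75, 120, 200, 350], dtype=float)

# Least-squares fit of PRP = b0 + b1 * CACH.
b1, b0 = np.polyfit(cach, prp, deg=1)
print(f"PRP = {b0:.2f} + {b1:.2f} CACH")

# Predict performance for a machine with a 48 KB cache.
print(b0 + b1 * 48)
```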

  8. Finding a Boundary • A different technique will find a linear decision boundary • This linear equation in petal length and petal width will separate instances of Iris setosa and Iris versicolor • 2.0 – 0.5 PETAL_LENGTH – 0.8 PETAL_WIDTH = 0

  9. An instance of Iris setosa should give a value >0 (its small petal measurements put it on one side of the line, toward the origin) and an instance of Iris versicolor should give a value <0 (on the other side) • Figure 3.2, on the following overhead, shows the boundary line and the instances of the two kinds of Iris
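A minimal sketch of using the boundary equation directly as a classifier: plug in the petal measurements, and the sign of the result decides the species. The two sample measurements are typical values for the species, not instances read off Figure 3.2.

```python
def boundary(petal_length, petal_width):
    # The linear decision function from the slide.
    return 2.0 - 0.5 * petal_length - 0.8 * petal_width

def classify_iris(petal_length, petal_width):
    # Positive side of the line -> setosa, negative side -> versicolor.
    return "Iris setosa" if boundary(petal_length, petal_width) > 0 else "Iris versicolor"

print(classify_iris(1.4, 0.2))  # typical setosa measurements -> Iris setosa
print(classify_iris(4.5, 1.4))  # typical versicolor measurements -> Iris versicolor
```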

  10. 3.3 Trees

  11. The book summarizes the different kinds of decisions (< , =, etc.) that might be coded for a single attribute at each node in a decision tree • Most are straightforward and don’t need to be repeated here • Several more noteworthy aspects will be addressed on the following overheads

  12. Null Values • If nulls occur, you will have to make a decision based on them in any case • The occurrence of a null value may be one of the separate branches out of a decision tree node • At this point the value of assigning a meaning to null becomes apparent (not available, not applicable, not important…)

  13. Approaches to dealing with uncoded/undistinguished nulls: • Keep track of the number of instances per branch and classify nulls with the most popular branch • Alternatively, keep track of the relative frequency of different branches • In the aggregate results, assign a corresponding proportion of the nulls to the different branches
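A minimal sketch (my own, not Weka's mechanism) of the second approach: an instance with a null is split into fractional pieces in proportion to how many training instances went down each branch. The branch counts are invented.

```python
# Sketch of proportional (fractional) assignment of a null value at one node.
# Branch counts are invented; in practice they come from the training data.

branch_counts = {"sunny": 5, "overcast": 4, "rainy": 5}   # instances per branch
total = sum(branch_counts.values())

def split_null_instance(weight=1.0):
    """Distribute an instance with a missing value across all branches,
    weighted by how popular each branch was in the training set."""
    return {branch: weight * count / total
            for branch, count in branch_counts.items()}

print(split_null_instance())
# {'sunny': 0.357..., 'overcast': 0.285..., 'rainy': 0.357...}
```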

  14. Other Kinds of Comparisons • Simple decisions compare attribute values and constants • Some decisions may compare two attributes in the same instance • Some decisions may be based on a function of >1 attribute per instance

  15. Oblique Splits • Comparing an attribute to a constant splits data parallel to an axis • A decision function which doesn’t split parallel to an axis is called an oblique split • In effect, the boundary between the kinds of irises shown earlier is such a split

  16. Option Nodes • A single node with alternative splits on different attributes is called an option node • Instances are classified according to each split and may appear in >1 leaf classification • The last part of analysis includes deciding what such results indicate

  17. Weka and Hand-Made Decision Trees • The book suggests that you can get a handle on decision trees by making one yourself • The book illustrates how Weka includes tools for doing this • To me this seems out of place until chapter 11 when Weka is introduced • I will not cover it here

  18. Regression Trees • For a problem with numeric attributes it’s possible to devise a tree-like classifier • Working from the bottom up: • The leaves contain the performance prediction • The prediction is the average of the performance of all instances that end up classified in that leaf • The internal nodes contain numeric comparisons of attribute values

  19. Model Trees • A model tree is a hybrid of a decision tree and regression • In a model tree instances are classified into a given leaf • Once a classification reaches the leaf, the prediction is made by applying a linear equation to some subset of instance attribute values
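A minimal sketch contrasting the two leaf types described on the last two overheads: a regression-tree leaf just returns the average of its training instances, while a model-tree leaf applies a small linear equation. The split point, leaf averages, and coefficients are invented for illustration.

```python
# Invented one-split tree over cache size (CACH), contrasting leaf types.

SPLIT = 32  # hypothetical split point on CACH

# Regression-tree leaves: store the mean PRP of the training instances in the leaf.
leaf_means = {"low": 70.0, "high": 250.0}

# Model-tree leaves: store a linear model fitted to the instances in the leaf.
leaf_models = {"low": (40.0, 1.5), "high": (100.0, 2.2)}  # (intercept, slope)

def regression_tree_predict(cach):
    leaf = "low" if cach <= SPLIT else "high"
    return leaf_means[leaf]

def model_tree_predict(cach):
    leaf = "low" if cach <= SPLIT else "high"
    b0, b1 = leaf_models[leaf]
    return b0 + b1 * cach

print(regression_tree_predict(16), model_tree_predict(16))   # 70.0 64.0
print(regression_tree_predict(96), model_tree_predict(96))   # 250.0 311.2
```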

  20. Figure 3.4, on the following overhead, shows (a) a linear model, (b) a regression tree, and (c) a model tree

  21. Rule Sets from Trees • Given a decision tree, you can generate a corresponding set of rules • Start at the root and trace the path to each leaf, recording the conditions at each node • The rules in such a set are independent • Each covers a separate case
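A minimal sketch of the root-to-leaf tracing: a tiny hand-built tree is walked recursively, collecting the condition tested at each node, and every leaf yields one rule. The tree itself is a made-up weather example, not one taken from the chapter.

```python
# Each internal node: (attribute, {value: subtree}); each leaf: a class label.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

def rules_from_tree(node, conditions=()):
    """Trace every root-to-leaf path, recording the test made at each node."""
    if isinstance(node, str):                       # a leaf: emit one rule
        yield (conditions, node)
        return
    attribute, branches = node
    for value, subtree in branches.items():
        yield from rules_from_tree(subtree, conditions + ((attribute, value),))

for conds, label in rules_from_tree(tree):
    lhs = " and ".join(f"{a} = {v}" for a, v in conds)
    print(f"if {lhs} then play = {label}")
```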

  22. The rules don’t have to be applied in a particular order • The downside is that such a rule set is more complex than an ordered set • It is possible to prune a set derived from a tree to remove redundancy

  23. Trees from Rule Sets • Given a rule set, you can generate a decision tree • Now we’re interested in going in the opposite direction • Even a relatively simple rule set can lead to a messy tree

  24. A rule set may compactly represent a limited number of explicitly known cases • The other cases may be implicit in the rule set • The implicit cases have to be spelled out in the tree

  25. An Example • Take these rules for example: • If a and b then x • If c and d then x • The result is implicitly binary, either x or not x • The other variables are also implicitly binary (T or F)

  26. With 4 variables, a, b, c, and d, there can be up to 4 levels in the tree • A tree for this problem is shown in Figure 3.5 on the following overhead
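A minimal sketch that spells out what the rule set leaves implicit: enumerating all 16 truth assignments for a, b, c, and d and applying the two rules, so that every case the tree of Figure 3.5 has to represent explicitly gets a label.

```python
from itertools import product

def rule_set(a, b, c, d):
    """The two rules from the example; everything not covered is 'not x'."""
    if a and b:
        return "x"
    if c and d:
        return "x"
    return "not x"

# The rule set only names the "x" cases; the tree must spell out all 16.
for a, b, c, d in product([True, False], repeat=4):
    print(a, b, c, d, "->", rule_set(a, b, c, d))
```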

  27. Messiness = Replicated Subtrees • The tree is messy because it contains replicated subtrees • If a = yes and b = no, you then have to test c and d • If a = no, you have to do exactly the same test on c and d • The gray leaves in the middle and the gray leaves on the right both descend from analogous branches of the tree

  28. The book states that “decision trees cannot easily express the disjunction implied among the different rules in a set.” • Translation: • One rule deals with a and b • The other rule is disjoint from the first rule; it deals only with c and d • As seen above, whenever the test on a or b fails, you still have to do the same test on c and d

  29. Another Example of Replicated Subtrees • Figure 3.6, on the following overhead, illustrates an exclusive or (XOR) function

  30. Consider the graph: • (x = 1) XOR (y = 1) → a • Incidentally, note that you could also write: • (x <> y) → a, (x = y) → b
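A minimal sketch of the XOR classification written both ways noted on the slide: as the exclusive-or test and as the equivalent inequality test.

```python
def classify_xor(x, y):
    # (x = 1) XOR (y = 1) -> a, otherwise b
    return "a" if (x == 1) != (y == 1) else "b"

def classify_neq(x, y):
    # Equivalent form: (x <> y) -> a, (x = y) -> b
    return "a" if x != y else "b"

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert classify_xor(x, y) == classify_neq(x, y)
    print(x, y, "->", classify_xor(x, y))
```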

  31. Now consider the tree: • There’s nothing surprising: First test x, then test y • The gray leaves on the left and the right at the bottom are analogous • Now consider the rule set: • In this example the rule set is not simpler • This doesn’t negate the fact that the tree has replication

  32. Yet Another Example of a Replicated Subtree • Consider Figure 3.7, shown on the following overhead

  33. In this example there are again 4 attributes • This time they are 3-valued instead of binary • There are 2 disjoint rules, each including 2 of the variables • There is a default rule for all other cases

  34. The replication is represented in the diagram in this way: • Each gray triangle stands for an instance of the complete subtree on the lower left which is shown in gray

  35. The rule set would be equally complex IF there were a rule for each branch of the tree • It is less complex in this example because of the default rule

  36. Other Issues with Rule Sets • We have not seen the data mining algorithms yet, but some do not generate rule sets in a way analogous to reading all of the cases off of a decision tree • Sets (especially those not designed to be applied in a given order) may contain conflicting rules that classify specific cases into different categories

  37. Rule Sets that Produce Multiple Classifications • In practice you can take two approaches • Do not classify instances that fall into >1 category • Count how many times each rule is triggered by a training set and use the most popular of the classification rules when two conflict

  38. Rule Sets that Don’t Classify Certain Cases • If a rule set doesn’t classify certain cases, there are again two alternatives: • Do not classify those instances • Classify those instances into the most frequently occurring class in the training set
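A minimal sketch (my own) of the two practical fixes described on the last two overheads: when rules conflict, prefer the rule that fired most often on the training set (or refuse to classify); when no rule fires, fall back to the most frequent training class (or leave the instance unclassified). The rules and counts are invented.

```python
# Rules as (name, predicate, class); firing counts on a training set are invented.
rules = [
    ("r1", lambda inst: inst["a"] and inst["b"], "x"),
    ("r2", lambda inst: inst["c"] and inst["d"], "y"),
]
training_fire_counts = {"r1": 40, "r2": 15}   # hypothetical counts
default_class = "x"                           # most frequent class in training data

def classify(instance, refuse_on_conflict=False):
    fired = [(name, cls) for name, pred, cls in rules if pred(instance)]
    classes = {cls for _, cls in fired}
    if not fired:
        return default_class            # or None, to leave the instance unclassified
    if len(classes) > 1 and refuse_on_conflict:
        return None                     # option 1: don't classify conflicting cases
    # option 2: pick the class of the rule that fired most often in training
    return max(fired, key=lambda nc: training_fire_counts[nc[0]])[1]

print(classify({"a": True, "b": True, "c": True, "d": True}))      # conflict -> 'x'
print(classify({"a": False, "b": False, "c": False, "d": False}))  # no rule fires -> 'x'
```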

  39. The Simplest Case with Rule Sets • Suppose all variables are Boolean • I.e., suppose rules only have two possible outcomes, T/F • Suppose only rules with T outcomes are expressed • (By definition, all unexpressed cases are F)

  40. Under the foregoing assumptions: • The rules are independent • The order of applying the rules is immaterial • The outcome is deterministic • There is no ambiguity

  41. Reality is More Complex • In practice, there can be ambiguity • The authors state that the assumption that there are only two cases, T/F, and only T is expressed, is a form of closed world assumption • In other words, the assumption is that everything is binary

  42. As soon as this and any other simplifying assumptions are relaxed, things become messier • In other words, rules become dependent, the order of application matters, etc. • This is when you can arrive at multiple classifications or no classifications from a rule set

  43. Association Rules • This subsection is largely repetition • Any subset of attributes may predict any other subset of attributes • Association rules are really just a generalization or superset of classification rules

  44. This is because a classification rule, (all non-class attributes) → (class attribute), is just one of many possible association rules • Because so many association rules are possible, you need criteria for defining interesting ones
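A minimal sketch of the usual way candidate association rules are filtered: compute each rule's coverage (how many instances its left-hand side applies to) and accuracy (how often it is right when it applies), and keep only rules above chosen thresholds. The terminology and the toy data are standard illustrations, not specifics from this slide.

```python
# Hypothetical instances: each is a dict of attribute -> value.
data = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True,  "play": "no"},
]

def coverage_and_accuracy(antecedent, consequent):
    """Coverage: number of instances matching the antecedent.
    Accuracy: fraction of those that also match the consequent."""
    matches = [inst for inst in data
               if all(inst.get(k) == v for k, v in antecedent.items())]
    if not matches:
        return 0, 0.0
    correct = sum(all(inst.get(k) == v for k, v in consequent.items())
                  for inst in matches)
    return len(matches), correct / len(matches)

# Candidate rule: if outlook = sunny then play = no
print(coverage_and_accuracy({"outlook": "sunny"}, {"play": "no"}))  # (2, 1.0)
```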
