
Data Mining Schemes in Practice

Presentation Transcript


  1. Data Mining Schemes in Practice

  2. Implementation: Real machine learning schemes • Decision trees: from ID3 to C4.5 • missing values, numeric attributes, pruning, efficiency • Instance-based learning • Speed up, combat noise, attribute weighting, generalized exemplars

  3. Numeric attributes • Standard method: binary splits • E.g. temp < 45 • Unlike nominal attributes, a numeric attribute has many possible split points • Solution is a straightforward extension: • Evaluate info gain (or other measure) for every possible split point of the attribute • Choose “best” split point • Info gain for best split point is info gain for attribute • Computationally more demanding

  4. Weather data (again!)

  5. Example • Split on temperature attribute:
     64  65  68  69  70  71  72  72  75  75  80  81  83  85
     Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No
• E.g. temperature < 71.5: yes/4, no/2; temperature ≥ 71.5: yes/5, no/3 • Info([4,2],[5,3]) = 6/14 × info([4,2]) + 8/14 × info([5,3]) = 0.939 bits • Place split points halfway between values • Can evaluate all split points in one pass! (see the sketch below)
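A minimal C++ sketch (not from the slides) of the one-pass evaluation: sort the values once, sweep from left to right while maintaining running yes/no counts, and score the midpoint between each pair of adjacent distinct values. On the temperature column above it reproduces the 0.939 bits for the split at 71.5.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Entropy in bits of a two-class node with counts 'yes' and 'no'.
static double info(double yes, double no) {
    double n = yes + no, e = 0.0;
    if (yes > 0) e -= (yes / n) * std::log2(yes / n);
    if (no  > 0) e -= (no  / n) * std::log2(no  / n);
    return e;
}

int main() {
    // (temperature, class) pairs from the example; true = yes, false = no.
    std::vector<std::pair<double, bool>> data = {
        {64,true},{65,false},{68,true},{69,true},{70,true},{71,false},{72,false},
        {72,true},{75,true},{75,true},{80,false},{81,true},{83,true},{85,false}};
    std::sort(data.begin(), data.end());            // sort once by attribute value

    double yesTotal = 0, noTotal = 0;
    for (auto& p : data) (p.second ? yesTotal : noTotal) += 1;

    double bestInfo = 1e9, bestSplit = 0, yesLeft = 0, noLeft = 0;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        (data[i].second ? yesLeft : noLeft) += 1;   // running counts for the left side
        if (data[i].first == data[i + 1].first) continue;        // no split between equal values
        double split = (data[i].first + data[i + 1].first) / 2;  // halfway between values
        double n = yesTotal + noTotal, nLeft = yesLeft + noLeft;
        double w = nLeft / n * info(yesLeft, noLeft)
                 + (n - nLeft) / n * info(yesTotal - yesLeft, noTotal - noLeft);
        if (w < bestInfo) { bestInfo = w; bestSplit = split; }
        std::printf("split at %.1f: info = %.3f bits\n", split, w);
    }
    std::printf("best split at %.1f (info = %.3f bits)\n", bestSplit, bestInfo);
}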

  6. Avoid repeated sorting! • Sort instances by the values of the numeric attribute • Time complexity for sorting: O(n log n) • Does this have to be repeated at each node of the tree? • No! Sort order for children can be derived from sort order for parent • Time complexity of derivation: O(n) • Drawback: need to create and store an array of sorted indices for each numeric attribute
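As a rough illustration (names and layout are mine, not the slides'), deriving a child's sort order from the parent's takes a single pass: walk the parent's sorted index array and append each instance's index to the partition of the child it is routed to, which preserves the sorted order within every child.

#include <cstdio>
#include <vector>

// Derive the children's sorted index arrays from the parent's in O(n).
// parentSorted: instance indices sorted by one numeric attribute.
// branchOf:     the child branch each instance is routed to by the chosen split.
std::vector<std::vector<int>> deriveChildOrders(const std::vector<int>& parentSorted,
                                                const std::vector<int>& branchOf,
                                                int numBranches) {
    std::vector<std::vector<int>> childSorted(numBranches);
    for (int idx : parentSorted)                    // one pass over the parent's order
        childSorted[branchOf[idx]].push_back(idx);  // relative order is preserved
    return childSorted;
}

int main() {
    // Six instances, already sorted by (say) temperature, routed to branch 0 or 1.
    std::vector<int> parentSorted = {3, 0, 5, 1, 4, 2};
    std::vector<int> branchOf     = {0, 1, 1, 0, 0, 1};   // indexed by instance number
    std::vector<std::vector<int>> children = deriveChildOrders(parentSorted, branchOf, 2);
    for (int b = 0; b < 2; ++b) {
        std::printf("branch %d:", b);
        for (int idx : children[b]) std::printf(" %d", idx);
        std::printf("\n");
    }
}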

  7. Binary vs multiway splits • Splitting (multi-way) on a nominal attribute exhausts all information in that attribute • Nominal attribute is tested (at most) once on any path in the tree • Not so for binary splits on numeric attributes! • Numeric attribute may be tested several times along a path in the tree • Disadvantage: tree is hard to read • Remedy: • pre-discretize numeric attributes, or • use multi-way splits instead of binary ones

  8. Computing multi-way splits • Dynamic programming can find the optimum multi-way split in O(n²) time • imp(k, i, j) is the impurity of the best split of values x_i … x_j into k sub-intervals • imp(k, 1, n) = min_{1 ≤ j < n} [ imp(k−1, 1, j) + imp(1, j+1, n) ] • imp(k, 1, n) gives us the best k-way split

  9. Recursion unfolding for imp(·), e.g. imp(4,1,10):
     (4,1,10) expands into (3,1,7) + (1,8,10), (3,1,8) + (1,9,10), …
     these expand in turn into (2,1,3) + (1,4,7), (2,1,3) + (1,4,8), …, and so on.
E.g. we had better remember the result for (2,1,3) so as not to repeat its computation. If we don't remember previous computations, the complexity becomes exponential in k.

  10. Dyn. Prog. by Memoization
imp(k, 1, n) = min_{1 ≤ j < n} [ imp(k−1, 1, j) + imp(1, j+1, n) ]

#include <math.h>                                  /* for INFINITY */

enum { MAX_K = 10, MAX_N = 100 };                  /* illustrative table bounds (assumption) */
double intervalImpurity(int i, int n);             /* impurity (entropy) of values x_i … x_n
                                                      treated as one interval (assumed given) */
double memo[MAX_K + 1][MAX_N + 1][MAX_N + 1];      /* 0 means "not computed yet" */

double imp(int k, int i, int n) {
    if (memo[k][i][n] != 0) return memo[k][i][n];  /* reuse a remembered result */
    if (i == n || k == 1)                          /* single value, or a single interval */
        return memo[k][i][n] = intervalImpurity(i, n);
    double min = INFINITY;
    for (int j = i; j < n; j++) {                  /* try every position of the last cut */
        double t = imp(k - 1, i, j) + imp(1, j + 1, n);
        if (t < min) min = t;
    }
    return memo[k][i][n] = min;                    /* remember before returning */
}

  11. Missing values • Split instances with missing values into pieces • A piece going down a branch receives a weight proportional to the popularity of the branch • weights sum to 1 • Info gain works with fractional instances • use sums of weights instead of counts • During classification, split the instance into pieces in the same way • Merge probability distribution using weights
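A minimal sketch (node layout, names and numbers are illustrative, not from the slides) of the classification step described above: when the tested attribute value is missing, a weighted piece of the instance is sent down every branch and the returned class distributions are merged using the branch weights.

#include <cstdio>
#include <vector>

struct Node {
    bool leaf = false;
    std::vector<double> classDist;      // leaf only: class probabilities
    int testAttr = -1;                  // internal node only: attribute tested here
    std::vector<double> branchWeight;   // fraction of training instances per branch (sums to 1)
    std::vector<Node*> children;
};

// Attribute values are small non-negative ints; -1 marks a missing value (assumption).
std::vector<double> classify(const Node* n, const std::vector<int>& instance) {
    if (n->leaf) return n->classDist;
    int v = instance[n->testAttr];
    if (v >= 0)                          // value known: follow the matching branch only
        return classify(n->children[v], instance);
    std::vector<double> merged;          // value missing: split into weighted pieces
    for (std::size_t b = 0; b < n->children.size(); ++b) {
        std::vector<double> d = classify(n->children[b], instance);
        if (merged.empty()) merged.assign(d.size(), 0.0);
        for (std::size_t c = 0; c < d.size(); ++c)
            merged[c] += n->branchWeight[b] * d[c];   // merge distributions by weight
    }
    return merged;
}

int main() {
    Node left, right, root;
    left.leaf  = true; left.classDist  = {0.9, 0.1};
    right.leaf = true; right.classDist = {0.2, 0.8};
    root.testAttr = 0;
    root.branchWeight = {0.6, 0.4};      // 60% / 40% of training instances took each branch
    root.children = {&left, &right};
    std::vector<int> instance = {-1};    // the tested attribute is missing
    std::vector<double> dist = classify(&root, instance);
    std::printf("P(yes) = %.2f, P(no) = %.2f\n", dist[0], dist[1]);   // 0.62, 0.38
}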

  12. Missing value example
Info([4/13 + 1, 3]) = -((4/13 + 1)/(4/13 + 1 + 3)) log[(4/13 + 1)/(4/13 + 1 + 3)]
                      -(3/(4/13 + 1 + 3)) log[3/(4/13 + 1 + 3)]
And so on… What about the classification of a new instance with missing values?
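A small sketch of the calculation above, assuming (as elsewhere on these slides) that log means log base 2: the only change from ordinary entropy is that the class counts may be fractional sums of instance weights. The counts [4/13 + 1, 3] come out at roughly 0.886 bits.

#include <cmath>
#include <cstdio>

// Entropy in bits of a two-class node whose counts are sums of instance weights.
double info(double yes, double no) {
    double n = yes + no, e = 0.0;
    if (yes > 0) e -= (yes / n) * std::log2(yes / n);
    if (no  > 0) e -= (no  / n) * std::log2(no  / n);
    return e;
}

int main() {
    // Weighted class counts from the slide's formula: yes = 4/13 + 1, no = 3.
    double yes = 4.0 / 13 + 1, no = 3;
    std::printf("Info([4/13 + 1, 3]) = %.3f bits\n", info(yes, no));
}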

  13. Pruning • Prevent overfitting to noise in the data • “Prune” the decision tree • Two strategies: • Postpruning: take a fully-grown decision tree and discard unreliable parts • Prepruning: stop growing a branch when information becomes unreliable • Postpruning preferred in practice; prepruning can “stop early”

  14. Prepruning • Stop growing the tree when there is no significant association between any attribute and the class at a particular node • I.e. stop if there is no significant info gain.

  15. Early stopping • Pre-pruning may stop the growth process prematurely: early stopping • Classic example: XOR/Parity-problem • No individual attribute exhibits any significant association to the class • Structure is only visible in fully expanded tree • Prepruning won’t expand the root node
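A tiny sketch (the helper and the four-row data set are mine) of why prepruning fails here: on the XOR truth table, splitting on either attribute alone leaves a 50/50 class mix in both branches, so the information gain of each individual attribute is exactly zero.

#include <cmath>
#include <cstdio>

// Entropy in bits of a two-class distribution with counts 'yes' and 'no'.
double entropy(double yes, double no) {
    double n = yes + no, e = 0.0;
    if (yes > 0) e -= (yes / n) * std::log2(yes / n);
    if (no  > 0) e -= (no  / n) * std::log2(no  / n);
    return e;
}

int main() {
    // XOR truth table: two binary attributes a and b, class = a XOR b.
    int a[4]   = {0, 0, 1, 1};
    int b[4]   = {0, 1, 0, 1};
    int cls[4] = {0, 1, 1, 0};
    const int* attrs[2] = {a, b};
    double before = entropy(2, 2);                      // class entropy before any split
    for (int t = 0; t < 2; ++t) {
        double yes[2] = {0, 0}, no[2] = {0, 0};         // class counts in branch 0 / branch 1
        for (int i = 0; i < 4; ++i)
            (cls[i] ? yes : no)[attrs[t][i]] += 1;
        double after = 0.5 * entropy(yes[0], no[0])     // each branch holds half the data
                     + 0.5 * entropy(yes[1], no[1]);
        std::printf("gain(%c) = %.3f bits\n", 'a' + t, before - after);   // prints 0.000
    }
}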

  16. Postpruning • First, build full tree • Then, prune it • Fully-grown tree shows all attribute interactions • Two pruning operations: • Subtree replacement • Subtree raising • Possible strategies: • error estimation • significance testing • MDL principle

  17. Example

  18. Subtree replacement • Bottom-up • Consider replacing a tree only after considering all its subtrees • If estimated error doesn’t get bigger, replace the subtree (see the sketch below).
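A minimal sketch of the bottom-up procedure; the node structure and the error estimate are placeholders of my own, not the estimate C4.5 actually uses. Children are pruned first, and a subtree is collapsed to a leaf whenever the leaf's estimated error is no bigger than the sum of its (already pruned) subtrees' estimates.

#include <vector>

struct Node {
    std::vector<Node*> children;     // empty for a leaf
    int errors = 0;                  // training instances misclassified at this node
    bool isLeaf() const { return children.empty(); }
};

// Placeholder estimate of the error made if this node were turned into a leaf
// (assumption: a simple pessimistic correction, standing in for the real estimator).
double estimatedLeafError(const Node* n) {
    return n->errors + 0.5;
}

// Bottom-up subtree replacement. Returns the estimated error of the
// (possibly pruned) subtree rooted at n.
double pruneByReplacement(Node* n) {
    if (n->isLeaf()) return estimatedLeafError(n);
    double subtreeError = 0.0;
    for (Node* c : n->children)                  // consider all subtrees first
        subtreeError += pruneByReplacement(c);
    double leafError = estimatedLeafError(n);
    if (leafError <= subtreeError) {             // estimated error doesn't get bigger
        n->children.clear();                     // replace the subtree by a leaf
        return leafError;                        // (child ownership/cleanup omitted)
    }
    return subtreeError;
}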

  19. Subtree raising • Delete node • Redistribute instances • Slower than subtree replacement (worthwhile?) • If estimated error doesn’t get bigger, raise the subtree.
