Decision Trees

Presentation Transcript

  1. Decision Trees: Advanced Statistical Methods in NLP (Ling572), January 10, 2012

  5. Information Gain
  • InfoGain(S,A): expected reduction in entropy due to A
  • Select A with max InfoGain
    • Resulting in lowest average entropy
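The formula on this slide appears to have been lost in the transcript (it was likely an embedded image). A standard statement of the quantities the bullets describe, with Sa the subset of S where A = a, is:

```latex
H(S) = -\sum_{c} P(c)\,\log_2 P(c)
\qquad\qquad
\mathrm{InfoGain}(S, A) = H(S) - \sum_{a \in \mathrm{Values}(A)} \frac{|S_a|}{|S|}\, H(S_a)
```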

  6. Computing Average Entropy
  [Figure: |S| instances split by a test into Branch 1 and Branch 2, with per-branch class counts Sa1a, Sa1b and Sa2a, Sa2b; annotations mark the "fraction of samples down branch i" and the "disorder of class distribution on branch i"]
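Written out with the figure's annotations (a sketch assuming two classes a and b per branch, as the counts S_{i,a}/S_{i,b} in the figure suggest), the average entropy after the split is:

```latex
\bar{H} \;=\; \sum_{i}
\underbrace{\frac{|S_i|}{|S|}}_{\text{fraction of samples down branch } i}\;
\underbrace{\left(-\frac{|S_{i,a}|}{|S_i|}\log_2\frac{|S_{i,a}|}{|S_i|}
\;-\;\frac{|S_{i,b}|}{|S_i|}\log_2\frac{|S_{i,b}|}{|S_i|}\right)}_{\text{disorder of class distribution on branch } i}
```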

  7. Sunburn Example

  8. Picking a Test
  Hair Color: Blonde → Sarah:B, Dana:N, Annie:B, Katie:N | Red → Emily:B | Brown → Alex:N, Pete:N, John:N
  Height: Tall → Dana:N, Pete:N | Average → Sarah:B, Emily:B, John:N | Short → Alex:N, Annie:B, Katie:N
  Lotion: Yes → Dana:N, Alex:N, Katie:N | No → Sarah:B, Annie:B, Emily:B, Pete:N, John:N
  Weight: Heavy → Emily:B, Pete:N, John:N | Light → Sarah:B, Katie:N | Average → Dana:N, Alex:N, Annie:B
  (B = sunburned, N = not sunburned)

  12. Entropy in Sunburn Example
  S = [3B, 5N], so H(S) = -(3/8) log2(3/8) - (5/8) log2(5/8) = 0.954
  Hair color = 0.954 - [4/8·(-2/4 log2(2/4) - 2/4 log2(2/4)) + 1/8·0 + 3/8·0] = 0.954 - 0.5 = 0.454
  Height = 0.954 - 0.69 = 0.264
  Weight = 0.954 - 0.94 = 0.014
  Lotion = 0.954 - 0.61 = 0.344

  13. Picking a Test (within the Hair Color = Blonde branch: Sarah:B, Dana:N, Annie:B, Katie:N)
  Height: Tall → Dana:N | Average → Sarah:B | Short → Annie:B, Katie:N
  Lotion: Yes → Dana:N, Katie:N | No → Sarah:B, Annie:B
  Weight: Heavy → (none) | Light → Sarah:B, Katie:N | Average → Dana:N, Annie:B

  14. Entropy in Sunburn Example
  S = [2B, 2N], so H(S) = 1
  Height = 1 - [2/4·(-1/2 log2(1/2) - 1/2 log2(1/2)) + 1/4·0 + 1/4·0] = 1 - 0.5 = 0.5
  Weight = 1 - [2/4·(-1/2 log2(1/2) - 1/2 log2(1/2)) + 2/4·(-1/2 log2(1/2) - 1/2 log2(1/2))] = 1 - 1 = 0
  Lotion = 1 - 0 = 1
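The arithmetic on slides 12 and 14 can be checked mechanically. Below is a minimal Python sketch; the dataset is reconstructed from the branch listings on slides 8 and 13 (that reconstruction is an assumption, not part of the transcript), and the printed gains match the slides up to rounding.

```python
# Minimal sketch reproducing the InfoGain numbers on slides 12 and 14.
from collections import Counter
from math import log2

DATA = [
    # (name, hair, height, weight, lotion, label); B = sunburned, N = not
    ("Sarah", "blonde", "average", "light",   "no",  "B"),
    ("Dana",  "blonde", "tall",    "average", "yes", "N"),
    ("Alex",  "brown",  "short",   "average", "yes", "N"),
    ("Annie", "blonde", "short",   "average", "no",  "B"),
    ("Emily", "red",    "average", "heavy",   "no",  "B"),
    ("Pete",  "brown",  "tall",    "heavy",   "no",  "N"),
    ("John",  "brown",  "average", "heavy",   "no",  "N"),
    ("Katie", "blonde", "short",   "light",   "yes", "N"),
]
COLUMN = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}

def entropy(rows):
    """H(S) = -sum_c P(c) log2 P(c), over the label in the last column."""
    counts = Counter(r[-1] for r in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())

def info_gain(rows, feature):
    """InfoGain(S, A) = H(S) - sum_a |S_a|/|S| * H(S_a)."""
    col = COLUMN[feature]
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

# Slide 12: hair color wins (0.454 vs 0.264 / 0.014 / 0.344, up to rounding).
for f in COLUMN:
    print(f, round(info_gain(DATA, f), 3))

# Slides 13-14: inside the blonde branch, lotion wins with gain 1.0.
blonde = [r for r in DATA if r[1] == "blonde"]
for f in ("height", "weight", "lotion"):
    print(f, round(info_gain(blonde, f), 3))
```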

  18. Building Decision Trees with Information Gain
  • Until there are no inhomogeneous leaves:
    • Select an inhomogeneous leaf node
    • Replace that leaf node by a test node creating the subsets that yield the highest information gain
  • Effectively creates a set of rectangular regions
    • Repeatedly draws lines in different axes
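As a sketch of the loop on this slide, written recursively (which is equivalent to repeatedly replacing an impure leaf), and assuming instances are (feature_dict, label) pairs; the data layout and node format are illustrative choices, not the lecture's:

```python
# Grow the tree until no leaf is inhomogeneous, each time installing the
# test of highest information gain.
from collections import Counter
from math import log2

def entropy(rows):
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def info_gain(rows, feature):
    remainder = 0.0
    for value in {feats[feature] for feats, _ in rows}:
        subset = [r for r in rows if r[0][feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def build_tree(rows, features):
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    # Homogeneous leaf (or nothing left to test on): predict the majority label.
    if entropy(rows) == 0 or not features:
        return majority
    # Replace this leaf with the highest-gain test; recurse into each branch.
    best = max(features, key=lambda f: info_gain(rows, f))
    branches = {
        value: build_tree([r for r in rows if r[0][best] == value],
                          [f for f in features if f != best])
        for value in {feats[best] for feats, _ in rows}
    }
    return {"feature": best, "branches": branches, "majority": majority}
```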

  22. Alternate Measures
  • Issue with Information Gain:
    • Favors features with more values
  • Option: Gain Ratio
    • Sa: elements of S with value A = a
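The Gain Ratio formula itself seems to have been an image on the original slide; Quinlan's standard definition, using Sa as defined above, is:

```latex
\mathrm{SplitInfo}(S, A) = -\sum_{a \in \mathrm{Values}(A)} \frac{|S_a|}{|S|}\,\log_2 \frac{|S_a|}{|S|}
\qquad\qquad
\mathrm{GainRatio}(S, A) = \frac{\mathrm{InfoGain}(S, A)}{\mathrm{SplitInfo}(S, A)}
```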

  27. Overfitting
  • Overfitting: model fits the training data TOO well
    • Fits noise, irrelevant details
  • Why is this bad?
    • Harms generalization: fits training data too well, fits new data badly
  • For model m: training_error(m) and D_error(m), where D = all data
  • If m overfits, there is another model m' such that:
    • training_error(m) < training_error(m'), but
    • D_error(m) > D_error(m')

  32. Avoiding Overfitting
  • Strategies to avoid overfitting:
    • Early stopping:
      • Stop when InfoGain < threshold
      • Stop when number of instances < threshold
      • Stop when tree depth > threshold
    • Post-pruning:
      • Grow full tree and remove branches
  • Which is better?
    • Unclear; both are used
    • For some applications, post-pruning is better
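For illustration only (scikit-learn is not mentioned in the lecture), the three early-stopping thresholds map closely onto standard hyperparameters of sklearn's DecisionTreeClassifier:

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",          # split using an information-gain-style criterion
    min_impurity_decrease=0.01,   # roughly: stop when InfoGain < threshold
    min_samples_split=5,          # stop when number of instances < threshold
    max_depth=8,                  # stop when tree depth > threshold
)
```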

  36. Post-Pruning
  • Divide data into:
    • Training set: used to build the original tree
    • Validation set: used to perform pruning
  • Build decision tree based on training data
  • Until pruning does not reduce validation set performance:
    • Compute performance for pruning each node (and its children)
    • Greedily remove nodes that do not reduce validation set performance
  • Yields smaller tree with best performance
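A minimal sketch of this greedy validation-set pruning loop, using the same hypothetical node format as the build sketch above (leaf = label string, internal node = dict with "feature", "branches", "majority"); pruning a node just marks it, so prediction falls back to that node's majority label:

```python
def predict(node, x):
    while isinstance(node, dict) and not node.get("pruned"):
        node = node["branches"].get(x[node["feature"]], node["majority"])
    return node["majority"] if isinstance(node, dict) else node

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def internal_nodes(node, out=None):
    """Collect the test nodes that are still eligible for pruning."""
    out = [] if out is None else out
    if isinstance(node, dict) and not node.get("pruned"):
        out.append(node)
        for child in node["branches"].values():
            internal_nodes(child, out)
    return out

def reduced_error_prune(tree, validation):
    """Greedily prune nodes as long as validation performance does not drop."""
    while True:
        best_node, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            node["pruned"] = True              # tentatively prune this node
            acc = accuracy(tree, validation)
            node["pruned"] = False             # undo the tentative prune
            if acc >= best_acc:                # pruning did not hurt
                best_node, best_acc = node, acc
        if best_node is None:                  # every remaining prune hurts: stop
            return tree
        best_node["pruned"] = True             # commit the best prune
```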

  41. Performance Measures
  • Compute accuracy on:
    • Validation set
    • k-fold cross-validation
  • Weighted classification error cost:
    • Weight some types of errors more heavily
  • Minimum description length:
    • Favor good accuracy on compact models
    • MDL = error(tree) + model_size(tree)

  45. Rule Post-Pruning
  • Convert tree to rules
  • Prune rules independently
  • Sort final rule set
  • Probably most widely used method (toolkits)
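A sketch of the "convert tree to rules" step, using the same hypothetical node format as the earlier sketches; each root-to-leaf path becomes one rule, e.g. for the sunburn tree, hair = blonde AND lotion = no → B. The resulting rules can then be pruned and sorted independently of the tree:

```python
def tree_to_rules(node, path=()):
    """Return a list of (conditions, label) rules, one per root-to-leaf path."""
    if not isinstance(node, dict):            # leaf: emit one finished rule
        return [(list(path), node)]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, path + ((node["feature"], value),))
    return rules
```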

  50. Modeling Features
  • Different types of features need different tests:
    • Binary: test branches on true/false
    • Discrete: branches for each discrete value
    • Continuous?
      • Need to discretize
      • Enumerate all values
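One common reading of "enumerate all values" (an assumption here, since the transcript stops at this point): sort the observed values of the continuous feature, enumerate candidate thresholds between adjacent distinct values, and score each binary test x <= t by information gain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, info_gain) for the best binary split x <= t."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (None, 0.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue                          # no boundary between equal values
        t = (v1 + v2) / 2                     # candidate threshold: the midpoint
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if gain > best[1]:
            best = (t, gain)
    return best
```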