
Continuous Attributes: Computing GINI Index / 2


Presentation Transcript


  1. Continuous Attributes: Computing GINI Index / 2
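
The slide body itself is not preserved in the transcript. As a minimal sketch of the standard procedure for a continuous attribute (sort the records on the attribute once, then sweep the candidate split points, which lie at midpoints between consecutive distinct values, updating class counts incrementally so each candidate costs O(1) rather than a full pass):

  from collections import Counter

  def gini(counts, n):
      """Gini impurity 1 - sum_j p(j)^2 for a node with the given class counts."""
      if n == 0:
          return 0.0
      return 1.0 - sum((c / n) ** 2 for c in counts.values())

  def best_gini_split(values, labels):
      """Best binary split 'value <= t' on a continuous attribute by Gini."""
      pairs = sorted(zip(values, labels))
      n = len(pairs)
      left, right = Counter(), Counter(label for _, label in pairs)
      best_t, best_g = None, float("inf")
      for i in range(n - 1):
          v, label = pairs[i]
          left[label] += 1            # move record i from the right to the left side
          right[label] -= 1
          if v == pairs[i + 1][0]:    # no threshold fits between equal values
              continue
          n_left = i + 1
          n_right = n - n_left
          g = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
          if g < best_g:
              best_g, best_t = g, (v + pairs[i + 1][0]) / 2
      return best_t, best_g

For instance, with incomes [60, 70, 75, 85, 90, 95, 100, 120, 125, 220] and labels ['No','No','No','Yes','Yes','Yes','No','No','No','No'] (the kind of table these slides typically show), the sweep finds the threshold 97.5, the midpoint between 95 and 100, with weighted Gini (6/10)*0.5 + (4/10)*0 = 0.300.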

  2. Measure of Impurity: Entropy

  3. Computing Entropy of a Single Node
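
The formula is likewise missing from the transcript. For a node t, Entropy(t) = -sum_j p(j|t) log2 p(j|t), where p(j|t) is the relative frequency of class j at node t; it is 0 when all records belong to one class, maximal (log2 of the number of classes) when classes are evenly distributed, and 0*log(0) is taken as 0. A small sketch:

  import math
  from collections import Counter

  def entropy(labels):
      """Entropy(t) = -sum_j p(j|t) * log2(p(j|t))."""
      n = len(labels)
      return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

A node with class counts (1, 5) gives -(1/6)log2(1/6) - (5/6)log2(5/6) ≈ 0.65 bits; counts (3, 3) give the 2-class maximum of 1 bit; a pure node gives 0.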

  4. Computing Information Gain After Splitting
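
The gain of a split is the reduction in entropy it achieves: GAIN_split = Entropy(p) - sum_i (n_i / n) Entropy(i), where parent p with n records is split into partitions i of n_i records each. Reusing entropy() from the sketch above:

  def information_gain(parent_labels, children_labels):
      """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child i)."""
      n = len(parent_labels)
      after = sum(len(c) / n * entropy(c) for c in children_labels)
      return entropy(parent_labels) - after

A split that leaves every child exactly as mixed as the parent has gain 0.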

  5. Problems with Information Gain

  6. Gain Ratio
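
Both slide bodies are dropped in the transcript. The problem: information gain favors splits into many small partitions, each of which tends to be pure; in the extreme, splitting on a unique record identifier gives maximal gain but generalizes to nothing. C4.5's gain ratio compensates by dividing the gain by SplitINFO = -sum_i (n_i / n) log2(n_i / n), the entropy of the partition sizes, which grows with the number of partitions. Continuing the sketch above:

  def split_info(children_labels, n):
      """SplitINFO = -sum_i (n_i / n) * log2(n_i / n): entropy of the partition sizes."""
      return -sum(len(c) / n * math.log2(len(c) / n) for c in children_labels if c)

  def gain_ratio(parent_labels, children_labels):
      """GainRATIO_split = GAIN_split / SplitINFO; penalizes many small partitions."""
      si = split_info(children_labels, len(parent_labels))
      return information_gain(parent_labels, children_labels) / si if si else 0.0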

  7. Measure of Impurity: Classification Error

  8. Computing Error of a Single Node
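
For a node t, the classification error is Error(t) = 1 - max_j p(j|t): the fraction of records misclassified if the node simply predicts its majority class. Like Gini and entropy, it is 0 for a pure node and maximal (1 - 1/number_of_classes) for a uniform one. Sketch, with Counter as before:

  def classification_error(labels):
      """Error(t) = 1 - max_j p(j|t): error rate of predicting the majority class."""
      return 1.0 - max(Counter(labels).values()) / len(labels)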

  9. Comparison Among Impurity Measures: for binary (2-class) classification problems
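
The comparison chart did not survive the transcript. For a 2-class node with P(class 1) = p, all three measures vanish at p in {0, 1} and peak at p = 0.5 (entropy at 1.0, Gini at 0.5, error at 0.5); entropy and Gini are smooth and strictly concave, while error is piecewise linear. A quick tabulation:

  def impurities(p):
      """2-class impurities as a function of p = P(class 1): (entropy, gini, error)."""
      h = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
      g = 2 * p * (1 - p)          # equals 1 - p**2 - (1 - p)**2
      e = min(p, 1 - p)
      return h, g, e

  for p in (0.0, 0.1, 0.3, 0.5):
      h, g, e = impurities(p)
      print(f"p={p:.1f}  entropy={h:.3f}  gini={g:.3f}  error={e:.3f}")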

  10. Misclassification Error vs. Gini Index
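
The figure is missing here as well; a standard numeric example of where the two disagree: take a parent with class counts (7, 3), so Gini = 1 - 0.7^2 - 0.3^2 = 0.42 and error = 0.3. Split it into child N1 with counts (3, 0) and child N2 with counts (4, 3). The weighted Gini drops to (3/10)*0 + (7/10)*(24/49) ≈ 0.343, so Gini rewards the split, but the weighted error stays at (3/10)*0 + (7/10)*(3/7) = 0.3, unchanged. Because error is linear in the class proportions, it cannot register improvements when every child keeps the parent's majority class; this is the usual argument for growing trees with Gini or entropy rather than error.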

  11. Example: C4.5
  • Simple depth-first construction.
  • Uses information gain.
  • Sorts continuous attributes at each node.
  • Needs entire data to fit in memory, so it is unsuitable for large datasets: those would need out-of-core sorting.
  • You can download the software from: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

  12. Scalable Decision Tree Induction / 1
  • How scalable is decision tree induction? The basic algorithm assumes the training data fit in memory, so it is suitable mainly for small data sets.
  • SLIQ (EDBT'96, Mehta et al.) builds an index (a pre-sorted attribute list) for each attribute; only the class list and the current attribute list reside in memory.

  13. Scalable Decision Tree Induction / 2
  • SLIQ: sample data for the class buys_computer, records 0 through 6. [The slide's figure showed the disk-resident attribute lists alongside the memory-resident class list; the layout is sketched below.]
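
The table itself is lost, so the values below are invented for illustration; the structure is what matters: SLIQ keeps one pre-sorted attribute list per attribute on disk, each entry pairing an attribute value with a record id (rid), plus a single memory-resident class list mapping each rid to its class label and current leaf.

  # Hypothetical records 0-6 for the class buys_computer (values invented).
  # Disk-resident attribute lists: one per attribute, pre-sorted by value;
  # each entry is (attribute value, rid).
  age_list    = [(25, 1), (28, 4), (30, 0), (35, 3), (40, 2), (45, 6), (50, 5)]
  income_list = [(15, 1), (25, 6), (40, 3), (60, 5), (65, 0), (75, 2), (90, 4)]

  # Memory-resident class list: rid -> [class label, current leaf].
  # Only this table plus the one attribute list being scanned must fit in memory.
  class_list = {rid: [label, "root"] for rid, label in
                enumerate(["yes", "no", "yes", "yes", "no", "yes", "no"])}

  # Evaluating splits on an attribute is one sequential scan of its list:
  # each (value, rid) entry is joined with class_list[rid] to update the
  # class histogram of the leaf that record currently sits in.
  for value, rid in age_list:
      label, leaf = class_list[rid]
      # ...update per-leaf class counts for candidate thresholds on age...

Because the attribute lists are sorted once up front, continuous attributes never need re-sorting at each node, which is exactly the C4.5 cost flagged on slide 11.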

  14. Decision Tree Based Classification
  • Advantages:
    - Inexpensive to construct.
    - Extremely fast at classifying unknown records.
    - Easy to interpret for small-sized trees.
    - Accuracy is comparable to other classification techniques for many data sets.
  • Practical issues of classification:
    - Underfitting and overfitting.
    - Missing values.
    - Costs of classification.
