
Implementing Hoeffding Decision Trees in DB2






  1. Implementing Hoeffding Decision Trees in DB2 CS240A, Win 2003 Carlo Zaniolo

  2. Decision Tree Learning Algorithms • Decision Tree Example: a one-level tree that tests Temp at the root, with branches Mild, Cool, and Hot leading to the class labels Yes, No, and Yes
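
A minimal sketch of how this example tree might be stored relationally. The dtree schema and every name in it are illustrative assumptions, not something the slides prescribe; one row per node, with leaf rows carrying the predicted class.

```sql
-- Illustrative schema (assumed, not from the slides): one row per tree node.
CREATE TABLE dtree (
  node_id   INT NOT NULL,   -- unique node identifier
  parent_id INT,            -- NULL for the root
  attr      VARCHAR(20),    -- attribute tested at this node (NULL at leaves)
  attr_val  VARCHAR(20),    -- branch label on the edge from the parent
  class     VARCHAR(10)     -- predicted class (non-NULL only at leaves)
);

-- The one-level Temp tree from the slide:
INSERT INTO dtree VALUES (0, NULL, 'Temp', NULL,   NULL);
INSERT INTO dtree VALUES (1, 0,    NULL,   'Mild', 'Yes');
INSERT INTO dtree VALUES (2, 0,    NULL,   'Cool', 'No');
INSERT INTO dtree VALUES (3, 0,    NULL,   'Hot',  'Yes');
```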

  3. New Application Challenges • Classical Learning: In-Memory Data, All Data Available at the Beginning • New Scenario: Very Large Data, Streaming In • Solution: Incremental Learning

  4. Incremental Decision Tree Construction • [diagram: two candidate splits, a node V1 branching on A and B and a node V2 branching on X and Y, each leading to classes C1 and C2] • Intuitively, a small number of samples is sufficient to choose the best attribute to test at each node.

  5. Hoeffding Decision Tree • Hoeffding Bound: Given a random variable r with range R, we make n independent observations to estimate its mean and get the sample mean r̄. The Hoeffding bound states that with probability 1 − δ, the true mean of the variable is at least r̄ − ε, where ε = √(R² ln(1/δ) / (2n)). • Hoeffding Tree: • random variable r: the difference between the information gain given by the best attribute and by the second-best attribute. • observations: the training samples that have fallen into the node so far. • goal: the best attribute is chosen with confidence 1 − δ. • mechanics: maintain a distribution table: (attr, attr_val, class, # samples)
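
A minimal sketch of the distribution table named on this slide, and of the bound itself computed in SQL. All table and column names, and the constants δ = 0.01 and R = 1, are illustrative assumptions.

```sql
-- Per-node sufficient statistics (attr, attr_val, class, # samples):
CREATE TABLE dist (
  node_id  INT NOT NULL,          -- leaf the training sample reached
  attr     VARCHAR(20) NOT NULL,  -- candidate attribute
  attr_val VARCHAR(20) NOT NULL,  -- observed value of that attribute
  class    VARCHAR(10) NOT NULL,  -- class label of the sample
  cnt      INT NOT NULL           -- # samples with this combination
);

-- Each arriving sample bumps one counter per candidate attribute, e.g.:
UPDATE dist SET cnt = cnt + 1
 WHERE node_id = 0 AND attr = 'Temp'
   AND attr_val = 'Mild' AND class = 'Yes';

-- ε = √(R² ln(1/δ) / (2n)) at node 0, assuming δ = 0.01 and R = 1
-- (information gain on a two-class problem ranges over [0, 1]).
-- Summing cnt over a single attribute counts each sample exactly once.
SELECT SQRT(1.0 * 1.0 * LN(1.0 / 0.01) / (2 * SUM(cnt))) AS epsilon
  FROM dist
 WHERE node_id = 0 AND attr = 'Temp';
```

Once ε falls below the observed gap between the best and second-best information gain, the node can be split with confidence 1 − δ.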

  6. Make the Best Out of DB2 • Input: training and testing data, both in DB2 tables. • Output: a decision tree in a DB2 table. • DB2 utilities you may consider: • UDFs for learning, • Recursive SQL for prediction (a sketch follows below). Have Fun!
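
One way the "Recursive SQL for prediction" hint might be realized, reusing the illustrative dtree table sketched earlier. The test_data shape (one row per sample and attribute) is also an assumption, not something the slides prescribe.

```sql
-- Assumed test-data layout: one (sample, attribute, value) row per fact.
CREATE TABLE test_data (
  sample_id INT NOT NULL,
  attr      VARCHAR(20) NOT NULL,  -- attribute name, e.g. 'Temp'
  attr_val  VARCHAR(20) NOT NULL   -- attribute value, e.g. 'Mild'
);

-- Route each sample from the root to a leaf with a recursive common
-- table expression (DB2 spells this plain WITH, no RECURSIVE keyword).
WITH walk (sample_id, node_id, attr, class) AS (
  -- seed: every sample starts at the root
  SELECT s.sample_id, d.node_id, d.attr, d.class
    FROM (SELECT DISTINCT sample_id FROM test_data) s, dtree d
   WHERE d.parent_id IS NULL
  UNION ALL
  -- step: follow the child whose branch label matches the sample's
  -- value for the attribute tested at the current node
  SELECT w.sample_id, c.node_id, c.attr, c.class
    FROM walk w, dtree c, test_data t
   WHERE c.parent_id = w.node_id
     AND t.sample_id = w.sample_id
     AND t.attr      = w.attr
     AND c.attr_val  = t.attr_val
)
SELECT sample_id, class AS prediction
  FROM walk
 WHERE class IS NOT NULL;   -- leaf rows carry the prediction
```

For the one-level Temp example the recursion takes a single step, but the same query walks a tree of any depth.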

  7. References: • Pedro Domingos and Geoff Hulten. Mining High-Speed Data Streams. ACM SIGKDD, 2000. • Don Chamberlin. A Complete Guide to DB2 Universal Database. Morgan Kaufmann, 1998.
