
Lower-Bound Estimate for Cost-sensitive Decision Trees

Lower-Bound Estimate for Cost-sensitive Decision Trees. Mikhail Goubko, senior researcher, Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia. 18th IFAC World Congress, Milan, August 31, 2011.



Presentation Transcript


  1. Lower-Bound Estimate for Cost-sensitive Decision Trees. Mikhail Goubko, senior researcher, Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia. 18th IFAC World Congress, Milan, August 31, 2011.

  2. Summary
  • A new lower-bound estimate is suggested for decision trees with case-dependent test costs.
  • Unlike known estimates, it performs well when the number of classes is small.
  • Computing the estimate takes about n²m operations on average, for n examples and m tests.
  • The estimate can be used to evaluate the absolute losses of heuristic decision-tree algorithms.
  • The estimate can also be used in the split criteria of greedy top-down tree-construction algorithms.
  • Experiments on real data sets show these algorithms give results comparable to popular cost-sensitive heuristics (IDX, CS-ID3, EG2).
  • The suggested algorithms perform better on small data sets with a shortage of tests.

  Introduction: decision trees
  • The decision tree is a popular classification tool in machine learning, pattern recognition, fault detection, medical diagnostics, and situational control.
  • A decision is made from a series of tests of attributes; the next attribute tested depends on the results of the previous tests.
  • Decision trees are learned from data, and compact trees are preferred; the expected length of a path in the tree is the most popular measure of tree size.

  3. [Figure: the running medical-diagnosis example. A set of cases N = {m1, ..., mn} is described by attributes from a set M: Wheezing (Yes/No), Cough (Yes/No), temperature t0 (High/Low), and Chronic illness (Yes/No). Each case is mapped to a decision from the set D = {influenza, chest cold, healthy}.]

  4. and 5. [The example table from slide 3, repeated as animation frames.]

  6. [Figure: the example again, with the attributes Wheezing, Cough, t0, and Chronic illness apparently arranged as tests in a growing decision tree with leaves labelled by decisions from D.]

  7. Costs
  • Tests differ in measurement cost (e.g., a general blood analysis is much cheaper than a computed tomography procedure).
  • The decision tree is grown to minimize the expected cost of classification, given a zero misclassification rate on the learning set.
  • Other types of cost relevant to decision-tree growing (Turney, 2000), such as misclassification, teaching, and intervention costs, are not considered.
  • Measurement costs (see Turney, 2000) may depend on: the individual case (the variant studied here); the true class of the case and side effects, i.e. values of hidden attributes (covered by the model); prior tests, prior results (other attributes' values), and the correct answer to the current question (can be accounted for).
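To make the case-dependent cost model concrete, measurement cost can be represented as a function of both the test and the case being examined. The following is a minimal sketch; all names and prices are invented for illustration and are not taken from the talk:

```python
# Minimal sketch of case-dependent measurement costs.
# All names and prices are illustrative, not from the talk.

def make_cost(base, chronic_surcharge):
    """Return a cost function: a test may be dearer for chronically ill patients."""
    def cost(case):
        return base + (chronic_surcharge if case["chronic"] else 0.0)
    return cost

test_costs = {
    "blood_analysis": make_cost(10.0, 0.0),    # cheap, same for every case
    "tomography":     make_cost(200.0, 50.0),  # expensive, dearer for chronic cases
}

def path_cost(tests, case):
    """Total measurement cost of running a sequence of tests on one case."""
    return sum(test_costs[t](case) for t in tests)

print(path_cost(["blood_analysis", "tomography"], {"chronic": True}))   # 260.0
print(path_cost(["blood_analysis", "tomography"], {"chronic": False}))  # 210.0
```

With costs shaped this way, the same root-to-leaf path in a tree costs different amounts for different cases, which is exactly what "case-dependent test costs" means here.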

  8. Motivation
  • The decision-tree growing problem is NP-hard, and even hard to approximate.
  • Numerous heuristic algorithms have been suggested for finding near-optimal trees (cost-sensitive examples are IDX, EG2, CS-ID3, and many others).
  • These heuristics seem to perform well on real data, but one question is always open: how much extra cost is due to the imperfection of an algorithm?
  • Monetary losses may be substantial in some settings.
  • As the optimal tree is hard to compute, its cost should be approximated from below, so a lower-bound estimate for the optimal decision-tree cost is needed.
  • Estimates known from the literature share common limitations: they are too optimistic when the number of classes is small, and variations in attribute cardinality are not accounted for.
  • Below, a new lower-bound estimate is suggested and studied.

  9. Literature
  • Hyafil and Rivest (1976), and also Zantema and Bodlaender (2000), showed the decision-tree growing problem to be NP-hard.
  • Sieling (2008) showed that the size of an optimal tree is hard to approximate to within any constant factor.
  • Quinlan (1979) developed the information-gain-based heuristic ID3; it is now commercial and in its fifth version.
  • Heuristic algorithms for cost-sensitive tree construction: CS-ID3 (Tan, 1993), IDX (Norton, 1989), EG2 (Núñez, 1991).
  • Lower-bound estimates for cost-sensitive decision trees: an entropy-based estimate (Ohta and Kanaya, 1991), Huffman code length (Biasizzo, Žužek, and Novak, 1998), a combinatory estimate (Bessiere, Hebrard, and O'Sullivan, 2009).

  10. Definitions
  Definition 1. A subset of tests Q ⊆ M isolates case w in a subset of cases S ⊆ N (w ∈ S) iff the sequence of tests Q assures the proper decision f(w), given initial uncertainty S and that w is the real state of the world.
  Definition 2. The optimal set of questions Q(w, S) ⊆ M is the cheapest of the sets of questions that isolate case w in set S. Define also the minimum cost [formula lost in transcription].

  The lower-bound estimate: unlike known estimates, it performs well when the number of classes is small compared to the number of examples, or when there are few examples.
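The slide's minimum-cost formula did not survive transcription. A plausible reconstruction from Definitions 1 and 2 (an assumption on my part, not the original formula) is:

```latex
% Minimum cost of isolating case $w$ in $S$ (reconstructed; $c_q(w)$ denotes
% the case-dependent cost of test $q$ on case $w$):
l(w, S) = \sum_{q \in Q(w, S)} c_q(w).
% Any tree consistent with the learning set must isolate every case, so the
% expected cost $C(T)$ of an optimal tree admits the lower bound
C(T) \ \ge\ \sum_{w \in N} p(w)\, l(w, N),
% where $p(w)$ is the prior probability (or empirical frequency) of case $w$.
```

The bound follows because, along the root-to-leaf path for case w, the tests performed form some isolating set, whose cost can be no smaller than that of the cheapest isolating set Q(w, N).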

  11. The idea
  The problem of classifying an unknown case (the initial problem, "What is the true case?") is replaced by the problem of proving the true case to a third party (the simplified problem, "I know the true case!"): a prover who already knows the answer selects tests that demonstrate it, rather than a questioner who must discover it. [Figure: cartoon contrasting the two problems.]

  12. Calculation
  • Calculation of the estimate reduces to a number of set-covering problems and is NP-hard in the worst case.
  • But experiments show good performance for the estimate and for its linear-programming relaxation.
  • Average time is about n²m operations (n is the number of cases, m the number of tests).
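The per-case subproblem can be sketched as a weighted set cover: for case w, find a cheap set of tests whose outcomes distinguish w from every case with a different decision. The sketch below is illustrative only (case-independent costs, categorical attributes, invented names); note also that it uses the greedy set-cover heuristic, which only approximates the exact cover the estimate calls for:

```python
# Greedy weighted set cover for the per-case isolation subproblem.
# Illustrative sketch; the exact (NP-hard) cover yields the true estimate,
# the greedy version is used here for brevity.

def cheapest_isolating_cost(cases, costs, w):
    """Approximate the cheapest set of tests whose outcomes distinguish
    case w from every case with a different decision.

    cases: list of (attribute_tuple, decision) pairs
    costs: costs[j] is the cost of test j
    """
    attrs_w, dec_w = cases[w]
    # Cases that any consistent tree must tell apart from w.  Cases with
    # identical attributes but a different decision would make the data
    # inconsistent, so they are skipped.
    to_separate = {i for i, (a, d) in enumerate(cases)
                   if d != dec_w and a != attrs_w}
    chosen, total = set(), 0.0
    while to_separate:
        best, best_new, best_ratio = None, set(), float("inf")
        for j in range(len(attrs_w)):
            if j in chosen:
                continue
            # Cases whose value on test j differs from w's value.
            newly = {i for i in to_separate if cases[i][0][j] != attrs_w[j]}
            if newly and costs[j] / len(newly) < best_ratio:
                best, best_new, best_ratio = j, newly, costs[j] / len(newly)
        if best is None:  # no remaining test separates anything
            return float("inf")
        chosen.add(best)
        total += costs[best]
        to_separate -= best_new
    return total

def lower_bound(cases, costs):
    """Average isolation cost over all cases (uniform case probabilities)."""
    vals = [cheapest_isolating_cost(cases, costs, w) for w in range(len(cases))]
    return sum(vals) / len(vals)

cases = [(("Yes", "No"), "influenza"),
         (("No", "No"), "healthy"),
         (("Yes", "Yes"), "chest cold")]
costs = [1.0, 2.0]
print(lower_bound(cases, costs))  # ≈ 2.333
```

Each call to `cheapest_isolating_cost` is one set-covering instance, and there are n of them over m tests, which is consistent with the roughly n²m average operation count quoted on the slide.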

  13. Applications: evaluating heuristics
  1. Use the lower-bound estimate to evaluate the extra cost due to the imperfection of heuristic tree-growing algorithms.
  [Figure: the quality of the lower-bound estimate on real data sets.]
  The quality of the estimate varies from 60% to 95%, depending on the data set.

  14. Applications: new algorithms
  2. Use the lower-bound estimate to build new tree-growing algorithms.
  [Table: comparison of the new algorithms with known heuristics (IDX, CS-ID3, EG2) on different data sets; tree costs are shown.]
  • The new algorithms perform better on small data sets.
  • The new algorithms work worse in the presence of "dummy" tests.
  • The closeness of the results shows that heuristic trees are nearly optimal.

  15. Bonus: hierarchy optimization
  General methods of hierarchy optimization.

  16. Model
  • Allowed: an arbitrary number of layers, asymmetric hierarchies, multiple subordination, several top nodes.
  • Not allowed: cycles, unconnected parts, subordination to the "worker" nodes (shown in black).
  • A cost function is defined on hierarchies [formula lost in transcription]; the problem is to find the cheapest admissible hierarchy.
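The slide's cost function and problem statement were lost in transcription. A generic reconstruction of the kind of formulation meant (an assumption based on the bullet points, not the original formulas) is:

```latex
% Each non-worker node $m$ of a hierarchy $H$ incurs a cost depending on
% the group of subordinates it controls; the total cost is additive:
C(H) = \sum_{m \in V(H)} c\bigl(s_H(m)\bigr),
% and the problem is to find an admissible hierarchy of minimum cost:
H^{*} \in \operatorname*{arg\,min}_{H \in \Omega(N)} C(H),
% where $\Omega(N)$ is the set of hierarchies over the worker set $N$
% satisfying the constraints above (no cycles, no unconnected parts, etc.).
```

Here s_H(m) stands for the subordinates (or the workers they control) of node m in H; the symbol names are assumed, not taken from the slide.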

  17. Methods
  Sectional cost function:
  1. Covers most applied problems of hierarchy optimization.
  2. Analytical results: when the tallest or the flattest hierarchy is optimal; when an optimal tree exists; when an optimal hierarchy has the shape of a conveyor.
  3. Algorithms: optimal-hierarchy search, optimal-tree search, building an optimal conveyor-belt hierarchy.
  4. The general problem of hierarchy optimization is complex.
  Homogeneous cost function:
  1. Also has numerous applications.
  2. A closed-form solution for the optimal-hierarchy problem.
  3. Efficient algorithms for nearly-optimal tree construction.

  18. Applications of hierarchy optimization methods
  • Manufacturing planning (assembly-line balancing).
  • Network design: communication and computing networks, data-collection networks, structure of hierarchical cellular networks.
  • Computational mathematics: optimal coding, structure of algorithms, real-time computation and aggregation, hierarchical parallel computing.
  • User-interface design: optimizing hierarchical menus, building compact and informative taxonomies.
  • Data mining: decision-tree growing, structuring database indexes.
  • Organization design: org-chart re-engineering, theoretical models of a hierarchical firm.
