
Learning and Making Decisions When Costs and Probabilities are Both Unknown

This research explores cost-sensitive decision-making when costs and probabilities are unknown. It introduces a testbed using the KDD'98 charitable donations dataset and discusses probability estimation methods. Experimental results are presented.



Presentation Transcript


  1. Learning and Making Decisions When Costs and Probabilities are Both Unknown Authors: Bianca Zadrozny, Charles Elkan Advisor: Dr. Hsu Graduate: Yu-Wei Su

  2. Outline • Motivation • Objective • Introduction • MetaCost vs. direct cost-sensitive decision-making • A testbed: the KDD'98 charitable donations dataset • Probability estimation methods • Estimating donation amounts • Experimental results • Conclusion • Opinion

  3. Motivation • Misclassification costs differ from example to example, just as probabilities do • Real-world datasets are often highly unbalanced

  4. Objective • To make optimal decisions when both costs and probabilities must be estimated • To correct sample selection bias using the method of Nobel prize-winning economist James Heckman

  5. Introduction • Most supervised learning algorithms assume all errors (incorrect predictions) are equally costly, which is not true in practice • Cost-sensitive learning makes the prediction that leads to the lowest expected cost • Non-cost-sensitive learning merely maximizes classification accuracy • The paper presents an alternative method called direct cost-sensitive decision-making

  6. MetaCost vs. direct cost-sensitive decision-making • MetaCost: each example x is associated with a cost C(i,j,x) of predicting class i for x when the true class of x is j • The optimal decision concerning x is the class i that leads to the lowest expected cost, as written out below
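The decision rule on this slide, written out explicitly (this is the expected-cost formula used by both MetaCost and the direct method):

\[
  i^{*}(x) \;=\; \operatorname*{arg\,min}_{i} \; \sum_{j} P(j \mid x)\, C(i, j, x)
\]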

  7. MetaCost vs. direct cost-sensitive decision-making (cont.) • Direct cost-sensitive decision-making has the same central idea, but differs in two ways: • MetaCost assumes costs are known in advance and are the same for all examples; the direct method does not • The direct method does not estimate probabilities using bagging; it uses simpler methods based on a single decision tree

  8. A testbed: the KDD'98 charitable donations dataset • The training set consists of 95,412 records with known classes; the test set consists of 96,367 records without known classes • The overall percentage of donors in the population is about 5% • Donation amounts for persons who respond vary from $1 to $200

  9. A testbed: the KDD'98 charitable donations dataset (cont.) • In the donation domain it is easier to talk consistently about benefit than about cost • The optimal predicted label for example x is the class i that maximizes the expected benefit, shown below (j=1 means the person donates; j=0 means they do not)
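The maximization referred to on the slide, with B(i,j,x) the benefit of predicting class i for x when the true class is j:

\[
  i^{*}(x) \;=\; \operatorname*{arg\,max}_{i} \; \sum_{j} P(j \mid x)\, B(i, j, x)
\]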

  10. A testbed: the KDD'98 charitable donations dataset (cont.) • The optimal policy (the formula appeared as an image on the original slide; a reconstruction follows)
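Reconstructing the policy from the paper: soliciting a non-donor costs the $0.68 mailing fee and returns nothing, while soliciting a donor returns the donation y(x) minus that fee, so the expected benefit of soliciting is positive exactly when

\[
  P(j = 1 \mid x)\; y(x) \;>\; \$0.68
\]

where y(x) is the estimated donation amount for person x.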

  11. Probability estimation methods • Deficiencies of decision tree methods • Smoothing • Curtailment • Calibrating naive Bayes classifier scores • Averaging probability estimates

  12. Deficiencies of decision tree methods • Standard decision tree methods assign by default the raw training frequency p = k/n, where k of the n training examples at a leaf are positive • These are not accurate conditional probability estimates, for at least two reasons: • High bias • High variance • Pruning methods can alleviate this, but they are not suitable for unbalanced datasets

  13. Deficiencies of decision tree methods (cont.) • The solution: use C4.5 without pruning and without collapsing to obtain raw scores, then transform those scores into accurate class membership probabilities (see the sketch below)
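A minimal sketch of obtaining raw leaf-frequency scores from an unpruned tree, with scikit-learn's DecisionTreeClassifier standing in for C4.5 (an assumption; the paper uses C4.5 itself, without pruning or collapsing):

```python
# Sketch: raw leaf-frequency scores from an unpruned tree.
# DecisionTreeClassifier stands in for C4.5 (an assumption).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                        # toy features
y = (rng.random(1000) < 0.05).astype(int)        # ~5% positives, as in KDD'98

tree = DecisionTreeClassifier(criterion="entropy")  # defaults: fully grown, no pruning
tree.fit(X, y)

# predict_proba returns the raw training frequency k/n at each leaf --
# exactly the estimate the slide says is biased and high-variance.
raw_scores = tree.predict_proba(X)[:, 1]
```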

  14. Smoothing • Using the Laplace correction method • For a two-class problem, it replaces the conditional probability estimate p = k/n by p' = (k+1)/(n+2), which pulls probability estimates closer to 1/2 • For the donation domain, the paper instead replaces p = k/n by p' = (k + b·m)/(n + m), where b is the base rate of the positive class and m is a smoothing parameter (a sketch follows)
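A minimal sketch of the smoothing rule on this slide (an m-estimate with base rate b; the two-class Laplace correction is the special case b = 1/2, m = 2):

```python
# Sketch of the slide's smoothing rule.
def smooth(k: int, n: int, b: float = 0.05, m: float = 200.0) -> float:
    """Replace the raw leaf frequency k/n by (k + b*m) / (n + m)."""
    return (k + b * m) / (n + m)

# Laplace correction is the special case b = 1/2, m = 2:
laplace = smooth(1, 4, b=0.5, m=2.0)   # (1 + 1) / (4 + 2)  ~ 0.333
donation = smooth(1, 4)                # (1 + 10) / 204     ~ 0.054
```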

  15. Smoothing (cont.) • For example, if a leaf contains four examples, one of which is positive, the raw C4.5 score of this leaf is 0.25 • The smoothed score with m = 200 and b = 0.05 is (1 + 0.05·200)/(4 + 200) = 11/204 ≈ 0.054

  16. Smoothing (cont.)

  17. Curtailment • To overcome the problem of overfitting • Curtailment is not equivalent to any type of pruning (a sketch of the idea follows)
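A sketch of the curtailment idea, assuming the rule of stopping the descent once a child node would hold fewer than v training examples; `is_leaf`, `child_matching`, `n_examples`, and `k_positives` are hypothetical tree-API names used only for illustration:

```python
# Hypothetical tree API: node.is_leaf, node.child_matching(x),
# node.n_examples (training examples reaching the node) and
# node.k_positives (positives among them) are illustrative names.
def curtailed_score(root, x, v=100):
    """Score x with the deepest node on its path that still
    contains at least v training examples (curtailment)."""
    node = root
    while not node.is_leaf:
        child = node.child_matching(x)   # branch the test example follows
        if child.n_examples < v:
            break                        # curtail: stop before a tiny node
        node = child
    return node.k_positives / node.n_examples
```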

  18. Curtailment (cont.)

  19. Curtailment (cont.)

  20. Calibrating naive Bayes classifier scores • Using a histogram method to obtain calibrated probability estimates from a naive Bayes classifier • Sort the training examples according to their scores and divide the sorted set into b equal-size bins • Given a test example x, place it in a bin according to its score n(x), then estimate the corrected probability as the fraction of positive training examples in that bin (sketch below)
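A minimal sketch of the histogram (binning) calibration, assuming equal-size bins over the sorted scores; the function names are illustrative:

```python
import numpy as np

def fit_bins(scores, labels, b=10):
    """Sort training examples by score, split into b equal-size bins,
    and record each bin's upper score boundary and positive fraction."""
    order = np.argsort(scores)
    score_bins = np.array_split(np.asarray(scores)[order], b)
    label_bins = np.array_split(np.asarray(labels)[order], b)
    uppers = np.array([s[-1] for s in score_bins])    # bin boundaries
    probs = np.array([l.mean() for l in label_bins])  # corrected estimates
    return uppers, probs

def calibrate(score, uppers, probs):
    """Place a test score in its bin; return that bin's positive fraction."""
    i = np.searchsorted(uppers, score)
    return probs[min(i, len(probs) - 1)]
```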

  21. Averaging probability estimates • Combining the probability estimates given by different classifiers through averaging can reduce the variance of the estimates [Tumer and Ghosh, 1995]: \(\sigma^2_{\mathrm{avg}} = \frac{1 + \delta(N-1)}{N}\,\sigma^2\) • Where \(\sigma^2\) is the variance of each original classifier, N is the number of classifiers, and \(\delta\) is the correlation factor among all classifiers

  22. Estimating donation amounts • For non-donors in the training set the actual donation amount is zero, so one might impute zero, analogously to how donation probabilities are estimated; the paper instead estimates y(x) from donors only • It is also wrong to use the same donation estimate for all test examples, since the decision to solicit would then be based only on the probability

  23. Estimating donation amounts (cont.) • These costs or benefits must be estimated for each example • Least-squares multiple linear regression (MLR) is used to estimate donation amounts from two attributes (sketch below): • Lastgift: dollar amount of the most recent gift • Ampergift: average gift amount in responses to the last 22 promotions
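A minimal sketch of the regression step, with synthetic stand-ins for the Lastgift and Ampergift fields (the data and coefficients below are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
lastgift = rng.uniform(1, 200, size=5000)    # most recent gift ($)
ampergift = rng.uniform(1, 50, size=5000)    # avg gift over last 22 promos
# Synthetic donation amounts, for illustration only.
amount = 0.4 * lastgift + 0.5 * ampergift + rng.normal(size=5000)

X = np.column_stack([lastgift, ampergift])
reg = LinearRegression().fit(X, amount)      # least-squares MLR
y_hat = reg.predict(X)                       # per-example estimates y(x)
```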

  24. Estimating donation amounts (cont.) • The problem of sample selection bias: the regression can only be trained on donors • Donation amounts estimated by the regression equation therefore tend to be too low for test examples that have a low probability of donation

  25. Estimating donation amounts (cont.) • Heckman correction (sketched below): • First, learn a probit linear model to estimate the conditional probabilities P(j=1|x) • Then estimate y(x) by linear regression using only the training examples x for which j(x)=1, but including the value of P(j=1|x) as an additional regressor • In this paper, the probability estimates for Heckman's procedure are obtained from a decision tree or a naive Bayes classifier instead of a probit model
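A sketch of the two-step procedure as described on the slide; logistic regression stands in for the probability model (classical Heckman uses a probit, and the paper substitutes a decision tree or naive Bayes), and all names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def heckman_style_fit(X, donated, amount):
    """X: features; donated: 0/1 response; amount: gifts (valid for donors)."""
    # Step 1: estimate P(j=1 | x) for every training example.
    prob_model = LogisticRegression(max_iter=1000).fit(X, donated)
    p = prob_model.predict_proba(X)[:, 1]

    # Step 2: regress the amount on donors only (j(x)=1), with the
    # estimated P(j=1 | x) appended as an extra regressor so its
    # coefficient absorbs the selection bias.
    donors = donated == 1
    X_aug = np.column_stack([X, p])
    amount_model = LinearRegression().fit(X_aug[donors], amount[donors])
    return prob_model, amount_model
```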

  26. Experimental results

  27. Conclusion • The proposed method of cost-sensitive learning performs systematically better than MetaCost in experiments • It provides a solution to the fundamental problem of costs being different for different examples • It identifies and solves the problem of sample selection bias

  28. Opinion • Raw frequency is not the only metric for estimating probabilities • Treating the positive and negative classes as a simple 1-versus-0 question is not enough
