
Defining Dummy Variables


bruno-lott


Presentation Transcript


  1. Defining Dummy Variables: Getting ready for Discriminant Analysis

  2. Why dummies?
     • Dummies are not necessary for predictive models, but they have some advantages.
     • A subset of a variable (a certain range of values) may affect the dependent variable differently from the rest, yet the same variable used as a continuous one may not be significant.
     • They are easier to interpret for business applications.
     • For credit bureau variables, they handle special cases (no record, inquiries only, missing, etc.) a little better, based on the dependent variable's characteristics for those categories.

  3. How to define them
     • Compute the ratio of column percentages for each category (Good Column Percent / Bad Column Percent).
     • Use the pattern of these ratios to decide how many categories (and hence how many dummies) to create.
     • There must be a neutral category.
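The ratio computation described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `good_bad_ratios`, the age bands, and the counts are all made up for the example, and it assumes each account is already labelled "good" or "bad".

```python
from collections import Counter


def good_bad_ratios(categories, outcomes):
    """Return {category: good column percent / bad column percent}.

    Assumes outcomes contains at least one "good" and one "bad" label.
    """
    good = Counter(c for c, y in zip(categories, outcomes) if y == "good")
    bad = Counter(c for c, y in zip(categories, outcomes) if y == "bad")
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    ratios = {}
    for cat in sorted(set(categories)):
        good_pct = good[cat] / total_good  # share of all goods in this category
        bad_pct = bad[cat] / total_bad     # share of all bads in this category
        # A category with no bads has an unbounded ratio.
        ratios[cat] = good_pct / bad_pct if bad_pct > 0 else float("inf")
    return ratios


# Hypothetical sample: (age band, number of goods, number of bads).
sample = [("18-25", 1, 3), ("26-35", 2, 2), ("36+", 3, 1)]
categories, outcomes = [], []
for band, n_good, n_bad in sample:
    categories += [band] * (n_good + n_bad)
    outcomes += ["good"] * n_good + ["bad"] * n_bad

ratios = good_bad_ratios(categories, outcomes)
for band, ratio in ratios.items():
    print(f"{band}: {ratio:.2f}")
```

A ratio above 1 means the category holds a larger share of the goods than of the bads; a ratio near 1 means the category does little to separate the two groups.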

  4. Example: Customer Age Dummies
     [Table not recoverable from the transcript. Surviving labels: dummies 1, 2, Neutral, 3, 4, 5, 6, 7.]

  5. Some Guidelines
     • Look for a logical pattern.
       E.g.: ratios get better with age. Does that make sense? Why or why not?
     • If a higher age category has a lower ratio, combine it with the previous (or next) category.
     • If the pattern is contrary to business expectation, investigate the data and/or drop the variable.
     • If there is no pattern at all (no variation in the ratios), drop the variable: it has no discriminatory power.
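The "combine it with the previous category" guideline can be sketched in code. The slides describe a judgment call made by the analyst, so the automatic pooling rule below is an assumption introduced for illustration, as are the function names and the counts. It always merges with the previous band, whereas the slide also allows merging with the next one.

```python
def good_bad_ratio(good, bad, total_good, total_bad):
    """Good column percent divided by bad column percent."""
    return (good / total_good) / (bad / total_bad)


def merge_until_monotone(bands):
    """Pool adjacent bands until the ratio never falls as age rises.

    bands: ordered list of (label, good_count, bad_count), each with bad_count > 0.
    """
    total_good = sum(g for _, g, _ in bands)
    total_bad = sum(b for _, _, b in bands)
    merged = []
    for label, good, bad in bands:
        merged.append([label, good, bad])
        # While the newest band has a lower ratio than the one before it,
        # fold it into that previous band and re-check.
        while len(merged) > 1 and (
            good_bad_ratio(merged[-1][1], merged[-1][2], total_good, total_bad)
            < good_bad_ratio(merged[-2][1], merged[-2][2], total_good, total_bad)
        ):
            last = merged.pop()
            merged[-1][0] += " + " + last[0]
            merged[-1][1] += last[1]
            merged[-1][2] += last[2]
    return [tuple(band) for band in merged]


# Hypothetical counts: the 36-45 band breaks the "better with age" pattern.
age_bands = [("18-25", 10, 30), ("26-35", 20, 20), ("36-45", 15, 25), ("46+", 55, 25)]
merged_bands = merge_until_monotone(age_bands)
for label, good, bad in merged_bands:
    print(label, good, bad)
```

Here the 36-45 band is pooled with 26-35 because its ratio is lower than that of the younger band. Whether the merged pattern makes business sense still has to be checked by hand.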

  6. Special Cases
     • What should be done with 'No Record', 'Inquiries Only', etc. when dealing with credit bureau variables?
     • Look at the Good/Bad ratio for those categories.
     • Find the category with the closest matching ratio and make that the Neutral category.
     • The special cases should also be part of the Neutral category for all variables.
     • Assess their impact only once in the model, by defining dummies for the CBTYPE variable.

  7. CBTYPE Variable
     Key to CBTYPE variable:
     1 = Record with Trades
     2 = Record w/Inqs. and Pub Recs Only
     3 = Record w/Inqs. Only
     4 = Record w/Pub Recs Only
     5 = No Record
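Defining dummies for CBTYPE might look like the sketch below. The codes come from the key on this slide; the function name, the `CBTYPE_n` column names, and the choice of code 1 as the default neutral category are assumptions, since the slides do not say which code is neutral.

```python
# Codes taken from the slide's key, with abbreviations spelled out.
CBTYPE_KEY = {
    1: "Record with Trades",
    2: "Record with Inquiries and Public Records Only",
    3: "Record with Inquiries Only",
    4: "Record with Public Records Only",
    5: "No Record",
}


def cbtype_dummies(cbtype, neutral=1):
    """Return 0/1 indicators for every CBTYPE code except the neutral one.

    neutral=1 is an illustrative default, not something the slides specify.
    """
    if cbtype not in CBTYPE_KEY:
        raise ValueError(f"unknown CBTYPE code: {cbtype!r}")
    return {
        f"CBTYPE_{code}": int(cbtype == code)
        for code in CBTYPE_KEY
        if code != neutral
    }


print(cbtype_dummies(5))  # only the 'No Record' indicator is set
print(cbtype_dummies(1))  # neutral category: every indicator is 0
```

Because the neutral code gets no dummy of its own, an account in that category is represented by all zeros, and each remaining dummy measures its category relative to the neutral one.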
