
Model Building and Validation

  1. Model Building and Validation: an overview using the discriminant analysis technique

  2. Assumption for this lecture • There are several types of models, but this lecture assumes we are building one with a 2-valued dependent variable. • E.g., we want to predict who will respond to a mailing: the dependent variable has two values, responders and non-responders. • E.g., we want to predict who is at risk of a heart attack: the dependent variable is had a heart attack / did not have a heart attack.

  3. What will it tell us? • The model is built from past data and generates a score that predicts how likely something is to occur. • (What is the probability that this person will respond to the mailing?)

  4. The Modeling Process • Sample Design • Data Collection and Cleaning • Sample Selection • Data Aggregation • Build the Model • Test the Model

  5. Sample Design • What data do you need? • Where is it? • How much is needed? • What is the dependent variable?

  6. Data Collection and Cleaning • Read and validate the data. • Deal with missing values. • Delete unwanted records and variables.
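
A minimal Python sketch of the cleaning step, assuming records arrive as dictionaries and that `None` marks a missing value. The specific rules (delete records whose dependent variable is missing, drop a list of unwanted fields, fill other gaps with the field's median) are illustrative assumptions rather than the lecture's prescription; the function `clean` and all field names are hypothetical.

```python
from statistics import median

def clean(records, dependent, drop_fields=()):
    """Return cleaned copies of `records` (a list of dicts; None = missing)."""
    # Delete records whose dependent variable is missing; drop unwanted fields.
    kept = [
        {k: v for k, v in r.items() if k not in drop_fields}
        for r in records
        if r.get(dependent) is not None
    ]
    # Median of each remaining field, computed over its non-missing values.
    fields = {k for r in kept for k in r if k != dependent}
    medians = {}
    for f in fields:
        present = [r[f] for r in kept if r.get(f) is not None]
        medians[f] = median(present) if present else None
    # Fill the remaining gaps with the field's median.
    for r in kept:
        for f in fields:
            if r.get(f) is None:
                r[f] = medians[f]
    return kept

# Hypothetical mailing records.
raw = [
    {"id": 1, "income": 40, "age": 30, "responded": 1},
    {"id": 2, "income": None, "age": 45, "responded": 0},
    {"id": 3, "income": 60, "age": None, "responded": 1},
    {"id": 4, "income": 50, "age": 50, "responded": None},
]
cleaned = clean(raw, "responded", drop_fields=("id",))
```

Record 4 is deleted because its outcome is unknown; record 2's income becomes 50 and record 3's age becomes 37.5. Median filling is only one option; whether to impute, flag, or delete depends on why the values are missing.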

  7. Selecting a sample • Choose a sample to analyze. • For 0/1 regression (the equivalent of discriminant analysis), use approximately equal numbers of records of each type. • Select twice the number of records you need to build the model, so that 50% of the data can be set aside for validation.
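
The sampling rule can be sketched as follows, assuming records are dictionaries with a 0/1 dependent variable. `balanced_split` and its parameter `n_needed` (the number of records of each type needed to build the model) are hypothetical names. The sketch draws twice `n_needed` records of each type without replacement, then puts half of each draw into the analysis sample and half into the validation sample, so both samples come out balanced.

```python
import random

def balanced_split(records, dependent, n_needed, seed=0):
    """Draw 2 * n_needed records of each outcome (0 and 1) and split them in half.

    Returns (analysis, validation); each holds n_needed records of each outcome.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    analysis, validation = [], []
    for value in (0, 1):
        group = [r for r in records if r[dependent] == value]
        if len(group) < 2 * n_needed:
            raise ValueError(f"only {len(group)} records with {dependent} = {value}")
        drawn = rng.sample(group, 2 * n_needed)  # sampling without replacement
        analysis.extend(drawn[:n_needed])
        validation.extend(drawn[n_needed:])
    rng.shuffle(analysis)
    rng.shuffle(validation)
    return analysis, validation

# Hypothetical mailing file: 1 responder for every 10 records.
population = [{"id": i, "responded": 1 if i % 10 == 0 else 0} for i in range(1000)]
analysis, validation = balanced_split(population, "responded", n_needed=40)
print(len(analysis), len(validation))  # prints 80 80
```

Even though responders are only 10% of this file, each sample ends up 50% responders, which is what the equal-records rule asks for.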

  8. Data Aggregation • Data from multiple sources is merged. • This may happen as a first step, before data cleaning, depending on the situation. • New variables are defined (e.g., the ratio of satisfactory trades to total trades).
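
A sketch of the merge-and-derive step, assuming two hypothetical sources keyed on a shared `id`: a customer file and a trade summary with `satisfactory_trades` and `total_trades` counts. It performs an inner join (customers with no trade record are dropped, which is one possible choice) and derives the satisfactory-to-total ratio, leaving it as `None` when there are no trades instead of dividing by zero. The function and field names are illustrative.

```python
def merge_and_derive(customers, trade_summary):
    """Merge two sources on a shared 'id' key and derive a ratio variable."""
    by_id = {row["id"]: row for row in trade_summary}
    merged = []
    for cust in customers:
        trades = by_id.get(cust["id"])
        if trades is None:
            continue  # inner join: drop customers with no trade record
        row = {**cust, **trades}
        total = row["total_trades"]
        # New variable: ratio of satisfactory trades to total trades.
        row["sat_ratio"] = row["satisfactory_trades"] / total if total else None
        merged.append(row)
    return merged

customers = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 29}]
trades = [
    {"id": 1, "satisfactory_trades": 9, "total_trades": 12},
    {"id": 2, "satisfactory_trades": 0, "total_trades": 0},
]
merged = merge_and_derive(customers, trades)
```

Customer 3 has no trade record and is dropped; customer 1 gets a ratio of 0.75, and customer 2, with no trades, gets `None`, which the cleaning step would then have to handle.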

  9. Model Building • Break up each independent variable into classes, each holding roughly 2% to 10% of the observations. • Run crosstabs of each variable against the dependent variable. • Redefine each independent variable as multiple dummy (0/1) variables. • Run the regression on the dummies.
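
The classing and crosstab steps might look like the sketch below. It assumes equal-frequency (quantile) classes as one way to give each class a similar share of the observations: with 10 classes each holds about 10%, with 20 classes about 5%, both within the slide's 2 to 10% range. Heavily tied values produce repeated cut points and therefore uneven or empty classes, which would need merging in practice. All function names and data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def quantile_cuts(values, n_classes):
    """Upper cut points for n_classes roughly equal-frequency classes.

    Assumes len(values) >= n_classes.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]

def assign_class(value, cuts):
    """Class index = position of the first cut point >= value."""
    return bisect_left(cuts, value)

def crosstab(classes, outcomes):
    """For each class: (count of 0s, count of 1s, share of 1s)."""
    counts = Counter(zip(classes, outcomes))
    table = {}
    for c in sorted(set(classes)):
        zeros, ones = counts[(c, 0)], counts[(c, 1)]
        table[c] = (zeros, ones, ones / (zeros + ones))
    return table

# Hypothetical data: 100 values of one predictor and a 0/1 outcome.
values = list(range(1, 101))
outcomes = [1 if v > 70 else 0 for v in values]
cuts = quantile_cuts(values, n_classes=10)  # 10 classes, about 10% each
classes = [assign_class(v, cuts) for v in values]
for c, row in crosstab(classes, outcomes).items():
    print(c, row)
```

The share-of-1s column is what the crosstab is for: it shows how the outcome rate moves across classes, including non-linear patterns that a single linear term would miss.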

  10. Example: Data looks like this

  11. It is transformed to look like this:
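
The dummy-variable transformation and the regression on the dummies from slide 9 can be sketched in plain Python as below. One class per variable is left out as the reference category: with an intercept in the model, a full set of dummies would be perfectly collinear and the normal equations would have no unique solution. The helpers `dummy_code`, `solve`, and `ols` are hypothetical; in practice a statistics package would do the fit. For a two-group problem, least squares on a 0/1 target gives coefficients proportional to Fisher's linear discriminant, which is presumably why the lecture treats 0/1 regression as equivalent to discriminant analysis.

```python
def dummy_code(classes, reference=0):
    """One 0/1 column per class, except the reference class."""
    levels = sorted(set(classes) - {reference})
    rows = [[1.0 if c == lvl else 0.0 for lvl in levels] for c in classes]
    return levels, rows

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients, intercept first, via the normal equations."""
    x = [[1.0] + r for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)

# Hypothetical example: 12 records already binned into classes 0, 1, 2.
classes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
levels, dummies = dummy_code(classes)
coef = ols(dummies, y)
# Score = intercept + coefficient of the record's class dummy.
score = [coef[0] + sum(b * d for b, d in zip(coef[1:], row)) for row in dummies]
```

With a single classed variable, each fitted score is simply the share of 1s in that class (0.25, 0.5, and 0.75 here); with several variables, the dummy coefficients adjust for one another.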

  12. Model Building, contd. • Eliminate variables that are not significant until you have a model whose variables are all significant and intuitively meaningful.
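
The elimination loop can be sketched as backward elimination on t-statistics: refit, drop the weakest variable if it falls below a cutoff, repeat. The cutoff `t_min=2.0` is an assumption standing in for roughly a 5% significance level in large samples; the lecture does not specify one, and the "intuitively meaningful" check still needs a human. The helper names are hypothetical, and the toy data are constructed so that `y` depends on `x1` but not on `x2`.

```python
import math

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_with_t(data, names, y):
    """OLS on the named columns; returns {name: t-statistic}, intercept excluded."""
    x = [[1.0] + [row[name] for name in names] for row in data]
    n, k = len(x), len(x[0])
    xtx = [[sum(r[p] * r[q] for r in x) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(x, y)) for p in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[p][q] * xty[q] for q in range(k)) for p in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    return {name: beta[i + 1] / math.sqrt(s2 * inv[i + 1][i + 1])
            for i, name in enumerate(names)}

def backward_eliminate(data, names, y, t_min=2.0):
    """Repeatedly drop the least significant variable while its |t| < t_min."""
    names = list(names)
    while names:
        t = fit_with_t(data, names, y)
        weakest = min(names, key=lambda nm: abs(t[nm]))
        if abs(t[weakest]) >= t_min:
            break
        names.remove(weakest)
    return names

# Toy data: in every (x1, x2) cell, 1 of 8 outcomes is 1 when x1 = 0 and
# 7 of 8 when x1 = 1, so y depends on x1 but not at all on x2.
rows, y = [], []
for x1 in (0, 1):
    for x2 in (0, 1):
        ones = 7 if x1 else 1
        for i in range(8):
            rows.append({"x1": x1, "x2": x2})
            y.append(1 if i < ones else 0)
selected = backward_eliminate(rows, ["x1", "x2"], y)
print(selected)  # prints ['x1']
```

In a real model the candidates would be the dummy columns from the previous step, so a whole variable may survive with only some of its classes.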

  13. Testing the model • Perform a Kolmogorov-Smirnov (K-S) test to measure how well the model performs on: • the analysis sample • the validation sample • the total sample • If the model separates the 0s and the 1s well in each of the three cases, you have a good model.
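
The K-S statistic used here is the largest vertical gap between the cumulative score distributions of the 0s and the 1s: 0 means the scores do not separate the groups at all, and 1 means perfect separation. This sketch assumes `scores` are the model's fitted values and `outcomes` the actual 0/1 values; `ks_statistic` is a hypothetical name. If SciPy is available, `scipy.stats.ks_2samp` computes the same two-sample statistic. The lecture does not give a cutoff for a "good" separation.

```python
def ks_statistic(scores, outcomes):
    """Maximum gap between the cumulative score distributions of the 0s and the 1s."""
    pairs = sorted(zip(scores, outcomes))
    n1 = sum(1 for _, o in pairs if o == 1)
    n0 = len(pairs) - n1
    cum0 = cum1 = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every record tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum1 += 1
            else:
                cum0 += 1
            i += 1
        best = max(best, abs(cum0 / n0 - cum1 / n1))
    return best

print(ks_statistic([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1]))  # prints 1.0
print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.5
```

Running it separately on the analysis, validation, and total samples gives the three figures the slide asks for; a validation value well below the analysis value may indicate that the model is overfitting the analysis sample.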
