SAS Enterprise Miner Release 4.3

Presentation Transcript


  1. SAS Enterprise Miner Release 4.3 A brief overview: analysis of the Donor Recapture Case (Case 3) Kevin Garsek … Class of 2006

  2. Importing Base Data • SAS’s main drawback is that if any field in a record is null or blank, it disregards the entire record • In this case, if we were unable to manipulate the data, the number of usable records would drop dramatically • We can fight back by recoding the missing values, as shown in the import step

  3. Base SAS Interface Screen

  4. Importing Charity Data Text Editor

  5. Text Editor We will use the text editor in Base SAS to import the Charity Case data. In order to use this editor, you simply type as you would in any text editor.

  6. Text Editor A line-by-line walkthrough of the code we will use:
     • libname charity 'C:\Documents and Settings\Kevin\Desktop\Datamining\charity.1'; assigns the library name CHARITY to the master folder on your local PC where the raw data is housed
     • data charity.raw; tells SAS to create a new dataset named RAW in the CHARITY library
     • infile 'chr\2.dat' missover firstobs=2; tells SAS which file, in which subfolder, holds the data to import into the new dataset; missover sets any variables left without values at the end of a short line to missing instead of reading ahead to the next line, and firstobs=2 starts reading at the second line of the file (skipping the first line, typically a header)
     • input OSOURCE $; names the data column OSOURCE; the $ tells SAS that this is character data (if it is left out, SAS assumes the data is numeric)
     • OSOURCE_D = 0; because missing data is prevalent, this creates a new dummy variable named OSOURCE_D and sets it to 0 for every record
     • if trim(OSOURCE) = "" then do; trim removes trailing blanks, and the if/then do opens a block that compensates for blank data
     • OSOURCE = "0"; sets every missing value in the OSOURCE column to "0"
     • OSOURCE_D = 1; sets the new dummy variable to 1 whenever OSOURCE was blank in the input file
     • end; closes the do block
   All of the code from infile to end can be written on a single line in the text editor, and a run; statement closes the DATA step.
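For readers more at home outside SAS, the recoding logic of that DATA step can be mirrored in a short Python sketch. It is an analogue, not what SAS executes: the comma-delimited sample data and the AGE column are invented for illustration, and Python's strip() removes leading as well as trailing blanks, whereas trim removes only trailing ones.

```python
import csv
import io


def recode_blank(rows, column):
    """Python analogue of the OSOURCE recoding: blank values become "0" and a
    <column>_D dummy variable records which values were blank."""
    dummy = column + "_D"
    recoded = []
    for row in rows:
        row = dict(row)                 # copy so the input rows are untouched
        value = row.get(column) or ""   # a missing field is treated as ""
        row[dummy] = 0                  # OSOURCE_D = 0;
        if value.strip() == "":         # if trim(OSOURCE) = "" then do;
            value = "0"                 #   OSOURCE = "0";
            row[dummy] = 1              #   OSOURCE_D = 1;
        row[column] = value             # end;
        recoded.append(row)
    return recoded


# Illustrative data only. DictReader consumes the header line (compare
# firstobs=2), and restval="" fills fields missing from short lines
# (compare missover).
raw = io.StringIO("OSOURCE,AGE\nAMH,60\n,45\nBBK,\n")
records = recode_blank(csv.DictReader(raw, restval=""), "OSOURCE")
for record in records:
    print(record)
```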

  7. Importing Charity Data The image below shows the completed code. Because the same recoding statements must be repeated for every variable, the code is easiest to write in Excel by concatenating text with the & operator and then pasting the result into the text editor. Moving the writing process to Excel saves considerable time during this laborious process.
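The Excel & approach simply drops each variable name into the same template. The same idea is shown here as a short Python sketch. It assumes every listed variable is a blank-prone character column like OSOURCE, and the variable list is illustrative.

```python
def recode_statements(variables):
    """Return one line of SAS recoding statements per character variable,
    following the OSOURCE pattern from the text editor slide."""
    template = '{v}_D = 0; if trim({v}) = "" then do; {v} = "0"; {v}_D = 1; end;'
    return "\n".join(template.format(v=v) for v in variables)


# Illustrative variable list; substitute the character columns in your own file.
print(recode_statements(["OSOURCE", "MAILCODE", "PEPSTRFL"]))
```

The generated lines are pasted after the input statement, exactly as the Excel output would be.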

  8. Importing Charity Data Once the code is complete, right click in the text editor and select “submit all”. This tells SAS to read through the code in the text editor and execute it. Be prepared: because the dataset is large, this will take considerable time to complete.

  9. Starting Enterprise Miner from Base SAS module You should now have a fully working dataset and are ready to open Enterprise Miner by following the next slides.

  10. Starting Enterprise Miner from Base SAS module

  11. Starting Enterprise Miner from Base SAS module

  12. Binding Data to Program • This is an exasperating activity • Even for someone who took a SAS training course in Enterprise Miner • The documentation is pathetic • I’ll document each step carefully in case this ever happens to you

  13. Name Project Charity and Drag Input Data Node to Workspace

  14. Bind Data to Project Right click on tools to get this menu.

  15. Bind Data to Project Left click on Initialization, then left click to edit.

  16. Bind Data to Project Right click select; browse for library RDATA; click ok

  17. Bind Data to Project Gotcha: you must select RAW and hit Enter even though it is the only dataset in RDATA

  18. Change to Larger Sample Left click Change; we changed the sample size to 10,000 so that the rare responses are adequately represented
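A rough way to see why a larger sample helps: the expected number of responders is the sample size times the response rate, and the standard error of the estimated response rate shrinks in proportion to 1/sqrt(n). The 5% rate below is a hypothetical placeholder, not a figure from this analysis.

```python
import math


def expected_responders(sample_size, response_rate):
    """Expected count of responders in a simple random sample of this size."""
    return sample_size * response_rate


def rate_standard_error(sample_size, response_rate):
    """Standard error of the sample response rate: sqrt(p * (1 - p) / n)."""
    return math.sqrt(response_rate * (1.0 - response_rate) / sample_size)


RATE = 0.05  # hypothetical response rate, for illustration only
for n in (1_000, 10_000):
    print(n, expected_responders(n, RATE), round(rate_standard_error(n, RATE), 4))
```

Going from 1,000 to 10,000 records multiplies the expected number of responders by ten and cuts the standard error by a factor of about 3.16.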

  19. Success!

  20. Click Variables Tab Notice that some variables have been rejected. This is typically because the column holds only one value throughout, e.g., a dummy variable that is always 0 because its input column never contained a blank value.
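You can check your own data for the rejection described above by looking for single-valued columns. This Python sketch covers only the single-value case the slide mentions. Enterprise Miner may reject variables for other reasons too, and the sample rows are invented.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value in every row."""
    if not rows:
        return []
    columns = rows[0].keys()
    return [c for c in columns if len({row[c] for row in rows}) == 1]


# Invented rows: OSOURCE_D never varies, so it would carry no information.
rows = [
    {"OSOURCE_D": 0, "AGE": 60},
    {"OSOURCE_D": 0, "AGE": 45},
    {"OSOURCE_D": 0, "AGE": 71},
]
print(constant_columns(rows))
```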

  21. Then Bad Things Happen • Who knows why. • If I hadn’t taken the course the slides would stop here. • That’s the only reason I know what to do • I’ll document this also, in case it happens to you.

  22. Crash Recovery Right click on top level icon; select explore

  23. Crash Recovery Open emproj; delete all files with extension .lck; open user subfolder; delete everything in user subfolder
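The crash-recovery steps above are plain file deletions, so they can also be scripted. This Python sketch assumes you pass the path of the project's emproj folder, that its subfolder is named user as on the slide, and that Enterprise Miner is closed. The function name is hypothetical. Because the script deletes files, back up the project first. The demo runs against a throwaway temporary folder, not a real project.

```python
import shutil
import tempfile
from pathlib import Path


def clear_project_locks(emproj_dir):
    """Delete *.lck files in the emproj folder and empty its user subfolder.
    Returns the names of the lock files removed."""
    emproj = Path(emproj_dir)
    removed = []
    for lock in sorted(emproj.glob("*.lck")):  # sorted() materializes the list first
        lock.unlink()
        removed.append(lock.name)
    user_dir = emproj / "user"
    if user_dir.is_dir():
        for item in list(user_dir.iterdir()):
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
    return removed


# Demo on a temporary folder that mimics the layout described on the slide.
with tempfile.TemporaryDirectory() as demo:
    root = Path(demo)
    (root / "charity.lck").write_text("")
    (root / "keep.txt").write_text("")
    (root / "user").mkdir()
    (root / "user" / "scratch.dat").write_text("")
    removed = clear_project_locks(root)
    remaining = sorted(p.name for p in root.iterdir())
    user_left = list((root / "user").iterdir())
print(removed, remaining, user_left)
```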

  24. Analysis Resumes • We’ll have a look at MAILCODE. • Enterprise Miner has some neat graphical tools that are easy to use. • The simplest and easiest are part of the data input tool.

  25. A Histogram Right click item, select “view distribution of MAILCODE” from drop down menu

  26. Histogram of Mailcode SAS has classified values as missing that R accepted and used!
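SAS treats a blank character value as missing, which is likely why blank MAILCODE entries appear as missing here. The following Python sketch builds a frequency table that applies the same rule. The sample values are illustrative.

```python
from collections import Counter


def frequency_with_missing(values):
    """Tabulate character values, counting blank or whitespace-only strings as
    missing, which mirrors how SAS treats a blank character value."""
    return Counter("(missing)" if v.strip() == "" else v for v in values)


# Illustrative MAILCODE values, not taken from the charity data.
print(frequency_with_missing([" ", "B", "", "B", " "]))
```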

  27. Must Identify TARGET_D as Target Right click row item in column “Model Role”, select “Change Model Role” from drop down menu, select “target” from next drop down menu

  28. Histogram of Target This is what makes the problem hard: extremely low response rate!
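To put a number on "extremely low", compute the response rate as the share of records that gave. This sketch makes an assumption the slides do not state: TARGET_D holds the donation amount, and non-responders have 0 or a missing value. The values are made up.

```python
def response_rate(target_d):
    """Share of records with a positive TARGET_D; None stands for a missing value."""
    if not target_d:
        raise ValueError("no records")
    responders = sum(1 for v in target_d if v is not None and v > 0)
    return responders / len(target_d)


# Made-up TARGET_D values: one donor out of five records.
print(response_rate([0, 0, 15.0, 0, None]))
```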

  29. Save changes!

  30. Add Data Partition Node Drag it down from the toolbar above and draw the connecting line by dragging the mouse.

  31. This is What it Does We will choose to use an 80%/20% training/validation allocation. Close box, right click, click “Run” on drop down menu.
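An 80%/20% allocation amounts to shuffling the records and cutting the list. This is a minimal Python sketch of a simple random split. The seed and function name are arbitrary, and the Data Partition node's own sampling options, such as stratification, may behave differently.

```python
import random


def partition(records, train_fraction=0.8, seed=12345):
    """Randomly split records into (training, validation) lists.
    A fixed seed makes the split reproducible; the value itself is arbitrary."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, valid = partition(range(100))
print(len(train), len(valid))
```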

  32. Design Philosophy Click lower tools tab. Note tools on left. One drags a tool to worksheet and connects with arrows. We’ll now drag and connect regression.

  33. Regression We chose stepwise selection with validation error as the selection criterion. That mimics what we did in R.
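The idea behind selection by validation error can be shown with a simplified analogue: greedy forward selection. This is not SAS's stepwise procedure. Stepwise can also remove variables, and SAS applies its own entry and stay rules. In this sketch, validation error drives every step, and model fitting is abstracted into a scoring function you supply. The toy scorer below is a hypothetical stand-in for fitting a regression.

```python
def forward_select(candidates, validation_error):
    """Greedy forward selection: repeatedly add the candidate variable that most
    lowers validation error, and stop when no addition helps.

    validation_error(variables) should fit a model on the training data using
    `variables` and return its error on the validation data."""
    selected = []
    best_error = validation_error(selected)
    remaining = list(candidates)
    while remaining:
        errors = {var: validation_error(selected + [var]) for var in remaining}
        best_var = min(errors, key=errors.get)
        if errors[best_var] >= best_error:
            break  # no remaining variable improves validation error
        selected.append(best_var)
        remaining.remove(best_var)
        best_error = errors[best_var]
    return selected, best_error


def toy_error(variables):
    """Hypothetical stand-in for a real model fit: each variable shifts the error."""
    effects = {"LASTGIFT": -3.0, "DOB": -2.0, "MAILCODE": 1.0}
    return 10.0 + sum(effects[v] for v in variables)


selected, err = forward_select(["MAILCODE", "DOB", "LASTGIFT"], toy_error)
print(selected, err)
```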

  34. Regression Right click on the Regression node and select Run

  35. Regression Regression is highlighted in green while running

  36. Regression Let’s take a look at the results; SAS has a very different view of which variables are important than the R analysis did

  37. Regression The error rate is not that bad, but the significant variables are not necessarily easily interpretable.

  38. Regression Let’s try it again with a few changes to the model selection

  39. Regression Again, we get results, but nothing easily interpretable.

  40. Regression Let’s limit the regression to the variables that R found to be significant. To do this, we will again right click on Regression and select Open.

  41. Regression Then go to the Variables tab. Right click in the Status column for each unneeded variable and set its status to “don’t use”.

  42. Regression In addition to limiting our variables to those from the R results, we are going to add an interaction as well as a squared variable. The first step is to add the squared term: add a Transform Variables node, right click on the node, and select Open.

  43. Regression From the Variables tab, right click on DOB and select Transform.

  44. Regression We will now select square. This will create a new variable, DOB_L1S6, which will then be used in our next regression.
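Adding the square of DOB lets an otherwise linear regression bend. With both DOB and its square as inputs, the fitted relationship is a parabola in DOB, so it can rise and then fall, or the reverse. Below is a minimal sketch of the transform that reuses the DOB_L1S6 name from the slide. The DOB values are illustrative.

```python
def add_squared(rows, column, new_name):
    """Return copies of rows with new_name = column squared (None stays None)."""
    out = []
    for row in rows:
        new = dict(row)
        value = row[column]
        new[new_name] = None if value is None else value ** 2
        out.append(new)
    return out


# Illustrative values only.
rows = [{"DOB": 3712}, {"DOB": None}]
squared = add_squared(rows, "DOB", "DOB_L1S6")
print(squared)
```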

  45. Regression Our next step is to create an interaction. To do this, go back to the main diagram and double click on regression. This should bring you into the model manager where you will click on the Interaction Builder icon.

  46. Regression On this screen, hold down the Ctrl key to highlight both Lastgift and Pepstrfl. Next, press the Cross button to create the new interaction variable. The new variable should be added to the available terms window and should be used in subsequent regressions.
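The Cross button builds a product term. If Pepstrfl is a character flag (an assumption here), the interaction amounts to multiplying Lastgift by a 0/1 dummy for one level of Pepstrfl. The level "X", the column spellings, and the sample rows are hypothetical.

```python
def add_interaction(rows, numeric, character, level):
    """Return (new_rows, name): new_rows carry numeric * indicator(character == level)."""
    name = f"{numeric}_{character}_{level}"
    out = []
    for row in rows:
        new = dict(row)
        indicator = 1 if row[character] == level else 0  # dummy coding of one level
        new[name] = row[numeric] * indicator
        out.append(new)
    return out, name


rows = [
    {"LASTGIFT": 20.0, "PEPSTRFL": "X"},
    {"LASTGIFT": 10.0, "PEPSTRFL": " "},
]
crossed, term = add_interaction(rows, "LASTGIFT", "PEPSTRFL", "X")
print(term, [r[term] for r in crossed])
```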

  47. Regression Results! While the initial bar graph may look complex, this reflects how SAS handles character data: by creating dummy variables.

  48. Regression Now, looking at the table of coefficient estimates, we have interpretable results!

  49. Regression For those who are interested, the Code tab shows the actual SAS code you would have to write to program this regression manually.

  50. Regression Let’s add another level of analysis and try to rid the data of outliers. To do this, you will need to insert a Filter Outlier node between the Transform Variables and Regression nodes.
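One common outlier rule drops values lying more than k standard deviations from the mean. The Python sketch below applies that rule to a single variable. It is not necessarily the criterion the Filter Outlier node uses, which has its own settings. The choice k = 3 and the sample values are illustrative.

```python
import statistics


def filter_outliers(values, k=3.0):
    """Keep values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)  # no spread, so nothing can be an outlier
    return [v for v in values if abs(v - mean) <= k * sd]


# Illustrative values: nineteen ordinary gifts and one extreme one.
gifts = [10.0] * 19 + [100.0]
print(filter_outliers(gifts))
```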

More Related