
Data Mining


kuri

Presentation Transcript


  1. Data Mining

  2. SAS Enterprise Miner • User: sasdemo1, sasdemo2, … , sasdemo24 • Password: aboi0rajee • Server: asas2 • Data: use your SGH login as the project name

  3. Process flow diagram • The data mining process is driven by a process flow diagram that you create by dragging nodes from a toolbar, which is organized by SEMMA categories (Sample, Explore, Modify, Model, Assess), and dropping them onto a diagram workspace.

  4. The SAS EM Graphical User Interface (screenshot callouts): (1) Toolbar shortcut buttons, (2) Project Panel, (3) Properties Panel, (4) Property Help Panel, (5) Toolbar, (6) Diagram Workspace, (7) Diagram Navigation Toolbar

  5. The SAS EM Graphical User Interface
      • Toolbar Shortcut Buttons: perform common computer functions and frequently used SAS Enterprise Miner operations. Move the mouse pointer over any shortcut button to see its name; click a shortcut button to use it.
      • Project Panel: manage and view data sources, diagrams, results, and project users.
      • Properties Panel: view and edit the settings of data sources, diagrams, nodes, and users.
      • Property Help Panel: displays a short description of any property that you select in the Properties Panel. Extended help is available from the Help main menu.

  6. The SAS EM Graphical User Interface
      • Toolbar: a graphical set of node icons that you use to build process flow diagrams in the Diagram Workspace. To use a node, drag its icon into the Diagram Workspace. The icon remains in place in the Toolbar, and the node in the Diagram Workspace is ready to be connected and configured for use in the process flow diagram.
      • Diagram Workspace: build, edit, run, and save process flow diagrams. In this workspace, you graphically build, order, sequence, and connect the nodes that you use to mine your data and generate reports.
      • Diagram Navigation Toolbar: organize and navigate the process flow diagram.
      Documentation: http://support.sas.com/documentation/onlinedoc/miner/

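  7. ROC Curves
      [Figure: ROC curve with a tangent line, labeled "Tangent to the ROC curve"]
      Each cutoff $s$ gives the point $\bigl(F(s \mid Y=0),\ F(s \mid Y=1)\bigr)$, where
      • $F(s \mid Y=1)$ = Sensitivity
      • $F(s \mid Y=0) = 1 - \text{Specificity}$
      The curve visualizes the "separation" of the conditional distributions of the score $S$ given $Y=1$ and given $Y=0$. The area under the ROC curve can be treated as a measure of stochastic dependence between $S$ and $Y$.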
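The dependence claim can be made precise. As a sketch, assume continuous conditional distributions and let $S^{(1)} \sim F(\cdot \mid Y=1)$ and $S^{(0)} \sim F(\cdot \mid Y=0)$ be independent draws (notation introduced here for illustration, not from the slides). The area under the curve traced by $\bigl(F(s \mid Y=0), F(s \mid Y=1)\bigr)$ is then:

```latex
\mathrm{AUC} = \int_{-\infty}^{\infty} F(s \mid Y=1)\,\mathrm{d}F(s \mid Y=0)
             = \mathbb{E}\bigl[F(S^{(0)} \mid Y=1)\bigr]
             = P\bigl(S^{(1)} \le S^{(0)}\bigr)
```

If $S$ and $Y$ are independent, the two conditional distributions coincide and $\mathrm{AUC} = 1/2$; values away from $1/2$ indicate that the score carries information about $Y$.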

  8. Classification Error

  9. ROC Curve

  10. Classification Errors

  11. Classification Errors
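A standard way to write the classification error for a single cutoff $s$, sketched under the convention implied by the ROC slide (an observation is classified as $Y=1$ when $S \le s$), with class shares $\pi_1 = P(Y=1)$, $\pi_0 = P(Y=0)$, conditional densities $f(\cdot \mid Y)$, and equal misclassification costs (all assumptions introduced here):

```latex
\begin{aligned}
\mathrm{Err}(s) &= \pi_1\bigl(1 - F(s \mid Y=1)\bigr) + \pi_0\,F(s \mid Y=0) \\
                &= \pi_1\,(1 - \text{Sensitivity}) + \pi_0\,(1 - \text{Specificity}) \\[4pt]
\frac{\mathrm{d}\,\mathrm{Err}(s)}{\mathrm{d}s} &= -\pi_1 f(s \mid Y=1) + \pi_0 f(s \mid Y=0) = 0
\;\Longrightarrow\;
\frac{f(s \mid Y=1)}{f(s \mid Y=0)} = \frac{\pi_0}{\pi_1}
\end{aligned}
```

Since the ROC curve is traced by $\bigl(F(s \mid Y=0), F(s \mid Y=1)\bigr)$, the ratio $f(s \mid Y=1)/f(s \mid Y=0)$ is its slope, so an interior error-minimizing cutoff is where the tangent to the ROC curve has slope $\pi_0/\pi_1$. With costs $c_1$ (misclassifying a $Y=1$ case) and $c_0$ (misclassifying a $Y=0$ case), the same steps give slope $c_0\pi_0/(c_1\pi_1)$.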

  12. SAS/BASE & SAS/STAT

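  13. PROC step
      /* General form: libname data "path"; */
      libname data "C:\Users\Andrzej\Desktop";

      /* DESC (DESCENDING) models the probability of the higher response level.
         OUTROC= writes the ROC points (_SENSIT_, _1MSPEC_, ...) to data set ROC. */
      proc logistic data=data.German_credit desc;
         model default = duration credit_amt instalment age / outroc=roc;
      run;

      /* Plot sensitivity against 1 - specificity */
      proc gplot data=roc;
         title "ROC Curve";
         symbol i=join;
         plot _sensit_ * _1mspec_;
      run;
      quit;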
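PROC LOGISTIC already reports the area under this curve as the c statistic in its "Association of Predicted Probabilities and Observed Responses" table. As a cross-check, the area can be recomputed from the OUTROC= points with the trapezoidal rule. The sketch below runs on a small hypothetical data set, `roc_demo`, standing in for `roc`; the names `roc_demo`, `roc_sorted` and `auc` are illustrative only, and small differences from the printed c statistic are possible because PROC LOGISTIC may bin predicted probabilities.

```sas
/* Hypothetical toy ROC points standing in for the ROC data set from OUTROC= */
data roc_demo;
   input _1mspec_ _sensit_;
   datalines;
0    0
0    0.5
0.5  1
1    1
;
run;

/* Order the points along the curve: by 1 - specificity, then by sensitivity */
proc sort data=roc_demo out=roc_sorted;
   by _1mspec_ _sensit_;
run;

/* Trapezoidal rule: AUC = sum of (x_i - x_{i-1}) * (y_i + y_{i-1}) / 2 */
data auc(keep=auc);
   set roc_sorted end=last;
   retain prev_x 0 prev_y 0;                            /* curve starts at (0,0) */
   auc + (_1mspec_ - prev_x) * (_sensit_ + prev_y) / 2; /* sum statement: retained, starts at 0 */
   prev_x = _1mspec_;
   prev_y = _sensit_;
   if last then do;
      auc + (1 - prev_x) * (1 + prev_y) / 2;            /* close the curve at (1,1) if absent */
      output;
   end;
run;

proc print data=auc noobs;
run;
```

On the toy points the result is 0.875; to use the real curve, replace `roc_demo` with `roc` in the PROC SORT step.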

  14. Dimension Reduction – PROC VARCLUS
      The VARCLUS procedure divides a set of numeric variables into disjoint or hierarchical clusters. Associated with each cluster is a linear combination of the variables in the cluster. This linear combination can be either the first principal component (the default) or the centroid component (if you specify the CENTROID option).

      /* Cluster the predictors; OUTTREE= saves the cluster hierarchy */
      proc varclus data=data.German_credit outtree=Tree maxclusters=10 noprint;
         var duration credit_amt instalment age;
      run;

      /* Tree diagram: graphics version, then line-printer version */
      proc tree data=Tree;
      run;
      proc tree data=Tree lineprinter;
      run;

      /* Horizontal tree with the proportion of variance explained as height */
      axis1 order=(0 to 1 by 0.2);
      proc tree data=Tree horizontal haxis=axis1;
         height _PROPOR_;
      run;
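When VARCLUS is run without NOPRINT, its R-squared table reports for each variable the $R^2$ with its own cluster, the $R^2$ with the next closest cluster, and a "1-R**2 Ratio". A common heuristic for dimension reduction (an assumption here, not stated in the slides) keeps, in each cluster, the variable with the smallest ratio as the cluster's representative:

```latex
\text{1-}R^2\ \text{ratio} = \frac{1 - R^2_{\text{own cluster}}}{1 - R^2_{\text{next closest cluster}}}
```

A small ratio means the variable is well represented by its own cluster component and only weakly related to the next closest one.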

  15. Variables in the OUTTREE= data set
      • _NAME_: the name of the cluster
      • _PARENT_: the parent of the cluster
      • _NCL_: the number of clusters
      • _VAREXP_: the amount of variance explained by the cluster
      • _PROPOR_: the proportion of variance explained by the clusters at the current level of the tree diagram
      • _MINPRO_: the minimum proportion of variance explained by a cluster
      • _MAXEIGEN_: the maximum second eigenvalue of a cluster
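For a level of the tree with $k$ clusters, `_PROPOR_` and `_MAXEIGEN_` can be read off the clusters' eigenvalues. A sketch, assuming the default analysis of the correlation matrix (so the $p$ standardized variables have total variance $p$) and the default first-principal-component option, with $\lambda_1^{(j)}$ and $\lambda_2^{(j)}$ the largest and second-largest eigenvalues of the correlation matrix of the variables in cluster $j$ (notation introduced here); under these assumptions $\lambda_1^{(j)}$ is the variance explained by cluster $j$:

```latex
\begin{aligned}
\texttt{\_PROPOR\_} &= \frac{1}{p}\sum_{j=1}^{k} \lambda_1^{(j)}, \\
\texttt{\_MAXEIGEN\_} &= \max_{1 \le j \le k} \lambda_2^{(j)}
\end{aligned}
```

For example, with hypothetical values: 4 variables in 2 clusters whose components explain 1.6 and 1.2 units of variance give $\texttt{\_PROPOR\_} = 2.8/4 = 0.7$.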
