1 / 27

Using SQL Server Data Mining

Using SQL Server Data Mining. The Main Tools. SQL Server 2012 RDBMS to store data and support DMX Visual Studio 2010/SSDT Provides and IDE The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM. SSDT. It is part of VS2010

israel
Télécharger la présentation

Using SQL Server Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using SQL Server Data Mining

  2. The Main Tools • SQL Server 2012 • RDBMS to store data and support DMX • Visual Studio 2010/SSDT • Provides and IDE • The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM

  3. SSDT • It is part of VS2010 • You can either start it from SQL Server 2012 or VS 2010 • To start from SQL Server 2012 • Start  all programs  SQL Server 2012  SQL Server Data Tool • What you will see will be VS2010 IDE • The rest is very much the same as doing cubes

  4. Preparing Data Source • We use a relational DBs as the data source • For example, we restore the MovieClick or CollegePlans databases from its backup or import from MS Access

  5. Specify your Data Sources • Make sure the Solution Explorer is visible • If not, go to View  Solution Explorer or CTL +W,S • Right Click on Data Sources to bring up the wizard • Skip into • Define connection • Provide • server name • Authentication mode • Then select the DB • If you cannot see the DB you are expecting, the first two steps are incorrect • If the connections are already defined, select one • Note: This is to your Relational DB

  6. Impersonation Information • Impersonate Current User • Most security • Causing problems when delegation is needed • Impersonate Account*** • Make everyone’s life easier – my first choice, most of the cases • Impersonate Service Account • Not recommended • Inherit • May be, never tried

  7. The IDE

  8. DSV

  9. DSV (2) • This is where the modeling begin • Defines how you want to see the data at the data source • Here we define case table, nested case tables, and other lookup tables (also called dimension tables)

  10. View Data • Table • Pivot Table • Chart • Pivot Chart

  11. Creating Structure • Right click on Mining structure • Create a structure with a mode or not

  12. Successfully deployed • To deploy Build  Deploy MoviceClick

  13. Decision Tree – try one

  14. Decision Tree – 2nd Try

  15. Tree Dependency Network

  16. Mining Model Viewer • Mining Structure • Manages columns • Mining Models • Add new models or set model parameters • Filter the rows • Model view • Second level tabs are model dependent

  17. Lift Chart – one bedroom

  18. Lift Chart – a better picture

  19. More on lift Chart • Lift Chart • One line for each model • A random line • Ideal line • Lift is a measure of the effectiveness of a predictive model • You can also perform profit calculation

  20. Profit Chart

  21. Classification Matrix

  22. Other Charts – Scatter Plot • Scatter Plot • A scatter plot is generated instead of a lift chart whenever the predictable attributes has continuous values. • charts the accuracy of a model that predicts a continuous attribute, comparing the actual values versus the predicted values for each case.

  23. Cross-Validation • It uses the training data • Divide in to n folders • Each folder uses data in other n -1 folders for training and uses the current folder data for test • You can check • How good the results of predicting are • If the results are uniform across the folders

  24. Cross-Validation

  25. Mining Model Prediction • Allows you to create and run DMX queries • You can use the GUI (Design View) to create queries • You can use the Query Editor to modify queries • View the result on the result view

  26. Query Example [MCNT-Tree].[Age] = t.[Age] AND [MCNT-Tree].[Education Level] = t.[Education Level] AND [MCNT-Tree].[Gender] = t.[Gender] AND [MCNT-Tree].[Home Ownership] = t.[Home Ownership] AND [MCNT-Tree].[Internet Connection] = t.[Internet Connection] AND [MCNT-Tree].[Marital Status] = t.[Marital Status] AND [MCNT-Tree].[Movie Selector] = t.[Movie Selector] AND [MCNT-Tree].[Num Bathrooms] = t.[Num Bathrooms] AND [MCNT-Tree].[Num Cars] = t.[Num Cars] AND [MCNT-Tree].[Num Children] = t.[Num Children] AND [MCNT-Tree].[Num T Vs] = t.[Num TVs] AND [MCNT-Tree].[PPV Freq] = t.[PPV Freq] AND [MCNT-Tree].[Prerec Buying Freq] = t.[Prerec Buying Freq] AND [MCNT-Tree].[Prerec Format] = t.[Prerec Format] AND [MCNT-Tree].[Prerec Renting Freq] = t.[Prerec Renting Freq] AND [MCNT-Tree].[Prerec Viewing Freq] = t.[Prerec Viewing Freq] AND [MCNT-Tree].[Theater Freq] = t.[Theater Freq] AND [MCNT-Tree].[TV Movie Freq] = t.[TV Movie Freq] AND [MCNT-Tree].[TV Signal] = t.[TV Signal] Order by (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) SELECT t.[CustomerID], (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) as [One], (PredictProbability([MCNT-Tree].[Bedrooms], 'Two or three')) as [two] From [MCNT-Tree] PREDICTION JOIN OPENQUERY([MCNT], 'SELECT [CustomerID], [Num Bedrooms], [Age], [Education Level], [Gender], [Home Ownership], [Internet Connection], [Marital Status], [Movie Selector], [Num Bathrooms], [Num Cars], [Num Children], [Num TVs], [PPV Freq], [Prerec Buying Freq], [Prerec Format], [Prerec Renting Freq], [Prerec Viewing Freq], [Theater Freq], [TV Movie Freq], [TV Signal] FROM [dbo].[Customers] ') AS t

  27. Query Results

More Related