Using SQL Server Data Mining
280 likes | 498 Vues
Using SQL Server Data Mining. The Main Tools. SQL Server 2012 RDBMS to store data and support DMX Visual Studio 2010/SSDT Provides and IDE The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM. SSDT. It is part of VS2010
Using SQL Server Data Mining
E N D
Presentation Transcript
The Main Tools • SQL Server 2012 • RDBMS to store data and support DMX • Visual Studio 2010/SSDT • Provides and IDE • The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM
SSDT • It is part of VS2010 • You can either start it from SQL Server 2012 or VS 2010 • To start from SQL Server 2012 • Start all programs SQL Server 2012 SQL Server Data Tool • What you will see will be VS2010 IDE • The rest is very much the same as doing cubes
Preparing Data Source • We use a relational DBs as the data source • For example, we restore the MovieClick or CollegePlans databases from its backup or import from MS Access
Specify your Data Sources • Make sure the Solution Explorer is visible • If not, go to View Solution Explorer or CTL +W,S • Right Click on Data Sources to bring up the wizard • Skip into • Define connection • Provide • server name • Authentication mode • Then select the DB • If you cannot see the DB you are expecting, the first two steps are incorrect • If the connections are already defined, select one • Note: This is to your Relational DB
Impersonation Information • Impersonate Current User • Most security • Causing problems when delegation is needed • Impersonate Account*** • Make everyone’s life easier – my first choice, most of the cases • Impersonate Service Account • Not recommended • Inherit • May be, never tried
DSV (2) • This is where the modeling begin • Defines how you want to see the data at the data source • Here we define case table, nested case tables, and other lookup tables (also called dimension tables)
View Data • Table • Pivot Table • Chart • Pivot Chart
Creating Structure • Right click on Mining structure • Create a structure with a mode or not
Successfully deployed • To deploy Build Deploy MoviceClick
Mining Model Viewer • Mining Structure • Manages columns • Mining Models • Add new models or set model parameters • Filter the rows • Model view • Second level tabs are model dependent
More on lift Chart • Lift Chart • One line for each model • A random line • Ideal line • Lift is a measure of the effectiveness of a predictive model • You can also perform profit calculation
Other Charts – Scatter Plot • Scatter Plot • A scatter plot is generated instead of a lift chart whenever the predictable attributes has continuous values. • charts the accuracy of a model that predicts a continuous attribute, comparing the actual values versus the predicted values for each case.
Cross-Validation • It uses the training data • Divide in to n folders • Each folder uses data in other n -1 folders for training and uses the current folder data for test • You can check • How good the results of predicting are • If the results are uniform across the folders
Mining Model Prediction • Allows you to create and run DMX queries • You can use the GUI (Design View) to create queries • You can use the Query Editor to modify queries • View the result on the result view
Query Example [MCNT-Tree].[Age] = t.[Age] AND [MCNT-Tree].[Education Level] = t.[Education Level] AND [MCNT-Tree].[Gender] = t.[Gender] AND [MCNT-Tree].[Home Ownership] = t.[Home Ownership] AND [MCNT-Tree].[Internet Connection] = t.[Internet Connection] AND [MCNT-Tree].[Marital Status] = t.[Marital Status] AND [MCNT-Tree].[Movie Selector] = t.[Movie Selector] AND [MCNT-Tree].[Num Bathrooms] = t.[Num Bathrooms] AND [MCNT-Tree].[Num Cars] = t.[Num Cars] AND [MCNT-Tree].[Num Children] = t.[Num Children] AND [MCNT-Tree].[Num T Vs] = t.[Num TVs] AND [MCNT-Tree].[PPV Freq] = t.[PPV Freq] AND [MCNT-Tree].[Prerec Buying Freq] = t.[Prerec Buying Freq] AND [MCNT-Tree].[Prerec Format] = t.[Prerec Format] AND [MCNT-Tree].[Prerec Renting Freq] = t.[Prerec Renting Freq] AND [MCNT-Tree].[Prerec Viewing Freq] = t.[Prerec Viewing Freq] AND [MCNT-Tree].[Theater Freq] = t.[Theater Freq] AND [MCNT-Tree].[TV Movie Freq] = t.[TV Movie Freq] AND [MCNT-Tree].[TV Signal] = t.[TV Signal] Order by (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) SELECT t.[CustomerID], (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) as [One], (PredictProbability([MCNT-Tree].[Bedrooms], 'Two or three')) as [two] From [MCNT-Tree] PREDICTION JOIN OPENQUERY([MCNT], 'SELECT [CustomerID], [Num Bedrooms], [Age], [Education Level], [Gender], [Home Ownership], [Internet Connection], [Marital Status], [Movie Selector], [Num Bathrooms], [Num Cars], [Num Children], [Num TVs], [PPV Freq], [Prerec Buying Freq], [Prerec Format], [Prerec Renting Freq], [Prerec Viewing Freq], [Theater Freq], [TV Movie Freq], [TV Signal] FROM [dbo].[Customers] ') AS t