Using SQL Server Data Mining

Using SQL Server Data Mining

The Main Tools • SQL Server 2012 • RDBMS to store data and support DMX • Visual Studio 2010/SSDT • Provides and IDE • The direction MS made to use VS for all its development activity tasks, including DB, certainly include DW and DM

SSDT • It is part of VS2010 • You can either start it from SQL Server 2012 or VS 2010 • To start from SQL Server 2012 • Start  all programs  SQL Server 2012  SQL Server Data Tool • What you will see will be VS2010 IDE • The rest is very much the same as doing cubes

Preparing Data Source • We use a relational DBs as the data source • For example, we restore the MovieClick or CollegePlans databases from its backup or import from MS Access

Specify your Data Sources • Make sure the Solution Explorer is visible • If not, go to View  Solution Explorer or CTL +W,S • Right Click on Data Sources to bring up the wizard • Skip into • Define connection • Provide • server name • Authentication mode • Then select the DB • If you cannot see the DB you are expecting, the first two steps are incorrect • If the connections are already defined, select one • Note: This is to your Relational DB

Impersonation Information • Impersonate Current User • Most security • Causing problems when delegation is needed • Impersonate Account*** • Make everyone’s life easier – my first choice, most of the cases • Impersonate Service Account • Not recommended • Inherit • May be, never tried

The IDE

DSV

DSV (2) • This is where the modeling begin • Defines how you want to see the data at the data source • Here we define case table, nested case tables, and other lookup tables (also called dimension tables)

View Data • Table • Pivot Table • Chart • Pivot Chart

Creating Structure • Right click on Mining structure • Create a structure with a mode or not

Successfully deployed • To deploy Build  Deploy MoviceClick

Decision Tree – try one

Decision Tree – 2nd Try

Tree Dependency Network

Mining Model Viewer • Mining Structure • Manages columns • Mining Models • Add new models or set model parameters • Filter the rows • Model view • Second level tabs are model dependent

Lift Chart – one bedroom

Lift Chart – a better picture

More on lift Chart • Lift Chart • One line for each model • A random line • Ideal line • Lift is a measure of the effectiveness of a predictive model • You can also perform profit calculation

Profit Chart

Classification Matrix

Other Charts – Scatter Plot • Scatter Plot • A scatter plot is generated instead of a lift chart whenever the predictable attributes has continuous values. • charts the accuracy of a model that predicts a continuous attribute, comparing the actual values versus the predicted values for each case.

Cross-Validation • It uses the training data • Divide in to n folders • Each folder uses data in other n -1 folders for training and uses the current folder data for test • You can check • How good the results of predicting are • If the results are uniform across the folders

Cross-Validation

Mining Model Prediction • Allows you to create and run DMX queries • You can use the GUI (Design View) to create queries • You can use the Query Editor to modify queries • View the result on the result view

Query Example [MCNT-Tree].[Age] = t.[Age] AND [MCNT-Tree].[Education Level] = t.[Education Level] AND [MCNT-Tree].[Gender] = t.[Gender] AND [MCNT-Tree].[Home Ownership] = t.[Home Ownership] AND [MCNT-Tree].[Internet Connection] = t.[Internet Connection] AND [MCNT-Tree].[Marital Status] = t.[Marital Status] AND [MCNT-Tree].[Movie Selector] = t.[Movie Selector] AND [MCNT-Tree].[Num Bathrooms] = t.[Num Bathrooms] AND [MCNT-Tree].[Num Cars] = t.[Num Cars] AND [MCNT-Tree].[Num Children] = t.[Num Children] AND [MCNT-Tree].[Num T Vs] = t.[Num TVs] AND [MCNT-Tree].[PPV Freq] = t.[PPV Freq] AND [MCNT-Tree].[Prerec Buying Freq] = t.[Prerec Buying Freq] AND [MCNT-Tree].[Prerec Format] = t.[Prerec Format] AND [MCNT-Tree].[Prerec Renting Freq] = t.[Prerec Renting Freq] AND [MCNT-Tree].[Prerec Viewing Freq] = t.[Prerec Viewing Freq] AND [MCNT-Tree].[Theater Freq] = t.[Theater Freq] AND [MCNT-Tree].[TV Movie Freq] = t.[TV Movie Freq] AND [MCNT-Tree].[TV Signal] = t.[TV Signal] Order by (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) SELECT t.[CustomerID], (PredictProbability([MCNT-Tree].[Bedrooms], 'One')) as [One], (PredictProbability([MCNT-Tree].[Bedrooms], 'Two or three')) as [two] From [MCNT-Tree] PREDICTION JOIN OPENQUERY([MCNT], 'SELECT [CustomerID], [Num Bedrooms], [Age], [Education Level], [Gender], [Home Ownership], [Internet Connection], [Marital Status], [Movie Selector], [Num Bathrooms], [Num Cars], [Num Children], [Num TVs], [PPV Freq], [Prerec Buying Freq], [Prerec Format], [Prerec Renting Freq], [Prerec Viewing Freq], [Theater Freq], [TV Movie Freq], [TV Signal] FROM [dbo].[Customers] ') AS t

Query Results

Using SQL Server Data Mining

Using SQL Server Data Mining

Presentation Transcript

Capabilities Apollo and SQL Server Data Mining

SQL Server Data Quality Services

DAT377 Data Mining In SQL Server 2000 And SQL Server 2005 (Code Named “Yukon”)

SQL Server Data Quality Services

Data Mining in SQL Server 2008

SQL Server Data Mining

SQL Server Data Types Choosing and Using

DAT205 Advanced Data Mining Using SQL Server 2000

Data Mining in SQL Server 2000 and Yukon

SQL SERVER

Introduction to SQL Server Data Mining

Extending SQL Server Data Mining

Data Partitioning – SQL Server Satya

SQL Server 2000 Data Types

SQL Server Change Data Capture