
More value from data using Data Mining



Presentation Transcript


  1. More value from data using Data Mining • Allan Mitchell, SQL Server MVP

  2. Who am I • SQL Server MVP • SQL Server Consultant • Joint author of the Wrox Professional SSIS book • Worked with SQL Server since version 6.5 • www.SQLDTS.com and www.SQLIS.com • Partner of SQL Know How

  3. Today’s Schedule • What is data mining (overview) • Data mining terminology • Myths around data mining • Excel add-in for Office 2007 • Demo: setup • Demo: key influencers • Demo: categories • Demo: make a prediction • Demo: “other stuff”, if time • Questions and answers

  4. What is Data Mining • “The process of using statistical techniques to discover subtle relationships between data items, and the construction of predictive models based on them. The process is not the same as just using an OLAP tool to find exceptional items. Generally, data mining is a very different and more specialist application than OLAP, and uses different tools from different vendors. Normally the users are different, too. OLAP vendors have had little success with their data mining efforts.” (Source: OLAP Report)

  5. What does Data Mining Do? • Explores your data • Finds patterns • Performs predictions

  6. Comparative Benefits: Predictive Projects versus Nonpredictive Projects (chart; source: IDC, 2003)

  7. Data Mining terminology • mining structure • mining model • mining algorithm • training dataset • testing dataset
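
The training and testing datasets named on this slide can be illustrated with a small Python sketch. It is not how Analysis Services partitions data; the `split_dataset` helper, the 30% holdout, the fixed seed, and the sample rows are all assumptions made for the example.

```python
import random

def split_dataset(rows, test_fraction=0.3, seed=42):
    """Shuffle rows and split them into (training, testing) lists.

    Hypothetical helper: the 30% holdout and fixed seed are illustrative choices.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up customer rows: (customer_id, income, risk)
customers = [(i, 20000 + 1000 * i, "High" if i % 3 == 0 else "Low") for i in range(10)]
training, testing = split_dataset(customers)
```

The model is trained on `training` only; `testing` is held back so its predictions can be checked against rows the model has never seen.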

  8. SQL Server 2005 Algorithms • Decision Trees • Clustering • Time Series • Naïve Bayes • Sequence Clustering • Association • Neural Net • Plus: Linear and Logistic Regression

  9. Sequence Clustering • Applied to: click stream analysis, customer segmentation with sequence data, sequence prediction • Mix of clustering and sequence technologies • Groups individuals based on their profiles, including sequence data

  10. Time Series • Applied to: sales forecasting, web hits prediction, stock value estimation • Patented technique from Microsoft Research • Uses regression tree technology to describe and predict series values
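
The idea of predicting a series value from earlier values can be sketched with a first-order autoregression. This is deliberately a plain least-squares line, not the regression-tree technique from Microsoft Research that the slide refers to; `fit_ar1`, `forecast`, and the sales figures are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    xs, ys = series[:-1], series[1:]       # previous value -> next value
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("series is constant; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def forecast(series, steps):
    """Roll the fitted model forward, feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions

# Made-up monthly sales that grow by 10% each month
monthly_sales = [100.0, 110.0, 121.0, 133.1, 146.41]
next_two = forecast(monthly_sales, 2)
```

A regression tree extends this idea by fitting different linear relationships in different regions of the data rather than one line for the whole series.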

  11. Clustering • Applied to segmentation: customer grouping, mailing campaigns • Also supports classification and regression • Expectation Maximization (probabilistic clustering) • K-Means (distance based) • Clusters both discrete and continuous values • Discrete values are “binarized” • Anomaly detection • Check variable independence • “Predict Only” attributes are not used for clustering
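
The distance-based K-Means option and the “binarized” discrete values can be sketched as follows. This is a textbook K-Means in plain Python, not the Microsoft Clustering implementation, and it does not cover Expectation Maximization. The `binarize` and `kmeans` helpers, the first-k initialisation, the choice to express age in decades, and the sample people are assumptions.

```python
import math

def binarize(value, categories):
    """One-hot encode a discrete value, e.g. "Female" -> [1.0, 0.0]."""
    return [1.0 if value == c else 0.0 for c in categories]

def kmeans(points, k, iterations=20):
    """Basic K-Means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]   # naive initialisation: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid by Euclidean distance
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                         # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments

GENDERS = ["Female", "Male"]
# Made-up cases: (age in years, gender)
people = [(8, "Male"), (35, "Female"), (10, "Female"),
          (40, "Male"), (7, "Male"), (38, "Female")]
# Age is expressed in decades so it is on a scale comparable to the 0/1 gender flags
points = [[age / 10.0] + binarize(gender, GENDERS) for age, gender in people]
centroids, assignments = kmeans(points, k=2)
```

With these sample values the two clusters separate the children from the adults. How the continuous column is scaled matters: scale age down far enough and the gender flags dominate the distance instead.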

  12. Clustering: Discrete (diagram labels: Age, Female, Male, Son, Daughter, Parent)

  13. Clustering: Anomaly Detection (diagram labels: Age, Female, Male, Son, Daughter, Parent)
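
One common way to use clusters for anomaly detection is to flag cases that sit far from every cluster centre. The sketch below assumes that approach; the slide does not say how the Microsoft algorithm scores anomalies. The centroids, the threshold of 2.0, and the sample cases are hypothetical.

```python
import math

def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest cluster centre."""
    return min(math.dist(point, c) for c in centroids)

def find_anomalies(points, centroids, threshold):
    """Return the points that are further than `threshold` from every centroid."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]

# Hypothetical centroids for (age in decades, is_female, is_male):
# one "children" cluster and one "parents" cluster
centroids = [[0.8, 0.5, 0.5], [3.8, 0.5, 0.5]]

cases = [
    [0.7, 0.0, 1.0],   # 7-year-old male: close to the "children" centre
    [4.0, 1.0, 0.0],   # 40-year-old female: close to the "parents" centre
    [9.5, 0.0, 1.0],   # 95-year-old male: far from both centres
]
anomalies = find_anomalies(cases, centroids, threshold=2.0)
```

Only the third case is flagged, because it does not fit either cluster.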

  14. DM data flow (diagram labels: Model Browsing, LOB Application, Reporting, Historical Dataset, Data Transform (SSIS), Prediction, Mining Models, Cube, Cube, New Dataset)

  15. The steps to a successful model (diagram; source: MS BOL)

  16. DMX
      -- Define the model
      CREATE MINING MODEL CreditRisk
        (CustID LONG KEY,
         Gender TEXT DISCRETE,
         Income LONG CONTINUOUS,
         Profession TEXT DISCRETE,
         Risk TEXT DISCRETE PREDICT)
      USING Microsoft_Decision_Trees
      -- Train the model on existing customers
      INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
      SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers
      -- Predict the risk for new customers
      SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
      FROM CreditRisk PREDICTION JOIN NewCustomers
        ON CreditRisk.Gender = NewCustomers.Gender
       AND CreditRisk.Income = NewCustomers.Income
       AND CreditRisk.Profession = NewCustomers.Profession
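
The define, train, predict flow of the DMX on this slide can be mimicked in a few lines of Python to show what a prediction join returns. The stand-in “model” here is only a frequency table over a single input column, not Microsoft_Decision_Trees. The `train` and `predict` helpers and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def train(cases, input_column, target_column):
    """Count target values per input value (a stand-in for model training)."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[input_column]][case[target_column]] += 1
    return counts

def predict(model, new_case, input_column):
    """Return (most likely target value, its probability).

    Returns (None, 0.0) when the input value never appeared in training.
    """
    seen = model.get(new_case[input_column])
    if not seen:
        return None, 0.0
    value, count = seen.most_common(1)[0]
    return value, count / sum(seen.values())

# Made-up training cases, playing the role of the Customers table
customers = [
    {"CustomerID": 1, "Profession": "Engineer", "Risk": "Low"},
    {"CustomerID": 2, "Profession": "Engineer", "Risk": "Low"},
    {"CustomerID": 3, "Profession": "Engineer", "Risk": "High"},
    {"CustomerID": 4, "Profession": "Student", "Risk": "High"},
]
# Made-up unseen cases, playing the role of NewCustomers
new_customers = [
    {"CustomerID": 101, "Profession": "Engineer"},
    {"CustomerID": 102, "Profession": "Student"},
]

model = train(customers, "Profession", "Risk")
# Analogue of the PREDICTION JOIN: match each new case to the model on Profession
predictions = [(c["CustomerID"], *predict(model, c, "Profession")) for c in new_customers]
```

Each result row pairs a new customer with a predicted risk and the probability of that prediction, which is the shape of output the slide's final SELECT asks for.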

  17. Myths around data mining • You have to be a propeller head • It’s a new concept • It only works with SSAS cubes

  18. Excel 2007 DM Add-in • DM visualisation • Table analysis • Create session models or permanent models • Connect to SSAS for full-blown models • Intuitive interface

  19. Demos • Setup • Key influencers • Categories • Make a prediction • Other sexy stuff

  20. Resources • Loads, to be honest (DMX and the API, to name two) • A big subject, but very sexy

  21. Contact Details allan.mitchell@konesans.com
