Data Mining in Industry: Putting Theory into Practice

Data Mining in Industry: Putting Theory into Practice. Bhavani Raskutti. Agenda: What do analysts in industry actually do? Who are our customers & colleagues? What resources do we use? Who uses analytics in Australian industry? Case studies. Take-home points.


Presentation Transcript


  1. Data Mining in Industry: Putting Theory into Practice • Bhavani Raskutti

  2. Agenda • What do analysts in industry actually do? • Who are our customers & colleagues? • What resources do we use? • Who uses analytics in Australian Industry? • Case studies • Take-home Points

  3. What do analysts in industry actually do?
     Business understanding of complex trends, to make strategic & operational decisions
     • Business Problem
     • PD: Problem Definition
     • DAP: Data Acquisition & Preparation (Data Matrix)
     • MM: Mathematical Modelling (Algorithms)
     • P: Presentation
       - Decision-making by users
       - Insights via GUI
     • D: Deployment
       - Automation
       - Training
       - Documentation
       - IT Support
     • Initial Development: iterative, 90% DAP

  4. Who are our customers & colleagues?
     Customers of Analytics
     • Marketing
     • Design
     • Business/Corporate
     • Information Technology
     • Sales
     • Supply Chain
     • Senior Management
     Analytics
     • Market Research: behavioural analysis; psych/mktg/SocSc graduates
     • Business Intelligence: historical reporting; CS/IT graduates
     • Data Mining: statistical analysis, machine learning; Maths/Stats/Science graduates

  5. What resources do we use?
     Data Extraction
     • SQL: from databases such as Oracle, DB2, MySQL, …
     Exploratory/Visualisation
     • Tableau: multi-dimensional visual analysis, with the ability to publish and connectivity to most databases
     • QlikView: very similar to Tableau, a later entrant into Australia
     • Excel: great for exploration, although businesses use it as their only analysis tool
     Statistical Modelling
     • Expensive commercial tools used in the financial & telecommunications industries
       - SAS: industry leader with a broad statistical service offering, but the license is expensive
       - KXEN: recent entrant, but innovative, with a particular focus on large datasets & automation
       - Salford Systems: well-established leader with a focus on regression trees and explainable models
       - SPSS, Statistica, Matlab: niche players appealing to certain communities
     • Open source or low-priced data mining tools
       - Weka is open source software issued under the GNU General Public License.
       - RapidMiner is available under a dual license: the GNU license or a proprietary license.
       - R is a free software environment for statistical computing and graphics. Needs compilation.
     Presentation
     • Cognos, Business Objects, Tableau, …

  6. Who uses analytics in Australian industry? • Government, Utilities, Pharmaceuticals, Manufacturing, Web service providers • Consulting firms, Data mining vendors

  7. Who uses analytics in Australian industry? • Government, Utilities, Pharmaceuticals, Manufacturing, Web service providers, … • Consulting firms, Data mining vendors, Market research firms, …

  8. Case Study: Wholesale Industry
     • Business Problem: Increase wholesale sales into major retailers
     • PD: Perform comparisons & find anomalies with stock issues
       - Quantify demand
       - Define normalised sell-rate
       - Define a long-term in-stock measure
       - Define products & outlets that are similar
     • DAP
       - Weekly SOH (stock on hand) & sales for each store & SKU
       - SKU master
       - Store master
     • MM: Simple univariate regression in SQL
       - Sales  demand
       - Similar products @ similar outlets have similar demand to sales relationship
       - Anomaly may be due to lack of stock
     • P
       - Self-serve report in Cognos for each sales rep
       - Presents list of products with opportunities
       - Opportunities click through to detailed graphs showing demand, sales & stock position of the two products compared
     • D

  9. Case Study: Wholesale Industry (Cont’d) • [Figure: charts for R1 and R2; labels: Sell Rate, In-stock %, Demand]

  10. Case Study: Wholesale Industry (Cont’d)
     • Business Problem: Increase wholesale sales into major retailers
     • PD: Perform comparisons & find anomalies with stock issues
       - Quantify demand
       - Define normalised sell-rate
       - Define a long-term in-stock measure
       - Define products & outlets that are similar
     • DAP
       - Weekly SOH (stock on hand) & sales for each store & SKU
       - SKU master
       - Store master
     • MM: Simple univariate regression in SQL
       - Sales  demand
       - Similar products @ similar outlets have similar demand to sales relationship
       - Anomaly may be due to lack of stock
     • P
       - Self-serve report in Cognos for each sales rep
       - Presents list of products with opportunities
       - Opportunities click through to detailed graphs showing demand, sales & stock position of the two products compared
     • D
       - Implementation in SQL & Cognos
       - Data marts for reports updated weekly
       - Documentation on intranet wiki
       - Training by corporate training team
       - Support from IT helpdesk
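
The slides do not show the regression itself, so the following is only a rough Python sketch of the idea in this case study, not the SQL that was deployed. It assumes that a similar reference product's sales stand in for demand, fits a one-variable line on well-stocked outlets, and flags poorly stocked outlets that sell well below that line. The field names (`ref_sales`, `sales`, `in_stock_pct`), the thresholds and the sample numbers are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_stock_anomalies(rows, well_stocked=0.9, shortfall=0.25):
    """Return (outlet, missed_units) for poorly stocked outlets that sell
    well below the line fitted on well-stocked outlets.

    Each row is a dict with hypothetical keys:
      outlet       - outlet identifier
      ref_sales    - weekly sales of a similar reference product
      sales        - weekly sales of the product being checked
      in_stock_pct - long-term share of weeks with stock on hand (0..1)
    """
    # Fit only where stock was available, so the line reflects unconstrained sales.
    baseline = [r for r in rows if r["in_stock_pct"] >= well_stocked]
    intercept, slope = fit_line(
        [r["ref_sales"] for r in baseline], [r["sales"] for r in baseline]
    )
    flagged = []
    for r in rows:
        if r["in_stock_pct"] >= well_stocked:
            continue
        expected = intercept + slope * r["ref_sales"]
        if r["sales"] < (1 - shortfall) * expected:
            flagged.append((r["outlet"], expected - r["sales"]))
    return flagged


# Made-up numbers for illustration only.
outlets = [
    {"outlet": "A", "ref_sales": 100, "sales": 50, "in_stock_pct": 0.98},
    {"outlet": "B", "ref_sales": 200, "sales": 100, "in_stock_pct": 0.95},
    {"outlet": "C", "ref_sales": 300, "sales": 150, "in_stock_pct": 0.97},
    {"outlet": "D", "ref_sales": 400, "sales": 200, "in_stock_pct": 0.92},
    {"outlet": "E", "ref_sales": 500, "sales": 100, "in_stock_pct": 0.50},
]
print(flag_stock_anomalies(outlets))
```

Fitting on well-stocked outlets only is a design choice of this sketch: it stops the stock-constrained outlets from pulling the line down and hiding their own shortfall.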

  11. Agenda • What do analysts in industry actually do? • Who are our customers & colleagues? • What resources do we use? • Who uses analytics in Australian Industry? • Case studies • Take-home Points

  12. Case Study: Telecommunications Industry
     • Business Problem: Increase revenue from business customers
       - Win-back? Winning back customers is hard
       - Stop churn? Churn is hard to identify and harder to prevent
       - Upsell? Upsell to existing customers increases retention & revenue
     • PD: Create models to predict customers likely to take up a product soon
     • DAP
       - Satisfaction survey
       - Service assurance
       - Demographics
       - Quarterly revenue from different products for each customer
       - Imbalanced data (too few examples of take-up for most products): data aggregation & interleaving
       - Comparable predictors from revenue: raw, change from previous, projected; use values as is & normalised; binarise using 10 equi-size bins
       - [Figure: interleaved windows of quarters, with columns Predictors and Labels. Train: i-5 i-4 i-3 i-2; i-4 i-3 i-2 i-1; i-3 i-2 i-1 i. Prediction: i-1 i i+1 i+2]
     • MM
       - SVMs to score with likelihood of take-up
       - Weighting by value of take-up to find high value take-up
     • P: Excel spreadsheet with potential customer list
       - Take-up likelihood for all modelled products
       - Last quarter revenue for all products
     • D
       - Implementation in Matlab & C
       - Different predictive models for over 50 products in 4 segments
       - Automatic updates every quarter
       - Used by sales consultants to re-negotiate contracts
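
The revenue predictors listed under DAP could be derived along the following lines. This is a minimal Python sketch, not the Matlab & C implementation the slide mentions. Two assumptions are made here that the slide does not confirm: "projected" is taken to be a straight-line extrapolation of the last change, and "equi-size bins" is read as bins holding roughly equal numbers of customers. All function names are hypothetical.

```python
from bisect import bisect_right


def revenue_predictors(quarterly_revenue):
    """Raw, change-from-previous and projected revenue for one product.

    quarterly_revenue lists revenue oldest first and needs at least two
    quarters. The projection simply repeats the last change (an assumption).
    """
    latest, previous = quarterly_revenue[-1], quarterly_revenue[-2]
    change = latest - previous
    return {"raw": latest, "change": change, "projected": latest + change}


def equal_size_bin_edges(values, n_bins=10):
    """Cut points that split the observed values into bins of nearly equal count."""
    ordered = sorted(values)
    return [ordered[k * len(ordered) // n_bins] for k in range(1, n_bins)]


def binarise(value, edges):
    """One-hot vector marking which bin the value falls into."""
    index = bisect_right(edges, value)
    return [1 if i == index else 0 for i in range(len(edges) + 1)]


# Made-up revenue figures for illustration only.
print(revenue_predictors([100.0, 120.0, 150.0]))
edges = equal_size_bin_edges(range(1, 101))
print(binarise(55, edges))
```

Binarising each predictor this way puts revenue from very different products on a comparable 0/1 scale, which is one plausible reading of "comparable predictors" on the slide.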

  13. Case Study: Telecommunications Industry (Cont’d)
     • Evaluation: Piloted predictive modelling in 2 different regions
       - Region 1: 9 new opportunities from just 5 products, with an increase in revenue of ~A$400K
       - Region 2: Opportunities identified were already being processed by sales consultants
     • Conclusion: Predictive modelling is better than the previous manual process
       - Identifies more opportunities
       - Spreads techniques of good sales teams across the whole organisation
     • Deployed in 2004 & still operational
     • For more details, refer to “Predicting Product Purchase Patterns for Corporate Customers” by Bhavani Raskutti & Alan Herschtal in Proceedings of KDD’05, Chicago, Illinois, USA
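
The modelling step in this case study weights take-up likelihood by the value of the take-up. The slides do not give the formula, so the sketch below assumes the simplest reading: multiply each likelihood by an estimated revenue value and rank the results. The function name, the tuple layout and the sample figures are hypothetical.

```python
def rank_opportunities(candidates, top_n=3):
    """Rank (customer, product, likelihood, value) tuples by likelihood * value.

    likelihood is a model score in 0..1 and value an estimated revenue
    from take-up; both are assumed inputs, not outputs of this sketch.
    """
    weighted = [
        (customer, product, likelihood * value)
        for customer, product, likelihood, value in candidates
    ]
    # Highest expected value first.
    return sorted(weighted, key=lambda item: item[2], reverse=True)[:top_n]


# Made-up candidates for illustration only.
candidates = [
    ("C1", "broadband", 0.9, 1000.0),
    ("C2", "mobile", 0.5, 10000.0),
    ("C3", "data", 0.2, 2000.0),
]
for customer, product, expected_value in rank_opportunities(candidates):
    print(customer, product, expected_value)
```

Under this reading, a moderately likely but high-value take-up (C2) outranks a very likely but low-value one (C1), which matches the slide's aim of finding "high value take-up".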

  14. Take-home points
     • Data acquisition & processing phase forms 80-90% of any analytics project
     • Business users are tool agnostic
       - R, SAS, Matlab, SPSS, … for statistical analysis
       - Tableau, Cognos, Excel, VB, … for presentation
     • Business adoption of analytics driven by
       - Utility of application
       - Ease of decision-making from insights
       - Ability to explain insights

  15. Questions?
