
Big Data and Predictive Analytics

Big Data and Predictive Analytics. Unravel the BIG mystery. “In God we trust, all others must bring data”. Antarip Biswas Sept 26th 2013. Agenda / Table of Contents. Introduction to Big Data. Drivers of Big Data Analytics. Data Sciences.

Presentation Transcript


  1. Big Data and Predictive Analytics: Unravel the BIG mystery. “In God we trust, all others must bring data.” Antarip Biswas, Sept 26th 2013

  2. Agenda / Table of Contents
     • Introduction to Big Data
     • Drivers of Big Data Analytics
     • Data Sciences
     • Use Cases and Success Stories – Class 3
     • Social Media Analytics
     • Technical Deep Dive, Real Life Projects
     • Real Life Projects – Class 3

  3. Use Cases and Success Stories CONFIDENTIAL & PROPRIETARY

  4. Success Stories - FareCast
     • Air fare prediction
     • For an airfare found online, predicts whether the fare will go up, go down, or stay the same in the future
     • Acquired by Microsoft for roughly $100M
     • Employed machine learning technologies over big data

  5. Tesco Loyalty Program (done by Dunnhumby)
     Data
     • Data for Loyalty Program
     • Basic demographic information such as address, age, gender, the number of members in a household and their ages, and dietary habits
     • Purchase history appended
     • Summary attributes
     Cluster analysis
     Crucible
     • A massive database of not only applicant information and purchase history, but also information purchased and collected elsewhere about participating consumers. Credit reports, loan applications, magazine subscription lists, the Office for National Statistics, and the Land Registry are all sources of additional information stored in Crucible.

  6. Tesco Loyalty Program - Benefits
     1. Loyalty
     2. Cross-sells
     3. Inventory, distribution and store network planning
     4. Optimal targeting and use of manufacturer promotions
     5. Consumer insight generation and marketing those insights
     Tesco has achieved a 3.6-fold increase in coupon redemption rates by using big-data predictive analytics to predict which consumers are more likely to redeem which coupons!

  7. Big Data – Success Story

  8. Netflix Recommendations
     Existing recommendation system: Cinematch
     KorBell team (winner)
     • 107 algorithms explored
     • Machine learning and data mining
     • Employed singular value decomposition (SVD) and restricted Boltzmann machines (RBM)
     Achieved an 8.43% improvement in recommendations over the existing system

  9. Google Flu Spread Prediction
     Prediction of the spread of flu in real time during the 2009 H1N1 outbreak
     • Google tested a mammoth 450 million different mathematical models against search terms, comparing their predictions with actual flu cases; 45 important parameters were found
     • The model was put to the test when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system

  10. Prediction – High Frequency Trading
     Objective: predict the impact of an earnings announcement on stock prices
     • Use historical financial data to get a time series of quarterly expected and actual earnings announcements
     • Use historical financial data of stock price movements after the announcement
     Approach
     • Categorize stocks by market capitalization so that similar-sized companies are grouped together
     • Split the historical data into an in-sample (training) set and an out-of-sample (validation) set
     • Fit a linear regression model on the in-sample data, where the independent variable (feature) is the difference between the actual and estimated earnings and the dependent variable is the impact on stock price
     Achieved a return of 1%, or 100 basis points
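The approach on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's model: the observations are made-up values for a single hypothetical market-cap bucket, and the helper names (`fit_line`, `predict`, `mean_abs_error`) are assumptions introduced here.

```python
# Minimal sketch: one-feature least-squares regression of price impact on
# earnings surprise, fitted in-sample and checked out-of-sample.
# All numbers are made-up illustrative values, not real market data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Predicted price impact for an earnings surprise x."""
    slope, intercept = model
    return slope * x + intercept

def mean_abs_error(model, xs, ys):
    """Average absolute prediction error over a data set."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical observations for one market-cap bucket:
# (actual EPS minus estimated EPS, % price move after the announcement)
observations = [(-0.10, -2.1), (-0.05, -0.9), (0.00, 0.1), (0.02, 0.5),
                (0.05, 1.1), (0.08, 1.5), (0.10, 2.2), (-0.02, -0.3)]

split = 6  # first six rows are in-sample, the rest out-of-sample
train, valid = observations[:split], observations[split:]

model = fit_line([s for s, _ in train], [m for _, m in train])
error = mean_abs_error(model, [s for s, _ in valid], [m for _, m in valid])
print("slope=%.2f intercept=%.2f out-of-sample MAE=%.2f" % (model[0], model[1], error))
```

In practice one such model would be fitted per market-cap group, and the out-of-sample error decides whether the fitted relationship is worth trading on.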

  11. Predictive Analytics for Couponing
     • Run the same campaign on both lists
     • Test Group: list of households from the analytic engine
     • Control Group: list of households getting the same offer
     • Evaluate impact: Control Group vs. Test Group
     • Measure results: redemption (primary), clips (secondary)
     • Verify efficacy of the household recommendation by demonstrating significant variance from the Control Group
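One common way to check whether the Test Group differs significantly from the Control Group on redemption is a two-proportion z-test. The sketch below assumes that test (the slide does not say which statistic was used), and the household and redemption counts are hypothetical.

```python
import math

def redemption_rate(redeemed, households):
    """Share of households that redeemed the coupon."""
    return redeemed / households

def two_proportion_z(redeemed_test, n_test, redeemed_ctrl, n_ctrl):
    """z statistic for the difference between two redemption rates (pooled variance)."""
    p_test = redeemed_test / n_test
    p_ctrl = redeemed_ctrl / n_ctrl
    pooled = (redeemed_test + redeemed_ctrl) / (n_test + n_ctrl)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    return (p_test - p_ctrl) / std_err

# Hypothetical campaign results (not from the presentation).
test_redeemed, test_households = 540, 10000   # list from the analytic engine
ctrl_redeemed, ctrl_households = 300, 10000   # list getting the same offer

lift = (redemption_rate(test_redeemed, test_households)
        / redemption_rate(ctrl_redeemed, ctrl_households))
z = two_proportion_z(test_redeemed, test_households, ctrl_redeemed, ctrl_households)

# |z| > 1.96 corresponds to significance at the 5% level (two-sided).
print("lift=%.2fx z=%.2f significant=%s" % (lift, z, abs(z) > 1.96))
```

The same comparison can be repeated for clips as the secondary metric.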

  12. Improve Recommendations/Allocations
     Customer deviation in buying behavior refined by customer profile changes
     • Taxonomy-based approach to identify business semantics
     • Major events that determine a change in buying pattern: location change, change in marital status, change in income group, birth of child, …
     • Sources for this information: social channels, purchase deviation, …
     • Identify specific product categories relevant for the major event
     • Association of product categories to various customer classifications
     • For instance, customers with kids buy candies; customers with pets buy pet food
     [Diagram labels: Time Series; Customer Transactions; Association & Clustering; Customer 360; Exploratory techniques; Cluster assignments; Products eligible for recommendation; Refine classifiers; Time specific product and associated prods; Product classification and Customer segment association; Customer groups based on classifications; Products List For target customer’s cluster; Campaign results; Matching / Filtering; Probabilistic product affinities based on segment’s behavior; Personalized Recommendation List; Target Recommendation]

  13. Improve Recommendations/Allocations
     Products bought by similar customers, but not by current customer
     • More accurate identification of similar customers, made possible by extensive profile information
     • Classification of customers by predetermined attributes
     • Usage of exploratory techniques to identify clusters of similar customers
     • Identify product propensity for specific segments
     • Determined by clustering and classification techniques
     [Diagram labels: Customer Transactions; Association & Clustering; Customer 360 - NoSQL; Exploratory techniques; Cluster assignments; Products eligible for recommendation; Refine classifiers; Segment specific Product lists; Customer groups based on classifications; Products List For target customer’s cluster; Campaign results; Matching / Filtering; Probabilistic product affinities based on segment’s behavior; Personalized Recommendation List; Target Recommendation]
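The idea of recommending "products bought by similar customers, but not by current customer" can be illustrated with a small nearest-neighbour sketch. Note the substitution: the slide describes clustering and classification over rich profile data, whereas this example swaps in plain Jaccard similarity on purchase sets to stay short. The customer IDs, baskets, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two purchase sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_from_similar(target, baskets, k=2):
    """Rank products bought by the k most similar customers but not by target."""
    own = baskets[target]
    neighbours = sorted(
        (c for c in baskets if c != target),
        key=lambda c: jaccard(own, baskets[c]),
        reverse=True,
    )[:k]
    scores = {}
    for customer in neighbours:
        weight = jaccard(own, baskets[customer])
        for item in baskets[customer] - own:
            # Items from more similar customers accumulate a higher score.
            scores[item] = scores.get(item, 0.0) + weight
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical purchase histories keyed by customer ID.
baskets = {
    "c1": {"cat food", "litter", "milk"},
    "c2": {"cat food", "litter", "cat grooming tools"},
    "c3": {"wine", "cheese"},
    "c4": {"cat food", "milk", "bread"},
}

print(recommend_from_similar("c1", baskets, k=2))
```

A production system would replace the similarity function with cluster or segment membership, but the final step (subtracting what the customer already buys) stays the same.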

  14. Improve Recommendations/Allocations
     Determine correlated items not bought by current customer
     • Link association to determine products that are bought together: bread and butter, wine and cheese, …
     • Identify products bought by the customer, but not the correlated item
     • Recommendation based on absence of product
     [Diagram labels: Association rules; Customer Transactions; Association & Clustering; Customer 360 - NoSQL; Exploratory techniques; Cluster assignments; Products eligible for recommendation; Refine classifiers; Segment specific product and associated prods; Customer groups based on classifications; Products List For target customer’s cluster; Campaign results; Matching / Filtering; Probabilistic product affinities based on segment’s behavior; Personalized Recommendation List; Target Recommendation]
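A minimal sketch of the association-rule step, limited to item pairs: it computes support and confidence, keeps the rules that clear both thresholds, and then recommends the correlated item a customer has not bought. The transactions, thresholds, and function names are illustrative assumptions, not values from the presentation.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {(lhs, rhs): confidence} for item pairs that are frequent enough."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:      # support: share of baskets with both items
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules

def missing_partners(customer_items, rules):
    """Items implied by a rule whose left side the customer buys but right side not."""
    suggestions = {}
    for (lhs, rhs), confidence in rules.items():
        if lhs in customer_items and rhs not in customer_items:
            suggestions[rhs] = max(confidence, suggestions.get(rhs, 0.0))
    return sorted(suggestions, key=lambda item: (-suggestions[item], item))

# Hypothetical baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
    {"milk"},
]

rules = pair_rules(transactions)
print(missing_partners({"bread", "wine"}, rules))
```

Full association-rule mining (for example Apriori) extends the same support and confidence logic to itemsets larger than pairs.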

  15. Sample technique: identify what customers want, and when
     Cross-tabulated data
     • Salary, Zipcode, No. of kids, House owner, Gender
     • Brand1, Brand2, … Brandn
     • Weight, Size, Volume
     • Brand
     • Category1, Category2, ..
     • Offer clipped category1, …
     Transaction details merged with customer data to provide contextual information as required for inference
     Transaction details for filtered customer list: buyers of cat food / cat food Generic 4 oz
     Affinity models: models generated by the analytic engine from historical data to identify the affinity of specific variables
     Associated variables: single or multiple variables by different segments, using a multi-model approach
     Prediction models: application of variable affinity to the customer list to identify the probability that non-purchasers will purchase cat food / cat food Generic 4 oz
     Customer list by probability
     Techniques: Correlation, Regression, Scattergrams

  16. Contextualize information, correlate facts, predict and improve
     • Information from social channels that provides supporting information to create a detailed customer profile
     • Information from multiple operational and data warehousing systems that contain customer data, purchase details, …
     • Rule sets from a knowledge base accumulated over the years
     [Diagram labels, affinity graph: Carpet cleaners; Affinity; Variety1; Brand 1; Cat grooming tools; Cat owners; Pet owners; Variety2; Cat food; Pet foods; Variety1; Litter box; Brand 1; need; Litter; Variety2]
     [Diagram labels, pipeline: Advanced Analytics - Product association; Filter; Customer list, probability; Buyer of Cat Food / Generic Cat food 4 ounce; Transaction details for this customer list; Filtered high vol. categories; Associated products by affinity + confidence; Inferred rules]

  17. Obama for America Campaign 2012
     • Canvassing from older generation
     • Canvassing from youth

  18. Obama for America Campaign 2012
     • The Obama for America data science team used social media as a tool to efficiently recruit the human resources it needed leading into the election’s home stretch
     • Primary objective: determine who the best messengers were, whom they might be able to persuade, and what actions they might be willing to take
     • Reason to harness social media: a majority of young voters were unreachable by phone calls or neighborhood canvassing, but always connected to some form of social media
     • Optimize resources by transforming voter intelligence into actionable intelligence

  19. Traffic Congestion Control
     • Big Data analytics used for traffic congestion control
     • Enables travellers to plan their routes to their destinations
     • Enables traffic controllers to route cars effectively in order to avoid as much congestion as possible
     • Implemented in LA through a joint initiative of Xerox and the LA transport department

  20. DNA Sequencing and Cancer Therapies
     • Previously, only small portions of a person’s genes were sequenced
     • Big Data technology enables the entire DNA to be sequenced, which is especially helpful for cancer patients
     • Enables selecting therapies based on genetic markers and person-specific genetic makeup
     • If one treatment becomes ineffective due to cancer mutation, different therapies can be used based on other gene markers
     • Steve Jobs was one of the first people in the world to have his entire DNA sequenced

  21. Thank You
