Lecture 10

Lecture 10. MARK2039 Summer 2006 George Brown College Wednesday 9-12. Assignment 8: Geocoding example. Example: A retailer has the following information: Name and address of its customers Address of its stores Stats Can Information

rafael-kim

Presentation Transcript


  1. Lecture 10 MARK2039 Summer 2006 George Brown College Wednesday 9-12

  2. Assignment 8: Geocoding example
     • Example: A retailer has the following information:
       • Name and address of its customers
       • Address of its stores
       • Stats Can information
     • As a marketer, how would you intelligently use this information?
       • Get postal codes of customers and stores
       • Get geocodes (latitude and longitude numbers of each postal code)
       • Calculate distance between each customer and the nearest store
       • Create a trading area around each store to determine relevant customers for that store
       • Identify best stores and calculate demographics of best stores vs. the remaining stores
       • Use the above learning to promote non-performing stores whose customer demographic makeup is similar to that of the best stores
       • Use the above info to determine where to open up or perhaps close stores
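The distance and trading-area steps above can be sketched in a few lines of Python. This is only an illustration, not the course's method: it assumes the haversine (great-circle) formula, a mean Earth radius of 6371 km, and hypothetical store names, coordinates, and a hypothetical 5 km trading-area radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """Return (store_id, distance_km) for the store closest to the customer."""
    return min(
        ((store_id, haversine_km(*customer, *coords)) for store_id, coords in stores.items()),
        key=lambda pair: pair[1],
    )


def in_trading_area(customer, store_coords, radius_km):
    """True if the customer lies within radius_km of the store."""
    return haversine_km(*customer, *store_coords) <= radius_km


# Hypothetical geocodes (latitude, longitude) looked up from postal codes.
stores = {"store_A": (43.65, -79.38), "store_B": (43.77, -79.41)}
customer = (43.66, -79.39)

store_id, dist_km = nearest_store(customer, stores)
print(store_id, round(dist_km, 2), in_trading_area(customer, stores[store_id], 5.0))
```

Customers flagged as inside a store's trading area can then be joined to demographic data to compare the best stores against the rest.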

  3. Assignment 8
     • Why do we look at correlation analysis as our first statistical exercise in the data mining process?
       • It allows us to initially use statistics as a prescreen tool in eliminating variables from the data mining exercise.

  4. Assignment 8
     • Give me an example of a correlation table of 5 variables where two variables are significant and three variables are not significant. Provide correlation values that support your results.
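One way to judge whether a correlation value is "significant" is the t-test for a Pearson correlation. The sketch below is not from the slides: it assumes a sample size `n` chosen for illustration and uses 1.96 as an approximate two-sided 5% critical value, which is reasonable only for large samples.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def is_significant(r, n, critical=1.96):
    """Approximate two-sided 5% test; 1.96 is a large-sample assumption."""
    return abs(correlation_t_stat(r, n)) > critical


# Hypothetical correlation values against the objective function, n = 1000.
for r in (0.30, -0.25, 0.02):
    print(r, round(correlation_t_stat(r, 1000), 2), is_significant(r, 1000))
```

The same correlation value can be significant in a large sample and not in a small one, so a correlation table should be read together with the number of observations behind it.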

  5. Recapping from last week
     • Geocoding
       • What are key things to think of?
       • Look at answer from two slides ago. Geocoding gives us numbers to calculate distance between two postal codes.
     • More material on correlation analysis
       • How do EDA reports tie into the correlation analysis?
       • They are trend-like reports which demonstrate why a given variable has a strong relationship with the objective function.
     • How should we present the final results of a model? How is the above derived? From the partial R² of each variable divided by the total R² of the equation.

  6. Notion of Lift
     • What is lift: the performance of a group relative to the performance of the benchmark
     • Examples:

     | Type of Activity | Untargeted / Benchmark | Targeted / Challenger | Lift |
     |---|---|---|---|
     | Acquisition Campaign Response Rate | 1% | 2% | 200 |
     | Retention Campaign Churn Rate | 15% | 25% | 167 |
     | Credit Card Loss Rate | 5% | 8% | 160 |
     | Product Affinity Rate | 10% | 30% | 300 |

     The targeted group represents those names as determined by a data mining tool such as a predictive model.

  7. Notion of Lift
     • Examples of cases where lift is below 100

     | Type of Activity | Untargeted / Benchmark | Targeted / Challenger | Lift |
     |---|---|---|---|
     | Acquisition Campaign Response Rate | 1% | .5% | 50 |
     | Retention Campaign Churn Rate | 15% | 10% | 67 |
     | Credit Card Loss Rate | 5% | 2% | 40 |
     | Product Affinity Rate | 10% | 6% | 60 |
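The lift values in these examples follow from dividing the targeted rate by the benchmark rate and scaling to an index where 100 means "no better than the benchmark". A minimal sketch, with the function name `lift` chosen here for illustration:

```python
def lift(targeted_rate, benchmark_rate):
    """Lift index: targeted performance relative to the benchmark, where 100 = parity."""
    return 100.0 * targeted_rate / benchmark_rate


# (activity, benchmark rate, targeted rate) taken from the examples above.
examples = [
    ("Acquisition Campaign Response Rate", 0.01, 0.02),
    ("Retention Campaign Churn Rate", 0.15, 0.25),
    ("Credit Card Loss Rate", 0.05, 0.08),
    ("Acquisition Campaign Response Rate (below 100)", 0.01, 0.005),
]

for name, benchmark, targeted in examples:
    print(f"{name}: {lift(targeted, benchmark):.0f}")
```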

  8. Validating the Model: Example of a Gains Chart
     • Revenue per order is $60.
     • Cost of 1 mail piece is $.855.
     • Benefits of modelling are the foregone promotion costs by promoting fewer names to achieve a given # of orders at a higher response rate.
     • Listed below are the hard numbers that might comprise a lift curve:

     | % of List (Ranked by Model Score) | Validation Mail Quantity | Cum. Resp. Rate | Cum. % of all Resp. | Cum. Lift | Interval ROI | Benefits |
     |---|---|---|---|---|---|---|
     | 0 - 10% | 20000 | 3.50% | 23.33% | 233 | 145% | $22800 |
     | 10 - 20% | 40000 | 3.00% | 40% | 200 | 75% | $34200 |
     | 20 - 30% | 60000 | 2.75% | 55% | 183 | 58% | $42750 |
     | 30 - 40% | 80000 | 2.50% | 67% | 167 | 23% | $45600 |
     | 40 - 50% | 100000 | 2.25% | 75% | 150 | -12.2% | $42750 |
     | . . . | | | | | | |
     | 90 - 100% | 200000 | 1.50% | 100% | 100 | -58% | $0 |

     How might this be plotted? In class we saw this as a straight, decreasing line when plotting interval response rate against the deciles. If we plot the cum. % of responders, the shape is a parabola-type curve, with a larger parabola representing a better model. Meanwhile, a steeper slope when plotting interval response rate against deciles represents a stronger model.

  9. Validating the Model: Calculating the metrics on the gains charts.
     • Cum. % of responders in top 10%:
       • Total responders: 200000 x 1.5%: 3000
       • # of responders in top 10%: 20000 x 3.5%: 700
       • Cum. % in top 10%: 700/3000: 23%
     • Cum. lift in top 10%:
       • Average response rate: 1.5%
       • Cum. response rate in top 10%: 3.5%
       • Cum. lift: 233

  10. Calculating the metrics on the gains charts.
      • Interval ROI in 10%-20%
        • # of persons mailed: 20000
        • # of responders in 10%-20%: (40%-23.33%)*3000: 500
        • Net revenue: (500*60)-.855*20000: 12900
        • Costs: 17100
        • ROI: (12900/17100): 75%
      • Calculating Benefits column at 30%:
        • Mailed costs to achieve 1650 responders without modelling: ((.0275*60000)/.015) * .855 = 94050
        • Mailed costs with modelling = 60000*.855 = 51300
        • Benefits: 94050-51300 = $42750
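The calculations above can be repeated for every row of the gains chart. The sketch below is one possible way to organise them, not the instructor's tool: it takes the cumulative mail quantities and cumulative response rates of the first five intervals from the gains chart, together with the $60 revenue per order, the $.855 mail cost, the 1.5% overall response rate, and the 200000 total names. The function and key names are made up for illustration.

```python
REVENUE_PER_ORDER = 60.0
COST_PER_PIECE = 0.855
OVERALL_RATE = 0.015      # response rate of the full list (no model)
TOTAL_MAILED = 200_000

# (cumulative mail quantity, cumulative response rate) for the first five intervals
rows = [(20_000, 0.035), (40_000, 0.030), (60_000, 0.0275), (80_000, 0.025), (100_000, 0.0225)]


def gains_metrics(rows):
    """Derive cum. % of responders, cum. lift, interval ROI and benefits per row."""
    total_resp = TOTAL_MAILED * OVERALL_RATE
    table = []
    prev_qty, prev_resp = 0, 0.0
    for qty, cum_rate in rows:
        cum_resp = qty * cum_rate
        interval_qty = qty - prev_qty
        interval_resp = cum_resp - prev_resp
        interval_cost = interval_qty * COST_PER_PIECE
        table.append({
            "cum_pct_resp": 100 * cum_resp / total_resp,
            "cum_lift": 100 * cum_rate / OVERALL_RATE,
            # net revenue of the interval divided by its mailing cost
            "interval_roi": (interval_resp * REVENUE_PER_ORDER - interval_cost) / interval_cost,
            # mailing cost to reach the same responders without a model, minus cost with it
            "benefits": (cum_resp / OVERALL_RATE - qty) * COST_PER_PIECE,
        })
        prev_qty, prev_resp = qty, cum_resp
    return table


table = gains_metrics(rows)
for row in table:
    print({k: round(v, 2) for k, v in row.items()})
```

Benefits peak at the 30 - 40% interval and interval ROI turns negative in the 40 - 50% interval, which is why the cutoff decision depends on which of the two metrics is being optimised.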

  11. Gains Chart Examples
      Assume a mail cost of $1.00 per piece and a revenue per order of $50.00. Please fill in the blanks for the first 4 rows.

      | Cum. # of Names Mailed | Cum. Response Rate | Interval Resp. Rate | Interval Lift | Benefits | Interval ROI |
      |---|---|---|---|---|---|
      | 10000 | 2.50% | 2.5% | 250 | $15,000 | 25% |
      | 20000 | 2.25% | 2.0% | 200 | $25,000 | 0 |
      | 30000 | 2.10% | 1.8% | 180 | $33,000 | -10% |
      | 40000 | 1.80% | 0.9% | 90 | $32,000 | -55% |
      | . . . | | | | | |
      | 100000 | 1% | | | | |

      Interval Resp. Rate: 10,000*0.025=250=2.5% 20,000*0.

  12. Lift Curve with Zero Model Effectiveness
      What does this look like if we plot it on a lift curve? A line rather than a parabola if we plot cum. % of responders.

  13. Gains Chart Examples
      • What is the best model? Model 1.
      • What is the worst model? Model 4.
      • What are the Model 3 results telling you? We have some rank ordering all the way down to 70000 names, and then the model flattens out. We may need a strategy here for this bottom segment.

  14. Gains Chart Examples
      • In each response model case, answer the following questions:
        • Where would your cutoff be with a budget of $80000 and a cost per piece of $2.00?
          • 40000 names
        • Where would your cutoff be if you needed to attain a forecasted order qty of 350?
          • Between 10000 and 20000 names for models 1 and 2, between 20000 and 30000 for model 3, and between 30000 and 40000 for model 4
        • Where would your optimum cutoff be, presuming that neither budget nor forecasted order quantities were constraints?
          • 50000 for models 1 and 2, and 60000 for model 3; it does not matter for model 4

  15. Gains Chart Examples
      • Calculate the following:
        • Interval names mailed
        • Cum. response rate
      • Assuming a cost per name of $1.50 and revenue per responder of $75, calculate the interval ROI for each interval and the modelling benefits for each interval.

  16. Tracking of Models
      • Two models are used in two campaigns. In campaign A, the overall response rate is 3.5%, which is above the breakeven response rate of 2%. In campaign B, the overall response rate is 1.2%, which is below the breakeven response rate of 2%. Yet the model in campaign B is more effective. Explain why.
      • The model is rank ordering names quite well for campaign B (1.2% overall), while the better campaign overall (3.5%) exhibits no rank ordering of response rate between deciles.

  17. CHAID
      • "CHAID" is an acronym for Chi-square Automatic Interaction Detection
      • Produces a decision-tree-like report
        • Branches and nodes
      • Non-parametric approach
      • Output of the routine is a segment or group as opposed to a score
      • Uses chi-square statistics to determine statistically significant breaks
      • Conceptual interpretation: (Observed-Expected)/Expected
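To make the observed-versus-expected idea concrete, the sketch below computes the standard Pearson chi-square statistic for one candidate break. Note that the full statistic squares each difference before dividing by the expected count, so it is a refinement of the conceptual ratio on the slide. The counts are hypothetical, and this is only the significance test for a single split, not a CHAID implementation.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # expected count if segment and response were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical split: rows are two candidate segments,
# columns are (responders, non-responders).
split = [[60, 940], [20, 980]]
stat = chi_square(split)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(round(stat, 2), stat > 3.841)
```

A split whose statistic exceeds the critical value separates the segments by more than chance would explain, which is the sense in which a break is "statistically significant".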

  18. CHAID What criteria determine the end nodes?
