
Understanding and presenting your findings

Present the basics to tell a story. Do not present advanced statistics and create confusion. Today is about creating business intelligence.

gina

Presentation Transcript


  1. Understanding and presenting your findings.

  2. Present the basics to tell a story. Do not present advanced statistics and create confusion. Today is about creating business intelligence.

  3. Solve problems. If you want a raise, a promotion, or more funding for your group, there is only one way in the long term: add value. Create business intelligence.

  4. The Presentation • The presentation is a very important part of a project; often it can be the most important part. • A good presentation should support the findings, not just mention them. • The supporting statistics and graphs within the presentation can either help people understand or confuse them. • Management will often rely on the presentation to understand the findings from data mining. • Management needs to trust the findings; if the findings are presented poorly, it is difficult to trust them. • A poor presentation can even cause projects to fail. Management will not implement what they do not trust or understand. • Unfortunately, many statisticians and computer scientists are lacking in this critical area. • They tend to merely look at the results and the numbers in the computer output. • This makes many data analysis projects less successful than they should be. • A poor presentation or explanation often leaves management unclear on how to understand and proceed with the findings from the project.

  5. GLM Example

  6. T-Log Data: How can we use this information to understand the different configurations? • Comparing different types of checkout counter styles and cash registers in terms of speed, using transaction log (T-log) data. • Partial T-Log Data: • Configuration of checkout counter. • There are 4 types: 2 different shapes and 2 different cash register types. • nitems = number of items purchased during the transaction • tender = 0 if cash is used, 1 if credit is used • massist = 1 if a manager assists, 0 otherwise • timer1 = time the first item is scanned • timer2 = time the last item is scanned • timer3 = time the transaction is completed A snapshot of the data.

  7. T-Log Data: It is necessary to create new variables for modeling; we cannot use the data as is. • We want to fit a general linear model to investigate speed in terms of configuration for the different shapes and register types. • An estimate of the time of a transaction could be a new variable equal to timer3-timer1. • What about configuration? Really, we want two variables: one for the shape of the counter and another for the register type. A snapshot of the data.
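
The variable creation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field name `config`, the use of seconds as the timer unit, and the extra `scan_time` and `pay_time` columns are not in the slides. The register mapping follows the later slide (register type 1 = configurations 1 and 3, type 2 = configurations 2 and 4), while the shape mapping is a guess.

```python
# Assumed mapping from configuration to counter shape and register type.
# REGISTER follows the later slide; SHAPE is a guess made for illustration.
SHAPE = {1: 1, 2: 1, 3: 2, 4: 2}
REGISTER = {1: 1, 2: 2, 3: 1, 4: 2}


def derive(row):
    """Return a copy of one T-log record with the new modeling variables."""
    out = dict(row)
    out["total_time"] = row["timer3"] - row["timer1"]  # whole transaction
    out["scan_time"] = row["timer2"] - row["timer1"]   # first to last item scanned
    out["pay_time"] = row["timer3"] - row["timer2"]    # last scan to completion
    out["shape"] = SHAPE[row["config"]]
    out["register"] = REGISTER[row["config"]]
    return out


# Hypothetical record, not taken from the real T-log data.
example = {"config": 2, "nitems": 10, "tender": 1, "massist": 0,
           "timer1": 0, "timer2": 42, "timer3": 60}
derived = derive(example)
```

With `shape` and `register` as separate columns, the model can estimate the effect of each one instead of treating the four configurations as unrelated groups.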

  8. The General Linear Model (GLM). Do Not Show in presentation! Note: This could be done better, but that is for another day.

  9. From this we can see the approximately 0.9-second difference per item caused by the two shapes.

  10. No difference for cash; all have a beta of about 10 seconds.

  11. We can see about a 5-second difference between the two register types for credit.

  12. From this we can see the approximately 0.9-second difference per item caused by the two shapes. No difference for cash; all have a beta of about 10 seconds. We can see about a 5-second difference between the two register types for credit. This could be done better, but that is for another day.

  13. Now how to present the results

  14. Which Cash Register and Counter Design Are Best? Comparing different types of checkout counter styles and cash registers using transaction log (T-log) data.

  15. Main Objective • To Understand The Differences Among The Checkout Counters • There Are 4 different configurations • Two different shapes of counters • Two different types of cash registers

  16. First, the High-Level Findings (where these go depends on your style; I like them at the end, but I have seen it done both ways) • Checkout counter shape and cash register type both have an impact on the speed/time of a transaction. • These findings were statistically significant. • Using various statistical techniques, we found that configuration types 1 and 2 were best. • Although configuration type 2 was faster on average than type 1, we could not statistically prove that type 2 is faster than type 1 in general. • There was a large difference in average time between Type 2 and the other types for manager assists, but we could not rule out that it was just random chance. • We found that Type 2 was faster than all other types, including Type 1, when credit was used. Remember File Layout - very important!!! There are 4 types: 2 different shapes and 2 different cash register types.

  17. The Next Few Slides Will Highlight the Differences Among The Checkout Configurations

  18. The Total Transaction Time: Final Time Minus the Time First Item Was Scanned On average, Configuration 2 is fastest: it is 3 seconds faster per transaction than Configuration 1.

  19. Understanding Manager Assistance For The Cashier and the Configurations There is a huge time difference between a transaction in which a manager has to assist a cashier due to issues with the cash register and a typical unassisted transaction.

  20. Understanding Manager Assists Configuration 1 has the lowest percentage of manager assists, but with 5,000 transactions considered for each configuration, it is possible that the difference between the configurations is mere random chance.
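
The slides do not say which test was used to judge whether the difference in assist rates could be random chance. One common choice is a two-proportion z-test, sketched below. The assist counts in the example are hypothetical; only the 5,000 transactions per configuration come from the slide.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal tail
    return z, p_value


# Hypothetical counts: 100 assists vs. 115 assists, each out of 5,000 transactions.
z, p_value = two_proportion_z(100, 5000, 115, 5000)
```

A p-value above the usual 0.05 threshold, as these made-up counts give, means a gap of that size could plausibly arise by chance, which is the situation the slide describes.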

  21. Time is a Function of the Number of Items Purchased As expected, a positive linear relationship.

  22. There is Definitely A Difference Resulting From Shape and Cash Register Type The counter shape helps by about 0.9 seconds per item. (Chart: time per item to scan and bag, Shape 1 vs. Shape 2.)

  23. Another Way of Looking At The Time Per Item: Not nearly as nice as a graph, in my opinion. The counter shape helps by about 0.9 seconds per item. (Table: Shape 1 vs. Shape 2.)

  24. There is Definitely A Difference Resulting From Shape and Cash Register Type Register Type 1: Configurations 1 and 3. Register Type 2: Configurations 2 and 4. Type 2 is better by approximately 5 seconds when credit cards are used. For cash there is no difference. (Chart: average time to make payment.)

  25. Looking At Total Transaction Time Without Manager Assists Configurations 1 and 2 are best when looking at cash transactions. Configuration 2 is the best overall when looking at credit cards only. Thus configuration 2 is best in terms of overall speed.

  26. Conclusions/Recommendations • Focus should be on reducing the need for manager assistance. • A major cause of wasted time is a manager needing to assist the cashier: approximately 9 additional minutes are spent. • Configuration Types 1 and 2 perform the best. • Given that Type 2 performs better than Type 1 when credit is used, and since we expect the use of credit to expand in Thailand, we would recommend Type 2. • For a day with 2,000 transactions and an average savings of 3 seconds per transaction, the total time saved is 6,000 seconds, or 100 minutes of labor per day. For a day with 12,000 transactions, the savings can reach 600 minutes, or 10 hours of labor per day. Note: for a company looking to downsize by eliminating cashiers, this would be useful information.
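
The labor-savings arithmetic in the last bullet can be checked with a short sketch. The function name is hypothetical; the 3 seconds saved per transaction and the two daily volumes come from the slide.

```python
def daily_savings(transactions, seconds_saved_each=3):
    """Return the daily time saved as (seconds, minutes, hours)."""
    seconds = transactions * seconds_saved_each
    return seconds, seconds / 60, seconds / 3600


small_day = daily_savings(2_000)    # 6,000 seconds = 100 minutes
large_day = daily_savings(12_000)   # 36,000 seconds = 600 minutes = 10 hours
```

Expressing the result in labor hours per day is what turns a 3-second difference into a number management can act on.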

  27. Presenting a logistic regression model

  28. The Presentation • A key to understanding is presentation. How do we view our results? • Visualization and presentation are very important. • It is important to know your audience. • Your audience determines how you will present what you learn from the logistic regression model. • Senior management in a business is not interested in a theoretical data mining discussion. They are interested in how your fraud detection model will help the company. • A fellow statistician would need less visualization, as they already understand, but in my opinion a nice presentation of results can only help. • We will next cover how to look at the variables that enter into your model. • This is very important for gaining trust in your work.

  29. How Do We View the Independent Variables in the Model? • It is important to interpret the variable in the model and then look at the variable individually compared to the dependent variable. • Often a variable viewed in the model has the opposite relationship with the dependent variable from the one it has when looked at separately. • This can result from multicollinearity. • Multicollinearity will not be covered. • When creating a model, it is good to think about the variables that enter into the model and why they are entered. You may be asked to explain why you chose to keep a certain variable and use it in the model. • One way to investigate an independent variable’s relationship with the dependent variable is the same way as when investigating the model.

  30. Sample Partial Presentation Of A Fraud Detection Model Included is only an explanation of variables in the model and model validation.

  31. Most Important Factors For Detecting Fraud

  32. Number Of Inquiries For Credit In The Past 6 Months This slide is showing that people with more inquiries (applications) for credit are more likely to be a victim of fraud. Perhaps some of the inquiries for credit were made by someone attempting to commit fraud and not the actual individual.

  33. Number Of Inquiries For Credit In The Past 6 Months This slide is showing the same information as the previous slide. This slide is more informative, but many people will think the previous slide is better and easier to understand. Know your audience (who you present to)!

  34. Percent Match and Mismatch Database On Driver License Number People who are committing fraud are more likely to write a driver license number on the application that differs from the one in your database.

  35. Percent Match and Mismatch Database On Zip Code People who are committing fraud are more likely to write a zip code on the application that differs from the one in your database.

  36. Average Age Of Applicant Younger people are more often victims of fraud.

  37. Gender Of Applicant Females are more often victims of fraud.

  38. Gender Of Applicant Again there is more than one way to present the same thing. Know your audience (who you present to)!

  39. And More Graphs • Those simple graphs would be produced for all variables in the model.

  40. Understanding The Fraud Detection Model Performance This model has a KS of 25.82. By refusing the bottom 10%, you would have 32 good loans for every fraud, compared with 24 good loans for every fraud before. By refusing the bottom 10% of applicants you can reduce fraud by 32% (25,532/80,000).
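
The slide does not show how the KS of 25.82 was computed. A common definition in scoring work is the largest gap between the cumulative score distributions of the fraud and good groups, reported on a 0-100 scale. The sketch below assumes that definition. The scores in the example are made up; only the ratio 25,532/80,000 comes from the slide.

```python
def ks_statistic(good_scores, fraud_scores):
    """Largest gap between the two cumulative score distributions (0-100 scale)."""
    thresholds = sorted(set(good_scores) | set(fraud_scores))
    best = 0.0
    for t in thresholds:
        cdf_good = sum(s <= t for s in good_scores) / len(good_scores)
        cdf_fraud = sum(s <= t for s in fraud_scores) / len(fraud_scores)
        best = max(best, abs(cdf_fraud - cdf_good))
    return 100 * best


# Hypothetical scores, where a lower score means a riskier applicant.
ks = ks_statistic(good_scores=[0.2, 0.4, 0.6, 0.8], fraud_scores=[0.1, 0.3])

# The fraud-reduction share quoted on the slide.
share_removed = 25_532 / 80_000   # about 0.32
```

A KS near 0 means the score barely separates fraud from good applications, and a KS near 100 means the two groups hardly overlap.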

  41. Predicting Phone Usage Why – who to target for phone packages

  42. True Customers • Many use other products but not the phone. • They have a phone, just not with you. • They are prospective phone customers. • You can leverage phone usage. • Predict phone usage for non-phone users in your customer database. • The highest predicted phone users get a promotion. • This can be done using your phone customers with multiple products. • How to know if the model you create works: that is the next few slides.

  43. Creating the model • Minutes used is continuous, thus a general linear model. • The results of the model are predicted minutes. • Are the predictions any good? • We must think about how it will be applied. • For our example, think of a mail-out campaign or a phone SMS campaign.

  44. Validating the Model For Marketing Assume I desire to market to groups 3 and 4. The categories: 0-1000 minutes, 1001-1500 minutes, 1501-2000 minutes, 2001+ minutes. “A” is correct mailings. “B” is missed opportunity. “C” is mailing to less desirables. Note: We only missed mailing to 198 people with 2001+ minutes. 79.4% of true 3s and 4s were correctly identified. Also, we would only mail to 67 people with less than 1001 minutes.
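
The validation idea on this slide can be sketched as a simple count: bin actual and predicted minutes into the four categories, then tally the A, B and C cells. The function names and the five example customers are hypothetical; the category boundaries and the meaning of A, B and C follow the slide.

```python
def minutes_group(minutes):
    """Map minutes used to the slide's four categories (1 to 4)."""
    if minutes <= 1000:
        return 1
    if minutes <= 1500:
        return 2
    if minutes <= 2000:
        return 3
    return 4


def campaign_cells(actual, predicted, target=(3, 4)):
    """Count correct mailings (A), missed opportunities (B), wasted mailings (C)."""
    cells = {"A": 0, "B": 0, "C": 0}
    for a, p in zip(actual, predicted):
        truly_target = minutes_group(a) in target
        mailed = minutes_group(p) in target
        if truly_target and mailed:
            cells["A"] += 1
        elif truly_target:
            cells["B"] += 1
        elif mailed:
            cells["C"] += 1
    true_targets = cells["A"] + cells["B"]
    capture_rate = cells["A"] / true_targets if true_targets else 0.0
    return cells, capture_rate


# Hypothetical actual and predicted minutes for five customers.
cells, capture_rate = campaign_cells(
    actual=[2500, 1800, 1600, 900, 1200],
    predicted=[2300, 1400, 1700, 1600, 800],
)
```

The capture rate corresponds to the "79.4% of true 3s and 4s correctly identified" figure on the slide: the share of customers who truly belong to the target groups and whom the model would have mailed.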

  45. Customer Profiling and Customer Value Sample Marketing Project – students were to rank customers according to revenue and risk.

  46. Two Main Objectives • To Understand Your Customers • To Understand the Value of Your Customers • To help make marketing strategies.

  47. First, Who Are Your Customers We Looked At All 15,045 Customers To Understand Who They Are

  48. Gender: More Women than Men
