Everyday Statistics
Presented by Mike La Dolcetta, ASQ Sr. Member, CQE, CQM, CSSGB
April 23rd, 2009
A Little Background
• Statistics is a relatively young branch of mathematics.
• While texts of formal mathematics date back to 1900–1800 B.C. (Babylonians, Egyptians), the concepts of probability and statistics had to wait for the advanced Hindu-Arabic number system to reach Medieval Europe (Fibonacci, circa 1200 A.D.), and ultimately for the intellectual freedom of the Renaissance (Tartaglia, Ferrari, Cardano, Galileo).
• Born out of probability theory and very practical real-world needs:
  • Gambling – Chevalier de Méré (Antoine Gombaud), Blaise Pascal, and Pierre de Fermat collaboration, circa 1650.
  • Mortality data (Insurance/Risk Management) – John Graunt’s and William Petty’s pioneering use of sampling methods and probabilities, 1662.
• The word statistics is derived from the analysis of quantitative facts about the state. [1]
• Operationally, statistics are derived from sample data. Similar values derived from entire populations are referred to as parameters.
[Image: Bone dice from Pompeii, 1st century AD]
[1] Peter L. Bernstein, Against the Gods, p. 77.
Modern Branches of Statistics
• Mathematical Statistics
  • Theoretical – super heavy on math; based on probability theory and algebra.
  • Provides the foundation for much of Applied Statistics.
• Applied Statistics
  • Descriptive – enables us to describe and summarize large amounts of data, including:
    • Population data – birth, census, labor, illness, death, etc.
    • Meteorological / Geological data – climate, wind, rainfall, seismic, volcanic, etc.
    • Business data – product testing, customer surveying, marketing, opinion polls, etc.
  • Inferential – enables us to draw conclusions based on data, and to quantify uncertainty. Examples include:
    • Pharmaceutical R&D Testing – efficacy (ability to produce desired effect), safety.
    • Product Performance Claims – statistically significant differences/change.
    • Econometrics, stock market forecasting, insurance policy premiums and annuities.
    • Meteorological – weather forecasting.
Descriptive Statistics 101
• Descriptive Statistics relates to much of what we are exposed to in everyday media.
• Measures of Central Tendency of data:
  • Means (a.k.a. Averages)
    • Arithmetic Mean – The sum of the data values divided by the count of data values.
    • Geometric Mean – Averages exponential rates of growth.
    • Harmonic Mean – Commonly used when averaging rates.
  • Median – The middle value (or the average of the two middle values for data sets with an even number of values).
  • Mode – The most frequent value in a data set.
• Measures of Dispersion of data:
  • Range – The width of your data: the biggest value minus the smallest value.
  • Variance – The arithmetic mean of the squares of the deviations of each data point from the data’s arithmetic mean. Its units are the square of the data’s units. Variances of independent processes are additive.
  • Standard Deviation – The square root of the variance. Same units as the data.
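The measures defined above can be tried out with Python's standard `statistics` module. This is a minimal sketch, not part of the original slides: the data set, the growth factors, and the speeds are hypothetical values chosen so the results come out to round numbers, and `statistics.geometric_mean` assumes Python 3.8 or later. The population forms `pvariance` and `pstdev` are used because they match the slide's definition of variance as the mean of the squared deviations.

```python
import statistics

# Hypothetical data set, chosen for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
arithmetic_mean = statistics.mean(data)   # sum of values / count of values
median = statistics.median(data)          # even count: average of the two middle values
mode = statistics.mode(data)              # most frequent value

# Geometric mean: average growth factor for +10% growth followed by +20% growth.
growth_factors = [1.10, 1.20]
geometric_mean = statistics.geometric_mean(growth_factors)

# Harmonic mean: average speed over two equal distances driven at 40 and 60 mph.
speeds = [40, 60]
harmonic_mean = statistics.harmonic_mean(speeds)

# Measures of dispersion
data_range = max(data) - min(data)        # biggest value minus smallest value
variance = statistics.pvariance(data)     # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)         # square root of the variance

print("mean:", arithmetic_mean, "median:", median, "mode:", mode)
print("geometric mean:", round(geometric_mean, 4))
print("harmonic mean:", round(harmonic_mean, 4))
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
```

Note that the harmonic mean of the two speeds (48 mph) is lower than their arithmetic mean (50 mph), because more time is spent driving at the slower speed.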
Statistics (Ab)used to Persuade
Statistics are easily abused, which often leads people to dismiss them as useless in general:
• “Figures don’t lie, but liars figure.” – Mark Twain
• “Statistics can be made to prove anything.” – George Canning (British Prime Minister)
• “98% of all statistics are made up.” – Unknown
• “There are three kinds of lies: lies, damned lies, and statistics.” – Disraeli/Twain
This makes it all the more important that the general population be educated:
• “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” – H.G. Wells
• “To understand God's thoughts we must study statistics, for these are the measure of His purpose.” – Florence Nightingale
• “All [statistical] models are wrong, but some models are useful.” – George Box
• “Do not put faith in what statistics say until you have carefully considered what they do not say.” – William W. Watt (Author)
[Images: Florence Nightingale, H.G. Wells]
Abuse #1: Biased Samples
EXTRA CREDIT
“H&R Block found errors on 4 out of 5 returns prepared by others.”
What’s Wrong With This Statement?
Abuse #2: Inappropriate Comparisons
“Research shows children who ate Kellogg’s® Frosted Mini-Wheats® cereal for breakfast demonstrated better attentiveness and quality of memory throughout the morning when compared to kids who didn’t eat breakfast.” – Kellogg's Clinical Study
Potential effects of not eating breakfast:
• Low blood sugar
• Mild dehydration / low electrolytes
• Stomachache / hunger pangs
Is loss of attentiveness and memory any surprise?
Why not compare to a competitor’s product instead?
Abuse #3: Spurious Accuracy
Virginia Housing Unit Estimates: 2000 to 2007 (Source: U.S. Census website)
• Values are listed as estimates…
• But they are provided with a remarkable precision that is “unknown and unknowable.”
• Similar information is available for births, deaths, incomes, etc.
Why not round to the nearest hundred… or thousand?
Why not provide confidence intervals at a specific confidence level?
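One way to act on both questions is to round the estimate and report an interval with it. The sketch below is only an illustration: the estimate and its standard error are hypothetical numbers, not values from the Census table, and the interval assumes the estimate's sampling error is approximately normal.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only (not taken from the Census table).
estimate = 3_284_671       # point estimate as it might be published, to the last unit
standard_error = 12_000    # assumed standard error of that estimate
confidence = 0.95          # chosen confidence level

# Two-sided critical value from the standard normal distribution (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * standard_error

# Round to the nearest thousand so the reported digits reflect the uncertainty.
rounded_estimate = round(estimate, -3)
lower = round(estimate - margin, -3)
upper = round(estimate + margin, -3)

print(f"{rounded_estimate:,} ({confidence:.0%} CI: {lower:,.0f} to {upper:,.0f})")
```

With an uncertainty of tens of thousands of units, the last three or four digits of the published figure carry no real information.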
Abuse #4: Selective Statistics
“Salaries at our company average over $83,000 annually” – Company XYZ
• True, but the median (middle) salary is about $50K. That’s ~$33K per year less!
• Measures of dispersion are high, indicating salaries vary greatly.
• In fact, 98% of the salaries fall between $33K and $69K.
EXTRA CREDIT
“New Research Suggests Drinking As Little As One Cup Of Black Tea Per Day Can Help Protect Against Cardiovascular Disease.” – MediLexicon.com
What’s Wrong With This Statement?
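A small sketch shows how all three salary bullets can be true at once. The payroll below is hypothetical, constructed only to reproduce the slide's figures (a mean over $83,000, a median of $50,000, and 98% of salaries between $33K and $69K); it is not Company XYZ's actual data.

```python
import statistics

# Hypothetical payroll of 100 people, built to match the slide's figures.
salaries = (
    [40_000] * 33
    + [50_000] * 32
    + [60_000] * 33
    + [1_725_000] * 2   # two very highly paid executives
)

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
std_dev = statistics.pstdev(salaries)

# Fraction of salaries that fall between $33K and $69K.
share_in_band = sum(33_000 <= s <= 69_000 for s in salaries) / len(salaries)

print(f"mean:    ${mean_salary:,.0f}")
print(f"median:  ${median_salary:,.0f}")
print(f"std dev: ${std_dev:,.0f}")
print(f"share between $33K and $69K: {share_in_band:.0%}")
```

Two outliers are enough to pull the mean far above what almost everyone earns, while the median barely moves. The standard deviation in this sketch is larger than the mean itself, which is the warning sign the second bullet describes.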
Abuse #5: Insufficient Data*
“Sales declined substantially last quarter. We need to motivate the sales team to get where we need to be!”
• Sales dropped by about $0.4M, or 14.3%!
• Whenever you compare two numbers, the most likely outcome is that one will be bigger than the other.
• The complete picture is apparent when a longer term trend is analyzed.
• In this case, it was the best Q1 in the past 3 years, and the overall trend is positive!
*And tricky pictures!
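The arithmetic behind this slide can be sketched as follows. Only the $0.4M drop and the 14.3% figure come from the slide; the quarterly levels of $2.8M and $2.4M are inferred from them, and the first-quarter history (including the years) is entirely hypothetical, standing in for the chart that accompanied the slide.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


# Quarterly sales in $M. Assumed levels, inferred from the slide's $0.4M and 14.3%.
previous_quarter = 2.8
last_quarter = 2.4

quarter_over_quarter = percent_change(previous_quarter, last_quarter)

# Hypothetical first-quarter sales ($M) for the past three years.
q1_by_year = {2007: 2.0, 2008: 2.2, 2009: last_quarter}

year_over_year = percent_change(q1_by_year[2008], q1_by_year[2009])
best_q1 = max(q1_by_year, key=q1_by_year.get)

print(f"vs. previous quarter: {quarter_over_quarter:+.1f}%")
print(f"vs. same quarter last year: {year_over_year:+.1f}%")
print(f"best Q1 on record here: {best_q1}")
```

The same quarter looks like a sharp decline against the preceding quarter and like growth against the same quarter a year earlier. Which comparison is shown decides the story.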
Abuse #6: Silly Arithmetic
“The Law Offices of Smith, Wesson, Heckler, and Koch: over half a century of experience you can rely on!”
• John Smith: 8 Years
• William Wesson: 5 Years
• Harcourt Heckler: 4 Years
• Robert Koch: 1 Year (recently passed Bar)
• Two Exec Secretaries: 9 Years combined
• Four Jr. Attorneys: 5 Years combined
• Eight Paralegals: 19 Years combined
Total: 51 Years of Experience!
Do you really get all that when a Junior Attorney takes your case?
Even eight years of experience can be one year of experience eight times over.
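A quick check of the slide's own numbers shows what the headline total hides. The figures below are taken from the list above; the only assumption is that the secretaries' 9 years is a combined figure, which is what the stated total of 51 requires.

```python
# (group, number of people, total years of experience), as listed on the slide.
staff = [
    ("John Smith", 1, 8),
    ("William Wesson", 1, 5),
    ("Harcourt Heckler", 1, 4),
    ("Robert Koch", 1, 1),
    ("Exec Secretaries", 2, 9),   # assumed to be a combined figure
    ("Jr. Attorneys", 4, 5),
    ("Paralegals", 8, 19),
]

total_years = sum(years for _, _, years in staff)
headcount = sum(people for _, people, _ in staff)
average_years = total_years / headcount

# Average experience of the junior attorney who might actually take the case.
jr_people, jr_years = next((p, y) for name, p, y in staff if name == "Jr. Attorneys")
jr_attorney_average = jr_years / jr_people

print(f"total: {total_years} years across {headcount} people")
print(f"average per person: {average_years:.2f} years")
print(f"average per junior attorney: {jr_attorney_average:.2f} years")
```

Summing experience across people produces an impressive total, but the per-person average is under three years, and the junior attorneys average just over one.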
Summary
I lost 85 pounds in 6 months with the ABC Diet!*
  *Results not typical.
Just one dose and pain is gone in as quick as 15 minutes!*
  *Fastest reported relief: actual performance will vary.
Earn as much as 73% on investments using the XYZ method!*
  *Any investment in the Stock Market involves risk and is not guaranteed.
Statistical literacy is an important skill for modern life. It gives you a critical eye for evaluating claims and prepackaged, summarized data, and helps you defend yourself against charlatans trying to separate you from your hard-earned wages.
Further Reading
• Against the Gods: The Remarkable Story of Risk – Peter L. Bernstein
• Games, Gods & Gambling: A History of Probability and Statistical Ideas – F.N. David
• Flaws and Fallacies in Statistical Thinking – Stephen K. Campbell
• A Mathematician Reads the Newspaper – John Allen Paulos
• How to Lie with Statistics – Darrell Huff and Irving Geis
• An Intuitive Explanation of Bayes' Theorem
• Mind and the World Order: Outline of a Theory of Knowledge – Clarence Irving Lewis