Review of Top 10 Concepts in Statistics



Presentation Transcript


  1. Review of Top 10 Concepts in Statistics NOTE: This PowerPoint file is not an introduction, but rather a checklist of topics to review

  2. Top Ten 10. Qualitative vs. Quantitative Data 9. Population vs. Sample 8. Graphical Tools 7. Variation Creates Uncertainty 6. Which Distribution? 5. P-value 4. Linear Regression 3. Confidence Intervals 2. Descriptive Statistics 1. Hypothesis Testing

  3. Top Ten #10 • Qualitative vs. Quantitative

  4. Qualitative • Categorical data: • success vs. failure • ethnicity • marital status • color • zip code • 4 star hotel in tour guide

  5. Qualitative • If you need an “average”, do not calculate the mean • However, you can compute the mode (“average” person is married, buys a blue car made in America)
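
A minimal sketch of taking the mode of categorical data, using Python's standard `statistics` module. The survey responses are hypothetical and only illustrate the idea on this slide.

```python
from statistics import mode

# Hypothetical survey responses (not from the slides).
marital_status = ["married", "single", "married", "divorced", "married", "single"]

# The mode is the most frequent category; a mean would be meaningless here.
typical_status = mode(marital_status)
print(typical_status)
```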

  6. Quantitative • integer values (0,1,2,…) • number of brothers • number of cars arriving at gas station • Real numbers, such as decimal values ($22.22) • Examples: Z, t • Miles per gallon, distance, duration of time

  7. Hypothesis Testing and Confidence Intervals • Quantitative: Mean • Qualitative: Proportion

  8. Top Ten #9 • Population vs. Sample

  9. Population • Collection of all items (all light bulbs made at factory) • Parameter: measure of population characteristic (1) population mean (average number of hours in life of all bulbs) (2) population proportion (% of all bulbs that are defective)

  10. Sample • Part of population (bulbs tested by inspector) • Statistic: measure of sample = estimate of parameter (1) sample mean (average number of hours in life of bulbs tested by inspector) (2) sample proportion (% of bulbs in sample that are defective)

  11. Top Ten #8: Graphical Tools • Pie chart or bar chart: qualitative • Joint frequency table: qualitative (relate marital status vs zip code) • Scatter diagram: quantitative (distance from ASU vs duration of time to reach ASU) • Histograms • Stem Plots

  12. Graphical Tools • Line chart: trend over time • Scatter diagram: relationship between two variables • Bar chart: frequency for each category • Histogram: frequency for each class of measured data (graph of frequency distr.) • Box plot: graphical display based on quartiles, which divide data into 4 parts

  13. Top Ten #7 • Variation Creates Uncertainty

  14. No Variation • Certainty, exact prediction • Standard deviation = 0 • Variance = 0 • All data exactly same • Example: all workers in minimum wage job

  15. High Variation • Uncertainty, unpredictable • High standard deviation • Ex #1: Workers in downtown L.A. have variation between CEOs and garment workers • Ex #2: New York temperatures in spring range from below freezing to very hot

  16. Comparing Standard Deviations • Temperature Example • Beach city: small standard deviation (single temperature reading close to mean) • High Desert city: High standard deviation (hot days, cool nights in spring)
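
The contrast between no variation and high variation can be sketched with the standard `statistics` module. All of the wage and temperature values below are made up for illustration; only the qualitative pattern comes from the slides.

```python
from statistics import pstdev

# Hypothetical hourly wages and temperature readings (degrees F); not from the slides.
minimum_wage_jobs = [15.0, 15.0, 15.0, 15.0]
beach_city = [68, 70, 69, 71, 70]
high_desert_city = [45, 95, 50, 90, 48, 92]

print(pstdev(minimum_wage_jobs))  # identical values, so the standard deviation is 0
print(pstdev(beach_city))         # readings stay close to the mean
print(pstdev(high_desert_city))   # hot days and cool nights spread the readings out
```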

  17. Standard Error of the Mean • Standard deviation of sample mean = standard deviation / square root of n • Ex: standard deviation = 10, n = 4, so standard error of the mean = 10/2 = 5 • Note that 5 < 10, so standard error < standard deviation • As n increases, standard error decreases
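
The standard error formula on this slide can be checked with a few lines of Python. The function name `standard_error` is just an illustrative choice, and the second call (n = 100) is an added example, not one from the slides.

```python
import math

def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: standard deviation / sqrt(n)."""
    return std_dev / math.sqrt(n)

print(standard_error(10, 4))    # the slide's example: 10 / 2
print(standard_error(10, 100))  # a larger n gives a smaller standard error
```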

  18. Sampling Distribution • Expected value of sample mean = population mean, but an individual sample mean could be smaller or larger than the population mean • Population mean is a constant parameter, but sample mean is a random variable • Sampling distribution is distribution of sample means

  19. Example • Mean age of all students in the building is population mean • Each classroom has a sample mean • Distribution of sample means from all classrooms is sampling distribution

  20. Central Limit Theorem (CLT) • If population standard deviation is known, sampling distribution of sample means is approximately normal if n > 30 • CLT applies even if original population is skewed
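
A small simulation can make the sampling distribution concrete. This is only a sketch under assumptions that are not in the slides: the skewed population is taken to be exponential with mean 1 and standard deviation 1, the sample size is 36, and 2,000 samples are drawn.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

N = 36              # sample size (> 30); assumed for illustration
NUM_SAMPLES = 2000  # number of samples drawn; assumed for illustration

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(N)]
    sample_means.append(sum(sample) / N)

print(mean(sample_means))   # should be near the population mean, 1
print(stdev(sample_means))  # should be near 1 / sqrt(36), about 0.167
```

The spread of the sample means matches the standard error of the mean, and a histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.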

  21. Top Ten #6 • What Distribution to Use?

  22. Normal Distribution • Continuous, bell-shaped, symmetric • Mean = median = mode • Measurement (dollars, inches, years) • Cumulative probability under normal curve: use Z table if you know population mean and population standard deviation • Sample mean: use Z table if you know population standard deviation and either normal population or n > 30

  23. t Distribution • Continuous, mound-shaped, symmetric • Applications similar to normal • More spread out than normal • Use t if normal population but population standard deviation not known • Degrees of freedom = df = n-1 if estimating the mean of one population • t approaches z as df increases

  24. Normal or t Distribution? • Use t table if normal population but population standard deviation (σ) is not known • If you are given the sample standard deviation (s), use t table, assuming normal population
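
The Z-versus-t rules on these slides can be written as a small decision helper. The function name `choose_table` and its return strings are hypothetical; cases the slides do not address are reported as such rather than guessed.

```python
def choose_table(sigma_known: bool, normal_population: bool, n: int) -> str:
    """Pick a table for inference about a sample mean, following the slides' rules."""
    if sigma_known and (normal_population or n > 30):
        return "Z"
    if not sigma_known and normal_population:
        return "t"  # with degrees of freedom = n - 1
    return "not covered by these rules"

print(choose_table(sigma_known=True, normal_population=False, n=50))
print(choose_table(sigma_known=False, normal_population=True, n=10))
```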

  25. Top Ten #5 • P-value

  26. P-value • P-value = probability of getting a sample statistic as extreme as (or more extreme than) the sample statistic you got from your sample, given that the null hypothesis is true

  27. P-value Example: one tail test • H0: µ = 40 • HA: µ > 40 • Sample mean = 43 • P-value = P(sample mean > 43, given H0 true) • Meaning: probability of observing a sample mean of 43 or larger when the population mean is 40 • How to use it: Reject H0 if p-value < α (significance level)

  28. Two Cases • Suppose α = .05 • Case 1: suppose p-value = .02, then reject H0 (unlikely H0 is true; you believe population mean > 40) • Case 2: suppose p-value = .08, then do not reject H0 (H0 may be true; you have reason to believe that the population mean may be 40)
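
A sketch of how a one-tail p-value like the one in Case 1 could arise, using `statistics.NormalDist` from the standard library. The slides do not give the population standard deviation or the sample size, so σ = 9 and n = 36 below are assumptions chosen only to make the arithmetic clean.

```python
from statistics import NormalDist

# From the slides: H0: mu = 40, HA: mu > 40, sample mean = 43, alpha = .05.
MU_0, SAMPLE_MEAN, ALPHA = 40, 43, 0.05

# ASSUMED for illustration only (the slides do not give these values):
SIGMA, N = 9, 36

standard_error = SIGMA / N ** 0.5          # 9 / 6 = 1.5
z = (SAMPLE_MEAN - MU_0) / standard_error  # 3 / 1.5 = 2.0
p_value = 1 - NormalDist().cdf(z)          # upper-tail area beyond z

print(round(p_value, 4))
print("reject H0" if p_value < ALPHA else "do not reject H0")
```

Under these assumed values the p-value is about .02, so the decision matches Case 1.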

  29. P-value Example: two tail test • H0: µ = 70 • HA: µ ≠ 70 • Sample mean = 72 • If two-tails, then P-value = 2 × P(sample mean > 72) = 2(.04) = .08 • If α = .05, p-value > α, so do not reject H0
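
The doubling step for a two-tail test can be sketched as follows. The z-score of 1.75 is an assumption, picked because its one-tail area is about .04; the slides give only the tail probability, not the z-score.

```python
from statistics import NormalDist

def two_tail_p_value(z: float) -> float:
    """Two-tail p-value: double the area beyond |z| in one tail."""
    one_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tail

z = 1.75  # ASSUMED z-score; its one-tail area is about .04
p_value = two_tail_p_value(z)

print(round(p_value, 2))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```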

  30. Top Ten #4 • Linear Regression

  31. Linear Regression • Regression equation: ŷ = b0 + b1x • ŷ = dependent variable = predicted value • x = independent variable • b0 = y-intercept = predicted value of y if x = 0 • b1 = slope = regression coefficient = change in y per unit change in x

  32. Slope vs Correlation • Positive slope (b1 > 0): positive correlation between x and y (y increases as x increases) • Negative slope (b1 < 0): negative correlation (y decreases as x increases) • Zero slope (b1 = 0): no correlation (predicted value for y is mean of y), no linear relationship between x and y

  33. Simple Linear Regression • Simple: one independent variable, one dependent variable • Linear: graph of regression equation is straight line

  34. Example • y = salary (female manager, in thousands of dollars) • x = number of children • n = number of observations

  35. Given Data

  36. Totals

  37. Slope (b1) = -6.5 • Method of Least Squares formulas not on BUS 302 exam • b1 = -6.5 given • Interpretation: If one female manager has 1 more child than another, salary is $6,500 lower; that is, salary of female managers is expected to decrease by 6.5 (in thousands of dollars) per child

  38. Intercept (b0) • b0 = 44.33 – (-6.5)(2.33) = 59.5 • If number of children is zero, expected salary is $59,500
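
The intercept arithmetic can be reproduced directly. This sketch reads 44.33 and 2.33 as the sample means of y and x, which is the standard least-squares relation b0 = ȳ – b1·x̄; the slide itself does not label the two numbers.

```python
def intercept(mean_y: float, slope: float, mean_x: float) -> float:
    """Least-squares intercept: b0 = mean of y - b1 * mean of x."""
    return mean_y - slope * mean_x

# 44.33 and 2.33 are taken from the slide and ASSUMED to be the means of y and x.
b0 = intercept(mean_y=44.33, slope=-6.5, mean_x=2.33)
print(round(b0, 1))  # in thousands of dollars
```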

  39. Regression Equation • ŷ = 59.5 – 6.5x

  40. Forecast Salary If 3 Children • 59.5 – 6.5(3) = 40 • $40,000 = expected salary
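
The forecast is a direct substitution into the fitted line. A minimal sketch, with the hypothetical function name `predict_salary` and the intercept and slope taken from the slides:

```python
B0, B1 = 59.5, -6.5  # intercept and slope from the slides (thousands of dollars)

def predict_salary(children: int) -> float:
    """Expected salary, in thousands of dollars, for a given number of children."""
    return B0 + B1 * children

print(predict_salary(3))  # 40.0, i.e. $40,000
print(predict_salary(0))  # 59.5, the intercept
```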

  41. Standard Error of Estimate

  42. Standard Error of Estimate

  43. Standard Error of Estimate Actual salary typically $1,900 away from expected salary

  44. Coefficient of Determination • R2 = % of total variation in y that can be explained by variation in x • Measure of how closely the linear regression line fits the points in a scatter diagram • R2 = 1: max. possible value: perfect linear relationship between y and x (straight line) • R2 = 0: min. value: no linear relationship

  45. Sources of Variation (V) • Total V = Explained V + Unexplained V • SS = Sum of Squares = V • Total SS = Regression SS + Error SS • SST = SSR + SSE • SSR = Explained V, SSE = Unexplained

  46. Coefficient of Determination • R2 = SSR / SST • R2 = 197 / 200.5 = .98 • Interpretation: 98% of total variation in salary can be explained by variation in number of children
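
The sums of squares and R2 fit together as in this short check. The SSR and SST values are the ones on the slide; SSE is derived from the identity SST = SSR + SSE.

```python
SSR, SST = 197.0, 200.5  # regression and total sums of squares from the slides

SSE = SST - SSR          # unexplained variation, from SST = SSR + SSE
r_squared = SSR / SST    # share of total variation explained by x

print(SSE)
print(round(r_squared, 2))
```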

  47. 0 ≤ R2 ≤ 1 • 0: No linear relationship since SSR = 0 (explained variation = 0) • 1: Perfect linear relationship since SSR = SST (unexplained variation = SSE = 0), but does not prove cause and effect

  48. R=Correlation Coefficient • Case 1: slope (b1) < 0 • R < 0 • R is negative square root of coefficient of determination

  49. Our Example • Slope = b1 = -6.5 • R2 = .98 • R = -.99

  50. Case 2: Slope > 0 • R is positive square root of coefficient of determination • Ex: R2 = .49 • R = .70 • R has no interpretation • R overstates relationship
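
Both cases reduce to one rule for simple linear regression: R is the square root of R2, carrying the sign of the slope. A sketch, where the function name `correlation` is hypothetical and the positive slope of 1.0 in the second call is an assumed value (the slides give only R2 = .49 for that case):

```python
import math

def correlation(r_squared: float, slope: float) -> float:
    """R has magnitude sqrt(R^2) and the same sign as the slope b1."""
    return math.copysign(math.sqrt(r_squared), slope)

print(round(correlation(0.98, -6.5), 2))  # Case 1: the salary example
print(round(correlation(0.49, 1.0), 2))   # Case 2: ASSUMED positive slope
```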
