Looking at Data

# Looking at Data

Télécharger la présentation

## Looking at Data

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Looking at Data

2. Clinical Data Example • 1. Kline et al. (2002) • The researchers analyzed data from 934 emergency room patients with suspected pulmonary embolism (PE). Only about 1 in 5 actually had PE. The researchers wanted to know what clinical factors predicted PE. • I will use four variables from their dataset today: • Pulmonary embolism (yes/no) • Age (years) • Shock index = heart rate/systolic BP • Shock index categories = take shock index and divide it into 10 groups (lowest to highest shock index)

3. Descriptive Statistics

4. Types of Variables: Overview Categorical Quantitative binary nominal ordinal discrete continuous 2 categories + more categories + order matters + numerical + uninterrupted

5. Categorical Variables • Also known as “qualitative.” • Categories. • treatment groups • exposure groups • disease status

6. Categorical Variables • Dichotomous (binary)– two levels • Dead/alive • Treatment/placebo • Disease/no disease • Exposed/Unexposed • Heads/Tails • Pulmonary Embolism (yes/no) • Male/female

7. Categorical Variables • Nominal variables– Named categories Order doesn’t matter! • The blood type of a patient (O, A, B, AB) • Marital status • Occupation

8. Categorical Variables • Ordinal variable– Ordered categories. Order matters! • Staging in breast cancer as I, II, III, or IV • Birth order—1st, 2nd, 3rd, etc. • Letter grades (A, B, C, D, F) • Ratings on a scale from 1-5 • Ratings on: always; usually; many times; once in a while; almost never; never • Age in categories (10-20, 20-30, etc.) • Shock index categories (Kline et al.)

9. Quantitative Variables • Numerical variables; may be arithmetically manipulated. • Counts • Time • Age • Height

10. Quantitative Variables • Discrete Numbers– a limited set of distinct values, such as whole numbers. • Number of new AIDS cases in CA in a year (counts) • Years of school completed • The number of children in the family (cannot have a half a child!) • The number of deaths in a defined time period (cannot have a partial death!) • Roll of a die

11. Quantitative Variables • Continuous Variables - Can take on any number within a defined range. • Time-to-event (survival time) • Age • Blood pressure • Serum insulin • Speed of a car • Income • Shock index (Kline et al.)

12. Looking at Data • ü How are the data distributed? • Where is the center? • What is the range? • What’s the shape of the distribution (e.g., Gaussian, binomial, exponential, skewed)? • ü Are there “outliers”? • ü Are there data points that don’t make sense?

13. The first rule of statistics: USE COMMON SENSE!90% of the information is contained in the graph.

14. Frequency Plots (univariate) Categorical variables • Bar Chart Continuous variables • Box Plot • Histogram

15. Bar Chart • Used for categorical variables to show frequency or proportion in each category. • Translate the data from frequency tables into a pictorial representation…

16. Bar Chart for SI categories 200.0 183.3 166.7 150.0 133.3 116.7 Number of Patients 100.0 83.3 66.7 50.0 33.3 16.7 0.0 1 2 3 4 5 6 7 8 9 10 Shock Index Category Much easier to extract information from a bar chart than from a table!

17. Box plot and histograms: for continuous variables • To show the distribution (shape, center, range, variation) of continuous variables.

18. Box Plot: Shock Index 2.0 1.3 Q3 + 1.5IQR = .8+1.5(.25)=1.175 maximum (1.7) minimum (or Q1-1.5IQR) Outliers median (.66) Shock Index Units “whisker” 75th percentile (0.8) interquartile range (IQR) = .8-.55 = .25 0.7 25th percentile (0.55) 0.0 SI

19. Histogram of SI 25.0 16.7 Percent 8.3 0.0 0.0 0.7 1.3 2.0 SI Bins of size 0.1 Note the “right skew”

20. 100 bins (too much detail)

21. 2 bins (too little detail)

22. Box Plot: Shock Index 2.0 1.3 Shock Index Units 0.7 0.0 SI Also shows the “right skew”

23. maximum minimum median 75th percentile interquartile range 25th percentile Box Plot: Age 100.0 More symmetric 66.7 Years 33.3 0.0 AGE Variables

24. 14.0 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Histogram: Age Not skewed, but not bell-shaped either…

25. Some histograms from your class (n=24) Starting with politics…

26. Feelings about math and writing…

27. Optimism…

28. Diet…

29. Habits…

30. Measures of central tendency • Mean • Median • Mode

31. Central Tendency • Mean– the average; the balancing point calculation: the sum of values divided by the sample size In math shorthand:

32. Mean: example Some data: Age of participants: 17 19 21 22 23 23 23 38

33. 14.0 Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 Mean of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49 • 556.9546

34. 14.0 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 The balancing point Mean of age in Kline’s data

35. Mean of Pulmonary Embolism? (Binary variable?) 80.56% (750) 19.44% (181)

36. Mean • The mean is affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4 • Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

37. Central Tendency • Median– the exact middle value Calculation: • If there are an odd number of observations, find the middle value • If there are an even number of observations, find the middle two values and average them.

38. Median: example Some data: Age of participants: 17 19 21 22 23 23 23 38 Median = (22+23)/2 = 22.5

39. 14.0 Percent 9.3 4.7 0.0 0.0 33.3 66.7 100.0 AGE (Years) Median of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49

40. 14.0 50% of mass 50% of mass 9.3 Percent 4.7 0.0 0.0 33.3 66.7 100.0 Median of age in Kline’s data

41. Does PE have a median? • Yes, if you line up the 0’s and 1’s, the middle number is 0.

42. Median • The median is not affected by extreme values (outliers). 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3 • SSlide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

43. Central Tendency • Mode– the value that occurs most frequently

44. Mode: example Some data: Age of participants: 17 19 21 22 23 23 23 38 Mode = 23 (occurs 3 times)

45. Mode of age in Kline’s data • Means Section of AGE • Geometric Harmonic • Parameter Mean Median Mean Mean Sum Mode • Value 50.19334 49 46.66865 43.00606 46730 49

46. Mode of PE? • 0 appears more than 1, so 0 is the mode.

47. Measures of Variation/Dispersion • Range • Percentiles/quartiles • Interquartile range • Standard deviation/Variance