
Introduction to Statistics






  1. Introduction to Statistics • Dr Linda Morgan • Clinical Chemistry Division, School of Clinical Laboratory Sciences

  2. Outline • Types of data • Descriptive statistics • Estimates and confidence intervals • Hypothesis testing • Comparing groups • Relation between variables • Statistical aspects of study design • Pitfalls

  3. Types of data • Categorical data • Ordered categorical data • Numerical data: discrete or continuous

  4. Descriptive statistics: categorical variables • Graphical representation: bar diagram • Numbers and proportions in each category

  5. Descriptive statistics: continuous variables • Distributions: Gaussian, lognormal, or other non-Normal distributions that call for non-parametric methods • Central tendency: mean, median • Scatter: standard deviation, range, interquartile range

  6. Gaussian (normal) distribution

  7. Gaussian (normal) distribution • Central tendency • Mean $\bar{x} = \sum x / n$ • Scatter • Variance $s^2 = \sum (x - \bar{x})^2 / (n - 1)$ • Standard deviation $s = \sqrt{s^2}$ (the square root of the variance)
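The three formulas on this slide can be checked with a short Python sketch. The data values and the function names `mean`, `sample_variance` and `sample_sd` are illustrative assumptions, not part of the original slides. Note the n − 1 divisor, which gives the sample (not population) variance.

```python
import math

def mean(xs):
    # Mean = sum of the observations / number of observations
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Variance = sum of squared deviations from the mean / (n - 1)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    # Standard deviation = square root of the variance
    return math.sqrt(sample_variance(xs))

# Hypothetical illustrative data
values = [4.0, 5.0, 6.0, 7.0, 8.0]
print(mean(values), sample_variance(values), sample_sd(values))
```

Here the squared deviations are 4, 1, 0, 1 and 4, so the variance is 10 / 4 = 2.5.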

  8. Lognormal distribution

  9. Lognormal distribution

  10. Lognormal distribution • Mean $= \sum \log x / n$ • Geometric mean = antilog of that mean ($10^{\text{mean}}$) • Median • Rank the data in order • Median = the $(n+1)/2$th observation
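For skewed (lognormal) data, the geometric mean and median can be sketched as follows. The function names and the example values are hypothetical; log10 is used to match the slide's "antilog (10^mean)", and averaging the two middle ranks when n is even is an assumption about how a non-integer (n + 1)/2 position is handled.

```python
import math

def geometric_mean(xs):
    # Average the log10 values, then take the antilog (10 ** mean)
    mean_log = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** mean_log

def median(xs):
    # Rank the data; the median is the (n + 1)/2-th observation (1-based).
    # If (n + 1)/2 falls between two ranks (n even), average those two values.
    ranked = sorted(xs)
    position = (len(ranked) + 1) / 2
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    return (lower + upper) / 2

# Hypothetical right-skewed values
values = [1, 10, 100]
print(geometric_mean(values), median(values))
```

For these values the ordinary arithmetic mean would be 37, pulled upwards by the single large value, whereas the geometric mean and the median are both 10.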

  11. Variability • Variance $s^2 = \sum (x - \bar{x})^2 / (n - 1)$ • Standard deviation $s = \sqrt{s^2}$ • Range • Interquartile range
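Range and interquartile range can be sketched with Python's standard `statistics` module (Python 3.8+). Quartiles have several competing definitions; this sketch assumes the module's default `'exclusive'` method, so other packages may report slightly different quartiles for the same data. The data and function names are hypothetical.

```python
import statistics

def data_range(xs):
    # Range reported as the smallest and largest observation
    return min(xs), max(xs)

def interquartile_range(xs):
    # Lower and upper quartiles; statistics.quantiles(n=4) returns Q1, median, Q3
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q1, q3

# Hypothetical data
values = [2, 4, 4, 5, 7, 9, 10, 12]
print(data_range(values), interquartile_range(values))
```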

  12. Variability of the sample mean • The sample mean is an estimate of the population mean • The standard error of the mean (SEM) describes the distribution of the sample mean • Estimated SEM = SD/$\sqrt{n}$ • The distribution of the sample mean is approximately Normal provided n is large
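A quick simulation illustrates what the SEM describes: if many samples of size n are drawn, the spread (SD) of their means is close to SD/√n. The population mean, SD, sample size, number of repetitions and random seed below are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)
POP_MEAN, POP_SD, N = 5.0, 1.0, 25   # hypothetical population and sample size

# Draw 5000 samples of size N and record the mean of each
sample_means = [
    statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(5000)
]

observed_sd_of_means = statistics.stdev(sample_means)
predicted_sem = POP_SD / N ** 0.5    # SD / sqrt(n) = 1 / 5 = 0.2
print(round(observed_sd_of_means, 3), predicted_sem)
```

The observed spread of the sample means should land close to the predicted 0.2.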

  13. Standard error of the difference between two means • SEM = SD/$\sqrt{n}$ • Variance of the mean = SD$^2$/n • Variance of the difference between two independent sample means = sum of the variances of the two means = $\mathrm{SD}_1^2/n_1 + \mathrm{SD}_2^2/n_2$ • SE of difference between means = $\sqrt{\mathrm{SD}_1^2/n_1 + \mathrm{SD}_2^2/n_2}$
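The SE of a difference between two independent means follows directly from adding the two variances. This is a minimal sketch. The function name and the summary statistics in the example are hypothetical.

```python
import math

def se_difference_of_means(sd1, n1, sd2, n2):
    # Variance of each mean is SD**2 / n; for independent samples the
    # variances add, and the SE is the square root of that sum.
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical summary statistics for two independent groups
print(se_difference_of_means(sd1=1.2, n1=40, sd2=1.5, n2=50))
```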

  14. Variability of a sample proportion • Assume a Normal distribution when np and n(1 − p) are both > 5 • SE of a binomial proportion = $\sqrt{pq/n}$, where $q = 1 - p$
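The SE of a single proportion, together with the slide's rule of thumb for when the Normal approximation is acceptable, can be sketched as follows. The function name and the choice to raise an error when the rule fails are assumptions; the 70/250 example reuses the smoking data from slide 25.

```python
import math

def se_proportion(successes, n):
    p = successes / n
    q = 1 - p
    # Rule of thumb: rely on the Normal approximation only if n*p and n*q both exceed 5
    if n * p <= 5 or n * q <= 5:
        raise ValueError("n*p or n*(1-p) is not > 5; Normal approximation doubtful")
    return math.sqrt(p * q / n)

# 70 smokers among the 250 IHD patients (slide 25)
print(round(se_proportion(70, 250), 4))
```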

  15. Standard error of the difference between two proportions • $\mathrm{SE}(p_1 - p_2) = \sqrt{\mathrm{var}(p_1) + \mathrm{var}(p_2)} = \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$
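Applied to the smoking data from slide 25 (70/250 patients vs 30/250 controls), the formula can be sketched as below. The function name is hypothetical, and the two groups are assumed independent so that their variances add.

```python
import math

def se_difference_of_proportions(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    # var(p) = p*q/n for each group; variances of independent groups add
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Smokers: 70 of 250 IHD patients vs 30 of 250 healthy controls
print(round(se_difference_of_proportions(70, 250, 30, 250), 4))
```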

  16. Confidence intervals of means • 95% CI for the mean = sample mean ± 1.96 SEM • 95% CI for the difference between 2 means = $(\text{mean}_1 - \text{mean}_2) \pm 1.96 \times$ SE of difference
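As a sketch, the 95% CI for a mean can be computed from the differences D in the paired cholesterol example (slide 21). The multiplier 1.96 is the large-sample value on the slide. For a sample as small as 10, a t multiplier is commonly used instead (2.262 for 9 degrees of freedom, from standard t tables), so the function takes the multiplier as a parameter. The function name is hypothetical.

```python
import math
import statistics

def ci95_mean(xs, multiplier=1.96):
    # 95% CI = sample mean +/- multiplier * SEM, where SEM = SD / sqrt(n)
    m = statistics.mean(xs)
    sem = statistics.stdev(xs) / math.sqrt(len(xs))
    return m - multiplier * sem, m + multiplier * sem

# Differences D (predrug - postdrug) from the paired t test example
d = [2.3, 0.8, 2.1, -0.3, -0.4, 0.6, -0.2, 0.0, 1.9, -0.6]
print(ci95_mean(d))                    # Normal multiplier, as on the slide
print(ci95_mean(d, multiplier=2.262))  # t multiplier for 9 df
```

Both intervals include 0, which is consistent with the paired t test result on slide 23 (P between 0.1 and 0.2).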

  17. Confidence intervals of proportions • 95% CI for a proportion = $p \pm 1.96\sqrt{pq/n}$ • 95% CI for the difference between two proportions = $(p_1 - p_2) \pm 1.96 \times \mathrm{SE}(p_1 - p_2)$
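The two intervals on this slide (the simple Normal-approximation, or Wald, intervals) can be sketched using the smoking data from slide 25. The function names are hypothetical.

```python
import math

Z95 = 1.96  # Normal multiplier for a 95% interval

def ci95_proportion(x, n):
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - Z95 * se, p + Z95 * se

def ci95_difference(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - Z95 * se, diff + Z95 * se

print(ci95_proportion(70, 250))           # smokers among IHD patients
print(ci95_difference(70, 250, 30, 250))  # patients minus controls
```

The interval for the difference (roughly 0.09 to 0.23) excludes 0, in line with the chi squared result later in the slides.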

  18. Hypothesis testing • The null hypothesis • The alternative hypothesis • What is a P value?

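  19. Comparing 2 groups of continuous data • Normal distribution: paired or unpaired t test • Non-Normal distribution: transform the data, OR use a non-parametric test (Mann-Whitney-Wilcoxon rank-sum test for unpaired groups; Wilcoxon signed-rank test for paired data)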

  20. Paired t test We wish to compare the fasting blood cholesterol levels in 10 subjects before and after treatment with a new drug. What is the null hypothesis?

  21. Paired t test
      Subject   Fasting cholesterol       D
      number    Predrug   Postdrug   (predrug - postdrug)
      01        6.7       4.4          2.3
      02        7.8       7.0          0.8
      03        8.1       6.0          2.1
      04        5.5       5.8         -0.3
      05        8.6       9.0         -0.4
      06        6.7       6.1          0.6
      07        7.1       7.3         -0.2
      08        9.9       9.9          0
      09        8.2       6.3          1.9
      10        6.5       7.1         -0.6

  22. Paired t test • Calculate the mean and SEM of D • The null hypothesis is that the population mean of D is 0 • The test statistic is $t = (\text{mean}(D) - 0) / \mathrm{SEM}(D)$

  23. Paired t test • Mean = 0.62 • SEM = 0.351 • t = 1.766 • Degrees of freedom = n - 1 = 9 • From tables of t, 2-tailed probability (P) is between 0.1 and 0.2 • How would you interpret this?
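The calculation on slides 21 to 23 can be reproduced with the standard library. To stay dependency-free, this sketch does not compute an exact P value. Instead it compares t with two-tailed critical values for 9 df copied from standard t tables (1.383 for P = 0.20, 1.833 for P = 0.10, 2.262 for P = 0.05). The dictionary name `T_CRIT_9DF` is hypothetical.

```python
import math
import statistics

pre = [6.7, 7.8, 8.1, 5.5, 8.6, 6.7, 7.1, 9.9, 8.2, 6.5]
post = [4.4, 7.0, 6.0, 5.8, 9.0, 6.1, 7.3, 9.9, 6.3, 7.1]

# D = predrug - postdrug for each subject
d = [a - b for a, b in zip(pre, post)]
n = len(d)
mean_d = statistics.mean(d)
sem_d = statistics.stdev(d) / math.sqrt(n)
t = (mean_d - 0) / sem_d   # null hypothesis: population mean of D is 0
df = n - 1

# Two-tailed critical values of t with 9 df, from standard t tables
T_CRIT_9DF = {0.20: 1.383, 0.10: 1.833, 0.05: 2.262}

print(f"mean D = {mean_d:.2f}, SEM = {sem_d:.3f}, t = {t:.3f}, df = {df}")
if T_CRIT_9DF[0.20] < abs(t) < T_CRIT_9DF[0.10]:
    print("0.1 < P < 0.2 (two-tailed)")
```

Unrounded, t comes out at about 1.764. The slide's 1.766 results from dividing 0.62 by the rounded SEM of 0.351. Either way the conclusion (P between 0.1 and 0.2) is unchanged.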

  24. Comparing 2 groups of categorical data • In a study of the effect of smoking on the risk of developing ischaemic heart disease, 250 men with IHD and 250 age-matched healthy controls were asked about their current smoking habits. • What is the null hypothesis?

  25. Results • 70 of the 250 patients were smokers • 30 of the 250 healthy controls were smokers

  26. Calculate the expected value, E, for each cell: E = (row total × column total) / grand total

  27. Calculate (observed – expected) value, D

  28. Calculate D²/E

  29. Calculate the sum of D²/E: 8 + 8 + 2 + 2 = 20 • This is the test statistic, chi squared ($\chi^2$) • Compare with tables of chi squared with (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of rows and columns • In this case, a chi squared of 20 with 1 df has a P value of < 0.001 • How do you interpret this?
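The steps on slides 26 to 29 (expected values E, differences D = O − E, D²/E, and their sum) can be sketched for the smoking data as below. This is the uncorrected chi squared test, without Yates' continuity correction, matching the hand calculation. For 1 df the upper-tail P value equals erfc(√(χ²/2)), because a chi squared variable with 1 df is the square of a standard Normal variable. The variable names are illustrative.

```python
import math

# 2 x 2 table: rows = IHD patients, healthy controls; columns = smokers, non-smokers
observed = [[70, 180],
            [30, 220]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total  # E
        d = obs - expected                                      # D = O - E
        chi_squared += d ** 2 / expected                        # D**2 / E

df = (len(observed) - 1) * (len(observed[0]) - 1)
# For 1 df, P(chi-squared > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_squared / 2))
print(chi_squared, df, p_value)
```

The expected counts are 50 smokers and 200 non-smokers in each group, giving D²/E values of 8, 2, 8 and 2.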

  30. Statistical analysis using computer software: SPSS as an example

  31. Planning • Experimental design • Suitable controls • Database design

  32. Statistical power • The power of a study to detect an effect depends on: • The size of the effect • The sample size • The probability of failing to detect an effect where one exists is called β • The power of a study is 100(1 − β)% • Wide confidence intervals indicate low statistical power

  33. Statistical power • The necessary sample size to detect the effect of interest should be calculated in advance • Pilot data are usually required for these calculations

  34. Statistical power: example • 30% of the population are carriers of a genetic variant. You wish to test whether this variant increases the risk of Alzheimer's disease. • For P < 0.05 and 80% power, number of cases and of controls required:
      Control carriers   Case carriers   Sample size (each group)
      30%                50%               100
      30%                40%               350
      30%                35%              1400
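The slide does not say how these sample sizes were obtained. One common Normal-approximation formula for comparing two proportions (two-sided α, equal group sizes) gives figures of the same order, so this sketch assumes that formula. The function name `n_per_group` is hypothetical, and `statistics.NormalDist` requires Python 3.8+.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_case, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for comparing two proportions
    # (two-sided test at level alpha, equal numbers of cases and controls)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_case) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_case * (1 - p_case))) ** 2
    return math.ceil(numerator / (p_case - p_control) ** 2)

for case_rate in (0.50, 0.40, 0.35):
    print(f"30% vs {case_rate:.0%}: about {n_per_group(0.30, case_rate)} per group")
```

Under this assumption the three scenarios need about 93, 356 and 1,377 per group, close to the rounded figures in the table. Halving the difference in carrier frequency roughly quadruples the required sample size.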

  35. Multiple testing
      Number of tests   Probability of at least one false positive
       1                0.05
       2                0.10
       3                0.14
       4                0.19
       5                0.23
      10                0.40
      20                0.64
    • Bonferroni correction: divide 0.05 by the number of tests to give the P value required for hypothesis testing at the conventional level of statistical significance
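The table values match 1 − 0.95^k, the chance of at least one false positive among k tests when every null hypothesis is true and the tests are assumed independent. This sketch reproduces the table and the Bonferroni threshold. The function names are hypothetical.

```python
ALPHA = 0.05

def prob_at_least_one_false_positive(k, alpha=ALPHA):
    # Assumes k independent tests, each at level alpha, all null hypotheses true
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k, alpha=ALPHA):
    # Each individual test must reach P < alpha / k
    return alpha / k

for k in (1, 2, 3, 4, 5, 10, 20):
    print(k, round(prob_at_least_one_false_positive(k), 2), bonferroni_threshold(k))
```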

  36. Data trawling • Decide in advance which statistical tests are to be performed • Post hoc testing of subgroups should be viewed with caution • Multiple correlations should be avoided

  37. HELP! • “In house” support • Cripps Computing Centre • Trent Institute for Health Service Research • Practical Statistics for Medical Research, Douglas G. Altman
