Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Katheryne Downes, M.P.H., Statistical Data Analyst, Tampa General/USF College of Medicine

Lecture Outline • Part I: The Literature Review • Part II: Statistics • Part III: Sample Size Calculations

Part I: The Literature Review

Who’s done what? • Literature Review • Don’t want to duplicate efforts (or maybe you should?) • Can give ideas about how to (or how not to) conduct the study • Required for sample size calculations

Critical Review of Literature • How were patients selected/recruited? • What population are they attempting to generalize to? • Definition of intervention? • Definition of outcomes? • What was the sample size? • Sample size calculations vs. power analysis • What are the possible confounding variables? What was done to control for these variables? • Statistics? • Interpretation of findings and conclusions?

Kat’s Notes: The Lit Review • How big is the sample size? • Sample size or power calculations? • Randomization? (if applicable) • If you’re dealing with a clinical trial, randomization helps you get rid of many potential sources of bias • Description of Design, Groups, Treatments? • You need DETAILED descriptions of the design of the study, how the study groups were defined and details of the treatment (dosage, machines, devices, etc)

Kat’s Notes: The Lit Review • Confounding variables? • Does the author discuss/address possible confounding variables? (i.e. variables that might be distorting the relationship between the two variables of interest) Does the author control for (statistically) the possible confounding variables? • Statistical significance ≠ Clinical Significance • Read carefully and critically!

Be Careful… • REMEMBER: Just because a study is published does not necessarily mean that it is a good study or that it is without flaws. Also remember publication bias: studies that show non-significant findings are often NOT published (despite the fact that they are equally important).

BREAK

Part II: Statistics

Statistics in Literature: The Basics • Statistic • Confidence Intervals • (95% range ≈ mean +/- 2 SD) • Significance Values • (P-values)

Statistics in Literature: The Confidence Interval • Confidence Intervals • Estimation (Avg IQ = 100; 95% CI= 70-130) • Hypothesis Testing (Sample Avg IQ = 136, normal 95% CI = 70-130)

One Tail or Two? • One-tail: • We hypothesize Drug A is worse than Drug B • We Hypothesize Drug A is better than Drug B • Two-Tailed: • We hypothesize Drug A performs differently than Drug B (direction isn’t specified, more conservative test)

Confidence Intervals: FAQs Q: My Standard Deviation is larger than my mean- what did I do wrong?!?! A: Most likely, you didn’t do anything wrong. An SD that’s larger than the mean indicates one of two things: 1) a lot of variation in the dataset 2) a non-normal distribution Q: Why is my confidence interval SO wide (or narrow)? A: The width of the confidence interval is a reflection of its precision. If there’s a lot of variation in the dataset or if there’s a great deal of uncertainty in the estimate, your interval will be quite wide. The opposite is also true.
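As a rough illustration of the FAQ above, the sketch below computes a normal-approximation 95% confidence interval for a mean, using the standard error (SD divided by the square root of n). The function name `mean_confidence_interval` and the sample values are hypothetical, invented only for this example. For small samples a t-based critical value would usually replace the normal one.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SD / sqrt(n)."""
    n = len(values)
    centre = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    # z is about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical measurements, invented purely for illustration.
sample = [98, 102, 95, 110, 105, 99, 101, 97, 104, 100]
low, high = mean_confidence_interval(sample)
print(f"mean = {mean(sample):.1f}, 95% CI = {low:.1f} to {high:.1f}")
```

More variation in the data (a larger SD) or fewer observations (a smaller n) both widen the interval, which is the "precision" point made in the FAQ.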

Statistics in Literature: Significance Values • P-value: the probability of observing a result at least as extreme as yours if chance alone were at work (that is, if there were truly no effect). A p-value = .001 means that a result this extreme would be expected only about 1 time in 1000 by chance. Translation? Chance alone is an unlikely explanation for your observation; something probably intervened.

Quiz Time! Q: What is the 95% CI for the following data: mean=30, SD=5 ? A: 95% CI = 20 – 40 (mean +/- 2 SD) Q: For the previous question, if you obtained a sample mean =10, what would you conclude? A: Since 10 lies outside of the 95% CI, this event is unlikely to have occurred by chance alone. In fact, the chance of observing a value this extreme would be less than 5%. Q: How do you interpret a p-value = .05? A: If chance alone were at work, a result at least this extreme would be expected about 5% of the time.
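The first two quiz answers can be checked with a few lines of Python. This is a minimal sketch that assumes the data are normally distributed; it uses the exact multiplier (about 1.96) rather than the rounded "2 SD" rule, so the limits come out slightly inside 20 and 40.

```python
from statistics import NormalDist

# Quiz values: mean = 30, SD = 5 (normality is an assumption here).
dist = NormalDist(mu=30, sigma=5)

# Central 95% range of the distribution.
low = dist.inv_cdf(0.025)
high = dist.inv_cdf(0.975)
print(f"95% range: {low:.1f} to {high:.1f}")

# How unusual is an observed value of 10?
observed = 10
z = (observed - dist.mean) / dist.stdev
two_tailed_p = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.1f}, two-tailed p = {two_tailed_p:.6f}")
```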

How the Guinness Brewery Changed History… • “Student’s” t-test • William Sealy Gosset (left) • R.A. Fisher (right)

Understanding Statistics in Literature • Are the statistics appropriate? • What, exactly, does this really mean? • What does an odds ratio of 1.5 really mean? • Why am I looking for a “1” or a “0” in this confidence interval? • What does a significant ANOVA tell you? (for that matter, what’s an ANOVA!?!?!)

T-test/Z-test • What type of data? (2) Group Means (continuous) • Reported as? t-statistic/z-score & p-value • What does it REALLY test? The difference in group distributions- in particular- the difference in group means.

T-test/Z-test Continued… • T-tests are used when the sample size for each group is small and the population standard deviation has to be estimated from the sample • Z-tests utilize the normal distribution and can be used when the sample size is adequately large • Not appropriate for categorical data
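To make the z-test concrete, here is a minimal sketch that compares two group means from summary statistics and reports both a one-tailed and a two-tailed p-value. The function name `two_sample_z_test` and the surgery-time numbers are hypothetical. With small groups, a t-test (which needs the t-distribution, for example from a statistics package) would be the usual choice instead.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, one-tailed p for A > B, two-tailed p)."""
    # Standard error of the difference between the two means.
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    standard_normal = NormalDist()
    p_a_greater = 1 - standard_normal.cdf(z)      # direction fixed in advance
    p_two_tailed = 2 * standard_normal.cdf(-abs(z))  # direction not specified
    return z, p_a_greater, p_two_tailed


# Hypothetical surgery durations in hours for two groups of 70 people.
z, p_one, p_two = two_sample_z_test(5.2, 1.1, 70, 4.8, 1.0, 70)
print(f"z = {z:.2f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

When the observed difference falls in the hypothesized direction, the two-tailed p-value is twice the one-tailed value, which is why the two-tailed test is described as more conservative.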

ANOVA: Analysis of Variance • What type of Data? (3+) Means (continuous) • Reported as? F-Statistic, p-value • What does it REALLY test? It compares the distributions of several groups simultaneously- it examines whether the amount of variation between groups is greater than that of within groups. A significant F-statistic tells you that the groups are not all equal, but it does NOT tell you which groups are different.

ANOVA • Once a significant F-statistic is obtained, your next step would be to conduct a post-hoc test to determine which groups are different (Tukey). • Again, cannot be used for categorical data.
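The "between vs. within" comparison behind the F-statistic can be sketched in plain Python. The function name `one_way_anova_f` and the data are hypothetical. Turning F into a p-value needs the F-distribution, which is not in the standard library, so this sketch stops at the statistic and its degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for group in groups:
        group_mean = mean(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in group)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical surgery durations (hours) for three small groups.
interns = [1, 2, 3]
residents = [2, 3, 4]
fellows = [5, 6, 7]
f_stat, df_between, df_within = one_way_anova_f([interns, residents, fellows])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F says only that the group means are not all equal; a post-hoc test such as Tukey's is still needed to find which groups differ.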

Chi-Square • What type of data? Categorical/dichotomous • Reported as? χ², p-value • What does it REALLY test? A chi-square tests whether the observed frequency of an event is different from the expected frequency of the event (that which would occur by chance). • ***Chi-Square tests should ONLY be used when each expected cell count is greater than or equal to “5”
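A minimal sketch of the observed-vs-expected calculation for a 2x2 table follows. The function name `chi_square_2x2` and the counts are hypothetical, and no continuity (Yates) correction is applied. The p-value line relies on the fact that a chi-square with 1 degree of freedom is a squared standard normal, so it holds only for 2x2 tables.

```python
import math


def chi_square_2x2(table):
    """Return (chi-square, p-value, expected counts) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    # Expected count = row total * column total / grand total.
    expected = [[row_totals[i] * col_totals[j] / total for j in range(2)]
                for i in range(2)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

    # Upper-tail probability for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Hypothetical counts: rows = two treatments, columns = complication yes/no.
observed = [[10, 20], [20, 10]]
chi2, p_value, expected = chi_square_2x2(observed)
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
print(f"smallest expected count = {smallest_expected:.1f} (rule of thumb: >= 5)")
```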

Fisher Exact Test • Works in basically the same manner as a chi-square, but it’s used when you have cell counts below “5” • An “exact” test CAN be used when cell counts are “5” or higher, but it becomes difficult to calculate with large sample sizes
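For small tables the exact calculation is short enough to write out. The sketch below computes a two-sided Fisher exact p-value by summing the hypergeometric probabilities of every table that is no more likely than the observed one, with row and column totals held fixed. The function name `fisher_exact_two_sided` and the plate counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed_probability = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    # (the tiny tolerance guards against floating-point ties).
    return sum(table_probability(x)
               for x in range(smallest, largest + 1)
               if table_probability(x) <= observed_probability * (1 + 1e-9))


# Hypothetical counts: titanium (2 complications, 4 none),
# stainless steel (7 complications, 3 none).
p_value = fisher_exact_two_sided(2, 4, 7, 3)
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

The number of tables to enumerate grows with the margins, which is one reason the exact test becomes harder to calculate with large samples.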

OR, RR, HR • OR: Odds Ratio • RR: Relative Risk or Risk Ratio • HR: Hazard Ratio • All three are ratios of risk- one test group is reflected in the numerator, the other in the denominator- therefore, if you get a ratio = “1” that means there’s NO DIFFERENCE between groups. Keep this in mind while we look at them individually.

Odds Ratios • What type of data? Case/Control Studies • Reported as? OR, CI, p-value • What does it REALLY test? The amount of risk associated with a particular exposure. • ***An Odds Ratio must be used in case-control studies as the measure of risk because we have incomplete information about the prevalence/incidence of the disease in the calculations

OR: Interpretation • OR* <1: Exposure is Protective • OR*=1: No Difference • OR*>1: Exposure is Risk Factor • Example • OR, CI, and p-value • OR = 1 = NO DIFFERENCE • What would a CI containing “1” mean? (OR*: The same thing applies to RR and HR)

Relative Risk • What type of data? Cohort Studies • Reported as? RR, CI, p-value • What does it REALLY test? The amount of risk associated with a particular exposure. • ***Relative Risk can be safely used in cohort studies because we have incidence rates available.
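Both ratios come from the same 2x2 table, so a single sketch can show them side by side. The function name `odds_ratio_and_relative_risk` and the counts are hypothetical. The confidence intervals use the common log-scale normal approximation, which is one of several available methods and assumes no cell is zero.

```python
import math
from statistics import NormalDist


def odds_ratio_and_relative_risk(a, b, c, d, confidence=0.95):
    """a, b = exposed with/without outcome; c, d = unexposed with/without.

    Returns two (estimate, lower, upper) tuples: one for OR, one for RR.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    relative_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    def interval(estimate, se_log):
        # Build the interval on the log scale, then transform back.
        log_estimate = math.log(estimate)
        return (estimate,
                math.exp(log_estimate - z * se_log),
                math.exp(log_estimate + z * se_log))

    return interval(odds_ratio, se_log_or), interval(relative_risk, se_log_rr)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed had the outcome.
or_result, rr_result = odds_ratio_and_relative_risk(20, 80, 10, 90)
for label, (estimate, low, high) in (("OR", or_result), ("RR", rr_result)):
    verdict = "includes 1" if low <= 1 <= high else "excludes 1"
    print(f"{label} = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f} ({verdict})")
```

An interval that includes 1 is compatible with "no difference" between the groups, which is why readers look for a "1" inside the CI of a ratio.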

Quiz Time! Q: You’re conducting a study examining the complication rate (yes/no) in relationship to type of plate utilized in surgery (titanium/stainless steel). • What type of data is this? Categorical or Continuous? • Let’s say that there are 4 people with titanium plates that didn’t have complications- which test would you have to use? A: Categorical data, Fisher Exact Test

Quiz Time! Q: You’re conducting a study on the average number of hours a surgery takes to complete. You have 3 groups (70 people in each): interns, residents, and fellows. What’s the appropriate statistic to use to determine whether a difference exists between these groups? A. Chi-Square B. Fisher Exact Test C. T-test/Z-test D. ANOVA E. Odds Ratio

Quiz Time! Q: The t-distribution/test was created to test the brew quality of which of the following beers: • Budweiser • Coors • Presidente • Guinness • Samuel Adams • Miller *Bonus Point: Name the country of origin of Presidente

Kat’s Notes: Statistics • Confidence Intervals • 95% range ≈ mean +/- 2 SD • Estimation • Hypothesis Testing • P-value • Probability of observing a result at least this extreme by chance alone

Kat’s Notes: Statistics • T-Test/Z-Test • Used for testing 2 group means. • ANOVA • Used for testing 3+ group means. Tells you that a difference exists, but doesn’t tell you which groups are different. • Chi-Square • Used for categorical data (yes/no; male/female). Tells you whether observed matches expected outcomes. Every expected cell count should be “5” or greater.

Kat’s Notes: Statistics • Fisher Exact • Also used for categorical data. Necessary when any cell count is below “5” • Odds Ratio • Used for comparing the odds of an event between two groups with categorical data. Needed to approximate RR in case-control studies • Relative Risk • Used for comparing risk in two groups with categorical data (sick/not sick; male/female). Can be used in cohort studies where incidence/prevalence data are available.

BREAK

Part III: Sample Size

Why does it matter? Why are sample size calculations so important? *A sample size calculation allows us to determine how many people we need to detect a difference if one exists…

Why does it matter? • Significant difference. -You might have been able to use a smaller sample size… • Not Significant. -You don’t know whether your lack of significance was due to low power or the fact that no difference really exists… So, What happens if you don’t do sample size calculations?

Sample Size Calculations vs. Power Analysis • Sample Size Calculations: • Completed prior to gathering data • Tells you how many people you need to investigate your phenomenon of interest • Power Analysis (post hoc) • Completed after all data have been collected and analyzed • Determines whether you had adequate power to find a significant difference

Sample Size Calculations • Depends on what test you’re planning on conducting, but, in general… • Expected value in your control (mean, proportion, etc) • Expected differences Large or small? • Amount of variation known to exist (SDs, etc) • Heterogeneous vs. homogeneous

Sample Size Calculations: t-tests From Literature/pilot study • Standard deviation • Expected difference (based off experience, previous research or other evidence) Remember: select your numbers from a well-designed study. Be Careful!!
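The two inputs listed above (standard deviation and expected difference) plug into a widely used normal-approximation formula for comparing two means: n per group = 2 × (z for alpha + z for power)² × SD² / difference². The sketch below is one way to compute it. The function name `sample_size_two_means`, the defaults (two-sided alpha of 0.05, 80% power) and the example numbers are assumptions, and dedicated software using the t-distribution will typically return a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sd, difference, alpha=0.05, power=0.80):
    """Approximate number of subjects needed per group (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up to whole people


# Hypothetical inputs: SD of 10 from a pilot study, expected difference of 5.
print(sample_size_two_means(sd=10, difference=5))
# Halving the expected difference roughly quadruples the required sample size.
print(sample_size_two_means(sd=10, difference=2.5))
```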

Sample Size Calculations: Proportions Test From the literature/pilot study: • Proportion of observed events in the control group • Anticipated proportion of observed events in the active group (based off previous trends)
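For two proportions, a comparable normal-approximation formula is n per group = (z for alpha + z for power)² × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)². The following sketch applies it. The function name `sample_size_two_proportions`, the defaults and the example proportions are hypothetical, and other formulas (pooled variance, continuity-corrected) give somewhat different answers.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_active, alpha=0.05, power=0.80):
    """Approximate number of subjects needed per group (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two groups.
    variance = p_control * (1 - p_control) + p_active * (1 - p_active)
    n = (z_alpha + z_power) ** 2 * variance / (p_control - p_active) ** 2
    return math.ceil(n)  # round up to whole people


# Hypothetical inputs: 20% event rate in controls, 10% anticipated with treatment.
print(sample_size_two_proportions(p_control=0.20, p_active=0.10))
```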

Kat’s Notes: Sample Size Calculations • Sample Size Calculations are much more desirable than power analysis • Obtain information from well-conducted studies- Remember: GIGO (garbage in, garbage out) Don’t pick out your numbers from a bad study! • You generally need the 1) average value and 2) amount of variation in your control (comparison group)

REMEMBER! • No matter what- if you find a significant result, there’s still a small possibility that you’re WRONG. This is inherent in probability- we don’t have 100% certainty. We can only attempt to minimize the possible problems. • If you fail to find a significant result- it doesn’t necessarily mean that there isn’t a relationship there. The study might have been structured incorrectly, used the wrong statistics, the wrong model, the relationship might not be the form that you think it is (linear regression on curvilinear data), or there might be another variable interfering that you don’t know about…

QUESTIONS?

On-Site Biostatistics: The Take-Home Menu • Clinical Trial Design • Database Design • Sample Size Calculations • Randomization Schemes • Data Analysis • Instruction • IRB Statistical Review • Publication consultation

Thank you!
