Biostatistics Lunch & Learn Series: Sample size and study power: Why do I need so many subjects? What will my biostatistician need to know and how can I get that information? Southern California Clinical and Translational Science Institute: Research Development and Team Science; Biostatistics, Epidemiology and Research Design (BERD). January 14, 2019
Biostatistics, Epidemiology and Research Design (BERD) Faculty: Wendy Mack, BERD Director; Christianne Lane, USC; Melissa Wilson, USC; Cheryl Vigen, USC; Ji Hoon Ryoo, CHLA. Staff: Choo Phei Wei, CHLA; Caron Park, USC; Melissa Mert, USC; Angeli Bernardo, USC
Objectives • Part 1: Formulating a sound research question and study hypotheses: hypothesis testing • Part 2: Study designs and data collection strategies: scientific and logistical considerations in selecting the design to address your research question • Part 3 (today): Sample size and study power: Why do I need so many subjects? What will my biostatistician need to know and how can I get that information? • Part 4: Statistical analysis: What statistical methods are appropriate for my study design and data collected?
Reminder: Defining the Research Question and Hypothesis Testing • What are the components of a good research question? • How do I translate my research question to a statistical question (and hypothesis) that I can test? • What is statistical hypothesis testing? What does a p-value mean? • How does the research question relate to study design? What alternative designs might be used to address my research question? (Today)
PICOT Criteria to Develop the Research Question • P Population: What specific population will you test the intervention in? • I Intervention (or Exposure): What is the intervention/exposure to be investigated? Intervention (clinical trial); Exposure (observational study) • C Comparison Group: What is the main comparator to judge the effect of the intervention? • O Outcome: What will you measure, improve, affect? • T Time: Over what time period will the outcome be assessed?
Spectrum of Study Designs From Center for Evidence-Based Medicine (CEBM), University of Oxford http://www.cebm.net/study-designs/
Estimating sample size for your study • What data do you need to estimate sample size? • How do you get the data you need? • Implications for trial feasibility • Resources for sample size estimation
Why do we care about sample size? • How bad can it be to have an incorrect sample size? • IF TOO SMALL: reliability, validity, and generalizability problems • Conclusions based on only a few patients or participants • Poor-quality research (unethical) • Type II error can occur (failing to reject H0 when HA is true) • IF TOO LARGE: feasibility problems • Difficult to perform • Costs more • Delays completion • Ethical considerations: exposing more persons than needed to risks
Why do we care about sample size? • Grant reviewers expect to see a well-considered sample size estimation, which should address questions like: • What is the outcome upon which I am basing sample size estimation? • What is that outcome like (mean, SD, %, etc.) in a comparator group? • How different do I expect my intervention/exposed group to be from my comparator group? Does my preliminary data or other data support this expected difference? • What are my type I and type II errors? (Remind ourselves, what are these?) • Do I anticipate subject dropoff/dropout? If yes, how much? • Do I anticipate other problems with adherence to my intervention? If yes, how much? • Given the above, what is my required sample size? • Can I feasibly achieve that sample size (availability of a sufficient number of persons, retention)?
Why do we care about sample size? • Alternatively, if your sample size is pre-determined (e.g., you have a particular study population available to you), grant reviewers expect to see a well-considered POWER estimation, that should include things like: • What is the outcome like (mean, SD, %, etc.) in a comparator group? • What is my sample size? • What is my type I error? • How different do I expect my intervention/exposed group to be from my comparator group? Does my preliminary data or other data support this expected difference? • Do I anticipate subject dropoff/dropout? If yes, how much? • Do I anticipate other problems with adherence to my intervention? If yes, how much? • Given the above, what is my statistical power (1 – type II error) to detect the group differences I expect to see?
Why do we care about sample size? • Alternatively, if your sample size is pre-determined (e.g., you have a particular study population available to you), grant reviewers expect to see a well-considered POWER estimation, that should include things like: • What is the outcome like (mean, SD, %, etc.) in a comparator group? • What is my sample size? • What is my type I error? • Do I anticipate subject dropoff/dropout? If yes, how much? • Do I anticipate other problems with adherence to my intervention? If yes, how much? • Given the above, what is the power to detect different magnitudes of group differences? • Are these differences reasonably expected (from my preliminary data or other literature)?
Why do we care about sample size? • What does NOT work on grant review • This is the number I usually use, and I have found “significant” results • This is the number I can afford • This is the number of patients I have available • Other studies use similar numbers and have found “significant” results
Sample size/power estimation is based on EFFECT SIZE for a specific outcome • Effect size (ES) is a measure of the size of your group difference, association, etc. • Effect sizes are a function of • The magnitude of the intervention effect (e.g., the group difference) • An estimate of the variability • Think of ES as a "signal-to-noise" measure: a larger effect size means a larger signal (e.g., group difference) and lower noise (less variability, smaller SD).
Sample size/power estimation is based on EFFECT SIZE for a specific outcome • Large effect sizes require fewer subjects • Small effect sizes can require a LOT of subjects • Effect sizes must be clinically relevant as well as statistically valid. We can always estimate a sample size required to statistically detect a given effect size (however small), and we may even be able to recruit this sample size into our study. However, the differences we are detecting may not be at all clinically relevant.
Effect size measures depend on the type of outcome variable and study design • Continuous measures: group differencesES = (Mean Group 1 – Mean Group 2) / Pooled SD • Continuous measures: associations (correlations)ES = correlation coefficient, regression coefficient
Effect size measures depend on the type of outcome variable and study design • Proportion measures: group differencesES = Proportion Group 1 – Proportion Group 2 • There are MANY other types of effect sizes, depending on the type of outcome and study design
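The two effect size measures above are simple to compute once the pieces are in hand. Below is a minimal Python sketch of both formulas; the numeric values in the example are made-up illustration values, not data from any study.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def proportion_difference(p1, p2):
    """Simple effect size for two proportions: p1 - p2."""
    return p1 - p2

# Hypothetical pilot values, for illustration only
print(cohens_d(mean1=120, mean2=112, sd1=15, sd2=16, n1=20, n2=20))  # ~0.5
print(proportion_difference(0.40, 0.25))                             # 0.15
```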
Effect size: A common problem when investigators approach statisticians “What sample size do I need to detect a 10% difference in mean blood pressure in my two groups?” What is missing in terms of what is needed to compute effect size? a. What is the mean difference in BP I want to detect? (Numerator of ES) b. What is the SD of BP in this population? (Denominator of ES)
Effect size: A common problem when investigators approach statisticians “What sample size do I need to detect a 10% difference in mean blood pressure in my two groups (e.g., standard treatment vs new treatment)?” What is the mean difference in BP? If investigator can give mean BP in comparator group (standard treatment), we can then compute the 10% difference. What is the SD of BP in this population? Need to get this from other studies or investigator’s clinical population.
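To make the blood pressure example concrete, here is a hedged sketch of how the two missing pieces turn "a 10% difference" into an effect size and a sample size. The comparator mean (150 mmHg) and SD (20 mmHg) are assumed values for illustration only; statsmodels is used here simply as one readily available power routine.

```python
from statsmodels.stats.power import TTestIndPower

comparator_mean = 150.0   # assumed mean BP on standard treatment (mmHg)
sd = 20.0                 # assumed SD of BP in this population (mmHg)

mean_difference = 0.10 * comparator_mean   # "10% difference" = 15 mmHg (numerator of ES)
effect_size = mean_difference / sd         # standardized effect size = 0.75

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 29 per group under these assumed values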
Where do you get that effect size data? • Go to the literature • Similar studies: what can you extrapolate? Mean differences, SDs, proportions, by group • Unfortunately, such data are not always available
Where do you get that effect size data? • Do a small pilot study • Determine the variability (SD) of the data • NIH R21, R34 funds pilot studies to obtain such data
Where do you get that effect size data? • What if you still don't have all the information you need? • Compute power/sample size for a range of plausible effect sizes. Plausible: 1) effects found for other interventions in the same population; 2) what could be considered "clinically relevant"
Pulling from the literature: This is useful! We get the means and SDs by group
Pulling from the literature: Not so useful! The means are here, but no SDs are provided. We need to search more in the article, or use SDs from another source.
Pulling from the literature: If this is all you got… Problem: Have to “guess” at means, etc. What are those error lines? SD? SE?
Pulling from the literature: SD or SE? 1. Be clear on whether the measure of variability is the SD or the SE (not all papers clearly specify which they are reporting). 2. Since SE = SD / √n, one can simply calculate the SD as SD = SE × √n, given the SE and n (sample size). 3. If one mistakenly uses the SE in the sample size calculation, the measure of variability will be TOO small, the effect size TOO large, and the sample size severely underestimated!
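A small sketch of the SE-to-SD conversion and why the mix-up matters; the reported SE (2.0), n (25), and target difference (5.0) are made-up illustration values.

```python
import math
from statsmodels.stats.power import TTestIndPower

n_reported = 25        # sample size behind a published "mean ± SE" (assumed)
se_reported = 2.0      # reported standard error (assumed)
sd = se_reported * math.sqrt(n_reported)   # SD = SE * sqrt(n) = 10.0

mean_difference = 5.0  # group difference we want to detect (assumed)
correct_d = mean_difference / sd            # 0.50
mistaken_d = mean_difference / se_reported  # 2.50 if SE is used in place of SD

n_correct = TTestIndPower().solve_power(effect_size=correct_d, alpha=0.05, power=0.80)
print(round(n_correct))  # ~64 per group; since n scales with 1/d^2, using the SE
                         # instead of the SD would shrink this roughly 25-fold
```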
Sample size also depends on the size of Type I and Type II errors • Objective: make an inference about a population, based on information contained in a sample
Hypothesis Testing: Type I and Type II Errors • How wrong are you willing to be? • Type I, or α, error: the probability that your conclusion to reject H0 was wrong (false positive); the percentage of the time you will reject the null hypothesis and conclude "Something is going on!" when there really is nothing going on. Standard type I error: α = 0.05 (we will be wrong 5% of the time when we decide to reject H0). • Use a smaller type I error for things like testing of multiple outcomes (to account for multiple hypothesis testing)
Hypothesis Testing: Type I and Type II Errors • Type II, or β, error: the probability that your conclusion to not reject H0 was wrong (false negative); the percentage of the time you will fail to reject the null hypothesis and conclude "Nothing is going on!" when there really is something going on. Power (%) = 100 × (1 − β). Standard type II error: β = 0.20 (80% power). • Use a smaller type II error (higher power) to be conservative (e.g., trials often use 90% power).
Relationships to Sample Size and Power • Lower type I or type II error: larger sample size • Smaller effect size (signal-to-noise): larger sample size • For a given sample size: • Smaller effect size: lower power to detect it
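These relationships can be checked directly with any power routine. Below is a hedged sketch for a two-sample t-test using statsmodels; the effect size of 0.5 is an assumed value chosen only to show how n moves with α and with power.

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
d = 0.5  # assumed standardized effect size

# Lower (stricter) type I error -> larger sample size
for alpha in (0.05, 0.01):
    n = power_calc.solve_power(effect_size=d, alpha=alpha, power=0.80)
    print(f"alpha={alpha}: n per group = {round(n)}")   # ~64 vs ~95

# Lower type II error (higher power) -> larger sample size
for target_power in (0.80, 0.90):
    n = power_calc.solve_power(effect_size=d, alpha=0.05, power=target_power)
    print(f"power={target_power}: n per group = {round(n)}")  # ~64 vs ~86
```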
Sample Size as a Function of Effect Size Effect Size: Continuous Measure (mean difference per SD)
And for a large effect size…. Effect Size: Continuous Measure (mean difference per SD)
Power as a function of effect size: Fixed sample size Effect Size: Continuous Measure (mean difference per SD)
And for a small sample size (n=5 per group) Effect Size: Continuous Measure (mean difference per SD)
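The curves on the preceding slides can be reproduced in outline with the same routine. This is a sketch under assumed settings (two-sample t-test, two-sided α = 0.05); the effect sizes are illustrative values, not study data.

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# Required n per group shrinks quickly as the effect size grows (alpha=0.05, power=0.80)
for d in (0.2, 0.5, 0.8, 1.2):
    n = power_calc.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d={d}: n per group = {round(n)}")   # roughly 394, 64, 26, 12

# With only n=5 per group, power stays low unless the effect size is very large
for d in (0.5, 1.0, 1.5, 2.0):
    p = power_calc.power(effect_size=d, nobs1=5, alpha=0.05)
    print(f"d={d}: power = {p:.2f}")
```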
Repeated Measures (Longitudinal studies) • In computing sample size or power, we need a couple of other pieces of information • How many repeated measures are planned per subject? In general, more repeated measures require a smaller sample size, or provide higher power. • What is the within-subject correlation (intraclass correlation) of the repeated measures? Higher within-subject correlations give less independent information; lower within-subject correlations give more independent information
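One common back-of-the-envelope adjustment, sketched below, assumes compound symmetry (a constant within-subject correlation) and an analysis that compares group means of each subject's average over the repeated measures; the m, ICC, and effect size values are illustrative assumptions, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

def n_subjects_repeated(effect_size, m, icc, alpha=0.05, power=0.80):
    """Approximate subjects per group when each subject contributes m repeated
    measures and the analysis compares group means of the subject averages.
    Under compound symmetry the variance of a subject's mean is
    sigma^2 * (1 + (m - 1) * icc) / m, so the single-measure n is scaled
    by that same factor."""
    n_single = TTestIndPower().solve_power(effect_size=effect_size,
                                           alpha=alpha, power=power)
    design_factor = (1 + (m - 1) * icc) / m
    return n_single * design_factor

# Illustrative assumptions: d = 0.5, 4 repeated measures per subject
print(round(n_subjects_repeated(effect_size=0.5, m=4, icc=0.3)))  # fewer subjects than with one measure
print(round(n_subjects_repeated(effect_size=0.5, m=4, icc=0.8)))  # higher ICC -> less gain from repetition
```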
Other considerations in estimating sample size • What level of dropout is expected? Bump up recruitment so the required sample size is obtained. E.g., if you anticipate 10% dropout, recruit n/0.90 subjects to obtain n outcomes. • What level of adherence is expected? You may need to bump up the sample size if you want to analyze adherent subjects only.
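A minimal sketch of the dropout inflation described above; the 10% dropout rate and the required sample sizes are example values only.

```python
import math

def n_to_recruit(n_required, expected_dropout):
    """Inflate the analyzable sample size for anticipated dropout:
    recruit n / (1 - dropout) so that n completers are expected."""
    return math.ceil(n_required / (1 - expected_dropout))

print(n_to_recruit(64, 0.10))   # recruit 72 per group to end with ~64 completers
print(n_to_recruit(100, 0.10))  # the slide's example: 100 / 0.90 -> 112
```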
Sample size programs • We strongly recommend speaking with a statistician before finalizing your sample size or power calculations. However, here are some programs that can help you get estimates for feasibility: • G*Power (free) • PS (free) • Quanto (free) • High-end sample size software: nQuery, PASS • There are also sample size and power routines within statistical packages (SAS, Stata, etc.)
Next workshop on Mar 11: Statistical analysis: What statistical methods are appropriate for my study design and data collected? SC CTSI | www.sc-ctsi.org
CTSI Biostatistics (BERD): a resource for you at USC • Biostatisticians to help you with study design, sample size estimation, data management plan, statistical analyses, and summarizations of your methods and results • Recharge center • To request a consult:http://sc-ctsi.org/bbr-consult