Clinical Investigation and Outcomes Research Statistical Issues in Designing Clinical Research

Marcia A. Testa, MPH, PhD, Department of Biostatistics, Harvard School of Public Health

Objective of Presentation • Introduce statistical issues that are critical for designing a clinical research study and developing a research protocol, with a special focus on • Power and sample size • Readings: Textbook, Designing Clinical Research, Chapter 6, Estimating Sample Size and Power: Applications and Examples and Chapter 19, Writing and Funding a Research Proposal.

Research Proposal • Carefully planning the analytical and statistical methods is critical to any clinical research study. • An outline of the main elements of a research proposal is given in Table 19.1 of your textbook. • Two very important components of the “Research Methods” section are “Measurements” and “Statistical Issues”.

Measurement and Statistical Components of the Research Proposal • Measurements – you first must define: • Main predictor/independent variables (intervention, if an experiment) • Potential confounding variables • Outcome/dependent variables • Statistical Issues – you should outline: • Approach to statistical analyses • Hypothesis, sample size and power

Power and Sample Size • Depends upon: • measurements and study hypotheses • statistical test used on primary outcome • study design • variability and precision of the dependent measure • alpha (type 1 error) • effect size • number of hypotheses that you want to test

Types of Errors Confidence

What is power analysis? • Statistical power: • the probability of correctly identifying a trend or effect (Being correct that there is a trend or effect) • Statistical confidence: • the probability of not identifying a false trend or effect (false alarm) (Being correct that there is no trend)

Why is power analysis useful in research planning? • Clinical research is primarily concerned with detecting improvements or worsening due to interventions or risk factors. • Power analysis answers the question: “How likely is my statistical test to detect important clinical effects given my research design?”

Elements of power analysis • Variability (stochastic noise in the data) • Sample Size (accumulated information) • time horizon (e.g., survival analysis) • sampling frequency • replication • Confidence level/statistical test [Slide labels: “Beyond our control”, “Within our control”]

Dealing with Variability • Variability is often a barrier to detection • Minimizing variability is often the goal • Choose variables with a high signal-to-noise ratio • Caution: these variables may be less sensitive to change • Sample within a more homogeneous population • Caution: greater homogeneity often limits the inferences we can make. At the extreme, we would have highly reliable results that are for the most part clinically irrelevant.

The Balancing of Cost and Power [Figure: power curve, power from 0 to 100 plotted against cost (low cost to high cost), with regions labeled “low return on investment”, “optimal use of resources”, and “effective but inefficient use of resources”]

Limitations of power analysis • Power analysis is only as good as the information you provide: • How appropriate is the statistical test? • How accurate are estimates of variability? • Power analysis can’t tell you: • How much power is enough? • What’s a meaningful change?

How much power is enough? • There is no universal standard • What is more important? • Not missing a trend? • Power > Confidence • Reporting a false trend? • Confidence > Power • Usual range for confidence and power: 80-95%

What’s a meaningful change? Example: You want to be able to detect the withdrawal (decline in participation) from a diet and exercise program under “usual care”. • Power = 95% for a decline of -17% (effect size)

What’s a meaningful change? • Power = 80% for a decline of -13% (effect size)

What’s a meaningful change? • Power = 60% for a decline of -10% (effect size)

Is a 17% annual withdrawal rate clinically meaningful? • Example – Start with 100 patients • 17% withdrawal after one year • After 5 years, more than 50% of your original population has withdrawn from the program
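The arithmetic behind this example can be sketched as follows. This is a minimal illustration that assumes the 17% rate applies each year to the patients still enrolled (compounding); the slide does not state this, and the function name `remaining_after` is hypothetical.

```python
def remaining_after(start, annual_withdrawal, years):
    """Patients still enrolled after `years`, assuming a constant
    annual withdrawal rate applied to those remaining (an assumption)."""
    return start * (1.0 - annual_withdrawal) ** years

# Start with 100 patients and a 17% annual withdrawal rate, as on the slide.
for year in range(6):
    left = remaining_after(100, 0.17, year)
    print(f"year {year}: {left:5.1f} remaining, {100 - left:5.1f} withdrawn")
```

Under this assumption roughly 39 of the original 100 patients remain after 5 years, which is consistent with the slide's statement that more than 50% have withdrawn.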

What is a meaningful change? • Most people would concur that a withdrawal of 17% per year from a diet and exercise program is large enough to be considered clinically meaningful. • However, how meaningful are smaller withdrawal rates (13%, 10%, 5%, 1%)? • This cannot be answered using a formula. • The answer will depend on the research objectives, the clinical objectives, and the research budget.

1. Choose Statistical Hypothesis • Set up Null Hypotheses: Examples 1. Compare sample group mean to a known value $\mu_0$ • Mean of group = Known population mean ($H_0: \mu = \mu_0$) vs ($H_A: \mu \neq \mu_0$) 2. Compare two sample group means • Mean Group (1) = Mean Group (2) ($H_0: \mu_1 = \mu_2$) vs ($H_A: \mu_1 \neq \mu_2$) Note – because you are testing “not equal” in the alternative hypothesis ($\neq$) you have selected a “two-tailed test”.

2. Choose Statistical Test • There are many statistical tests that are used in clinical research; however, for this presentation we will restrict ourselves to the following:

3. Choose Alpha Level and Effect Size • Alpha = 0.05 – probability of rejecting the null when the null is true = 5% • You will conclude that there was a difference 5% of the time when there really was no difference • You would like to detect a difference of X units or higher (effect size) in one group as compared to the other

4. Need SD of the Dependent Variable • Use historical data if available • Use the sample data from a feasibility study (e.g. 15 subjects) • If you have no data to serve as a reference, you have to make an educated guess. Here’s a trick if your data are mound shaped and approximately normal. • Choose a representative low and high from your clinical experience, take the difference and divide by 4. • SD estimate = ((high) – (low))/4
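A small sketch of the range-divided-by-4 trick. The rule works because about 95% of a normal distribution lies within 2 SD of the mean, so a representative low-to-high range spans roughly 4 SD. The function name `sd_from_range` and the low and high values of 90 and 230 are hypothetical illustrations, not values from the study.

```python
from statistics import NormalDist

def sd_from_range(low, high):
    """Rough SD guess: a representative range covers about 4 SD."""
    return (high - low) / 4.0

# Hypothetical "representative low and high" values, for illustration only.
print(sd_from_range(90, 230))

# Share of a normal distribution that lies within +/- 2 SD of the mean.
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(round(coverage, 3))
```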

5. Calculate a Standardized Effect Size • Effect size/standard deviation = standardized effect size • Choose the $\beta$ (type 2) error • Remember Power = $1 - \beta$, so a type 2 error of 0.20 yields a power of 0.80 • $\beta$ is the probability of failing to reject the null hypothesis when the null hypothesis is false (concluding no difference when there really is a difference); power is the probability of correctly rejecting it.

Power and Sample Size Example Continuous Glucose Monitoring Diabetes Study

CGM Study Two-group Comparison • How many subjects do we need to be able to detect a difference in CGM mean daily glucose between patients on Lantus and Apidra insulin versus Premix analogue insulin? • Before you can answer this question, you must gather some more information.

Break down the problem • CGM glucose at Week 12 = dependent variable of interest • Want to compare two groups – each group has different patients • Simple independent t-test • Need SD of daily glucose • Need to specify how large an effect you want to detect

Data from feasibility study Week 12 Data

CGM Study Two-group Comparison • Compare Lantus & Apidra to Premix at 12 weeks • Feasibility data available on 15 patients • Independent t test will be used • Alpha = 0.05, beta = 0.20, 2-tailed test • Power = 0.80 • Null: Mean L & A = Mean Premix ($H_0: \mu_1 = \mu_2$) vs ($H_A: \mu_1 \neq \mu_2$)

CGM Study Two-group Comparison • SD from 15 patient feasibility study = 33

Estimating Sample Size of CGM Study • Alpha = 0.05 for a 2-sided test (0.025 for the corresponding 1-sided test) • Beta = 0.20, hence, power = 0.80 • Clinically meaningful effect = 10 mg/dL difference (based upon clinical judgement) • SD CGM glucose = 33 mg/dL (from feasibility study) • Standardized effect = 10/33 ≈ 0.30 • Check Appendix 6A in textbook for the sample size • Table 6A says you need 176 subjects per treatment group for a total of 352 subjects.
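The table lookup above can be approximated with the standard normal-approximation formula n = 2(z_{1-alpha/2} + z_{1-beta})^2 / ES^2 per group. This is a sketch, not the method used to build Table 6A: the textbook table is based on the t distribution, so its values run slightly higher. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means,
    using the normal approximation (not the exact t-based calculation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for two-sided alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Standardized effect from the slide: 10 mg/dL / SD of 33, rounded to 0.30.
print(n_per_group(0.30))
```

With ES = 0.30 this approximation gives 175 per group, one subject below the table value of 176. Using the unrounded ratio 10/33 would give a somewhat smaller number, which shows how sensitive the result is to rounding of the effect size.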

http://www.epibiostat.ucsf.edu/biostat/sampsize.html This is a directory of where you can find sample size and power programs

Useful Power Calculator Website http://www.stat.uiowa.edu/~rlenth/Power/

Online Power/Sample Size • Power = 0.9, detect ES = 0.35 (11.6 mg/dL), N = 175 per group • Power = 0.8, detect ES = 0.3 (10 mg/dL), N = 175 per group

Online Power/Sample Size • Power = 0.8, detect ES = 0.5 (16.5 mg/dL), sample size = 64 per group • Power = 0.8, detect ES = 1.57 (52 mg/dL), sample size N1 = 7, N2 = 8
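The reverse calculation (power for a given sample size and effect size) can be sketched with the same normal approximation. This is an assumption-laden illustration rather than the online calculator's algorithm; the function name `power_two_sample` is hypothetical, and the small contribution of the opposite tail is ignored.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation; too optimistic for very small groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Standardized effect scaled by the effective sample size of the two groups.
    shift = effect_size * sqrt(n1 * n2 / (n1 + n2))
    return NormalDist().cdf(shift - z_alpha)

for es, n1, n2 in [(0.30, 175, 175), (0.35, 175, 175), (0.50, 64, 64)]:
    print(f"ES = {es:.2f}, n = {n1}/{n2}: power = {power_two_sample(es, n1, n2):.2f}")
```

For the larger groups the approximation lands close to the powers quoted on the slides (about 0.80, 0.91 and 0.81). For groups as small as N1 = 7 and N2 = 8 a t-based calculation should be used instead, because the normal approximation overstates power there.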

CGM Study Paired Comparison • Useful for longitudinal assessments • CGM Study – You want to detect a decrease between Week 12 and Week 24 of 10 mg/dL • You only have one group of patients, but they are measured on two separate occasions (Week 12 and Week 24).

15 patient feasibility study • What is the mean glucose parameter for the subjects at Week 12 versus Week 24? • For simplicity, we are going to use the single-value summary: mean glucose levels at Wk 12 and Wk 24. [Figure: glucose data at Wk 0, Wk 12, Wk 24]

Power and Sample Size for Paired t-test • Power = 0.8, detect ES = 0.30 • Need 92 subjects, i.e. 92 “pairs” of Wk 12 and Wk 24 data. • Remember that with two independent groups we needed 175 subjects per group, for a total of 350 subjects. • When patients serve as their own control, you need fewer subjects to detect an equivalent effect size (ES) with the same power.
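Why the paired design needs fewer subjects can be seen from the normal-approximation formula for one sample of differences, n = (z_{1-alpha/2} + z_{1-beta})^2 / ES^2, which lacks the factor of 2 in the two-group formula. In this sketch ES is assumed to be the mean change divided by the SD of the within-patient differences; the slide does not say how its ES was standardized, and the function name `n_pairs` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_pairs(effect_size, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided paired comparison (normal approximation).
    effect_size = mean change / SD of the paired differences (assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_pairs(0.30))
```

The approximation gives 88 pairs, a little below the 92 reported from the online calculator. Part of the gap is expected because the calculator works with the t distribution; the rest may reflect settings not shown on the slide.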

HRV Study Correlation and Multiple Regression • Single-Group Study • Session 1 – Signal 1: HRV • Session 1 – Signal 2: BP • Demographic variables = Age, Gender • Clinical characteristics = Disease Status • Suppose you want to look at associations between HRV, BP, demographic and clinical characteristics: use the bivariate correlation coefficient for 2 variables, or the multiple regression $R^2$ for multiple predictors.

Power and Sample Size for Correlations ($H_0: r = 0$), only 1 “regressor” or predictor • Power = 0.97, r = 0.4, ES = $R^2$ = 0.16, Sample size = 85 • Power = 0.80, r = 0.3, ES = $R^2$ = 0.09, Sample size = 85

Power and Sample Size for Correlations ($H_0: r = 0$), several predictors • Power = 0.80, r = 0.3, ES = $R^2$ = 0.09, Sample size = 177, if number of predictor variables = 10 • Power = 0.80, r = 0.3, ES = $R^2$ = 0.09, Sample size = 139, if number of predictor variables = 5
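For a single predictor, the power of the test of $H_0: r = 0$ can be approximated with the Fisher z transformation: atanh(r) is approximately normal with standard error 1/sqrt(n - 3). This is a sketch under that approximation, not necessarily the method used by the online calculator; the function name `power_correlation` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: r = 0
    using the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected z statistic under the alternative
    return NormalDist().cdf(shift - z_alpha)

for r in (0.4, 0.3):
    print(f"r = {r}, n = 85: power = {power_correlation(r, 85):.2f}")
```

With n = 85 the approximation reproduces the slide's figures of about 0.97 for r = 0.4 and about 0.80 for r = 0.3. It does not cover the multiple-predictor case, which requires the noncentral F distribution.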

Power and Sample Size for Test of Two Proportions • You want to detect a difference between two proportions. • Example: How many patients do you need in each group to detect a difference in the proportion of patients who adhere to diet and exercise at the end of 5 years? • Old Program = 0.5 adhere • New Program = 0.7 adhere • Alpha = 0.05, Power = 0.8 • You will need 103 individuals in each group.
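One common way to arrive at a number like this is the normal-approximation formula for two proportions followed by a continuity correction. The slide does not say which method was used, so the sketch below is an assumption; the function name `n_two_proportions` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Per-group n for a two-sided comparison of two proportions
    (normal approximation, optional continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity-corrected sample size applied to the uncorrected n.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)

print(n_two_proportions(0.5, 0.7))                    # with continuity correction
print(n_two_proportions(0.5, 0.7, continuity=False))  # without
```

With adherence of 0.5 versus 0.7, the corrected calculation gives 103 per group, matching the slide, while the uncorrected formula gives 93. The choice of correction therefore matters when budgeting for recruitment.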

Final Points • Design your study such that you will have a sufficient number of subjects to be able to detect the effects that are clinically meaningful (high power). • If you have a limited budget, cannot afford to increase your sample size to the necessary level, and cannot feasibly lower the variability, you should consider alternative designs and hypotheses rather than proceeding with a low-powered study design.
