Statistical Methods in Clinical Research James B. Spies M.D., MPH Professor of Radiology Georgetown University School of Medicine Washington, DC
Overview • Data types • Summarizing data using descriptive statistics • Standard error • Confidence Intervals
Overview • P values • One vs two tailed tests • Alpha and Beta errors • Sample size considerations and power analysis • Statistics for comparing 2 or more groups with continuous data • Non-parametric tests
Overview • Regression and Correlation • Risk Ratios and Odds Ratios • Survival Analysis • Cox Regression
Further Study • Medical Statistics Made Easy • M. Harris and G. Taylor • Informa Healthcare UK • Distributed in US by: Taylor and Francis 6000 Broken Sound Parkway, NW Suite 300 Boca Raton, FL 33487 1-800-272-7737
Types of Data • Discrete data: limited number of choices • Binary: two choices (yes/no) • Dead or alive • Disease-free or not • Categorical: more than two choices, not ordered • Race • Age group • Ordinal: more than two choices, ordered • Stages of a cancer • Likert scale for response • E.g., strongly agree, agree, neither agree nor disagree, etc.
Types of Data • Continuous data • Theoretically infinite possible values (within physiologic limits), including fractional values • Height, age, weight • Can be interval • Interval between measures has meaning. • Ratio of two interval data points has no meaning • Temperature in Celsius, day of the year • Can be ratio • Ratio of the measures has meaning • Weight, height
Types of Data • Why important? • The type of data defines: • The summary measures used • Mean, standard deviation for continuous data • Proportions for discrete data • Statistics used for analysis: • Examples: • T-test for normally distributed continuous data • Wilcoxon Rank Sum for non-normally distributed continuous data
Descriptive Statistics • Characterize data set • Graphical presentation • Histograms • Frequency distribution • Box and whiskers plot • Numeric description • Mean, median, SD, interquartile range
Histogram Continuous Data No segmentation of data into groups
Frequency Distribution Segmentation of data into groups Discrete or continuous data
Box and Whisker Plots Popular in Epidemiologic Studies Useful for presenting comparative data graphically
Numeric Descriptive Statistics • Measures of central tendency of data • Mean • Median • Mode • Measures of variability of data • Standard Deviation • Interquartile range
Sample Mean • Most commonly used measure of central tendency • Best applied in normally distributed continuous data. • Not applicable in categorical data • Definition: • Sum of all the values in a sample, divided by the number of values.
Sample Median • Used to indicate the “average” in a skewed population • Often reported with the mean • If the mean and the median are close, the distribution is roughly symmetric, which is consistent with (but does not prove) a normal distribution. • It is the middle value from an ordered listing of the values • If an odd number of values, it is the middle value • If even number of values, it is the average of the two middle values. • Lies within the interquartile range (it is the 50th percentile)
Sample Mode • Infrequently reported as a value in studies. • Is the most common value • More frequently used to describe the distribution of data • Uni-modal, bi-modal, etc.
Interquartile range • Is the range of data from the 25th percentile to the 75th percentile • Common component of a box and whiskers plot • It is the box, and the line across the box is the median or middle value • Rarely, mean will also be displayed.
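The summary measures described above can be computed with Python's standard `statistics` module. This is only an illustrative sketch: the sample values are hypothetical, and quartile conventions vary between software packages, so the interquartile range reported by other tools may differ slightly.

```python
import statistics

# Hypothetical, right-skewed sample (for example, lengths of stay in days).
values = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle value of the ordered list
sd = statistics.stdev(values)       # sample standard deviation

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3);
# the interquartile range runs from Q1 to Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"mean={mean:.2f} median={median} sd={sd:.2f} IQR={q1} to {q3}")
```

In this skewed sample the single large value pulls the mean above the median, which is why the median is often preferred for skewed data.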
Standard Error • A fundamental goal of statistical analysis is to estimate a parameter of a population based on a sample • The values of a specific variable from a sample are an estimate of the entire population of individuals who might have been eligible for the study. • A measure of the precision of a sample in estimating the population parameter.
Standard Error • Standard error of the mean • Standard deviation / square root of (sample size) • (if sample greater than 60) • Standard error of the proportion • Square root of (proportion × (1 − proportion) / n) • Important: dependent on sample size • The larger the sample, the smaller the standard error.
Clarification • Standard Deviation measures the variability or spread of the data in an individual sample. • Standard error measures the precision of the estimate of a population parameter provided by the sample mean or proportion.
Standard Error • Significance: • Is the basis of confidence intervals • A 95% confidence interval is defined by • Sample mean (or proportion) ± 1.96 X standard error • Since standard error is inversely related to the sample size: • The larger the study (sample size), the smaller the confidence intervals and the greater the precision of the estimate.
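The standard error and 95% confidence interval formulas from these slides can be sketched in a few lines of Python. The function names and the example numbers are assumptions made for illustration, and the 1.96 multiplier is the large-sample value quoted above.

```python
import math

Z_95 = 1.96  # multiplier for a 95% confidence interval (large samples)

def se_mean(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def ci95(estimate, se):
    """95% confidence interval: estimate +/- 1.96 * standard error."""
    return (estimate - Z_95 * se, estimate + Z_95 * se)

# Hypothetical study: mean 50, SD 10, 100 subjects.
low, high = ci95(50.0, se_mean(10.0, 100))
# Hypothetical proportion: 50 events among 100 subjects.
p_low, p_high = ci95(0.5, se_proportion(0.5, 100))

print(f"mean CI: {low:.2f} to {high:.2f}")
print(f"proportion CI: {p_low:.3f} to {p_high:.3f}")
```

Quadrupling the sample size halves the standard error, which is the sense in which larger studies give narrower intervals.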
Confidence Intervals • May be used to assess a single point estimate such as mean or proportion. • Most commonly used in assessing the estimate of the difference between two groups.
Confidence Intervals Commonly reported in studies to provide an estimate of the precision of the mean.
P Values • The probability of obtaining a result at least as extreme as the one observed by chance alone, assuming that the null hypothesis is true • Typically, an estimate that has a P value of 0.05 or less is considered to be “statistically significant” or unlikely to occur due to chance alone. • The threshold used is arbitrary • P value of 0.05 equals 1 in 20 chance • P value of 0.01 equals 1 in 100 chance • P value of 0.001 equals 1 in 1000 chance.
P Values and Confidence Intervals • P values provide less information than confidence intervals. • A P value indicates only how likely such a result would be by chance alone; it does not convey the size of the effect • A P value could be statistically significant but of limited clinical significance. • A very large study might find that a difference of 0.1 on a VAS scale of 0 to 10 is statistically significant, but it may be of no clinical significance • A large study might find many “significant” findings during multivariable analyses. “A large study dooms you to statistical significance” Anonymous Statistician
P Values and Confidence Intervals • Confidence intervals provide a range of plausible values of the population mean • For most tests of a difference, if the confidence interval includes 0, then it is not significant. • Ratios: if the CI includes 1, then it is not significant • The interval contains the true population value 95% of the time. • If a confidence interval is very wide, then plausible values might range from very low to very high. • Example: A relative risk of 4 might have a confidence interval of 1.05 to 9, suggesting that although the estimate is a fourfold risk (a 300% increase), an increase in risk anywhere from 5% to 800% is plausible.
Errors • Type I error • Claiming a difference between two samples when in fact there is none. • Remember there is variability among samples: they might seem to come from different populations but they may not. • Also called the α (alpha) error. • Typically 0.05 is used
Errors • Type II error • Claiming there is no difference between two samples when in fact there is. • Also called a β (beta) error. • The probability of not making a Type II error is 1 − β, which is called the power of the test. • Hidden error because it can’t be detected without a proper power analysis
Errors • Table: test result compared against truth
Sample Size Calculation • Also called “power analysis”. • When designing a study, one needs to determine how large a study is needed. • Power is the ability of a study to avoid a Type II error. • Sample size calculation yields the number of study subjects needed, given a certain desired power to detect a difference and a certain level of P value that will be considered significant. • Many studies are completed without a proper estimate of the appropriate study size. • This may lead to a “negative” study outcome in error.
Sample Size Calculation • Depends on: • Level of Type I error: 0.05 typical • Level of Type II error: 0.20 typical • One sided vs two sided: nearly always two • Inherent variability of population • Usually estimated from preliminary data • The difference that would be meaningful between the two assessment arms.
One-sided vs. Two-sided • Most tests should be framed as a two-sided test. • When comparing two samples, we usually cannot be sure which is going to be better. • You never know which direction study results will go. • For routine medical research, use only two-sided tests.
Sample size for proportions Stata input: Mean 1 = .2, mean 2 = .3, α = .05, power (1 − β) = .8.
Sample Size for Continuous Data Stata input: Mean 1 = 20, mean 2 = 30, α = .05, power (1 − β) = .8, std. dev. 10.
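To show how the inputs on these two slides drive the required sample size, here is a rough sketch using the standard normal-approximation formulas for a two-sided comparison of two equal-sized groups. It is not the Stata procedure itself: the function names are hypothetical, and dedicated software may apply continuity or t-distribution corrections, so its reported numbers can be somewhat larger.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, power):
    """z for a two-sided Type I error plus z for the desired power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return z_alpha + z_beta

def n_per_group_means(mean1, mean2, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference in means."""
    z = _z_sum(alpha, power)
    return math.ceil(2 * (z * sd / (mean1 - mean2)) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference in proportions."""
    z = _z_sum(alpha, power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p1 - p2) ** 2)

# Inputs taken from the slides: alpha = .05, power = .8.
print(n_per_group_means(20, 30, sd=10))   # means 20 vs 30, SD 10
print(n_per_group_proportions(0.2, 0.3))  # proportions .2 vs .3
```

Under this approximation, halving the difference to be detected roughly quadruples the required sample size, which is why the choice of a clinically meaningful difference matters so much at the design stage.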
Statistical Tests • Parametric tests • Continuous data normally distributed • Non-parametric tests • Continuous data not normally distributed • Categorical or Ordinal data
Comparison of 2 Sample Means • Student’s T test • Assumes normally distributed continuous data. • T value = (difference between means) / (standard error of difference) • T value then looked up in a table to determine significance
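As an illustration of the T value calculation, the sketch below computes the statistic for two independent samples using a pooled variance (the classic Student's form, which assumes similar variability in both groups). The function name and the data are hypothetical; the P value would still be obtained from a t table or statistical software using the returned degrees of freedom.

```python
import math
import statistics

def student_t(sample_a, sample_b):
    """Return (t value, degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / se_diff, n_a + n_b - 2

# Hypothetical measurements in two treatment arms.
t_value, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```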
Paired T Tests • Uses the change before and after intervention in a single individual • Reduces the degree of variability between the groups • Given the same number of patients, has greater power to detect a difference between groups
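A paired T value is computed from the within-patient changes rather than from the two groups separately. The following is a minimal sketch with hypothetical before and after measurements; the function name is an assumption for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t value, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work with each individual's change, not the raw values.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean change
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores for four patients before and after an intervention.
t_value, df = paired_t([10, 12, 14, 16], [11, 14, 15, 18])
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

Because between-patient variability cancels out in the differences, the standard error here depends only on how consistent the changes are, which is the source of the extra power.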
Analysis of Variance • Used to determine if two or more samples are from the same population- the null hypothesis. • If two samples, is the same as the T test. • Usually used for 3 or more samples. • If it appears they are not from same population, can’t tell which sample is different. • Would need to do pair-wise tests.
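The one-way analysis of variance compares the variability between group means with the variability within groups. The sketch below computes the resulting F statistic; the function name and data are hypothetical, and the P value would come from an F table or statistical software.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variability of the group means around the overall mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variability of individual values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Three hypothetical treatment groups.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_value:.2f} on {df_b} and {df_w} degrees of freedom")
```

With only two groups, F equals the square of the pooled-variance T value, which is the sense in which the two tests coincide.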
Non-parametric Tests • Testing proportions • (Pearson’s) Chi-Squared (χ²) Test • Fisher’s Exact Test • Testing ordinal variables • Mann-Whitney “U” Test (also known as the Wilcoxon Rank Sum Test) • Kruskal-Wallis One-way ANOVA • Testing Ordinal Paired Variables • Sign Test • Wilcoxon Signed Rank Test
Use of non-parametric tests • Use for categorical, ordinal or non-normally distributed continuous data • May check both parametric and non-parametric tests to check for congruity • Most non-parametric tests are based on ranks or other non-value-related methods • Interpretation: • Is the P value significant?
(Pearson’s) Chi-Squared (χ²) Test • Used to compare observed proportions of an event with those expected. • Used with nominal data (better/worse; dead/alive) • If there is a substantial difference between observed and expected, then it is likely that the null hypothesis is rejected. • Often presented as a 2 × 2 table
Chi-Squared (χ²) Test • Chi-Squared (χ²) Formula • χ² = Σ (observed − expected)² / expected, summed over all cells • Not applicable in small samples • If fewer than 5 observations per cell, use Fisher’s exact test
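The chi-squared statistic compares the observed count in each cell with the count expected if the row and column variables were unrelated. Here is a small sketch for a table of counts; the function name and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2 x 2 table: rows are treatments, columns are better/worse.
observed = [[10, 20],
            [20, 10]]
print(f"chi-squared = {chi_squared(observed):.2f}")
```

For a 2 × 2 table (1 degree of freedom), a statistic above about 3.84 corresponds to P < 0.05.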
Correlation • Assesses the linear relationship between two variables • Example: height and weight • Strength of the association is described by a correlation coefficient, r • r = 0 - .2 low, probably meaningless • r = .2 - .4 low, possible importance • r = .4 - .6 moderate correlation • r = .6 - .8 high correlation • r = .8 - 1 very high correlation • Can be positive or negative • Pearson’s, Spearman correlation coefficient • Tells nothing about causation
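Pearson's correlation coefficient can be computed directly from its definition. The sketch below uses hypothetical data and a hypothetical function name; Spearman's coefficient would apply the same calculation to the ranks of the values rather than the values themselves.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect positive and a perfect negative relationship.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```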
Correlation Source: Harris and Taylor. Medical Statistics Made Easy
Correlation Perfect Correlation Source: Altman. Practical Statistics for Medical Research
Correlation Correlation Coefficient .3 Correlation Coefficient 0 Source: Altman. Practical Statistics for Medical Research
Correlation Correlation Coefficient .7 Correlation Coefficient -.5 Source: Altman. Practical Statistics for Medical Research
Regression • Based on fitting a line to data • Provides a regression coefficient, which is the slope of the line • Y = ax + b • Used to predict a dependent variable’s value based on the value of an independent variable. • Very helpful: in an analysis of height and weight, for a known height, one can predict weight. • Much more useful than correlation • Allows prediction of values of Y rather than just whether there is a relationship between two variables.
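A least-squares line of the form Y = ax + b can be fitted with a few lines of Python. This is a sketch only: the function names and the height and weight values are invented for illustration, and the fit shown is ordinary least squares with a single independent variable.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a * x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx            # regression coefficient (slope)
    b = mean_y - a * mean_x  # intercept
    return a, b

def predict(a, b, x_new):
    """Predicted value of the dependent variable for a given x."""
    return a * x_new + b

# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 67, 74, 83]
a, b = fit_line(heights, weights)
print(f"predicted weight at 175 cm: {predict(a, b, 175):.1f} kg")
```

The slope a is the estimated change in weight for each additional centimetre of height, which is the extra information regression provides beyond the correlation coefficient.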