Statistics for Medical Researchers

Hongshik Ahn
Professor, Department of Applied Math and Statistics, Stony Brook University
Biostatistician, Stony Brook GCRC

Contents
1. Experimental Design
2. Descriptive Statistics and Distributions
3. Comparison of Means
4. Comparison of Proportions
5. Power Analysis/Sample Size Calculation
6. Correlation and Regression

1. Experimental Design

Experiment
- Treatment: something that researchers administer to experimental units
- Factor: controlled independent variable whose levels are set by the experimenter
Experimental design
- Control
- Treatment
- Placebo effect
- Blind: single blind, double blind, triple blind

Randomization
- Completely randomized design
- Randomized block design: used if there are specific differences among groups of subjects
- Permuted block randomization: used for small studies to maintain reasonably good balance among groups
- Stratified block randomization: matching

Completely randomized design
The computer-generated sequence: 4,8,3,2,7,2,6,6,3,4,2,1,6,2,0,…
- Two groups (criterion: even → A, odd → B): AABABAAABAABAAA…
- Three groups (criterion: {1,2,3} → A, {4,5,6} → B, {7,8,9} → C; ignore 0's): BCAACABBABAABA…
- Two groups with a different randomization ratio (e.g., 2:3) (criterion: {0,1,2,3} → A, {4,5,6,7,8,9} → B): BBAABABBABAABAA…
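The three allocation rules on this slide can be reproduced with a short Python sketch. The digit sequence and the rules come from the slide; the function name `allocate` and the rule dictionaries are illustrative assumptions.

```python
# Digit sequence from the slide (only the first 15 digits are shown there).
digits = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]

def allocate(digits, rule):
    """Map each random digit to a group label; digits absent from `rule` are skipped."""
    return "".join(rule[d] for d in digits if d in rule)

# Two groups: even -> A, odd -> B.
even_odd = {d: ("A" if d % 2 == 0 else "B") for d in range(10)}

# Three groups: 1-3 -> A, 4-6 -> B, 7-9 -> C; 0 is ignored.
three_groups = {d: "ABC"[(d - 1) // 3] for d in range(1, 10)}

# Two groups in a 2:3 ratio: 0-3 -> A (4 digits), 4-9 -> B (6 digits).
ratio_2_3 = {d: ("A" if d <= 3 else "B") for d in range(10)}

print(allocate(digits, even_odd))      # AABABAAABAABAAA
print(allocate(digits, three_groups))  # BCAACABBABAABA
print(allocate(digits, ratio_2_3))     # BBAABABBABAABAA
```

In a real trial the digits would come from a seeded random number generator rather than a fixed list.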

Permuted block randomization
- With a block size of 4 for two groups (A, B), there are 6 possible permutations, which can be coded as: 1=AABB, 2=ABAB, 3=ABBA, 4=BAAB, 5=BABA, 6=BBAA
- Each number in the random number sequence in turn selects the next block, determining the next four participant allocations (ignoring the digits 0, 7, 8 and 9).
- e.g., the sequence 67126814… will produce BBAA AABB ABAB BBAA AABB BAAB.
- In practice, a block size of four is too small, since researchers may crack the code, which risks selection bias.
- Mixing block sizes of 4 and 6 is better, with the block size kept unknown to the investigator.
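The block coding above can be sketched in a few lines of Python. The block codes and the example sequence are taken from the slide; the names `BLOCKS` and `permuted_blocks` are illustrative assumptions.

```python
# The six permutations of a size-4 block with two A's and two B's, coded 1-6 as on the slide.
BLOCKS = {1: "AABB", 2: "ABAB", 3: "ABBA", 4: "BAAB", 5: "BABA", 6: "BBAA"}

def permuted_blocks(digits):
    """Each digit 1-6 selects the next block; 0, 7, 8 and 9 are ignored."""
    return [BLOCKS[d] for d in digits if d in BLOCKS]

sequence = [6, 7, 1, 2, 6, 8, 1, 4]
print(" ".join(permuted_blocks(sequence)))  # BBAA AABB ABAB BBAA AABB BAAB
```

Every block contains two A's and two B's, so the groups are balanced after every fourth participant.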

Methods of Sampling
- Random sampling
- Systematic sampling
- Convenience sampling
- Stratified sampling

Random Sampling
- Selection so that each individual member has an equal chance of being selected
Systematic Sampling
- Select some starting point and then select every kth element in the population

Convenience Sampling
- Use results that are easy to get

Stratified Sampling
- Draw a sample from each stratum
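As a rough illustration of the random, systematic and stratified schemes, here is a minimal Python sketch. The subject IDs, the two strata and all function names are hypothetical and not taken from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has the same chance of selection (sampling without replacement)."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, k, start=0):
    """Take the element at index `start`, then every k-th element after it."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of size n_per_stratum from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical population of 20 subject IDs, split into two hypothetical strata.
subjects = list(range(1, 21))
strata = {"under_50": subjects[:12], "50_and_over": subjects[12:]}

print(simple_random_sample(subjects, 5, seed=1))
print(systematic_sample(subjects, k=4, start=2))   # [3, 7, 11, 15, 19]
print(stratified_sample(strata, 3, seed=1))
```

Convenience sampling has no counterpart here because it involves no random mechanism.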

2. Descriptive Statistics & Distributions

- Parameter: population quantity
- Statistic: summary of the sample
- Inference for parameters: use the sample
- Central tendency
  - Mean (average)
  - Median (middle value)
- Variability
  - Variance: measure of variation
  - Standard deviation (sd): square root of the variance
  - Standard error (se): sd of the estimate
- Median, quartiles, min., max., range, boxplot
- Proportion
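The summary statistics listed above can be computed with Python's standard `statistics` module. The sketch below uses made-up birth weights, which are an assumption for illustration and not data from the slides.

```python
import math
import statistics

# Hypothetical birth weights (lb); not data from the slides.
weights = [6.1, 7.4, 6.8, 7.9, 5.6, 7.2, 6.5, 8.0, 6.9, 7.1]

n = len(weights)
mean = statistics.mean(weights)
median = statistics.median(weights)
variance = statistics.variance(weights)   # sample variance (divides by n - 1)
sd = statistics.stdev(weights)            # square root of the sample variance
se = sd / math.sqrt(n)                    # standard error of the sample mean
q1, q2, q3 = statistics.quantiles(weights, n=4)  # quartiles (default "exclusive" method)
value_range = max(weights) - min(weights)

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} se={se:.2f}")
print(f"quartiles=({q1:.2f}, {q2:.2f}, {q3:.2f}) range={value_range:.1f}")
```

Note that the sd describes the spread of individual observations, while the se describes the uncertainty of the estimated mean and shrinks as n grows.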

Normal distribution

Standard normal distribution: mean 0, variance 1

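Z-test for means (population sd $\sigma$ known): $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
T-test for means if sd is unknown (sample sd $s$ replaces $\sigma$): $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $n - 1$ degrees of freedom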

3. Inference for Means

Two-sample t-test
- Two independent groups: control and treatment
- Continuous variables
- Assumption: populations are normally distributed
- Checking normality
  - Histogram
  - Normal probability plot (Q-Q plot): is it straight?
  - Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test
- If the normality assumption is violated
  - T-test is not appropriate.
  - Possible transformation
  - Use non-parametric alternative: Mann-Whitney U-test (Wilcoxon rank-sum test)

A clinical trial on the effectiveness of drug A in preventing premature birth
- 30 pregnant women are randomly assigned to control and treatment groups of size 15 each
- Primary endpoint: weight of the babies at birth

        Treatment   Control
n       15          15
mean    7.08        6.26
sd      0.90        0.96

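Hypothesis: The group means are different
- Null hypothesis ($H_0$): $\mu_1 = \mu_2$
- Alternative hypothesis ($H_1$): $\mu_1 \ne \mu_2$
- Significance level: $\alpha = 0.05$
- Assumption: equal variance
- Degrees of freedom (df): $n_1 + n_2 - 2$
- Calculate the T-value (test statistic): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
- P-value: the probability, under $H_0$, of a test statistic at least as extreme as the one observed; $\alpha$ is the Type I error rate (false positive rate)
- Reject $H_0$ if p-value < $\alpha$
- Do not reject $H_0$ if p-value $\ge \alpha$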

Previous example: Test at $\alpha = 0.05$
- P-value: 0.026 < 0.05
- Reject the null hypothesis that there is no drug effect.
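A minimal sketch of the pooled (equal-variance) test statistic for this example, using only the summary statistics from the trial slide. The critical value `T_CRIT` is read from a t table and hard-coded as an assumption, because Python's standard library has no t distribution. For the same reason the sketch compares t with the critical value instead of computing a p-value.

```python
import math

# Summary statistics from the slide (birth weight by group).
n1, mean1, sd1 = 15, 7.08, 0.90   # treatment
n2, mean2, sd2 = 15, 6.26, 0.96   # control

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_value = (mean1 - mean2) / se_diff

# Two-sided critical value t_{0.025, 28}, taken from a t table (about 2.048).
T_CRIT = 2.048
reject_h0 = abs(t_value) > T_CRIT

print(f"df = {df}, t = {t_value:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

Comparing |t| with the critical value leads to the same reject/do-not-reject decision as comparing the p-value with $\alpha$.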

Confidence interval (CI): an interval of values used to estimate the true value of a population parameter.
- The confidence level $1 - \alpha$ is the proportion of times that the CI actually contains the population parameter, assuming that the estimation process is repeated a large number of times.
- Common choices: 90% CI ($\alpha$ = 10%), 95% CI ($\alpha$ = 5%), 99% CI ($\alpha$ = 1%)

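CI for a comparison of two means:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
A 95% CI for the previous example: $(7.08 - 6.26) \pm 2.048 \times 0.340$, i.e., approximately (0.12, 1.52)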
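The interval for the birth-weight example can be checked numerically. This is a sketch under the equal-variance assumption stated earlier; the function name `pooled_ci` is hypothetical, and the critical value 2.048 for 28 degrees of freedom is read from a t table.

```python
import math

def pooled_ci(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """CI for mean1 - mean2 under the equal-variance assumption."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Slide data: treatment vs. control; t_{0.025, 28} is about 2.048 (t table).
low, high = pooled_ci(15, 7.08, 0.90, 15, 6.26, 0.96, t_crit=2.048)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The interval excludes 0, which agrees with rejecting the null hypothesis at the 5% level.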

SAS programming for Two-Sample T-test
Data steps:
- Click ‘File’
- Click ‘Import Data’
- Select a data source
- Click ‘Browse’ and find the path of the data file
- Click ‘Next’
- Fill in the ‘Member’ field with the name of the SAS data set
- Click ‘Finish’
Procedure steps:
- Click ‘Solutions’
- Click ‘Analysis’
- Click ‘Analyst’
- Click ‘File’
- Click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’
- Click ‘Hypothesis Tests’
- Click ‘Two-Sample T-test for Means’
- Select the independent variable as ‘Group’ and the dependent variable as ‘Dependent’
- Choose the hypothesis of interest and click ‘OK’

Screenshots:
- Click ‘File’ to import data and create the SAS data set.
- Click ‘Solutions’ to create a project to run the statistical test.
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.

Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
- Nonparametric alternative to two-sample t-test
- The populations don’t need to be normal
- $H_0$: The two samples come from populations with equal medians
- $H_1$: The two samples come from populations with different medians

Mann-Whitney U-Test Procedure
- Temporarily combine the two samples into one big sample, then replace each sample value with its rank
- Find the sum of the ranks for either one of the two samples
- Calculate the value of the z test statistic
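The three steps of the procedure can be sketched as follows, using the usual large-sample normal approximation for the rank sum (without a tie correction). The BMI values are hypothetical, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))   # step 1: combine and rank
    r1 = sum(ranks[:n1])                                   # step 2: rank sum of sample 1
    mu = n1 * (n1 + n2 + 1) / 2                            # mean of R1 under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)        # sd of R1 under H0
    z = (r1 - mu) / sigma                                  # step 3: z statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, z, p_two_sided

# Hypothetical BMI values; not data from the slides.
men = [23.8, 26.1, 25.4, 30.9, 24.6, 27.7, 22.9, 28.3]
women = [19.6, 23.1, 21.5, 29.4, 25.0, 20.7, 24.1, 22.3]
r1, z, p = rank_sum_z(men, women)
print(f"R1 = {r1}, z = {z:.2f}, two-sided p = {p:.3f}")
```

For small samples, statistical software typically uses exact tables rather than this approximation.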

Mann-Whitney U-Test, Example
- Numbers in parentheses are the ranks, beginning with a rank of 1 assigned to the lowest value of 17.7.
- $R_1$ and $R_2$: sums of ranks

Hypothesis: The group medians are different
- $H_0$: Men and women have the same median BMI
- $H_1$: Men and women have different median BMIs
- p-value = 0.33, thus we do not reject $H_0$ at $\alpha = 0.05$.
- There is no significant difference in BMI between men and women.

SAS Programming for Mann-Whitney U-Test Procedure
Data steps: the same as for the two-sample t-test (slide 21).
Procedure steps:
- Click ‘Solutions’
- Click ‘Analysis’
- Click ‘Analyst’
- Click ‘File’
- Click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’
- Click ‘ANOVA’
- Click ‘Nonparametric One-Way ANOVA’
- Select the ‘Dependent’ and ‘Independent’ variables respectively and choose the test of interest
- Click ‘OK’

Screenshots:
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Select the dependent and independent variables.

Paired t-test
- Mean difference of matched pairs
- Test for changes (e.g., before & after)
- The measures in each pair are correlated.
- Assumption: the population of differences is normally distributed
- Take the difference in each pair and perform a one-sample t-test.
- Check normality
- If the normality assumption is violated
  - T-test is not appropriate.
  - Use non-parametric alternative: Wilcoxon signed rank test

Notation for paired t-test
- $d$ = individual difference between the two values of a single matched pair
- $\mu_d$ = mean value of the differences $d$ for the population of paired data
- $\bar{d}$ = mean value of the differences $d$ for the paired sample data
- $s_d$ = standard deviation of the differences $d$ for the paired sample data
- $n$ = number of pairs

Example: Systolic Blood Pressure
(OC: oral contraceptive)

Hypothesis: The group means are different
- $H_0$: $\mu_d = 0$ vs. $H_1$: $\mu_d \ne 0$
- Significance level: $\alpha = 0.05$
- Degrees of freedom (df): $n - 1$
- Test statistic: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$
- P-value: 0.009, thus reject $H_0$ at $\alpha = 0.05$
- The data support the claim that oral contraceptives affect the systolic bp.

Confidence interval for matched pairs
- $100(1-\alpha)$% CI: $\bar{d} \pm t_{\alpha/2,\,n-1}\, \dfrac{s_d}{\sqrt{n}}$
- 95% CI for the mean difference of the systolic bp: (1.53, 8.07)
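A minimal sketch of the paired calculation. The blood pressure values below are illustrative stand-ins (an assumption), since the data table for this example is not reproduced here; with these values the interval comes out close to the (1.53, 8.07) quoted above. The function name `paired_t` is hypothetical, and the critical value 2.262 for 9 degrees of freedom is read from a t table.

```python
import math
import statistics

def paired_t(before, after, t_crit):
    """Paired t statistic and CI for the mean of d = after - before."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    se = s_d / math.sqrt(n)
    t_value = d_bar / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return d_bar, s_d, t_value, n - 1, ci

# Illustrative systolic blood pressures for 10 women (assumed values).
without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# t_{0.025, 9} is about 2.262 (t table).
d_bar, s_d, t_value, df, ci = paired_t(without_oc, with_oc, t_crit=2.262)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_value:.2f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the test is run on the differences, it is a one-sample t-test applied to `d`.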

SAS Programming for Paired T-test
Data steps: the same as for the two-sample t-test (slide 21).
Procedure steps:
- Click ‘Solutions’
- Click ‘Analysis’
- Click ‘Analyst’
- Click ‘File’
- Click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’
- Click ‘Hypothesis Tests’
- Click ‘Two-Sample Paired T-test for Means’
- Select the ‘Group 1’ and ‘Group 2’ variables respectively
- Click ‘OK’
(Note: You can also calculate the difference and use it as the dependent variable in a one-sample t-test.)

Screenshots:
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Put the two group variables into ‘Group 1’ and ‘Group 2’.

Comparison of more than two means: ANOVA (Analysis of Variance)
- One-way ANOVA: one factor, e.g., control, drug 1, drug 2
- Two-way ANOVA: two factors, e.g., drugs, age groups
- Repeated measures: if there are repeated measures within subjects, such as over time points

Example: Pulmonary disease
- Endpoint: mid-expiratory flow (FEF) in L/s
- 6 groups: nonsmokers (NS), passive smokers (PS), noninhaling smokers (NI), light smokers (LS), moderate smokers (MS) and heavy smokers (HS)

Example: Pulmonary disease
- $H_0$: the group means are the same
- $H_1$: not all the group means are the same
- P-value < 0.001
- There is a significant difference in the mean FEF among the groups.
- Comparison of specific groups: linear contrast
- Multiple comparison: Bonferroni adjustment ($\alpha/n$, where $n$ is the number of comparisons)
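To make the ANOVA and Bonferroni ideas concrete, here is a small sketch that computes the one-way F statistic and a Bonferroni-adjusted significance level. The FEF values are hypothetical, and the function names are illustrative assumptions.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / n_comparisons

# Hypothetical FEF values (L/s) for three of the groups; not data from the slides.
fef = {
    "NS": [3.8, 3.6, 4.1, 3.9, 3.7],
    "LS": [3.2, 3.0, 3.4, 3.1, 3.3],
    "HS": [2.5, 2.7, 2.4, 2.8, 2.6],
}
f_value, df1, df2 = one_way_anova_f(list(fef.values()))
print(f"F({df1}, {df2}) = {f_value:.2f}")

# All pairwise comparisons among the 6 groups on the slide: 6 * 5 / 2 = 15.
alpha_per_test = bonferroni_alpha(0.05, 15)
print(f"Bonferroni-adjusted alpha per comparison: {alpha_per_test:.4f}")
```

The F statistic would then be compared with an F distribution on (df1, df2) degrees of freedom, which statistical software handles directly.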

SAS Programming for One-Way ANOVA
Data steps: the same as for the two-sample t-test (slide 21).
Procedure steps:
- Click ‘Solutions’
- Click ‘Analysis’
- Click ‘Analyst’
- Click ‘File’
- Click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’
- Click ‘ANOVA’
- Click ‘One-Way ANOVA’
- Select the ‘Independent’ and ‘Dependent’ variables respectively
- Click ‘OK’

Screenshots:
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Select the dependent and independent variables.

4. Inference for Proportions

Chi-square test
- Testing difference of two proportions
- n: #successes, p: success rate
- Requirement: the expected count in every cell of the 2×2 table is at least 5
- $H_0$: $p_1 = p_2$
- $H_1$: $p_1 \ne p_2$ (for two-sided test)
- If the requirement is not satisfied, use Fisher’s exact test.
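A minimal sketch of the chi-square test for two proportions, written without external libraries. The counts are hypothetical, the function name `chi_square_2x2` is an assumption, and no continuity correction is applied. The sketch also reports the smallest expected cell count, using the common rule of thumb of at least 5 per cell to decide when to switch to Fisher's exact test.

```python
import math

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square test of H0: p1 = p2 (no continuity correction)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    min_expected = min(min(row) for row in expected)
    return chi2, p_value, min_expected

# Hypothetical counts: 18 successes out of 40 vs. 9 successes out of 40.
chi2, p_value, min_expected = chi_square_2x2(18, 40, 9, 40)
use_fisher = min_expected < 5   # fall back to Fisher's exact test if True
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, use Fisher: {use_fisher}")
```

The closed-form p-value works only because a 2×2 table has one degree of freedom; larger tables need a general chi-square distribution function.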

5. Power/Sample Size Calculation

- Decide significance level (e.g., 0.05)
- Decide desired power (e.g., 80%)
- One-sided or two-sided test
- Comparison of means: two-sample t-test
  - Need to know sample means in each group
  - Need to know sample sd’s in each group
  - Calculation: use software (Nquery, power, etc.)
- Comparison of proportions: Chi-square test
  - Need to know sample proportions in each group
  - Continuity correction
  - Small sample size: Fisher’s exact test
  - Calculation: use software
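For a rough idea of what such software computes, the sketch below uses the standard normal-approximation formula $n = 2\,(z_{1-\alpha/2} + z_{\text{power}})^2 \sigma^2 / \delta^2$ per group for comparing two means. This is an approximation rather than the exact t-based calculation that dedicated software performs; the planning values and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Approximate n per group for comparing two means (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical planning values: detect a 0.8 lb difference with a common sd of 0.95 lb.
print(n_per_group_means(delta=0.8, sd=0.95))
```

Exact t-based calculations typically give a slightly larger n, so this figure is best treated as a lower bound for planning.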

6. Correlation and Regression

Correlation
- Pearson correlation for continuous variables
- Spearman correlation for ranked variables
- Chi-square test for categorical variables
Pearson correlation
- Correlation coefficient ($r$): $-1 \le r \le 1$
- Test for coefficient: t-test
- Larger sample → more significant for the same value of the correlation coefficient
- Thus it is not meaningful to judge by the magnitude of the correlation coefficient alone.
- Judge the significance of the correlation by the p-value
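The effect of sample size on significance can be seen from the usual t statistic for a correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n - 2$ degrees of freedom. The sketch below evaluates it for the same $r$ at two hypothetical sample sizes; the function names are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (df = n - 2) for testing H0: correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# The same r = 0.3 at two hypothetical sample sizes.
for n in (20, 200):
    print(n, round(t_for_r(0.3, n), 2))
```

With n = 20 the statistic falls below the two-sided 5% critical value (about 2.10 for 18 df), while with n = 200 it is well above the corresponding value (about 1.97), even though r is identical.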

Regression
- Objective
  - Find out whether a significant linear relationship exists between the response and independent variables
  - Use it to predict a future value
- Notation
  - X: independent (predictor) variable
  - Y: dependent (response) variable
- Multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$, where $\varepsilon$ is the random error
- Checking the model (assumptions)
  - Normality: Q-Q plot, histogram, Shapiro-Wilk test
  - Equal variance: the plot of predicted y vs. error has a band shape
  - Linear relationship: predicted y vs. each x

- The regression equation has the form $\hat{y} = b_0 + 1.08\,x_1 + 0.425\,x_2$, with $b_0$ the fitted intercept.
- The mean blood pressure increases by 1.08 if weight ($x_1$) increases by one pound and age ($x_2$) remains fixed. Similarly, a 1-year increase in age with the weight held fixed will increase the mean blood pressure by 0.425.
- $s = 2.509$, $R^2 = 95.8\%$
- The error sd $\sigma$ is estimated as 2.509 with df = 13 − 3 = 10.
- 95.8% of the variation in y can be explained by the regression.
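The quantities reported above (coefficients, $s$ and $R^2$) can be computed by ordinary least squares. The sketch below solves the normal equations directly; it uses hypothetical weight, age and blood pressure values, not the 13 observations behind the slide, and the helper names `solve` and `fit_linear` are assumptions.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk], residual sd s, and R^2."""
    rows = [[1.0] + list(x) for x in xs]             # design matrix with intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = (sse / (len(y) - p)) ** 0.5                  # df = n - (number of coefficients)
    return beta, s, 1 - sse / sst

# Hypothetical (weight in lb, age in years) and blood pressure values; not the slide's data.
predictors = [(150, 40), (165, 45), (172, 52), (180, 48), (158, 60), (190, 55), (175, 38)]
bp = [112, 121, 128, 129, 123, 138, 120]
beta, s, r2 = fit_linear(predictors, bp)
print([round(b, 3) for b in beta], round(s, 3), round(r2, 3))
```

The residual degrees of freedom follow the same rule as on the slide: number of observations minus number of estimated coefficients (13 − 3 = 10 there, 7 − 3 = 4 here).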

SAS Programming for Linear Regression
Data steps: the same as for the two-sample t-test (slide 21).
Procedure steps:
- Click ‘Solutions’
- Click ‘Analysis’
- Click ‘Analyst’
- Click ‘File’
- Click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’
- Click ‘Regression’
- Click ‘Linear’
- Select the ‘Dependent’ (response) variable and the ‘Explanatory’ (predictor) variable respectively
- Click ‘OK’

Screenshots:
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Select the dependent and explanatory variables.

Other regression models
- Polynomial regression
- Transformation
- Logistic regression
