Topics in Biostatistics Part 2

1. Topics in Biostatistics Part 2. Sarah J. Ratcliffe, Ph.D., Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine

2. Outline: Hypothesis testing; Examples; Interpreting results; Resources.

3. Hypothesis testing steps: Select a one-sided or two-sided test. Establish the level of significance (e.g., $\alpha = .05$). Select an appropriate test statistic. Compute the test statistic with the actual data. Calculate the degrees of freedom (df) for the test statistic.

4. Hypothesis testing steps (cont'd): Obtain the tabled (critical) value for the test statistic. Compare the test statistic to the tabled value. Calculate a p-value. Decide whether to reject or fail to reject the null hypothesis.
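The steps above can be sketched in Python as a one-sample z-test. This is a minimal sketch, assuming the population SD is known, so the standard normal distribution replaces the t table and there is no degrees-of-freedom step (a one-sample t-test would use n - 1 df). The function name `z_test` and the data are hypothetical.

```python
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05, two_sided=True):
    """One-sample z-test (assumed known population SD) following the slide's steps."""
    n = len(sample)
    std_normal = NormalDist()  # reference distribution; a z-test has no df
    # Compute the test statistic with the actual data
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if two_sided:
        # Tabled (critical) value and p-value for a two-sided alternative
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > critical
    else:
        # One-sided alternative: the true mean is greater than mu0
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z > critical
    return z, critical, p_value, reject

# Hypothetical data, hypothesized mean 5.0, assumed known SD 0.4
data = [5.1, 4.9, 5.6, 5.8, 5.4]
z, critical, p_value, reject = z_test(data, mu0=5.0, sigma=0.4)
print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing `z` with `critical` and comparing `p_value` with `alpha` always lead to the same decision; the p-value additionally reports how extreme the result is.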


6. Hypothesis testing: One-sided versus Two-sided. The choice is determined by the alternative hypothesis. Unidirectional = one-sided. Example: Infected macaques are given vaccine or placebo. Higher viral replication in the vaccine group would be of no benefit, so only a decrease is of interest. H0: the vaccine has no beneficial effect on viral-replication levels at 6 weeks after infection. Ha: the vaccine lowers viral-replication levels at 6 weeks after infection.

7. Hypothesis testing: One-sided versus Two-sided. Bi-directional = two-sided. Example: Infected macaques are given vaccine or placebo. The interest is in whether the vaccine has any effect on viral-replication levels, regardless of the direction of the effect. H0: the vaccine has no effect on viral-replication levels at 6 weeks after infection. Ha: the vaccine affects (raises or lowers) viral-replication levels at 6 weeks after infection.
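To see how the alternative hypothesis changes the p-value, the sketch below computes one-sided and two-sided p-values from the same z statistic. It is a minimal illustration using a normal approximation; the function name `p_values`, the sign convention (z = (vaccine mean - placebo mean) / SE) and the value z = -1.8 are all hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """P-values for one observed z statistic under three alternative hypotheses.

    Assumed sign convention: z = (vaccine mean - placebo mean) / SE,
    so a negative z means lower viral replication in the vaccine group.
    """
    phi = NormalDist().cdf
    return {
        "one-sided (vaccine lowers)": phi(z),
        "one-sided (vaccine raises)": 1 - phi(z),
        "two-sided (any effect)": 2 * min(phi(z), 1 - phi(z)),
    }

for label, p in p_values(-1.8).items():  # hypothetical z statistic
    print(f"{label}: p = {p:.4f}")
```

With z = -1.8 the one-sided p-value (vaccine lowers) is about .036 while the two-sided p-value is about .072, so at $\alpha = .05$ the two choices of alternative lead to different decisions.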


9. Hypothesis testing: Level of Significance. How many different hypotheses are being examined? How many comparisons are needed to answer each hypothesis? Are any interim analyses planned (e.g., test the data and, depending on the results, collect more data and re-test)? => How many tests will be run in total?

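10. Hypothesis testing: Level of Significance. $\alpha_{\text{total}}$ = desired total Type I error (false-positive) rate for all comparisons. One test: $\alpha_1 = \alpha_{\text{total}}$. Multiple tests / comparisons: if $\alpha_i = \alpha_{\text{total}}$ for each test, then $\sum_i \alpha_i > \alpha_{\text{total}}$, so the chance of at least one false positive is no longer held at $\alpha_{\text{total}}$. Need to use a smaller $\alpha$ for each test.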
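The inflation of the overall error rate can be made concrete. Assuming the k tests are independent (an assumption; correlated tests behave differently), the probability of at least one false positive is $1 - (1 - \alpha_i)^k$. The function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha_i, k):
    """P(at least one false positive) among k independent tests, each at level alpha_i."""
    return 1 - (1 - alpha_i) ** k

alpha_total = 0.05
for k in (1, 5, 10, 20):
    unadjusted = familywise_error(alpha_total, k)      # every test at alpha_total
    adjusted = familywise_error(alpha_total / k, k)    # every test at alpha_total / k
    print(f"k = {k:2d}: unadjusted {unadjusted:.3f}, alpha_total/k {adjusted:.3f}")
```

With 10 tests each run at .05, the chance of at least one false positive is about .40; dividing $\alpha_{\text{total}}$ by k brings it back just under .05.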

11. Hypothesis testing: Level of Significance. Conservative approach: $\alpha_i = \alpha_{\text{total}} / k$, where $k$ is the number of comparisons. Different $\alpha_i$ values can also be given to different comparisons, provided they sum to no more than $\alpha_{\text{total}}$. Formal methods include Bonferroni, Tukey-Kramer, Scheffé's method, and Duncan-Waller. The O'Brien-Fleming boundary or a Lan and DeMets analog can be used to determine $\alpha_i$ for interim analyses. For controlling the false discovery rate instead, see: Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B, 57:289-300.
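A minimal sketch of two of these corrections: the conservative Bonferroni rule ($\alpha_{\text{total}} / k$) and the Benjamini-Hochberg step-up procedure from the cited paper. The function names and p-values are hypothetical; in practice a library routine (e.g. `multipletests` in statsmodels, with `method="bonferroni"` or `method="fdr_bh"`) would normally be used.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k with p_(k) <= (k / m) * q
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # hypothetical
print("Bonferroni:        ", bonferroni(p))
print("Benjamini-Hochberg:", benjamini_hochberg(p))
```

Here Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the two smallest: controlling the false discovery rate is less strict than controlling the chance of any false positive.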

12. Hypothesis testing steps: Select a one-tailed or two-tailed test. Establish the level of significance (e.g., $\alpha = .05$). Select an appropriate test statistic. Compute the test statistic with the actual data. Calculate the degrees of freedom (df) for the test statistic.

13. Hypothesis testing: Selecting an Appropriate test. How many samples are being compared: one sample, two samples, or multiple samples? Are the samples independent (unrelated subjects in each sample) or related (the same or matched subjects in each sample)?

14. Hypothesis testing: Selecting an Appropriate test. Are your variables continuous or categorical? If continuous, are the data normally distributed? Normality can be assessed using a P-P (or Q-Q) plot; the plot should be approximately a straight line if the data are normal. If the data are not normal, can they be transformed to normality? Blindly assuming normality can lead to wrong conclusions!
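The Q-Q plot check can be sketched without plotting software: sort the data, pair each value with the matching standard normal quantile, and see how close the points are to a straight line. This is a minimal sketch; the plotting positions $(i - 0.5)/n$ are one common convention among several, summarizing the points by their correlation is a supplement to looking at the plot (not a replacement), and the function names and data are hypothetical.

```python
from statistics import NormalDist, mean

def normal_qq_points(data):
    """(theoretical normal quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(data)
    observed = sorted(data)
    # Plotting positions (i - 0.5) / n: one common convention among several
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

def qq_correlation(data):
    """Pearson correlation of the Q-Q points; near 1 means a near-straight line."""
    points = normal_qq_points(data)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 50]  # hypothetical right-skewed values
print(f"Q-Q correlation, raw data: {qq_correlation(skewed):.3f}")
```

Strongly right-skewed data like these bend away from the line; re-running the check on log-transformed values is a quick way to see whether a transformation helps.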

15. Hypothesis testing: Selecting an Appropriate test



18. Hypothesis testing: Geometric versus Arithmetic mean. The geometric mean of n positive numerical values is the nth root of the product of the n values, $GM = (x_1 x_2 \cdots x_n)^{1/n}$. The geometric mean is always less than or equal to the arithmetic mean (equal only when all values are identical). The geometric mean is better when some values are very large in magnitude and others are small. If the geometric mean is used, log-transform the data before analyzing: the arithmetic mean of the log-transformed data is the log of the geometric mean of the data, $\frac{1}{n}\sum_i \log x_i = \log GM$. E.g., a t-test on log-transformed data is a test for the location of the geometric mean. Langley R., Practical Statistics Simply Explained, 1970, Dover Publications.
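A quick numerical check of these facts, as a sketch with hypothetical data: computing the geometric mean by its definition (nth root of the product) and as the exponential of the mean of the logs gives the same number, and it is smaller than the arithmetic mean for skewed values. The function names are hypothetical; Python's `statistics.geometric_mean` also exists for this purpose.

```python
import math
from statistics import mean

def geometric_mean_by_product(values):
    """nth root of the product of n positive values (the definition on the slide)."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_by_logs(values):
    """exp of the arithmetic mean of the logs: same quantity, numerically safer."""
    return math.exp(mean(math.log(v) for v in values))

values = [1.0, 10.0, 100.0, 1000.0]  # hypothetical right-skewed data
print("arithmetic mean:", mean(values))
print("geometric mean (product):", geometric_mean_by_product(values))
print("geometric mean (logs):   ", geometric_mean_by_logs(values))
```

The log-based form is the one that connects to analysis: a t-test on `math.log(v)` values compares means on the log scale, and back-transforming with `math.exp` returns geometric means.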

20. Hypothesis testing: Selecting an Appropriate test. Other tests are available for more complex situations. For example: Repeated-measures ANOVA: more than 2 measurements taken on each subject; usually interested in the time effect. GEEs (generalized estimating equations) / mixed-effects models: more than 2 measurements taken on each subject; can adjust for other covariates.

21. Hypothesis testing steps: Select a one-tailed or two-tailed test. Establish the level of significance (e.g., $\alpha = .05$). Select an appropriate test statistic. Run the test.

22. Example 1: Expression of chemokine receptors on CD14+ and CD14- populations of blood monocytes. Percent of cells positive by FACS.

24. Example 1 cont'd: Continuous data, 2 samples => t-test, if normal, OR => Wilcoxon rank-sum test (independent samples) or Wilcoxon signed-rank test (paired samples), if non-normal. Are the samples independent or paired? If independent, equality of variances can be tested using Levene's test.
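For the paired, non-normal case, the Wilcoxon signed-rank statistic can be sketched as follows: take the paired differences, drop zeros, rank the absolute differences (ties get their average rank), and sum the ranks of the positive differences (W+). This is a minimal sketch with hypothetical names and data; it computes only the statistic, and the p-value would come from tables or a library routine such as `scipy.stats.wilcoxon`.

```python
def signed_rank_statistic(x, y):
    """Wilcoxon signed-rank W+ for paired samples x and y.

    Zero differences are dropped; tied |differences| receive their average rank.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next |difference| ties the current one
        while end + 1 < len(order) and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    # W+ sums the ranks of the positive differences
    return sum(r for r, diff in zip(ranks, d) if diff > 0)

cd14_pos = [5, 7, 8, 4, 6]  # hypothetical paired measurements
cd14_neg = [3, 7, 5, 6, 2]
print("W+ =", signed_rank_statistic(cd14_pos, cd14_neg))
```

With n nonzero differences, W+ and the corresponding W- always add up to n(n + 1)/2, so a W+ far from n(n + 1)/4 points to a systematic difference within pairs.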

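25. Example 1 cont'd: T-tests in Excel: =TTEST(L6:L15,M6:M15,2,2). The third argument is the number of tails (2 = two-tailed) and the fourth is the test type (1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance); the function returns the p-value.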

27. Example 1 cont'd: Possible results for different assumptions:

28. Example 1 cont'd: Which result is correct? The data are paired, and the differences for each subject are normally distributed => paired t-test, p = .0095. There is a difference between CD14+ and CD14- cells in the percentage of positive cells.

29. A graph of the 95% CIs for the means would give the impression that there is no difference...

30. ...when it's really the differences we are testing.

31. Example 1 cont'd: Note: paired tests don't always give lower p-values. A one-sided test on the CCR5 values would give p = 0.06 assuming independent samples, but p = 0.11 assuming paired samples. Why?

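32. Example 1 cont'd: The differences have a larger spread than the individual variables. Because $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$, pairing reduces the spread only when the paired measurements are positively correlated; the paired test also has fewer degrees of freedom (n - 1 rather than 2n - 2).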
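The variance identity can be checked numerically. In this sketch (hypothetical names and data), the same x values are paired once with positively correlated y values and once with negatively correlated ones; the variance of the differences collapses in the first case and is inflated in the second.

```python
from statistics import mean, variance

def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator, matching statistics.variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def variance_of_differences(x, y):
    """Var(X - Y) computed directly and via Var(X) + Var(Y) - 2 Cov(X, Y)."""
    d = [a - b for a, b in zip(x, y)]
    direct = variance(d)
    via_formula = variance(x) + variance(y) - 2 * sample_cov(x, y)
    return direct, via_formula

x = [10, 12, 14, 16, 18]      # hypothetical first measurement per subject
y_pos = [11, 13, 15, 17, 19]  # positively correlated second measurement
y_neg = [19, 17, 15, 13, 11]  # negatively correlated second measurement
print("positive correlation:", variance_of_differences(x, y_pos))
print("negative correlation:", variance_of_differences(x, y_neg))
```

Each variable alone has variance 10 here; the differences have variance 0 with positive correlation and 40 with negative correlation, which is the situation in which a paired test loses power.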

33. Example 2: Does the level of CCR5 expression on PBLs (basal, or upregulated using a lentiviral vector) determine the % of entry that occurs via CCR5? Two viruses: 89.6 and DH12.

34. Example 2 cont'd

35. Example 2 cont'd: How do we know if the slope of the line is significantly different from 0? We can perform a t-test on the slope estimate. For simple linear regression, this is the same as the t-test for the correlation coefficient r (whose magnitude is the square root of R²).
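The equivalence can be verified directly: the slope t statistic $b_1 / SE(b_1)$ equals $r\sqrt{n-2}/\sqrt{1-r^2}$, both with n - 2 df. This is a minimal sketch with hypothetical function names and data.

```python
from statistics import mean

def slope_t(x, y):
    """t statistic for H0: slope = 0 in simple linear regression (df = n - 2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx                 # least-squares slope
    b0 = my - b1 * mx              # least-squares intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    return b1 / se_b1

def correlation_t(x, y):
    """t statistic for H0: rho = 0, using t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    return r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5

ccr5_level = [1, 2, 3, 4, 5, 6]      # hypothetical predictor values
entry_pct = [2, 5, 4, 8, 9, 13]      # hypothetical outcome values
print(slope_t(ccr5_level, entry_pct), correlation_t(ccr5_level, entry_pct))
```

Both functions print the same value (up to rounding), so either test can be reported for a simple linear regression.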

36. Example 2 cont'd

37. Interpreting Results: P-values. Is there a statistically significant result? If not, was the sample size large enough to detect a biologically meaningful difference?
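The sample-size question can be approached with the standard normal-approximation formula for comparing two means, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2$ per group, where $\delta$ is the smallest biologically meaningful difference, $\sigma$ the assumed common SD, and $1 - \beta$ the desired power. This is a minimal sketch; the function name and inputs are hypothetical, and t-based calculators typically give a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to a whole subject

# Hypothetical: meaningful difference of 1 SD, then of half an SD
print(n_per_group(delta=1.0, sigma=1.0))
print(n_per_group(delta=0.5, sigma=1.0))
```

Halving the meaningful difference roughly quadruples the required sample size, so a non-significant result from a small study may simply reflect too little power.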

38. Online Resources: Power / sample size calculators: http://calculators.stat.ucla.edu/powercalc/ and http://www.stat.uiowa.edu/~rlenth/Power/ ; Free statistical software: http://members.aol.com/johnp71/javasta2.html#Freebies

39. BECC Consulting Center (www.cceb.upenn.edu/main/center/becc.html): hourly fee service. Services: design and analysis strategies for research proposals; selecting and implementing appropriate statistical methods for specific applications to research data; statistical and graphical analysis of data; statistical review of manuscripts.
