Stat 401 Lab 5

1. Stat 401 Lab 5. James D. Abbey, Iowa State University

2. Contact Information: James Abbey; Email: jdabbey@iastate.edu; Website: www.public.iastate.edu/~jdabbey

3. Website Address: www.public.iastate.edu/~jdabbey. JMP Tutorials and Data: large number of high-resolution videos for JMP operations; data sets in JMP format for HW. New Site for the On-Campus Lab: lecture notes and other useful handouts; check for material to print before coming to lab.

4. Homework Schedule for today: Homework 4 topics and examples; Homework 3 review; Homework 2 review. Questions? Ask away!

5. Homework 4 Topics. Know the difference between observational studies and (randomized) experiments. Assignment of units to groups/treatments makes an experiment; inference of causality is justified only when that assignment is randomized. Selection from a population (note that we are observing traits of the population): inference back to the population of interest is justified only when the units are randomly sampled. See text pg. 9 for the graphic of randomization types.

6. Homework 4 Summary: Transformations. Why transform? To meet the assumptions of normality and equal variance. See pages 57-74 for an in-depth discussion. Note that the log(mean) is not the mean of the log(values). Beware of transforming data that is already normal and has equal variance!

7. Homework 4 Summary. Take the data sets 1, 2, 3, 50 and 20, 25, 35, 500. Mean1: 14. Mean2: 145. Log values (natural log). Data set 1: 0, 0.69314, 1.0986, 3.91202; Mean: 1.4259 vs. Log(14) = 2.63905. Data set 2: 2.9957, 3.2188, 3.5553, 6.2146; Mean: 3.9961 vs. Log(145) = 4.9767. Thus, we see that the log(mean) is not the same as the mean of the log(values).
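The comparison on this slide can be reproduced with a short script. This is a minimal sketch, assuming natural logs (which match the slide's values); the helper names `log_of_mean` and `mean_of_logs` are made up for illustration.

```python
import math
from statistics import mean

# The two data sets from the slide.
data1 = [1, 2, 3, 50]
data2 = [20, 25, 35, 500]


def log_of_mean(values):
    """Natural log of the arithmetic mean."""
    return math.log(mean(values))


def mean_of_logs(values):
    """Arithmetic mean of the natural logs."""
    return mean(math.log(v) for v in values)


for name, values in (("Data set 1", data1), ("Data set 2", data2)):
    print(
        f"{name}: log(mean) = {log_of_mean(values):.4f}, "
        f"mean of log(values) = {mean_of_logs(values):.4f}"
    )
```

For both data sets the mean of the logs comes out smaller than the log of the mean, which is why the back-transformed value is not the original-scale mean.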

8. Why transform? Original Data (figure: Data 1, Data 2). Too much skew!

9. Transformed Data (figure: Data 1, Data 2). Not ideal, but closer to equal variance and normality. This data may have needed a stronger transformation.

10. Homework 4 Summary. So, how do we interpret the log values? Since the log(mean) is not equal to the mean of the log(values), we cannot simply back-transform to a difference of means in our original units. However, we can still get a useful result: as the book states on pages 68-73, back-transforming the difference in mean logs gives an estimate of the ratio of medians when comparing groups.

11. Homework 4 Summary. Results on this data set from JMP. To get a useful result, we exponentiate the values (e^value or exp(value)).

12. Homework 4 Summary. Take log data set 2 - log data set 1. Summary Numbers: Mean Difference: 2.57016; Sp = 1.6113; SE of the difference = 1.139; t-value: 2.447 for a 95% CI (0.975 quantile with 6 df). See pages 38-41 for formulas.

13. Homework 4 Summary. Finally, we get our estimates. Mean Difference: 2.57016. 95% CI: 2.57016 +/- (2.447 * 1.139) = (-0.216973, 5.357293). Back-transforming -> Median Ratio: Exp(2.57016) = 13.0679 (estimate of the ratio); Exp(-0.216973) = 0.8049517; Exp(5.357293) = 212.149. So, we are 95% confident that the median of data set 2 is between 0.805 and 212.149 times as great as the median of data set 1.
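The pooled standard deviation, standard error, confidence interval and back-transformation can be sketched as below. This is an illustration, not the JMP output: it assumes the usual equal-variance two-sample formulas (the ones the slides point to on pages 38-41) and takes the t quantile 2.447 from the slide instead of computing it. Because it carries full precision, its last digits differ slightly from the rounded hand calculation.

```python
import math
from statistics import mean, variance

data1 = [1, 2, 3, 50]
data2 = [20, 25, 35, 500]

# 0.975 quantile of the t distribution with 6 df, taken from the slide.
T_QUANTILE = 2.447

log1 = [math.log(v) for v in data1]
log2 = [math.log(v) for v in data2]
n1, n2 = len(log1), len(log2)
df = n1 + n2 - 2

# Difference in mean logs (data set 2 minus data set 1).
diff = mean(log2) - mean(log1)

# Pooled standard deviation and standard error of the difference.
sp = math.sqrt(((n1 - 1) * variance(log1) + (n2 - 1) * variance(log2)) / df)
se = sp * math.sqrt(1 / n1 + 1 / n2)

# 95% confidence interval on the log scale.
lower = diff - T_QUANTILE * se
upper = diff + T_QUANTILE * se

# Back-transform: estimated ratio of medians and its interval.
ratio = math.exp(diff)
ratio_lower = math.exp(lower)
ratio_upper = math.exp(upper)

print(f"diff = {diff:.5f}, Sp = {sp:.4f}, SE = {se:.4f}, df = {df}")
print(f"95% CI on log scale: ({lower:.4f}, {upper:.4f})")
print(f"median ratio: {ratio:.3f}, 95% CI: ({ratio_lower:.3f}, {ratio_upper:.3f})")
```

The interval for the ratio includes 1, consistent with the log-scale interval including 0.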

14. Homework 4 Summary: Text References. Sp, SE and CI on pages 38-41. Discussion of back-transformation on pages 68-73; pay close attention to Display 3.9 on page 71.

15. Homework 3 Summary: Hypothesis testing. The p-value: see pages 46-47. A small p-value indicates that data like ours would be unlikely if Ho, our default reality, were true. Hence, if the p-value is small enough, we reject Ho. A large p-value means that the data are consistent with Ho, at least statistically. Possible Results: fail to reject Ho (we do NOT accept Ho), or reject Ho in favor of Ha (find strong evidence against Ho).

16. Hypothesis Tests General Notes

17. Homework 3 Topics: Randomization Distributions. We have two samples: after treatment A, we observed values 1, 2, 3; after treatment B, we observed values 4, 5, 6. So, the mean difference is (1+2+3)/3 - (4+5+6)/3 = 2 - 5 = -3. Is this a common value? How many ways could these samples have appeared if there is no effect due to a treatment?

18. Homework 3 Topics: Randomization Continued. We now pool all the values in a jar. From this jar, we draw samples assuming that all results could happen for either treatment (i.e., the treatment does not affect the outcome). Draw 1, 5, 6 for treatment A, which leaves 2, 3, 4 for treatment B. New difference is (1+5+6)/3 - (2+3+4)/3 = 4 - 3 = 1. Repeat this until you exhaust all the possibilities. In this example, we have only 20 total ways to draw the samples. How likely is a value as or more extreme than the one we observed? In other words, how many samples have values as or more extreme than -3 or 3? The p-value is (# as or more extreme) / (total possible). See pages 11-14, 44-46 and 95-98.
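The jar procedure can be enumerated directly, since there are only 20 ways to choose 3 of the 6 pooled values for treatment A. The sketch below is one way to do it, assuming a two-sided comparison on the absolute mean difference; the helper name `mean_diff` is hypothetical, and exact fractions are used so that no regrouping is miscounted through rounding.

```python
from fractions import Fraction
from itertools import combinations

treatment_a = [1, 2, 3]
treatment_b = [4, 5, 6]
pooled = treatment_a + treatment_b


def mean_diff(a, b):
    """Mean of group a minus mean of group b, as an exact fraction."""
    return Fraction(sum(a), len(a)) - Fraction(sum(b), len(b))


observed = mean_diff(treatment_a, treatment_b)

# Every possible way to assign 3 of the 6 pooled values to treatment A.
diffs = []
for idx in combinations(range(len(pooled)), len(treatment_a)):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diffs.append(mean_diff(group_a, group_b))

# Two-sided: count regroupings at least as extreme as the observed difference.
extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
p_value = Fraction(extreme, len(diffs))

print(f"observed difference: {observed}")
print(f"regroupings: {len(diffs)}, as or more extreme: {extreme}")
print(f"p-value: {p_value} = {float(p_value)}")
```

Only the original split and its mirror image reach a difference of -3 or 3, so the smallest two-sided p-value this design can produce is 2/20.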

19. Homework 2 Summary: Standard deviation and standard error. Know the distinction between a sample distribution and a sampling distribution (see pages 29-40 for an extensive discussion). A sample distribution comes from a sample and is associated with a standard deviation. A sampling distribution is a theoretical device: the distribution of an estimate over repeated samples. The standard error is the measure of spread for the sampling distribution; it measures the spread of estimates such as the sample mean y-bar.
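As a small numerical illustration of the distinction, the sketch below computes both quantities for one sample. The sample (the treatment A values 1, 2, 3 from the randomization slides) is reused here purely as an example, and the standard error is taken as s / sqrt(n), the usual estimate for a sample mean.

```python
import math
from statistics import mean, stdev

# Example sample, reused from the randomization slides for illustration only.
sample = [1, 2, 3]
n = len(sample)

# Standard deviation: spread of the observations within the sample.
sd = stdev(sample)

# Standard error: estimated spread of the sample mean across repeated samples.
se = sd / math.sqrt(n)

print(f"y-bar = {mean(sample)}, SD = {sd:.4f}, SE of y-bar = {se:.4f}")
```

The standard error shrinks as n grows, while the standard deviation does not systematically shrink with n.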

20. Homework 2 Summary. The five-number summary. Want 5 numbers with mean and median 9 and 0 standard deviation? 9, 9, 9, 9, 9. Extreme values: the mean is heavily impacted by outliers/large values; the median is resistant. Backing out information from a CI:
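The point about extreme values on this slide can be checked numerically. In the sketch below, the all-nines data set comes from the slide, while the outlier value 900 is made up for illustration.

```python
from statistics import mean, median, stdev

# Five numbers with mean 9, median 9 and standard deviation 0 (from the slide).
nines = [9, 9, 9, 9, 9]

# Same data with one value replaced by a hypothetical outlier.
with_outlier = [9, 9, 9, 9, 900]

for name, values in (("all nines", nines), ("with outlier", with_outlier)):
    print(
        f"{name}: mean = {mean(values)}, median = {median(values)}, "
        f"SD = {stdev(values):.2f}"
    )
```

The single large value drags the mean far from 9 while the median stays at 9.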

21. Homework 2 Summary. See the above p-value discussion. Remember, a small p-value -> evidence against Ho. Also, see the above null and alternative hypothesis discussion. Do we ever "accept" Ho? Finally, you need to understand experimental vs. observational studies. Review the slide above if necessary.
