
Lecture 4 Outline


andrew


Presentation Transcript


  1. Lecture 4 Outline • Chapter 1.3, 1.5 • Control in Experimental Design • Causal Inference in Observational Studies • Summarizing Data • Numerical methods • Graphical methods

  2. The meaning of causal inference • In the motivation-creativity study, we concluded that there is strong evidence that the “intrinsic questionnaire” treatment caused a difference in creativity compared to the “extrinsic questionnaire” treatment. • This difference could be caused by anything that differs between the two treatments, e.g., the actual questionnaire, the order in which the poems were judged, or the relative preferences of the judges for the two treatments.

  3. Control in Experimental Design • The principle of control in experimental design is to make sure that all other factors besides the intended treatments are kept the same in the different groups. Then we can conclude that the intended treatment causes a difference between the groups. • Examples of control: • Use a placebo for the control group. • Double blinding • Judge poems in random order.

  4. Causal Inference in an Observational Study • In an observational study, we could assume that, unbeknownst to us, the subjects were randomly assigned to treatments (i.e., there are no confounding variables). Then we could use the randomization test p-value to make inferences. • But this is a “fictitious” probability model which might or might not be valid. • Inferences based on a randomized experiment are much stronger because the probability model on which they are based (that of random assignment) is known to be correct.
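The randomization test mentioned above can be sketched as follows. This is a minimal illustration, not the course's own procedure: the scores are made up, the function name `randomization_p_value` is hypothetical, and the p-value is approximated by random reshuffling rather than by enumerating every possible assignment.

```python
import random
from statistics import mean

def randomization_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Approximate two-sided randomization-test p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Re-deal the pooled outcomes into two groups, as random assignment would.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Made-up scores for two groups (illustration only, not study data).
group_a = [12.0, 15.5, 17.2, 19.8, 21.3, 22.1]
group_b = [10.1, 11.4, 13.0, 14.2, 15.9, 16.5]
p_value = randomization_p_value(group_a, group_b)
print(f"approximate p-value: {p_value:.3f}")
```

In a randomized experiment the reshuffling mirrors what was actually done to the subjects. In an observational study the same arithmetic goes through, but it rests on the assumed ("fictitious") random assignment.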

  5. Meaningful Comparisons • Main lesson of chapter: The best way to compare two (or more) groups is to do a randomized experiment or take a random sample. This avoids systematic bias due to confounding variables and selection bias. • But if this is not possible, we should generally try to make the groups as “comparable” as possible by adjusting for known confounding variables and selection biases. Often, important first steps are to use an appropriate control group and to compare appropriate rates rather than absolute numbers.

  6. Control Group • In a randomized experiment, we want the treatment and control groups to be similar in every way except that one takes the treatment and the other doesn’t, e.g., by using a placebo and double blinding. • Similarly, in an observational study, we want to compare the treatment group to a control group that is as similar as possible. • Explain the need for a control group by criticizing the statement “A study on the benefits of vitamin C showed that 90% of the people suffering from a cold who take vitamin C get over their cold within a week.”

  7. Use of Rates • An article in This Week magazine says that if you went “hurtling down the highway at 70 miles an hour, careening from side to side,” you would have four times as good a chance of staying alive if the time were seven in the morning than seven at night. • The evidence: “Four times more fatalities occur on the highways at 7 p.m. than 7 a.m.” • Does the conclusion follow from the evidence? • More accidents occur in clear weather than foggy weather. Is clear weather safer to drive in?
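The difference between comparing counts and comparing rates can be made concrete with a small sketch. All numbers below are hypothetical (the slide gives only the "four times" figure), and `per_100k` is an illustrative helper name.

```python
def per_100k(fatalities, vehicles_on_road):
    """Fatalities per 100,000 vehicles on the road during the hour."""
    return fatalities / vehicles_on_road * 100_000

# Hypothetical counts: four times as many deaths at 7 p.m.,
# but eight times as many vehicles on the road.
am = {"fatalities": 10, "vehicles": 100_000}
pm = {"fatalities": 40, "vehicles": 800_000}

count_ratio = pm["fatalities"] / am["fatalities"]
rate_am = per_100k(am["fatalities"], am["vehicles"])
rate_pm = per_100k(pm["fatalities"], pm["vehicles"])
rate_ratio = rate_pm / rate_am

print(f"count ratio (pm/am): {count_ratio:.1f}")
print(f"rate ratio  (pm/am): {rate_ratio:.2f}")
```

With these invented exposure figures the evening has more deaths but a lower rate per vehicle, so the raw count alone cannot support the magazine's conclusion.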

  8. Experimental Design Example: Salk Vaccine Field Trial • In the first half of the 20th century, polio was one of the most frightening diseases, striking hardest at young children and leaving many helpless cripples. • By the 1950s, Jonas Salk had developed a vaccine for polio that had proved promising in laboratory experiments, but it was necessary to try it in the real world before releasing it for general use.

  9. Designs for Salk Vaccine Field Trial • Historical Control Approach: Distribute the vaccine as widely as possible, through the schools, to see whether the rate of reported polio was appreciably less than usual during the subsequent season. • Observed Control Approach: Offer vaccination to all children in the second grade of participating schools and follow the polio experience not only in these children but in the first and third grade children. • Placebo Control Approach: Choose the control group from the same population as the treatment group – children whose parents consented to vaccination. Assign the treatment randomly. Give a placebo to control group. Do not tell doctors which group children belong to.

  10. Polio Example • Using Figure 1 as an example, explain why a contemporaneous control group is needed in experiments where the effectiveness of a drug or vaccine is being tested. • Comment on the use of the number of cases. What would be a more appropriate indicator of whether polio incidence was increasing?

  11. Summarizing Data • Numerical summaries • Measures of center: mean, median, mode • Measures of spread: sample standard deviation, interquartile range • Graphical methods • Relative frequency histograms • Stem and leaf diagrams • Box plots
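The numerical summaries listed above can be computed with Python's standard `statistics` module. This is a small sketch on made-up data. Quartile conventions differ between software packages, so the interquartile range here (from the default method of `statistics.quantiles`) need not match JMP's output on the same data.

```python
from statistics import mean, median, mode, stdev, quantiles

# Made-up measurements (illustration only).
data = [2, 4, 4, 5, 7, 9, 11]

center_mean = mean(data)
center_median = median(data)
center_mode = mode(data)

# Sample standard deviation: divides by n - 1.
spread_sd = stdev(data)

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = quantiles(data, n=4)
spread_iqr = q3 - q1

print(center_mean, center_median, center_mode)
print(round(spread_sd, 3), spread_iqr)
```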

  12. Relative Frequency Histograms • A histogram is a graph that shows the relative frequency per unit of measurement. • The areas of blocks represent the percentage of observations in the blocks. • The heights of the blocks represent relative frequency per unit of measurement, i.e., crowding – percentage per unit of measurement • Histograms show broad features – particularly the center, spread and shape of the distribution (symmetric or skewed, light tailed or heavy tailed).

  13. Histograms in JMP • Click Analyze, then Distribution. • Click the red triangle next to Distributions, then Stack, to see a horizontal layout. • Click Tools, then the hand (the grabber in Version 5); click on the histogram and drag to change the position of the bars. • To make histograms by group (e.g., sex discrimination), put Salaries in Y and Sex in the By box. Click the red triangle next to Distributions and click Stack to display horizontally. For both groups, click the red triangle next to Distributions and click Uniform Scaling to display the histograms on the same scale.

  14. Stem and leaf diagrams • Cross between a graph and a table • Gives a quick idea of the distribution • Shows center, spread and shape, as a histogram does, but also shows exact values; it is easy to construct by hand, and the median can be computed from it. • Stem and leaf plots in JMP • Click Analyze, Distribution • Put the variable of interest in Y and click OK • Click the red triangle next to the variable of interest (e.g., salaries) and click Stem and Leaf • Back-to-back stem and leaf plots are not available in JMP but are useful (see page 17)
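Building a stem and leaf diagram by hand amounts to sorting the values and splitting each one into a stem and a leaf. The sketch below assumes non-negative whole numbers, with the tens as the stem and the units digit as the leaf. The data are made up and `stem_and_leaf` is a hypothetical helper, not a JMP feature.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the display lines: stem = tens, leaf = units digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem in the range so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}")
    return lines

# Made-up measurements (illustration only).
values = [23, 25, 31, 38, 38, 42, 57, 59]
lines = stem_and_leaf(values)
print("\n".join(lines))
```

Because every exact value is kept, the median can be found by counting leaves in from either end of the display.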

  15. Box plots • The middle 50% of a group of measurements is represented by a box. • The line in the middle of the box is the median. • Various features of the upper and lower 25% are shown by other symbols. • The whiskers extend to the farthest points that are within 1.5 interquartile ranges of the upper and lower quartiles (IQR = third quartile minus first quartile). • Points farther away are shown individually as outliers. • The width of a box plot is chosen to make the box look nice; it does not represent any aspect of the data.
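The whisker and outlier rule can be written out as a short sketch. The data are made up, `box_plot_summary` is a hypothetical helper, and the quartiles use the default convention of `statistics.quantiles`, which may differ slightly from the one JMP applies.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Compute the pieces of a box plot under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # Whiskers stop at the farthest data points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn individually.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up measurements with one unusually large value (illustration only).
summary = box_plot_summary([2, 4, 4, 5, 7, 9, 30])
print(summary)
```

Note that a whisker ends at an actual data point, not at the fence itself.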

  16. Box plots in JMP • To draw one box plot • Click Analyze, Distribution. • To draw side by side box plots • Click Analyze, Fit Y by X, putting outcome in Y and group variable in X • Click red triangle next to One Way Analysis, click Display Options and then click Box Plot (this produces box plots that display the box, the whiskers and all of the data points individually). • Display 1.13 shows histograms and box plots for four types of distributions.
