
Effect Sizes




Presentation Transcript


  1. CAMPBELL COLLABORATION Effect Sizes

  2. Overview Overview of Effect Sizes Effect Sizes from the d Family Effect Sizes from the r Family Effect Sizes for Categorical Data Connections Between the Effect-Size Metrics

  3. Effect sizes • Meta-analysis expresses the results of each study using a quantitative index of effect size (ES). • ESs are measures of the strength or magnitude of a relationship of interest. • ESs have the advantage of being comparable (i.e., they estimate the same thing) across all of the studies and therefore can be summarized across studies in the meta-analysis. • ESs are relatively independent of sample size.

  4. Effect sizes • An effect size is a quantitative index that represents the results of a study. • Effect sizes make study results comparable so that … • results can be compared across studies, or • results can be summarized across studies. • Examples of effect-size indices include • standardized mean differences (ds), and • correlation coefficients (rs).

  5. Effect sizes • A crucial conceptual distinction is between effect-size … • estimates, computed from studies (sample effect sizes), and • parameters (population or true effect sizes). • We want to make inferences about effect-size parameters using effect-size estimates.

  6. Types of effect size • Most reviews use effect sizes from one of three families of effect sizes: • the d family, including the standardized mean difference, • the r family, including the correlation coefficient, and • the odds ratio (OR) family, including proportions and other measures for categorical data.

  7. Types of effect size • Test statistics (e.g., t statistics, F tests, and so on) are not ideal ESs because they depend on both: • effect size, and • sample size (n) • That is, Test Statistic = f(Effect Size, Sample Size)

  8. Types of effect size • The significance level (a.k.a. the p value) is also not an ideal ES because it depends on the test statistic and n. • Studies with the same effect sizes can get different p values, simply because they differ in sample size. • Studies with fundamentally different results can get the same p values, because they differ in size. • Thus, the p value is a misleading index of effect size.
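The dependence of p values on sample size can be seen directly: with equal groups, the two-sample t statistic implied by a standardized mean difference d is t = d·√(n/2), so the same d produces very different t (and p) values at different ns. A minimal sketch (the normal approximation to the p value is an illustrative assumption; in practice a t distribution would be used):

```python
import math

def t_from_d(d, n_per_group):
    """Two-sample t statistic implied by a standardized mean
    difference d with equal group sizes: t = d * sqrt(n/2)."""
    return d * math.sqrt(n_per_group / 2)

def approx_two_sided_p(t):
    """Normal approximation to the two-sided p value (adequate for
    large df; illustrative only)."""
    return math.erfc(abs(t) / math.sqrt(2))

# The same effect size d = 0.5 in a small and a large study:
t_small = t_from_d(0.5, 20)    # n = 20 per group -> t ~ 1.58, p > .05
t_large = t_from_d(0.5, 200)   # n = 200 per group -> t = 5.0, p << .001
```

Identical effects, wildly different p values: exactly the point of the slide.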

  9. The choice of effect size • A particular index is chosen to make results from different studies comparable to one another. The choice depends on the ... • question of interest for the review, • designs of studies being reviewed, • statistical analyses that have been reported, and • measures of the outcome variable.

  10. The choice of effect size • When we have continuous data (means and standard deviations) for two groups, we typically compute a raw mean difference or a standardized difference – an effect size from the d family, • When we have correlational data, we typically compute a correlation (from the r family), or • When we have binary data (the patient lived or died, the student passed or failed), we typically compute an odds ratio, a risk ratio, or a risk difference.

  11. Features of most effect sizes • We introduce some notation for a common case – the treatment/control comparison. • Let ȲT be the mean posttest score in the treatment group, ȲC be the mean control-group posttest score, and SYpooled be the pooled within-groups standard deviation for the Y scores (i.e., the t-test SD). Then we may compute standardized T - C differences using posttest means as • gpost = (ȲT - ȲC) / SYpooled

  12. Features of most effect sizes • Remember that all statistical estimators are estimating some parameter. What parameter is being estimated by gpost? • The answer is: the population standardized mean difference, usually denoted by the Greek letter delta, where the population means and the population SD σ appear in place of the sample values: • δ = (μT - μC) / σ

  13. Expected values of effect sizes • Some ES indices are biased in small samples. It is common to correct for this small-sample bias. • The posttest effect size gpost is biased, with expected value • E[gpost] = δ / c(m), • where c(m) = 1 - 3/(4m - 1), and m = nT + nC - 2. • In general, m is the df for the appropriate t test – here, the two-sample t test.

  14. Expected values of effect sizes • So now we can correct for bias: d = c(m)*gpost. • The expected value of d is δ. • The correlation is also biased (toward zero), and can be corrected via • ru = r [1 + (1 - r²)/(2n - 2)]. • Proportions are not biased, and do not need correction.
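The small-sample bias correction on this slide is easy to code. A minimal sketch (function names are illustrative):

```python
def c(m):
    """Small-sample correction factor c(m) = 1 - 3/(4m - 1),
    where m is the df of the corresponding t test."""
    return 1 - 3 / (4 * m - 1)

def unbiased_d(g, n_t, n_c):
    """Bias-corrected standardized mean difference d = c(m) * g,
    with m = n_T + n_C - 2 for the two-sample case."""
    return c(n_t + n_c - 2) * g

# With nT = nC = 30, m = 58 and c(m) is about 0.987, so the
# correction is small but shrinks g slightly toward zero.
```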

  15. Variances of effect sizes • Effect-size indices also have variances that can be estimated using data from the individual study from which the ES is obtained. • Below we provide the variances of many ES indices, noting that in all cases the variance is an inverse function of the study sample size. Thus smaller studies have larger variances, representing less precise information about the effect of interest. The ES variance is a key component of nearly all statistical analyses used in meta-analysis.

  16. Statistical properties (Variances) • Often the variances of ES indices are also conditional on (i.e., are functions of) the parameter values. Consider the variance of d: • vd = (nT + nC)/(nT nC) + δ²/(2(nT + nC)), • which is a function of δ. Below we introduce transformations that can be used with some ES indices to remove the parameter from the variance (i.e., to stabilize the variances).

  17. Variance of the standardized mean difference • As d increases (becomes more unusual or extreme) the variance also increases. We are more uncertain about extreme effects. • The variance also depends on the sample sizes, and as the ns increase, the variance decreases. Large studies provide more precise data; we are more certain about effects from large studies.
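Both properties on this slide can be checked numerically with the variance formula vd = (nT + nC)/(nT·nC) + d²/(2(nT + nC)). A quick sketch:

```python
def var_d(d, n_t, n_c):
    """Conditional variance of the standardized mean difference:
    v_d = (n_T + n_C)/(n_T * n_C) + d**2 / (2 * (n_T + n_C))."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))

# Larger |d| -> larger variance: more uncertainty about extreme effects.
# Larger ns  -> smaller variance: large studies are more precise.
```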

  18. Statistical properties (Variances) • Variances of effect sizes are not typically equal across studies, even if stabilized. This is because most variances depend on sample sizes, and it is rare to have identical-sized samples when we look at sets of studies. • Thus, homoscedasticity assumptions are nearly always violated in meta-analysis data! • This is why we do not use “typical” statistical procedures (like t tests and ANOVA) for most analyses in meta-analysis.

  19. Quick Examples: Common Study Outcomes for Treatment-Control Meta-analyses

  20. Common study outcomes for trt/ctrl meta-analysis • Treatment (T)/control (C) studies: • Above we introduced the standardized T - C difference in posttest means: • gpost = (ȲT - ȲC) / SYpooled • We also can compute T-C differences in other metrics and for other outcomes.

  21. Common study outcomes for trt/ctrl meta-analysis: d family • We may also compute standardized T - C differences in: • gain or difference score means for D = Y - X, standardized either by the difference SD or by the posttest SD • covariate adjusted means

  22. Common study outcomes for trt/ctrl meta-analysis: Categorical outcomes • differences between proportions: pT - pC • odds ratios for proportions: OR = [pT/(1 - pT)] / [pC/(1 - pC)] • log odds ratios: log(OR) • differences between arcsine-transformed proportions: 2 arcsin(√pT) - 2 arcsin(√pC)

  23. Less common study outcomes for trt/ctrl meta-analysis • differences between log-transformed variances: • 2 log(ST) - 2 log(SC), or equivalently 2 log(ST/SC) • probability values from various tests of Trt/Ctrl differences, such as the t test, the ANOVA F test, etc.

  24. Other common study outcomes for meta-analysis: d family • Single group studies • standardized posttest - pretest mean difference: g = (Ȳpost - Ȳpre) / S • covariate adjusted means • proportions (e.g., post-trt counts for outcome A): p • arcsine proportions: 2 arcsin(√p)

  25. Other common study outcomes for meta-analysis • odds ratios for single proportions • correlations r • correlation matrices r1, ..., rp(p-1)/2 • variance ratios Spost/Spre or 2 log(Spost/Spre) • “variance accounted for” measures: R², η², etc.

  26. Common study outcomes for meta-analysis • We next treat each of the three families of effect sizes in turn : • Effect Sizes from the d Family • Effect Sizes from the r Family • Effect Sizes for Categorical Data

  27. More Detail on Effect Sizes: The d Family

  28. Standardized mean difference • The standardized mean difference may be appropriate when • Studies use different (continuous) outcome measures • Study designs compare the mean outcomes in treatment and control groups • Analyses use ANOVA, t tests, and sometimes chi-squares (if the underlying outcome can be viewed as continuous)

  29. Standardized mean difference: Definition • δ = (μT - μC) / σ, estimated by g = (ȲT - ȲC) / Spooled

  30. Computing standardized mean difference • The first steps in computing d effect sizes involve assessing what data are available and what’s missing. You will look for: • Sample size and unit information • Means and SDs or SEs for treatment and control groups • ANOVA tables • F or t tests in text, or • Tables of counts

  31. Sample sizes • Regardless of exactly what you compute, you will need to get sample sizes (to correct for bias and compute variances). • Sample sizes can vary within studies, so check initial reports of n against • n for each test or outcome, or • the df associated with each test

  32. Calculating effect-size estimates from research reports • A major issue is often computing the within-group standard deviation Spooled. • The standard deviation determines the “metric” for standardized mean differences. • Different test statistics (e.g., t vs. multi-way ANOVA F) use different SD metrics. • In general it is best to try to compute or convert to the metric of within-group (i.e., Treatment and Control) standard deviations.

  33. Calculating effect sizes from means and SDs • Glass’s or Cohen’s effect size is defined as • g = (ȲT - ȲC) / Spooled, where • S²pooled = [(nT - 1)S²T + (nC - 1)S²C] / (nT + nC - 2), • where nT and nC are group sample sizes, and S²T and S²C are group variances. Also recall that d = g*[1 - 3/(4m - 1)], where m = nT + nC - 2.

  34. Variance of the standardized mean difference • Most notable for statistical work in meta-analysis is the fact that each of the study indices has a “known” variance. These variances are often conditional on the parameter values. • For d the variance is • vd = (nT + nC)/(nT nC) + d²/(2(nT + nC)). • The variance is computed by substituting d for δ.

  35. Confidence interval for the standardized mean difference • The 95% confidence interval for d is • d ± 1.96 √vd

  36. Calculating effect sizes from means and SDs • Equal-n example: compute the pooled standard deviation, then the effect size.

  37. Calculating effect sizes from means and SDs • Unbiased effect size: g = -0.72, d = -0.72*[1 - 3/(4*58 - 1)] = -0.72*0.987 = -0.71 • 95% CI: -0.71 ± 1.96*(0.27) = -0.71 ± 0.53, or -1.24 to -0.18
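The numbers in this worked example can be reproduced end to end. Equal group sizes nT = nC = 30 are assumed here (consistent with the m = 58 df in the calculation) but are not stated on the slide:

```python
import math

# Slide 37 reports g = -0.72 with m = 58 df; nT = nC = 30 is an
# assumption consistent with m = nT + nC - 2 = 58.
g, n_t, n_c = -0.72, 30, 30
m = n_t + n_c - 2                                  # 58
d = g * (1 - 3 / (4 * m - 1))                      # bias-corrected d
v = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
se = math.sqrt(v)                                  # about 0.27
ci = (d - 1.96 * se, d + 1.96 * se)                # about (-1.23, -0.19)
```

The small differences from the slide's (-1.24, -0.18) come from the slide rounding the SE to 0.27 before forming the interval.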

  38. Calculating effect sizes: Practice • Compute the values of d, the SEs, and the 95% CIs for these two studies. Answers are at the end of the section.

  39. Calculating effect sizes from the independent groups F test • If the study’s design is a two group (treatment-control) comparison and the ANOVA F statistic is reported, then • |g| = √[F(nT + nC)/(nT nC)] • You must determine the sign from other information in the study.

  40. Calculating effect sizes from the independent groups t test • When the study makes a two group (treatment-control) comparison and the t statistic is reported, we can also compute d easily. Then • g = t √[(nT + nC)/(nT nC)]
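The t-based and F-based conversions above are the same identity, since F = t² for a two-group comparison. A minimal sketch (function names are illustrative):

```python
import math

def g_from_t(t, n_t, n_c):
    """g from an independent-groups t statistic:
    g = t * sqrt((n_T + n_C)/(n_T * n_C))."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))

def g_from_f(f, n_t, n_c, sign=1):
    """|g| from a two-group ANOVA F (= t**2); the sign must be
    supplied from the direction of the group means."""
    return sign * math.sqrt(f * (n_t + n_c) / (n_t * n_c))
```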

  41. Calculating effect sizes from the two-way ANOVA • Exactly how we compute d for the two-way ANOVA depends on the information reported in the study. • We consider two cases: • the full ANOVA table is reported, and • the cell means and SDs are reported.

  42. Calculating effect sizes from the two-way ANOVA table • Suppose A is the treatment factor and B is the other factor in this design. We pool the B and AB factors with within-cell variation to get • S²pooled = (SSB + SSAB + SSWithin) / (dfB + dfAB + dfWithin), • which equals MSWithin, the MSW for the one-way design with A as the only factor. Then d is computed as • g = (ȲT - ȲC) / Spooled

  43. Calculating effect sizes from the two-way ANOVA cell means and SDs • Suppose we have J subgroups within the treatment and control groups, with means Ȳij and sample sizes nij (i = 1 is the treatment group and i = 2 is the control group). We first compute the treatment and control group means: • Ȳi = Σj nij Ȳij / Σj nij

  44. Calculating effect sizes from the two-way ANOVA cell means and SDs • Then compute the standard deviation Sp via • S²p = [Σi Σj (nij - 1)S²ij + SSB] / (n1 + n2 - 2), • where SSB = Σi Σj nij(Ȳij - Ȳi)² is the between-cells sum of squares within the treatment and control groups • Then calculate the effect size as • g = (Ȳ1 - Ȳ2) / Sp
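The two steps above (group means, then a pooled SD that folds the between-cells SS back in) can be sketched in code. The cell layout and function name here are illustrative assumptions, not the deck's own implementation:

```python
def pooled_sd_from_cells(cells):
    """Reconstruct the one-way pooled within-group SD from the cells
    of a two-way design. `cells[i]` is a list of (n, mean, sd) tuples
    for the subgroups of group i (i = 0: treatment, i = 1: control).
    Pools the within-cell SS with the between-cells SS inside each
    group, then divides by n_1 + n_2 - 2."""
    ss_within = ss_between = 0.0
    total_n = 0
    for subgroups in cells:
        n_i = sum(n for n, _, _ in subgroups)
        mean_i = sum(n * m for n, m, _ in subgroups) / n_i  # group mean
        for n, m, s in subgroups:
            ss_within += (n - 1) * s ** 2          # within-cell SS
            ss_between += n * (m - mean_i) ** 2    # SSB within the group
        total_n += n_i
    return ((ss_within + ss_between) / (total_n - 2)) ** 0.5
```

With one cell per group this reduces to the ordinary pooled SD, which is a useful sanity check.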

  45. Calculating effect sizes from the two-way ANOVA: Variants • There are, of course, variants of these two methods. • For example, you might have the MSW, but not the within-cell standard deviations (the Sij). • Then you could use dfW·MSW in place of the sum of weighted S²ij values in the numerator of the expression for S²pooled on the previous slide.

  46. Calculating effect sizes from the one-way ANCOVA • Suppose a study uses a one-way ANCOVA with a factor that is a treatment-control comparison. • Can we use the ANCOVA F statistic to compute the effect size? NO! Or rather, if we do, we will not get a comparable effect-size measure. • The error term used in the ANCOVA F test is not the same as the unadjusted within (treatment or control) group variance, and is usually smaller than the one-way MSW.

  47. Calculating effect sizes from the one-way ANCOVA • The F statistic is F = MSB/MSW, but • MSW is the covariate-adjusted squared SD within the treatment and control groups, and • MSB is the covariate-adjusted mean difference between treatment and control groups. • To get the SD needed for a comparable effect size, we must reconstruct the unadjusted SD within treatment and control groups.

  48. Calculating effect sizes from the one-way ANCOVA • The unadjusted SD is • S = SAdjusted / √(1 - r²), • where r is the covariate-outcome correlation, so • g = (ȲT - ȲC) √(1 - r²) / SAdjusted

  49. Calculating effect sizes from the one-way ANCOVA • The procedure is equivalent to • computing the g using the ANCOVA F, as if it were from a one-way ANOVA (we will call this gUncorrected), • then “correcting” the g for covariate adjustment via • g = gUncorrected √(1 - r²)
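The covariate-adjustment correction is one line of code. A minimal sketch (the function name is illustrative):

```python
import math

def g_from_ancova(g_uncorrected, r):
    """Correct a g computed from the ANCOVA error term as if it came
    from a one-way ANOVA: g = g_uncorrected * sqrt(1 - r**2), where
    r is the covariate-outcome correlation."""
    return g_uncorrected * math.sqrt(1 - r ** 2)
```

The stronger the covariate (larger |r|), the more the ANCOVA error term understates the unadjusted SD, and the more the naive g must be shrunk.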

  50. Calculating effect sizes from the one-way ANCOVA • The effect size given previously uses the adjusted means in the numerator. • However, the reviewer needs to decide whether unadjusted or covariate-adjusted mean differences are desired. In randomized experiments, they will not differ much. • Unadjusted means may not be given in the research report, leading to a practical decision to calculate effects based on adjusted means.
