Session 3.1: Revision of Day 2
Meta-analysis
Funded through the ESRC’s Researcher Development Initiative
Department of Education, University of Oxford
Questions • Based on the meta20 data used in the practical, how similar are the results of the fixed, random, and multilevel models? • Which model seems the most appropriate for these data, and why?
The multilevel model simplifies to the fixed- and random-effects models • If the between-study (level-2) variance = 0, the multilevel model simplifies to the fixed-effects regression model • If no predictors are included, the model simplifies to the random-effects model • If the level-2 variance = 0 and there are no predictors, the model simplifies to the fixed-effects model
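These relationships can be illustrated numerically. The sketch below uses hypothetical effect sizes and within-study variances (not the meta20 data) to compute the fixed-effect estimate and a DerSimonian–Laird random-effects estimate; when the estimated between-study variance tau² is 0, the random-effects weights equal the fixed-effect weights and the two estimates coincide.

```python
import numpy as np

# Hypothetical effect sizes and within-study variances for k studies
yi = np.array([0.6, 0.1, 0.5, -0.1, 0.3])
vi = np.array([0.01, 0.01, 0.02, 0.01, 0.02])

# Fixed-effect estimate: inverse-variance weights
w_fe = 1.0 / vi
theta_fe = np.sum(w_fe * yi) / np.sum(w_fe)

# DerSimonian-Laird estimate of the between-study variance tau^2
k = len(yi)
Q = np.sum(w_fe * (yi - theta_fe) ** 2)
C = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
tau2 = max(0.0, (Q - (k - 1)) / C)

# Random-effects estimate: weights incorporate tau^2;
# if tau2 == 0 these reduce to the fixed-effect weights
w_re = 1.0 / (vi + tau2)
theta_re = np.sum(w_re * yi) / np.sum(w_re)

print(f"fixed = {theta_fe:.3f}, tau2 = {tau2:.3f}, random = {theta_re:.3f}")
```

With heterogeneous data like these, tau² > 0 and the random-effects estimate weights the studies more evenly than the fixed-effect estimate.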
Questions • Why is it important to consider study quality? Does it make a difference? • What are the approaches to evaluating study quality? • Would you consider excluding “low quality” studies?
Study quality: does it make a difference? Meta-analyses should always include subjective and/or objective indicators of study quality. In the social sciences, there is some evidence that studies with highly inadequate control for pre-existing differences produce inflated effect sizes. However, it is surprising that other indicators of study quality make so little difference. In medical research, studies are largely limited to RCTs, where there is MUCH more control than in social science research. Here, there is evidence that inadequate concealment of assignment and lack of double-blinding inflate effect sizes, but perhaps only for subjective outcomes. These issues are likely to be idiosyncratic to individual discipline areas and research questions.
Evaluation of study quality • Sometimes this is a global holistic (subjective) rating. In this case it is important to have multiple raters to establish inter-rater agreement • Sometimes study quality is quantified in relation to objective criteria of a good study, e.g. • larger sample sizes; • more representative samples; • better measures; • use of random assignment; • appropriate control for potential bias; • double blinding, and • low attrition rates (particularly for longitudinal studies)
Evaluation of study quality • Requires designing the coding materials to include adequate questions about study design and reporting • May require additional analyses: • Quality weighting (Rosenthal, 1991) • Use of the kappa statistic in determining the validity of quality filtering for meta-analysis (Sands & Murphy, 1996) • Regression with “quality” as a predictor of effect size (see Valentine & Cooper, 2008)
Quality assessment • Uses of information about quality: • Narrative discussion of impact of quality on results • Display study quality and results in a tabular format • Weight the data by quality - not usually recommended because scales are not always consistent (see Juni et al., 1999; Valentine & Cooper, 2008) • Subgroup analysis by quality • Include quality as a covariate in meta-regression
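The last option above, including quality as a covariate in a meta-regression, can be sketched as an inverse-variance weighted least-squares fit. The data and quality scale below are hypothetical; in practice a dedicated package (e.g. metafor in R) would also model between-study variance.

```python
import numpy as np

# Hypothetical effect sizes, within-study variances, and 0-10 quality scores
yi = np.array([0.52, 0.20, 0.35, 0.10, 0.44])
vi = np.array([0.03, 0.02, 0.04, 0.02, 0.05])
quality = np.array([3.0, 8.0, 5.0, 9.0, 4.0])

# Weighted least squares: regress effect size on quality,
# weighting each study by its inverse variance
W = np.diag(1.0 / vi)
X = np.column_stack([np.ones_like(yi), quality])
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ yi)

print(f"intercept = {beta[0]:.3f}, quality slope = {beta[1]:.3f}")
# A negative slope would suggest lower-quality studies yield larger effects
```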
Caveats & considerations... • Quality of reporting • It is often hard to separate quality of reporting from methodological quality - “Not reported” is not always “Not done” • Should code “Unspecified” as distinct from “Criteria not met” • Consult as many materials as possible when developing coding materials • There are some good references for systematic reviews that also apply to meta-analysis • Torgerson’s (2003) book • Gough’s (2007) framework • Search Cochrane Collaboration (http://www.cochrane.org/) for “assessing quality”
Questions • What is publication bias? • Why is it considered to be an issue for meta-analysis? Describe the arguments for inclusion and exclusion of unpublished studies • What are some ways of assessing the impact of potential publication bias?
Exclusion? • Inclusion of unpublished papers is likely to add considerable “noise” to the analyses • Methods typically used to find unpublished papers are ‘ad hoc’ • the resulting selection of studies is likely to be less representative of the unknown population of studies than is the population of published studies, and typically will be more homogeneous (White, 1994). • “Whether bias is reduced or increased by including unpublished studies cannot formally be assessed as it is impossible to be certain that all unpublished studies have been located (Smith & Egger, 1998)” • Hence, for published papers, there is a more clearly defined population of studies to which to generalize than would be the case if unpublished studies were included.
Inclusion? • A central goal of meta-analysis is to be inclusive. • Meta-analyses call for a balance between practicality and comprehensiveness (Durlak & Lipsey, 1991). • A compromise is for meta-analysts to report how they dealt with publication bias
Methods for assessing publication bias • Examination of the focus of the included studies • Fail-safe N • Trim & Fill • Sensitivity analysis (Vevea & Woods, 2005)
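Of these, the fail-safe N is the simplest to compute, even though (as the conclusion slide notes) it is not recommended. A sketch of Rosenthal's formula, using hypothetical one-tailed z-scores from the included studies:

```python
import numpy as np

# Hypothetical one-tailed z-scores from k published studies
z = np.array([2.1, 1.8, 2.5, 1.6, 2.9])
k = len(z)

# Rosenthal's fail-safe N: number of unpublished null-result studies needed
# to raise the combined p-value above alpha = .05 (one-tailed, z_alpha = 1.645)
z_alpha = 1.645
n_fs = (np.sum(z) ** 2) / (z_alpha ** 2) - k

print(f"fail-safe N = {n_fs:.1f}")
```

Here roughly 39 null studies would be needed to overturn the combined result, which illustrates why the statistic can give a misleading sense of security.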
Conclusion: publication bias • “The author of the meta-analysis, then, is faced with a logically impossible task: to show that publication bias is not a problem for the particular data set at hand. We describe the task as logically impossible because it amounts, in essence, to an attempt at confirming a null hypothesis” (Vevea & Woods, 2005, p. 438) • Different methods can attempt to address (or at least assess) the issue, but none is perfect. • At least we can conclude that the Fail-safe N is not appropriate! • Include unpublished studies?
Questions • What are some examples of data structures that might require 3-level modelling? • What were the results of the 3-level multilevel model based on the peer review dataset? What does this mean in the ‘real world’ for the peer review process (i.e., does there appear to be a gender bias)?
Results of peer review study • The mean effect size was very small, but significantly in favour of men. However, the results did not generalise across studies (there was study-to-study variation). • The effect size was significantly moderated by the type of application: it was almost exactly 0 for grants and in favour of men for fellowship applications. This difference was not moderated or mediated by other moderators. • There appeared to be some discipline effects (bias in favour of men in the social sciences) and country effects (a large bias in favour of men for Sweden). However, when all “main” effects were included, the discipline effects disappeared. • For grant proposals there was no evidence of any effect of gender on outcome.