NCRM Research Methods Festival University of Oxford Meta-analysis Department of Education
Today’s content • What is meta-analysis, • when and why we use meta-analysis, • Examples of meta-analyses • benefits and pitfalls of using meta-analysis, • defining a population of studies and finding publications, • coding materials, • inter-rater reliability, • computing effect sizes, • structuring a database, and • a conceptual introduction to analysis and interpretation of results based on fixed effects, random effects, and multilevel models.
Why a course on meta-analysis? • Meta-analysis is an increasingly popular tool for summarising research findings • Cited extensively in research literature • Relied upon by policymakers • Important that we understand the method, whether we conduct or simply consume meta-analytic research • Should be one of the topics covered in all introductory research methodology courses
Background... • What is meta-analysis • When and why we use meta-analysis
What is meta-analysis? • Systematic synthesis of various studies on a particular research question (e.g., Do boys or girls have higher self-concepts?) • Collect all studies relevant to a topic (e.g., find all published journal articles on the topic) • An effect size is calculated for each outcome (e.g., determine the size/direction of the gender difference for each study) • “Content analysis”: code characteristics of the study, such as age, setting, ethnicity, self-concept domain (math, physical, social), etc. • Effect sizes with similar features are grouped together and compared; this tests moderator variables (e.g., Do gender differences vary with age, setting, ethnicity, self-concept domain, etc.?)
A blend of qualitative and quantitative approaches • Coding: the process of extracting the information from the literature included in the meta-analysis. Involves noting the characteristics of the studies in relation to a priori variables of interest (qualitative) • Effect size: the numerical outcome to be analysed in a meta-analysis; a summary statistic of the data in each study included in the meta-analysis (quantitative) • Summarise effect sizes: central tendency, variability, relations to study characteristics (quantitative)
When & why we use meta-analysis • One of the primary aims is to estimate the magnitude of an effect in the population from the effects observed in specific samples • Meta-analysis can test whether the studies' outcomes show more variation than is expected from sampling different research participants • In such cases, study characteristics (e.g., the measurement instrument used, population sampled, or aspects of the study's design) are coded. These characteristics are then used as predictor variables to analyze the excess variation in the effect sizes
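The test for excess variation mentioned above is commonly carried out with a homogeneity statistic, Q. The slide does not name a specific statistic, so the following is only a minimal sketch assuming the usual inverse-variance weighted form; the function name `cochran_q` and its inputs are hypothetical.

```python
def cochran_q(effects, variances):
    """Return (Q, df) for a set of study effect sizes.

    Assumption: each study is weighted by the inverse of its sampling
    variance, and Q is the weighted sum of squared deviations of the
    effect sizes from their weighted mean.
    """
    weights = [1.0 / v for v in variances]
    weighted_mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - weighted_mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1  # Q is compared with a chi-square on k - 1 df
    return q, df
```

If Q is large relative to a chi-square distribution with k - 1 degrees of freedom, the effect sizes vary more than sampling error alone would suggest, which is the situation in which coded study characteristics are brought in as predictors.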
What Disciplines do meta-analysis? ISI: 10 Feb, 2008. Topic: meta-analysis; Results found: 21,286
Psychology: Where it all began • Amato, P. R., & Keith, B. (1991). Parental divorce and the well-being of children: A meta-analysis . Psychological Bulletin, 110, 26-46.Times Cited: 471 • Linn, M. C., & Petersen, A. C. (1985). Emergence and characterization of sex differences in spatial ability: A meta-analysis . Child Development, 56, 1479-1498.Times Cited: 570 • Johnson, D. W., & et al (1981). Effects of cooperative, competitive, and individualistic goal structures on achievement: A meta-analysis . Psychological Bulletin, 89, 47-62.Times Cited: 426 • Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review . Personnel Psychology, 44, 703-742Times Cited: 387 • Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: A meta-analysis . Psychological Bulletin, 104, 53-69.Times Cited: 316 • Iaffaldano, M. T., & Muchinsky, P. M. (1985). Job satisfaction and job performance: A meta-analysis . Psychological Bulletin, 97, 251-273.Times Cited: 263.
Education: Widely Cited Meta-analyses • De Wolff, M., & van IJzendoorn, M. H. (1997). Sensitivity and attachment: A meta-analysis on parental antecedents of infant attachment . Child Development, 68, 571-591.Times Cited: 340 • Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief . Child Development, 72, 655-684.Times Cited: 276 • Cohen, E. G. (1994). Restructuring the classroom: Conditions for productive small groups . Review of Educational Research, 64, 1-35. Times Cited: 235 • Hansen, W. B. (1992). School-based substance abuse prevention: A review of the state of the art in curriculum, 1980-1990 . Health Education Research, 7, 403-430.Times Cited: 207 • Kulik, J. A., Kulik, C-L., Cohen, P. A. (1980). Effectiveness of Computer-Based College Teaching: A Meta-Analysis of Findings. Review of Educational Research, 50, 525-544.Times Cited: 198.
Business/Management: Widely Cited Meta-analyses • Sheppard, B. H., Hartwick, J., & Warshaw, P. R. (1988). The theory of reasoned action: A meta-analysis of past research with recommendations for modifications and future research . Journal of Consumer Research, 15, 325-343. Times Cited: 515 • Jackson, S. E., & Schuler, R. S. (1985). A meta-analysis and conceptual critique of research on role ambiguity and role conflict in work settings . Organizational Behavior and Human Decision Processes, 36, 16-78. Times Cited: 401 • Tornatzky LG, Klein KJ. (1994). Innovation characteristics and innovation adoption-implementation - A meta-analysis of findings . IEEE Transactions on Engineering Management, 29, 28-4. Times Cited: 269. • Lowe KB, Kroeck KG, Sivasubramaniam N. (1996). Effectiveness correlates of transformational and transactional leadership: A meta-analytic review of the MLQ literature. Leadership Quarterly, 7, 385-425. Times Cited: 203. • Churchill GA, Ford NM, Hartley SW, et al. (1985). The determinants of salesperson performance - A meta-analysis . Journal of Marketing Research, 22, 103-118. Times Cited: 189.
Most Widely Cited Meta-analyses are in Medicine • Jadad AR, Moore RA, Carroll D, et al. (1996). Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials, 17, 1-12. Times Cited: 2008 • Boushey CJ, Beresford SAA, Omenn GS, et al. (1995). A quantitative assessment of plasma homocysteine as a risk factor for vascular-disease - Probable benefits of increasing folic-acid intakes. JAMA-Journal of the American Medical Association, 274, 1049-1057. Times Cited: 2,128 • Alberti W, Anderson G, Bartolucci A, et al. (1995). Chemotherapy in non-small-cell lung-cancer - A metaanalysis using updated data on individual patients from 52 randomized clinical-trials. British Medical Journal, 311, 899-909. Times Cited: 1,591 • Block G, Patterson B, Subar A (1992). Fruit, vegetables, and cancer prevention - A review of the epidemiologic evidence. Nutrition and Cancer-An International Journal, 18, 1-29. Times Cited: 1,422
Cohen, P. A. (1980). Effectiveness of student-rating feedback for improving college instruction: A meta-analysis. Research in Higher Education, 13, 321-341. • Question: Does feedback from university students’ evaluations of teaching lead to improved teaching? • Teachers are randomly assigned to experimental (feedback) and control (no feedback) groups • Feedback group gets ratings, augmented, perhaps, with personal consultation • Groups are compared on subsequent ratings and, perhaps, other variables • Feedback teachers improved their teaching effectiveness by .3 standard deviations compared to control teachers on the Overall Rating item; even larger differences for ratings of Instructor Skill, Attitude Toward Subject, Student Feedback • Studies that augmented feedback with consultation produced substantially larger differences, but other methodological variations had little effect.
Hattie, J., & Marsh, H. W. (1996). The relationship between research and teaching -- a meta-analysis. Review of Educational Research, 66, 507-542. • Question: What is the correlation between university teaching effectiveness and research productivity? • Based on 58 studies and 498 correlations: • The mean correlation between measures of teaching effectiveness (mostly based on SETs) and research productivity was +.06; • This near-zero correlation was consistent across different disciplines, types of university, indicators of research, and components of teaching effectiveness. • This meta-analysis was followed by a primary data study (Marsh & Hattie, 2002) to more fully evaluate the theoretical model
O’Mara, A. J., Marsh, H. W., Craven, R. G., & Debus, R. (2006). Do self-concept interventions make a difference? A synergistic blend of construct validation and meta-analysis. Educational Psychologist, 41, 181–206. • Contention about global self-esteem versus multidimensional, domain-specific self-concept • Traditional reviews and previous meta-analyses of self-concept interventions have underestimated effect sizes by using an implicitly unidimensional perspective that emphasizes global self-concept. • We used meta-analysis and a multidimensional construct validation approach to evaluate the impact of self-concept interventions for children in 145 primary studies (200 interventions). • Overall, interventions were significantly effective (d = .51, 460 effect sizes). • However, in support of the multidimensional perspective, interventions targeting a specific self-concept domain and subsequently measuring that domain were much more effective (d = 1.16). • This supports a multidimensional perspective of self-concept
Hanson, R K., Morton-Bourgon, K. E. (2005). The Characteristics of Persistent Sexual Offenders: A Meta-Analysis of Recidivism Studies. Journal of Consulting & Clinical Psychology, 73, 1154-1163. • Examined predictors of sexual, nonsexual violent, and general (any) recidivism • 82 recidivism studies • Identified deviant sexual preferences and antisocial orientation as the major predictors of sexual recidivism for both adult and adolescent sexual offenders. Antisocial orientation was the major predictor of violent recidivism and general (any) recidivism • Concluded that many of the variables commonly addressed in sex offender treatment programs (e.g., psychological distress, denial of sex crime, victim empathy, stated motivation for treatment) had little or no relationship with sexual or violent recidivism
Bazzano, L. A., Reynolds, K., Holder, K. N., & He, J. (2006). Effect of Folic Acid Supplementation on Risk of Cardiovascular Diseases: A Meta-analysis of Randomized Controlled Trials. JAMA, 296, 2720-2726 • “Epidemiologic studies have suggested that folate intake decreases risk of cardiovascular diseases. However, the results of randomized controlled trials on dietary supplementation with folic acid to date have been inconsistent” • Included 12 randomised controlled trials • The overall relative risks (95% confidence intervals) of outcomes for patients treated with folic acid supplementation compared with controls were 0.95 (0.88-1.03) for cardiovascular diseases, 1.04 (0.92-1.17) for coronary heart disease, 0.86 (0.71-1.04) for stroke, and 0.96 (0.88-1.04) for all-cause mortality. • Concluded folic acid supplementation does not reduce risk of cardiovascular diseases or all-cause mortality among participants with prior history of vascular disease.
Fiske, P., Rintamaki, P. T., & Karvonen, E. (1998). Mating success in lekking males: a meta-analysis. Behavioral Ecology, 9, 328-338. • In lekking species (those that gather for competitive mating), a male's mating success can be estimated as the number of females that he copulates with. • Aim of the study was to find predictors of mating success in lekking species through analysis of 48 studies • Behavioural traits such as male display activity, aggression rate, and lek attendance were positively correlated with male mating success. The size of "extravagant" traits, such as birds' tails and ungulate antlers, and age were also positively correlated with male mating success. • Territory position was negatively correlated with male mating success, such that males with territories close to the geometric centre of the leks had higher mating success than other males. • Male morphology (measure of body size) and territory size showed small effects on male mating success.
Benefits of meta-analysis • Compared to traditional literature reviews: • (1) there is a definite methodology employed in the research analysis; and • (2) the results of the included studies are quantified to a standard metric thus allowing for statistical techniques for further analysis. • Therefore less biased and more replicable • Able to establish generalisability across many studies (and study characteristics).
Benefits of meta-analysis • Analyzing the results from a group of studies can allow more accurate data analysis • Increased power • Enhanced precision due to averaging out the sampling error deviations from the true values • Also, provides corrections to mean values with distortions due to measurement error and other possible artefacts
Publication bias • Studies that are published are more likely to report statistically significant findings. This is a source of potential bias. • The debate about using only published studies: • peer-reviewed studies are presumably of a higher quality VERSUS • significant findings are more likely to be published than non-significant findings • There is no agreed-upon solution. However, one should retrieve all studies that meet the eligibility criteria, and be explicit about how publication bias was dealt with. Some methods for dealing with publication bias have been developed (e.g., Fail-safe N, Trim and Fill method).
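As an illustration of the Fail-safe N idea named above, here is a minimal sketch. The slide gives no formula, so this assumes Rosenthal's version, in which the fail-safe N is the number of unretrieved null-result studies that would pull the combined Z below the one-tailed .05 criterion; the function name and default are hypothetical.

```python
def rosenthal_failsafe_n(z_scores, z_alpha=1.645):
    """Number of additional studies averaging Z = 0 needed to make the
    combined result non-significant.

    Assumption: Rosenthal's formula, N_fs = (sum of Z)^2 / z_alpha^2 - k,
    with z_alpha = 1.645 for a one-tailed test at .05.
    """
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha ** 2) - k
```

For example, four studies each with Z = 2.0 would need roughly 20 hidden null studies to overturn the combined result. A small fail-safe N relative to the number of retrieved studies suggests the finding is fragile with respect to unpublished work.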
Study quality • Increasingly, meta-analysts evaluate the quality of each study included in a meta-analysis. • Sometimes this is a global holistic (subjective) rating. In this case it is important to have multiple raters to establish inter-rater agreement (more on this later). • Sometimes study quality is quantified in relation to objective criteria of a good study, e.g. • larger sample sizes; • more representative samples; • better measures; • use of random assignment; • appropriate control for potential bias; • double blinding, and • low attrition rates (particularly for longitudinal studies)
Study quality: Does it make a difference? • Meta-analyses should always include subjective and/or objective indicators of study quality. • In the social sciences there is some evidence that studies with highly inadequate control for pre-existing differences lead to inflated effect sizes. However, it is surprising that other indicators of study quality make so little difference. • In medical research, studies are largely limited to RCTs, where there is MUCH more control than in social science research. Here there is evidence that inadequate concealment of assignment and lack of double-blinding inflate effect sizes, but perhaps only for subjective outcomes. • These issues are likely to be idiosyncratic to individual discipline areas and research questions.
Conducting a meta-analysis • Defining a population of studies and finding publications • Coding materials • Inter-rater reliability • Computing effect sizes • Structuring a database
Establish research question • Comparison of treatment & control groups? What is the effectiveness of a reading skills program for a treatment group compared to an inactive control group? • Pretest-posttest differences? Is there a change in motivation over time? • What is the correlation between two variables? What is the relation between teaching effectiveness and research productivity? • Moderators of an outcome? Does gender moderate the effect of a peer-tutoring program on academic achievement?
Establish research question • Do you wish to generalise your findings to other studies not in the sample? • Do you have multiple outcomes per study? E.g.: • achievement in different school subjects; • 5 different personality scales; • multiple criteria of success • Such questions determine the choice of meta-analytic model • fixed effects • random effects • multilevel
Defining a population of studies and finding publications • Need to have explicit inclusion and exclusion criteria • The broader the research domain, the more detailed they tend to become • Refine criteria as you interact with the literature • Components of detailed criteria • distinguishing features • research respondents • key variables • research methods • cultural and linguistic range • time frame • publication types
Locate and collate studies • Search electronic databases (e.g., ISI, Psychological Abstracts, Expanded Academic ASAP, Social Sciences Index, PsycINFO, and ERIC) • Examine the reference lists of included studies to find other relevant studies • If including unpublished data, email researchers in your discipline, take advantage of Listservs, and search Dissertation Abstracts International
Locate and collate studies • Inclusion process usually requires several steps to cull inappropriate studies • Example from Bazzano, L. A., Reynolds, K., Holder, K. N., & He, J. (2006). Effect of Folic Acid Supplementation on Risk of Cardiovascular Diseases: A Meta-analysis of Randomized Controlled Trials. JAMA, 296, 2720-2726
Develop code materials • Code Sheet (example entries): Study ID: 1 • Year of publication: 99 • Publication type (1-5): 2 • Geographical region (1-7): 1 • Total sample size: 87 • Total number of males: 41 • Total number of females: 46 • Code Book/manual entry for Publication type (1-5): 1 = Journal article; 2 = Book/book chapter; 3 = Thesis or doctoral dissertation; 4 = Technical report; 5 = Conference paper
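One way to carry a code sheet like this into a database is to treat each coded study as a record with simple validity checks. The sketch below is illustrative only: the class name, field names, and checks are hypothetical, and the 1-5 numbering of publication types is assumed to follow the order in which the categories are listed in the code book.

```python
from dataclasses import dataclass

# Code-book labels; numbering assumed from the listed order.
PUBLICATION_TYPES = {
    1: "Journal article",
    2: "Book/book chapter",
    3: "Thesis or doctoral dissertation",
    4: "Technical report",
    5: "Conference paper",
}


@dataclass
class CodedStudy:
    study_id: int
    year: int              # two-digit year, as on the code sheet
    publication_type: int  # 1-5, see PUBLICATION_TYPES
    region: int            # 1-7, geographical region code
    n_total: int
    n_males: int
    n_females: int

    def problems(self):
        """Return a list of coding problems (empty if none are found)."""
        issues = []
        if self.publication_type not in PUBLICATION_TYPES:
            issues.append("publication_type must be 1-5")
        if not 1 <= self.region <= 7:
            issues.append("region must be 1-7")
        # Assumption: gender may be unreported for some participants,
        # so the subgroups may sum to less than the total, never more.
        if self.n_males + self.n_females > self.n_total:
            issues.append("males + females exceed total sample size")
        return issues


# The example entries from the code sheet.
example = CodedStudy(1, 99, 2, 1, 87, 41, 46)
```

Running `example.problems()` on each record after data entry is a cheap way to catch out-of-range codes before any effect sizes are analysed.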
Pilot coding • Random selection of papers coded by both coders • Meet to compare code sheets • Where there is discrepancy, discuss to reach agreement • Amend code materials/definitions in code book if necessary • May need to do several rounds of piloting, each time using different papers
Interrater reliability • Percent agreement: Common but not recommended • Cohen’s kappa coefficient • Kappa is the proportion of the optimum improvement over chance attained by the coders, where a value of 1 indicates perfect agreement and a value of 0 indicates that agreement is no better than that expected by chance • Kappas over .40 are considered to indicate a moderate level of agreement (but there is no clear basis for this “guideline”) • Correlation between different raters • Intraclass correlation. Agreement among multiple raters corrected for number of raters using the Spearman-Brown formula (r)
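The two indices above can be sketched in a few lines. This is a minimal illustration, assuming two coders who each assign one categorical code per study; the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement).
    Undefined (division by zero) if chance agreement equals 1, i.e. both
    raters used a single identical category for every item.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


def spearman_brown(r, k):
    """Reliability of the aggregate of k raters, given single-rater
    reliability r (Spearman-Brown formula)."""
    return k * r / (1 + (k - 1) * r)
```

With these definitions, two coders who agree on every item get kappa = 1, while coders whose agreement rate equals the rate expected from their marginal frequencies get kappa = 0, even if their raw percent agreement looks respectable. That is the reason percent agreement alone is not recommended.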
Exercise 1a • The purpose of this exercise is to explore various issues of meta-analytic methodology • Discuss in groups of 3-4 people the following issues in relation to the gender differences in smiling study (LaFrance et al., 2003) • Did the aims of the study justify conducting a meta-analysis? • Were the selection criteria and the search process explicit? • How did they deal with interrater (coder) reliability?
Ex. 1a: discussion points • Extend previous meta-analyses, include previously untested moderators based on theory/empirical observations • Search process: detailed databases and 5 other sources of studies, search terms. Selection criteria: justification provided (e.g., for excluding under the age of 13). However, not clear how many studies were retrieved and then eventually included (compare with flow chart on slide 51) • Multiple coders (group of coders consisted of four people with two raters of each sex coding each moderator). Interrater reliability was calculated by taking the aggregate reliability of the four coders at each time using the Spearman–Brown formula
Effect size calculation • The effect size makes meta-analysis possible • It is based on the “dependent variable” (i.e., the outcome) • It standardizes findings across studies such that they can be directly compared • Any standardized index can be an “effect size” (e.g., standardized mean difference, correlation coefficient, odds-ratio), but must • be comparable across studies (standardization) • represent magnitude & direction of the relation • be independent of sample size
Effect size calculation • Study statistics from which an effect size (d) and its standard error (SE) can be derived: means and standard deviations, correlations, p-values, F-statistics, t-statistics
Effect sizes • Lipsey & Wilson (2001) present many formulae for calculating effect sizes from different information • However, need to convert all effect sizes into a common metric, typically based on the “natural” metric given research in the area. E.g.: • Standardized mean difference • Odds-ratio • Correlation coefficient
Effect size calculation • Standardized mean difference: group contrast research (treatment groups or naturally occurring groups) on an inherently continuous construct • Odds-ratio: group contrast research (treatment groups or naturally occurring groups) on an inherently dichotomous construct • Correlation coefficient: association between variables research
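Effect size calculation • In an intervention study with experimental and control groups, the effect size might be the difference between the experimental and control group means divided by the pooled standard deviation • In a gender difference study, the effect size might be the difference between the two gender group means divided by the pooled standard deviation • Represents a standardized group contrast on an inherently continuous measure • Uses the pooled standard deviation (some situations use the control group standard deviation) • Commonly called “d”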
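The standardized mean difference described on this slide can be sketched as follows. This is an illustration rather than the slide's own formula: it assumes the usual pooled standard deviation, weighting each group's variance by its degrees of freedom, and the function names are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )


def standardized_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """d = (mean of group 1 - mean of group 2) / pooled SD.

    Group 1 might be the experimental group (or one gender) and
    group 2 the control group (or the other gender); the sign of d
    depends on which group is entered first.
    """
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)
```

For example, an experimental group with mean 10 and a control group with mean 9, both with SD 2 and n = 20, gives d = 0.5: the experimental group scores half a standard deviation higher.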
Effect size calculation • Represents the strength of association between two inherently continuous measures • Generally reported directly as r (the Pearson product moment coefficient)
r to d, d to r • Alternatively: transform rs into Fisher’s Zr-transformed rs, which are more normally distributed
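The conversion formulae themselves did not survive on this slide, so the sketch below uses the commonly cited versions, which assume two groups of equal size. Treat it as an illustration under that assumption; the function names are hypothetical.

```python
import math


def r_to_d(r):
    """Convert a correlation to a standardized mean difference
    (assumes equal group sizes): d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d):
    """Convert a standardized mean difference to a correlation
    (assumes equal group sizes): r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)


def fisher_z(r):
    """Fisher's Zr transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def fisher_z_to_r(z):
    """Back-transform a (mean) Zr to the correlation metric."""
    return math.tanh(z)
```

Under these formulae r = .6 corresponds to d = 1.5 and vice versa. When analysing correlations, the usual practice is to average the Zr values and back-transform the result to r for reporting.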
Effect size calculation • The odds-ratio is based on a 2 by 2 contingency table • The Odds-Ratio is the odds of success in the treatment group relative to the odds of success in the control group
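The odds-ratio defined above can be computed directly from the four cells of the 2 by 2 table. This is a minimal sketch; the cell labels and function names are hypothetical, and the log transformation and its standard error are the standard large-sample versions rather than anything stated on the slide.

```python
import math


def odds_ratio(treat_success, treat_failure, ctrl_success, ctrl_failure):
    """Odds of success in the treatment group divided by the odds of
    success in the control group. Undefined if a denominator cell is 0."""
    treatment_odds = treat_success / treat_failure
    control_odds = ctrl_success / ctrl_failure
    return treatment_odds / control_odds


def log_odds_ratio_se(treat_success, treat_failure, ctrl_success, ctrl_failure):
    """Large-sample standard error of the log odds-ratio:
    sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts."""
    cells = (treat_success, treat_failure, ctrl_success, ctrl_failure)
    return math.sqrt(sum(1 / cell for cell in cells))
```

For a table with 20 successes and 10 failures under treatment, and 10 successes and 20 failures under control, the odds are 2 and 0.5, so the odds-ratio is 4. Analyses are typically run on the log odds-ratio, which is symmetric around zero.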
Correction for bias • Hedges proposed a correction for small sample size bias (n < 20) • Must be applied before analysis
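The slide does not give the correction itself. A minimal sketch, assuming the widely used approximation in which d is multiplied by 1 - 3/(4N - 9), with N the total sample size across both groups (the function name is hypothetical):

```python
def hedges_correction(d, n1, n2):
    """Small-sample bias correction for a standardized mean difference.

    Assumption: the common approximation g = d * (1 - 3 / (4N - 9)),
    where N = n1 + n2 is the total sample size.
    """
    total_n = n1 + n2
    return d * (1 - 3 / (4 * total_n - 9))
```

With d = 0.5 and ten participants per group, the corrected value is about 0.479. The adjustment shrinks quickly as samples grow, which is why it matters mainly for small studies.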