
Debunking Myths and Urban Legends about Meta-Analysis



  1. Debunking Myths and Urban Legends about Meta-Analysis Herman Aguinis Dean’s Research Professor Professor of Organizational Behavior & Human Resources Director, Institute for Global Organizational Effectiveness Department of Management and Entrepreneurship Kelley School of Business, Indiana University http://mypage.iu.edu/~haguinis/

  2. What is a Methodological Myth and Urban Legend? • AOM symposium organized by Vandenberg (2004, New Orleans) • SIOP symposia (2007-2011) • Edited book by Lance and Vandenberg (2009), Statistical and methodological myths and urban legends: Received doctrine, verity, and fable in the organizational and social sciences (NY: Routledge) • Feature topics in Organizational Research Methods (2006 and 2011)

  3. What is a Methodological Myth and Urban Legend (MUL)? • Researchers are not immune to received doctrines and things we just know to be true (Lance, 2011, ORM). • These issues are “taught in undergraduate and graduate classes, enforced by gatekeepers (e.g., grant panels, reviewers, editors, dissertation committee members), discussed among colleagues, and otherwise passed along among pliers of the trade far and wide and from generation to generation” (Lance, 2011, ORM). • “…they are truly legends in that those applying it were told it was so and have thus accepted it and perpetuated it themselves in interactions with others” (Vandenberg, 2006, ORM)

  4. Today’s Presentation: Overview • Importance and influence of meta-analysis • Seven meta-analytic practices, misconceptions, claims, and assumptions that have reached the status of myths and urban legends (MULs) • Nature of each myth/urban legend • Kernel of truth value for each MUL • Misunderstandings • Recommendations for meta-analytic practice

  5. Why Seven MULs? • Aurelius Prudentius Clemens (aka Prudentius), Roman poet born in northern Spain in 348 • Prudentius wrote Psychomachia (“Battle for the Soul”) • Poem of about 1,000 lines describing the conflict between virtues and vices • Prudentius described seven virtues that can be used as “cures” or “remedies” that stand in opposition to each of the seven vices • Following Prudentius’ teachings, we will attempt to combat seven meta-analytic “vices” (i.e., MULs) with seven meta-analytic “virtues” (i.e., research-based recommendations)

  6. Acknowledgments • Presentation based on: • Aguinis, H., Dalton, D. R., Bosco, F. A., Pierce, C. A., & Dalton, C. M. 2011. Meta-analytic choices and judgment calls: Implications for theory building and testing, obtained effect sizes, and scholarly impact. Journal of Management, 37: 5-38. • Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. 2011. Debunking myths and urban legends about meta-analysis. Organizational Research Methods, 14: 306-331. • Aguinis, H., Gottfredson, R. K., & Wright, T. A. in press. Best-practice recommendations for estimating interaction effects using meta-analysis. Journal of Organizational Behavior. • Dalton, D. R., Aguinis, H., Dalton, C. M., Bosco, F. A., & Pierce, C. A. 2011. Revisiting the file drawer problem in meta-analysis. Academy of Management Best Paper Proceedings.

  7. Seven MULs • A single effect size can summarize a literature • Meta-analysis can make lemonade out of lemons • The file drawer problem biases meta-analytic results • Meta-analysis provides evidence about causal relationships • Meta-analysis has sufficient statistical power to detect moderating effects (i.e., contingency relationships) • A discrepancy between results of a meta-analysis and randomized controlled trials/experiments indicates that the meta-analysis is defective • Meta-analytic technical refinements lead to important scientific and practical advancements

  8. Importance and Influence of Meta-analysis • Quantitative versus qualitative literature reviews • Summary effect sizes and across-study variance • Impact and influence of meta-analysis in management and related fields • Citations (about 3X primary-level studies) • Textbooks; Annual Review of Psychology • Evidence-based movement in management and medicine • Not easy to refute meta-analytic conclusions using primary-level research

  9. MUL #1: A Single Effect Size can Summarize a Literature • Dual goals of MA • Summary effect size estimate • Effect size dispersion (i.e., moderators?) • MA reporting and citation practices • 20% of MAs provide no indication of dispersion (Geyskens et al., 2009, JOM) • 7 out of 1,489 citations of MA results consider dispersion (Carlson & Ji, 2011, ORM) • Primary-level studies are rarely homogeneous • Apples and oranges • The study of fruit?

  10. MUL #1: A Single Effect Size can Summarize a Literature • Kernel of truth value: A point estimate, usually the mean effect size, like any summary statistic, provides an estimate of the overall direction and strength of a relationship across the studies included in the meta-analysis • Misunderstandings: Examining a single summary effect size to the exclusion of the variance of study-level effect sizes provides an incomplete picture because it fails to recognize the conditions under which a particular relationship may change in direction and strength • Recommendations: Report summary effect sizes, but also the variance around the overall estimate as well as moderator variables that may explain this across-study variance
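
To make the dual-goal recommendation above concrete, here is a minimal sketch (in Python, assuming numpy and scipy are available) of a DerSimonian-Laird random-effects summary that reports both the mean correlation and the between-study variance (τ²). The function name and example data are hypothetical illustrations, not code from the cited papers.

```python
import numpy as np

def random_effects_summary(r, n):
    """DerSimonian-Laird random-effects summary of correlations,
    using the Fisher z transformation (a common textbook approach)."""
    r, n = np.asarray(r, float), np.asarray(n, float)
    z = np.arctanh(r)                    # Fisher's z per study
    v = 1.0 / (n - 3.0)                  # sampling variance of each z
    w = 1.0 / v                          # fixed-effect weights
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)   # Cochran's Q (heterogeneity)
    df = len(z) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)        # between-study variance estimate
    w_re = 1.0 / (v + tau2)              # random-effects weights
    z_re = np.sum(w_re * z) / np.sum(w_re)
    return np.tanh(z_re), tau2, q, df

# Hypothetical studies: the mean alone would hide this spread
mean_r, tau2, q, df = random_effects_summary(
    r=[0.10, 0.35, -0.05, 0.42, 0.20], n=[120, 85, 200, 60, 150])
print(f"summary r = {mean_r:.3f}, tau^2 = {tau2:.4f}, Q({df}) = {q:.2f}")
```

Reporting τ² (or Q) alongside the summary r is exactly what distinguishes pursuing both goals from reporting a single-number summary.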

  11. MUL #2: Meta-analysis can Make Lemonade out of Lemons • The Philosopher’s Stone • Transmuting lead into gold (Sir Isaac Newton) • Garbage in, gold out? • A group of inconclusive and perhaps poorly designed studies (e.g., small N, unreliable measures) used to draw impressive conclusions with confidence • Integrity of meta-analytically derived estimates? • Primary study quality assessment • Assessment based on meta-analyst coding: Low inter-rater agreement and suspect construct validity • Assessment based on methodological risk factors: Cochrane Collaboration and threats to validity • MA is not immune to low-quality inputs • Meta-analysts should assess input quality; potential moderator

  12. MUL #2: Meta-analysis can Make Lemonade out of Lemons • Kernel of truth value: Holding primary-level study quality constant, MA allows researchers to draw more accurate conclusions than primary-level studies due to larger samples and improved external validity and stability in the resulting effect-size estimates • Misunderstandings: Meta-analyses are not immune to low-quality studies, and resulting estimates will be biased if the quality of the primary-level studies included in the meta-analysis is erroneously assumed to be uniformly high • Recommendations: Assess each primary-level study, and the resulting meta-analytically derived estimates, in terms of risk factors that can lead to biased results. If studies are excluded, state the exclusion criteria clearly; these criteria should be logically consistent with the goals of the study and should be taken into account when discussing the generalizability and/or limitations of the results

  13. MUL #3: File Drawer Problem Biases Meta-analytic Results • The file drawer problem is one of the most enduring threats to the validity of meta-analytic results • Rosenthal (1979) has been cited 1,700 times • Assumption: • Null results are less likely to be published in primary-level studies and less likely to be included in meta-analytic reviews, resulting in an upwardly biased sample of primary-level effect size estimates and upwardly biased meta-analytically derived effect sizes • Failsafe-N is the most frequently applied method to address the file drawer problem • Assumes missing studies show a null result • No agreed-upon definition of a tolerable failsafe-N value
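
As a concrete illustration of the failsafe-N logic questioned above, here is a minimal sketch of Rosenthal's (1979) calculation, which combines one-tailed z values via Stouffer's method; the p-values below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def failsafe_n(p_values, alpha=0.05):
    """Rosenthal's (1979) failsafe N: the number of unpublished null
    studies needed to raise the combined one-tailed p above alpha."""
    z = norm.isf(np.asarray(p_values, float))  # one-tailed z per study
    k = len(z)
    # Stouffer: Z = sum(z) / sqrt(k + x); solve for x at the critical z
    x = z.sum() ** 2 / norm.isf(alpha) ** 2 - k
    return max(0.0, x)

print(failsafe_n([0.01, 0.04, 0.20, 0.03, 0.08]))  # about 20 null studies
```

Note that the calculation treats every missing study as averaging exactly a null result, which is the very assumption this slide questions, and offers no principled cutoff for a "tolerable" value.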

  14. MUL #3: File Drawer Problem… • Considered to be a pervasive and insurmountable problem in management, psychology, education, medicine, and other fields • Fiedler (2011, PPS) warned that “a file-drawer bias (Rosenthal, 1979) facilitates the selective publication of strong correlations while reducing the visibility of weak research outcomes” and that “voodoo correlations” are everywhere • Potential bias in effect-size estimates is important for theory and practice • Calls into question the accuracy of all meta-analytic estimates and the appropriateness of all MA-derived practices • What if the file drawer problem is itself a “methodological myth and urban legend”?

  15. MUL #3: File Drawer Problem… • Four-study research program including more than 131,000 correlations to assess the extent of the file drawer problem. Question: What is the percentage of non-significant correlations… • Study 1: …in 403 correlation matrices including 75,942 correlations published in Academy of Management Journal (AMJ), Journal of Applied Psychology (JAP), and Personnel Psychology (PPsych) between 1985 and 2009? • Study 2: …used as input in 51 meta-analyses published in organizational science journals including AMJ, JAP, and PPsych between 1982 and 2009 (i.e., a total of 6,935 correlations)? • Study 3: …in 167 correlation matrices including 27,886 correlations reported in non-published manuscripts written by applied psychology and management scholars? • Study 4: …in 217 correlation matrices including 20,860 correlations reported in doctoral dissertations?

  16. MUL #3: File Drawer Problem… • Study 1 (published primary-level studies): 46.81% of those correlations are not statistically significant • Study 2 (correlations used as input in published meta-analyses): 44.31% of those correlations are not statistically significant • Study 3 (non-published primary-level studies): 45.45% of those correlations are not statistically significant • Study 4 (doctoral dissertations): 50.78% of those correlations are not statistically significant

  17. MUL #3: File Drawer Problem… [figure-only slide; chart not included in transcript]

  18. MUL #3: File Drawer Problem… [figure-only slide; chart not included in transcript]

  19. MUL #3: File Drawer Problem… • We studied the percentage of statistically non-significant correlations; we did not directly assess differences in magnitude between published and non-published correlations • However, the probability of finding a correlation that is statistically significant is determined by three factors: (1) the pre-specified Type I error rate (α), (2) sample size, and (3) effect size (Cohen, 1988) • We held α constant at .05, sample sizes across all seven data sets are similar, and the percentages of statistically non-significant correlations across all of our data sets are similar • Thus, given that there are no differences in alpha, N, or the percentage of non-significant correlations across data sets, we can conclude that there are no differences in the magnitude of correlation coefficients between published and non-published effect sizes
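
The reasoning on this slide can be made concrete: with α fixed and sample sizes comparable, the smallest correlation that reaches significance is the same across data sets, so similar rates of non-significance imply similar distributions of effect-size magnitudes. A minimal sketch for the two-tailed test of a correlation (the sample sizes are hypothetical):

```python
from scipy.stats import t

def critical_r(n, alpha=0.05):
    """Smallest |r| that is statistically significant (two-tailed)
    in a sample of size n, from t = r * sqrt(df) / sqrt(1 - r^2)."""
    df = n - 2
    t_crit = t.isf(alpha / 2, df)
    return t_crit / (t_crit ** 2 + df) ** 0.5

for n in (50, 100, 200):
    print(n, round(critical_r(n), 3))  # e.g., |r| is about .20 at n = 100
```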

  20. MUL #3: File Drawer Problem… • Kernel of truth value: Publication bias can affect meta-analytic results, and there are many sources of bias (e.g., difficulty publishing results that contravene the financial, political, ideological, professional, or other interests of investigators and research sponsors) • Misunderstandings: We did not find support for the assumption that meta-analyses report upwardly biased effects due to “censorship” of statistically non-significant results • Recommendations: Cautionary statements that meta-analytic estimates are inflated, and attempts to “fix” the file drawer problem, do not seem justified in most cases (still, use the trim-and-fill method to assess possible publication bias)
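
The trim-and-fill method recommended above is usually run with existing tools (e.g., the trimfill() function in R's metafor package). As a simpler, related diagnostic, the sketch below implements Egger's regression test for funnel-plot asymmetry, a different but commonly paired publication-bias check; this is not from the cited papers, and the inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def eggers_test(effects, ses):
    """Egger's regression test: regress the standardized effect
    (effect / SE) on precision (1 / SE); an intercept far from zero
    suggests funnel-plot asymmetry, one possible sign of bias."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    y, x = effects / ses, 1.0 / ses
    n = len(y)
    fit = stats.linregress(x, y)
    resid = y - (fit.intercept + fit.slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)              # residual variance
    se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2
                          / np.sum((x - x.mean()) ** 2)))
    t_stat = fit.intercept / se_b0                 # test intercept = 0
    p = 2 * stats.t.sf(abs(t_stat), n - 2)
    return fit.intercept, p
```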

  21. MUL #4: Meta-analysis Provides Evidence about Causal Relationships • MAs are passive observational studies • The language of causality • MA does not establish causality • Most inputs are cross-sectional studies • Even if experimental, causality not established • MA can provide preliminary evidence for causality • Meta-analytic structural equation modeling • Relationship consistency across settings • Temporality in a specific relationship

  22. MUL #4: MA & Causality • Kernel of truth value: Meta-analysis can provide evidence regarding the plausibility of causal relationships by using meta-analytically derived correlation matrices as input for testing models positing competing causal relationships and by assessing the consistency and temporality of a relationship across settings • Misunderstandings: Phrases such as the “effect” or “impact” of one variable on another are subtle and implicit statements about causality, but claims about knowledge of causal relationships based on meta-analytic results are typically not justified • Recommendations: Use meta-analysis to gather preliminary evidence, as well as to produce hypotheses, regarding possible causal relationships between variables
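
The recommendation above starts from a pooling step: each cell of the meta-analytically derived correlation matrix is an average across studies before the matrix is fed to SEM. Below is a minimal sketch of one common choice, the sample-size-weighted mean correlation; the numbers are hypothetical, and full MASEM involves further steps (e.g., handling different Ns across cells).

```python
import numpy as np

def pooled_r(rs, ns):
    """Sample-size-weighted mean correlation for one cell of a
    meta-analytically derived correlation matrix (MASEM input)."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    return float(np.sum(ns * rs) / np.sum(ns))

# e.g., five studies reporting the same X-Y correlation
print(pooled_r([0.30, 0.22, 0.41, 0.18, 0.27], [80, 120, 60, 200, 95]))
```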

  23. MUL #5: Meta-analysis has Sufficient Statistical Power to Detect Moderators • Large sample size, greater power, thus… • It is assumed that MA will detect within- and between-group heterogeneity if it exists • Sources of power reduction • Small number of primary studies • Variable truncation, measurement error, scale coarseness, unequal proportions across subgroups • Downward corrections in observed across-study variance (i.e., corrections for artifacts) • May reduce power even further due to correlation between artifacts and moderators

  24. MUL #5: Meta-analysis has Sufficient Statistical Power to Detect Moderators • Kernel of truth value: Because of its large sample size, a meta-analysis is likely to have greater statistical power than a primary-level study examining a similar research question • Misunderstandings: Many factors in addition to sample size have a detrimental effect on the statistical power of tests for moderating effects and, hence, such tests are usually performed at insufficient levels of statistical power • Recommendations: Follow recommendations offered by Aguinis, Gottfredson, and Wright (in press, J. of Organizational Behavior) and perform a priori power calculations as described by Hedges and Pigott (2004, Psych. Methods)
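
Below is a minimal sketch of the kind of a priori power calculation Hedges and Pigott (2004) describe, here for the between-groups Q test of a two-level categorical moderator. The noncentral chi-square logic is standard, but the specific numbers are hypothetical.

```python
from scipy.stats import chi2, ncx2

def power_q_between(delta, v1, v2, alpha=0.05):
    """Approximate power of the between-groups Q test for a two-level
    moderator: delta is the true difference between subgroup mean
    effects; v1, v2 are the variances of the two subgroup means."""
    lam = delta ** 2 / (v1 + v2)      # noncentrality parameter
    crit = chi2.isf(alpha, df=1)      # df = number of groups - 1
    return ncx2.sf(crit, df=1, nc=lam)

# 10 studies per subgroup, average n = 100, effects in Fisher z units
v_mean = (1 / (100 - 3)) / 10         # variance of one subgroup mean
print(power_q_between(delta=0.10, v1=v_mean, v2=v_mean))  # about .60
```

Even with 20 studies, power falls well short of the conventional .80 level, illustrating the slide's point that moderator tests are often underpowered.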

  25. MUL #6: A Discrepancy Between Results of a Meta-analysis and RCTs Indicates that the Meta-analysis is Defective • Is the discrepancy problematic? • Equal effect sizes between MA and any one particular RCT would not be expected • One goal of MA is to explain variability in RCT-level effect size estimates • Discrepancy may be due to artifacts or moderators • Does the RCT/MA discrepancy suggest MA is defective? • RCTs: All that glitters is not gold standard • Both MAs and RCTs have their own methodological strengths and weaknesses • More appropriate question: • Whether and to what extent heterogeneity exists

  26. MUL #6: MA versus RCT • Kernel of truth value: In some cases a discrepancy can be due to problems in the design and/or execution of a meta-analysis, such as the inclusion of poor-quality studies (see MUL #2) • Misunderstandings: In most cases, a discrepancy between results of a meta-analysis and some RCTs is to be expected because the meta-analytic summary estimate is an average of the primary-level effect sizes • Recommendations: Do not focus on possible discrepancies between results of meta-analyses and RCTs; rather, use meta-analysis to explain across-study variability caused by artifactual sources of variance (e.g., sampling error) and possible moderator variables
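
One way to see why a single RCT should not be expected to match the meta-analytic mean is the random-effects prediction interval (Higgins, Thompson, & Spiegelhalter, 2009): even a precisely estimated summary effect is compatible with individual study effects that differ sharply, or even reverse sign. A minimal sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import t

def prediction_interval(mu, se_mu, tau2, k, level=0.95):
    """Interval expected to contain the true effect of a new study
    under a random-effects model estimated from k studies."""
    t_crit = t.isf((1 - level) / 2, df=k - 2)
    half = t_crit * np.sqrt(tau2 + se_mu ** 2)
    return mu - half, mu + half

# Summary effect 0.25 (SE 0.04) with tau^2 = 0.02 across 20 studies
print(prediction_interval(mu=0.25, se_mu=0.04, tau2=0.02, k=20))
# roughly (-0.06, 0.56): a single trial near zero is fully consistent
```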

  27. MUL #7: Meta-analytic Technical Refinements Lead to Important Scientific and Practical Advancements • Dozens of articles addressing the refinements, choices, and judgment calls that meta-analysts face have been published regularly since the early 1980s • Design (e.g., elimination of studies and criteria used) • Data analysis (e.g., fixed versus random effects model) • Reporting of results (e.g., reporting fail-safe test results) • These choices and judgment calls presumably have important implications for meta-analytic results and, in turn, for theory and applications • Monte Carlo simulation results: Assumptions

  28. MUL #7: Technical Refinements • Effect of MA judgment calls on effect size estimates • Aguinis, Dalton, Bosco, Pierce, and Dalton (2011, JOM): Content analysis of 196 meta-analyses and 5,581 effect sizes in AMJ, JOM, JAP, PPsych, and SMJ from 1982 through 2009 • 21 choices and judgment calls (e.g., elimination of studies, model used, corrections for statistical and methodological artifacts such as measurement error and range restriction, reporting CIs) • Little substantive impact on resulting effect sizes • The more a meta-analysis attempts to test an existing theory, the larger the number of citations • The more a meta-analysis attempts to build new theory, the lower the number of citations • The magnitude of the derived effects is not related to the extent to which a meta-analysis is cited (i.e., particularism vs. universalism)

  29. MUL #7: Technical Refinements • Kernel of truth value: Technical refinements lead to improved accuracy of estimation that may be meaningful only in certain contexts • Misunderstandings: In most organizational science contexts, technical refinements do not have a meaningful and substantive impact on theory or practice • Recommendations: Use the best and most accurate estimation procedures available, even if the resulting estimates are only marginally superior

  30. Seven Meta-analytic “Virtues” • Report summary effect sizes, but also the variance around the overall estimate as well as moderator variables that may explain this across-study variance • Assess each primary-level study, and the resulting meta-analytically derived estimates, in terms of risk factors that can lead to biased results • The practice of estimating the extent to which results are vulnerable to the file drawer problem can be eliminated in many cases (i.e., there is no need to “fix” an upward bias that does not exist); still, use the trim-and-fill method to assess possible publication bias

  31. Seven Meta-analytic “Virtues” • Causal relationships: Use meta-analysis to gather preliminary evidence, as well as to produce hypotheses, regarding possible causal relationships between variables • Statistical power: Follow recommendations offered by Aguinis, Gottfredson, and Wright (in press, J. of Organizational Behavior) and perform a priori power calculations as described by Hedges and Pigott (2004, Psych. Methods)

  32. Seven Meta-analytic “Virtues” • Do not focus on possible discrepancies between results of meta-analyses and RCTs; rather, use meta-analysis to explain across-study variability caused by artifactual sources of variance (e.g., sampling error) and possible moderator variables • Use the best and most accurate estimation procedures available, even if the resulting estimates are only marginally superior

  33. Additional Resources • Please visit http://mypage.iu.edu/~haguinis/ to download manuscripts and additional resources regarding meta-analysis and other methodological issues

  34. Thank you!
