EPI-820 Evidence-Based Medicine

EPI-820 Evidence-Based Medicine LECTURE 9: Meta-Analysis I Mat Reeves BVSc, PhD

Objectives • Understand the rationale for quantitative synthesis • Describe the steps in performing a meta-analysis: • identification, selection, abstraction, and analysis. • Know the appropriate analytic approach for meta-analysis of key study designs: • Experimental (RCT’s) • Observational (cohort, case-control, diagnostic tests) • Other issues: • Publication bias • Quality assessment • Random versus fixed effects models • Meta-regression

Background • Facts: • For most clinical problems/public health issues there is an overwhelming amount of existing information, as well as new information produced every year • However, much of this information • isn't very good (= poor quality) • is derived from different methods & definitions (= poor standardization) • is often contradictory (= heterogeneity) • Very few single studies resolve an issue unequivocally (….. a “home run study”) • So how should we go about summarizing medical information?

How do we summarize medical information? • Traditional Approach • Expert Opinion • Narrative review articles • Validity? Unbiased? Reproducible? • Methods? (one study one vote?) • Consensus statements (group expert opinion) • New Approach (Meta-analysis) • Explicit quantitative synthesis of ALL the evidence

Definition - Meta-analysis • A technique for quantitatively combining the results of previous studies to: • Generate a summary estimate of effect, OR • Identify and explain heterogeneity • Alternate definition: a study of studies, to help guide further research and identify reasons for heterogeneity between studies. • Overview or Synthesis

Overview • Initially developed in social sciences in mid-1960’s • Adapted to medical studies in early 1980’s • Initially applied to RCT’s – esp. when individual studies were small and under-powered • Also applied to observational epidemiologic studies – often with little forethought, which generated much controversy • Explosion in the number of published meta-analyses in the last 10-15 years.

Overview • Often the initial step of a cost-effectiveness analysis, decision analysis, or grant application (esp. for RCT’s). • Much cheaper than a big RCT! • Results usually agree with those of later large randomized trials, but not always (LeLorier, 1997):

Discrepancies between meta-analyses and subsequent large RCT’s (LeLorier, NEJM 1997): 27/40 (68%) agreement

When is a meta-analysis appropriate? • When several studies are known to exist • When studies disagree (= heterogeneity) resulting in a lack of consensus • When both exposures and outcomes are quantified and presented in a useable format. • When existing individual studies are under-powered • M-A could then produce a precise estimate of effect • When you want to identify reasons for heterogeneity • M-A could illustrate why and identify important sub-group differences • When no one else has done it (yet!), or an update of an existing meta-analysis is justified.

Before you begin…… plan • M-A’s appear easy to do but require careful planning, and adequate resources (time = $$$) • Need to develop study protocol • Specify primary and secondary objectives • Methods • Describe search strategy (sources, published studies only?, fugitive lit?, blinding?, reliability checks?) • Define eligibility criteria • Type of quality assessment (if any) • Analysis • Type of model (fixed vs random, use of quality scores?) • Subgroup analyses? • Sensitivity analysis?

Estimating Time Required to do a M-A • Meta-Works (Boston, MA), private company • Provided estimates based on 37 M-A’s • Size of the body of literature, quality, complexity, reviewer pool and support services all important • Average total hours per M-A = 1139 (range 216 – 2516) • Search, selection, abstraction = 588 hrs • Stat Analysis = 144 hrs • Write up = 206 hrs • Other tasks = 201 hrs • Size of body of literature before any deletions (x) is best single guide (Hrs = 721 + 0.243x – 0.0000123x²)
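The regression on this slide is easy to evaluate for a planned project. The sketch below simply codes the quoted formula; the function name `estimated_hours` and the example citation counts are illustrative assumptions, not part of the original estimate.

```python
def estimated_hours(citations: int) -> float:
    """Estimated total hours for a meta-analysis.

    citations: size of the body of literature before any deletions (x).
    Coefficients are the ones quoted on the slide.
    """
    if citations < 0:
        raise ValueError("citations must be non-negative")
    x = citations
    return 721 + 0.243 * x - 0.0000123 * x ** 2


# Hypothetical literature sizes, chosen only for illustration.
for x in (500, 1000, 2500):
    print(f"{x} citations: about {estimated_hours(x):.0f} hours")
```

Because the squared term is negative, the curve turns downward for very large x (near 9,900 citations), so the formula is probably best treated as a rough planning guide rather than extrapolated.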

Steps in a meta-analysis • 1. Identification (Search) • 2. Selection • 3. Abstraction • 4. Analysis • 5. Write-up

1. Identification - Sources • M-A’s use systematic, explicit search procedures (cf. qualitative literature review) • MEDLINE • 4100 journals • 1966 - present • Web search at PubMed: http://www.ncbi.nlm.nih.gov/PubMed • other search engines: BRS Colleague, WinSPIRs, etc • EMBASE • similar to MEDLINE, European version • Expensive, not widely available in US

Identification - Sources • Cochrane Collaboration Controlled Trials Register • Over 160,000 trials, including abstracts (+ translations) • by subscriptions….. MSU Electronic Library database • includes • MEDLINE, EMBASE • non-English publications • non-indexed publications • hand-search of journals • Other MEDLARS • CancerLit, AIDSLINE, TOXLINE, Dissertation Abstracts Online • Index Medicus • important if searching before 1966 • hand-search only

Identification - Steps: • 1. Search own personal files • 2. Search electronic databases • Review titles and on-line abstracts to eliminate irrelevant articles • Retrieve remaining articles, review, and determine whether they meet inclusion/exclusion criteria • 3. Review reference lists of articles for missed references • 4. Consult experts/colleagues/companies • 5. Conduct hand-searches of non-electronic databases and/or relevant journals • 6. Consider consulting an expert (medical librarian) with training in MEDLINE and use of MeSH terms.

Limitations of electronic databases • Electronic resources have been essential for growth of M-A, but they are far from perfect • 1. Databases are incomplete • MEDLINE contains only 1/3rd of all biomed journals • 2. Indexing is never perfect • Want search to have high sensitivity (Se: include all relevant studies) and high specificity (Sp: exclude the irrelevant!) • Ratio of retrieved articles : relevant articles can vary widely

Limitations of electronic databases 2. Indexing is never perfect • Accuracy of indexing per se relies on: • authors understanding how studies are categorized • “database” assigning correct category to study • Indexing also depends on ability of search strategies (e.g., MeSH) to identify relevant articles

Limitations of electronic databases 3. Search Strategies are never perfect - It’s hard to find all the relevant studies - Average Se of expert searchers using MEDLINE (vs known Registries of studies) = 0.51 Example – National Perinatal RCT Registry
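When a registry of known relevant studies is available, search performance can be checked directly. The sketch below is a minimal illustration with made-up study IDs. It reports sensitivity plus precision (the share of retrieved articles that are relevant) as a practical stand-in for specificity, since true specificity would need a count of every irrelevant record in the database. The function name and the data are assumptions.

```python
def search_performance(retrieved, relevant):
    """Compare a search result against a registry of known relevant studies."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    hits = retrieved & relevant  # relevant studies the search actually found
    return {
        # Se: fraction of all relevant studies that were retrieved
        "sensitivity": len(hits) / len(relevant),
        # fraction of retrieved articles that are relevant
        "precision": len(hits) / len(retrieved),
        # articles to screen per relevant article found
        "retrieved_per_relevant": len(retrieved) / len(hits) if hits else float("inf"),
    }


# Hypothetical IDs: the registry lists 10 relevant trials; the search returns
# 20 articles, 5 of which are in the registry.
registry = set(range(1, 11))
search = set(range(1, 6)) | set(range(101, 116))
print(search_performance(search, registry))
```

In this made-up example the search finds half of the registered trials, and four articles must be screened for every relevant one found.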

Other search issues…… • Non-English Studies • MEDLINE • Translation of title usually provided but abstracts often not. But N.B. that many non-English journals are not included anyway! • No a priori justification for excluding non-English studies • Quality is often equivalent or even better! • Excluding non-English studies can affect conclusions • But including means you need a translation just to determine eligibility!

Fugitive Literature • unpublished studies (… why are they unpublished?) • dissertations • drug company studies • book chapters • non-indexed studies and abstracts • conference proceedings • government reports • pre-MEDLINE (1966) • Sometimes important sources of information • Hard to track down – contact experts/colleagues • Need to decide whether to include or not - general consensus is that you should.

Publication bias • Published studies are not representative of all studies that have been performed • Articles with “positive findings” (P < 0.05) are more likely to be published • Hence published studies are a biased sub-set • Publication bias = systematic error of M-A that results from using only published studies

Evidence of Publication Bias Easterbrook (1991): 285 analyzed studies reviewed by Oxford Ethics Committee 1984-87

Implications of Publication Bias Simes (1986): Chemotherapy for Advanced Ovarian CA Comparison of Published Trials vs Registered Trials

Publication Bias • Probably results from a combination of author and editor practices and decisions (Ioannidis, 98) • Emphasizes the importance of registries of trials (N.B. Similar registries of observational studies are probably not feasible, although in Social Sciences the Campbell Collaboration is attempting to do this) • Simple Solution: • Don’t base publication decisions on statistical significance! • Focus on interval estimation. • Yeah right……!

Publication bias – Approaches • 1. Attempt to Retrieve all Studies • Required for Cochrane Publications • Difficult to identify unpublished studies and then to find out details about them • Worst Case Adjustment • Number of unpublished negative studies to negate a “positive” meta-analysis: • X = [N × ES / 1.645]² – N • where: N = number of studies in meta-analysis, • ES = effect size • Example: • If N = 25 and ES = 0.6, then X ≈ 58.1 • Almost 60 unpublished negative studies would be required to negate the meta-analysis of 25 studies.
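The worst-case adjustment is a one-line calculation. The sketch below codes the formula exactly as written on the slide; the function name `failsafe_n` and the `z_crit` parameter name are illustrative assumptions.

```python
def failsafe_n(n_studies: int, effect_size: float, z_crit: float = 1.645) -> float:
    """Unpublished negative studies needed to negate a "positive" meta-analysis.

    Implements X = [N * ES / z_crit]^2 - N from the slide, where 1.645 is the
    one-sided 5% critical value of the standard normal distribution.
    """
    if n_studies <= 0:
        raise ValueError("n_studies must be positive")
    return (n_studies * effect_size / z_crit) ** 2 - n_studies


# Slide example: 25 studies with an effect size of 0.6.
x = failsafe_n(25, 0.6)
print(f"About {x:.0f} unpublished negative studies would be needed")
```

A large X relative to N suggests that the summary result is unlikely to be explained by unpublished negative studies alone.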

2. Graphical Approaches - Funnel plot [Figure: funnel plot; axes: Sample Size (precision) vs Effect Size. Missing studies = small effect sizes with negative findings]

2. Selection • Inclusion/eligibility criteria essential to: • Produce a more focused (valid) study • Ensure reproducibility and minimize bias • Apply criteria systematically and rigorously • Balance between highly restrictive versus non-restrictive criteria in terms of • face validity, homogeneity, power (N), generalizability • Always develop in advance and include clinical expert(s) in the team

Typical inclusion criteria: • study design (e.g., RCT’s?, DBPC?, Cohort & CCS?) • setting (emergency department, outpatient, inpatient) • age (adults only, > 60 only, etc) • year of publication or conduct (esp. if technology or typical dosing changes) • similarity of exposure or treatment (e.g., drug class, or dosage) • similarity of outcomes (case definitions) • minimum sample size or follow-up • languages? • complete vs incomplete (abstracts) • published vs fugitive? • pre-1966?

Selection – Other Issues • multiple publications from same study? • Include only one! (double dipping is common!) • report should provide enough information for analysis (i.e. point estimate and variability = SD or SE) • Selection process should be done independently by at least 2 reviewers • Measure agreement (kappa, κ) and resolve discrepancies • Document excluded studies and reasons for exclusion • Keep pertinent but excluded studies
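Agreement between two reviewers is usually summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below is a minimal version for two raters. The include/exclude decisions are made-up data and the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on 10 articles (I = include, E = exclude).
reviewer_1 = list("IIIIIEEEEE")
reviewer_2 = list("IIIIEEEEEI")
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```

Here the reviewers agree on 8 of 10 articles, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw 80% agreement.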

Typical Searching and Selection Results • First pass, using title in computer search: 300 - 500 articles • Second pass, using abstract in computer search: 60 - 100 articles • Final pass, using copy of entire article: 30 - 60 articles • Included in study: 30 articles

3. Abstraction • Goal: to abstract reliable, valid and bias-free information from all written sources • Should expect a degree of unreliability • intra- and inter- rater reliability is rarely if ever 100%!! • Many sources of potential error: • Article may be wrong due to typographical or copyediting errors • Reported results can be misinterpreted • Errors in data entry during abstraction process

Abstraction • Ways to minimize error: • Develop and pilot test abstraction forms • Develop definitions, abstraction instructions, and rules • Train abstractors, pilot test, get feedback, and refine • Abstraction Forms • Number each data item • Require a response for EVERY item • Distinguish between negative, missing, and not-applicable • Simple instructions/language • Clear skip and stop instructions • Items clearly linked to definitions and abstraction rules

Abstraction • Typical process • 2 independent reviewers • Practice with 2 or 3 articles to “calibrate” • Use a 3rd reviewer or consensus meeting to resolve conflicts • Measure agreement (kappa, κ) and resolve discrepancies

Other Issues - Abstraction • Outcome measures of interest may have to be calculated from original data • For example, data to calculate relative risk may be present but not described as such. • Multiple estimates from same study? • Experimental: intention-to-treat vs not, adjusted for loss-to-follow up • Observational: crude vs age-adjusted vs multiply adjusted (model) • Include only one estimate per study, avoid over-fitted model estimates (as these are often more imprecise)
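When an article reports only event counts, the relative risk and its precision can be recovered during abstraction. The sketch below uses the usual large-sample standard error of log(RR). The counts are made-up and the function name is an assumption. It requires at least one event in each group.

```python
import math


def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk with a 95% CI computed on the log scale.

    Returns (rr, se_log_rr, ci_lower, ci_upper).
    """
    if min(events_trt, events_ctl) <= 0:
        raise ValueError("each group needs at least one event")
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Large-sample standard error of log(RR)
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    ci_lower = math.exp(math.log(rr) - z * se_log_rr)
    ci_upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, ci_lower, ci_upper


# Hypothetical trial: 15/100 events on treatment vs 30/100 on control.
rr, se, lo, hi = relative_risk(15, 100, 30, 100)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}), SE of log RR = {se:.3f}")
```

The log RR and its standard error are the quantities that would normally be carried forward into the analysis step.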

Investigator Bias: • Abstractor may be biased in favor of (or against!) a particular outcome (positive or negative finding), or researcher/institution, or journal. • prominent journals may be given greater weight or authority (rightly or wrongly) • if this may be an issue, have research assistant eliminate identifiers from articles (= blind review)

Blind Review • Remove study information that could affect inclusion or quality of abstraction, like: • author, title, journal, institution, country • Berlin (‘97): • compared blinded vs non-blinded reviews • Found discrepancy in which studies to include but little difference in summary effect sizes • Time consuming • Can probably be avoided, esp. if well-defined abstraction procedures are used

Assessment of study quality • Quality is an implicit measure of validity • Poor quality studies have lower validity • Using quality scoring should theoretically improve the validity of M-A’s • Process • Develop criteria (…how?) • Develop scale (= scoring system) • Abstract information and score each study • Example RCT scoring systems • Chalmers (1981) – 36 item scale! (see HWK #5) • Jadad (1996) – 5 point scale

Jadad Criteria for Scoring RCTs (Control Clin Trials 1996;17:1-12) • 1. Randomization • Appropriate (= 1 point) if each patient had equal chance of receiving intervention and investigators could not predict • Add 1 point if mechanism described and appropriate • Deduct 1 point if mechanism described and inappropriate • 2. Double blinding • Appropriate (= 1 point) if stated that neither the patient nor investigators could identify intervention, or if “active placebo”, “identical placebo” or “dummies” mentioned • Add 1 point if method described and appropriate • Deduct 1 point if method described and inappropriate • 3. Withdrawals and dropouts • Appropriate (= 1 point) if number and reasons for loss-to-FU in each group described.
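The scoring rules on this slide can be written down as a small function, which helps make abstraction reproducible. This is a sketch of the slide's rules only. The argument names are assumptions, and `None` is used here to mean "method not described".

```python
from typing import Optional


def jadad_score(
    randomized: bool,
    randomization_method_ok: Optional[bool],
    double_blind: bool,
    blinding_method_ok: Optional[bool],
    withdrawals_described: bool,
) -> int:
    """Score an RCT report from 0 to 5 using the criteria on the slide.

    For the two *_method_ok arguments: True = described and appropriate,
    False = described and inappropriate, None = not described.
    """
    score = 0
    for item_met, method_ok in (
        (randomized, randomization_method_ok),
        (double_blind, blinding_method_ok),
    ):
        if item_met:
            score += 1
            if method_ok is True:
                score += 1  # method described and appropriate
            elif method_ok is False:
                score -= 1  # method described and inappropriate
    if withdrawals_described:
        score += 1
    return score


# Hypothetical report: randomized with an appropriate method, double blind
# with no method described, withdrawals reported.
print(jadad_score(True, True, True, None, True))
```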

Uses of Quality Scores • Threshold (minimum score for inclusion) • Categorize study quality • High, medium, low quality • Use as sub-group analyses • Sensitivity analysis • Combine study-specific scores with variance (based on N) to generate modified weights • Poorer studies “count less” • Generally not recommended • Meta-regression
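To see how modified weights make poorer studies "count less", the sketch below multiplies each study's inverse-variance weight by a quality score scaled from 0 to 1. This is one possible scheme chosen for illustration, not a recommended method (the slide notes the approach is generally not recommended). The function name, data, and scaling are all assumptions.

```python
def pooled_estimate(effects, variances, quality=None):
    """Weighted average of study effects.

    Weights are inverse variances, optionally multiplied by a quality score
    in (0, 1]. With quality=None this is the plain inverse-variance
    (fixed-effect) summary.
    """
    if quality is None:
        quality = [1.0] * len(effects)
    if not (len(effects) == len(variances) == len(quality)) or not effects:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [q / v for q, v in zip(quality, variances)]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


# Hypothetical log relative risks, their variances, and quality scores.
log_rr = [-0.5, -0.2, 0.1]
var = [0.04, 0.02, 0.08]
quality = [1.0, 0.6, 0.2]

print(f"inverse-variance only: {pooled_estimate(log_rr, var):.3f}")
print(f"quality-adjusted:      {pooled_estimate(log_rr, var, quality):.3f}")
```

In this made-up example the summary moves toward the highest-quality study, which shows how sensitive the result can be to the choice of scoring system.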

Other Issues – Quality Scoring • Quality is difficult to measure • No consensus on method of scale development – not even for RCT’s • Few reliability/validity studies of scoring systems • inter-rater reliability of quality assessment often poor • Relies on quality of the reporting itself • sometimes study is blinded or randomized, but if not explicitly stated then it suffers in quality assessment • Difficult to detect bias from publications • More recent studies score higher – partly because they conform to recent standardized reporting protocols (e.g., RCT’s – CONSORT)
