Facing Challenging Situations When Grading Strength of Evidence Presenters: Nancy Santesso, RD, MLIS, McMaster University; Nancy Berkman, PhD, RTI International
Background • Systematic reviewers need to provide clear judgments about the evidence underlying the review’s conclusions so that decision-makers can use those conclusions effectively. • The “strength of evidence” grade is a key indicator of a review team’s level of confidence that the studies included in the review collectively reflect the true effect of an intervention on a health outcome. • Deciding on the appropriate strength of evidence grade can be challenging because of the complexity and unique characteristics of the evidence included in the review.
Session approach and goals • Briefly review the AHRQ approach to grading the strength of evidence • Assume some prior experience in grading • Present a series of strength of evidence grading challenges • There is not necessarily one “right answer”; we would like participants to share their thoughts with a neighbor and then discuss them with the full group • Nancy S. will review how GRADE would approach each decision
Steps in AHRQ EPC Approach to Grading SOE • Grade separately for RCT and observational study evidence, aggregated across studies, for each outcome • Score 5 required domains • Risk of bias (study limitations), Consistency, Directness, Precision • Reporting bias (including publication bias) • Consider, and possibly score, 3 additional domains • Dose-response association • Plausible confounding • Strength of association • Combine domain scores into separate SOE grades for RCTs and for observational studies, then combine these into a final grade
Risk of bias domain score • Concerns adequate control for bias based on both the study design and the conduct of individual studies • Assesses the aggregate risk of bias of studies, separately for RCTs and observational studies • Scores: high, medium, or low • Based on design, RCTs start at low risk of bias and observational studies start at higher risk of bias • May be adjusted based on the conduct of individual studies
Consistency domain score • Degree of similarity in the direction and magnitude of effect across the studies within the evidence base • Consistent: same direction of effect (same side of “no effect”) and a narrow range of effect sizes • Inconsistent: non-overlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc. • Unknown or not applicable: single study, so consistency cannot be assessed
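The direction and interval-overlap criteria above can be checked mechanically as a first pass. The following is a minimal Python sketch, not part of the AHRQ guidance: the helper names `all_same_side` and `any_non_overlapping` are hypothetical, and it assumes effect estimates on a scale where 0 means no effect (for example, mean differences, or log relative risks for ratio measures). Judging clinical heterogeneity still requires reviewer input.

```python
from itertools import combinations


def all_same_side(estimates, null=0.0):
    """True if every point estimate lies on the same side of 'no effect'."""
    return all(e > null for e in estimates) or all(e < null for e in estimates)


def any_non_overlapping(intervals):
    """True if at least one pair of confidence intervals does not overlap."""
    for (low1, high1), (low2, high2) in combinations(intervals, 2):
        if high1 < low2 or high2 < low1:
            return True
    return False


# Hypothetical mean differences with 95% CIs from three studies.
estimates = [-1.2, -0.8, 0.9]
intervals = [(-2.0, -0.4), (-1.5, -0.1), (0.3, 1.5)]

print(all_same_side(estimates))        # False: study 3 points the other way
print(any_non_overlapping(intervals))  # True: study 3's CI misses studies 1 and 2
```

A flag from either check is a prompt to look for an explanation (population, dose, follow-up), not an automatic "inconsistent" score.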
Directness domain score • Whether evidence reflects a single, direct link between the intervention of interest and the ultimate health outcome under consideration • Direct: single direct link between the intervention and health outcome • Indirect: evidence relies on • Surrogate or proxy outcomes • More than one body of evidence (no head-to-head studies)
Precision domain score • Degree of certainty surrounding the estimate of effect for a specific outcome • Precise: estimate allows a clinically useful decision • Imprecise: confidence interval is so wide that it could include clinically distinct (even conflicting) conclusions • Unknown: measures of dispersion not provided
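One way to make the precise/imprecise distinction concrete is to ask whether the confidence interval crosses a decision threshold. The sketch below is a hypothetical helper, not an AHRQ rule: it assumes a minimally important difference (MID) is available on the same scale as the estimate, treats ±MID as the thresholds separating clinically distinct conclusions, and falls back to the no-effect line alone when no MID is supplied.

```python
def classify_precision(ci, null=0.0, mid=None):
    """Return 'precise', 'imprecise', or 'unknown' for a confidence interval.

    ci   -- (low, high) tuple, or None if no measure of dispersion was reported
    null -- value meaning 'no effect' (0 for differences and for log ratios)
    mid  -- assumed minimally important difference; None to use the null only
    """
    if ci is None:
        return "unknown"
    low, high = ci
    thresholds = [null] if mid is None else [null - mid, null + mid]
    # A threshold strictly inside the interval means the CI spans
    # clinically distinct conclusions.
    if any(low < t < high for t in thresholds):
        return "imprecise"
    return "precise"


# Hypothetical pain results on a 10 cm VAS, assuming an MID of 1 cm.
print(classify_precision((-2.5, -0.3), mid=1.0))  # imprecise: spans -1 cm
print(classify_precision((-2.5, -1.2), mid=1.0))  # precise: wholly beyond -1 cm
print(classify_precision(None))                   # unknown
```

With an MID, an interval that straddles zero but stays within ±MID counts as precise, since every value in it implies no important effect; a stricter review might also treat the null itself as a threshold.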
Reporting Bias domain score • Publication bias: nonreporting of whole studies • Selective outcome reporting: nonreporting of planned outcomes • Selective analysis reporting: reporting only the most favorable analyses • Scores: suspected or undetected
Additional “discretionary” domains • Dose-response association (pattern of a larger effect with greater exposure): present, not present, or NA • Plausible confounding (confounding that would work in the opposite direction, i.e., would weaken the observed effect): present or absent • Strength of association (effect so large that it cannot have occurred solely as a result of bias from confounding): strong or weak • Applicability is considered separately
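AHRQ guidance does not fix a numeric cutoff for a “strong” association. The sketch below borrows the threshold commonly cited in GRADE guidance (a relative risk above 2 or below 0.5 as a large effect) purely as an assumption; the function name is hypothetical, and it looks only at the point estimate, whereas reviewers would also weigh the confidence interval and the likely size of any confounding.

```python
def strength_of_association(rr, threshold=2.0):
    """Label a relative risk 'strong' or 'weak' (hypothetical cutoff).

    Protective and harmful effects are compared on the same scale,
    so RR = 0.4 is treated like RR = 2.5.
    """
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    magnitude = max(rr, 1 / rr)
    return "strong" if magnitude > threshold else "weak"


print(strength_of_association(0.4))  # strong
print(strength_of_association(1.5))  # weak
```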
Integrating domain scores into an SOE grade • EPCs can use different approaches to incorporate multiple domains into an overall strength of evidence grade • The approach must be consistent within the review and transparent • Evaluation needs to be made by at least 2 reviewers • The approach used must be documented
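Because EPCs may integrate domains in different ways, any algorithm is only one option. The sketch below assumes a GRADE-style start-and-adjust rule (RCT evidence starts at high, observational evidence at low, each level of concern in a required domain moves the grade down one step, and each discretionary domain that raises confidence moves it up one step), with AHRQ’s “insufficient” as the bottom rung. The function and its parameters are hypothetical; writing a rule down like this is mainly a way to keep the approach consistent and documented.

```python
GRADES = ["insufficient", "low", "moderate", "high"]
START = {"rct": 3, "observational": 1}  # assumed starting levels (GRADE-style)


def soe_grade(design, downgrades=0, upgrades=0):
    """Hypothetical start-and-adjust rule for one body of evidence.

    design     -- "rct" or "observational"
    downgrades -- levels removed for concerns in the required domains
                  (e.g. 1 for medium risk of bias, 1 for imprecision)
    upgrades   -- levels added for discretionary domains
                  (dose-response, plausible confounding, strong association)
    """
    if design not in START:
        raise ValueError(f"unknown design: {design!r}")
    level = START[design] - downgrades + upgrades
    level = max(0, min(level, len(GRADES) - 1))  # clamp to the grade scale
    return GRADES[level]


print(soe_grade("rct", downgrades=1))           # moderate
print(soe_grade("observational", upgrades=1))   # moderate
```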
Challenge 1: 1 study, continuous outcome, ‘significant effects’ • Question: What are the effects of a ‘fasting followed by vegan’ diet for rheumatoid arthritis in adults? • Outcome: Pain (13 months) – measured on a 10 cm VAS scale • Kjeldsen-Kragh 1991 - population (age 18-75), mild to severe rheumatoid arthritis
Challenge 1: 1 study, continuous outcome, ‘significant effects’ Risk of Bias: LOW • Allocation concealment: adequate • Random sequence generation by computerised random number generator • Blinding: participants not blinded; outcome assessors, investigators and data analysts blinded • No loss to follow-up • Other biases: none Reporting bias: UNDETECTED (comprehensive search of major databases and grey literature, contact with authors in the field and government funding sources; no additional studies found)
Challenge 1: What is the strength of evidence and why? Discuss with your neighbor Vote! Strength of evidence • High • Moderate • Low • Insufficient
Challenge 1: Assessment • Risk of bias: LOW • Consistency: Unknown (one study) • Reporting bias: Undetected • Directness: Direct (outcome, population, intervention) • Precision? • Confidence intervals? • 34 people?
Optimal information size • We suggest the following: if the total number of patients included in a systematic review is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial, consider rating down for imprecision. Authors have referred to this threshold as the “optimal information size” (OIS) • http://stat.ubc.ca/~rollin/stats/ssize/
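The conventional sample size calculation that the OIS criterion refers to can be written out directly. The sketch below is a minimal version under stated assumptions: two equal arms, a normal approximation, a two-sided α, and a target difference equal to a minimally important difference (MID). The function name and the example MID and standard deviation are hypothetical, and results may differ slightly from the calculator linked above.

```python
from math import ceil
from statistics import NormalDist


def ois_continuous(delta, sd, alpha=0.05, power=0.80):
    """Total sample size (both arms) to detect a mean difference `delta`.

    Per-arm n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    per_arm = 2 * (z_sum * sd / delta) ** 2
    return 2 * ceil(per_arm)


# Hypothetical pain outcome on a 10 cm VAS: MID of 1 cm, SD of 2.5 cm.
print(ois_continuous(delta=1.0, sd=2.5))  # 198
```

Comparing this total with the number of participants actually pooled for the outcome shows whether the OIS criterion is met.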
Rule of thumb • For continuous outcomes: a total sample size of at least 400 is suggested • More empirical evidence is needed • Minimally important differences
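One way to read the 400-participant rule of thumb is to invert the same two-arm calculation and ask how small an effect, in standard deviation units, such a trial could detect. This is an illustrative sketch, not the derivation behind the rule; it assumes equal arms, α = 0.05, 80% power, and a normal approximation, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_smd(total_n, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable with `total_n` participants."""
    z = NormalDist().inv_cdf
    per_arm = total_n / 2
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / per_arm)


print(round(detectable_smd(400), 2))  # 0.28
```

If the MID for an outcome, expressed in SD units, is smaller than this, 400 participants would not be enough, which is one reason minimally important differences matter when applying the rule.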
Challenge 1 Assessment (modification) • Risk of bias: MEDIUM (no allocation concealment; 30% loss to follow-up, mostly treatment-related but evenly distributed between groups) • Consistency: Unknown (single study) • Reporting bias: Undetected • Directness: Indirect (outcome; population, age >65 only; intervention) • Precision: Imprecise • Rating?
Challenge 2: 1 study, dichotomous outcome, ‘non-significant effects’ • Question: What are the effects of over-the-counter medications in acute pneumonia in children? • Outcome: not cured or not improved • Principi 1986: inpatients aged 2-16
Challenge 2: Assessment Risk of bias: LOW • Allocation concealment: unclear? • Adequate sequence generation: computer-generated random numbers • Blinding of participants and outcome assessors; unclear for data analysts • Complete outcome data Reporting bias: • Undetected • Selective outcome reporting bias: not suspected (only one study was found for this medication, and it reported this outcome)
Challenge 2: What is the strength of evidence and why? Discuss with your neighbor Vote! Strength of evidence • High • Moderate • Low • Insufficient
Precision? • Confidence intervals • Power calculation • Rules of thumb
Sample size: optimal information size given α = 0.05 and β = 0.2 for varying control event rates and relative risks. For any chosen line, the evidence meets the optimal information size criterion if the sample size is above the line.
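Curves like these come from a standard two-proportion sample size calculation. The sketch below reproduces that kind of calculation under assumptions: equal arms, a normal approximation with unpooled variances, a two-sided α = 0.05 and β = 0.2, and an effect specified as a relative risk reduction (RRR). The function name is hypothetical, and the numbers may not match the published figure exactly if it used a different variance formula.

```python
from math import ceil
from statistics import NormalDist


def ois_dichotomous(control_rate, rrr, alpha=0.05, power=0.80):
    """Total sample size (both arms) for a given control event rate and RRR."""
    if not (0 < control_rate < 1 and 0 < rrr < 1):
        raise ValueError("control_rate and rrr must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    p_c = control_rate
    p_t = control_rate * (1 - rrr)  # event rate in the intervention arm
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p_c * (1 - p_c) + p_t * (1 - p_t)
    per_arm = z_sum ** 2 * variance / (p_c - p_t) ** 2
    return 2 * ceil(per_arm)


# Hypothetical: 20% control event rate, 25% relative risk reduction.
print(ois_dichotomous(0.20, 0.25))  # 1806
```

Lower control event rates push the required sample size up sharply, which is why such lines rise as the control event rate falls.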
Challenge 3: Inconsistency and Precision • Question: Effects of taxane chemotherapy in early breast cancer • Outcome: febrile neutropaenia (adverse event) • A priori exploration of heterogeneity (type of cancer, age, dose of taxane) could not explain the heterogeneity
Challenge 3: Assessment • Risk of bias: LOW • Reporting bias: Undetected • Directness: Direct (population, intervention, outcome) Discuss with your neighbor Vote! Strength of evidence • High • Moderate • Low • Insufficient
Consistency and Precision • Precision: confidence intervals (non-significant?); rules of thumb; optimal information size (power calculation) • Consistency: unexplained inconsistency; overlapping confidence intervals; I², p value of the χ² test
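The I² statistic and the χ² (Cochran’s Q) test can be computed from each study’s estimate and standard error. The sketch below uses fixed-effect inverse-variance weights and assumes estimates on an additive scale (mean differences, or log relative risks for dichotomous outcomes); the function name is hypothetical. The p value for Q comes from a χ² distribution with k - 1 degrees of freedom, for example via `scipy.stats.chi2.sf(q, df)`, which is left out here to keep the example dependency-free.

```python
def heterogeneity(estimates, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I² as a percentage."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies, each with a standard error")
    weights = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I² is the proportion of variation across studies attributed to
    # heterogeneity rather than chance; truncated at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log relative risks and standard errors from three trials.
q, df, i2 = heterogeneity([0.10, 0.45, 0.90], [0.15, 0.20, 0.18])
print(f"Q = {q:.2f}, df = {df}, I2 = {i2:.0f}%")
```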
Challenge 4: RCT and observational study data • Major bleeding: cold knife conization vs. LEEP (loop electrosurgical excision procedure) for women with confirmed cervical abnormalities • What is the overall SOE grade?
Challenge 4: Are you more or less confident in the RCT data given the observational data? Discuss with your neighbor: Does adding the observational study data make you more or less confident? Vote! Overall strength of evidence • High • Moderate • Low • Insufficient
Challenge 5: Telephone counselling to improve adherence to diet • Narrative synthesis • Total number of studies: 4 • Total number of participants: 255
Challenge 5: Assessment • Risk of bias: Medium • Precision: altogether 162 participants with a small effect; OIS not met • Consistency: some inconsistency • Directness: no concern • Reporting bias: no small negative studies?
Challenge 5: What is the strength of evidence and why? Discuss with your neighbor Vote! Strength of evidence • High • Moderate • Low • Insufficient
More Information • Nancy Santesso, RD, MLIS, PhD Cand., Department of Clinical Epidemiology and Biostatistics, McMaster University • Nancy Berkman, PhD, Senior Health Policy Research Analyst, Program on Healthcare Quality and Outcomes