Facing Challenging Situations When Grading Strength of Evidence

Presenters: Holger Schünemann, MD, PhD, McMaster University; Nancy Berkman, PhD, RTI International

Process Overview (flowchart; the layout is not recoverable from the extracted text) • Phases: Topic Generation; Topic Development & Refinement; Evidence Review; Dissemination & Research Needs Development; Translation & Implementation in Practice • Steps shown: Establish Review & Stakeholder Team With Appropriate Expertise*; Nomination of Topics; Clarify Intent; Comprehensive Search; Grade Body of Evidence; Evidence Report; Gaps + Public, Expert* ID Topics; Narrative (& Quantitative) Synthesis; Screen & Select Studies; Horizon Scanning; Develop Protocol; Analytic Framework; Public Comments & Peer Review*; Analytic/Conceptual Framework; Appraise Risk of Bias/Quality; Public Comment; Finalize Key Questions; Prioritization; Finalize Protocol; Final Report; Abstract Data; Future Research Needs Report • * Manage COI

Steps in AHRQ EPC Approach to Grading SOE • Separately for RCT and observational study evidence, aggregated across studies, for each outcome • Score 4 required domains • Risk of bias • Consistency • Directness • Precision • Considering, possibly scoring, 4 additional domains • Dose-response association • Plausible confounding • Strength of association • Publication bias • Combine into a single SOE grade

Risk of bias domain score • Concerns both study design and study conduct for individual studies • Assesses the aggregate quality or risk of bias of studies separately for RCTs and observational studies and integrates those assessments into an overall risk of bias score • Scores: high, medium, or low • High risk of bias lowers SOE grade • Low risk of bias raises SOE grade

Consistency domain score • Degree of similarity in the effect sizes of different studies within the evidence base. • Consistent: same direction of effect (same side of “no effect”) and narrow range of effect sizes • Inconsistent: non-overlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc • Unknown or not applicable: single study so cannot be assessed
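The consistency criteria above can be turned into a small screening aid. The following is a minimal sketch, not part of the AHRQ or GRADE guidance: the function names are hypothetical, the inputs are assumed to be ratio measures (RR, OR, HR) with 95% confidence intervals, and the output only flags patterns for reviewers to judge. It cannot decide what counts as a "narrow" range of effect sizes.

```python
from itertools import combinations


def intervals_overlap(a, b):
    """True if two (low, high) confidence intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


def consistency_flags(studies, null_value=1.0):
    """Summarize consistency signals for a body of evidence.

    studies: list of (estimate, ci_low, ci_high) tuples on a ratio scale.
    Hypothetical helper: returns flags for reviewers, not a domain score.
    """
    if len(studies) < 2:
        # A single study cannot be assessed for consistency.
        return {"assessable": False}
    estimates = [s[0] for s in studies]
    cis = [(s[1], s[2]) for s in studies]
    same_side = (all(e > null_value for e in estimates)
                 or all(e < null_value for e in estimates))
    return {
        "assessable": True,
        "same_direction": same_side,
        "all_cis_overlap": all(intervals_overlap(a, b)
                               for a, b in combinations(cis, 2)),
        "estimate_range": (min(estimates), max(estimates)),
    }


# Made-up example data, for illustration only.
print(consistency_flags([(0.80, 0.65, 0.98), (0.75, 0.60, 0.94), (0.85, 0.70, 1.03)]))
```

Unexplained clinical heterogeneity still has to be assessed by reading the studies; this check only covers the numeric part of the definition.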

Directness domain score • Whether evidence reflects a single, direct link between the intervention of interest and the ultimate health outcome under consideration • Direct: single direct link between the intervention and health outcome • Indirect: evidence relies on • Surrogate or proxy outcomes • More than one body of evidence (no head-to-head studies)

Precision domain score • Degree of certainty for estimate of effect with respect to a specific outcome • Precise: estimate allows a clinically useful decision • Imprecise: confidence interval is so wide that it could include clinically distinct (even conflicting) conclusions

Additional “discretionary” domains • Dose-response association (pattern of larger effect with greater exposure): present, not present, NA • Plausible confounders (confounding that works in the direction opposite to the observed effect and so would weaken it): present, absent • Strength of association (effect so large that it cannot have occurred solely as a result of bias from confounders): strong, weak • Publication bias (not formally scored) • Unlike GRADE, applicability is considered separately

Integrating domain scores into a SOE grade • EPCs can use different approaches to incorporating multiple domains into an overall strength of evidence grade • GRADE algorithm • EPC’s own weighting system • A qualitative approach • Evaluation needs to be made by (at least) 2 reviewers • Must document approach used
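As one illustration of the first option, the arithmetic behind the GRADE algorithm can be sketched as follows. This is a simplified sketch under stated assumptions: evidence from RCTs starts at high and observational evidence at low, each domain judged serious or very serious lowers the grade by 1 or 2 levels, and rating-up factors raise it. The function name and dictionary keys are hypothetical, and the real process rests on reviewer judgment rather than a mechanical sum. Note that the AHRQ scale uses "insufficient" where GRADE uses "very low".

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_sketch(design, rate_down=None, rate_up=None):
    """Combine domain judgments into one grade (illustrative only).

    design: "rct" or "observational".
    rate_down: dict of domain -> levels to subtract (0, 1, or 2),
        e.g. {"imprecision": 1, "risk of bias": 2}.
    rate_up: dict of factor -> levels to add (0, 1, or 2),
        e.g. {"large effect": 1, "dose-response": 1}.
    """
    rate_down = rate_down or {}
    rate_up = rate_up or {}
    for value in list(rate_down.values()) + list(rate_up.values()):
        if value not in (0, 1, 2):
            raise ValueError("each judgment must be 0, 1, or 2 levels")
    start = {"rct": 3, "observational": 1}[design]
    score = start - sum(rate_down.values()) + sum(rate_up.values())
    # Clamp to the ends of the scale.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


print(grade_sketch("rct", {"imprecision": 2}))                      # low
print(grade_sketch("observational", rate_up={"large effect": 1}))   # moderate
```

Whatever approach is chosen, the slide's requirements still apply: at least two reviewers make the evaluation, and the approach is documented.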

AHRQ and GRADE Grading Categories

Challenge 1: CER of benefits, 1 study, no meta-analysis or CIs • Topic: Antidepressant medication response in the elderly • Evidence description: 1 fair quality RCT (N = 108). Outcome evaluated through 2 validated scales that are clinician administered. • Scale 1: Results reported in bar graph only: (p = 0.03) • Scale 2: Results reported in bar graph only: (p = 0.04)

Challenge 1: Precision Score • AHRQ/GRADE approach: Precise • AHRQ approach: Imprecise • GRADE approach: Imprecision Serious (-1) • GRADE approach: Imprecision Very Serious (-2)

Challenge 1: Strength of evidence grade • AHRQ/GRADE approach: High • AHRQ/GRADE approach: Moderate • AHRQ/GRADE approach: Low • AHRQ approach: Insufficient • GRADE approach: Very low

Challenge (1) - Response • Rules for precision: • Based on CI, number of events, effect size • Not perfect but good guides • Judgment simple and possible for this example • With only 108 participants, downgrade for imprecision, possibly by two levels, unless the effect is huge (the effect size is not reported, and we need it for this evaluation)

Creating a new GRADEpro file

Optimal information size • We suggest the following: if the total number of patients included in a systematic review is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial, consider rating down for imprecision. Authors have referred to this threshold as the “optimal information size” (OIS)
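To make the "conventional sample size calculation" concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name, the 20% control event rate, and the relative risk of 0.75 are assumptions chosen for illustration; other sample size formulas give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def ois_total(control_rate, relative_risk, alpha=0.05, beta=0.20):
    """Total sample size (both arms) for one adequately powered trial.

    Uses n per arm = (z_{1-alpha/2} + z_{1-beta})^2
                     * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2,
    where p1 is the control event rate and p2 = p1 * relative_risk.
    """
    p1 = control_rate
    p2 = control_rate * relative_risk
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("event rates must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    per_arm = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return 2 * ceil(per_arm)


# Assumed inputs: 20% control event rate, 25% relative risk reduction.
ois = ois_total(control_rate=0.20, relative_risk=0.75)
print(ois)          # roughly 1,800 patients in total
print(108 < ois)    # a 108-patient trial falls well short under these assumptions
```

Under these assumed inputs, a body of evidence the size of the single RCT in Challenge 1 (N = 108) would not meet the OIS, which is consistent with the suggestion to rate down for imprecision.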

For systematic reviews • If the 95% CI excludes a relative risk (RR) of 1.0 and the total number of events or patients exceeds the OIS criterion, precision is adequate. If the 95% CI includes appreciable benefit or harm (we suggest a RR of under 0.75 or over 1.25 as a rough guide) rating down for imprecision may be appropriate even if OIS criteria are met.
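These rules of thumb can be written as a small decision helper. This is a sketch, not the official GRADE procedure: the function name and return strings are hypothetical, and it assumes the second rule is applied only when the confidence interval does not exclude a relative risk of 1.0. The 0.75 and 1.25 bounds are the rough guide quoted above.

```python
def precision_hint(ci_low, ci_high, ois_met, benefit_rr=0.75, harm_rr=1.25):
    """Suggest an imprecision judgment from a 95% CI for a relative risk.

    ois_met: whether the total events or patients exceed the OIS criterion.
    Returns a hint for reviewers, not a final rating.
    """
    excludes_no_effect = ci_high < 1.0 or ci_low > 1.0
    includes_appreciable = ci_low < benefit_rr or ci_high > harm_rr
    if not ois_met:
        return "consider rating down: OIS not met"
    if excludes_no_effect:
        return "precision adequate"
    if includes_appreciable:
        return "rating down may be appropriate: CI includes appreciable benefit or harm"
    return "no rule triggered: CI excludes appreciable benefit and harm"


# A made-up interval that includes no effect and extends past 1.25.
print(precision_hint(0.80, 1.40, ois_met=True))
```

The thresholds are parameters because the appropriate bounds depend on the outcome and the decision context.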

Figure 4: Optimal information size given alpha of 0.05 and beta of 0.2 for varying control event rates and relative risks. For any chosen line, the evidence meets the optimal information size criterion if the sample size is above the line.

Challenge 2: CER of harms, mixed outcomes & mixed results from RCTs and observational studies • Topic: Risk of suicidality from antidepressants • Evidence description: • RCT: 1 fair quality study • Suicidal ideation worse with Drug B (p = 0.03) • Case control: 1 fair quality study (N = 1300) • Non-fatal suicidal behavior; Drug A (OR = 1.16); Drug B (OR = 1.29) • Overlapping confidence intervals comparing each with Drug C • Nested case control: 1 good quality study (N = 10,000) • Completed suicides in adjusted analyses (P = NS)


Challenge 2: Directness Score RCTs • AHRQ/GRADE approach: Direct • AHRQ approach: Indirect • GRADE approach: Serious indirectness (-1) • GRADE approach: Very serious indirectness (-2)

Challenge 2: Directness Score Observational Studies • AHRQ/GRADE approach: Direct • AHRQ approach: Indirect • GRADE approach: Serious indirectness (-1) • GRADE approach: Very serious indirectness (-2)

Challenge (2) - Response • Indirect comparison • Downgrade • Observational study can provide more direct evidence • Need to go through full framework to find that out

Challenge 3: CER of benefits, RCTs found no difference between treatments • Topic: Medication response • Evidence description: 5 fair quality RCTs, number of participants ranges from 90 to 200, each study: (p = NS) • Meta-analysis pooled risk ratio: 1.03 (95% CI, 0.92-1.16)

Challenge 3: Are the treatments equivalent for this outcome? • Yes • No • Don’t know

Challenge (3) - Response • Superiority, inferiority and non-inferiority depend on more than one outcome. • Need to specify threshold. If threshold met, not imprecise, if not met, imprecise.

Figure 1. Rating down for imprecision in guidelines: thresholds are key. [Figure: mortality estimate and confidence interval plotted on a risk difference (%) axis running from Favors Intervention to Favors Control.] • Threshold if side effects, toxicity, and cost are minimal: NNT = 200. The entire confidence interval lies to the left of the threshold, so do not rate down for imprecision. • Threshold if side effects, toxicity, and cost are appreciable: NNT = 100. The confidence interval crosses the threshold, so rate down for imprecision.
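The threshold logic in Figure 1 can be sketched in a few lines. An NNT converts to a risk difference as 100/NNT percent, so NNT = 200 corresponds to 0.5% and NNT = 100 to 1.0%. The function names below are hypothetical, and the confidence interval in the example is made up, because the figure's actual values are not recoverable from the slide text.

```python
def nnt_to_risk_difference_pct(nnt):
    """Absolute risk difference (in percentage points) implied by an NNT."""
    return 100.0 / nnt


def rate_down_for_imprecision(arr_ci_pct, nnt_threshold):
    """True if the CI for the absolute risk reduction crosses the threshold.

    arr_ci_pct: (low, high) bounds of the absolute risk reduction, in
        percentage points, with larger values favoring the intervention.
    nnt_threshold: largest NNT at which the intervention is still worthwhile.
    """
    low, high = arr_ci_pct
    threshold = nnt_to_risk_difference_pct(nnt_threshold)
    # If the whole interval sits on one side of the threshold, the decision
    # is the same anywhere in the interval, so it is not imprecise.
    return low <= threshold <= high


hypothetical_ci = (0.6, 1.8)  # made-up absolute risk reduction, in %
print(rate_down_for_imprecision(hypothetical_ci, nnt_threshold=200))  # False
print(rate_down_for_imprecision(hypothetical_ci, nnt_threshold=100))  # True
```

The same interval is precise under one threshold and imprecise under the other, which is the point of the figure: the judgment depends on the threshold, and the threshold depends on side effects, toxicity, and cost.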

Challenge 4: CER of serious harms, Mixed findings in RCTs and observational studies • Topic: Serious infection from rheumatoid arthritis treatments • Evidence description: • RCTs: 4 fair quality studies. Number of participants ranges from 80 to 531. Number of serious infections presented for each treatment, very rare event. In each study (p = NS) • Retrospective cohort study 1: fair quality (N = 5,326). Hospitalization with a definite bacterial infection: Higher for Treatment A. Adjusted HR = 1.94 (95% CI, 1.32 to 2.83) • Retrospective cohort study 2: good quality/low risk of bias (N = 2,369). Adjusted rate of serious bacterial infection: RR = 1.0 (95% CI, 0.6 to 1.71)


Challenge 4: Risk of bias score • AHRQ/GRADE approach: Low risk of bias • AHRQ approach: Medium risk of bias • AHRQ approach: High risk of bias • GRADE approach: Serious risk of bias (-1) • GRADE approach: Very serious risk of bias (-2)

Challenge (4) - Response • Sequential work • Use the evidence that is of higher quality • Mention observational evidence in footnote

Challenge 5: Can you use less stringent criteria to evaluate risk of bias if the outcome without treatment is likely to result in death? • Topic: Use of hematopoietic stem cell transplantation (HSCT), also known as bone marrow transplantation. • Low risk of bias modified to be: natural history (or severity) of disease made spontaneous remission highly unlikely or impossible. • Evidence description: • For single HSCT for Wolman’s disease: In the natural history of this disease, death occurs by approximately 6 months of age. Of five cases reported in the evidence, three patients were alive at 4 to 11 years’ follow-up, with normal function and attending school. The strength of the body of evidence is high.

Challenge 5: Do you agree that it would be appropriate to use less stringent criteria to evaluate risk of bias under these circumstances? • Yes • No • Don’t know

Challenge 5: Do you agree that it would be appropriate to use less stringent criteria to evaluate risk of bias under these circumstances? • One reviewer commented that, rather than modifying Risk of Bias criteria, “the SOE system does allow consideration of other factors through the ‘optional domains’ if applied correctly.” These optional domains are: • dose-response association, • plausible confounding that would decrease observed effect, • strength of association (magnitude of effect), and • publication bias. • Do you agree?

Challenge (5) - Response • Particular design features of extremely rigorous well-conducted observational studies may warrant consideration for rating up quality of evidence. For instance, a case-control study found that sigmoidoscopy was associated with a reduction in colon cancer mortality for lesions in range of the sigmoidoscope (OR 0.30, 95% CI 0.19 to 0.48), but not beyond the range of the sigmoidoscope (OR 0.96, 95% CI 0.61 to 1.50). Possible bias because of unmeasured confounders should have been very similar if not identical in the two situations, considerably raising confidence in the causal effect of the sigmoidoscopy.

Challenge (5) - Response • Furthermore, when considering rating up the quality of evidence for magnitude of effect, factors relating to the magnitude are rapidity of treatment response and the previous underlying trajectory of the condition [6]. For example, we feel confident that hip replacement has a large effect not only because of the size of the treatment response, but because the natural history of hip osteoarthritis is a progressive deterioration that surgery rapidly and uniformly reverses. The rapidity of response compared to the known trajectory of the condition can also be considered (and calculated [6]) as a large effect size. • An additional factor mitigating the problem of rating up the quality because of a large effect is that indirect evidence usually provides further support for large treatment effects. For example, oral anticoagulation in mechanical heart valves has not been compared to placebo in an RCT, but evidence from observational studies suggests a large effect of oral anticoagulation in decreasing thromboembolic events [87]. Supplementary indirect evidence from randomized trials that have demonstrated large reductions in the relative risk of thrombosis with anticoagulation in analogous conditions such as atrial fibrillation further increases our confidence in the beneficial effect of anticoagulation [9]. • Similarly, the effectiveness of antibiotic prophylaxis in a variety of other situations supports observational studies that suggest that antibiotic prophylaxis results in an 89% relative risk reduction in meningococcal disease in contacts of patients who have suffered the illness [10]. • Another situation allows an inference of a strong association without a formal comparative study. Consider the question of the impact of routine colonoscopy versus no screening for colon cancer on the rate of perforation associated with colonoscopy. Here, a large series of representative patients undergoing colonoscopy will provide high quality evidence on the risk of perforation associated with colonoscopy. When control rates are near 0 (i.e., we are certain that the incidence of spontaneous colon perforation in patients not undergoing colonoscopy is very low), case series of representative patients (one might call these cohort studies of affected patients if they include large numbers of patients) can provide high quality evidence of adverse effects associated with an intervention, thereby allowing us to infer a strong association from even a limited number of events. One should not confuse the situation highlighted in the previous example with isolated case reports of associations between exposures and rare adverse outcomes (as have, for instance, been reported with vaccine exposure).

Challenge 6: Challenges in using GRADEpro • “I find it challenging to use GRADEpro to grade the body of evidence for non-RCTs and unpooled data.” • Comments?

Challenge (6) • Response: • GRADEpro is updated for observational studies. • Unpooled data: headcount as last resort, can still make qualitative judgments as long as transparent (e.g. inconsistency, imprecision)

Challenge 7: Current grading schemes are not amenable to healthcare quality improvement studies because: • They may only distinguish between RCTs and “all other” types of studies. • They may not distinguish quality of studies within RCTs and other types of study designs. • They do not have a way to appropriately grade external validity, which is critically important in QI studies. • Comments?

Challenge 7 - Response • They may only distinguish between RCTs and “all other” types of studies. • GRADE makes explicit judgments necessary about the confidence in estimates of effects for any study design. Randomization is just one of the criteria early on in the process as it is the key method to protect against bias • They may not distinguish quality of studies within RCTs and other types of study designs. • GRADE’s explicit judgments do make this distinction • They do not have a way to appropriately grade external validity, which is critically important in QI studies. • Judgments about directness do accomplish that (PICO) where P includes the setting

More Information • Holger Schunemann, Chair and Professor, Department of Clinical Epi and Biostatistics, McMaster University, schuneh@mcmaster.ca • Nancy Berkman, Senior Health Policy Research Analyst, Program on Healthcare Quality and Outcomes, berkman@rti.org
