Grading Strength of Evidence


Presentation Transcript


  1. Grading Strength of Evidence Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov

  2. Systematic Review Process Overview

  3. Learning Objectives • To define what “grading strength of evidence (SOE)” is • To describe why grading SOE is important • To distinguish between grading SOE and rating the quality of individual articles • To list primary and additional domains for grading SOE • To describe options for scoring SOE domains • To describe how to score and present SOE grades

  4. Grading Strength of Evidence • Is distinct from rating the quality of individual studies • Is generally used only to assess: • Major outcomes (benefits and harms) • Major comparisons, when relevant

  5. Why Grade Strength of Evidence? • To facilitate use of systematic reviews by diverse decisionmakers and stakeholders • To give decisionmakers: • A comprehensive evaluation of the evidence • A sense of how much confidence they can place in the evidence • To foster transparency and documentation

  6. Three Steps to Grading Strength of Evidence • Scoring four required domains • Risk of bias • Consistency • Directness • Precision • Considering, and possibly scoring, four additional domains • Dose-response association • Plausible confounders • Strength of association • Publication bias • Combining scores from required domains into a single strength-of-evidence score, taking scores on additional domains into account as needed

  7. Four Required Domains: Risk of Bias • Concerns both study design and study conduct for individual studies, rated by usual methods • Assesses the aggregate quality of studies within each major study design and integrates those assessments into an overall risk-of-bias score • Risk-of-bias scores: • High — lowers strength-of-evidence grade • Medium • Low — raises strength-of-evidence grade

  8. Four Required Domains: Consistency • Defined as the degree of similarity in the effect sizes of different studies within an evidence base • Consistent evidence bases: • Have the same direction of effect (same side of “no effect”) • Have a narrow range of effect sizes • Inconsistent evidence bases: • Have nonoverlapping confidence intervals • Have significant unexplained clinical or statistical heterogeneity

  9. Four Required Domains: Consistency Scores • Only three possible scores for consistency: • Consistent (i.e., no inconsistency) • Inconsistent • Unknown or not applicable (single study cannot be assessed) • Meta-analysis: • Use appropriate tests, such as Cochran’s Q test or the I² statistic
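As a rough illustration of the heterogeneity tests named on this slide, the sketch below computes Cochran's Q and the I² statistic from per-study effect estimates and their variances, using fixed-effect inverse-variance weights. The function name and the example inputs are hypothetical and are not taken from the Methods Guide.

```python
def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent).

    effects   -- per-study effect estimates (e.g., log odds ratios)
    variances -- per-study sampling variances of those estimates
    """
    if len(effects) != len(variances):
        raise ValueError("effects and variances must have the same length")
    if len(effects) < 2:
        # A single study cannot be assessed for consistency.
        raise ValueError("at least two studies are required")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I-squared is the share of variation beyond chance, floored at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Illustrative inputs only: three studies with equal variances.
q, df, i2 = cochran_q_i2([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
```

Q is compared against a chi-square distribution with `df` degrees of freedom. Any I² cutoff used to label an evidence base "inconsistent" is a reviewer judgment and is not fixed by this sketch.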

  10. Four Required Domains: Directness • Defined as whether the evidence being assessed: • Reflects a single, direct link between the interventions of interest and the ultimate health outcome under consideration • Relies on multiple links in a causal chain • If multiple links are involved, strength of evidence can be only as strong as the weakest link • Using analytic frameworks* is important *See the “Analytic Frameworks” module

  11. Four Required Domains: Aspects of Indirectness • Intermediate or surrogate outcomes instead of health or patient-centered outcomes • Example: laboratory test results or radiographic findings versus patient-reported functional outcomes or death • Indirect comparisons rather than direct, head-to-head comparisons • Direct (e.g., A vs. B, A vs. C, and B vs. C): • Head-to-head studies in the evidence base • Generally assumes use of health outcomes, not surrogate/proxy outcomes • Better strength of evidence • Indirect (e.g., A vs. B, B vs. C, but not A vs. C): • No head-to-head studies that cover all interventions or outcomes of interest • Problematic situation for all types of comparisons • Strength-of-evidence grades not as strong as with direct evidence

  12. Related Issue of Applicability* • Applicability is evaluated separately from directness for the Evidence-based Practice Center (EPC) program. • For decisionmakers, the applicability of evidence depends on the different interests of diverse groups. • A PICOS framework (patient populations, interventions, comparators, outcomes, and settings) is used for applicability assessment in the EPC program. • Although the EPC program separates applicability from strength-of-evidence grading, other systems that work with one decisionmaker may incorporate applicability issues into their evaluations of directness. *See the “Assessing Applicability” module

  13. Four Required Domains: Directness Scores • Only two possible scores for directness: • Direct: • Evidence is based on a single link between the intervention and health outcomes • Indirect: • Evidence relies on: • Surrogate/proxy outcomes • More than one body of evidence • Both situations

  14. Four Required Domains: Precision • Defined as the degree of certainty for estimate of effect with respect to a specific outcome • Is a complicated concept that: • Asks the question: • What can decisionmakers conclude about whether one treatment is, clinically speaking, inferior, superior, or equivalent (neither inferior nor superior) to another? • Includes considerations of: • Statistical significance for effect estimates • Confidence intervals for those effect estimates

  15. Four Required Domains: Precision Scores • Are rated separately for each important outcome or comparison, including for any summary estimate of effect size • Only two scores are possible • Precise: estimate allows a clinically useful conclusion • Imprecise: confidence interval is so wide it could include clinically distinct (even conflicting) conclusions
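One possible way to make the precise/imprecise distinction concrete is to compare a confidence interval with a minimal clinically important difference. The sketch below assumes a hypothetical threshold `mid` chosen by the reviewers and an effect expressed as a difference, with positive values favoring the treatment. It is an illustration, not the scoring rule of the Evidence-based Practice Center program.

```python
def score_precision(ci_lower, ci_upper, mid):
    """Return (score, conclusion) for a confidence interval on a difference.

    mid -- hypothetical minimal clinically important difference (> 0)
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if mid <= 0:
        raise ValueError("mid must be positive")

    if ci_lower > mid:
        # The whole interval lies above the threshold.
        return "precise", "superior"
    if ci_upper < -mid:
        # The whole interval lies below the negative threshold.
        return "precise", "inferior"
    if ci_lower >= -mid and ci_upper <= mid:
        # The whole interval lies inside the equivalence zone.
        return "precise", "equivalent"
    # The interval spans clinically distinct conclusions.
    return "imprecise", None


# Illustrative values only.
print(score_precision(0.6, 1.2, mid=0.5))
print(score_precision(-0.8, 0.9, mid=0.5))
```

The first call yields a clinically useful conclusion. The second interval covers inferiority, equivalence, and superiority at once, which matches the slide's description of an imprecise estimate.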

  16. Additional Domains • Four “discretionary” domains: • Dose-response association • Plausible confounders • Strength of association • Publication bias • Use when they are: • Applicable • Helpful in reaching conclusions about overall grades for strength of evidence

  17. Additional Domains: Dose-Response Association • Pattern of a larger effect with greater exposure (dose, duration, adherence) either across or within studies • Rate if studies give levels of exposure

  18. Additional Domains: Dose-Response Scores • Three scores are possible for dose-response: • Present: dose-response pattern observed • In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence. • Not present: no dose-response pattern observed (dose-response relationship not present) • Not applicable or not tested

  19. Additional Domains: Plausible Confounding • In an observational study, sometimes plausible confounding factors work in the direction opposite that of the observed effect. • Had such “effect-weakening” confounders not been present, the observed effect would have been even larger than the one observed. • In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence. • Consider whether or not plausible confounding exists that would decrease the observed effect.

  20. Additional Domains: Plausible Confounding Scores • Two scores are possible for plausible confounding: • Present: confounding factors that would decrease the observed effect may be present • Absent: confounding factors that would decrease the observed effect are not likely to be present

  21. Additional Domains: Strength of Association • Magnitude of effect: • Defined as the likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors • Consider when effect size is particularly large

  22. Additional Domains: Strength of Association Scores • Two scores are possible for strength of association: • Strong: large effect size that is unlikely to have occurred in the absence of a true effect of the intervention • In such a case, Evidence-based Practice Center reviewers may want to upgrade the level of evidence. • Weak: small enough effect size that it could have occurred solely as a result of bias from confounding factors

  23. Additional Domains: Publication Bias • Studies may have been published selectively. • Example: only a small proportion of relevant trials or other studies has been published. • Estimated effects of an intervention that are based on published studies do not reflect true effect. • Publication bias may undermine the overall robustness of a body of evidence.

  24. Additional Domains: Publication Bias Scores • Publication bias scores: • Need not be formally computed but can influence ratings of required domains • Should take these possible publication bias factors into account: • Rating for consistency • Calculating a summary confidence interval for an effect • Add comments on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are not otherwise available.

  25. Procedures for Assessing Domains • Use two or more reviewers with the appropriate clinical and methodological expertise. • Assess separately: • Each required domain (or each optional domain, as relevant) • Each major outcome, including benefits and harms • Resolve differences by consensus or mediation by an additional expert; consensus scores should appear in tables. • Record and maintain records of each reviewer's individual judgments about domains as background documentation.

  26. Strength of Evidence Grades (I) • Reflect a global assessment that: • Takes the required domains directly into account • Incorporates judgments about the additional domains as needed • Aim to: • Provide “actionable” information for a variety of different users, readers, and stakeholders • Be transparent in how the strength-of-evidence grades are reached

  27. Strength of Evidence Grades (II) • For each comparison of interest, rate the strength of evidence for: • Each major benefit (e.g., positive effects on health outcomes such as physical function or quality of life, or effects on laboratory measures or other surrogate variables) • Each major harm (ranging from rare, serious, or life-threatening adverse events to common but bothersome effects) • For both benefits and harms: • Focus on the outcomes most relevant to patients, clinicians, and policymakers

  28. Strength of Evidence Grades and Definitions • High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. • Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. • Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. • Insufficient: Evidence either is unavailable or does not permit a conclusion.

  29. Strength of Evidence Grades: Additional Points (I) • Using the high, moderate, or low strength-of-evidence grade: • Implies that a body of evidence actually exists • Is intended to convey how confident reviewers are about decisions that may be made based on evidence graded one way or another • Requires the use of only one designation, not a range (e.g., not “low to moderate”)

  30. Strength of Evidence Grades: Additional Points (II) • The insufficient strength-of-evidence grade: • Is applied when: • Reviewers cannot draw conclusions about an outcome, comparison, or other question • Is appropriate when: • No evidence is available at all • Evidence is too insubstantial to permit conclusions to be drawn (e.g., opposing results from studies with a similar risk of bias; wide and overlapping confidence intervals)

  31. Scoring and Reporting: General Guidance • Use different approaches to incorporate multiple domains into an overall strength-of-evidence grade • GRADE algorithm • Weighting system of the Evidence-based Practice Center • Some qualitative approach • Use (at least) two reviewers • Assess resulting interrater reliability for each domain score, and keep records
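The slide asks for interrater reliability without naming a statistic. One common choice for two reviewers assigning categorical scores is Cohen's kappa, sketched below. The choice of kappa, the function name, and the sample ratings are assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both reviewers must rate the same non-empty item set")

    n = len(ratings_a)
    # Observed proportion of items on which the reviewers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each reviewer's marginal counts.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both reviewers used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical risk-of-bias scores from two reviewers for four outcomes.
reviewer_1 = ["low", "low", "medium", "high"]
reviewer_2 = ["low", "medium", "medium", "high"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Kappa would be computed separately for each domain, and the individual ratings kept alongside the consensus scores.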

  32. Guiding Principles: Risk of Bias • Risk of bias (given design and conduct of available studies) is the essential component in determining the strength-of-evidence grade. • First, consider which study design is most appropriate to reduce bias for each question. • Next, consider the risk of bias from available studies.

  33. Guiding Principles: Risk of Bias Example • Drug comparisons in randomized controlled trials (RCTs), with either placebo or an active comparator as an appropriate design: • Evidence from well-conducted RCTs will have less risk of bias than evidence based on observational studies. • For RCTs, reviewers can start with a rating of low for risk of bias and change the assessment if the RCTs have important flaws. • For observational data, reviewers can start with a rating of high for risk of bias and change the assessment, depending upon how well studies were conducted.
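The starting points in this example can be written down as a small helper. The starting ratings come from the slide; the one-step adjustments and the parameter names are assumptions, since the slide says only that reviewers "change the assessment."

```python
LEVELS = ["low", "medium", "high"]  # ordered from least to most risk of bias


def starting_risk_of_bias(design, important_flaws=False, well_conducted=False):
    """Return a risk-of-bias rating for a body of studies of one design.

    design          -- "rct" or "observational"
    important_flaws -- hypothetical flag: the RCTs have important flaws
    well_conducted  -- hypothetical flag: the observational studies were
                       conducted well
    """
    if design == "rct":
        index = 0  # RCTs start at low risk of bias
        if important_flaws:
            index += 1  # assumed adjustment: one step toward higher risk
    elif design == "observational":
        index = 2  # observational data start at high risk of bias
        if well_conducted:
            index -= 1  # assumed adjustment: one step toward lower risk
    else:
        raise ValueError("design must be 'rct' or 'observational'")
    return LEVELS[index]


print(starting_risk_of_bias("rct", important_flaws=True))
```

In practice the size of any adjustment is a reviewer judgment and should be documented with its rationale.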

  34. Further Guidance: Principles for Scoring • Be explicit about how the evidence grade will be determined. • A point system for combining ratings of the domains • A qualitative consideration of the domains • Carefully document procedures. • Keep records of procedures and results for each review so that they may contribute to the overall expertise of the Evidence-based Practice Center and the science of grading evidence.
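To show what being explicit about a point system might look like, here is a minimal sketch of one entirely hypothetical scheme. It is neither the GRADE algorithm nor an Evidence-based Practice Center weighting system. The starting points, the one-point penalties, and the `upgrades` parameter are all assumptions chosen for illustration.

```python
GRADES = ["low", "moderate", "high"]


def combine_domains(n_studies, risk_of_bias, consistency, directness,
                    precision, upgrades=0):
    """Combine required-domain scores into one strength-of-evidence grade.

    upgrades -- number of additional domains judged to raise the grade
                (e.g., dose-response present, strong association)
    """
    if n_studies == 0:
        # No body of evidence exists, so no high/moderate/low grade applies.
        return "insufficient"

    # Hypothetical starting point: lower risk of bias starts higher.
    index = {"low": 2, "medium": 1, "high": 0}[risk_of_bias]

    # Hypothetical penalties: one step per unfavorable required domain.
    # An "unknown" consistency score carries no penalty in this sketch.
    index -= int(consistency == "inconsistent")
    index -= int(directness == "indirect")
    index -= int(precision == "imprecise")

    # Additional domains may raise the grade; clamp to the valid range.
    index = max(0, min(index + upgrades, len(GRADES) - 1))
    return GRADES[index]


print(combine_domains(5, "low", "inconsistent", "direct", "precise"))
```

Writing the rule down in this form makes it easy to report which domains moved the grade. A qualitative approach would record the same judgments in prose instead.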

  35. Further Guidance: Principles for Reporting (I) • Explain the rationale for the approach used and identify which domains were important in upgrading or downgrading the strength of evidence. • Explain judgments about the degree to which any additional domains altered the overall strength-of-evidence grade. • Provide enough detail within the report to ensure that users can grasp the methods.

  36. Further Guidance: Principles for Reporting (II) • Use the terms high, moderate, low, or insufficient. • Do not use Roman numerals or other symbols. • Use or adapt the illustrative tabular approach to reporting (see the publications listed below for examples). • Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions. In: Methods Guide for Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality, Posted August 2009. Available at: http://effectivehealthcare.ahrq.gov/ehc/products/60/318/2009_0805_grading.pdf. • Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol 2010;63:513-523.

  37. Grading Strength of Evidence: Presentation of Results (Moderate and High Grades). CI = confidence interval; RCT = randomized controlled trial

  38. Grading Strength of Evidence: Presentation of Results (Insufficient and Low Grades). CI = confidence interval; RCT = randomized controlled trial

  39. Comparison With the GRADE System • The grading system used by the Evidence-based Practice Centers (EPCs) is similar to the GRADE system. • The EPC grading system reflects the needs of AHRQ stakeholders for reviews on a wide variety of topics and not for recommendations or guidelines. • The main differences between the two grading systems: • The definitions of domains differ slightly; in the EPC system “directness” excludes “applicability,” which is handled separately. • In the EPC system, observational studies are considered to have less risk of bias for outcomes such as harms, which can raise the initial grade to “moderate.” • The definition of overall grade differs; the EPC system emphasizes confidence in the estimate, whereas the GRADE system emphasizes the effect of future research. • The EPC system permits three different ways to reach an overall strength-of-evidence grade; the GRADE formula has one.

  40. Summary: Grading Strength of Evidence • Is a critical last step in analysis and presentation • Is done after the quality of articles is rated by at least two independent reviewers • Helps users of systematic reviews understand the body of evidence and how much confidence they can have in making decisions based on that evidence • Uses scores on four primary (mandatory) domains and four additional (discretionary) domains • Focuses on major outcomes and comparisons • Is denoted in terms of high, moderate, or low strength or insufficient evidence • Presents strength-of-evidence grades in tabular form

  41. References • Atkins D, Best D, Briss PA, et al, for the GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490. • Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions. In: Agency for Healthcare Research and Quality. Methods Guide for Comparative Effectiveness Reviews [posted July 2009]. Rockville, MD. Available at: http://effectivehealthcare.ahrq.gov/healthInfo.cfm?infotype=rr&ProcessID=60. • Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol 2010;63:513-523.

  42. Author • This presentation was prepared by Kathleen N. Lohr, Ph.D., a Distinguished Fellow at RTI International. • This module is based on an update of chapter 11 in version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (updated chapter available at: http://effectivehealthcare.ahrq.gov/ehc/products/60/318/2009_0805_grading.pdf).
