Diagnostic Testing. Ethan Cowan, MD, MS. Department of Emergency Medicine, Jacobi Medical Center; Department of Epidemiology and Population Health, Albert Einstein College of Medicine.

## Diagnostic Testing

**The Provider Dilemma**

- A 26-year-old pregnant female presents after twisting her ankle. She has no abdominal or urinary complaints. The nurse sends a UA and Uricult dipslide before you see the patient. What should you do with the results of these tests?
- Should a provider give antibiotics if one or both of these tests come back positive?

**Why Order a Diagnostic Test?**

- When the diagnosis is uncertain
- Incorrect diagnosis leads to clinically significant morbidity or mortality
- Diagnostic test result changes management
- Test is cost effective

**Clinician Thought Process**

- Clinician derives the patient's prior probability of disease from:
  - H & P
  - Literature
  - Experience
- "Index of Suspicion"
  - 0% to 100%
  - "Low, Med., High"

**Threshold Approach to Diagnostic Testing**

[Figure: probability of disease on an axis from 0% to 100%, with a testing zone between the thresholds P(-) and P(+)]

- P < P(-): Dx testing and therapy not indicated
- P(-) < P < P(+): Dx testing needed prior to therapy
- P > P(+): only intervention needed
- Width of the testing zone depends on:
  - Test properties
  - Risk of excess morbidity/mortality attributable to the test
  - Risk/benefit ratio of available therapies for the Dx

(Pauker and Kassirer, 1980; Gallagher, 1998)

**Test Characteristics**

- Reliability
  - Inter observer
  - Intra observer
  - Correlation
  - B&A Plot
  - Simple Agreement
  - Kappa Statistics
- Validity
  - Sensitivity
  - Specificity
  - NPV
  - PPV
  - ROC Curves

**Reliability**

- The extent to which results obtained with a test are reproducible.

[Figure: two panels labeled "Not Reliable" and "Reliable"]

**Intra rater reliability**

- Extent to which a measure produces the same result at different times for the same subjects

**Inter rater reliability**

- Extent to which a measure produces the same result on each subject regardless of who makes the observation

**Correlation (r)**

- For continuous data
- r = 1: perfect correlation
- r = 0: no correlation
- Measures relation strength, not agreement
- Problem: even near-perfect correlation can coexist with significant differences between observations

[Figure: observation O1 plotted against observation O2 with the line of equality O1 = O2; example with r = 0.8]

(Bland & Altman, 1986)

**Bland & Altman Plot**

- For continuous data
- Plot of the differences between observations (O1 - O2) versus their means ([O1 + O2] / 2)
- Data that are evenly distributed around 0 and lie within 2 standard deviations exhibit good agreement

[Figure: O1 - O2 (axis from -10 to 10) plotted against [O1 + O2] / 2]

(Bland & Altman, 1986)

**Simple Agreement**

| Rater 1 \ Rater 2 | - | + | total |
|---|---|---|---|
| - | a | b | a + b |
| + | c | d | c + d |
| total | a + c | b + d | N |

- Extent to which two or more raters agree on the classifications of all subjects
- Percent concordance in the 2 x 2 table: (a + d) / N
- Not ideal: subjects may fall on the diagonal by chance

**Kappa**

Uses the same 2 x 2 table of Rater 1 versus Rater 2 as Simple Agreement.

- The proportion of the best possible improvement in agreement beyond chance obtained by the observers
- $K = \dfrac{p_a - p_0}{1 - p_0}$
- $p_a = \dfrac{a + d}{N}$ (proportion of subjects along the main diagonal)
- $p_0 = \dfrac{(a + b)(a + c) + (c + d)(b + d)}{N^2}$ (expected proportion)

**Interpreting Kappa Values**

| Kappa | Agreement |
|---|---|
| K = 1 | Perfect |
| K > 0.80 | Excellent |
| 0.60 < K < 0.80 | Good |
| 0.40 < K < 0.60 | Fair |
| 0 < K < 0.40 | Poor |
| K = 0 | Chance (p_a = p_0) |
| K < 0 | Less than chance |

**Weighted Kappa**

| Rater 1 \ Rater 2 | 1 | 2 | ... | C | total |
|---|---|---|---|---|---|
| 1 | n11 | n12 | ... | n1C | n1. |
| 2 | n21 | n22 | ... | n2C | n2. |
| ... | ... | ... | ... | ... | ... |
| C | nC1 | nC2 | ... | nCC | nC. |
| total | n.1 | n.2 | ... | n.C | N |

- Used for more than 2 observers or categories
- Perfect agreement on the main diagonal is weighted more than partial agreement off of it.

**Validity**

- The degree to which a test correctly diagnoses people as having or not having a condition
- Internal Validity
- External Validity

[Figure: two panels labeled "Valid, not reliable" and "Reliable and Valid"]

**Internal Validity**

- Performance Characteristics
  - Sensitivity
  - Specificity
  - NPV
  - PPV
  - ROC Curves

**2 x 2 Table**

| Test Result \ Disease Status | cases | noncases | total |
|---|---|---|---|
| + | TP | FP | positives |
| - | FN | TN | negatives |
| total | cases | noncases | N |

TP = True Positives, FP = False Positives, TN = True Negatives, FN = False Negatives

**Gold Standard**

- Definitive test used to identify cases
- Example: traditional agar culture
- The dipstick and dipslide are measured against the gold standard

**Sensitivity (SN)**

- Probability of correctly identifying a true case
- TP / (TP + FN) = TP / cases
- With high SN, a negative test result rules out the Dx (SnNout)

(Sackett & Straus, 1998)

**Specificity (SP)**

- Probability of correctly identifying a true noncase
- TN / (TN + FP) = TN / noncases
- With high SP, a positive test result rules in the Dx (SpPin)

(Sackett & Straus, 1998)

**Problems with Sensitivity and Specificity**

- Remain constant over patient populations
- But SN and SP convey how likely a test result is to be positive or negative given that the patient does or does not have disease
- Paradoxical inversion of clinical logic
- Prior knowledge of disease status obviates the need for the diagnostic test

(Gallagher, 1998)

**Positive Predictive Value (PPV)**

- Probability that a subject labeled (+) is a true case
- TP / (TP + FP) = TP / total positives
- High SP corresponds to very high PPV (SpPin)

(Sackett & Straus, 1998)

**Negative Predictive Value (NPV)**

- Probability that a subject labeled (-) is a true noncase
- TN / (TN + FN) = TN / total negatives
- High SN corresponds to very high NPV (SnNout)

(Sackett & Straus, 1998)

**Predictive Value Problems**

- Vulnerable to disease prevalence (P) shifts
- Do not remain constant over patient populations
- As P increases, PPV increases and NPV decreases
- As P decreases, PPV decreases and NPV increases

(Gallagher, 1998)

**Flipping a Coin to Dx AMI for People with Chest Pain**

| Setting | AMI Prevalence | SN | SP | PPV | NPV |
|---|---|---|---|---|---|
| ED | 6% | 3 / 6 = 50% | 47 / 94 = 50% | 3 / 50 = 6% | 47 / 50 = 94% |
| CCU | 90% | 45 / 90 = 50% | 5 / 10 = 50% | 45 / 50 = 90% | 5 / 50 = 10% |

(Worster, 2002)

**Receiver Operating Characteristic (ROC) Curve**

[Figure: ROC curve, Sensitivity (TPR) from 0.0 to 1.0 plotted against 1-Specificity (FPR) from 0.0 to 1.0]

- Allows consideration of test performance across a range of threshold values
- Well suited for continuous-variable Dx tests
- Avoids the "single cutoff trap"

[Figure: sepsis example along a WBC Count axis, with regions labeled "No Effect" and "Effect"]

(Gallagher, 1998)

**Area Under the Curve (θ)**

- Measure of test accuracy
- θ 0.5 to 0.7: no to low discriminatory power
- θ 0.7 to 0.9: moderate discriminatory power
- θ > 0.9: high discriminatory power

(Grzybowski, 1997)

**Problem with ROC curves**

- Same "reverse logic" problem as SN and SP
- Mainly used to describe Dx test performance

**Appendicitis Example**

[Figure: flow diagram from Physical Exam to OR or CT Scan, with + and - branches ending in "Appy" or "No Appy"]

- Study design: prospective cohort
- Gold standard: pathology report from appendectomy, or CT finding (negatives)
- Diagnostic test: total WBC

| Measure | Value (range as reported) |
|---|---|
| SN | 76% (65%-84%) |
| SP | 52% (45%-60%) |
| PPV | 42% (35%-51%) |
| NPV | 82% (74%-89%) |

- Patient WBC: 13,000
- Management: get CT with PO & IV contrast

(Cardall, 2004)

**Follow Up**

- CT result: acute appendicitis
- Patient taken to OR for appendectomy

**But, was WBC necessary?**

Answer given in the talk on Likelihood Ratios.
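As a worked illustration of the Kappa slide, the sketch below computes simple agreement and kappa from the four cells of the rater 2 x 2 table, using the formulas given there. The function name `kappa_2x2` and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the talk.

```python
def kappa_2x2(a, b, c, d):
    """Return (p_a, p_0, K) for a 2 x 2 rater agreement table.

    a and d are the cells on the main diagonal (both raters agree);
    b and c are the off-diagonal cells (raters disagree).
    """
    n = a + b + c + d
    p_a = (a + d) / n  # observed agreement (simple agreement)
    # Agreement expected by chance, from the row and column totals.
    p_0 = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    k = (p_a - p_0) / (1 - p_0)
    return p_a, p_0, k


# Hypothetical counts, for illustration only.
p_a, p_0, k = kappa_2x2(a=40, b=10, c=5, d=45)
print(f"simple agreement = {p_a:.2f}, chance agreement = {p_0:.2f}, kappa = {k:.2f}")
```

With these made-up counts the raters agree on 85% of subjects, but half of that agreement is expected by chance, so kappa lands in the "Good" band of the interpretation table rather than "Excellent". The sketch does not guard against p_0 = 1, where kappa is undefined.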
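The coin-flip slides show that PPV and NPV shift with prevalence even when SN and SP stay fixed. A minimal sketch of that calculation follows; the function name `predictive_values` is an assumption made for illustration, and the inputs are the SN, SP, and prevalence figures from the ED and CCU slides.

```python
def predictive_values(sn, sp, prevalence):
    """Return (PPV, NPV) for a test with sensitivity sn and specificity sp
    applied to a population with the given disease prevalence."""
    tp = sn * prevalence              # true positives, as a fraction of N
    fn = (1 - sn) * prevalence        # false negatives
    tn = sp * (1 - prevalence)        # true negatives
    fp = (1 - sp) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Coin flip (SN = SP = 50%) in the ED, AMI prevalence 6%.
ed_ppv, ed_npv = predictive_values(0.5, 0.5, 0.06)
# Same coin flip in the CCU, AMI prevalence 90%.
ccu_ppv, ccu_npv = predictive_values(0.5, 0.5, 0.90)

print(f"ED:  PPV = {ed_ppv:.0%}, NPV = {ed_npv:.0%}")
print(f"CCU: PPV = {ccu_ppv:.0%}, NPV = {ccu_npv:.0%}")
```

For a coin flip, PPV simply equals the prevalence and NPV equals one minus the prevalence, which is why the test looks good at ruling out AMI in the ED and good at ruling it in in the CCU while carrying no information in either setting.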
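The ROC slides describe sweeping a threshold across a continuous test and summarizing the result as an area under the curve. One way to sketch this, assuming a test where higher values count as positive, is shown below. The function names `roc_points` and `auc` and the WBC-like values are hypothetical; the data are invented for illustration and are not from Cardall, 2004.

```python
def roc_points(cases, noncases):
    """Return (FPR, TPR) points, calling the test positive when value >= threshold."""
    thresholds = sorted(set(cases) | set(noncases), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every value: nothing is positive
    for t in thresholds:
        tpr = sum(x >= t for x in cases) / len(cases)        # sensitivity
        fpr = sum(x >= t for x in noncases) / len(noncases)  # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical WBC counts (thousands), for illustration only.
cases = [14, 16, 12, 18]
noncases = [8, 10, 12, 15]

points = roc_points(cases, noncases)
print(points)
print(f"AUC = {auc(points):.3f}")
```

Each threshold yields one (1-Specificity, Sensitivity) pair, so no single cutoff has to be chosen in advance. For these invented values the area falls in the 0.7 to 0.9 band that the Area Under the Curve slide labels moderate discriminatory power.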

