Evidence-Based Medicine, Pharmacy, and Nursing 2010

Dr. K.H. Yu (余光輝), Director, Division of Rheumatology, Allergy and Immunology; Chief, Center for Evidence-Based Medicine, Chang-Gung Memorial Hospital (Linkou), Taiwan. gout@adm.cgmh.org.tw

The Best Evidence Depends on the Type of Question
• What are the phenomena/problems?
  • Observation
• What is the frequency of the problem? (Frequency)
  • Random (or consecutive) sample
• Does this person have the problem? (Diagnosis)
  • Random (or consecutive) sample with gold standard
• Who will get the problem? (Prognosis)
  • Follow-up of inception cohort
• How can we alleviate the problem? (Therapy)
  • Randomized controlled trial (RCT)

Current Best Evidence

Are the results of the study valid? (Validity) Diagnostic Accuracy Study
• R: Was the diagnostic test evaluated in a representative spectrum of patients?
  • Mild and severe, early and late, treated and untreated
• A: Was the reference standard ascertained regardless of the index test result?
• Mbo: Was there an independent, blind comparison with a gold standard of diagnosis?
• Was the test validated in a second, independent group of patients?

Importance: Diagnostic Test
• Accuracy of the test in distinguishing patients with and without the target disorder
  • Sensitivity (Sn)
  • Specificity (Sp)
  • Positive predictive value (PPV)
  • Negative predictive value (NPV)
  • Likelihood ratio (LR)
• Diagnostic tests that produce big changes from pre-test to post-test probabilities

Diagnostic Test “Applicability”: questions to answer in applying a valid diagnostic test to an individual patient
• Is the diagnostic test available, affordable, accurate, and precise in our setting?
• Can we generate a clinically sensible estimate of our patient’s pre-test probability?
  • From personal experience, prevalence statistics, practice databases, or primary studies
  • Are the study patients similar to our own?
  • Is it unlikely that the disease possibilities have changed since the evidence was gathered?
• Will the resulting post-test probabilities affect our management and help our patient?
  • Could it move us across a test-treatment threshold?
  • Would our patient be a willing partner in carrying it out?
  • Would the consequences of the test help our patient reach his or her goals in all this?

Critical Appraisal of a Diagnostic Accuracy Study
• Are the results of the study valid?
  • Was the diagnostic test evaluated in a representative spectrum of patients?
  • Was the reference standard ascertained regardless of the index test result (that is, applied without knowledge of the index test result)?
  • Was there an independent, blind comparison between the index test and an appropriate gold standard of diagnosis?
• What were the results?
  • Are test characteristics presented?
• Can we apply the results to our patient?
  • Were the methods for performing the test described in sufficient detail to permit replication?

Was the diagnostic test evaluated in a representative spectrum of patients?
□ Yes   □ No   □ Unclear   Comments: ___________________

Was the reference standard ascertained regardless of the index test result?
□ Yes   □ No   □ Unclear   Comments: ___________________

Was there an independent, blind comparison between the index test and an appropriate gold standard of diagnosis?
□ Yes   □ No   □ Unclear   Comments: ___________________

Can we apply the results to our patient?
• Patients
  • Are your patients similar enough that the prevalence of the disease in the study population is similar to that in your patients?
  • Is the severity of the disease in the test population similar to that of the patients you are likely to see?
• Benefits
  • Are there risks associated with the tests?
  • Are these outweighed by the danger of an undiagnosed disease?

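Appraising STUDIES OF DIAGNOSTIC TESTS
• 1. Is the test clearly described (including the cut-off point that defines an abnormal result)?
• 2. Has the true presence or absence of disease been established for all patients (gold standard)?
  • Once the data can be entered into a 2×2 table, the key performance characteristics of the test can be determined.
• 3. Does the spectrum of tested patients with and without disease match the characteristics of the patients to whom the test will be applied?
  • Sensitivity is often affected by disease severity; specificity is often affected by the characteristics of the non-diseased subjects in the study.
• 4. Were the test and the disease status assessed without bias?
  • Bias may arise if the test result is determined with knowledge of the disease status, and vice versa.
• 5. Is test performance summarized by sensitivity, specificity, or likelihood ratios?
  • These are the data needed when deciding which test to choose.
• 6. When test values are continuous, how does moving the cut-off point affect test performance?
  • Test results depend on where the cut-off between normal and abnormal values is set.
• 7. If predictive values are reported, is the true clinical prevalence given?
  • Predictive values depend on prevalence as well as on the sensitivity and specificity of the test. If subjects with and without disease were selected separately, unrelated to the prevalence seen in clinical practice, the predictive values calculated from them have no clinical meaning.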

K.H. Yu Taiwan

Diagnostic Test: Sensitivity, Specificity, PPV, NPV, LR, ROC curve

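Diagnosis: Sn, Sp, PPV, NPV, LR (2×2 table: a = true positive, b = false positive, c = false negative, d = true negative)
• Sensitivity (Sn) = a/(a+c) = 731/809 = 90% (sensitivity is affected by disease severity)
• Specificity (Sp) = d/(b+d) = 1500/1770 = 85%
• Positive predictive value (PPV) = a/(a+b) = 731/1001 = 73% (= post-test probability)
• Negative predictive value (NPV) = d/(c+d) = 1500/1578 = 95%
• However, the predictive value of a diagnostic test is affected by the prevalence of the disease.
• PPV = Sn × P / [Sn × P + (1 − Sp) × (1 − P)] (Bayes' theorem), where P = prevalence; with Sn = Sp = 0.8:
  • P = 0.5: PPV = 0.8×0.5 / [0.8×0.5 + 0.2×0.5] = 0.8 = 80.0%
  • P = 0.05: PPV = 0.8×0.05 / [0.8×0.05 + 0.2×0.95] = 17.4%
  • P = 0.005: PPV = 0.8×0.005 / [0.8×0.005 + 0.2×0.995] = 2.0%
• The same diagnostic tool gives different predictive values at different prevalences, which is one reason to use the likelihood ratio (LR).
• Even when specificity is high, most positive results can be false positives when the test is used in a low-prevalence population.
• Even when sensitivity is high, most negative results can be false negatives when the test is used in a high-prevalence population.
• Example: ovarian cancer, CA-125: PPV ~2% in screening vs. 97% in pelvic mass cases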
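The dependence of predictive values on prevalence can be checked numerically. The following is a minimal Python sketch and not part of the original slides; the function names `ppv` and `npv` are illustrative, and Sn = Sp = 0.8 is assumed because those are the values used in the slide's Bayes' theorem figures.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Assumed test characteristics (Sn = Sp = 0.8) and the three
    # prevalences used on the slide.
    for p in (0.5, 0.05, 0.005):
        print(f"P = {p}: PPV = {ppv(0.8, 0.8, p):.1%}")
```

Running it shows the PPV falling from 80.0% to 17.4% to about 2.0% as prevalence drops from 0.5 to 0.05 to 0.005, with the test itself unchanged.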

Prevalence (different clinical situations) affects predictive value
• Example: ovarian cancer, CA-125: PPV ~2% in screening vs. 97% in pelvic mass cases
• Increasing the prevalence of disease before testing: when the prevalence of disease in the population tested is relatively high (more than several percent), the test performs well.
• Community-wide HIV screening
  • Test: 90% sensitivity, 99% specificity
  • Population: 10,000; prevalence 0.1% = 10/10,000
  • PPV = 9/(9+100) = 0.08 = 8%
  • NPV = 9890/(1+9890) = 0.9999 = 99.99%
• The same diagnostic tool gives different predictive values at different prevalences.
• Note: spectrum of patients, age, gender, risk factors, clinical findings (prevalence)

Increasing the Prevalence of Disease Before Testing: prevalence affects the PPV
Community-wide HIV screening test: 90% sensitivity, 99% specificity
• Population = 10,000; prevalence = 0.1% = 10/10,000
  • PPV = 9/(9+100) = 0.08 = 8%
  • NPV = 9890/(1+9890) = 0.9999 = 99.99%
• Population = 10,000; prevalence = 10% = 1000/10,000
  • Sensitivity = 900/1000 = 90%
  • Specificity = 8910/9000 = 99%
  • PPV = 900/990 = 0.91 = 91%
  • NPV = 8910/9010 = 0.99 = 99%
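The two screening scenarios above can be reproduced by filling in the 2×2 table from the population size, prevalence, sensitivity, and specificity. This is an illustrative Python sketch; the function name `screen` and the dictionary layout are assumptions, not part of the slides. The code keeps fractional counts (99.9 false positives), whereas the slide rounds to 100.

```python
def screen(population: int, prevalence: float,
           sensitivity: float, specificity: float) -> dict:
    """Fill in the 2x2 table for a screened population and derive PPV/NPV."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sensitivity * diseased      # true positives
    fn = diseased - tp               # false negatives
    tn = specificity * healthy       # true negatives
    fp = healthy - tn                # false positives
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Same test (Sn 90%, Sp 99%) applied at the two prevalences on the slide.
    for prevalence in (0.001, 0.10):
        r = screen(10_000, prevalence, 0.90, 0.99)
        print(f"prevalence {prevalence:.1%}: "
              f"PPV = {r['PPV']:.1%}, NPV = {r['NPV']:.2%}")
```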

Diagnosis: LR (likelihood ratio, multi-level odds), +PV, −PV
• Sensitivity = TP/(TP+FN) = a/(a+c) = 731/809 = 90%
• Specificity = TN/(FP+TN) = d/(b+d) = 1500/1770 = 85%
• Mnemonics: SnNout, SpPin
• Positive predictive value (PPV) = TP/(TP+FP) = a/(a+b) = 731/1001 = 73% (= post-test probability)
• Negative predictive value (NPV) = TN/(FN+TN) = d/(c+d) = 1500/1578 = 95%
• LR+ for a positive result = sensitivity/(1 − specificity) = [a/(a+c)] / [b/(b+d)] = 90%/15% = 6
  • LR+ = true-positive rate / false-positive rate: the probability of a positive result in people with the disease divided by the probability of a positive result in people without it
• LR− for a negative result = (1 − sensitivity)/specificity = [c/(a+c)] / [d/(b+d)] = 10%/85% = 0.12
• Pre-test probability (prevalence) = (a+c)/(a+b+c+d) = 809/2579 = 31%
• Pre-test odds = prevalence/(1 − prevalence) = 31%/69% = 0.45
• Post-test odds = pre-test odds × LR+ = 0.45 × 6 = 2.7
• Post-test probability = post-test odds/(post-test odds + 1) = 2.7/(2.7+1) = 73% (= PPV 73%)
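The chain from 2×2 counts to likelihood ratios to post-test probability can be sketched in a few lines of Python. The function names below are illustrative assumptions; the counts a = 731, b = 270, c = 78, d = 1500 follow from the fractions on the slide (731/809, 1500/1770, 731/1001, 1500/1578).

```python
def likelihood_ratios(a: int, b: int, c: int, d: int) -> tuple:
    """Return (LR+, LR-) from 2x2 counts: a=TP, b=FP, c=FN, d=TN."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the LR, convert back."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    a, b, c, d = 731, 270, 78, 1500
    lr_pos, lr_neg = likelihood_ratios(a, b, c, d)
    prevalence = (a + c) / (a + b + c + d)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
    print(f"post-test probability after a positive test: "
          f"{post_test_probability(prevalence, lr_pos):.1%}")
    print(f"post-test probability after a negative test: "
          f"{post_test_probability(prevalence, lr_neg):.1%}")
```

When the pre-test probability equals the study prevalence, the post-test probability after a positive result equals the PPV, and after a negative result it equals 1 − NPV. The value of the LR form is that the same two functions can be reused with a different pre-test probability for an individual patient.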

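Likelihood Ratio (2, 5, 10; multi-level odds)
• Positive likelihood ratio (LR of a positive test result, LR+) = sensitivity / (1 − specificity)
• LR+ = the probability of a positive result in people with the disease relative to that in healthy people = true-positive rate / false-positive rate; it multiplies the pre-test odds
• (Benefit: multi-level, not just a binary yes or no)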

Receiver operating characteristic (ROC) curve: selecting a cut-off point for continuous data
[Figure: test-value distributions labeled "Without disease" and "With disease", with regions TN, TP, FN (II, β) and FP (I, α) on either side of the cut-off]
• Importance of the cut-off value for test performance: as the cut-off is moved to the left, sensitivity (true-positive rate) increases, but specificity (true-negative rate) decreases. FN = false negative.
• Diagnosis: cut-off point and trade-offs between sensitivity and specificity
• Moving this point changes the sensitivity and specificity of the test.

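Receiver operating characteristic (ROC) curve: selecting a cut-off point
• The true-positive rate (sensitivity) is plotted on the vertical axis against the false-positive rate (1 − specificity) on the horizontal axis, comparing the probability of a positive result in people with and without the disease.
• ROC curve: select a cut-off point for continuous data.
• If the area under the ROC curve is 0.5 (null hypothesis), the model has no discriminatory power.
• When making a diagnosis from a test result, note: spectrum of patients (sensitivity is affected by disease severity), age, gender, risk factors, clinical findings (prevalence)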

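ROC curve
• The true-positive rate (sensitivity) is plotted on the vertical axis against the false-positive rate (1 − specificity) on the horizontal axis, comparing the probability of a positive result in people with and without the disease.
• Trade-offs between sensitivity and specificity
• The optimum cut-off point corresponds to the maximum Youden index = sensitivity + specificity − 1
• AUROC (area under the ROC curve): 0.9 to 1.0 excellent; 0.8 to 0.9 good; 0.7 to 0.8 fair; 0.6 to 0.7 poor; < 0.6 useless (null 0.5)
• ROC curve: selecting a cut-off point for continuous data (sensitivity vs. 1 − specificity)
[Figure: ROC curve, TP (sensitivity) vs. FP (1 − specificity); diagonal labeled "Useless"; AUROC > 0.8]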
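Choosing a cut-off by the Youden index and computing the area under the ROC curve can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: higher test values indicate disease, a result is positive when the value is at or above the cut-off, and the sample values in the demo are hypothetical, not data from the lecture. The function names are illustrative.

```python
def roc_points(diseased: list, healthy: list) -> list:
    """Return (cutoff, sensitivity, specificity) for every observed cut-off.

    Assumes higher values indicate disease; positive means value >= cutoff.
    """
    points = []
    for cutoff in sorted(set(diseased) | set(healthy)):
        sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
        specificity = sum(v < cutoff for v in healthy) / len(healthy)
        points.append((cutoff, sensitivity, specificity))
    return points


def best_youden(points: list) -> tuple:
    """Pick the cut-off maximizing J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1.0)


def auc(points: list) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    # ROC coordinates are (1 - specificity, sensitivity); add both corners.
    xy = sorted({(1.0 - sp, sn) for _, sn, sp in points}
                | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


if __name__ == "__main__":
    # Hypothetical test values, for illustration only.
    with_disease = [4, 5, 6, 6, 7, 8, 9, 9]
    without_disease = [1, 2, 3, 3, 4, 5, 5, 6]
    pts = roc_points(with_disease, without_disease)
    cutoff, sn, sp = best_youden(pts)
    print(f"best cut-off = {cutoff}, Sn = {sn:.2f}, Sp = {sp:.2f}")
    print(f"AUROC = {auc(pts):.3f}")
```

The Youden index weights false positives and false negatives equally; when one error is more costly than the other (screening vs. confirmation), a different cut-off may be preferable.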

[Figure: panels A and B] Trade-offs between sensitivity and specificity

Diagnosis
• Use of a sensitive test
  • Screening for treatable disease: maximize sensitivity while optimizing specificity
  • When a missed diagnosis would have serious consequences (treatable or transmissible disease)
  • e.g. screening donated blood for HIV, Pap smear, mammograms
  • Rule out disease (SnNout), e.g. ANA
• Use of a specific test
  • When a false-positive result would harm the patient physically, emotionally, or financially
  • e.g. cancer chemotherapy
  • Rule in disease (SpPin). Diagnosis: maximize specificity while optimizing sensitivity, e.g. anti-dsDNA
• ROC (receiver operating characteristic) curve
  • The true-positive rate (TP, sensitivity) is plotted on the vertical axis against the false-positive rate (FP, 1 − specificity) on the horizontal axis, comparing the probability of a positive result in people with and without the disease. The larger the area under the ROC curve, the more accurate the diagnostic tool.
  • The optimum cut-off point corresponds to the maximum Youden index = sensitivity + specificity − 1
  • c.f. LR+ for a positive result = sensitivity / (1 − specificity): the probability of that test result in people with the disease divided by the probability of the result in people without the disease
• Accuracy: proportion of correct results = (a+d) / (a+b+c+d) = (TP+TN) / (TP+TN+FP+FN) = prevalence × sensitivity + (1 − prevalence) × specificity
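The two forms of the accuracy formula can be checked against each other in a short Python sketch. The function names are illustrative, and the counts (a = 731, b = 270, c = 78, d = 1500) are the ones implied by the fractions used earlier in the lecture.

```python
def accuracy_from_counts(a: int, b: int, c: int, d: int) -> float:
    """Proportion of correct results: (TP + TN) / all subjects."""
    return (a + d) / (a + b + c + d)


def accuracy_from_rates(prevalence: float, sensitivity: float,
                        specificity: float) -> float:
    """Same quantity written as a prevalence-weighted average."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


if __name__ == "__main__":
    a, b, c, d = 731, 270, 78, 1500
    prevalence = (a + c) / (a + b + c + d)
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    print(f"from counts: {accuracy_from_counts(a, b, c, d):.3f}")
    print(f"from rates:  "
          f"{accuracy_from_rates(prevalence, sensitivity, specificity):.3f}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it changes with prevalence even when the test itself does not.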

Diagnosis Strategies: serial vs. parallel testing
• Serial testing (high specificity)
  • The results of test 1 are considered before test 2, and so on
  • To be considered positive, all tests in the series must be positive
  • Highly specific but insensitive
  • Useful when false positives are undesirable
  • Such as when treatment is highly invasive or toxic, e.g. cancer chemotherapy
• Parallel testing (high sensitivity)
  • Any positive is considered a positive
  • Sensitive but not specific
  • Useful when rapid diagnosis is necessary and a missed diagnosis is undesirable
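The effect of combining two tests can be illustrated with a small Python sketch. It rests on an assumption the slides do not state: the two tests are conditionally independent given disease status, which real tests often are not. The function names and the example sensitivities and specificities are hypothetical.

```python
def serial(sn1: float, sp1: float, sn2: float, sp2: float) -> tuple:
    """Positive only if BOTH tests are positive (assumes independence)."""
    sensitivity = sn1 * sn2
    specificity = 1.0 - (1.0 - sp1) * (1.0 - sp2)
    return sensitivity, specificity


def parallel(sn1: float, sp1: float, sn2: float, sp2: float) -> tuple:
    """Positive if EITHER test is positive (assumes independence)."""
    sensitivity = 1.0 - (1.0 - sn1) * (1.0 - sn2)
    specificity = sp1 * sp2
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical tests: A (Sn 0.80, Sp 0.60) and B (Sn 0.90, Sp 0.90).
    for name, combine in (("serial", serial), ("parallel", parallel)):
        sn, sp = combine(0.80, 0.60, 0.90, 0.90)
        print(f"{name:8s}: Sn = {sn:.2f}, Sp = {sp:.2f}")
```

Under these assumptions serial testing gives lower sensitivity and higher specificity than either test alone, and parallel testing the reverse, matching the bullets above.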

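Take Home Message*: The Five Steps of EBM
EBM Step 1: Formulate an answerable clinical question (Ask)
Try to break your question into the following four parts (PICO):
□ Patient or Problem
□ Intervention or Indicator
  • A treatment, test, risk factor, etc.
□ Comparator: what is the treatment compared with?
□ Outcome: what do you want to achieve or avoid?
*The "Take home message" material that follows is based mainly on the handbook Questions Log: A tool for ‘just in time’ learning, written by Carl Heneghan and Paul Glasziou and published by The Centre for Evidence-Based Medicine and The National Library for Health, NHS (UK), in the authorized Chinese edition compiled by the Center for Evidence-Based Medicine, Taipei Medical University-Wan Fang Hospital. Please cite the source for any form of quotation.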

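EBM Step 2: Search for the best evidence (Acquire)
• Use the PICO structure of the question (see Ask above) to set the search strategy.
• Think of every term and synonym for each PICO part of the question. Search one PICO element at a time, for example starting with the Intervention, but make sure you have combined all synonyms with OR.
• You can use truncation by adding "*", e.g. child* instead of children.
• Searching: try starting with Cochrane; for other question types, try PubMed: Clinical Queries or the National Library for Health (NLH).
• Dr. K.H. Yu's PubMed quick-search tip: (P) and (I) and (Cochrane or meta-analysis or systematic review)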
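The quick-search pattern above, synonyms joined with OR inside each PICO element and the elements joined with AND, can be assembled mechanically. The sketch below only builds the query string; it does not call PubMed. The function names and the example terms (gout, allopurinol, febuxostat) are hypothetical illustrations, not taken from the lecture.

```python
def or_group(terms: list) -> str:
    """Join synonyms with OR; quote multi-word terms as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(patient_terms: list, intervention_terms: list,
                evidence_filters: tuple = ("Cochrane", "meta-analysis",
                                           "systematic review")) -> str:
    """Build (P) AND (I) AND (evidence filter) as a search string."""
    groups = [or_group(patient_terms),
              or_group(intervention_terms),
              or_group(list(evidence_filters))]
    return " AND ".join(groups)


if __name__ == "__main__":
    # Hypothetical question: urate-lowering drugs in gout.
    print(build_query(["gout"], ["allopurinol", "febuxostat"]))
```

The resulting string can be pasted into a search box; truncated terms such as child* pass through unchanged.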

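EBM Step 3: Critically appraise the evidence for (a) validity and (b) importance, i.e. size of effect (Appraise)
• (a) Validity: all types of question share the following three items (RAM-bo)
  • Is the study population Representative?
    • Random selection / consecutive / inception cohort; or, for a comparative study, are the groups comparable? Random allocation / adjustment
  • Was there sufficient Ascertainment/follow-up?
    • Response rate / follow-up / ascertainment > 80%
  • Was the Measurement of outcomes unbiased and appropriate?
    • Outcomes assessed blinded or by objective measures
  • The answers can usually be found in the Methods section and the first one or two paragraphs of the Results. Appraisal may seem difficult at first (like learning to ride a bicycle), but with some experience you will be able to complete it in a few minutes.
• (b) Importance: size of effect
  • Look at the main results described in the Results section. How large is the effect? How important is it? Statistical significance is judged from the confidence interval and the P value. Relative risk, relative risk reduction, and odds ratio represent the biological effect. Absolute estimates of effect, the absolute risk reduction and the number needed to treat (NNT), represent the clinical impact on patients.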

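EBM Step 4: Integrate the evidence with clinical expertise and patient expectations (Apply)
□ Is your patient so different from those in the study that the results cannot apply?
□ How much benefit do you expect your patient to gain from the study results?
□ What alternatives are available?
□ Do the study results apply to your patient?
□ What does the patient think?
The study effect needs to be adjusted for the individual patient, e.g. for therapy: patient NNT = 1 / (RRR × PEER), where PEER is the patient's expected event rate.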
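The patient-specific NNT formula can be written as a one-line function. In this sketch the function name is illustrative and the example inputs are hypothetical: an RRR of one third and patients whose expected event rates without treatment (PEER) are 30% and 5%.

```python
def patient_nnt(rrr: float, peer: float) -> float:
    """Patient-specific NNT = 1 / (RRR x PEER).

    rrr  : relative risk reduction reported by the trial (0 < rrr <= 1)
    peer : this patient's expected event rate without treatment (0 < peer <= 1)
    """
    if not (0.0 < rrr <= 1.0 and 0.0 < peer <= 1.0):
        raise ValueError("rrr and peer must both lie in (0, 1]")
    return 1.0 / (rrr * peer)


if __name__ == "__main__":
    # Hypothetical inputs: the same trial RRR applied to a high-risk
    # and a low-risk patient.
    for peer in (0.30, 0.05):
        print(f"PEER = {peer:.0%}: NNT = {patient_nnt(1 / 3, peer):.0f}")
```

The same relative effect yields a much larger NNT in a low-risk patient, which is why the trial's NNT should not be carried over unadjusted.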

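EBM Step 5: Evaluate performance and effectiveness (keep records, improve the process)
In the final step, look at how you performed along the way. You may want to ask yourself the following questions:
□ Are you recording your questions?
□ Are you searching the wide range of available resources for useful external evidence?
□ Are you able to apply the evidence to the appropriate patients?
□ Are you changing your practice in line with the new evidence?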

The 10th Healthcare Quality Award (第十屆醫療品質獎)

Appendix
• For further reading
• A: critical appraisal checklist
• B: statistical concepts

Appraisal Appendix: Critical Appraisal
• Oxford Critical Appraisal Sheets
  • 1. RCT (therapy study)
  • 2. Systematic review
  • 3. Diagnostic test
• CASP
  • Critical Appraisal Skills Programme (CASP) appraisal tool
  • http://www.phru.nhs.uk/pages/PHD/resources.htm
• Recommended (Chinese edition available): Clinical Epidemiology: The Essentials. Third edition. Robert H. Fletcher, Suzanne W. Fletcher, Edward H. Wagner. Williams & Wilkins. 1996

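Basic guidelines for judging the validity of clinical research: ALL STUDIES
• 1. What clinical question does the study aim to answer?
  • The study design should match the clinical question.
• 2. What are the patients, variables, and outcomes of the study?
  • These determine the generalizability of the results.
• 3. How likely is it that the results are due to bias?
  • Systematic differences between groups (e.g. in patient characteristics, interventions, risk factors, outcomes, or measurement methods) reduce internal validity.
• 4. How large is the effect?
  • Clinical decisions consider not only whether there is an effect but also how large it is.
• 5. How likely is it that the results are due to chance?
  • One needs to know the range within which the true effect may lie (confidence interval) and, less usefully, the probability that the observed result arose by chance alone (P value for "positive" results, power for "negative" results).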

Critical Appraisal of a Therapy Study
• Are the results of the trial valid?
  • Was the assignment of patients to treatment randomized?
  • Were the groups similar at the start of the trial? (see Table 1 of the paper)
  • Aside from the allocated treatment, were the groups treated equally?
  • Were all patients who entered the trial accounted for, and were they analyzed in the groups to which they were randomized? (intention-to-treat, ITT, analysis)
  • Were measures objective, or were the patients and clinicians kept blinded to the treatment? (blinded assessment)

Critical Appraisal of a Therapy Study
• What were the results? (Important?)
  • How large was the treatment effect?
    • RR (relative risk)
    • ARR (absolute risk reduction)
    • RRR (relative risk reduction)
    • NNT (number needed to treat)
  • How precise was the estimate of the treatment effect?
    • Point estimate
    • 95% confidence interval (CI): does it include the null value?
• Will the results help me in caring for my patient? (Applicability?)

Was the assignment of patients to treatment randomized?
□ Yes   □ No   □ Unclear   Comments: ___________________

Were the groups similar at the start of the trial?
□ Yes   □ No   □ Unclear   Comments: ___________________

Aside from the allocated treatment, were the groups treated equally?
□ Yes   □ No   □ Unclear   Comments: ___________________

Were all patients who entered the trial accounted for, and were they analyzed in the groups to which they were randomized?
□ Yes   □ No   □ Unclear   Comments: ___________________

Were measures objective, or were the patients and clinicians blinded to the treatment received?
□ Yes   □ No   □ Unclear   Comments: ___________________

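How large was the treatment effect? In a study with two years of follow-up, mortality was 15% in the control group and 10% in the treatment group. The result can be presented in several ways (RR, RRR, ARR, NNT).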
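The usual effect measures for this example (control mortality 15%, treatment mortality 10%) can be computed directly. The sketch below is illustrative; the function name and dictionary keys are assumptions, while the definitions are the standard ones for RR, RRR, ARR, and NNT.

```python
def effect_measures(control_event_rate: float,
                    experimental_event_rate: float) -> dict:
    """Standard ways of expressing a treatment effect on a binary outcome."""
    cer, eer = control_event_rate, experimental_event_rate
    rr = eer / cer          # relative risk
    rrr = 1.0 - rr          # relative risk reduction
    arr = cer - eer         # absolute risk reduction
    nnt = 1.0 / arr         # number needed to treat
    return {"RR": rr, "RRR": rrr, "ARR": arr, "NNT": nnt}


if __name__ == "__main__":
    m = effect_measures(0.15, 0.10)
    print(f"RR  = {m['RR']:.2f}")
    print(f"RRR = {m['RRR']:.0%}")
    print(f"ARR = {m['ARR']:.0%}")
    print(f"NNT = {m['NNT']:.0f}")
```

For these rates the relative risk is about 0.67, the RRR about 33%, the ARR 5%, and the NNT 20: the same result sounds larger in relative terms than in absolute terms.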

How precise was the estimate of the treatment effect?
• The true risk of the outcome in the population is not known
• The best we can do is to estimate the true risk based on the sample of patients in the trial
• We can gauge how close this estimate is to the true value by looking at the confidence interval (CI)
• A narrow CI represents a precise reflection of the population value
• The CI also provides us with information about the statistical significance of the result
  • If the value corresponding to no effect falls outside the 95% CI, then the result is statistically significant at the 0.05 level
  • If the CI includes the value corresponding to no effect (null hypothesis), then the result is not statistically significant
• ITT vs. per-protocol analysis: if not every patient actually received the assigned treatment, different analyses are needed for different purposes and carry different scientific strength. Intention-to-treat analysis addresses the management decision, so patients are analyzed by assigned treatment; per-protocol analysis aims to explain the effect of the intervention itself, so patients are analyzed by the treatment actually received.
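How sample size drives the width of the confidence interval can be illustrated with an approximate 95% CI for the absolute risk reduction. This sketch uses the simple normal-approximation (Wald) interval, which is an assumption of the example rather than a method named in the slides; the group sizes (1000 and 200 per arm) are hypothetical, chosen to give event rates of 15% and 10%.

```python
import math


def arr_with_ci(events_control: int, n_control: int,
                events_treated: int, n_treated: int,
                z: float = 1.96) -> tuple:
    """ARR with a normal-approximation (Wald) confidence interval."""
    cer = events_control / n_control
    eer = events_treated / n_treated
    arr = cer - eer
    se = math.sqrt(cer * (1.0 - cer) / n_control
                   + eer * (1.0 - eer) / n_treated)
    return arr, arr - z * se, arr + z * se


def excludes_null(lower: float, upper: float, null_value: float = 0.0) -> bool:
    """True when the no-effect value lies outside the interval."""
    return not (lower <= null_value <= upper)


if __name__ == "__main__":
    # Hypothetical trials with the same rates (15% vs 10%), different sizes.
    for n in (1000, 200):
        arr, lo, hi = arr_with_ci(int(0.15 * n), n, int(0.10 * n), n)
        verdict = "significant" if excludes_null(lo, hi) else "not significant"
        print(f"n = {n} per arm: ARR = {arr:.3f}, "
              f"95% CI {lo:.3f} to {hi:.3f} ({verdict})")
```

With 1000 patients per arm the interval excludes zero; with 200 per arm the same point estimate gives an interval that includes zero.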

Will the Results Help Me in Caring for My Patients?
• Is our patient so different from those in the study that its results cannot apply? (Are the people in the study like my patient? Note the inclusion criteria.)
• Are there data from Taiwan, China, or Asia (ethnic differences)? Is there a cost-effectiveness analysis?
• Am I missing any data? Also hand-search the reference lists of the selected articles and of expert reviews, and ask experts.
• Age, general state of health, type and severity of disease process, time in the course of the disease (i.e. applicability)
• Is the treatment feasible in my setting?
• Will the potential benefits of treatment outweigh the potential harms of treatment for my patients?
• Did the study cover all aspects of the problem?
• Does it suggest a clear and useful plan of action?
  • Help to clarify a patient’s prognosis
  • Suggest a useful plan to improve the patient’s state of health

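META-ANALYSES
• 1. Were all relevant studies found, including both published and unpublished studies?
  • The aim is to combine the results of all studies, not of a biased sample of studies.
• 2. Were only scientifically sound studies (those with minimal bias) included?
  • Conclusions must rest on the most reliable evidence.
• 3. When estimating the effect:
  • a. Were the studies homogeneous (similar patients, interventions, and outcomes)?
    • It is inappropriate to derive a summary measure of effect from completely dissimilar studies.
  • b. Were the studies weighted by sample size?
    • Were larger (more precise) studies given greater weight than smaller (less precise) ones?
• 4. Is study quality related to the results?
  • Higher-quality studies are more credible.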
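Weighting by precision and checking homogeneity can be sketched with a fixed-effect, inverse-variance pooled estimate and Cochran's Q. Inverse-variance weighting is a common stand-in for "weighting by sample size" and is an assumption of this example, as are the function name and the made-up effect sizes; none of it is data from the lecture.

```python
import math


def pool_fixed_effect(effects: list, std_errors: list) -> dict:
    """Fixed-effect inverse-variance pooling with Cochran's Q.

    effects    : per-study effect estimates (e.g. log relative risks)
    std_errors : their standard errors (smaller = more precise study)
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # compared with a chi-square distribution on (k - 1) degrees of freedom.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return {"pooled": pooled, "se": pooled_se,
            "Q": q, "df": len(effects) - 1}


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors for three trials.
    result = pool_fixed_effect([-0.40, -0.25, -0.55], [0.20, 0.10, 0.30])
    print(f"pooled effect = {result['pooled']:.3f} (SE {result['se']:.3f})")
    print(f"Cochran Q = {result['Q']:.2f} on {result['df']} df")
```

A Q value well above its degrees of freedom suggests heterogeneity, in which case a single pooled estimate may not be appropriate.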

Critical Appraisal of a Systematic Review
• Are the results of the review valid?
  • What question (PICO) did the systematic review address?
  • Is it unlikely that important, relevant studies were missed?
  • Were the criteria used to select articles for inclusion appropriate?
  • Were the included studies sufficiently valid for the type of question asked?
  • Were the results similar from study to study (no heterogeneity)?
• What were the results?
  • How are the results presented? (meta-analysis, forest plot, heterogeneity chi-square test, Cochran Q)

What question (PICO) did the systematic review address?
□ Yes   □ No   □ Unclear   Comments: ___________________

Is it unlikely that important, relevant studies were missed?
□ Yes   □ No   □ Unclear   Comments: ___________________

Were the criteria used to select articles for inclusion appropriate?
□ Yes   □ No   □ Unclear   Comments: ___________________
