Xiaomei Yao, R. Brian Haynes, Nancy L. Wilczynski

Optimal sample sizes for testing search strategies for retrieving studies of treatment, diagnosis and prognosis from MEDLINE Xiaomei Yao, R. Brian Haynes, Nancy L. Wilczynski Health Information Research Unit, McMaster University

Background • MEDLINE covers 4800 journals and contains over 13 million articles in life sciences. • Clinicians and researchers need to retrieve the target articles quickly.

Background cont’d • Optimal search filters (“hedges”) have been developed from the 161 core clinical journals in MEDLINE published in 2000 (http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html). • Objective: to determine optimal sample sizes for testing search strategies that retrieve studies of treatment, diagnosis and prognosis from MEDLINE.

Table 1 - Best search strategies for 3 categories retrieval from MEDLINE

Method • The search terms were treated as “diagnostic tests” and hand searching as the “gold standard”. • 49,028 articles were categorized as “pass” or “fail” studies for clinical topic areas. • Precision analysis was used to calculate the number of “pass articles” needed to estimate sensitivity with a 95% confidence interval at margins of error (MEs) of ±0.01, ±0.02, ±0.025, ±0.03, ±0.04, ±0.05, and ±0.10.
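The precision analysis above can be sketched as follows. This is a minimal illustration that assumes the usual normal-approximation formula for a proportion, n = z² · p(1 − p) / ME²; the slides do not state which formula was used, and the function name and the example sensitivity are hypothetical.

```python
import math

# Two-sided 95% standard normal quantile (about 1.96).
Z_95 = 1.959963984540054


def pass_articles_needed(sensitivity, margin_of_error, z=Z_95):
    """Smallest number of "pass articles" n such that the half-width
    z * sqrt(p * (1 - p) / n) of the confidence interval is <= ME.

    Assumes the normal approximation for a proportion; this is an
    illustrative choice, not necessarily the authors' exact method.
    """
    if not 0.0 < sensitivity < 1.0:
        raise ValueError("sensitivity must lie strictly between 0 and 1")
    if margin_of_error <= 0.0:
        raise ValueError("margin_of_error must be positive")
    p = sensitivity
    return math.ceil(z ** 2 * p * (1.0 - p) / margin_of_error ** 2)


# Example with a hypothetical sensitivity of 0.5 (the most conservative case).
for me in (0.10, 0.05):
    print(me, pass_articles_needed(0.5, me))
```

Because the required n grows with 1/ME², halving the margin of error roughly quadruples the number of “pass articles” needed.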

Method cont’d • The lowest sensitivity among the 3 strategy types was chosen. • Random samples of journals of size 100, 50, 30, 20, 10, etc. were generated. • The 161 journals were ranked by their number of “pass articles”. • STATA 9.0 was used as the statistical software.
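The two journal-selection steps above (random subsets and ranking by yield) might look like the sketch below. The slides mention a purpose-built journal randomizer and STATA 9.0; this Python version is only an illustration, and the function names and journal counts are hypothetical.

```python
import random


def random_journal_subset(journals, size, seed=None):
    """Draw `size` distinct journals uniformly at random.

    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(journals), size)


def rank_by_pass_articles(pass_counts):
    """Return journal names ordered from most to fewest "pass articles".

    `pass_counts` maps journal name -> number of pass articles.
    """
    return sorted(pass_counts, key=lambda j: pass_counts[j], reverse=True)


# Hypothetical counts, for illustration only.
counts = {"Journal A": 5, "Journal B": 12, "Journal C": 0}
print(rank_by_pass_articles(counts))
print(random_journal_subset(counts, 2, seed=1))
```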

Results • Sample size calculations based on search sensitivity are shown in Table 2. • Table 3 shows the numbers of “pass articles” for the 3 purpose categories in different random samples of journals from the 161-journal database.

Table 2 – Sample size needed for different margins of error (MEs)

Table 3 - Numbers of “pass articles” (N) with their achieved MEs for random journal subsets

Results cont’d • Based on Tables 2 and 3, we can find the smallest journal subset for any ME. • The scatterplot of MEs against random journal sample sizes for treatment is shown in the Figure. • For the diagnosis and prognosis categories, we can achieve only an ME of ±0.10 because of the low prevalence of “pass articles”.
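Finding the smallest journal subset for a given ME amounts to computing the ME achieved by each subset's count of “pass articles” and taking the first subset that meets the target. The sketch below assumes the normal approximation ME = z · sqrt(p(1 − p) / N); the function names and all counts are hypothetical and are not the values in Tables 2 and 3.

```python
import math

# Two-sided 95% standard normal quantile (about 1.96).
Z_95 = 1.959963984540054


def achieved_margin(sensitivity, n_pass, z=Z_95):
    """Half-width of the 95% CI for sensitivity given n_pass pass articles."""
    p = sensitivity
    return z * math.sqrt(p * (1.0 - p) / n_pass)


def smallest_subset(pass_counts_by_size, sensitivity, target_me):
    """Smallest journal-subset size whose pass-article count achieves the
    target ME, or None if no subset does.

    `pass_counts_by_size` maps subset size -> number of pass articles.
    """
    for size in sorted(pass_counts_by_size):
        n_pass = pass_counts_by_size[size]
        if n_pass > 0 and achieved_margin(sensitivity, n_pass) <= target_me:
            return size
    return None


# Hypothetical pass-article counts per random subset size.
example = {10: 50, 20: 200, 50: 400}
print(smallest_subset(example, 0.5, 0.10))
print(smallest_subset(example, 0.5, 0.05))
```

A category with few “pass articles” can return None for tight margins, which mirrors the constraint reported for diagnosis and prognosis.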

Figure

Results cont’d • We tested the previous optimal search strategies for the treatment category in different random journal subsets (Table 4). • The top 1, 2, and 6 journals achieve MEs of ±0.10, ±0.05, and ±0.025 for treatment, respectively. • The performance of the strategy in the top 1-, 2-, and 6-journal subsets was very similar to its performance in all 161 journals.
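Since the search terms are treated as a diagnostic test against the hand-search gold standard, a strategy's performance in any journal subset can be summarized from a 2×2 table. The sketch below uses the standard definitions of sensitivity, specificity, precision, and accuracy; which measures Table 4 actually reports is an assumption here, and the counts are hypothetical.

```python
def performance(tp, fp, fn, tn):
    """Standard diagnostic-test measures from a 2x2 table.

    tp: pass articles retrieved      fp: fail articles retrieved
    fn: pass articles missed         tn: fail articles not retrieved
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of pass articles retrieved
        "specificity": tn / (fp + tn),   # share of fail articles excluded
        "precision": tp / (tp + fp),     # share of retrieved that are pass
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
for name, value in performance(tp=90, fp=10, fn=10, tn=890).items():
    print(f"{name}: {value:.3f}")
```

Computing the same measures in a small subset and in the full collection gives a direct way to compare how closely the estimates agree.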

Table 4 - Performance characteristics in random samples of 100, 45, 15, and 10 journals, and in all 161 journals, for the treatment category

Table 4 cont’d

Conclusion • Our sample size calculations suggest that search strategies calibrated in small journal subsets will be as robust as those calibrated in larger collections for categories with a relatively high proportion of “pass articles”, such as the treatment category (3.24% “pass articles”).

Conclusion cont’d • The “top-yielding journal” approach seems the most economical. • We will develop and test new search strategies in small random and top journal subsets to see how close the estimates are.

Acknowledgments • The Hedges Team included Angela Eady, Brian Haynes, Susan Marks, Ann McKibbon, Doug Morgan, Cindy Walker-Dilks, Stephen Walter, Stephen Werre, Nancy Wilczynski, and Sharon Wong. • Chris Cotoi (our computer programmer) helped us generate a journal randomizer.
