
Presentation Transcript


  1. PubMed’s new relevance search • Innovations in Information Retrieval, Session 103 • Nicolas Fiorini, Zhiyong Lu • NCBI/NLM/NIH • Twitter: #AMIA2017

  2. Motivation • Improve PubMed’s relevance search • Providing more relevant results • Reducing users’ browsing time • Better understand user needs • Make use of state-of-the-art approaches • Machine learning

  3. Introduction • The previous relevance search on PubMed was based on weighted term frequencies • Integrate machine learning • Optimize the first page of results • A very common process • Google • Yelp • Amazon • eBay… Source: xkcd

  4. Introduction • The idea we propose is based on two layers (partly inspired by Liu 2009 and Dang et al. 2013) • The first layer is a traditional information retrieval system • The second layer uses a learning-to-rank (L2R) technique Source: Wikipedia
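To make the two-layer idea concrete, here is a minimal sketch in Python. The document store, the term-frequency scoring, and the names first_layer_search and two_layer_search are illustrative assumptions, not the actual PubMed implementation:

```python
from collections import Counter

K = 500  # candidates passed from the first layer to the second (the talk uses 500)

def first_layer_search(query, docs, top_k=K):
    """Layer 1: rank all documents with a cheap term-frequency score."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        score = sum(tf[t] for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

def two_layer_search(query, docs, ranker):
    """Layer 2: reorder the small candidate set with a learned scorer."""
    candidates = first_layer_search(query, docs)
    return sorted(candidates, key=lambda d: ranker(query, docs[d]), reverse=True)
```

The key design point is that the expensive learned model only ever sees the top-K candidates, so the second layer adds little latency.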

  5. Related work
  • Biomedical IR
    - Hersh, W. Information Retrieval: A Health and Biomedical Perspective. Springer Science & Business Media, 2008.
    - Jensen, L.J., Saric, J., and Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics, 2006. 7(2): p. 119-129.
    - Hersh, W. and Voorhees, E. TREC genomics special issue overview. Information Retrieval, 2009. 12(1): p. 1-15.
    - Jiang, J. and Zhai, C. An empirical study of tokenization strategies for biomedical information retrieval. Information Retrieval, 2007. 10(4-5): p. 341-363.
  • Query understanding
    - Lu, Z., Kim, W., and Wilbur, W.J. Evaluation of query expansion using MeSH in PubMed. Information Retrieval, 2009. 12(1): p. 69-80.
    - Herskovic, J.R., et al. A day in the life of PubMed: analysis of a typical day's query log. Journal of the American Medical Informatics Association, 2007. 14(2): p. 212-220.
  • Ranking/filtering
    - Haynes, R.B., et al. Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey. BMJ, 2005. 330(7501): p. 1179.
  • Question answering
    - Cao, Y., et al. AskHERMES: An online question answering system for complex clinical questions. Journal of Biomedical Informatics, 2011. 44(2): p. 277-288.
    - Lee, M., et al. Beyond information retrieval—medical question answering. In AMIA Annual Symposium Proceedings, 2006. American Medical Informatics Association.
    - Roberts, K. and Demner-Fushman, D. Interactive use of online health resources: a comparison of consumer and professional questions. Journal of the American Medical Informatics Association, 2016. 23(4): p. 802-811.
    - Roberts, K., Simpson, M.S., Voorhees, E.M., and Hersh, W.R. Overview of the TREC 2015 Clinical Decision Support Track. In TREC, 2015.

  6. First layer • A traditional information retrieval system [Diagram: offline, documents are parsed and indexed into the index; online, the search engine fetches matching documents for a query and scores them, e.g., by term frequencies]
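The offline/online split on this slide can be sketched as follows; this is a toy inverted index, assuming simple whitespace tokenization rather than PubMed's actual parsing:

```python
from collections import Counter, defaultdict

# Offline: parsing/indexing builds an inverted index,
# mapping each term to the documents that contain it.
def build_index(docs):
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf  # store the term frequency
    return index

# Online: fetch candidate documents for the query terms
# and score them, e.g., by summed term frequencies.
def fetch_and_score(query, index):
    totals = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            totals[doc_id] += tf  # a real engine weights by field, IDF, etc.
    return totals.most_common()
```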

  7. Second layer: learning-to-rank (L2R) • A class of machine learning techniques for ranking • Especially used in Information Retrieval • A ranking function (ranker) is created based on: • Gold standard: a query and a list of known relevant documents • A set of informative features for each query-document pair • Use the learned ranker to sort documents associated with unseen queries
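As an illustration of training such a ranker, the sketch below uses LightGBM's lambdarank objective (an implementation of LambdaMART; see the next slide). The library choice and all data shapes are assumptions; the talk does not specify the production tooling:

```python
import numpy as np
import lightgbm as lgb

# Toy gold standard: feature vectors for query-document pairs,
# graded relevance labels, and per-query group sizes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 query-document pairs, 5 features each
y = rng.integers(0, 4, size=100)     # graded relevance labels (0 = irrelevant)
groups = [20] * 5                    # 5 queries with 20 candidate documents each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
ranker.fit(X, y, group=groups)

# For an unseen query, sort its candidates by predicted score.
scores = ranker.predict(rng.normal(size=(20, 5)))
ranking = np.argsort(-scores)
```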

  8. Learning-to-rank algorithms (2000 - ) [Timeline of L2R algorithms, including LambdaMART (2008)] Source: T.-Y. Liu

  9. Training dataset • We calculate a relevance score for each clicked article for each query from anonymized logs • relevance = abstract click + full text click + boost depending on full text availability • The more a paper has been clicked (abstract or full text) for a given query, the more relevant it is considered • We reorder articles according to their relevance scores
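A sketch of how such a click-based score might be computed is below. The talk only states the three ingredients; the specific weights and the direction of the availability boost are assumptions:

```python
def click_relevance(abstract_clicks, fulltext_clicks, has_free_fulltext,
                    boost=1.5):
    """Approximate a document's relevance to a query from click logs.

    Illustrative only: combines abstract clicks, full-text clicks,
    and a hypothetical boost tied to full-text availability.
    """
    score = abstract_clicks + fulltext_clicks
    if not has_free_fulltext:
        # Assumed: articles without free full text cannot attract
        # full-text clicks, so their clicks are weighted up.
        score *= boost
    return score
```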

  10. Training dataset • For 36,000 queries, we have: • Clicked articles (from logs) with computed scores • The first layer’s top 500 search results • We normalize scores of relevant articles • The top 10 relevant documents have scores of 12 down to 3 • The next 10 relevant documents have a score of 2 • The remaining relevant documents have a score of 1 • Documents that are not relevant have a score of 0
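In code, this normalization scheme might look like the following sketch:

```python
def normalized_score(rank, is_relevant):
    """Map a relevant document's click-based rank to a graded label.

    rank is the 0-based position of the document when the relevant
    documents for a query are ordered by their click-derived scores.
    """
    if not is_relevant:
        return 0          # never clicked for this query
    if rank < 10:
        return 12 - rank  # top 10: graded scores from 12 down to 3
    if rank < 20:
        return 2          # next 10 relevant documents
    return 1              # all remaining relevant documents

# Example: the third most-clicked relevant article gets a score of 10.
assert normalized_score(2, True) == 10
```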

  11. Feature engineering • 150+ numerical features that give clues on how relevant a document is w.r.t. a query • Query features: query length, # of stopwords, … • Document features: publication year, popularity, publication type, … • Query-document features: # of matches in various fields, query coverage, …
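The sketch below illustrates the three feature families with a handful of toy features. The actual 150+ features are not enumerated in the talk, so the specific computations and the document schema here are assumptions:

```python
STOPWORDS = {"the", "of", "in", "and", "a", "for"}

def extract_features(query, doc):
    """Toy features for one query-document pair.

    doc is assumed to be a dict with "title", "year", and
    "popularity" keys; the real PubMed features differ.
    """
    q_terms = query.lower().split()
    title_terms = set(doc["title"].lower().split())
    matched = [t for t in q_terms if t in title_terms]
    return {
        # Query features: depend on the query alone.
        "query_length": len(q_terms),
        "num_stopwords": sum(t in STOPWORDS for t in q_terms),
        # Document features: depend on the document alone.
        "pub_year": doc["year"],
        "popularity": doc["popularity"],
        # Query-document features: depend on both.
        "title_matches": len(matched),
        "query_coverage": len(matched) / len(q_terms) if q_terms else 0.0,
    }
```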

  12. Detailed method workflow [Diagram: workflow of the live system]

  13. Offline results • Improved ranking performance measured by NDCG
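NDCG (normalized discounted cumulative gain) rewards rankings that place highly relevant documents near the top. A minimal sketch of the standard computation:

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=20):
    """DCG of the top-k results, divided by the best achievable DCG."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# A ranking with the most relevant documents first scores 1.0;
# burying them lowers the score.
print(ndcg([3, 2, 1, 0]))  # 1.0
print(ndcg([0, 1, 2, 3]))  # < 1.0
```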

  14. Assessment by click-through (online) • Only queries that returned two or more results are included • Single-item searches excluded • Zero-result searches excluded • Looking at first-page clicks (i.e., first 20 results) only
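As a sketch, the eligibility filter and the first-page click-through rate could be computed like this; the session representation is a hypothetical simplification of the actual logs:

```python
def first_page_ctr(sessions, page_size=20):
    """Fraction of eligible searches with at least one first-page click.

    Each session is assumed to be a (num_results, clicked_positions)
    tuple, with 0-based result positions for the clicks.
    """
    eligible = [s for s in sessions if s[0] >= 2]  # drop 0- and 1-result searches
    if not eligible:
        return 0.0
    hits = sum(any(p < page_size for p in clicks) for _, clicks in eligible)
    return hits / len(eligible)

sessions = [(350, [0, 4]), (1, [0]), (0, []), (42, [])]
print(first_page_ctr(sessions))  # 0.5: one of two eligible searches clicked
```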

  15. Results • CTR 18% higher than with the date sort order • In particular, more clicks on top results for Solr-L2R • Increase in usage • +100% usage in 6 months • Positive user feedback: “I ran a couple searches and like the new algorithm a lot. In fact, it presented me citations that I apparently missed.” “I appreciate the ranking with the best matches to the query. Very good point.” “You can let your group know that they are already having a positive impact in medical care”

  16. Best Match banner in PubMed

  17. PubMed Labs http://pubmed.gov/labs

  18. PubMed Labs – new search experience http://pubmed.gov/labs

  19. Limitations and future work • Relevance approximation using query logs • Better relevance extraction from logs • Use more relevance signals (seen counts, depth of clicks, etc.) • Deep learning • Mohan et al. Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Limited Click Logs, ACL Workshop on BioNLP, 2017. • More relevance models • One model does not fit all – no free lunch theorem

  20. Acknowledgements • PubMed: Kathi Canese, Rafis Ismagilov, Evgeny Kireev, David Lipman, Udi Manber, Vadim Miller, Sunil Mohan, Maxim Osipov, Jim Ostell, Grisha Starchenko • PubMed Labs: Rost Bryzgunov, Kathi Canese, Asta Gindulyte, Evgeny Kireev, Martin Latterner, David Lipman, Jim Ostell, Jane Radestka

  21. Feature analysis
