
Proximity-based Ranking of Biomedical Texts

Presentation Transcript


  1. Proximity-based Ranking of Biomedical Texts • Rey-Long Liu* and Yi-Chih Huang • *Dept. of Medical Informatics, Tzu Chi University, Taiwan

  2. Outline • Research background • Problem definition • The proposed approach: PRE • Empirical evaluation • Conclusion A Proximity-based Ranker Enhancer

  3. Research Background

  4. Biomedical Information Need • Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature • Retrieving that evidence requires a system that • Accepts a natural language query for a biomedical information need, and • Ranks relevant texts higher for access or processing

  5. An Example Info Need • Query: urinary tract infection, criteria for treatment and admission (from OHSUMED) • A disease as the target concept (i.e., urinary tract infection) • Two concepts about the scenario of the information need (i.e., treatment and admission) • Neither is specific to, nor related to, any particular disease

  6. Problem Definition

  7. Goals • Explore how text rankers may be improved by considering the completeness of query concepts appearing in a nearby area of the text being ranked • Develop a technique, PRE (Proximity-based Ranker Enhancer), that • Measures the contextual completeness of query concepts appearing in a nearby area of the text • Serves as a supplement to improve existing rankers

  8. Related Work • Biomedical text ranking • Uses synonyms and considers the diversity of passages, without considering term proximity • Text ranking • Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), without considering term proximity • Improving ranking by term proximity • Term proximity is employed, but contextual completeness is not considered

  9. The Proposed Approach: PRE

  10. System Overview • [Diagram; labels: User, Query (q), Text (d), PRE: TF (Term Frequency) Assessment, TF in d, Underlying Ranker: Text Ranker Development (Training) and Text Ranking (Testing), Training Data, Ranked Texts]
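
The overview labels suggest that PRE hands a reassessed "TF in d" to an underlying ranker. As a minimal sketch of what that hand-off could look like, the function below scores one term with a standard BM25 formula and takes the TF value as a plain argument, so a raw count or a proximity-revised value can be passed in. BM25 appears in this deck only as an example of a scoring technique; the function name, parameter names, defaults (`k1=1.2`, `b=0.75`), and the IDF variant are assumptions for illustration, not the authors' setup.

```python
import math


def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Score one term with BM25, given a (possibly non-integer) TF value.

    tf          : term frequency in the document; may be a revised TF (assumption)
    df          : number of documents containing the term
    n_docs      : number of documents in the collection
    doc_len     : length of this document in tokens
    avg_doc_len : average document length in the collection
    """
    # A common smoothed IDF variant that is never negative.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    raw = bm25_term_score(tf=1, df=50, n_docs=1000, doc_len=120, avg_doc_len=100)
    revised = bm25_term_score(tf=3.5, df=50, n_docs=1000, doc_len=120, avg_doc_len=100)
    print(raw, revised)  # the larger TF yields the larger score
```

Because the ranker only sees a TF value, this kind of interface would let an enhancer stay independent of the ranker's internals, which matches the "supplement" role described in the goals.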

  11. TF Assessment • Three types of term proximity • Overall proximity (QtermTF) • Individual proximity (IndiP) • Collective proximity (CollP) • A term t may get a large TF increment in d if • Many query terms appear frequently in d, • Query terms are individually near t at some places, and • Query terms collectively appear at a place near t

  12. $\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$
      • $\mathrm{TFincrement}(t,d,q) = \mathrm{QtermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$
      • $\mathrm{QtermTF}(d,q)$ = total TF of query terms in d
      • $\mathrm{IndiP}(t,d,q) = \sum_{m \in M-\{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m)) \,/\, \mathrm{MaxIndiP}$
      • $\mathrm{Mindist}(x,y)$ = shortest distance between x and y in d
      • $\mathrm{SigmoidWeight}(dt) = \dfrac{1}{1 + e^{-((|q|-1) - dt)}}$
      • $\mathrm{CollP}(t,d,q) = \max_{k \in K} \Big\{ \sum_{m \in M-\{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m)) \Big\} \,/\, \mathrm{MaxCollP}$, where K is the set of positions at which t appears in d
      • $\mathrm{dist}(t,k,m)$ = distance between t (at position k) and m
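
The formulas on this slide can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions the slide does not state: M is taken to be the set of distinct query terms that occur in d, |q| is the number of distinct query terms, distance is the absolute difference between token positions, dist(t,k,m) uses the occurrence of m nearest to position k, and the normalizers MaxIndiP and MaxCollP are passed in as parameters (defaulting to 1.0, i.e. no normalization). All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid_weight(dist, q_len):
    """SigmoidWeight(dt) = 1 / (1 + e^-((|q| - 1) - dt))."""
    x = (q_len - 1) - dist
    if x < -700:  # guard: math.exp(-x) would overflow for very large distances
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))


def positions_by_term(doc_tokens):
    """Map each token to the sorted list of positions where it occurs."""
    pos = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        pos[tok].append(i)
    return pos


def revised_tf(t, doc_tokens, query_terms, max_indip=1.0, max_collp=1.0):
    """RTF(t,d,q) = TF(t,d) + QtermTF(d,q) * IndiP(t,d,q) * CollP(t,d,q)."""
    pos = positions_by_term(doc_tokens)
    q = set(query_terms)
    q_len = len(q)  # assumption: |q| counts distinct query terms
    tf = len(pos.get(t, []))
    if tf == 0:
        return 0.0  # t does not occur in d, so there is nothing to increment

    # Assumption: M - {t} = other query terms that actually occur in d.
    others = [m for m in q if m != t and m in pos]

    # Overall proximity: total TF of query terms in d.
    qterm_tf = sum(len(pos[m]) for m in q if m in pos)

    # Individual proximity: each m is scored by its closest approach to any t.
    indip = sum(
        sigmoid_weight(min(abs(k - j) for k in pos[t] for j in pos[m]), q_len)
        for m in others
    ) / max_indip

    # Collective proximity: the single position k of t with the best summed score.
    collp = max(
        sum(sigmoid_weight(min(abs(k - j) for j in pos[m]), q_len) for m in others)
        for k in pos[t]
    ) / max_collp

    return tf + qterm_tf * indip * collp


if __name__ == "__main__":
    query = ["infection", "treatment", "admission"]
    near = ["infection", "treatment", "admission"]
    far = ["infection"] + ["x"] * 20 + ["treatment"] + ["x"] * 20 + ["admission"]
    print(revised_tf("infection", near, query))  # clearly above the raw TF of 1
    print(revised_tf("infection", far, query))   # stays close to the raw TF of 1
```

With |q| = 3, the sigmoid is centered at a distance of 2 tokens, so query terms more than a few tokens away from t contribute almost nothing. The increment is a product, so it vanishes when either IndiP or CollP is zero.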

  13. Empirical Evaluation

  14. Experimental Data • OHSUMED • A popular database of biomedical queries and references • 106 queries • 348,566 references • 16,140 query-reference pairs, each judged as • Definitely relevant • Possibly relevant • Not relevant

  15. Underlying Rankers

  16. Baseline Ranker Enhancers • Three state-of-the-art techniques that enhance text rankers by term proximity • The t-function, t() [Tao & Zhai, 2007] • The p-function, p() [Cummins & O’Riordan, 2009] • The proximity language model, PLM [Zhao & Yun, 2009]

  17. Evaluation Criteria • Evaluate how highly relevant references are ranked for users to access • Mean average precision (MAP) • Normalized discounted cumulative gain at X (NDCG@X)
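
For readers unfamiliar with these two criteria, here is a minimal sketch of how they are commonly computed. It assumes that each query's ranked list is given as relevance grades in rank order (2 = definitely relevant, 1 = possibly relevant, 0 = not relevant), that the list contains every judged reference for the query, that average precision treats any grade above 0 as relevant, and that NDCG uses the common gain 2^rel - 1 with a log2(rank + 1) discount. The deck does not state which variants were used, so these choices and all function names are assumptions.

```python
import math


def average_precision(ranked_rels):
    """Average precision for one query; any grade > 0 counts as relevant."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0


def mean_average_precision(per_query_rels):
    """MAP: the mean of average precision over all queries."""
    return sum(average_precision(r) for r in per_query_rels) / len(per_query_rels)


def dcg_at(ranked_rels, x):
    """Discounted cumulative gain over the top x items."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(ranked_rels[:x], start=1)
    )


def ndcg_at(ranked_rels, x):
    """NDCG@X: DCG@X divided by the DCG@X of the ideal (sorted) ranking."""
    ideal = dcg_at(sorted(ranked_rels, reverse=True), x)
    return dcg_at(ranked_rels, x) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    rankings = [[2, 0, 1], [0, 1]]  # two hypothetical queries
    print(mean_average_precision(rankings))
    print(ndcg_at([0, 1, 2], 3))  # below 1.0: the best item is ranked last
```

MAP ignores the difference between "definitely" and "possibly" relevant, while NDCG@X rewards placing the higher grade first, which is one reason to report both.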

  18. Results

  20. Conclusion

  21. • Term proximity may be comprehensively applied to improve various kinds of text rankers • It is helpful to integrate three types of term proximity • Overall proximity • Individual proximity • Collective proximity • Term proximity information may be encoded to reassess the TF of each term
