
Analyzing Mutual Information in Health Care: A Study on Boolean Operations in Information Retrieval

This study analyzes how mutual information (MI) relates to the choice between the "AND" and "OR" operators, using the query "Great Britain health care" as an example. By comparing Mean Average Precision (MAP) for each operator, it examines how pairs of terms affect relevance in information retrieval. The relationship among the four terms, written (G,B)-(H,C), is explored, and an advanced Boolean operation that combines the terms accordingly is discussed. The study also evaluates the relationship between MI and the relative difference between MAP for OR and MAP for AND. Future work includes investigating a wider range of queries and exploring whether the variance of MI across term pairs affects the choice of AND or OR.

mateo


Presentation Transcript


  1. Mutual Information and Choice of AND and OR • Dayu • 18 Nov 2005

  2. An Example • Query No. 605: Great Britain health care • We chose it because it consists of four terms • Performance in MAP

  3. Using Two Terms • Based on the MAP performance of combining two terms with “AND” versus “OR”, we infer how the two terms jointly affect relevance • Great Britain health care → G B H C

  4. What does “Yes” mean? • If “Yes” (i.e. MAP_AND > MAP_OR), the two terms complement or disambiguate each other and together yield more relevant information. • Denoted by term1-term2 • If “No” (i.e. MAP_AND < MAP_OR), the two terms either • (1) seldom co-occur or • (2) are more or less synonyms • Denoted by (term1,term2) • If MAP_AND ≈ MAP_OR, the two terms always co-occur
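The three cases on this slide can be written as a small decision rule. The sketch below is only an illustration: the function name `classify_pair`, the relative tolerance `tol` used for "approximately equal", and the MAP values in the usage lines are assumptions, not taken from the presentation.

```python
import math


def classify_pair(map_and: float, map_or: float, tol: float = 0.05) -> str:
    """Label a pair of terms from the MAP of their AND and OR queries.

    tol is a hypothetical relative tolerance deciding when the two MAP
    values count as approximately equal; the slides give no threshold.
    """
    if math.isclose(map_and, map_or, rel_tol=tol):
        return "always co-occur"      # MAP_AND ≈ MAP_OR
    if map_and > map_or:
        return "term1-term2"          # "Yes": terms complement each other
    return "(term1,term2)"            # "No": seldom co-occur or synonyms


# Illustrative (made-up) MAP values, not results from the study.
print(classify_pair(0.10, 0.05))
print(classify_pair(0.05, 0.10))
print(classify_pair(0.100, 0.101))
```

The "approximately equal" case is checked first so that tiny differences are not reported as "Yes" or "No".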

  5. Overall Relationships • In conclusion, the relationships between each pair of the four terms are consistent: (G,B)-(H,C)

  6. Advanced Boolean Operation • (G,B)-(H,C) • Could we use (G OR B) AND (H OR C)? • Performance: MAP = 0.0762 • Compared with:
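A minimal sketch of how a grouping such as (G,B)-(H,C) could be turned into a Boolean query, with OR inside each group and AND across groups. The helper names `to_query` and `retrieve`, the toy posting lists, and the document ids are all hypothetical; the presentation does not describe its retrieval system.

```python
from functools import reduce


def to_query(groups):
    """Render groups of terms as '(a OR b) AND (c OR d)'."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


def retrieve(groups, postings):
    """Union the postings within each group, then intersect across groups."""
    per_group = [
        set().union(*(postings.get(term, set()) for term in group))
        for group in groups
    ]
    return reduce(set.intersection, per_group) if per_group else set()


# Hypothetical posting lists: term -> set of document ids.
postings = {
    "great": {1, 2, 3},
    "britain": {2, 3, 4},
    "health": {3, 4, 5},
    "care": {1, 5, 6},
}
groups = [["great", "britain"], ["health", "care"]]

print(to_query(groups))
print(sorted(retrieve(groups, postings)))
```

With these toy postings, a document is returned only if it matches at least one term from each group, which is the behaviour the (G OR B) AND (H OR C) form asks for.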

  7. A Method to Estimate the Relationship Using MI • By mutual information (MI). • MI = P(A,B) / (P(A) P(B)) • P(A,B) = # of IUs containing both A and B / total # of IUs • P(A) = # of IUs containing A / total # of IUs • P(B) = # of IUs containing B / total # of IUs • Hypothesis: the larger the MI, the more confident we are in using OR
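The ratio on this slide can be computed directly from counts. The following is a sketch under stated assumptions: each IU is modelled as a set of terms, the function name `mutual_information` and the toy IUs are hypothetical, and returning 0.0 when a term never occurs is a choice made here, not something the slides specify. Note that the slide's MI is a plain ratio with no logarithm, so a value of 1 corresponds to independence.

```python
def mutual_information(ius, a, b):
    """MI = P(A,B) / (P(A) * P(B)), estimated from IU counts."""
    total = len(ius)
    n_a = sum(1 for iu in ius if a in iu)
    n_b = sum(1 for iu in ius if b in iu)
    n_ab = sum(1 for iu in ius if a in iu and b in iu)
    if total == 0 or n_a == 0 or n_b == 0:
        return 0.0  # assumption: undefined ratio treated as 0
    p_a, p_b, p_ab = n_a / total, n_b / total, n_ab / total
    return p_ab / (p_a * p_b)


# Toy IUs (made up for illustration), each a set of terms.
ius = [
    {"great", "britain"},
    {"great", "britain", "health"},
    {"health", "care"},
    {"care"},
]

print(mutual_information(ius, "great", "britain"))
print(mutual_information(ius, "health", "care"))
print(mutual_information(ius, "great", "care"))
```

In this toy data "great" and "britain" always appear together, so their ratio is well above 1, while "great" and "care" never co-occur and score 0.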

  8. Relationship between MI and (MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR)
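The quantity plotted against MI on this slide is a relative difference between the two MAP scores. A small sketch, with the function name `relative_map_gap` and the sample values being assumptions for illustration only:

```python
def relative_map_gap(map_and: float, map_or: float) -> float:
    """(MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR).

    Positive when OR outperforms AND, negative when AND outperforms OR.
    """
    denom = min(map_and, map_or)
    if denom <= 0:
        raise ValueError("both MAP values must be positive")
    return (map_or - map_and) / denom


# Made-up MAP values, not results from the study.
print(relative_map_gap(0.05, 0.10))
print(relative_map_gap(0.10, 0.05))
```

Dividing by the smaller of the two scores makes the measure symmetric in magnitude: swapping the AND and OR scores only flips the sign.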

  9. Terms: Social, Tax, Securities • Pairwise MI (edges of the diagram): a = 1.07, b = 0.78, c = 0.79 • Variance of MI = 0.019

  10. Query: SDI Star Wars • Pairwise MI (edges of the diagram): a = 1.1, b = 0.8, c = 0.4 • Variance of MI = 0.076

  11. Query: college education advantage • Pairwise MI (edges of the diagram): a = 0.41, b = 0.56, c = 0.23 • Variance of MI = 0.017
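The variance of MI for a three-term query can be recomputed from the three pairwise values shown on the diagrams. The slides do not say which variance estimator was used, so the sketch below assumes the population variance; the name `mi_variance` is hypothetical, and the inputs are the rounded edge values as read from the slides. Under that assumption the results come out roughly comparable to, though not identical with, the reported 0.019, 0.076 and 0.017, which may reflect rounding of the displayed MI values.

```python
from statistics import pvariance

# Pairwise MI values as read from the three example slides (rounded).
examples = {
    "Social Tax Securities": [1.07, 0.78, 0.79],
    "SDI Star Wars": [1.1, 0.8, 0.4],
    "college education advantage": [0.41, 0.56, 0.23],
}


def mi_variance(values):
    """Population variance of the pairwise MI values (an assumed estimator)."""
    return pvariance(values)


for query, values in examples.items():
    print(f"{query}: variance of MI = {mi_variance(values):.3f}")
```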

  12. Future Work • Investigate a wider range of queries. • Does the variance of MI across term pairs affect whether to use AND or OR? • Should we additionally bring the MI of two terms into the computation of the allo-T edge?
