
Adaptive Context Features for Toponym Resolution in Streaming News

Group 12: Hari Kishan Bandaru, V S P V S K Kumar Parimi, Sneha Anand Yeluguri. Paper: Adaptive Context Features for Toponym Resolution in Streaming News, by Michael D. Lieberman and Hanan Samet.


Presentation Transcript


  1. Adaptive Context Features for Toponym Resolution in Streaming News • Group 12 • Hari Kishan Bandaru • V S P V S K Kumar Parimi • Sneha Anand Yeluguri

  2. Paper • Adaptive Context Features for Toponym Resolution in Streaming News • Michael D. Lieberman, Hanan Samet • Venue: SIGIR ’12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval

  3. Outline • Motivation • Related work • Problem Definition • Key concepts • Method • Validation/Results • Conclusion

  4. Motivation • Demand for ever-growing volumes of news and information. • People strive to stay up to date. • Internet-enabled mobile devices call for location-based services.

  5. Related work • Several commercial products for geotagging text are available, such as • MetaCarta’s Geotagger • Thomson Reuters’s OpenCalais • Yahoo!’s Placemaker

  6. Problem Definition • Toponym resolution is the geotagging step that assigns each toponym its correct lat/long values. • It is a classification problem: each possible interpretation of each toponym is classified as correct or incorrect. • The paper solves it using the authors’ adaptive context features.

  7. Introduction • News often has a strong geographic component. • Articles describe events relevant to geographic locations of interest to their readers. • Geotagging aims to understand the geographic content present in the articles.

  8. Geotagging Steps • Toponym recognition • finding all textual references to geographic locations. • Toponym Resolution • choosing the correct location interpretation for each toponym.

  9. Key concepts • GEOTAGGING FRAMEWORK • Toponym Recognition • Toponym Resolution • Resolution Features • ADAPTIVE CONTEXT FEATURES • Proximity Features • Sibling Features • Feature Computation • Feature Propagation

  10. Toponym Recognition • The toponym recognition procedure is a multifaceted process that combines both rule-based and statistics-based methods. • It performs lookups into various tables of entity names, including location names, abbreviations, business names, and person names, as well as cue words.

  11. Toponym Recognition • Uses NLP tools, including an NER package, to recognize toponyms and other entities, and performs extensive post-processing on the NER output to ensure higher quality. • Also performs part-of-speech (POS) tagging to find phrases of proper nouns, since names of locations (and other types of entities) tend to be composed of proper nouns.
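The proper-noun heuristic can be illustrated with a toy recognizer. This is a minimal sketch, not the authors' pipeline: it approximates POS tagging with a capitalization regex and replaces the NER package and entity tables with a hypothetical four-entry gazetteer (`GAZETTEER_NAMES`) and a hypothetical `CUE_WORDS` set.

```python
import re

# Hypothetical toy gazetteer of lower-cased location names; a real system
# would use a full gazetteer plus NER output and entity-name tables.
GAZETTEER_NAMES = {"paris", "new york", "springfield", "texas"}
# Hypothetical cue words that may appear capitalized at a sentence start.
CUE_WORDS = {"in", "at", "near", "from"}

# A run of capitalized words stands in for a proper-noun phrase (no real POS tagger).
PROPER_NOUN_RUN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def recognize_toponyms(text):
    """Return capitalized phrases in `text` that match a gazetteer name."""
    found = []
    for match in PROPER_NOUN_RUN.finditer(text):
        tokens = match.group(0).split()
        # Strip leading cue words such as a sentence-initial "In".
        while tokens and tokens[0].lower() in CUE_WORDS:
            tokens.pop(0)
        name = " ".join(tokens)
        if name.lower() in GAZETTEER_NAMES:
            found.append(name)
    return found


print(recognize_toponyms("In Paris, officials met visitors from New York and Texas."))
# prints ['Paris', 'New York', 'Texas']
```

A phrase such as "Mayor Smith" is found as a proper-noun run but discarded because it is not a gazetteer name, which is the kind of filtering the entity tables perform at much larger scale.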

  12. Toponym Resolution • Toponym resolution is implemented with methods from supervised machine learning. • For a given toponym/interpretation pair (t, lt), the classifier decides whether lt is correct or incorrect. • Location interpretations are drawn from a gazetteer.
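Framed this way, each gazetteer interpretation of a toponym becomes one labeled training example. The sketch below assumes a hypothetical in-memory `GAZETTEER` dictionary with approximate coordinates and a hypothetical `geo_id` field; it treats each toponym as a plain string, whereas a real system would track individual occurrences in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    geo_id: str  # hypothetical gazetteer identifier
    lat: float
    lon: float


# Hypothetical gazetteer: toponym string -> candidate interpretations
# (coordinates approximate).
GAZETTEER = {
    "Paris": [
        Interpretation("paris_fr", 48.8566, 2.3522),
        Interpretation("paris_tx", 33.6609, -95.5555),
    ],
}


def labeled_pairs(toponyms, gazetteer, ground_truth):
    """Build one (t, lt, label) example per gazetteer interpretation lt of each
    toponym t; label is 1 when lt is t's ground-truth interpretation, else 0."""
    examples = []
    for t in toponyms:
        for lt in gazetteer.get(t, []):
            examples.append((t, lt, int(lt.geo_id == ground_truth.get(t))))
    return examples


for t, lt, label in labeled_pairs(["Paris"], GAZETTEER, {"Paris": "paris_tx"}):
    print(t, lt.geo_id, label)
# prints:
# Paris paris_fr 0
# Paris paris_tx 1
```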

  13. Toponym Resolution • The classifier is random forests, a decision tree-based ensemble method. • Random forests construct many decision trees, each from a different random subset of the dataset sampled with replacement. • Each decision tree is constructed using random subsets of features from the training feature vectors.
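The two sources of randomness described above (bootstrap rows and random feature subsets) can be shown in a deliberately small, dependency-free sketch. It is a simplification, not the classifier used in the paper: every tree is a depth-1 stump, and the training vectors are made-up values of a hypothetical [log10(population), distance in km] feature pair.

```python
import random
from collections import Counter


def train_stump(X, y, features):
    """Fit a depth-1 decision tree on the given feature indices by training error."""
    if len(set(y)) == 1:  # single-class bootstrap sample: constant prediction
        label = y[0]
        return lambda row: label
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):  # sign 1: predict 1 if x >= thr; sign -1: if x <= thr
                err = sum((1 if sign * (row[f] - thr) >= 0 else 0) != target
                          for row, target in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: 1 if sign * (row[f] - thr) >= 0 else 0


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging plus a random feature subset per tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), n_features)     # random subset of features
        trees.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Majority vote over the trees (an odd n_trees avoids ties)."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Made-up vectors [log10(population), distance_km]; label 1 = correct interpretation.
X = [[6.3, 10], [5.9, 50], [6.1, 20], [4.4, 7500], [3.0, 8000], [4.0, 9000]]
y = [1, 1, 1, 0, 0, 0]
forest = train_forest(X, y)
print(predict(forest, [6.5, 5]), predict(forest, [3.5, 8500]))  # prints 1 0
```

With stumps there is only one split per tree, so sampling features per tree (as the slide describes) and per split (as many library implementations do) coincide here.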

  14. Previous Methods • One earlier method used SVM regression to estimate a distance function over feature vector values, intended to capture the distance between a given lt and t’s ground-truth interpretation.

  15. Resolution Features • Several baseline toponym resolution features are used: • I: Number of interpretations for t. • P: The population of lt, where a larger population indicates that lt is better known. • A: Number of alternate names for lt in various languages; more names indicate greater renown of lt. • D: Geographic distance of lt from an interpretation of a dateline toponym, which establishes a general location context for a news article. • L: Geographic distance of lt from the newspaper’s local lexicon, the expected location of its primary audience, expressed as a lat/long point.
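The five baseline features can be assembled into a vector as follows. This sketch makes several assumptions not stated on the slide: distances use the haversine great-circle formula with a mean Earth radius of 6371 km, the caller has already chosen one dateline interpretation, and the populations and alternate names are illustrative values only.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Interpretation:
    name: str
    lat: float
    lon: float
    population: int
    alt_names: list = field(default_factory=list)


def baseline_features(interps_of_t, lt, dateline_point, local_lexicon_point):
    """Return the baseline vector [I, P, A, D, L] for the pair (t, lt)."""
    return [
        len(interps_of_t),                                   # I: interpretations of t
        lt.population,                                       # P: population of lt
        len(lt.alt_names),                                   # A: alternate names of lt
        haversine_km(lt.lat, lt.lon, *dateline_point),       # D: distance to dateline
        haversine_km(lt.lat, lt.lon, *local_lexicon_point),  # L: distance to local lexicon
    ]


# Illustrative values only (populations and name lists are not real gazetteer data).
paris_fr = Interpretation("Paris", 48.8566, 2.3522, 2_000_000, ["Parigi", "París"])
paris_tx = Interpretation("Paris", 33.6609, -95.5555, 25_000)
new_york = (40.7128, -74.0060)  # hypothetical local lexicon of a New York paper
print(baseline_features([paris_fr, paris_tx], paris_tx, (32.7767, -96.7970), new_york))
```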

  16. Adaptive Context Features • These features reflect two aspects of toponym co-occurrence and the evidence that interpretations impart to each other: • Proximate interpretations • Sibling interpretations

  17. Proximity Features • These are based on geographic distance. • For each other toponym o in the window around t, find the interpretation lo closest to lt. • The proximity feature for (t, lt) is the average of the geographic distances from lt to these closest interpretations. • The learning procedure can learn appropriate distance thresholds from its training data.
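The proximity computation for one pair (t, lt) can be sketched directly from the bullets above. Assumptions: interpretations are bare (lat, lon) tuples with approximate coordinates, distance is haversine in km, and returning `None` for an empty window is a hypothetical convention rather than something the slides specify.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points p and q."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def proximity_feature(lt, window):
    """Average distance from lt to the closest interpretation lo of each other
    toponym o in the window. `window` holds one list of (lat, lon) interpretations
    per other toponym. Returns None if no toponym in the window has any
    interpretation (hypothetical convention)."""
    closest = [min(haversine_km(lt, lo) for lo in interps) for interps in window if interps]
    return sum(closest) / len(closest) if closest else None


# Approximate coordinates; the window holds one other toponym, "Dallas".
paris_fr, paris_tx = (48.8566, 2.3522), (33.6609, -95.5555)
window = [[(32.7767, -96.7970), (33.9237, -84.8407)]]  # Dallas, TX and Dallas, GA
print(proximity_feature(paris_tx, window), proximity_feature(paris_fr, window))
```

Paris, Texas gets a value of roughly 150 km while Paris, France gets several thousand km, the kind of gap from which the learner can pick a distance threshold.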

  18. Sibling Features • Capture the relationships between textually proximate toponyms that share the same country, state, or other administrative division. • For each toponym/interpretation pair (t, lt), the sibling feature value is the number of other toponyms o in the window around t with an interpretation that is a sibling of lt at a given resolution (e.g., country or state).
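A sibling count at a chosen resolution can be sketched as below. The `Interp` record, the two-level `RESOLUTIONS` table, and the admin1 codes "IDF" and "ARA" are hypothetical; a real gazetteer such as GeoNames supplies its own administrative codes and more levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interp:
    name: str
    country: str  # country code
    admin1: str   # first-level division (state) code


# Attributes that must match for two interpretations to be siblings at a resolution.
RESOLUTIONS = {"country": ("country",), "state": ("country", "admin1")}


def is_sibling(a, b, resolution):
    return all(getattr(a, attr) == getattr(b, attr) for attr in RESOLUTIONS[resolution])


def sibling_feature(lt, window, resolution):
    """Count other toponyms in the window having at least one interpretation
    that is a sibling of lt at the given resolution."""
    return sum(1 for interps in window
               if any(is_sibling(lt, lo, resolution) for lo in interps))


paris_tx = Interp("Paris", "US", "TX")
paris_fr = Interp("Paris", "FR", "IDF")  # hypothetical admin1 code
window = [
    [Interp("Dallas", "US", "TX"), Interp("Dallas", "US", "GA")],
    [Interp("Houston", "US", "TX")],
    [Interp("Lyon", "FR", "ARA")],  # hypothetical admin1 code
]
print(sibling_feature(paris_tx, window, "state"), sibling_feature(paris_fr, window, "state"))
# prints 2 0
```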

  19. Adaptive Features

  20. Feature Accuracy • Window breadth corresponds to the size of the window around t. • Window depth is the maximum number of interpretations considered for each toponym in the window. • These interpretations are ranked using various factors, such as GeoNames, the population of the location, and geographic distance.
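The two window parameters can be made concrete with a small helper. This is a sketch under stated assumptions: breadth is taken to mean the number of toponyms on each side of t, interpretations are ranked by population alone (the slide lists other factors too), and the gazetteer contents and populations are illustrative.

```python
def context_window(toponyms, i, breadth, depth, gazetteer):
    """Interpretations of the other toponyms within `breadth` positions of
    toponyms[i], keeping at most `depth` per toponym, ranked by population.
    `gazetteer` maps a name to a list of (geo_id, population) pairs."""
    lo, hi = max(0, i - breadth), min(len(toponyms), i + breadth + 1)
    window = []
    for j in range(lo, hi):
        if j == i:
            continue
        ranked = sorted(gazetteer.get(toponyms[j], []), key=lambda c: c[1], reverse=True)
        window.append(ranked[:depth])
    return window


# Hypothetical gazetteer with illustrative populations.
GAZETTEER = {
    "Dallas": [("dallas_ga", 14_000), ("dallas_tx", 1_300_000)],
    "Houston": [("houston_tx", 2_300_000), ("houston_ms", 3_000)],
    "Austin": [("austin_tx", 960_000)],
}
print(context_window(["Paris", "Dallas", "Houston", "Austin"], 0, breadth=2, depth=1,
                     gazetteer=GAZETTEER))
# prints [[('dallas_tx', 1300000)], [('houston_tx', 2300000)]]
```

In this sketch each proximity or sibling computation touches up to 2 × breadth × depth interpretations, so raising either parameter adds context at the price of feature computation time.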

  21. Compute adaptive context features.

  22. Validation/Results • Geotagging is generally difficult because of the large gazetteer and the large amount of toponym ambiguity. • Extensive experiments compared the adaptive method with competing geotagging methods: • Thomson Reuters’s OpenCalais, and • Yahoo!’s Placemaker • The adaptive context parameters (window breadth and depth) were varied to measure their effect on • feature computation time • accuracy of the adaptive method

  23. Gazetteer Ambiguity • Toponyms and their number of interpretations

  24. Datasets • Breakdown of location types within each of the test corpora

  25. Resolution Accuracy • Resolution accuracy of various methods

  26. Resolution Accuracy (Contd.) • Importance of features used in the Adaptive method

  27. Adaptive Parameters

  28. Conclusion And Future Work • Adaptive context features serve as a flexible, useful addition to geotagging algorithms for streaming news and other textual domains. • Test different toponym weightings in the window to judge their effect on resolution accuracy. • Consider clusters of news articles about the same topic and design other features using these clusters.

  29. Thank You

  30. Queries
