A confidence-based framework for disambiguating geographic terms

Erik Rauch, Michael Bukatin, and Kenneth Baker, MetaCarta, Inc.

Presentation Transcript


  1. A confidence-based framework for disambiguating geographic terms. Erik Rauch, Michael Bukatin, and Kenneth Baker, MetaCarta, Inc.

  2. ‘wine’ in Europe

  3. Al Hamra (= ‘red’ in Arabic)

  4. Local and non-local information • Local: “Madison’s downtown” • Non-local: other references in the document, e.g. Wisconsin, Milwaukee • More non-local information -> too many states to compute probabilities
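
A rough count shows why adding non-local information makes exact probabilities impractical: a model that scores all names in a document jointly must, in the worst case, consider every combination of candidate places, i.e. the product of the per-name candidate counts. A minimal Python sketch; the function name `joint_states` and the candidate counts are hypothetical.

```python
from math import prod


def joint_states(candidate_counts: list[int]) -> int:
    """Worst-case number of joint place assignments for one document.

    candidate_counts[i] is the number of gazetteer candidates for the
    i-th ambiguous name; a joint model may have to score every combination.
    """
    return prod(candidate_counts)


# Hypothetical document: "Madison" with 20 candidates plus 30 other
# ambiguous names with 10 candidates each.
counts = [20] + [10] * 30
print(f"{joint_states(counts):.1e}")  # 2.0e+31
```

Even modest per-name ambiguity grows exponentially with the number of names, which is why the slides restrict probabilistic methods to local information.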

  5. Candidate places • Name: Deir az Zor • Candidates with confidence: (32.10 N 41.11 E), 0.325; (25.03 N 31.44 E), 0.151; (….) • Other geographic references: 38°01'10.5"N 121°44'48.8"W; “four miles south of Lusaka”; (22.10 S 15.51 E)
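
One way to hold the candidate list above is a map from each name to its candidate places, each carrying a confidence c(p,n). The Python sketch below assumes a hypothetical in-memory `GAZETTEER` and `ranked_candidates` helper; only the two Deir az Zor values come from the slide, and the elided candidates are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible referent of a name, with its confidence c(p, n)."""
    place: str         # coordinates as printed on the slide
    confidence: float  # c(p, n), in [0, 1]


# Hypothetical gazetteer; remaining Deir az Zor candidates are omitted.
GAZETTEER: dict[str, list[Candidate]] = {
    "Deir az Zor": [
        Candidate("32.10 N 41.11 E", 0.325),
        Candidate("25.03 N 31.44 E", 0.151),
    ],
}


def ranked_candidates(name: str) -> list[Candidate]:
    """Return the candidate places for `name`, most confident first."""
    return sorted(GAZETTEER.get(name, []), key=lambda c: c.confidence, reverse=True)


best = ranked_candidates("Deir az Zor")[0]
print(best.place, best.confidence)  # 32.10 N 41.11 E 0.325
```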

  6. Local context • “Minister Ishihara” (a person) vs. Ishihara, Japan (32.36 N 147.21 E) • “resident of Madison” -> Madison, WI; Madison, ID; Madison, CT; Madison, KY…

  7. Context affects confidence • Increase or decrease c(p,n), the confidence that name n refers to place p, based on the strength of context words • e.g. “by Madison” vs. “President Madison” • Context words can be added manually or learned automatically • and/or use an HMM
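
A minimal Python sketch of context-word adjustment, assuming a hypothetical table of multiplicative weights keyed on the word immediately before the name. The slide only says confidence is increased or decreased; scaling and capping at 1 are assumptions made here for illustration, and the weights are invented.

```python
# Hypothetical weights: > 1 strengthens the place reading of the next
# name, < 1 weakens it. Entries could be hand-written or learned.
CONTEXT_WEIGHTS = {
    "by": 1.5,         # "by Madison" suggests a location
    "near": 1.5,
    "president": 0.1,  # "President Madison" suggests a person
    "minister": 0.1,
}


def adjust_confidence(confidence: float, preceding_word: str) -> float:
    """Scale c(p, n) by the weight of the preceding word, capped at 1."""
    factor = CONTEXT_WEIGHTS.get(preceding_word.lower(), 1.0)
    return min(1.0, confidence * factor)


print(adjust_confidence(0.5, "by"))         # 0.75
print(adjust_confidence(0.5, "President"))  # 0.05
```

Unknown words leave the confidence unchanged, so the table only needs entries for words with real evidential strength.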

  8. Local context problems • “Madison family attractions”: the local context does not single out one of Madison, WI; Madison, ID; Madison, CT; Madison, KY… • Non-local evidence such as “Milwaukee” is needed

  9. Using spatial patterns of geographic references

  10. Increase c(p,n) based on the number of other references to enclosing regions or nearby points • e.g. Madison, with Wisconsin (enclosing region) and Milwaukee (nearby point)
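
A minimal Python sketch of the spatial-pattern boost: each other reference in the document that is the candidate's enclosing region, or a point within some radius of it, raises c(p,n) by a fixed bonus. The additive bonus, the 200 km radius, the haversine distance, and the approximate coordinates are all illustrative assumptions, not values from the slides.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Place:
    name: str
    region: str  # enclosing region, e.g. a US state
    lat: float   # approximate decimal degrees
    lon: float


def distance_km(a: Place, b: Place) -> float:
    """Great-circle distance via the haversine formula (R = 6371 km)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def spatial_boost(conf, candidate, region_refs, point_refs, radius_km=200.0, bonus=0.25):
    """Raise c(p, n) once per reference that encloses or is near p, capped at 1."""
    support = sum(1 for r in region_refs if r == candidate.region)
    support += sum(1 for q in point_refs if distance_km(candidate, q) <= radius_km)
    return min(1.0, conf + bonus * support)


madison_wi = Place("Madison", "Wisconsin", 43.07, -89.40)
madison_ct = Place("Madison", "Connecticut", 41.28, -72.60)
milwaukee = Place("Milwaukee", "Wisconsin", 43.04, -87.91)

for cand in (madison_wi, madison_ct):
    print(cand.region, spatial_boost(0.25, cand, ["Wisconsin"], [milwaukee]))
# Wisconsin 0.75
# Connecticut 0.25
```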

  11. Pitfalls • “Ishihara, Japan’s leading epidemiologist” matches the name-plus-enclosing-region pattern of Ishihara, Japan (32.36 N 147.21 E), but here Ishihara is a person

  12. Training • “Philadelphia” is usually geographic; “Bend” usually isn’t • If name n often refers to point p in documents, give (n,p) high confidence to start with • Use average confidence in a large corpus
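
A minimal Python sketch of initializing confidences from a corpus. The slide suggests averaging confidence over a large corpus; the simplest version, assumed here, is the relative frequency with which a name refers to each place in a hand-tagged corpus. The function name and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def initial_confidences(tagged):
    """Estimate starting confidences c(p, n) from tagged occurrences.

    `tagged` holds (name, place) pairs, one per occurrence; place is None
    when that occurrence is not geographic. Returns, for each name, the
    fraction of its occurrences that refer to each place.
    """
    counts = defaultdict(Counter)
    for name, place in tagged:
        counts[name][place] += 1
    result = {}
    for name, ctr in counts.items():
        total = sum(ctr.values())
        result[name] = {p: k / total for p, k in ctr.items() if p is not None}
    return result


# Hypothetical toy corpus: "Philadelphia" is usually geographic, "Bend" is not.
corpus = (
    [("Philadelphia", "Philadelphia, PA")] * 9 + [("Philadelphia", None)]
    + [("Bend", "Bend, OR")] + [("Bend", None)] * 4
)
print(initial_confidences(corpus))
# {'Philadelphia': {'Philadelphia, PA': 0.9}, 'Bend': {'Bend, OR': 0.2}}
```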

  13. Training cont’d • Extract local linguistic contexts that often occur with geographic names in tagged corpora • Or train an HMM
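
A minimal Python sketch of context extraction from a tagged corpus, assuming the context is just the word immediately before a name. Each such word is scored by the smoothed fraction (geo + 1) / (total + 2) of its occurrences that preceded a geographic name. The one-word window, the smoothing, and the snippets are hypothetical choices.

```python
from collections import Counter


def learn_context_scores(examples, min_count=2):
    """Score each preceding word by how often the following name was geographic.

    `examples` holds (tokens, i, is_geo) triples: tokens[i] is a name and
    is_geo is its tag. Words seen fewer than `min_count` times are dropped.
    """
    geo, total = Counter(), Counter()
    for tokens, i, is_geo in examples:
        if i == 0:
            continue  # no preceding word
        w = tokens[i - 1].lower()
        total[w] += 1
        if is_geo:
            geo[w] += 1
    return {w: (geo[w] + 1) / (total[w] + 2) for w in total if total[w] >= min_count}


# Hypothetical tagged snippets.
examples = [
    ("a resident of Madison".split(), 3, True),
    ("a resident of Boston".split(), 3, True),
    ("President Madison signed".split(), 1, False),
    ("President Lincoln said".split(), 1, False),
    ("a town near Salem".split(), 3, True),
]
print(learn_context_scores(examples))  # {'of': 0.75, 'president': 0.25}
```

Scores near 1 mark contexts that favor a geographic reading; scores near 0 mark contexts such as titles that favor a person.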

  14. Relevance • Relevance has several dimensions: • Traditional textual relevance of query terms • Georelevance • Example query: “cheese” in France

  15. Georelevance • Depends on: • Attributes of the geotext, e.g. document frequency, font size, position • Geoconfidence • Aim: the combination should reflect the user’s preferred balance between recall and correctness of the geographic reference • e.g. Georelevance = query term relevance × geoconfidence
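
A minimal Python sketch of the example combination, applied to the query “cheese” in France. The exponent `alpha` is a hypothetical knob added to illustrate the recall/correctness balance: `alpha = 1` reproduces the product on the slide, larger values penalize doubtful geographic references more, and `alpha = 0` ignores geoconfidence. The documents and scores are invented.

```python
def georelevance(term_relevance: float, geoconfidence: float, alpha: float = 1.0) -> float:
    """Term relevance times geoconfidence raised to a (hypothetical) power alpha."""
    return term_relevance * geoconfidence ** alpha


# Hypothetical hits: (doc id, relevance of "cheese", confidence the place is in France).
hits = [("doc-a", 0.9, 0.25), ("doc-b", 0.5, 1.0), ("doc-c", 0.75, 0.5)]


def rank(alpha: float) -> list[str]:
    """Document ids ordered by georelevance, highest first."""
    scored = sorted(hits, key=lambda h: georelevance(h[1], h[2], alpha), reverse=True)
    return [doc for doc, _, _ in scored]


print(rank(1.0))  # ['doc-b', 'doc-c', 'doc-a']
print(rank(0.0))  # ['doc-a', 'doc-c', 'doc-b']
```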

  16. Conclusion • The ambiguity problem is much worse with large gazetteers • Probabilistic methods can be used where feasible (local information) and combined with confidence-based heuristics