Augmenting Wikipedia with Named Entity Tags

Augmenting Wikipedia with Named Entity Tags. Wisam Dakka, Columbia University. Silviu Cucerzan, Microsoft Research. IJCNLP 2008. Outline: 1 Introduction; 2 Related Work; 3 Classifying Wikipedia Pages; 4 Features Used. Independent Views; 5 Challenges; 6 Experiments and Findings

Presentation Transcript


  1. Augmenting Wikipedia with Named Entity Tags • Wisam Dakka, Columbia University • Silviu Cucerzan, Microsoft Research • IJCNLP 2008

  2. Outline • 1 Introduction • 2 Related Work • 3 Classifying Wikipedia Pages • 4 Features Used. Independent Views • 5 Challenges • 6 Experiments and Findings • 7 Conclusions and Future Work

  3–5. 1 Introduction

  7. The objective: assign to each document in a collection one or several labels from a given set • Algorithm • SVM • Features • traditional: bag-of-words • Wikipedia-specific feature sets 2 Related Work
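The "traditional" bag-of-words features can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The tokenizer regex and the small stopword list are assumptions. The L2 normalization is also an assumption, though it is a common step before training a linear SVM.

```python
import math
import re
from collections import Counter
from typing import Dict

STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the"})  # illustrative list only


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on non-alphanumerics, drop stopwords, count tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def l2_normalize(counts: Counter) -> Dict[str, float]:
    """Scale the count vector to unit Euclidean length (empty input -> empty dict)."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}
```

Usage: `l2_normalize(bag_of_words(page_text))` yields one sparse vector per page, keyed by token.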

  9. Types of Wikipedia pages • Disambiguation Page (DIS) • Common Page (COMM) • Named Entity Page 3 Classifying Wikipedia Pages

  10. Entity Classes • Animated Entities (PER) • Organization Entities (ORG) • Location Entities (LOC) • Miscellaneous Entities (MISC) 3 Classifying Wikipedia Pages

  11. Animated Entities (PER) • Human entities • real persons • characters in fictional works • mythological deities • Non-human entities • particular animals • aliens 3 Classifying Wikipedia Pages

  12. Example of an Animated Entity (PER): Leonardo da Vinci

  13. Example of an Animated Entity (PER): Leonhard Euler

  14. Example of an Animated Entity (PER): Harry Potter

  15. Example of an Animated Entity (PER): Sonny (I, Robot)

  16. Example of an Animated Entity (PER): Zeus

  17. Example of an Animated Entity (PER): Apollo

  18. Example of an Animated Entity (PER): Garfield

  19. Example of an Animated Entity (PER): Alien

  20. Organization Entities (ORG) • Typical examples are businesses • “Microsoft”, “Ford” • governmental bodies • “United States Congress” • non-governmental organizations • “Republican Party”, “American Bar Association” 3 Classifying Wikipedia Pages

  21. Organization Entities (ORG) • science and health units • “Massachusetts General Hospital” • sports organizations and teams • “Angolan Football Federation”, “San Francisco 49ers” • religious organizations • “Church of Christ” • entertainment organizations • “San Francisco Mime Troupe”, the rock band “The Police” 3 Classifying Wikipedia Pages

  22. Location Entities (LOC) • Geopolitical entities • “Hawaii”, “European Union”, “Australia”, and “Washington, D.C.” • Locations • “the Solar System”, “Mars”, “Hudson River”, and “Mount Rainier” • Facilities • airports, highways, streets, etc. 3 Classifying Wikipedia Pages

  23. Miscellaneous Entities (MISC) • Events • “Olympic Games” • Artworks • books, movies, TV programs • Artifacts • the camera “Nikon D4”, the software “Photoshop” • Processes • “Ettinghausen effect” • Formulas or algorithms 3 Classifying Wikipedia Pages

  25–26. 4 Features Used. Independent Views

  27. 4 Features Used. Independent Views 4.1 Page-Based Features 4.2 Context Features
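Page-based and context features are treated as independent views. That makes a standard co-training loop applicable: it can grow a small labeled set from unlabeled pages. The sketch below is hypothetical. `fit_profile` is a toy stand-in base learner; the slides use SVMs and Naïve Bayes. The number of rounds and the number of pages each view labels per round (`per_view`) are also arbitrary assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Counter], Tuple[str, float]]  # doc -> (label, confidence)
Pair = Tuple[Counter, Counter]  # (page-based view, context view) of one page


def fit_profile(docs: Sequence[Counter], labels: Sequence[str]) -> Predictor:
    """Toy base learner (hypothetical): per-class relative token frequencies."""
    totals: Dict[str, Counter] = {}
    for doc, label in zip(docs, labels):
        totals.setdefault(label, Counter()).update(doc)
    profiles = {}
    for label, counts in totals.items():
        size = sum(counts.values()) or 1
        profiles[label] = {t: c / size for t, c in counts.items()}

    def predict(doc: Counter) -> Tuple[str, float]:
        scores = {label: sum(p.get(t, 0.0) * n for t, n in doc.items())
                  for label, p in profiles.items()}
        ranked = sorted(scores.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        # Confidence = margin of the best class over the runner-up.
        return max(scores, key=scores.get), ranked[0] - runner_up

    return predict


def co_train(labeled: Sequence[Pair], labels: Sequence[str], unlabeled: Sequence[Pair],
             fit: Callable[..., Predictor] = fit_profile,
             rounds: int = 10, per_view: int = 1) -> List[Predictor]:
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        views = [fit([pair[v] for pair in labeled], labels) for v in (0, 1)]
        picked: Dict[int, str] = {}
        for v, h in enumerate(views):
            # Each view nominates the pool pages it is most confident about.
            order = sorted(range(len(pool)), key=lambda i: h(pool[i][v])[1], reverse=True)
            for i in order[:per_view]:
                picked.setdefault(i, h(pool[i][v])[0])
        # Move newly labeled pages into the training set (pop highest index first).
        for i in sorted(picked, reverse=True):
            labeled.append(pool.pop(i))
            labels.append(picked[i])
    return [fit([pair[v] for pair in labeled], labels) for v in (0, 1)]
```

Some words appear only in unlabeled pages. Once a confident view labels such a page, those words enter the training profiles of both views.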

  28. 4.1 Page-Based Features • Bag of Words (BOW) • Structured Data (STRUCT) • First Paragraph (FPAR) • Abstract (ABS) • Surface Forms and Disambiguations (SFD) 4 Features Used. Independent Views
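One way to combine the page-based groups into a single sparse vector is to prefix each token with its group name. A word in the abstract and the same word in the body then become different features. The `page` dictionary and its keys (`text`, `structured`, `first_paragraph`, `abstract`, `surface_forms`) are a hypothetical schema, not the authors' data format. A missing group, such as an absent abstract, simply contributes nothing.

```python
import re
from collections import Counter
from typing import Any, Dict

# Page-based groups that come from a single text field (hypothetical field names).
TEXT_GROUPS = {"BOW": "text", "STRUCT": "structured", "FPAR": "first_paragraph", "ABS": "abstract"}


def page_features(page: Dict[str, Any]) -> Counter:
    feats: Counter = Counter()
    for prefix, field in TEXT_GROUPS.items():
        value = page.get(field)
        if not value:  # e.g. no abstract for this page: the group adds no features
            continue
        feats.update(f"{prefix}:{tok}" for tok in re.findall(r"[a-z0-9]+", value.lower()))
    # SFD: surface forms and disambiguation entries, assumed to arrive as a list of strings.
    for form in page.get("surface_forms") or []:
        feats["SFD:" + form.lower()] += 1
    return feats
```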

  29. 4.2 Context Features • Unigram Context (UCON) • Bigram Context (BCON) 4 Features Used. Independent Views
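The context view can be illustrated by collecting the words around mentions of a page's title in other pages. The slides only name the two feature groups. The window size, the left/right bigram definition and the `UCON:`/`BCON:` feature prefixes are assumptions made for this sketch.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def context_features(snippets: Iterable[str], title: str, window: int = 2) -> Counter:
    """UCON/BCON-style features from text surrounding mentions of `title`."""
    feats: Counter = Counter()
    target = tokenize(title)
    if not target:
        return feats
    n = len(target)
    for snippet in snippets:
        toks = tokenize(snippet)
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] != target:
                continue
            left, right = toks[max(0, i - window):i], toks[i + n:i + n + window]
            # Unigram context: every word within `window` tokens of the mention.
            for tok in left + right:
                feats["UCON:" + tok] += 1
            # Bigram context: the word pair immediately before and after the mention.
            if len(left) >= 2:
                feats["BCON:L:" + " ".join(left[-2:])] += 1
            if len(right) >= 2:
                feats["BCON:R:" + " ".join(right[:2])] += 1
    return feats
```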

  31. refer to entities that do not exist in Wikipedia • abstract and structured-data features are available for only 68% and 79% of the pages, respectively • only several hundred labeled examples were available • the feature space is very large and noisy 5 Challenges

  33. 6 Experiments and Findings 6.1 Training Data 6.2 Classification 6.3 Results on Bag-of-words 6.4 Results on Other Feature Groups 6.5 Results for Co-training

  34. Human Judged Data (HJD) • Human Judged Data Extended (HJDE) 6.1 Training Data

  35–36. 6.1 Training Data

  38. algorithms • SVMs • Naïve Bayes 6.2 Classification
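Of the two learners, Naïve Bayes is the easier one to sketch from scratch. The class below is a generic multinomial Naïve Bayes over token counts with add-alpha smoothing. The smoothing constant and the choice to skip tokens never seen in training are assumptions, not details from the slides.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes over token counts, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[Counter], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.vocab = set().union(*docs)
        self.totals = {y: sum(c.values()) for y, c in self.token_counts.items()}
        return self

    def log_scores(self, doc: Counter) -> Dict[str, float]:
        """Unnormalized log P(y) + sum_t n_t * log P(t | y); unseen tokens are skipped."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for y, n_y in self.class_counts.items():
            denom = self.totals[y] + self.alpha * v
            score = math.log(n_y / n_docs)
            for t, k in doc.items():
                if t in self.vocab:
                    score += k * math.log((self.token_counts[y][t] + self.alpha) / denom)
            scores[y] = score
        return scores

    def predict(self, doc: Counter) -> str:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```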

  39. Results are reported for • binary classification: identify all the Wikipedia pages of type PER • 5-way classification: PER, COMM, ORG, LOC, and MISC 6.2 Classification
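The binary PER task and the 5-way task can share one binary learner through a one-vs-rest reduction. Train one classifier per class, then pick the class with the highest score. The trainer here is a stand-in chosen for illustration, not what the authors used: a deterministic, cyclic variant of the Pegasos sub-gradient method for a linear SVM with no bias term. Its `lam` and `epochs` values are likewise illustrative.

```python
from typing import Dict, Sequence

Vec = Dict[str, float]  # sparse feature vector: feature name -> value


def dot(w: Vec, x: Vec) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs: Sequence[Vec], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 20) -> Vec:
    """Pegasos-style hinge-loss sub-gradient descent; ys are +1 / -1."""
    w: Vec = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin measured with the current w
            w = {f: v * (1.0 - eta * lam) for f, v in w.items()}  # regularization shrink
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(xs: Sequence[Vec], labels: Sequence[str],
                      classes: Sequence[str], **kwargs) -> Dict[str, Vec]:
    """One binary SVM per class: that class (+1) against all others (-1)."""
    return {c: train_linear_svm(xs, [1 if y == c else -1 for y in labels], **kwargs)
            for c in classes}


def predict(models: Dict[str, Vec], x: Vec) -> str:
    return max(models, key=lambda c: dot(models[c], x))
```

The binary PER task uses only `models["PER"]`: a page is tagged PER when `dot(models["PER"], x) > 0`.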

  41. 6.3 Results on Bag-of-words

  43–44. 6.4 Results on Other Feature Groups
