Collective Word Sense Disambiguation

Presentation Transcript


  1. Collective Word Sense Disambiguation David Vickrey Ben Taskar Daphne Koller

  2. Word Sense Disambiguation • Clues: The electricity plant supplies 500 homes with power. vs. A plant requires water and sunlight to survive. • Tricky: That plant produces bottled water.

  3. WSD as Classification • Senses s1, s2, …, sk correspond to classes c1, c2, …, ck • Features: properties of the context of the word occurrence • Subject or verb of the sentence • Any word occurring within 4 words of the occurrence • Document: the set of features corresponding to one occurrence, e.g. The electricity plant supplies 500 homes with power.
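A minimal sketch of the feature extraction above (all function and feature names are illustrative, not from the talk; the subject/verb features are omitted and only the ±4-word window is shown):

```python
# Illustrative sketch: turn one occurrence of a target word into a
# bag-of-features "document", as described on the slide.

def context_features(tokens, target_index, window=4):
    """All words occurring within `window` positions of the occurrence."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return {f"ctx={tokens[i].lower()}"
            for i in range(lo, hi) if i != target_index}

sentence = "The electricity plant supplies 500 homes with power .".split()
print(context_features(sentence, sentence.index("plant")))
# {'ctx=the', 'ctx=electricity', 'ctx=supplies', 'ctx=500',
#  'ctx=homes', 'ctx=with'}
```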

  4. Simple Approaches • The only features are which words appear in the context • Naïve Bayes • Discriminative, e.g. SVM • Problems: • Feature set not rich enough • Data extremely sparse • “space” occurs 38 times in a corpus of 200,000 words
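A minimal Naïve Bayes sense classifier over such bag-of-context features might look like this (a sketch under my own assumptions; in particular, the add-one smoothing is not specified in the talk):

```python
import math
from collections import Counter, defaultdict

# Naive Bayes WSD sketch: classes are senses, features are context words.

def train(docs):
    """docs: list of (feature_set, sense) pairs."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in docs:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def predict(feats, sense_counts, feat_counts, vocab):
    total = sum(sense_counts.values())
    def score(s):
        logp = math.log(sense_counts[s] / total)        # prior P(s)
        denom = sum(feat_counts[s].values()) + len(vocab)
        for f in feats:                                 # add-one smoothing
            logp += math.log((feat_counts[s][f] + 1) / denom)
        return logp
    return max(sense_counts, key=score)
```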

  5. Available Data • WordNet – electronic thesaurus • Words grouped by meaning into synsets • Slightly over 100,000 synsets • For nouns and verbs, a hierarchy over synsets (Example hierarchy: Animal → Bird, Mammal; Mammal → {Dog, Hound, Canine}; {Dog, Hound, Canine} → Retriever, Terrier)
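WordNet is available programmatically, for instance through NLTK (this assumes the nltk package and its WordNet data are installed; the calls themselves are standard NLTK APIs):

```python
from nltk.corpus import wordnet as wn  # data via nltk.download('wordnet')

# Synsets group words by meaning; an ambiguous word maps to several:
for s in wn.synsets('plant')[:2]:
    print(s.name(), '-', s.definition())
# plant.n.01 - buildings for carrying on industrial labor
# plant.n.02 - (botany) a living organism lacking the power of locomotion

# For nouns and verbs there is a hierarchy over synsets:
dog = wn.synset('dog.n.01')
print(dog.hypernyms())
# [Synset('canine.n.02'), Synset('domestic_animal.n.01')]
```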

  6. Available Data • A corpus of around 400,000 words labeled with synsets from WordNet • Sample sentences from WordNet • Very sparse for most words

  7. What Hasn’t Worked • Intuition: the context of “dog” is similar to the context of “retriever” • Use the hierarchy to determine which data might be useful • Using cross-validation, learn which data is actually useful • This hasn’t worked out very well

  8. Why? • Lots of parameters (not even counting parameters estimated using MLE) • > 100K for one model, ~ 20K for another • Not much data (400K words) • a, the, and, of, to occur ~ 65K times (together) • Hierarchy may not be very useful • Hand-built; not designed for this task • Features not very expressive • Luke is looking at this more closely using an SVM

  9. Collective WSD Ideas: • Determine senses of all words in a document simultaneously • Allows for richer features • Train on unlabeled data as well as labeled • Lots and lots of unlabeled text available

  10. Model • Variables: • S1, S2, …, Sn – synsets • W1, W2, …, Wn – words, always observed (Figure: sense variables S1 … S5 linked in a chain, each emitting its observed word W1 … W5)

  11. Model • Each synset is generated from the previous context – the size of the context is a parameter (4):

P(S, W) = ∏_{i=1..n} P(Wi | Si) · P(Si | Si-3, Si-2, Si-1)

P(Si = s | si-3, si-2, si-1) = exp(λs(si-3) + λs(si-2) + λs(si-1) + λs) / Z(si-3, si-2, si-1)

P(W) = Σ_S P(S, W)
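A direct transcription of the transition model as code: the probability of each candidate synset is a softmax of summed context weights (a sketch; the dictionary layout lam[s][c] for λs(c) and bias[s] for λs is my own):

```python
import math

# P(S_i = s | s_{i-3}, s_{i-2}, s_{i-1}) as a softmax over the candidate
# senses of word i. lam[s][c] stands for lambda_s(c), bias[s] for lambda_s.

def transition_probs(candidates, context, lam, bias):
    scores = {s: bias[s] + sum(lam[s].get(c, 0.0) for c in context)
              for s in candidates}
    z = sum(math.exp(v) for v in scores.values())  # Z(s_{i-3}, s_{i-2}, s_{i-1})
    return {s: math.exp(v) / z for s, v in scores.items()}
```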

  12. Learning • Two sets of parameters • P(Wi | Si) – estimated from expected counts, given the current estimates of the marginals P(Si) • λs(s') – for s' ∈ Domain(Si-1), s ∈ Domain(Si), gradient ascent on the log-likelihood gives:

λs(s') += Σ_{si-3, si-2} [ P(w, si-3, si-2, s', s) − P(w, si-3, si-2, s') · P(s | si-3, si-2, s') ]
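In code this is the familiar observed-minus-expected update, using marginals produced by inference (a sketch; the tables pair_marg and ctx_marg are assumed to come from the forward-backward pass on the next slide, already summed over si-3 and si-2):

```python
# One gradient-ascent step on lambda_s(s'). pair_marg[(sp, s)] is the
# inferred joint marginal of adjacent senses, ctx_marg[sp] the marginal
# of the context sense; trans_prob[(sp, s)] is P(s | context ending in sp).
# lam is the nested dict from the previous sketch.

def grad_step(lam, pair_marg, ctx_marg, trans_prob, lr=0.1):
    for (sp, s), p_joint in pair_marg.items():
        expected = ctx_marg[sp] * trans_prob[(sp, s)]
        lam[s][sp] = lam[s].get(sp, 0.0) + lr * (p_joint - expected)
```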

  13. Efficiency • Only need to calculate marginals over contexts • Forward-backward • Issue: some words have many possible synsets (40–50) – want very fast inference • Possibly prune values?
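A first-order forward-backward sketch (the actual model conditions on three previous synsets, which reduces to this by expanding the state space; shown at order 1 for brevity):

```python
import numpy as np

# Forward-backward for per-position sense marginals.
# emit[i]  : length-K_i vector of P(w_i | s) over word i's candidate senses
# trans[i] : (K_{i-1} x K_i) transition matrix, used for i >= 1
# Pruning low-probability candidates would shrink K_i for the
# 40-50-synset words mentioned above.

def sense_marginals(trans, emit):
    n = len(emit)
    fwd = [emit[0] / emit[0].sum()]
    for i in range(1, n):
        f = (fwd[-1] @ trans[i]) * emit[i]
        fwd.append(f / f.sum())                 # normalize for stability
    bwd = [np.ones_like(emit[-1])]
    for i in range(n - 1, 0, -1):
        b = trans[i] @ (emit[i] * bwd[0])
        bwd.insert(0, b / b.sum())
    post = [f * b for f, b in zip(fwd, bwd)]
    return [p / p.sum() for p in post]
```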

  14. WordNet and Synsets • The model uses WordNet to determine the domain of Si • Synset information should be more reliable • This allows us to learn without any labeled data • Consider the synsets {eagle, hawk}, {eagle (golf shot)}, and {hawk (to sell)} • Since the parameters depend only on the synset, even without labeled data, we can find the correct clustering

  15. Richer Features • Heuristic: “One sense per discourse” = usually, within a document, any given word takes only one of its possible senses • Can capture this using long-range links • Could assume each occurrence of a word is independent of all other occurrences besides the ones immediately before and after • Or, could use approximate inference (Kikuchi)
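The cheapest version of the heuristic simply ties all occurrences of a word in a document to one sense variable; a sketch (this hard-tying shortcut is my simplification of the long-range links above, not the talk's proposal):

```python
import numpy as np

# Hard "one sense per discourse": multiply the per-occurrence posteriors
# for the same word, renormalize, and use that sense everywhere.

def tie_occurrences(posteriors):
    """posteriors: list of arrays over the same candidate-sense set."""
    combined = np.ones_like(posteriors[0])
    for p in posteriors:
        combined = combined * p
    return combined / combined.sum()
```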

  16. Richer Features • Can reduce feature sparsity using hierarchy (e.g., replace all occurrences of “dog” and “cat” with “animal”) • Need collective classification to do this • Could add “global” hidden variables to try to capture document subject
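Feature back-off through the hierarchy can be done directly with WordNet; e.g. “dog” and “cat” do meet at a common ancestor (real NLTK calls; the back-off depth of 2 is an illustrative choice):

```python
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.lowest_common_hypernyms(cat))   # [Synset('carnivore.n.01')]

# Replace a synset feature with an ancestor `levels` steps up the hierarchy:
def backoff(synset, levels=2):
    for _ in range(levels):
        hyps = synset.hypernyms()
        if not hyps:
            break
        synset = hyps[0]   # follow the first hypernym when there are several
    return synset.name()

print(backoff(dog))   # carnivore.n.01 (dog -> canine -> carnivore)
```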

  17. Advanced Parameters • Lots of parameters • Regularization likely helpful • Could tie parameters together based on similarity in the WordNet hierarchy • Ties in with what I was working on before • More data in this situation (unlabeled)
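Regularization is a one-line change to the gradient step sketched earlier (the L2 penalty and its strength c are my own illustrative assumptions; the talk does not specify a regularizer):

```python
# L2-regularized variant of grad_step: shrink each weight toward zero.
def grad_step_l2(lam, pair_marg, ctx_marg, trans_prob, lr=0.1, c=0.01):
    for (sp, s), p_joint in pair_marg.items():
        expected = ctx_marg[sp] * trans_prob[(sp, s)]
        grad = p_joint - expected - c * lam[s].get(sp, 0.0)
        lam[s][sp] = lam[s].get(sp, 0.0) + lr * grad
```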

  18. Experiments • Soon
