
Word Sense Disambiguation








  1. Word Sense Disambiguation • Many words have multiple meanings • E.g., river bank vs. financial bank • Problem: Assign the proper sense to each ambiguous word in a text • Applications: • Machine translation • Information retrieval • Semantic interpretation of text

  2. Tagging? • Idea: Treat sense disambiguation like POS tagging, just with “semantic tags” • The problems differ: • POS tags depend on specific structural cues (mostly neighboring tags) • Senses depend on semantic context – less structured, with longer-distance dependencies

  3. Approaches • Supervised learning: Learn from a pretagged corpus • Dictionary-based learning: Learn to distinguish senses from dictionary entries • Unsupervised learning: Automatically cluster word occurrences into different senses

  4. Evaluation • Train and test on pretagged texts • Difficult to come by • Artificial data: ‘merge’ two words to form an ‘ambiguous’ pseudoword with two ‘senses’ • E.g., replace all occurrences of door and of window with doorwindow and see if the system figures out which is which
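The pseudoword trick above can be sketched in a few lines of Python (the function name and the doorwindow token are illustrative):

```python
import re

def make_pseudoword(text, w1, w2, merged):
    """Replace every occurrence of w1 or w2 with the merged pseudoword,
    remembering the original word as the gold 'sense' label."""
    gold = []
    def record(match):
        gold.append(match.group(0).lower())
        return merged
    pattern = re.compile(r'\b(%s|%s)\b' % (re.escape(w1), re.escape(w2)),
                         re.IGNORECASE)
    return pattern.sub(record, text), gold

text = "Open the door. Look out the window. Close the door."
merged_text, gold = make_pseudoword(text, "door", "window", "doorwindow")
print(merged_text)  # every door/window replaced by doorwindow
print(gold)         # ['door', 'window', 'door']
```

A disambiguator is then trained on the merged text and scored against the `gold` list, giving labeled evaluation data for free.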

  5. Performance Bounds • How good is (say) 83.2%? • Evaluate performance relative to lower and upper bounds: • Baseline performance: how well does the simplest “reasonable” algorithm do? • Human performance: what percentage of the time do people agree on the classification?

  6. Supervised Learning • Each ambiguous word token wi of word type wk in the training corpus is tagged with a sense from sk1, …, sknk • Each word token occurs in a context ci (usually defined as a window around the word occurrence – up to ~100 words long) • Each context contains a set of words used as features vij

  7. Bayesian Classification • Bayes decision rule: Classify s(wi) = arg maxs P(s | ci) • Minimizes the probability of error • How to compute? Use Bayes’ Theorem: P(s | ci) = P(ci | s) P(s) / P(ci)

  8. Bayes’ Classifier (cont.) • Note that P(c) is constant for all senses, therefore: s(wi) = arg maxs P(ci | s) P(s)

  9. Naïve Bayes • Assume: • Features are conditionally independent, given the example class • Feature order doesn’t matter (bag-of-words model – repetition counts) • Then: P(c | s) = ∏j P(vj | s)

  10. Naïve Bayes Training • For all senses sk of w and all words vj in the vocabulary, estimate: P(vj | sk) = C(vj, sk) / C(sk) • For all senses sk of w, estimate the prior: P(sk) = C(sk) / C(w) • (where C(·) counts occurrences in the training corpus)

  11. Naïve Bayes Classification • For all senses sk of wi, do: • score(sk) = log P(sk) • For all words vj in the context window ci, do: • score(sk) += log P(vj | sk) • Choose s(wi) = arg maxsk score(sk)
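Slides 10–11 can be combined into one runnable sketch. The drug-sense training data below is a toy set modeled on the features of slide 12, not a real corpus, and the add-one smoothing is an assumption the slides leave implicit:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (context_words, sense) pairs.
    Returns log-priors and add-one-smoothed log-likelihoods log P(v | s)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        for v in context:
            word_counts[sense][v] += 1
            vocab.add(v)
    total = sum(sense_counts.values())
    log_prior = {s: math.log(n / total) for s, n in sense_counts.items()}
    log_like = {}
    for s in sense_counts:
        denom = sum(word_counts[s].values()) + len(vocab)  # add-one smoothing
        log_like[s] = {v: math.log((word_counts[s][v] + 1) / denom)
                       for v in vocab}
    return log_prior, log_like, vocab

def classify(context, log_prior, log_like, vocab):
    """Slide 11: score(sk) = log P(sk) + sum of log P(vj | sk)."""
    scores = {}
    for s in log_prior:
        score = log_prior[s]
        for v in context:
            if v in vocab:  # ignore unseen words
                score += log_like[s][v]
        scores[s] = score
    return max(scores, key=scores.get)

train = [
    (["prices", "prescription", "pharmaceutical"], "medication"),
    (["consumer", "patent", "prices"], "medication"),
    (["abuse", "illicit", "cocaine"], "illegal"),
    (["traffickers", "alcohol", "abuse"], "illegal"),
]
model = train_naive_bayes(train)
print(classify(["illicit", "traffickers"], *model))  # → illegal
```

Working in log space avoids numerical underflow when the context window contains many words.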

  12. Significant Features • Senses of drug (Gale et al. 1992): • ‘medication’: prices, prescription, patent, increase, consumer, pharmaceutical • ‘illegal substance’: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers

  13. Dictionary-Based Disambiguation • Idea: Choose between the senses of a word given in a dictionary, based on the words in their definitions • Cone: • A mass of ovule-bearing or pollen-bearing scales in trees of the pine family or in cycads that are arranged usually on a somewhat elongated axis • Something that resembles a cone in shape: as a crisp cone-shaped wafer for holding ice cream

  14. Algorithm (Lesk 1986) • Define Di(w) as the bag of words in the ith definition of w • Define E(w) = ∪i Di(w) • For all senses sk of w, do: • score(sk) = similarity(Dk(w), ∪vj in c E(vj)) • Choose s = arg maxsk score(sk)
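A minimal sketch of the simplified Lesk idea: score each sense by plain word overlap between its definition and the context. The tiny stopword list and the crude plural stripping stand in for real preprocessing, and the two-sense dictionary for ash comes from slide 16:

```python
# Toy stopword list and plural stripping -- stand-ins for real preprocessing
STOPWORDS = {"a", "the", "of", "is", "and", "this", "to",
             "one", "into", "when", "left"}

def content_words(text):
    """Lowercase, drop stopwords, and crudely strip a plural -s."""
    return {w[:-1] if w.endswith("s") else w
            for w in text.lower().split() if w not in STOPWORDS}

def lesk(context, definitions):
    """Simplified Lesk: choose the sense whose dictionary definition
    shares the most content words with the context."""
    ctx = content_words(context)
    return max(definitions,
               key=lambda s: len(ctx & content_words(definitions[s])))

defs = {
    "s1": "a tree of the olive family",
    "s2": "the solid residue left when combustible material is burned",
}
print(lesk("The ash is one of the last trees to come into leaf", defs))  # → s1
```

Here "trees" in the context matches "tree" in the s1 definition, which is exactly the kind of overlap the full algorithm accumulates over the expanded sets E(vj).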

  15. Similarity Metrics • Simplest choice – the matching coefficient: similarity(X, Y) = |X ∩ Y| • Alternatives: the Dice coefficient 2|X ∩ Y| / (|X| + |Y|) and the Jaccard coefficient |X ∩ Y| / |X ∪ Y|

  16. Simple Example • ash: • s1: a tree of the olive family • s2: the solid residue left when combustible material is burned • This cigar burns slowly and creates a stiff ash → s2 • The ash is one of the last trees to come into leaf → s1 • After being struck by lightning the olive tree was reduced to ash → ?

  17. Some Improvements • Lesk obtained results of 50–70% accuracy • Possible improvements: • Run iteratively, each time using only the definitions of the “appropriate” senses for context words • Expand each word to a set of synonyms, using a thesaurus

  18. Thesaurus-Based Disambiguation • A thesaurus assigns subject codes to different words, assigning multiple codes to ambiguous words • t(sk) = subject code of sense sk of word w in the thesaurus • δ(t, v) = 1 iff t is a subject code for word v (and 0 otherwise)

  19. Simple Algorithm • Count the number of context words with the same subject code: • for each sense sk of wi, do: score(sk) = Σvj in ci δ(t(sk), vj) • s(wi) = arg maxsk score(sk)
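The counting rule above fits in a few lines. The subject codes below are invented for illustration, not taken from any real thesaurus, and the mouse example echoes slide 20:

```python
def thesaurus_disambiguate(context, subject_codes, sense_code):
    """Pick the sense whose subject code is shared by the most context words.
    subject_codes: word -> set of codes; sense_code: sense -> its code."""
    scores = {
        sense: sum(1 for v in context if code in subject_codes.get(v, set()))
        for sense, code in sense_code.items()
    }
    return max(scores, key=scores.get)

# Toy thesaurus with made-up subject codes
subject_codes = {
    "cheese": {"FOOD"},
    "click": {"COMPUTING"},
    "cat": {"ANIMAL"},
    "screen": {"COMPUTING"},
}
sense_code = {"mammal": "ANIMAL", "device": "COMPUTING"}

context = ["click", "the", "mouse", "on", "the", "screen"]
print(thesaurus_disambiguate(context, subject_codes, sense_code))  # → device
```

The domain-dependence problem of slide 20 is visible here: in a computer manual, nearly every context of "mouse" would score for COMPUTING, and the ANIMAL sense would never win.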

  20. Some Issues • Domain-dependence: In computer manuals, “mouse” will not be evidence for topic “mammal” • Coverage: “Michael Jordan” will not likely be in a thesaurus, but is an excellent indicator for topic “sports”

  21. Tuning for a Corpus • Use a naïve-Bayes formulation: P(t | c) = [P(t) ∏vj in c P(vj | t)] / [∏vj in c P(vj)] • Initialize the probabilities as uniform • Re-estimate P(t) and P(vj | t) for each topic t and each word vj by evaluating all contexts in the corpus, assuming a context has topic t if P(t | c) > θ (where θ is a predefined threshold) • Disambiguate by choosing the highest-probability topic

  22. Training (based on the paper): • for all topics tl, let Wl be the set of words listed under the topic • for all topics tl, let Tl = { c(w) | w ∈ Wl } • for all words vj, let Vj = { c | vj ∈ c } • for all words vj and topics tl, let: P(vj | tl) = |Vj ∩ Tl| / (Σj |Vj ∩ Tl|) (with smoothing) • for all topics tl, let: P(tl) = (Σj |Vj ∩ Tl|) / (Σl Σj |Vj ∩ Tl|) • for all words vj, let: P(vj) = |Vj| / Nc (where Nc is the total number of contexts)
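The set-based estimates of slide 22 translate directly into code. This sketch omits the smoothing the slide mentions, and all names and the two-context example are illustrative:

```python
def topic_probs(topic_words, contexts):
    """Estimate P(v | t), P(t), P(v) from slide 22's set definitions.
    topic_words: topic -> set of words listed under it in the thesaurus;
    contexts: list of word sets, one per context in the corpus."""
    # T_l: indices of contexts containing a word listed under topic t_l
    T = {t: {i for i, c in enumerate(contexts) if c & ws}
         for t, ws in topic_words.items()}
    # V_j: indices of contexts containing word v_j
    vocab = set().union(*contexts)
    V = {v: {i for i, c in enumerate(contexts) if v in c} for v in vocab}

    # P(v_j | t_l) = |V_j ∩ T_l| / Σ_j |V_j ∩ T_l|   (no smoothing here)
    p_v_t = {}
    totals = {}
    for t in T:
        totals[t] = sum(len(V[v] & T[t]) for v in vocab)
        denom = totals[t] or 1
        p_v_t[t] = {v: len(V[v] & T[t]) / denom for v in vocab}

    # P(t_l) = Σ_j |V_j ∩ T_l| / Σ_l Σ_j |V_j ∩ T_l|
    grand = sum(totals.values()) or 1
    p_t = {t: totals[t] / grand for t in T}

    # P(v_j) = |V_j| / N_c
    p_v = {v: len(V[v]) / len(contexts) for v in vocab}
    return p_v_t, p_t, p_v

p_v_t, p_t, p_v = topic_probs(
    {"sports": {"ball"}, "finance": {"bank"}},
    [{"ball", "game"}, {"bank", "money"}],
)
print(p_v_t["sports"]["ball"], p_t["sports"], p_v["ball"])  # 0.5 0.5 0.5
```

These estimates feed the P(t | c) formula of slide 21, and the threshold-based re-labeling then iterates the whole computation.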

  23. Using a Bilingual Corpus • Use correlations between phrases in two languages to disambiguate • E.g., interest = ‘legal share’ (acquire an interest) or ‘attention’ (show interest) • In German: Beteiligung erwerben vs. Interesse zeigen • Depending on where the translations of related words occur, determine which sense applies

  24. Scoring • Given a context c in which a syntactic relation R(w, v) holds between w and a context word v: • The score of sense sk is the number of contexts c’ in the second language such that R(w’, v’) holds in c’, where w’ is a translation of sk and v’ is a translation of v • Choose the highest-scoring sense

  25. Global Constraints • One sense per discourse: The sense of a word tends to be consistent within a given document • One sense per collocation: Nearby words provide strong and consistent clues for sense, depending on distance, order, and syntactic relationship

  26. Examples • D1: • living – existence of plant and animal life • living – classified as either plant or animal • ??? – bacterial and plant cells are enclosed • D2: • living – contains varied plant and animal life • living – the most common plant life • factory – protected by plant parts remaining

  27. Collocations • Collocation: a pair of words that tend to occur together • Rank collocations that occur with the different senses by: rank(f) = P(sk1 | f) / P(sk2 | f) • A higher rank for a collocation f means it is more strongly indicative of its sense

  28. • for all senses sk of w, do: • Fk = all collocations in sk’s definition • Ek = { } • while at least one Ek changed, do: • for all senses sk of w, do: Ek = { ci | there exists fm in ci with fm in Fk } • for all senses sk of w, do: Fk = { fm | P(sk | fm) / P(sn | fm) > α } • for all documents d, do: • determine the majority sense s* of w in d • assign all occurrences of w to s*
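One iteration of the bootstrapping loop above can be sketched as follows, treating collocations simply as single context words and assuming exactly two senses. The seed features, the α value, and the toy contexts are all illustrative:

```python
from collections import Counter

def yarowsky_step(contexts, features, alpha=2.0):
    """One pass of the bootstrapping loop (two senses assumed).
    contexts: list of word sets; features: sense -> current feature set Fk.
    Labels contexts E_k via the current features, then re-ranks every word
    by how strongly it favors one sense over the other."""
    senses = list(features)
    # E_k: contexts containing at least one feature of sense s_k
    labeled = {s: [c for c in contexts if c & features[s]] for s in senses}
    counts = {s: Counter(f for c in labeled[s] for f in c) for s in senses}
    new_features = {s: set() for s in senses}
    vocab = set().union(*contexts)
    for f in vocab:
        # Add-one-smoothed estimate of P(s | f) up to a constant
        p = {s: (counts[s][f] + 1) / (len(labeled[s]) + 2) for s in senses}
        for s in senses:
            other = next(t for t in senses if t != s)
            if p[s] / p[other] > alpha:  # rank(f) above the threshold
                new_features[s].add(f)
    return new_features

contexts = [
    {"plant", "animal", "life"},
    {"plant", "cells", "life"},
    {"plant", "factory", "parts"},
    {"plant", "factory", "manufacturing"},
]
seeds = {"living": {"life"}, "factory": {"manufacturing"}}
features = yarowsky_step(contexts, seeds)
print(features)
```

Note that the ambiguous "plant" never becomes a feature of either sense, because it occurs in both kinds of context, while "factory" is pulled in as a new strong collocation for the factory sense. Iterating the step grows the feature sets until they stabilize, after which the one-sense-per-discourse pass assigns the majority sense per document.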
