

  1. CS4705: Relationships among Words, Semantic Roles, and Word-Sense Disambiguation

  2. Today • Lexical Relations • WordNet • Semantic Roles • Review: Semantic Roles • Selectional Restrictions • Selectional Association • Word-Sense Disambiguation • Supervised • Unsupervised • Evaluation

  3. Lexical Relations • Semantic Networks: Used to represent lexical relationships • e.g. WordNet (George Miller et al) • Most widely used hierarchically organized lexical database for English • Synset: set of synonyms, a dictionary-style definition (or gloss), and some examples of uses --> a concept • Databases for nouns, verbs, and modifiers • Applications can traverse network to find synonyms, antonyms, hyper- and hyponyms… • Available for download or online use • http://www.cogsci.princeton.edu/~wn
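
A minimal sketch of what "traversing the network" can look like in practice, using NLTK's WordNet interface (NLTK is an assumption here; the slide only points to the Princeton download/online version):

```python
# Minimal sketch using NLTK's WordNet interface (assumes nltk is installed
# and the corpus was fetched with nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# Each synset bundles synonyms, a gloss, and example uses -- i.e., a concept.
for synset in wn.synsets('bank'):
    print(synset.name(), '-', synset.definition())

# Traverse the network from one sense; the exact sense numbering depends on
# the WordNet version, so treat 'bank.n.02' as illustrative.
bank = wn.synset('bank.n.02')                        # financial-institution sense
print([lemma.name() for lemma in bank.lemmas()])     # synonyms in the synset
print(bank.hypernyms())                              # more general concepts
print(bank.hyponyms())                               # more specific concepts
```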

  4. Homonymy • Homonyms: Words with same form -- orthography and pronunciation -- but different, unrelated meanings, or senses • A bank1 holds investments in a custodial account in the client’s name. • As agriculture is burgeoning on the east bank2, the river will shrink even more.

  5. http://www.etymonline.com/ • bank1 "financial institution," 1474, from either O.It. banca or M.Fr. banque (itself from the O.It. term), both meaning "table" (the notion is of the moneylender's exchange table), from a Gmc. source (cf. O.H.G. bank "bench"); see bank (2). The verb meaning "to put confidence in" (U.S. colloquial) is attested from 1884. Bank holiday is from 1871, though the tradition is as old as the Bank of England. Bankroll (v.) "to finance" is 1920s. To cry all the way to the bank was coined 1956 by flamboyant pianist Liberace, after a Madison Square Garden concert that was packed with patrons but panned by critics. • bank2 "earthen incline, edge of a river," c.1200, probably in O.E., from O.N. banki, from P.Gmc. *bangkon "slope," cognate with P.Gmc. *bankiz "shelf."

  6. Related Phenomena • Homophones (same pron/different orth) Read/red • Homographs (same orth/different pron) Bass/bass

  7. Polysemy • Words with multiple but related meanings • They rarely serve red meat. • He served as U.S. ambassador. • He might have served his time in prison. • idea bank, sperm bank, blood bank, bank bank • Can the two candidate senses be conjoined? ?He served his time and as ambassador to Norway. • Same etymology • Often a domain-dependent specialization

  8. Synonymy • Substitutability: different words, same meaning • Old/aged, pretty/attractive, food/sustenance, money • How big is that plane? How large is that plane? How big are you? How large are you? • What makes words substitutable – and not? • Polysemy (large vs. old sense) • register: He’s really cheap/?parsimonious. • collocational constraints: roast beef / ?baked beef; economy fare / ?economy price

  9. How could we find Synonyms and Collocations automatically? • Synonyms: Identify words appearing frequently in similar contexts Blast victims were helped by civic-minded passersby. Public-spirited passersby came to the aid of this bombing victim. • Collocations: Identify synonyms or closely related words that do and don’t appear in similar contexts Flu victims, flu sufferers vs. ?Cold victims, cold sufferers… Roast turkey vs. Baked turkey
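
A rough sketch of the "similar contexts" idea: represent each word by the words that co-occur with it in a small window and compare the resulting count vectors. The toy corpus, window size, and cosine measure are illustrative assumptions:

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=2):
    """Count, for every word, the words appearing within +/- window tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

sentences = [
    "blast victims were helped by civic-minded passersby".split(),
    "public-spirited passersby came to the aid of this bombing victim".split(),
]
vectors = context_vectors(sentences)
# Words that keep appearing in similar contexts get high similarity scores;
# with a real corpus this surfaces near-synonyms (victims/sufferers) and
# collocations (flu victims vs. ?cold victims).
print(cosine(vectors["victims"], vectors["passersby"]))
```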

  10. Hyponymy • General: hypernym (superordinate) • dog is a hypernym of poodle • Test: ‘That is a poodle’ implies ‘that is a dog’ • Specific: hyponym (underneath) • poodle is a hyponym of dog • Test: ‘That is a poodle’ implies ‘that is a dog’ • Ontology: set of domain objects • Taxonomy: Specification of relations between those objects • Object hierarchy: Structured hierarchy that supports feature inheritance (e.g. poodle inherits some properties of dog)
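
The entailment test maps directly onto the WordNet hierarchy; a small sketch, again assuming NLTK's interface:

```python
from nltk.corpus import wordnet as wn

poodle = wn.synset('poodle.n.01')
dog = wn.synset('dog.n.01')

# Every chain of hypernyms from poodle up to the root: if dog.n.01 appears
# on the way up, "that is a poodle" entails "that is a dog".
for path in poodle.hypernym_paths():
    print(' -> '.join(s.name() for s in path))

print(dog in poodle.lowest_common_hypernyms(dog))    # True: dog subsumes poodle
```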

  11. Tropes, or Figures of Speech • Metaphor: one entity is given the attributes of another (tenor/vehicle/ground) • Life is a bowl of cherries. Don’t take it serious…. • We are the eyelids of defeated caves. ?? • GM killed the Fiero. (conventional metaphor: corp. as person) • Metonymy: one entity used to stand for another (replacive) • GM killed the Fiero. • The ham sandwich wants his check. (deferred reference) • Both extend existing sense to new meaning • Metaphor: completely different concept • Metonymy: related concepts

  12. Sum • Many definable word relations useful to NLP in different ways • Homonymy, polysemy, synonymy, hypernymy • Homography, homophony • Metaphor, metonymy • Collocations • Resources available to aid in processing • WordNet, FrameNet, online dictionaries,…. • A Huge Problem for NLP?

  13. Ambiguity and Word Sense Disambiguation • Recall: For semantic attachment approaches: what happens when a given lexeme has multiple ‘meanings’? Flies [V] vs. Flies [N] He robbed the bank. He sat on the bank. • How do we determine the correct sense of the word? • Machine Learning • Supervised methods • Lightly supervised and Unsupervised Methods • Bootstrapping • Dictionary-based techniques • Selectional Association

  14. Supervised WSD • Approaches: • Tag a corpus with correct senses of particular words (lexical sample) or all words (all-words task) • E.g. SENSEVAL corpora • Lexical sample: • Extract features which might predict word sense • POS? Word identity? Punctuation after? Previous word? Its POS? • Use Machine Learning algorithm to produce a classifier which can predict the senses of one word or many • All-words • Use semantic concordance: each open class word labeled with sense from dictionary or thesaurus

  15. E.g. SemCor (Brown Corpus), tagged with WordNet senses

  16. What Features Are Useful? • “Words are known by the company they keep” • How much ‘company’ do we need to look at? • What do we need to know about the ‘friends’? • POS, lemmas/stems/syntactic categories,… • Collocations: words that frequently appear with the target, identified from large corpora federal government, honor code, baked potato • Position is key • Bag-of-words: words that appear somewhere in a context window I want to play a musical instrument so I chose the bass. • Ordering/proximity not critical

  17. Punctuation, capitalization, formatting
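
A small sketch of the two feature types above, for one occurrence of a target word; tokenization, window sizes, and feature names are assumptions for illustration:

```python
def collocational_features(tokens, i, window=2):
    """Position-specific neighbours of the target (order matters)."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f'w{offset:+d}'] = tokens[j] if 0 <= j < len(tokens) else '<pad>'
    return feats

def bag_of_words_features(tokens, i, window=6):
    """Words anywhere in the context window (order/position ignored)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {f'bow({w})': True for j, w in enumerate(tokens[lo:hi], lo) if j != i}

tokens = "i want to play a musical instrument so i chose the bass".split()
i = tokens.index('bass')
print(collocational_features(tokens, i))   # {'w-2': 'chose', 'w-1': 'the', 'w+1': '<pad>', ...}
print(bag_of_words_features(tokens, i))    # {'bow(musical)': True, 'bow(instrument)': True, ...}
```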

  18. Rule Induction Learners and WSD • Given a feature vector of values for independent variables associated with observations of values for the training set • Top-down greedy search driven by information gain: how will entropy of (remaining) data be reduced if we split on this feature? • Produce a set of rules that perform best on the training data, e.g. • bank2 if w-1==‘river’ & pos==NP & src==‘Fishing News’… • … • Easy to understand result but many passes to achieve each decision, susceptible to over-fitting
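
A sketch of the information-gain criterion named above: how much does splitting the sense-labeled examples on a single feature reduce entropy? The data layout (a list of (feature_dict, sense) pairs) is an assumption:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(examples, feature):
    """examples: list of (feature_dict, sense). Returns H(before) - H(after split)."""
    before = entropy([sense for _, sense in examples])
    partitions = {}
    for feats, sense in examples:
        partitions.setdefault(feats.get(feature), []).append(sense)
    after = sum(len(part) / len(examples) * entropy(part) for part in partitions.values())
    return before - after

examples = [({'w-1': 'river'}, 'bank2'), ({'w-1': 'savings'}, 'bank1'),
            ({'w-1': 'river'}, 'bank2'), ({'w-1': 'the'}, 'bank1')]
print(information_gain(examples, 'w-1'))   # 1.0: this split fully separates the senses
```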

  19. Naïve Bayes • ŝ = argmax_{s∈S} p(s|V) = argmax_{s∈S} p(V|s)p(s)/p(V) • Where s is one of the senses S possible for a word w and V the input vector of feature values for w • Assume features independent, so the probability of V is the product of the probabilities of its features given s: p(V|s) ≈ ∏j p(vj|s) • p(V) is the same for every candidate sense, so it can be dropped • Then ŝ = argmax_{s∈S} p(s) ∏j p(vj|s)

  20. How do we estimate p(s) and p(vj|s)? • p(si) is max. likelihood estimate from a sense-tagged corpus (count(si,wj)/count(wj)) – how likely is bank to mean ‘financial institution’ over all instances of bank? • P(vj|s) is max. likelihood of each feature given a candidate sense (count(vj,s)/count(s)) – how likely is the previous word to be ‘river’ when the sense of bank is ‘financial institution’ • Calculate for each possible sense and take the highest scoring sense as the most likely choice
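
A minimal Naïve Bayes WSD sketch following the estimates above, p(s) = count(s,w)/count(w) and p(vj|s) = count(vj,s)/count(s); the feature encoding and the add-alpha smoothing are assumptions beyond the slides:

```python
import math
from collections import Counter, defaultdict

def train(tagged_examples):
    """tagged_examples: list of (feature_list, sense) pairs for one target word."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, sense in tagged_examples:
        sense_counts[sense] += 1
        for value in features:
            feature_counts[sense][value] += 1
    return sense_counts, feature_counts

def classify(features, sense_counts, feature_counts, alpha=1.0):
    total = sum(sense_counts.values())
    best_sense, best_score = None, float('-inf')
    for sense, count in sense_counts.items():
        # log p(s) + sum_j log p(v_j|s); add-alpha smoothing avoids zero
        # probabilities for features never seen with this sense.
        vocabulary = len(feature_counts[sense]) + 1
        score = math.log(count / total)
        for value in features:
            score += math.log((feature_counts[sense][value] + alpha) /
                              (sum(feature_counts[sense].values()) + alpha * vocabulary))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

examples = [(['w-1=river', 'fishing'], 'bank2'),
            (['w-1=savings', 'loan'], 'bank1'),
            (['deposit', 'w-1=the'], 'bank1')]
model = train(examples)
print(classify(['w-1=river', 'water'], *model))   # -> 'bank2'
```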

  21. Decision List Classifiers • Transparent • Like case statements applying tests to the input in turn: fish within window --> bass1; striped bass --> bass1; guitar within window --> bass2; bass player --> bass2 • Yarowsky ‘96’s approach orders tests by individual accuracy on the entire training set, based on the log-likelihood ratio
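
The ordering criterion can be written out explicitly; this is a standard formulation (the absolute log-likelihood ratio of the two senses given a test fi), not copied from the slides:

```latex
% Each candidate test f_i is scored by how strongly it favors one sense;
% tests are sorted by this score and applied in order, first match wins.
\mathrm{score}(f_i) \;=\; \left|\, \log \frac{P(\mathit{sense}_1 \mid f_i)}{P(\mathit{sense}_2 \mid f_i)} \,\right|
```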

  22. Lightly Supervised Methods: Bootstrapping • Bootstrapping I • Start with a few labeled instances of target item as seeds to train initial classifier, C • Use high confidence classifications of C on unlabeled data as training data • Iterate • Bootstrapping II • Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), either intuitively or from corpus or from dictionary entries, and label those automatically • One Sense per Discourse hypothesis
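
A sketch of the "Bootstrapping I" loop above; the classifier interface (train / predict_with_confidence) and the confidence threshold are illustrative assumptions:

```python
def bootstrap(seed_labeled, unlabeled, train, predict_with_confidence,
              threshold=0.95, max_iterations=10):
    """seed_labeled: [(example, sense), ...]; unlabeled: [example, ...]."""
    labeled = list(seed_labeled)
    pool = list(unlabeled)
    for _ in range(max_iterations):
        classifier = train(labeled)
        confident, remaining = [], []
        for example in pool:
            sense, confidence = predict_with_confidence(classifier, example)
            if confidence >= threshold:          # keep only high-confidence labels
                confident.append((example, sense))
            else:
                remaining.append(example)
        if not confident:                        # nothing confident enough: stop
            break
        labeled.extend(confident)                # grow the training set
        pool = remaining
    return train(labeled)
```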

  23. Dictionary Approaches • Problem of scale for all ML approaches • Building a classifier for each word with multiple senses • Machine-Readable dictionaries with senses identified and examples • Simplified Lesk: • Retrieve all content words occurring in context of target (e.g. Sailors love to fish for bass.) • Compute overlap with sense definitions of target entry • bass1: a musical instrument… • bass2: a type of fish that lives in the sea…
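
A minimal sketch of Simplified Lesk as described above: choose the sense whose gloss shares the most content words with the target's sentence context. The tiny sense inventory and stopword list are assumptions for illustration:

```python
STOPWORDS = {'a', 'an', 'the', 'to', 'for', 'of', 'that', 'in', 'is'}

SENSES = {
    'bass1': 'a musical instrument of the lowest pitch or range',
    'bass2': 'a type of fish that lives in the sea',
}

def simplified_lesk(context_sentence, senses=SENSES):
    """Return the sense whose gloss has the largest content-word overlap."""
    context = {w for w in context_sentence.lower().split() if w not in STOPWORDS}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & {w for w in gloss.split() if w not in STOPWORDS})
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk('Sailors love to fish for bass'))   # -> 'bass2' (overlap: "fish")
```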

  24. bass1 /beɪs/ [beys] Music. –adjective 1. low in pitch; of the lowest pitch or range: a bass voice; a bass instrument. 2. of or pertaining to the lowest part in harmonic music. –noun 3. the bass part. 4. a bass voice, singer, or instrument. 5. double bass. [Origin: 1400–50; late ME, var. of base2 with ss of basso] • bass2 /bæs/ [bas] –noun, plural (especially collectively) bass, (especially referring to two or more kinds or species) bass·es. 1. any of numerous edible, spiny-finned, freshwater or marine fishes of the families Serranidae and Centrarchidae. 2. (originally) the European perch, Perca fluviatilis. [Origin: 1375–1425; late ME bas, earlier bærs, OE bærs (with loss of r before s as in ass2, passel, etc.); c. D baars, G Barsch, OSw agh-borre]

  25. Choose sense with most content-word overlap • Original Lesk: • Compare dictionary entries of all content-words in context with entries for each sense • But….dictionary entries are short • Expand with entries of ‘related’ words that appear in the original entry • If tagged corpus available, collect all the words appearing in context of each sense of target word • e.g. all words appearing in sentences with bass1 added to signature for bass1 • Weight each by frequency of occurrence of word with that sense tagged in corpus (e.g. all senses of bass) to capture how discriminating a word is for the target word’s senses • Corpus Lesk performs best of all Lesk approaches

  26. Disambiguation via Selectional Restrictions • “Verbs are known by the company they keep” • Different verbs select for different thematic roles: wash the dishes (takes washable-thing as patient); serve delicious dishes (takes food-type as patient) • Method: another semantic attachment in grammar • Semantic attachment rules are applied as sentences are syntactically parsed, e.g. VP --> V NP; V --> serve <theme> {theme: food-type} • Selectional restriction violation: no parse

  27. But this means we must: • Write selectional restrictions for each sense of each predicate – or use FrameNet • Serve alone has 15 verb senses • Obtain hierarchical type information about each argument (using WordNet) • How many hypernyms does dish have? • How many words are hyponyms of dish? • But also: • Sometimes selectional restrictions don’t restrict enough (Which dishes do you like?) • Sometimes they restrict too much (Eat dirt, worm! I’ll eat my hat!) • Can we take a statistical approach?

  28. Selectional Association (Resnik ‘97) • Selectional Preference Strength: how much does a predicate tell us about the word class of its argument? George is a monster, George cooked a steak • SR(v): How different is p(c), the probability that any direct object will be a member of some class c, from p(c|v), the probability that a direct object of a specific verb will fall into that class? • Estimate conditional probabilities of word senses from a parsed corpus, counting how often each predicate occurs with an object argument • e.g. How likely is dish to be an object of served? • Jane served/V the dish/Obj • Then estimate the strength of association between each predicate and the super-class (hypernym) of the argument in Wordnet
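
Resnik's two quantities can be written out; this is a standard formulation consistent with the slide's description (selectional preference strength as the divergence between the prior class distribution and the verb-conditioned one), not transcribed from the deck:

```latex
% Selectional preference strength of verb v over argument classes c:
S_R(v) \;=\; D\big(P(c \mid v)\,\big\|\,P(c)\big) \;=\; \sum_{c} P(c \mid v)\,\log\frac{P(c \mid v)}{P(c)}

% Selectional association of v with one class c (its share of the total strength):
A_R(v, c) \;=\; \frac{1}{S_R(v)}\, P(c \mid v)\,\log\frac{P(c \mid v)}{P(c)}
```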

  29. E.g. For each object x of serve (e.g. ragout, Mary, dish) • Look up all x’s hypernym classes in WordNet (e.g dish isa piece of crockery, dish isa food item, ragout isa food item, Mary isa person…) • Distribute “credit” for each of x’s senses occurring with serve among all hypernym classes (≈sense) to which x belongs (1/n for n classes) • Pr(c|v) is estimated at count(c,v)/count(v) • Why does this work? • Ambiguous words have many superordinate classes John served food/the dish/tuna/curry • The most common sense across all objects of the verb should eventually dominate the likelihood score
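
A sketch of the credit-distribution step described above, again assuming NLTK's WordNet interface: each observed object noun contributes 1/n of a count to each of the n hypernym classes it could belong to, and the counts feed Pr(c|v) = count(c,v)/count(v):

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def class_counts(object_nouns):
    """object_nouns: nouns observed as direct objects of one verb (e.g. serve)."""
    counts, total = Counter(), 0
    for noun in object_nouns:
        synsets = wn.synsets(noun, pos=wn.NOUN)
        if not synsets:
            continue
        total += 1
        # All hypernym classes above any sense of the noun; ambiguous words
        # spread their credit thinly, so the verb's dominant class wins out.
        classes = {s for syn in synsets for path in syn.hypernym_paths() for s in path}
        for c in classes:
            counts[c] += 1.0 / len(classes)
    return counts, total

counts, n = class_counts(['ragout', 'dish', 'curry', 'tuna'])
food = wn.synset('food.n.01')
print(counts[food] / n if n else 0.0)    # rough estimate of Pr(food | serve)
```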

  30. How can we use this in WSD? • Choose the class (sense) of the direct object with the highest probability, given the verb Mary served the dish proudly. • Results: • Baselines: • random choice of word sense is 26.8% • choose most frequent sense (NB: requires sense-labeled training corpus) is 58.2% • Resnik’s: 44% correct, using a corpus in which only pred/arg relations are labeled

  31. Evaluating WSD • In vivo/end-to-end/task-based/extrinsic vs. in vitro/stand-alone/intrinsic: evaluation in some task (parsing? q/a? IVR system?) vs. application independent • In vitro metrics: classification accuracy on held-out test set or precision/recall/f-measure if not all instances must be labeled • Baseline: • Most frequent sense? • Lesk algorithms • Ceiling: human annotator agreement
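
For reference, the in vitro metrics listed above in their standard form (the notation is mine, not from the slides):

```latex
% Precision and recall when the system may abstain on some instances:
\mathrm{Precision} = \frac{\#\,\text{correctly labeled}}{\#\,\text{labeled by the system}}, \qquad
\mathrm{Recall} = \frac{\#\,\text{correctly labeled}}{\#\,\text{instances to be labeled}}

% Balanced F-measure (harmonic mean of the two):
F_1 \;=\; \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```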

  32. Summing Up • Word relations: how can we identify different types? • Disambiguating among word senses • Next time: Ch 17: 3-5
