
Corpora and Statistical Methods Lecture 5




Presentation Transcript


  1. Corpora and Statistical Methods Lecture 5 Albert Gatt

  2. Application 3: Verb selectional restrictions

  3. Observation • Some verbs place high restrictions on the semantic category of the NPs they take as arguments. • Assumption: we’re focusing attention on Direct Objects only • e.g. eat selects for FOOD DOs: • eat cake • eat some fresh vegetables • grow selects for LEGUME DOs: • grow potatoes

  4. Not all verbs are equally constraining • Some verbs seem to place fewer restrictions than others: • see doesn’t seem too restrictive: • see John • see the potato • see the fresh vegetables • …

  5. Problem definition • For a given verb and a potential set of arguments (nouns), we want to learn to what extent the verb selects for those arguments • rather than individual nouns, we’re better off using noun classes (FOOD etc), since these allow us to generalise more • can obtain these using a standard resource, e.g. WordNet

  6. A short detour: Kullback-Leibler divergence

  7. Kullback-Leibler divergence • We are often in a position where we estimate a probability distribution from (incomplete) data • This problem is inherent in sampling. • We end up with an estimated distribution Q, which is intended as a model of the true distribution P. • How good is Q as a model? • Kullback-Leibler divergence tells us how well our model matches the actual distribution.

  8. Motivating example • Suppose I’m interested in the semantic type or class to which a noun belongs, e.g.: • cake, meat, cauliflower are types of FOOD (among other things) • potato, carrot are types of LEGUME (among other things) • How do I infer this? • It helps if I know that certain predicates, like grow, select for some types of DO, not others • *grow meat, *grow cake • grow potatoes, grow carrots

  9. Motivating example cont/d • Ingredients • C: the class of interest (e.g. LEGUME) • v: the verb of interest (e.g. grow) • P(C) = probability of class C • prior probability of finding some element of C as DO of any verb • P(C|v) = probability of C given that we know that a noun is a DO of grow • this is my posterior probability • More precise way of asking the question: • Does the probability distribution of C change given the info about v?

  10. Ingredients for KL Divergence • some prior distribution (here, the class probabilities P(C)) • some posterior distribution (here, P(C|v)) • Intuition: KL-Divergence measures how much information we gain by moving from the prior to the posterior • if it’s 0, then we gain no info: the two distributions are identical • Given two probability distributions with probability mass functions p(x) and q(x), KL-Divergence is denoted D(p||q)

  11. Calculating KL-Divergence • the divergence between the prior and posterior probability distributions
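
In standard notation (reconstructing the formula from the definitions on the previous slide), the divergence is:

    D(p || q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}

In the selectional-restriction application below, p is the posterior distribution of noun classes given the verb and q is the prior distribution over classes, so the sum measures the information gained by moving from the prior to the posterior.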

  12. More on the interpretation of KL-Divergence • If probability distribution P is interpreted as “the truth” and distribution Q is my approximation, then: • D(p||q) tells me how much extra info I need to add to Q to get to the actual truth

  13. Back to our problem: Applying KL-divergence to selectional restrictions

  14. Resnik’s model (Resnik 1996) • 2 main ingredients: • Selectional Preference Strength (S): how strongly a verb constrains its direct object (a global estimate) • Selectional Association (A): how much a verb v is associated with a given noun class (a specific estimate for a given class)

  15. Notation • v = a verb of interest • S(v) = the selectional preference strength of v • c = a noun class • C = the set of all the noun classes • A(v,c) = the selectional association between v and class c

  16. Selectional Preference Strength • S(v) is the KL-Divergence between: • the overall prior distribution of all noun classes • the posterior distribution of noun classes in the direct object position of v • how much info we gain from knowing the probability that members of a class occur as DO of v • works as a global estimate of how much v constrains its arguments semantically • the more it constrains them, the more info we stand to gain from knowing that an argument occurs as DO of v

  17. S(grow): prior vs. posterior • Source: Resnik 1996, p. 135

  18. Calculating S(v) • This quantifies the extent to which our prior and posterior probability estimates diverge • i.e. how much info do we gain about C by knowing that a noun is the object of v?
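
Written out (following Resnik 1996), S(v) is the KL divergence between the posterior and the prior class distributions:

    S(v) = D(P(c|v) || P(c)) = \sum_{c \in C} P(c|v) \log \frac{P(c|v)}{P(c)}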

  19. Some more examples • How much info do we gain if we know which verb a noun is the DO of? • quite a lot if it’s an argument of eat • not much if it’s an argument of find • none if it’s an argument of see • Source: Manning and Schütze 1999, p. 290

  20. Selectional association • This is estimated based on selectional preference strength • tells us how much a verb is associated with a specific class, given the extent to which it constrains its arguments • given a class c, A(v,c) tells us how much of S(v) is contributed by c

  21. Calculating A(v,c) • in the formula below, the numerator is one term of the summation for S(v) • dividing by S(v) gives the proportion of S(v) which is contributed by class c
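
Written out, the association is one term of the S(v) sum, normalised by S(v) itself:

    A(v,c) = \frac{P(c|v) \log \frac{P(c|v)}{P(c)}}{S(v)}

so the A(v,c) values over all classes sum to 1, and each one gives the share of the verb's preference strength contributed by class c.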

  22. From A(v,c) to A(v,n) • We know how to estimate the association strength of a class with v • Problem: some nouns can occur in more than one class • Let classes(n) be the set of classes to which noun n belongs; A(v,n) is then defined as shown below:
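
The definition, reconstructed from Resnik's account and consistent with the chair example on the next slide, takes the maximum over the noun's classes:

    A(v,n) = \max_{c \in classes(n)} A(v,c)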

  23. Example • Susan interrupted the chair. • chair is in class FURNITURE • chair is in class PEOPLE • A(interrupt,PEOPLE) > A(interrupt,FURNITURE) • A(interrupt,chair) = A(interrupt,PEOPLE) • Note that this is a kind of word-sense disambiguation!
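
A minimal sketch in Python of how S(v), A(v,c) and A(v,n) could be computed; the class inventory and probability values are invented for illustration, and the corpus-based estimation and WordNet class assignment used by Resnik 1996 are not modelled here.

    import math

    # Hypothetical prior over noun classes, P(c), and posterior given the verb, P(c|v).
    # These numbers are invented; Resnik (1996) estimates them from parsed corpus
    # counts and WordNet classes.
    prior = {"FOOD": 0.10, "LEGUME": 0.05, "PERSON": 0.30, "ARTIFACT": 0.55}
    posterior_grow = {"FOOD": 0.25, "LEGUME": 0.50, "PERSON": 0.05, "ARTIFACT": 0.20}

    def preference_strength(posterior, prior):
        """S(v): KL divergence between the posterior P(c|v) and the prior P(c)."""
        return sum(p_cv * math.log2(p_cv / prior[c])
                   for c, p_cv in posterior.items() if p_cv > 0)

    def selectional_association(c, posterior, prior, s_v):
        """A(v,c): the share of S(v) contributed by class c."""
        p_cv = posterior[c]
        if p_cv == 0:
            return 0.0
        return (p_cv * math.log2(p_cv / prior[c])) / s_v

    def noun_association(classes_of_n, posterior, prior):
        """A(v,n): the maximum A(v,c) over the classes the noun belongs to."""
        s_v = preference_strength(posterior, prior)
        return max(selectional_association(c, posterior, prior, s_v)
                   for c in classes_of_n)

    s_grow = preference_strength(posterior_grow, prior)
    print("S(grow) =", round(s_grow, 3))
    print("A(grow, LEGUME) =",
          round(selectional_association("LEGUME", posterior_grow, prior, s_grow), 3))
    # A noun like "potato" might belong to both LEGUME and FOOD:
    print("A(grow, potato) =",
          round(noun_association(["LEGUME", "FOOD"], posterior_grow, prior), 3))

Normalising each term by S(v) makes the association values comparable across verbs whose overall preference strengths differ widely.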

  24. Some results from Resnik 1996 • There are some fairly atypical examples: • these are due to the disambiguation method • e.g. tragedy can be in the COMM class, and so is assigned A(answer, COMM) as its A(v,n)

  25. Overall evaluation • Resnik’s results were shown to correlate very well with results from a psycholinguistic study • The method is promising: • seems to mirror human intuitions • may have some psychological validity • Possibly an alternative, data-driven account of the semantic bootstrapping hypothesis of Pinker 1989?
