1 / 68

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Lecture 16: Lexical Semantics, Wordnet, etc November 23, 2004 Dan Jurafsky Thanks to Jim Martin for many of these slides! Meaning Traditionally, meaning in language has been studied from three perspectives

liam
Télécharger la présentation

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LING 138/238 SYMBSYS 138Intro to Computer Speech and Language Processing Lecture 16: Lexical Semantics, Wordnet, etc November 23, 2004 Dan Jurafsky Thanks to Jim Martin for many of these slides! LING 138/238 Autumn 2004

  2. Meaning • Traditionally, meaning in language has been studied from three perspectives • The meanings of individual words • How those meanings combine to make meanings for individual sentences or utterances • How those meanings combine to make meanings for a text or discourse • We are going to focus today on word meaning, also called lexical semantics. LING 138/238 Autumn 2004

  3. Outline: Lexical Semantics • Concepts about word meaning including: • Homonymy, Polysemy, Synonymy • Thematic roles • Two computational areas • Enabling resource: • WordNet • Enabling technology: • Word sense disambiguation LING 138/238 Autumn 2004

  4. Preliminaries • What’s a word? • Definitions we’ve used over the quarter: Types, tokens, stems, roots, inflected forms, etc... • Lexeme: An entry in a lexicon consisting of a pairing of a form with a single meaning representation • Lexicon: A collection of lexemes LING 138/238 Autumn 2004

  5. Relationships between word meanings • Homonymy • Polysemy • Synonymy • Antonymy • Hypernomy • Hyponomy • Meronomy LING 138/238 Autumn 2004

  6. Homonymy • Homonymy: • Lexemes that share a form • Phonological, orthographic or both • But have unrelated, distinct meanings • Clear example: • Bat (wooden stick-like thing) vs • Bat (flying scary mammal thing) • Or bank (financial institution) versus bank (riverside) • Can be homophones, homographs, or both: • Homophones: • Write and right • Piece and peace LING 138/238 Autumn 2004

  7. Homonymy causes problems for NLP applications • Text-to-Speech • Same orthographic form but different phonological form • bass vs bass • Information retrieval • Different meanings same orthographic form • QUERY: bat care • Machine Translation • Speech recognition • Why? LING 138/238 Autumn 2004

  8. Polysemy • The bankis constructed from red brickI withdrew the money from the bank • Are those the same sense? • Or consider the following WSJ example • While some banks furnish sperm only to married women, others are less restrictive • Which sense of bank is this? • Is it distinct from (homonymous with) the river bank sense? • How about the savings bank sense? LING 138/238 Autumn 2004

  9. Polysemy • A single lexeme with multiple related meanings (bank the building, bank the finantial institution) • Most non-rare words have multiple meanings • The number of meanings is related to its frequency • Verbs tend more to polysemy • Distinguishing polysemy from homonymy isn’t always easy (or necessary) LING 138/238 Autumn 2004

  10. Metaphor and Metonymy • Specific types of polysemy • Metaphor: • Germany will pull Slovenia out of its economic slump. • I spent 2 hours on that homework. • Metonymy • The White House announced yesterday. • This chapter talks about part-of-speech tagging • Bank (building) and bank (financial institution) LING 138/238 Autumn 2004

  11. How do we know when a word has more than one sense? • ATIS examples • Which flights serve breakfast? • Does America West serve Philadelphia? • The “zeugma” test: • ?Does United serve breakfast and San Jose? LING 138/238 Autumn 2004

  12. Synonyms • Word that have the same meaning in some or all contexts. • filbert hazelnut • youth adolescent • big large • automobile car • Two lexemes are synonyms if they can be successfully substituted for each other in all situations LING 138/238 Autumn 2004

  13. Synonyms • But there are few (or no) examples of perfect synonymy. • Why should that be? • Even if many aspects of meaning are identical • Still may not preserve the acceptability based on notions of politeness, slang, register, genre, etc. • Example: • Big and large? • That’s my big sister • That’s my large sister LING 138/238 Autumn 2004

  14. Antonyms • Words that are opposites with respect to one feature of their meaning • Otherwise, they are very similar! • Dark light • Boy girl • Hot cold • Up down • In out LING 138/238 Autumn 2004

  15. Word Similarity Computation • For various computational applications it’s useful to find words which are similar to another word. • Machine translation (to find near-synonyms) • Information retrieval (to do “query expansion”) • Two ways to do this: • Automatic computation based on distributional similarity • Lsa.colorado.edu • Use a thesaurus which lists similar words. • WordNet (we’ll return to this in a few slides) LING 138/238 Autumn 2004

  16. Hyponymy • Hyponymy: the meaning of one lexeme is a subset of the meaning of another • Since dogs are canids • Dog is a hyponym of canid and • Canid is a hypernym of dog • Similarly, • Car is a hyponym of vehicle • Vehicle is a hypernym of car LING 138/238 Autumn 2004

  17. WordNet • A hierarchically organized lexical database • On-line thesaurus + aspects of a dictionary • Versions for other languages are under development LING 138/238 Autumn 2004

  18. WordNet • Demo: • http://www.cogsci.princeton.edu/cgi-bin/webwn LING 138/238 Autumn 2004

  19. Format of Wordnet Entries LING 138/238 Autumn 2004

  20. WordNet Noun Relations LING 138/238 Autumn 2004

  21. WordNet Verb and Adj Relations LING 138/238 Autumn 2004

  22. WordNet Hierarchies LING 138/238 Autumn 2004

  23. How is “sense” defined in WordNet? • The critical thing to grasp about WordNet is the notion of a synset; it’s their version of a sense or a concept • Example: table as a verb to mean defer • > {postpone, hold over, table, shelve, set back, defer, remit, put off} • For WordNet, the meaning of this sense of tableis this list. • Another example: give has 45 senses in WN; one of them is: {supply, provide, render, furnish}. LING 138/238 Autumn 2004

  24. The Internal Structure of Words • Previous section talked about relations between words • Now we’ll look at “internal structure” of word meaning. We’ll look at only a few aspects: • Thematic roles in predicate-bearing lexemes • Selection restrictions on thematic roles • Decompositional semantics of predicates • Feature-structures for nouns LING 138/238 Autumn 2004

  25. Thematic Roles • Thematic roles: • Thematic roles are semantic generalizations over the specific roles that occur with specific verbs. • I.e. Takers, givers, eaters, makers, doers, killers, all have something in common • -er • They’re all the agents of the actions • We can generalize across other roles as well to come up with a small finite set of such roles LING 138/238 Autumn 2004

  26. Thematic Roles LING 138/238 Autumn 2004

  27. Thematic Role Examples LING 138/238 Autumn 2004

  28. Thematic Roles • Takes some of the work away from the verbs. • It’s not the case that every verb is unique and has to completely specify how all of its arguments uniquely behave. • Provides a locus for organizing semantic processing • It permits us to distinguish near surface-level semantics from deeper semantics LING 138/238 Autumn 2004

  29. Linking • Thematic roles, syntactic categories and their positions in larger syntactic structures are all intertwined in complicated ways. For example… • AGENTS are often subjects • In a VP->V NP NP rule, the first NP is often a GOAL and the second a THEME LING 138/238 Autumn 2004

  30. More on Linking • John opened the door • AGENT THEME • The door was opened by John • THEME AGENT • The door opened • THEME • John opened the door with the key • AGENT THEME INSTRUMENT LING 138/238 Autumn 2004

  31. Inference • Given an event expressed by a verb that expresses a transfer, what can we conclude about the thing labeled THEME with respect to the thing labeled GOAL? LING 138/238 Autumn 2004

  32. Deeper Semantics • From the WSJ… • He melted her reserve with a husky-voiced paean to her eyes. • If we label the constituents He and her reserve as the Melter and Melted, then those labels lose any meaning they might have had. • If we make them Agent and Theme then we can do more inference. LING 138/238 Autumn 2004

  33. Problems • What exactly is a role? • What’s the right set of roles? • Are such roles universals? • Are these roles atomic? • I.e. Agents • Animate, Volitional, Direct causers, etc • Can we automatically label syntactic constituents with thematic roles? LING 138/238 Autumn 2004

  34. Selection Restrictions • I want to eat someplace near campus • Using thematic roles we can now say that eat is a predicate that has an AGENT and a THEME • What else? • And that the AGENT must be capable of eating and the THEME must be something typically capable of being eaten LING 138/238 Autumn 2004

  35. As Logical Statements • For eat… • Eating(e) ^Agent(e,x)^ Theme(e,y)^Isa(y, Food) (adding in all the right quantifiers and lambdas) LING 138/238 Autumn 2004

  36. Back to WordNet • Use WordNet hyponyms (type) to encode the selection restrictions LING 138/238 Autumn 2004

  37. Selectional Restrictions as loose approximation of deeper semantics • Unfortunately, verbs are polysemous and language is creative… WSJ examples… • … ate glass on an empty stomach accompanied only by water and tea • you can’t eat gold for lunch if you’re hungry • … get it to try to eat Afghanistan LING 138/238 Autumn 2004

  38. Solutions • Eat glass • This is actually about an eating event, so might change our model of “Edible”. • Eat gold • Also about eating, and the can’t creates a scope that permits the THEME to not be edible • Eat Afghanistan • This is harder, its not really about eating at all LING 138/238 Autumn 2004

  39. Word Sense Disambiguation (WSD) • Given a word in context, decide which sense of the word this is. LING 138/238 Autumn 2004

  40. Selection Restrictions for WSD • Semantic selection restrictions can be used to help in WSD. • They can help disambiguate: • Ambiguous arguments to unambiguous predicates • Ambiguous predicates with unambiguous arguments • Ambiguity all around LING 138/238 Autumn 2004

  41. WSD and Selection Restrictions • Ambiguous arguments • Prepare a dish • Wash a dish • Ambiguous predicates • Serve Denver • Serve breakfast • Both • Serves vegetarian dishes LING 138/238 Autumn 2004

  42. Problems • As we saw earlier, selection restrictions are violated all the time. • This doesn’t mean that the sentences are ill-formed or preferred less than others. • This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated LING 138/238 Autumn 2004

  43. Supervised Machine Learning Approaches • Supervised machine learning approach: • a training corpus of words tagged in context with their sense • used to train a classifier that can tag words in new text • Just as we saw for part-of-speech tagging, speech recognition, statistical MT. LING 138/238 Autumn 2004

  44. WSD Tags • What’s a tag? • A dictionary sense? • For example, for WordNet an instance of “bass” in a text has 8 possible tags or labels (bass1 through bass8). LING 138/238 Autumn 2004

  45. WordNet Bass The noun ``bass'' has 8 senses in WordNet • bass - (the lowest part of the musical range) • bass, bass part - (the lowest part in polyphonic music) • bass, basso - (an adult male singer with the lowest voice) • sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae) • freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus) • bass, bass voice, basso - (the lowest adult male singing voice) • bass - (the member with the lowest range of a family of musical instruments) • bass -(nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes) LING 138/238 Autumn 2004

  46. Representations • Most supervised ML approaches require a very simple representation for the input training data. • Vectors of sets of feature/value pairs • I.e. files of comma-separated values • So our first task is to extract training data from a corpus with respect to a particular instance of a target word • This typically consists of a characterization of the window of text surrounding the target LING 138/238 Autumn 2004

  47. Representations • This is where ML and NLP intersect • If you stick to trivial surface features that are easy to extract from a text, then most of the work is in the ML system • If you decide to use features that require more analysis (say parse trees) then the ML part may be doing less work (relatively) if these features are truly informative LING 138/238 Autumn 2004

  48. Surface Representations • Collocational and co-occurrence information • Collocational • Features about words at specific positions near target word • Often limited to just word identity and POS • Co-occurrence • Features about words that occur anywhere in the window (regardless of position) • Typically limited to frequency counts LING 138/238 Autumn 2004

  49. Examples • Example text (WSJ) • An electric guitar and bass player stand off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps • Assume a window of +/- 2 from the target LING 138/238 Autumn 2004

  50. Examples • Example text • An electric guitar andbassplayer stand off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps • Assume a window of +/- 2 from the target LING 138/238 Autumn 2004

More Related