
Putting Meaning Into Your Trees


  1. Putting Meaning Into Your Trees Martha Palmer Collaborators: Paul Kingsbury, Olga Babko-Malaya, Bert Xue, Scott Cotton Karin Kipper, Hoa Dang, Szuting Yi, Edward Loper, Jinying Chen, Tom Morton, William Schuler, Fei Xia, Joseph Rosenzweig, Dan Gildea, Christiane Fellbaum September 8, 2003

  2. Elusive nature of “meaning” • Natural Language Understanding • Natural Language Processing or Natural Language Engineering • Empirical techniques rule!

  3. Statistical Machine Translation results • CHINESE TEXT • The japanese court before china photo trade huge & lawsuit. • A large amount of the proceedings before the court dismissed workers. • japan’s court, former chinese servant industrial huge disasters lawsuit. • Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

  4. Leverage from shallow techniques? • Still need an approximation of meaning for accurate MT, IR, Q&A, IE • Sense tagging • Labeled dependency structures • What do we have as available resources? • What can we do with them?

  5. Outline • Introduction – need for semantics • Sense tagging – issues highlighted by Senseval1 • VerbNet • Senseval2 – groupings, impact on ITA • Automatic WSD, impact on scores • Proposition Bank • Framesets, automatic role labellers • Hierarchy of sense distinctions • Mapping VerbNet to PropBank

  6. WordNet - Princeton • On-line lexical reference (dictionary) • Words organized into synonym sets <=> concepts • Hypernyms (ISA), antonyms, meronyms (PART) • Useful for checking selectional restrictions • (doesn’t tell you what they should be) • Typical top nodes - 5 out of 25 • (act, action, activity) • (animal, fauna) • (artifact) • (attribute, property) • (body, corpus)

  7. WordNet – president, 6 senses 1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE … LEADER 2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE 3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE 4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER … LEADER 5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR … LEADER 6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP … POSITION
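As an aside, this kind of synset-and-hypernym lookup is easy to reproduce with NLTK's WordNet interface; a minimal sketch (output varies with the WordNet version, which is newer than the 1.7 pre-release used in the talk):

    # Sketch: list the noun synsets of "president" with their hypernyms.
    # Requires nltk and the wordnet corpus; glosses differ across WordNet versions.
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets('president', pos=wn.NOUN):
        lemmas = [lemma.name() for lemma in synset.lemmas()]
        hypernyms = [h.name() for h in synset.hypernyms()]
        print(synset.name(), lemmas, '->', hypernyms)
        print('   ', synset.definition())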

  8. Limitations to WordNet • Poor inter-annotator agreement (73%) • Just sense tags - no representations • Very little mapping to syntax • No predicate argument structure • no selectional restrictions • No generalizations about sense distinctions • No hierarchical entries

  9. SIGLEX98/SENSEVAL • Workshop on Word Sense Disambiguation • 54 attendees, 24 systems, 3 languages • 34 Words (Nouns, Verbs, Adjectives) • Both supervised and unsupervised systems • Training data, Test data • Hector senses - very corpus-based (mapping to WordNet) • lexical samples - instances, not running text • Inter-annotator agreement over 90% ACL-SIGLEX98, SIGLEX99, CHUM00

  10. Hector - bother, 10 senses • 1. intransitive verb - (make an effort), after negation, usually with to-infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. “About 70 percent of the shareholders did not bother to vote at all.” • 1.1 (can't be bothered), idiomatic, be unwilling to make the effort needed (to do something). Ex. “The calculations needed are so tedious that theorists cannot be bothered to do them.” • 2. vi; after neg; with ‘about’ or ‘with’; rarely cont – (of a person) to concern oneself (about something or someone). “He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor.” • 2.1 v-passive; with ‘about’ or ‘with’ – (of a person) to be concerned about or interested in (something). “The only thing I'm bothered about is the well-being of the club.”

  11. Mismatches between lexicons: Hector - WordNet, shake

  12. Levin classes (3100 verbs) • 47 top level classes, 193 second and third level • Based on pairs of syntactic frames. John broke the jar. / Jars break easily. / The jar broke. John cut the bread. / Bread cuts easily. / *The bread cut. John hit the wall. / *Walls hit easily. / *The wall hit. • Reflect underlying semantic components contact, directed motion, exertion of force, change of state • Synonyms, syntactic patterns (conative), relations

  13. Confusions in Levin classes? • Not semantically homogeneous • {braid, clip, file, powder, pluck, etc.} • Multiple class listings • homonymy or polysemy? • Alternation contradictions? • Carry verbs disallow the Conative but include {push, pull, shove, kick, draw, yank, tug} • these verbs also appear in the Push/Pull class, which does take the Conative

  14. Intersective Levin classes

  15. Regular Sense Extensions • John pushed the chair. +force, +contact • John pushed the chairs apart. +ch-state • John pushed the chairs across the room. +ch-loc • John pushed at the chair. -ch-loc • The train whistled into the station. +ch-loc • The truck roared past the weigh station. +ch-loc AMTA98, ACL98, TAG98

  16. Intersective Levin Classes • More syntactically and semantically coherent • sets of syntactic patterns • explicit semantic components • relations between senses • VERBNET www.cis.upenn.edu/verbnet
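One way to picture how intersective classes arise (a sketch of the idea only, with invented class ids and member lists, not the real Levin data): treat each class as a set of member verbs and intersect the sets.

    # Sketch: an intersective class as the intersection of Levin class memberships.
    # Class ids and member lists below are illustrative, not the actual classes.
    levin_classes = {
        'carry-11.4':   {'carry', 'push', 'pull', 'shove', 'kick', 'tug'},
        'push/pull-12': {'push', 'pull', 'shove', 'tug', 'yank', 'draw'},
        'throw-17.1':   {'kick', 'throw', 'toss', 'flip'},
    }

    def intersective_class(class_ids):
        """Verbs that are members of every one of the given classes."""
        return set.intersection(*(levin_classes[c] for c in class_ids))

    print(intersective_class(['carry-11.4', 'push/pull-12']))
    # {'push', 'pull', 'shove', 'tug'}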

  17. VerbNet • Computational verb lexicon • Clear association between syntax and semantics • Syntactic frames (LTAGs) and selectional restrictions (WordNet) • Lexical semantic information – predicate argument structure • Semantic components represented as predicates • Links to WordNet senses • Entries based on refinement of Levin Classes • Inherent temporal properties represented explicitly • during(E), end(E), result(E) TAG00, AAAI00, Coling00

  18. VerbNet Class entries: • Verb classes allow us to capture generalizations about verb behavior • Verb classes are hierarchically organized • Members have common semantic elements, thematic roles, syntactic frames and coherent aspect Verb entries: • Each verb can refer to more than one class (for different senses) • Each verb sense has a link to the appropriate synsets in WordNet (but not all senses of WordNet may be covered) • A verb may add more semantic information to the basic semantics of its class

  19. Hit class – hit-18.1 MEMBERS: [bang(1,3), bash(1), ..., hit(2,4,7,10), kick(3), ...] THEMATIC ROLES: Agent, Patient, Instrument SELECTIONAL RESTRICTIONS: Agent(int_control), Patient(concrete), Instrument(concrete) FRAMES and PREDICATES: (table of syntactic frames and semantic predicates shown on the slide)
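Entries like this can be inspected programmatically with NLTK's VerbNet corpus reader; a sketch (assumes the verbnet corpus is installed; the class ids returned depend on the VerbNet release, so hit-18.1 may appear under a slightly different id):

    # Sketch: look up VerbNet classes and thematic roles for the lemma "hit".
    from nltk.corpus import verbnet

    class_ids = verbnet.classids(lemma='hit')          # e.g. ['hit-18.1', ...]
    print(class_ids)

    vnclass = verbnet.vnclass(class_ids[0])            # XML element for the class
    roles = [r.get('type') for r in vnclass.findall('THEMROLES/THEMROLE')]
    print(roles)                                       # e.g. ['Agent', 'Patient', 'Instrument']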

  20. VERBNET

  21. VerbNet/WordNet

  22. Mapping WN-Hector via VerbNet SIGLEX99, LREC00

  23. SENSEVAL2 – ACL’01, Adam Kilgarriff, Phil Edmonds and Martha Palmer • All-words task • Lexical sample task • Languages: Czech, Basque, Dutch, Chinese, English, Estonian, Italian, Japanese, Korean, Spanish, Swedish

  24. English Lexical Sample - Verbs • Preparation for Senseval 2 • manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...) • WordNet (pre-release version 1.7) • To handle unclear sense distinctions: detect and eliminate redundant senses, detect and cluster closely related senses – NOT ALLOWED (changing the WordNet sense inventory was not permitted)

  25. WordNet – call, 28 senses 1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") -> LABEL 2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") -> TELECOMMUNICATE 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") -> LABEL

  26. WordNet – call, 28 senses 4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER 5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") -> UTTER 6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") -> MEET

  27. Groupings Methodology • Double blind groupings, adjudication • Syntactic Criteria (VerbNet was useful) • Distinct subcategorization frames • call him a bastard • call him a taxi • Recognizable alternations – regular sense extensions: • play an instrument • play a song • play a melody on an instrument

  28. Groupings Methodology (cont.) • Semantic Criteria • Differences in semantic classes of arguments • Abstract/concrete, human/animal, animate/inanimate, different instrument types,… • Differences in the number and type of arguments • Often reflected in subcategorization frames • John left the room. • I left my pearls to my daughter-in-law in my will. • Differences in entailments • Change of prior entity or creation of a new entity? • Differences in types of events • Abstract/concrete/mental/emotional/…. • Specialized subject domains

  29. WordNet – call, 28 senses: WN2, WN13, WN28 • WN15 • WN26 • WN3 • WN19 • WN4 • WN7 • WN8 • WN9 • WN1 • WN22 • WN20 • WN25 • WN18 • WN27 • WN5 • WN16 • WN6 • WN23 • WN12 • WN17, WN11 • WN10, WN14, WN21, WN24

  30. WordNet – call, 28 senses, groups: WN2, WN13, WN28 • WN15 • WN26 • WN3 • WN19 • WN4 • WN7 • WN8 • WN9 • WN1 • WN22 • WN20 • WN25 • WN18 • WN27 • WN5 • WN16 • WN6 • WN23 • WN12 • WN17, WN11 • WN10, WN14, WN21, WN24 • Group labels: Phone/radio, Bird or animal cry, Request, Label, Call a loan/bond, Challenge, Visit, Loud cry, Bid

  31. WordNet – call, 28 senses, Group 1 1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL 19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE 22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with 'Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

  32. Sense Groups: verb ‘develop’ – WN1 WN2 WN3 WN4 WN6 WN7 WN8 WN5 WN9 WN10 WN11 WN12 WN13 WN14 WN19 WN20

  33. Results – averaged over 28 verbs

  34. Maximum Entropy WSD – Hoa Dang (in progress) • Maximum entropy framework • combines different features with no assumption of independence • estimates conditional probability that W has sense X in context Y (where Y is a conjunction of linguistic features) • feature weights are determined from training data • weights produce a maximum entropy probability distribution
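For reference, the model behind this framework is the standard conditional maximum-entropy formulation (textbook form, not reproduced from the slide):

    % Conditional maxent model: probability that word W takes sense x in
    % context y, with binary features f_i(x, y) and learned weights \lambda_i.
    P(x \mid y) = \frac{1}{Z(y)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
    \qquad
    Z(y) = \sum_{x'} \exp\Big( \sum_i \lambda_i f_i(x', y) \Big)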

  35. Features used • Topical contextual linguistic feature for W: • presence of automatically determined keywords in S • Local contextual linguistic features for W: • presence of subject, complements • words in subject, complement positions, particles, preps • noun synonyms and hypernyms for subjects, complements • named entity tag (PERSON, LOCATION,..) for proper Ns • words within +/- 2 word window
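To make the feature inventory above concrete, here is a minimal sketch of how one occurrence of a target verb might be turned into a feature dictionary; every field name on inst and every feature name is hypothetical, chosen only to mirror the slide's list.

    # Sketch: build topical and local contextual features for one verb instance.
    def extract_features(inst):
        feats = {}
        # Topical feature: automatically determined keywords found in the sentence.
        for kw in inst['keywords_present']:
            feats['kw=' + kw] = 1
        # Local syntactic features: heads of subject and complement, if present.
        feats['subj=' + inst.get('subject_head', 'NONE')] = 1
        feats['obj=' + inst.get('object_head', 'NONE')] = 1
        # Named-entity tag (PERSON, LOCATION, ...) of a proper-noun subject.
        feats['subj_ne=' + inst.get('subject_ne', 'NONE')] = 1
        # Words within a +/- 2 word window of the target verb.
        for offset, word in inst['window'].items():
            feats['w%+d=%s' % (offset, word)] = 1
        return feats

    example = {
        'keywords_present': ['shareholders', 'vote'],
        'subject_head': 'percent',
        'object_head': 'NONE',
        'subject_ne': 'NONE',
        'window': {-2: 'did', -1: 'not', 1: 'to', 2: 'vote'},
    }
    print(extract_features(example))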

  36. Grouping improved sense identification for MaxEnt WSD • 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses • Most commonly confused senses suggest grouping: • (1) name, call -- assign a specified proper name to; "They called their son David" • (2) call -- ascribe a quality to or give a name that reflects a quality; "He called me a bastard" • (3) call -- consider or regard as being; "I would not call her beautiful" • (4) address, call -- greet, as with a prescribed form, title, or name; "Call me Mister"; "She calls him by his first name"
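A sketch of how grouped scores can be computed from fine-grained output: map gold and predicted WordNet senses onto their groups before comparing. The sense ids and group labels below are made up to echo the 'call' grouping; they are not the real mapping.

    # Map fine-grained senses to coarse groups, then score at the group level.
    sense_to_group = {
        'call.wn1': 'Label', 'call.wn3': 'Label', 'call.wn19': 'Label', 'call.wn22': 'Label',
        'call.wn2': 'Phone/radio', 'call.wn13': 'Phone/radio', 'call.wn28': 'Phone/radio',
    }

    def grouped_accuracy(gold, predicted):
        """Accuracy after collapsing fine-grained senses into their groups."""
        hits = sum(sense_to_group[g] == sense_to_group[p]
                   for g, p in zip(gold, predicted))
        return hits / len(gold)

    gold = ['call.wn1', 'call.wn2', 'call.wn19']
    pred = ['call.wn3', 'call.wn13', 'call.wn22']   # all fine-grained "errors"
    print(grouped_accuracy(gold, pred))             # 1.0 at the group level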

  37. Results – averaged over 28 verbs

  38. Results - first 5 Senseval2 verbs

  39. Summary of WSD • Choice of features is more important than choice of machine learning algorithm • Importance of syntactic structure (English WSD but not Chinese) • Importance of dependencies • Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages.

  40. Outline • Introduction – need for semantics • Sense tagging – issues highlighted by Senseval1 • VerbNet • Senseval2 – groupings, impact on ITA • Automatic WSD, impact on scores • Proposition Bank • Framesets, automatic role labellers • Hierarchy of sense distinctions • Mapping VerbNet to PropBank

  41. Proposition Bank: From Sentences to Propositions • Different surface forms map to one proposition: Powell met Zhu Rongji / Powell and Zhu Rongji met / Powell met with Zhu Rongji / Powell and Zhu Rongji had a meeting → Proposition: meet(Powell, Zhu Rongji), an instance of meet(Somebody1, Somebody2) (similarly for battle, wrestle, join, debate, consult, . . .) • When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. → meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane))
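A minimal sketch of how propositions like these could be represented in code; the Proposition class and the Arg0/Arg1 labels attached here are illustrative, not PropBank's file format.

    # Sketch: a proposition as a predicate plus labeled arguments; nested
    # propositions (like the return() argument of discuss) are just values.
    from dataclasses import dataclass, field

    @dataclass
    class Proposition:
        predicate: str
        args: dict = field(default_factory=dict)

    meet = Proposition('meet', {'Arg0': 'Powell', 'Arg1': 'Zhu Rongji'})
    ret = Proposition('return', {'Arg0': 'X', 'Arg1': 'the spy plane'})
    discuss = Proposition('discuss', {'Arg0': ['Powell', 'Zhu Rongji'], 'Arg1': ret})
    print(discuss)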

  42. Capturing semantic roles* • [SUBJ Charles] broke [ARG1 the LCD Projector]. • [SUBJ, ARG1 The windows] were broken by the hurricane. • [SUBJ, ARG1 The vase] broke into pieces when it toppled over. • The ARG1 role stays constant even though its grammatical function (object vs. subject) changes. *See also FrameNet, http://www.icsi.berkeley.edu/~framenet/

  43. A TreeBanked Sentence: Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
      (S (NP-SBJ Analysts)
         (VP have
             (VP been
                 (VP expecting
                     (NP (NP a GM-Jaguar pact)
                         (SBAR (WHNP-1 that)
                               (S (NP-SBJ *T*-1)
                                  (VP would
                                      (VP give
                                          (NP the U.S. car maker)
                                          (NP (NP an eventual (ADJP 30 %) stake)
                                              (PP-LOC in (NP the British company))))))))))))
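The bracketed Treebank annotation can be loaded directly into a tree object; a small sketch using NLTK (assuming nltk is installed):

    # Sketch: parse the bracketed Treebank string into an nltk.Tree object.
    from nltk import Tree

    parse = Tree.fromstring("""
    (S (NP-SBJ Analysts)
       (VP have (VP been (VP expecting
         (NP (NP a GM-Jaguar pact)
             (SBAR (WHNP-1 that)
                   (S (NP-SBJ *T*-1)
                      (VP would (VP give (NP the U.S. car maker)
                         (NP (NP an eventual (ADJP 30 %) stake)
                             (PP-LOC in (NP the British company))))))))))))
    """)
    print(parse.label())        # 'S'
    print(parse.leaves()[:6])   # ['Analysts', 'have', 'been', 'expecting', 'a', 'GM-Jaguar']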

  44. The same sentence, PropBanked:
      (S Arg0 (NP-SBJ Analysts)
         (VP have
             (VP been
                 (VP expecting
                     Arg1 (NP (NP a GM-Jaguar pact)
                              (SBAR (WHNP-1 that)
                                    (S Arg0 (NP-SBJ *T*-1)
                                       (VP would
                                           (VP give
                                               Arg2 (NP the U.S. car maker)
                                               Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                        (PP-LOC in (NP the British company))))))))))))
      expect(Analysts, GM-J pact)
      give(GM-J pact, US car maker, 30% stake)
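NLTK also ships a small PropBank sample with a corpus reader; a sketch of how annotations of this kind can be read programmatically (requires the propbank sample corpus; the instance printed is simply the first one in the sample, not the sentence above):

    # Sketch: reading PropBank predicate-argument annotations with NLTK.
    from nltk.corpus import propbank

    inst = propbank.instances()[0]       # first annotated predicate in the sample
    print(inst.roleset)                  # frameset id of the predicate, e.g. 'verb.01'
    print(inst.predicate)                # tree pointer to the verb itself
    for argloc, argid in inst.arguments: # each argument: (tree location, label)
        print(argid, argloc)             # e.g. ARG0 / ARG1 / ARGM-TMP plus its position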

  45. English PropBank – http://www.cis.upenn.edu/~ace/ • 1M words of Treebank over 2 years, May ’01–03 • New semantic augmentations • Predicate-argument relations for verbs • label arguments: Arg0, Arg1, Arg2, … • First subtask, 300K word financial subcorpus (12K sentences, 29K predicates, 1700 lemmas) • Spin-off: Guidelines • FRAMES FILES - (necessary for annotators) • 3500+ verbs with labeled examples, rich semantics, 118K predicates

  46. Frames Example: expect Roles: Arg0: expecter Arg1: thing expected Example: Transitive, active: Portfolio managers expect further declines in interest rates. Arg0: Portfolio managers REL: expect Arg1: further declines in interest rates

  47. Frames File example: give Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation
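The released frames files are XML, but their content can be mirrored in a small data structure; a sketch for the entry above (the field names and the roleset id give.01 are shown for illustration, not as the exact XML schema):

    # Sketch: the "give" frames-file entry mirrored as a plain dictionary.
    give_frame = {
        'lemma': 'give',
        'roleset': 'give.01',
        'roles': {
            'Arg0': 'giver',
            'Arg1': 'thing given',
            'Arg2': 'entity given to',
        },
        'example': {                      # the double-object example from the slide
            'text': 'The executives gave the chefs a standing ovation.',
            'Arg0': 'The executives',
            'REL':  'gave',
            'Arg2': 'the chefs',
            'Arg1': 'a standing ovation',
        },
    }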

  48. How are arguments numbered? • Examination of example sentences • Determination of required / highly preferred elements • Sequential numbering, Arg0 is typical first argument, except • ergative/unaccusative verbs (shake example) • Arguments mapped for "synonymous" verbs

  49. Trends in Argument Numbering • Arg0 = agent • Arg1 = direct object / theme / patient • Arg2 = indirect object / benefactive / instrument / attribute / end state • Arg3 = start point / benefactive / instrument / attribute • Arg4 = end point

  50. Additional tags (arguments or adjuncts?) • Variety of ArgM’s (Arg#>4): • TMP - when? • LOC - where at? • DIR - where to? • MNR - how? • PRP - why? • REC - himself, themselves, each other • PRD - this argument refers to or modifies another • ADV - others
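For use in a labeller or scorer, the modifier tags can be kept in a simple lookup table; a sketch mirroring the list above:

    # Sketch: ArgM function tags and the questions/uses they correspond to.
    ARGM_TAGS = {
        'TMP': 'when?',
        'LOC': 'where at?',
        'DIR': 'where to?',
        'MNR': 'how?',
        'PRP': 'why?',
        'REC': 'reciprocals: himself, themselves, each other',
        'PRD': 'secondary predication: refers to or modifies another argument',
        'ADV': 'other adverbial modifiers',
    }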
