
A Statistical Semantic Parser that Integrates Syntax and Semantics

Presentation Transcript


  1. A Statistical Semantic Parser that Integrates Syntax and Semantics Ruifang Ge and Raymond J. Mooney June 29, 2005

  2. Motivation • Most recent work in semantic parsing has focused on semantic role labeling [Gildea & Jurafsky, 2002] • Identifying the semantic relations, or semantic roles, filled by constituents of a sentence within the semantic frame of a target word, e.g. [Giver John] gave [Entity-given-to Mary] [Thing-given a pen] • Deep semantic parsing • NL sentence → complete formal meaning representation (MR) • Directly executable

  3. CLang: the RoboCup Coach Language • RoboCup is a simulated robot soccer competition • Coachable teams can take advice on how to play the game • Coaching instructions are provided in a formal meaning representation language called CLang • Example — NL: "If the ball is in our penalty area, all our players except player 4 should stay in our half" MR: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))

  4. Outline • System overview • Integrated Parsing Model • Experimental Evaluation • Related Work • Future Work and Conclusion

  5. SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations • Based on a fairly standard approach to compositional semantics [Jurafsky and Martin, 2000] • A statistical parser is used to generate a semantically augmented parse tree (SAPT) • Augment Collins' head-driven model 2 (Bikel's implementation, 2004) to incorporate semantic labels • Translate the SAPT into a complete formal meaning representation (MR) • Example SAPT for "our player 2 has the ball": S-bowner → NP-player (PRP$-team "our", NN-player "player", CD-unum "2") VP-bowner (VB-bowner "has", NP-null (DT-null "the", NN-null "ball")); MR: bowner(player(our,2))

  6. Overview of SCISSOR • [Diagram] Training: SAPT training examples → learner → integrated semantic parser • Testing: NL sentence → integrated semantic parser → SAPT → ComposeMR → MR

  7. ComposeMR • [Tree: the SAPT for "our player 2 has the ball" showing only its semantic labels — bowner over (player: team, player, unum) and (bowner: bowner, null (null, null))]

  8. ComposeMR • [Tree: the same SAPT with argument slots made explicit — bowner(_) over (player(_,_): team, player(_,_), unum) and (bowner(_): bowner(_), null)]

  9. ComposeMR • [Tree: bottom-up composition — player(team,unum) is filled to give player(our,2), then bowner(player) is filled to give the complete MR bowner(player(our,2))] (a code sketch of this composition follows below)
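
The bottom-up composition in slides 7-9 can be sketched in a few lines of code. The node representation, the ARITY table, and the head-child test below are illustrative assumptions, not the authors' implementation; in particular, the real ComposeMR maps words such as "our" and "2" to MR constants via their semantic labels (team, unum), which this toy skips by putting the constants directly on the leaves.

```python
# A minimal sketch of bottom-up MR composition over a SAPT (not the authors' code).
# Assumptions: 'null' nodes contribute nothing; leaves carry either a constant
# ('our', '2') or a predicate name as their semantic label; a hypothetical ARITY
# table says how many argument slots each CLang predicate has.

from dataclasses import dataclass, field
from typing import List, Optional

ARITY = {"bowner": 1, "player": 2}   # illustrative arities for the example predicates

@dataclass
class SAPTNode:
    sem: str                                        # semantic label: 'bowner', 'our', 'null', ...
    children: List["SAPTNode"] = field(default_factory=list)

def compose_mr(node: SAPTNode) -> Optional[str]:
    """Return the MR string for this subtree, or None if it is semantically vacuous."""
    if node.sem == "null":
        return None
    if not node.children:                           # leaf: constant or bare predicate
        return node.sem
    # Child MRs not headed by this node's own predicate fill its argument slots,
    # left to right (treating 'same predicate' as the head child is a simplification).
    args = []
    for child in node.children:
        mr = compose_mr(child)
        if mr is not None and not mr.startswith(node.sem):
            args.append(mr)
    if node.sem in ARITY and args:
        return f"{node.sem}({','.join(args[:ARITY[node.sem]])})"
    return node.sem if not args else args[0]

# "our player 2 has the ball"  ->  bowner(player(our,2))
tree = SAPTNode("bowner", [
    SAPTNode("player", [SAPTNode("our"), SAPTNode("player"), SAPTNode("2")]),
    SAPTNode("bowner", [SAPTNode("bowner"),
                        SAPTNode("null", [SAPTNode("null"), SAPTNode("null")])]),
])
print(compose_mr(tree))                             # bowner(player(our,2))
```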

  10. Outline • System overview • Integrated Parsing Model • Experimental Evaluation • Related Work • Future Work and Conclusion

  11. Collins' Head-Driven Model 2 • A generative, lexicalized model • Each node in the tree has a syntactic label and is also lexicalized with its head word • Example lexicalized tree for "our player 2 has the ball": S(has) → NP(player) (PRP$ "our", NN "player", CD "2") VP(has) (VB "has", NP(ball) (DT "the", NN "ball"))

  12. Modeling Rule Productions as Markov Processes • Generate the head child of S(has): VP(has) • Ph(VP | S, has)

  13. Modeling Rule Productions as Markov Processes • Generate the left and right subcat frames, {NP} and { } • Ph(VP | S, has) × Plc({NP} | S, VP, has) × Prc({} | S, VP, has)

  14. Modeling Rule Productions as Markov Processes • Generate the left dependent NP(player) • Ph(VP | S, has) × Plc({NP} | S, VP, has) × Prc({} | S, VP, has) × Pd(NP(player) | S, VP, has, LEFT, {NP})

  15. Modeling Rule Productions as Markov Processes • NP(player) satisfies the left subcat frame, which becomes { } • Ph(VP | S, has) × Plc({NP} | S, VP, has) × Prc({} | S, VP, has) × Pd(NP(player) | S, VP, has, LEFT, {NP})

  16. Modeling Rule Productions as Markov Processes • Generate STOP on the left • Ph(VP | S, has) × Plc({NP} | S, VP, has) × Prc({} | S, VP, has) × Pd(NP(player) | S, VP, has, LEFT, {NP}) × Pd(STOP | S, VP, has, LEFT, {})

  17. Modeling Rule Productions as Markov Processes • Generate STOP on the right, completing the production S(has) → NP(player) VP(has) (sketched in code below) • Ph(VP | S, has) × Plc({NP} | S, VP, has) × Prc({} | S, VP, has) × Pd(NP(player) | S, VP, has, LEFT, {NP}) × Pd(STOP | S, VP, has, LEFT, {}) × Pd(STOP | S, VP, has, RIGHT, {})
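
For readers who prefer code to slide fragments, the running product in slides 12-17 can be written as one function. The probability functions Ph/Plc/Prc/Pd are passed in as placeholder callables; the count tables, backoff, and distance features of the real model (and of Bikel's implementation) are omitted, so this is only a sketch of the factorization.

```python
# Sketch of how Model 2 scores one production, e.g. S(has) -> NP(player) VP(has):
# generate the head, then the left/right subcat frames, then each dependent
# (conditioned on the remaining subcat frame), then STOP on both sides.
# The probability functions are placeholders for smoothed, learned estimates.

def score_production(Ph, Plc, Prc, Pd,
                     parent, head_word, head,
                     left_deps, right_deps, lc, rc):
    prob = Ph(head, parent, head_word)                       # Ph(VP | S, has)
    prob *= Plc(frozenset(lc), parent, head, head_word)      # Plc({NP} | S, VP, has)
    prob *= Prc(frozenset(rc), parent, head, head_word)      # Prc({}   | S, VP, has)
    for side, deps, subcat in (("LEFT", left_deps, set(lc)),
                               ("RIGHT", right_deps, set(rc))):
        for dep_label, dep_word in deps:
            prob *= Pd((dep_label, dep_word), parent, head, head_word,
                       side, frozenset(subcat))
            subcat.discard(dep_label)                        # the argument satisfies the frame
        prob *= Pd("STOP", parent, head, head_word, side, frozenset(subcat))
    return prob

# Placeholder estimates (a constant) just to show the call for the example production.
const = lambda *event: 0.5
p = score_production(const, const, const, const,
                     parent="S", head_word="has", head="VP",
                     left_deps=[("NP", "player")], right_deps=[],
                     lc={"NP"}, rc=set())
print(p)   # 0.5 ** 6 = 0.015625 with the placeholder probabilities
```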

  18. Integrating Semantics into the Model • Use the same Markov processes • Add a semantic label to each node: S(has) becomes S-bowner(has), NP(player) becomes NP-player(player), NP(ball) becomes NP-null(ball), and so on for the example tree • Add semantic subcat frames • Give semantic subcategorization preferences • e.g. bowner takes a player as its argument

  19. Adding Semantic Labels into the Model • Generate the head child of S-bowner(has): VP-bowner(has) • Ph(VP-bowner | S-bowner, has)

  20. Adding Semantic Labels into the Model • Generate the combined subcat frames, {NP}-{player} and { }-{ } • Ph(VP-bowner | S-bowner, has) × Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{} | S-bowner, VP-bowner, has)

  21. Adding Semantic Labels into the Model • Generate the left dependent NP-player(player) • Ph(VP-bowner | S-bowner, has) × Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{} | S-bowner, VP-bowner, has) × Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player})

  22. Adding Semantic Labels into the Model • NP-player(player) satisfies the left subcat frame, which becomes { }-{ } • Ph(VP-bowner | S-bowner, has) × Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{} | S-bowner, VP-bowner, has) × Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player})

  23. Adding Semantic Labels into the Model • Generate STOP on the left • Ph(VP-bowner | S-bowner, has) × Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{} | S-bowner, VP-bowner, has) × Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player}) × Pd(STOP | S-bowner, VP-bowner, has, LEFT, {}-{})

  24. Adding Semantic Labels into the Model • Generate STOP on the right, completing the production (a toy code illustration follows below) • Ph(VP-bowner | S-bowner, has) × Plc({NP}-{player} | S-bowner, VP-bowner, has) × Prc({}-{} | S-bowner, VP-bowner, has) × Pd(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player}) × Pd(STOP | S-bowner, VP-bowner, has, LEFT, {}-{}) × Pd(STOP | S-bowner, VP-bowner, has, RIGHT, {}-{})
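
Slides 19-24 run exactly the same generative process, only over combined labels. A toy, self-contained version of the augmented product for S-bowner(has) → NP-player(player) VP-bowner(has) is sketched below; the uniform 0.5 probabilities are placeholders, not learned parameters.

```python
# Toy illustration of the augmented product from slides 19-24. Labels are
# (syntactic, semantic) pairs and each subcat frame pairs a syntactic frame
# with a semantic one, e.g. {NP}-{player}. Probabilities are placeholders.

def toy_prob(*event):
    return 0.5            # a real model reads these from smoothed count tables

parent, head_word = ("S", "bowner"), "has"
head              = ("VP", "bowner")
left_subcat       = (frozenset({"NP"}), frozenset({"player"}))   # {NP}-{player}
right_subcat      = (frozenset(), frozenset())                   # {}-{}
empty             = (frozenset(), frozenset())

prob  = toy_prob("head", head, parent, head_word)                                # Ph
prob *= toy_prob("left-subcat", left_subcat, parent, head, head_word)           # Plc
prob *= toy_prob("right-subcat", right_subcat, parent, head, head_word)         # Prc
prob *= toy_prob("dep", (("NP", "player"), "player"),
                 parent, head, head_word, "LEFT", left_subcat)                  # Pd(NP-player(player) | ...)
prob *= toy_prob("dep", "STOP", parent, head, head_word, "LEFT", empty)         # Pd(STOP | ..., LEFT)
prob *= toy_prob("dep", "STOP", parent, head, head_word, "RIGHT", right_subcat) # Pd(STOP | ..., RIGHT)
print(prob)   # 0.5 ** 6 with the placeholder probabilities
```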

  25. Smoothing • Each label in a SAPT is the combination of a syntactic label and a semantic label • This increases data sparsity • Use the chain rule to break the parameters down: Ph(H | P, w) = Ph(Hsyn, Hsem | P, w) = Ph(Hsyn | P, w) × Ph(Hsem | P, w, Hsyn) • Details in the paper (a small illustrative estimator follows below)
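
The decomposition on this slide can be turned into a small estimator. The count-table layout and the add-one backoff below are assumptions made for illustration; the paper's actual smoothing scheme differs, so treat this only as a reading of the formula.

```python
# Minimal sketch of the factored head estimate (illustrative smoothing, not the paper's):
#   P(Hsyn, Hsem | P, w) = P(Hsyn | P, w) * P(Hsem | P, w, Hsyn)
# Each factor is read off count tables with add-one smoothing over the label vocabularies.

from collections import Counter

def p_head(h_syn, h_sem, parent, word,
           ctx_counts: Counter,   # counts of (P, w)
           syn_counts: Counter,   # counts of (Hsyn, P, w)
           sem_counts: Counter,   # counts of (Hsem, P, w, Hsyn)
           n_syn_labels: int, n_sem_labels: int, alpha: float = 1.0) -> float:
    p_syn = (syn_counts[(h_syn, parent, word)] + alpha) / \
            (ctx_counts[(parent, word)] + alpha * n_syn_labels)
    p_sem = (sem_counts[(h_sem, parent, word, h_syn)] + alpha) / \
            (syn_counts[(h_syn, parent, word)] + alpha * n_sem_labels)
    return p_syn * p_sem

# e.g. p_head("VP", "bowner", "S-bowner", "has", Counter(), Counter(), Counter(), 40, 20)
# returns a uniform estimate here, since the toy count tables are empty.
```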

  26. Outline • System overview • Integrated Parsing Model • Experimental Evaluation • Related Work • Future Work and Conclusion

  27. Experimental Corpora • CLang • 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition • Formal instruction (MR) → NL sentences (4 annotators) → SAPT • 22.52 words on average • Geoquery [Zelle & Mooney, 1996] • 250 queries on the given U.S. geography database • NL sentences → MR → SAPT • 6.87 words on average

  28. Experimental Methodology • Evaluated using standard 10-fold cross validation • Correctness • CLang: the MR exactly matches the correct representation • Geoquery: the resulting query retrieves the same answer as the correct representation when submitted to the database • Metrics: precision, recall, and F1 (learning curves below) • Example: for "Our player 2 has the ball" the correct MR is bowner(player(our,2)); an output such as bowner(player(our,4)) is counted as incorrect
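
As a concrete reading of the metrics, a small sketch follows. It assumes the usual definitions in this line of work (precision over the MRs the parser produces, recall over all test sentences, F1 as their harmonic mean); the counts in the example call are made up for illustration.

```python
# Hedged sketch of the evaluation metrics (assumed standard definitions, not quoted
# from the paper): precision = correct MRs / MRs produced, recall = correct MRs /
# total test sentences, F1 = harmonic mean of the two.

def evaluate(n_correct: int, n_parsed: int, n_sentences: int):
    precision = n_correct / n_parsed if n_parsed else 0.0
    recall = n_correct / n_sentences if n_sentences else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative counts only: 210 correct MRs out of 240 produced on a 300-sentence set.
print(evaluate(210, 240, 300))   # (0.875, 0.7, 0.777...)
```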

  29. Compared Systems • CHILL [Tang & Mooney, 2001] • Learn control rules for parsing based on Inductive Logic Programming (ILP) • SILT [Kate et al., 2005] • Learn pattern-based transformation rules • SILT-string • SILT-tree • GEOBASE • The original hand-built parser on Geoquery

  30. Precision Learning Curve for CLang

  31. Recall Learning Curve for CLang

  32. F1 Measure Learning Curve for CLang

  33. Precision Learning Curve for Geoquery

  34. Recall Learning Curve for Geoquery

  35. F1 Measure Learning Curve for Geoquery

  36. Related Work • PRECISE [Popescu, 2003] • Designed to work specifically on NL database interfaces • [Miller et al., 1996; Miller et al., 2000] use a similar approach to train a statistical parser that integrates syntax and semantics • Their model does not utilize subcat information • Task: information extraction

  37. Outline • System overview • Integrated Parsing Model • Experimental Evaluation • Related Work • Future Work and Conclusion

  38. Future Work and Conclusion • Explore methods that can automatically generate SAPTs to minimize the annotation effort • By augmenting a state-of-the-art statistical parsing model to include semantics, SCISSOR learns a statistical parser that produces a SAPT, which is then used to compositionally generate a formal MR • Experimental results on CLang and Geoquery showed that SCISSOR generally produces more accurate semantic representations than several previous approaches.

  39. The End Data: http://www.cs.utexas.edu/users/ml/nldata.html

  40. MR size in the two domains • Measured as the average number of tokens per MR • CLang: 14.24 • Geoquery: 5.32
