Generation

Presentation Transcript


  1. Generation

  2. Aims of this talk • Discuss MRS and LKB generation • Describe larger research programme: modular generation • Mention some interactions with other work in progress: • RMRS • SEM-I

  3. Outline of talk • Towards modular generation • Why MRS? • MRS and chart generation • Data-driven techniques • SEM-I and documentation

  4. Modular architecture • Language independent component • Meaning representation • Language dependent realization • string or speech output

  5. Desiderata for a portable realization module • Application independent • Any well-formed input should be accepted • No grammar-specific/conventional information should be essential in the input • Output should be idiomatic

  6. Architecture (preview) • [Diagram labels] • External LF • SEM-I • Internal LF • specialization modules • Chart generator • control modules • String

  7. Why MRS? • Flat structures • independence of syntax: conventional LFs partially mirror tree structure • manipulation of individual components: can ignore scope structure etc • lexicalised generation • composition by accumulation of EPs: robust composition • Underspecification

  8. An excursion: Robust MRS • Deep Thought: integration of deep and shallow processing via compatible semantics • All components construct RMRSs • Principled way of building robustness into deep processing • Requirements for consistency etc help human users too

  9. Extreme flattening of deep output • [Figure: two scoped trees over every, some, cat, dog1 and chase, one per quantifier scope order] • lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat_n(x), lb5:dog_n_1(y), lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e), ARG1(lb3,x), ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5

  10. Extreme Underspecification • Factorize deep representation to minimal units • Only represent what you know • Robust MRS • Separating relations • Separate arguments • Explicit equalities • Conventions for predicate names and sense distinctions • Hierarchy of sorts on variables
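As a rough illustration of what separating relations, arguments and equalities can look like as data, the sketch below encodes the RMRS from slide 9 in Python. The class names `Rel`, `Arg` and `Qeq` and the helper `args_of` are invented for this example; they are not part of the LKB or of any RMRS toolkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    label: str   # e.g. "lb3"
    pred: str    # e.g. "chase_v"
    var: str     # characteristic variable, e.g. "e"

@dataclass(frozen=True)
class Arg:
    role: str    # e.g. "ARG1", "RSTR"
    label: str   # label of the relation this argument belongs to
    value: str   # variable or hole filling the role

@dataclass(frozen=True)
class Qeq:
    hole: str
    label: str

# The RMRS from slide 9, factorized into three independent lists.
rels = [
    Rel("lb1", "every_q", "x"), Rel("lb2", "cat_n", "x"),
    Rel("lb5", "dog_n_1", "y"), Rel("lb4", "some_q", "y"),
    Rel("lb3", "chase_v", "e"),
]
args = [
    Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
    Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
    Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y"),
]
qeqs = [Qeq("h9", "lb2"), Qeq("h8", "lb5")]

def args_of(label, arg_list):
    """Collect the arguments attached to one relation label."""
    return {a.role: a.value for a in arg_list if a.label == label}

deep = args_of("lb3", args)   # a deep parser knows both arguments
shallow = args_of("lb3", [])  # a shallow parser may know none of them
```

Because arguments live in their own list, a shallow component can emit `rels` alone and leave `args` empty, which is one way to read "only represent what you know".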

  11. Chart generation with the LKB • Determine lexical signs from MRS • Determine possible rules contributing EPs (`construction semantics’: compound rule etc) • Instantiate signs (lexical and rule) according to variable equivalences • Apply lexical rules • Instantiate chart • Generate by parsing without string position • Check output against input

  12. Lexical lookup for generation • _like_v_1(e,x,y) – return lexical entry for sense 1 of verb like • temp_loc_rel(e,x,y) – returns multiple lexical entries • multiple relations in one lexical entry: e.g., who, where • entries with null semantics: heuristics
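A minimal sketch of predicate-indexed lookup, assuming a toy lexicon: the entry names (`like_v1`, `in_temp` and so on), the `LEXICON` table and the `lookup` function are hypothetical and do not reflect the LKB's actual data structures.

```python
# Toy lexicon: predicate name -> list of (entry name, expected arity).
LEXICON = {
    "_like_v_1": [("like_v1", 3)],
    "temp_loc_rel": [("in_temp", 3), ("on_temp", 3), ("at_temp", 3)],
}

def lookup(pred, args):
    """Return the names of entries whose predicate and arity match the EP."""
    return [name for name, arity in LEXICON.get(pred, []) if arity == len(args)]

one = lookup("_like_v_1", ("e", "x", "y"))      # a single entry
many = lookup("temp_loc_rel", ("e", "x", "y"))  # several candidate entries
none = lookup("_like_v_1", ("e", "x"))          # wrong arity: lookup fails
```

Entries with multiple relations or null semantics need more than a dictionary lookup, so they are left out of this sketch.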

  13. Instantiation of entries • _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy") • find locations corresponding to `x’s in all FSs • replace all `x’s with constant • repeat for `y’s etc • Also for rules contributing construction semantics • `Skolemization’ (misleading name ...)
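The replacement step can be sketched on a toy feature structure. This assumes nested dicts and lists with variable names as string leaves; real LKB feature structures are typed and reentrant, so this is only an approximation, and `instantiate`, the feature names and the constant names are all invented here.

```python
def instantiate(fs, bindings):
    """Return a copy of fs with every variable leaf replaced by its constant."""
    if isinstance(fs, dict):
        return {feat: instantiate(val, bindings) for feat, val in fs.items()}
    if isinstance(fs, list):
        return [instantiate(val, bindings) for val in fs]
    # Leaf: replace it if it is a bound variable, otherwise keep it.
    return bindings.get(fs, fs)

# Toy sign for the verb "like": the variable x appears in two places.
like_sign = {
    "SUBJ": {"INDEX": "x"},
    "COMPS": [{"INDEX": "y"}],
    "RELS": [{"PRED": "_like_v_1", "ARG0": "e", "ARG1": "x", "ARG2": "y"}],
}

instantiated = instantiate(like_sign, {"x": "c_kim", "y": "c_sandy"})
```

After instantiation the subject index and ARG1 both hold the same constant, so a later edge can only combine with this sign if its own index was instantiated to that constant too.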

  14. Lexical rule application • Lexical rules that contribute EPs only used if EP is in input • Inflectional rules will only apply if variable has the correct sort • Lexical rule application does morphological generation (e.g., liked, bought)
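To make the sort condition concrete, here is a deliberately crude sketch of an inflectional rule that fires only when the event variable carries the right tense. The function name, the `IRREGULAR_PAST` table and the spelling rule are assumptions for illustration, not the ERG's morphology.

```python
# Irregular forms are listed; everything else gets a naive regular suffix.
IRREGULAR_PAST = {"buy": "bought"}

def past_tense_rule(stem, event_tense):
    """Apply the past-tense rule only if the event variable is sorted as past."""
    if event_tense != "past":
        return None  # rule does not apply
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    return stem + "d" if stem.endswith("e") else stem + "ed"

liked = past_tense_rule("like", "past")
bought = past_tense_rule("buy", "past")
blocked = past_tense_rule("like", "pres")
```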

  15. Chart generation proper • Possible lexical signs added to a chart structure • Currently no indexing of chart edges • chart generation can use semantic indices, but current results suggest this doesn’t help • Rules applied as for chart parsing: edges checked for compatibility with input semantics (bag of EPs)

  16. Root conditions • Complete structures must consume all the EPs in the input MRS • Should check for compatibility of scopes • precise qeq matching is (probably) too strict • requiring exactly the same scopes is (probably) unrealistic and too slow
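The chart step and the root condition can be sketched together with a toy agenda-driven generator. Everything here is an assumption made for illustration: the `Edge` class, the two hard-coded rules (VP from V plus NP, S from NP plus VP) and the use of plain variable names in place of unification. It is not the LKB algorithm, only the same idea at small scale: edges record which input EPs they consume rather than string positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    cat: str            # "NP", "V", "VP" or "S"
    words: tuple        # surface words covered so far
    eps: frozenset      # indices of the input EPs this edge consumes
    index: str = ""     # NP: its instantiated variable
    subj: str = ""      # V/VP: variable expected for the subject
    obj: str = ""       # V: variable expected for the object

def combine(left, right):
    """Try both toy rules; edges sharing an EP can never combine."""
    if left.eps & right.eps:
        return None
    eps, words = left.eps | right.eps, left.words + right.words
    if left.cat == "V" and right.cat == "NP" and right.index == left.obj:
        return Edge("VP", words, eps, subj=left.subj)
    if left.cat == "NP" and right.cat == "VP" and left.index == right.subj:
        return Edge("S", words, eps)
    return None

def generate(lexical_edges, n_eps):
    chart, agenda = set(lexical_edges), list(lexical_edges)
    while agenda:
        edge = agenda.pop()
        for other in list(chart):
            for new in (combine(edge, other), combine(other, edge)):
                if new is not None and new not in chart:
                    chart.add(new)
                    agenda.append(new)
    # Root condition: a complete structure must consume every input EP.
    full = frozenset(range(n_eps))
    return sorted(" ".join(e.words) for e in chart if e.cat == "S" and e.eps == full)

# Input: _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy"), EPs numbered 0..2.
lexical = [
    Edge("V", ("likes",), frozenset({0}), subj="x", obj="y"),
    Edge("NP", ("Kim",), frozenset({1}), index="x"),
    Edge("NP", ("Sandy",), frozenset({2}), index="y"),
]
results = generate(lexical, 3)
```

The instantiated variables are what stop the generator from also producing the reversed sentence; without them both word orders would satisfy the bag-of-EPs check.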

  17. Generation failures due to MRS issues • Well-formedness check prior to input to generator (optional) • Lexical lookup failure: predicate doesn’t match entry, wrong arity, wrong variable types • Unwanted instantiations of variables • Missing EPs in input: syntax (e.g., no noun), lexical selection • Too many EPs in input: e.g., two verbs and no coordination

  18. Improving generation via corpus-based techniques • CONTROL: e.g. intersective modifier order: • Logical representation does not determine order • wet(x) & weather(x) & cold(x) • UNDERSPECIFIED INPUT: e.g., • Determiners: none/a/the/ • Prepositions: in/on/at

  19. Constraining generation for idiomatic output • Intersective modifier order: e.g., adjectives, prepositional phrases • Logical representation does not determine order • wet(x) & weather(x) & cold(x)

  20. Adjective ordering • Constraints / preferences • big red car • * red big car • cold wet weather • wet cold weather (OK, but dispreferred) • Difficult to encode in symbolic grammar

  21. Corpus-derived adjective ordering • ngrams perform poorly • Thater: direct evidence plus clustering • positional probability • Malouf (2000): memory-based learning plus positional probability: 92% on BNC
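A small sketch of the positional-probability idea, under a simplified reading: score each adjective by how often it comes first in the adjective pairs where it occurs, then order a new set by that score. The function names are invented and the pair list is toy data made up for this example, not corpus counts from the BNC or any published experiment.

```python
from collections import Counter

def positional_scores(pairs):
    """Estimate P(adjective is first) from observed (first, second) pairs."""
    first, total = Counter(), Counter()
    for a, b in pairs:
        first[a] += 1
        total[a] += 1
        total[b] += 1
    return {adj: first[adj] / total[adj] for adj in total}

def order(adjectives, scores):
    """Put adjectives with a higher first-position score earlier; unseen ones get 0.5."""
    return sorted(adjectives, key=lambda adj: -scores.get(adj, 0.5))

# Invented observations standing in for corpus evidence.
toy_pairs = [
    ("big", "red"), ("big", "old"), ("old", "red"),
    ("cold", "wet"), ("cold", "wet"), ("wet", "cold"),
]
scores = positional_scores(toy_pairs)
car = order(["red", "big"], scores)
weather = order(["wet", "cold"], scores)
```

Unlike raw n-grams, this generalizes to pairs never seen together, since each adjective carries its own score.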

  22. Underspecified input to generation • We bought a car on Friday • Accept: pron(x) & a_quant(y,h1,h2) & car(y) & buy(e_past,x,y) & on(e,z) & named(z,Friday) • and: pron(x) & general_q(y,h1,h2) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday) • And maybe: pron(x_1pl) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
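One way to picture accepting the underspecified form is to expand each general predicate into its more specific candidates and let later modules choose among them. The sketch below assumes a hypothetical `SPECIALIZATIONS` table; `the_quant` and `bare_quant` are invented names, and tense marking on the event variable is omitted.

```python
from itertools import product

# Hypothetical hierarchy: underspecified predicate -> specific candidates.
SPECIALIZATIONS = {
    "general_q": ["a_quant", "the_quant", "bare_quant"],
    "temp_loc": ["in", "on", "at"],
}

def expand(eps):
    """Enumerate every fully specified variant of a list of (pred, args) EPs."""
    options = [
        [(specific, args) for specific in SPECIALIZATIONS.get(pred, [pred])]
        for pred, args in eps
    ]
    return [list(choice) for choice in product(*options)]

underspecified = [
    ("pron", ("x",)),
    ("general_q", ("y", "h1", "h2")),
    ("car", ("y",)),
    ("buy", ("e", "x", "y")),
    ("temp_loc", ("e", "z")),
    ("named", ("z", "Friday")),
]
candidates = expand(underspecified)
```

Exhaustive expansion grows multiplicatively with each underspecified EP, which is part of the motivation for guessing the determiner and preposition statistically instead.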

  23. Guess the determiner • We went climbing in _ Andes • _ president of _ United States • I tore _ pyjamas • I tore _ duvet • George doesn’t like _ vegetables • We bought _ new car yesterday

  24. Determining determiners • Determiners are partly conventionalized, often predictable from local context • Translation from Japanese etc, speech prosthesis application • More `meaning-rich’ determiners assumed to be specified in the input • Minnen et al: 85% on WSJ (using TiMBL)

  25. Preposition guessing • Choice between temporal in/on/at • in the morning • in July • on Wednesday • on Wednesday morning • at three o’clock • at New Year • ERG uses hand-coded rules and lexical categories • Machine learning approach gives very high precision and recall on WSJ, good results on balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
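The hand-coded route can be sketched as a lookup from lexical category to preposition. The category names, both tables and `guess_preposition` are assumptions made for this example and are not the ERG's actual rules; the sketch covers only the phrases listed on the slide.

```python
# Toy lexical categories for temporal nouns (invented labels).
CATEGORY = {
    "morning": "part_of_day",
    "July": "month",
    "Wednesday": "day",
    "three o'clock": "clock_time",
    "New Year": "festival",
}

# One preposition per category.
PREPOSITION = {
    "part_of_day": "in",
    "month": "in",
    "day": "on",
    "clock_time": "at",
    "festival": "at",
}

def guess_preposition(nouns):
    """Choose by the first noun, so "Wednesday morning" patterns with "Wednesday"."""
    return PREPOSITION[CATEGORY[nouns[0]]]

in_morning = guess_preposition(["morning"])
on_wed_morning = guess_preposition(["Wednesday", "morning"])
at_three = guess_preposition(["three o'clock"])
```

A learned classifier would replace the two tables with features of the surrounding context, which is where the machine learning approach on the slide comes in.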

  26. SEM-I: semantic interface • Meta-level: manually specified `grammar’ relations (constructions and closed-class) • Object-level: linked to lexical database for deep grammars • Definitional: e.g. lemma+POS+sense • Linked test suites, examples, documentation

  27. SEM-I development • SEM-I eventually forms the `API’: stable, changes negotiated. • SEM-I vs Verbmobil SEMDB • Technical limitations of SEMDB • Too painful! • `Munging’ rules: external vs internal • SEM-I development must be incremental

  28. Role of SEM-I in architecture • Offline • Definition of `correct’ (R)MRS for developers • Documentation • Checking of test-suites • Online • In unifier/selector: reject invalid RMRSs • Patching up input to generation
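The online role (rejecting invalid input before generation) can be sketched as a table-driven check for the lookup failures listed on slide 17: unknown predicate, wrong arity, wrong variable type. The `SEM_I` table, its two entries and both function names are hypothetical; the only convention relied on is that variable names begin with their sort letter (e for events, x for instances).

```python
# Hypothetical SEM-I fragment: predicate -> expected sort of each argument.
SEM_I = {
    "_like_v_1": ("e", "x", "x"),
    "_car_n_1": ("x",),
}

def check_ep(pred, args):
    """Return an error message for one EP, or None if it is acceptable."""
    if pred not in SEM_I:
        return f"unknown predicate: {pred}"
    sorts = SEM_I[pred]
    if len(args) != len(sorts):
        return f"wrong arity for {pred}: expected {len(sorts)}, got {len(args)}"
    for arg, sort in zip(args, sorts):
        if not arg.startswith(sort):
            return f"wrong variable type for {pred}: {arg} is not of sort {sort}"
    return None

def validate(eps):
    """Collect the error messages for a list of (pred, args) EPs."""
    return [err for err in (check_ep(pred, args) for pred, args in eps) if err]

ok = validate([("_like_v_1", ("e2", "x4", "x7")), ("_car_n_1", ("x7",))])
bad = validate([
    ("_like_v_2", ("e2", "x4", "x7")),  # predicate not in the SEM-I
    ("_like_v_1", ("e2", "x4")),        # too few arguments
    ("_car_n_1", ("e9",)),              # event variable where an instance is expected
])
```

Returning messages rather than a bare boolean fits the offline use as well, where developers need to know why a test-suite item was rejected.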

  29. Goal: semi-automated documentation • [Diagram labels] • [incr tsdb()] and semantic test-suite • Lex DB • ERG • Documentation strings • Object-level SEM-I • Auto-generate examples • semi-automatic • Documentation • examples, autogenerated on demand • Meta-level SEM-I • autogenerate appendix

  30. Robust generation • SEM-I an important preliminary • check whether generator input is semantically compatible with grammars • Eventually: hierarchy of relations outside grammars, allowing underspecification • `fill-in’ of underspecified RMRS • exploit work on determiner guessing etc

  31. Architecture (again) • [Diagram labels] • External LF • SEM-I • Internal LF • specialization modules • Chart generator • control modules • String

  32. Interface • External representation • public, documented • reasonably stable • Internal representation • syntax/semantics interface • convenient for analysis • External/Internal conversion via SEM-I

  33. Guaranteed generation? • Given a well-formed input MRS/RMRS, with elementary predications found in SEM-I (and dependencies) • Can we generate a string? with input fix up? negotiation? • Semantically bleached lexical items: which, one, piece, do, make • Defective paradigms, negative polarity, anti-collocations etc?

  34. Next stages • SEM-I development • Documentation and test suite integration • Generation from RMRSs produced by shallower parser (or deep/shallow combination) • Partially fixed text in generation (cogeneration) • Further statistical modules: e.g., locational prepositions, other modifiers • More underspecification • Gradually increase flexibility of interface to generation
