
Deep Grammars in Hybrid Machine Translation



Presentation Transcript


  1. Deep Grammars in Hybrid Machine Translation Helge Dyvik University of Bergen

  2. Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian • A 4-year project (2002–2006) involving groups at: • The University of Oslo • The University of Bergen • NTNU (the Norwegian University of Science and Technology, Trondheim) • Cooperation with PARC (John Maxwell) and others

  3. The LOGON system Schematic architecture

  4. XLE: Xerox Linguistic Environment • A platform developed over more than 20 years at Xerox PARC (now PARC) • Developer: John Maxwell • LFG grammar development • Parsing • Generation • Transfer • Stochastic parse selection • Interaction with shallow methods

  5. An LFG analysis: Det regnet 'It rained'

  6. ParGram: The Parallel Grammar Project A long-term project (1993–) • Develops parallel grammars on XLE: • English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese • 'Parallel grammars' means parallel f-structures: a common inventory of features and common principles of analysis

  7. LOGON Analysis Modules [schematic diagram] Output-input column: input string → (tokenization, named entity recognition, compounds, morphology) → string of stems and tags → XLE Parser → c-structures → f-structures → MRSs. Supporting knowledge base: Norsk ordbank lexicon, NorGram, LFG lexicons (NKL-derived and hand-coded), lexical templates, syntactic rules, rule templates.

  8. Scope of NorGram • Lexicon: about 80 000 lemmas. In addition: automatically analyzed compounds, automatically recognized proper names, "guessed" nouns • Syntax: 229 complex rules, giving rise to about 48 000 arcs • Semantics: Minimal Recursion Semantics projections for all readings

  9. Coverage • Performance on an unknown corpus of newspaper text: 17 randomly selected pieces of text, limited to coherent text, comprising 1000 sentences, taken from 9 newspapers (Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys), from the editions of November 11th, 2005.

  10. The LOGON challenge: From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.

  11. Semantics for translation: two issues • The representational subset problem. Desirable: normalization to flat structures with unordered elements. • Complete and detailed semantic analyses may be unnecessary. Desirable: rich possibilities of underspecification.

  12. Basics of Minimal Recursion Semantics • Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag • A framework for the representation of semantic information • Developed in the context of HPSG and machine translation (Verbmobil) • Sources of inspiration: - Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope - Shake-and-bake translation (P. Whitelock): a bag of words as interface structure

  13. An MRS representation • is a bag of semantic entities (some corresponding to words, some not), each with a handle, • plus a bag of handle constraints allowing the underspecification of scope, • plus a handle and an index. • Each semantic entity is referred to as an Elementary Predication (EP). • Relations among EPs are captured by means of shared variables. • There are three elementary variable types: - handles (or 'labels') (h) - events (e) - referential indices (x)
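As a rough illustration of this structure (not the LOGON implementation; all names below are invented for the example), such a representation can be modelled as a small data type:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EP:
    """Elementary Predication: a predicate with a handle (label) and arguments."""
    label: str                          # its handle, e.g. "h7"
    pred: str                           # predicate name, e.g. "_ferge_n_rel"
    args: Dict[str, str] = field(default_factory=dict)  # role -> variable, e.g. {"ARG0": "x3"}

@dataclass
class MRS:
    top: str                            # the top handle
    index: str                          # the index, usually an event variable ("e...")
    rels: List[EP]                      # the bag of EPs
    hcons: List[Tuple[str, str, str]]   # handle constraints, e.g. ("h4", "qeq", "h6")
```

Variable names follow the convention just listed: handles "h...", events "e...", referential indices "x...".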

  14. From standard logical form to MRS: «Every ferry crosses some fjord» has two readings. Replace the logical operators with generalized quantifiers: every(variable, restriction, body), some(variable, restriction, body). The first reading (wide-scope every) is shown with its variable, restriction and body labelled.
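For reference, here is a reconstruction of the standard two readings and their generalized-quantifier forms (the slide itself showed them graphically):

```latex
\begin{align*}
\textbf{wide-scope \textit{every}:}\quad
  & \forall x\,(\mathrm{ferry}(x) \rightarrow \exists y\,(\mathrm{fjord}(y) \wedge \mathrm{cross}(x,y)))\\
  & = \mathrm{every}(x,\; \mathrm{ferry}(x),\; \mathrm{some}(y,\; \mathrm{fjord}(y),\; \mathrm{cross}(x,y)))\\[4pt]
\textbf{wide-scope \textit{some}:}\quad
  & \exists y\,(\mathrm{fjord}(y) \wedge \forall x\,(\mathrm{ferry}(x) \rightarrow \mathrm{cross}(x,y)))\\
  & = \mathrm{some}(y,\; \mathrm{fjord}(y),\; \mathrm{every}(x,\; \mathrm{ferry}(x),\; \mathrm{cross}(x,y)))
\end{align*}
```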

  15. Make the structure flat: • give each EP a handle • replace embedded EPs by their handles • collect all EPs on the same level (understood as conjunction)

  16. Make the structure flat: • give each EP a handle • replace embedded EPs by their handles • collect all EPs on the same level (understood as conjunction) The result can be resolved to wide scope for every or wide scope for some, or the scope can be left underspecified by means of handle constraints, as sketched below.
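A sketch of the resulting flat, scope-underspecified MRS, written as a plain Python dictionary so the example stands alone; handles, variables and predicate names are illustrative, following general MRS conventions rather than the exact LOGON output:

```python
# Flat MRS for "Every ferry crosses some fjord" with underspecified scope.
every_ferry_crosses_some_fjord = {
    "TOP": "h0",
    "INDEX": "e1",
    "RELS": [
        {"LBL": "h2",  "PRED": "every_q",      "ARG0": "x3", "RSTR": "h4",  "BODY": "h5"},
        {"LBL": "h6",  "PRED": "_ferry_n_rel", "ARG0": "x3"},
        {"LBL": "h7",  "PRED": "_cross_v_rel", "ARG0": "e1", "ARG1": "x3",  "ARG2": "x8"},
        {"LBL": "h9",  "PRED": "some_q",       "ARG0": "x8", "RSTR": "h10", "BODY": "h11"},
        {"LBL": "h12", "PRED": "_fjord_n_rel", "ARG0": "x8"},
    ],
    # qeq ("equal modulo quantifiers") handle constraints pin each quantifier's
    # restriction and the top, but leave the two BODY handles (h5, h11) open,
    # so both scopings remain derivable.
    "HCONS": [
        ("h0",  "qeq", "h7"),
        ("h4",  "qeq", "h6"),
        ("h10", "qeq", "h12"),
    ],
}
```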

  17. Norwegian translation: «Hver ferge krysser en fjord». The MRS shown as a feature structure (also adding event variables).

  18. Projecting MRS representations from f-structures «Katten sover» 'The cat sleeps'

  19. Projecting MRS representations from f-structures «Katten sover» 'The cat sleeps'
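The projected MRS itself was shown graphically on the following slides; as a rough reconstruction of the kind of structure one would expect for «Katten sover» (again in the plain-dictionary rendering, with illustrative handles and predicate names):

```python
katten_sover = {
    "TOP": "h0",
    "INDEX": "e1",
    "RELS": [
        {"LBL": "h2", "PRED": "def_q",       "ARG0": "x3", "RSTR": "h4", "BODY": "h5"},
        {"LBL": "h6", "PRED": "_katt_n_rel", "ARG0": "x3"},                # katten 'the cat'
        {"LBL": "h7", "PRED": "_sove_v_rel", "ARG0": "e1", "ARG1": "x3"},  # sover 'sleeps'
    ],
    "HCONS": [("h0", "qeq", "h7"), ("h4", "qeq", "h6")],
}
```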

  20. [figure: the mrs:: projection from the f-structure, shown as a feature structure]

  21. [figure: mrs:: projections, continued]

  22. Composition: top-level MRS with unions of HCONS and RELS.
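A minimal sketch of that composition step over the dictionary rendering used above, under the assumption that composition simply concatenates the daughters' bags (the function name is invented):

```python
def compose(top, index, daughters):
    """Build the top-level MRS by taking the unions (bag concatenations)
    of the daughters' RELS and HCONS."""
    return {
        "TOP": top,
        "INDEX": index,
        "RELS":  [ep for d in daughters for ep in d["RELS"]],
        "HCONS": [hc for d in daughters for hc in d["HCONS"]],
    }
```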

  23. Post-processing this structure brings us back to the LOGON MRS format: http://decentius.aksis.uib.no/logon/xle-mrs.xml

  24. Examples

  25. bil 'car' (as in «Han kjøpte bil» 'He bought [a] car'): no SPEC

  26. disse hans mange spørsmål 'these his many questions': multiple SPECs

  27. Han jaget barnet ut nakent 'He chased the child out naked'

  28. The Transfer Component Developer of the formalism: Stephan Oepen

  29. Example of transfer Source sentence: Henter han bilen sin? fetches he car.DEF POSS.REFL.SG.MASC 'Does he fetch his car?' Alternative reading: 'Does he fetch the one of the car?'

  30. Parse output:

  31. Choosing the first reading of Henter han bilen sin?

  32. Choosing the first reading of Henter han bilen sin? The variables have features. Interrogative is coded as [SF ques] on the event variable.

  33. Two of four transfer outputs

  34. Norwegian transfer input, and one of four English transfer outputs

  35. Generator output from the chosen transfer output

  36. Transfer formalism (Stephan Oepen) The form of a transfer rule: C = context, I = input, F = filter, O = output

  37. Simple example: a lexical transfer rule, transferring bekk into creek. No context, no filter; only the predicate is replaced.

  38. Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip. In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
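The slides show the actual rules as screenshots. As a toy model of the C/I/F/O scheme and of these two rules (everything below is illustrative Python, not the LOGON transfer formalism; variable binding is deliberately simplified, with the shared argument hard-coded as x5):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

EP = Dict[str, str]   # an Elementary Predication as a flat attribute-value dictionary

@dataclass
class TransferRule:
    input: List[EP]                                    # I: EPs consumed and rewritten
    output: Callable[[List[EP]], List[EP]]             # O: replacement EPs built from the match
    context: List[EP] = field(default_factory=list)    # C: must be present, not consumed
    filter: List[EP] = field(default_factory=list)     # F: must be absent

def matches(pattern: EP, ep: EP) -> bool:
    """An EP matches a pattern if it agrees on every attribute the pattern mentions."""
    return all(ep.get(k) == v for k, v in pattern.items())

def apply_rule(rule: TransferRule, rels: List[EP]) -> Optional[List[EP]]:
    """Apply a rule once to a bag of EPs; return the rewritten bag, or None if it does not fire."""
    if any(matches(f, ep) for f in rule.filter for ep in rels):
        return None                                    # filter matched: rule blocked
    if not all(any(matches(c, ep) for ep in rels) for c in rule.context):
        return None                                    # required context missing
    consumed: List[EP] = []
    for pattern in rule.input:
        hit = next((ep for ep in rels if matches(pattern, ep) and ep not in consumed), None)
        if hit is None:
            return None                                # input pattern not matched
        consumed.append(hit)
    rest = [ep for ep in rels if ep not in consumed]
    return rest + rule.output(consumed)

# Lexical rule: _bekk_n -> _creek_n (no context, no filter; only the predicate changes).
bekk_to_creek = TransferRule(
    input=[{"PRED": "_bekk_n"}],
    output=lambda m: [dict(m[0], PRED="_creek_n")],
)

# Context-restricted rule: with _tur_n as its second argument,
# _gå_v becomes _take_v (the light-verb construction "take a trip").
gaa_to_take = TransferRule(
    input=[{"PRED": "_gå_v", "ARG2": "x5"}],
    context=[{"PRED": "_tur_n", "ARG0": "x5"}],
    output=lambda m: [dict(m[0], PRED="_take_v")],
)

rels = [
    {"PRED": "pron", "ARG0": "x3"},
    {"PRED": "_gå_v", "ARG0": "e2", "ARG1": "x3", "ARG2": "x5"},
    {"PRED": "_tur_n", "ARG0": "x5"},
]
print(apply_rule(gaa_to_take, rels))   # _gå_v rewritten to _take_v; _tur_n kept as context
```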

  39. The SEM-I (Semantic Interface) A documentation of the external semantic interface for a grammar, crucial for the writer of transfer rules. To enforce maintenance of the SEM-I, LOGON parsing fails if every parse contains at least one predicate that is not in the SEM-I.
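A sketch of this check, assuming the SEM-I is available simply as a set of predicate names and each parse carries its MRS in the dictionary rendering used earlier (the function name and data layout are hypothetical):

```python
def semi_filter(parses, semi_predicates):
    """Keep only the parses whose MRS uses SEM-I predicates exclusively.
    If every parse contains an unknown predicate, the result is empty,
    i.e. parsing fails."""
    return [p for p in parses
            if all(ep["PRED"] in semi_predicates for ep in p["RELS"])]
```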

  40. A small section of the verb part of the NorGram SEM-I Size of the Norwegian SEM-I: slightly less than 6000 entries

  41. Parse Selection Parsing, transfer and generation may each give many solutions, leading to a fanout tree. The outputs at each of the three stages are statistically ranked.
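A schematic rendering of the fanout and the stage-wise ranking, assuming each stage returns a list of (score, result) candidates; parse, transfer and generate below are hypothetical stand-ins for the real components, and an n-best cut at every stage keeps the tree manageable:

```python
import heapq

def n_best(candidates, n):
    """Keep the n highest-scoring candidates; each candidate is a (score, result) pair."""
    return heapq.nlargest(n, candidates, key=lambda c: c[0])

def translate(sentence, parse, transfer, generate, n=5):
    """Parse -> transfer -> generate, ranking and pruning the fanout at each stage."""
    outputs = []
    for p_score, f_structure in n_best(parse(sentence), n):
        for t_score, mrs in n_best(transfer(f_structure), n):
            for g_score, string in n_best(generate(mrs), n):
                outputs.append((p_score + t_score + g_score, string))
    return n_best(outputs, n)
```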

  42. The Parsebanker Efficient treebank building by discriminants. Developer: Paul Meurer, Bergen. Predecessors in discriminant analysis: David Carter (1997); Stephan Oepen, Dan Flickinger et al. (2003). Example of a four-way ambiguity: Det regnet 'It rained' / 'It calculated' / 'That one calculated' / 'That rain'
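A small sketch of discriminant-based selection in the spirit of the Parsebanker (an illustration, not Paul Meurer's implementation): each analysis is described by a set of properties, a discriminant is a property that holds of some but not all analyses, and each accept/reject decision narrows the set.

```python
def discriminants(analyses):
    """Properties that hold of some but not all analyses - the informative choices."""
    all_props = set().union(*analyses.values())
    return {p for p in all_props
            if 0 < sum(p in props for props in analyses.values()) < len(analyses)}

def decide(analyses, discriminant, accepted):
    """Keep only the analyses consistent with the annotator's decision."""
    return {name: props for name, props in analyses.items()
            if (discriminant in props) == accepted}

# Toy rendering of the four-way ambiguity of "Det regnet":
analyses = {
    "It rained":           {"det=expletive",             "regne='rain' (verb)"},
    "It calculated":       {"det=pronoun",               "regne='calculate' (verb)"},
    "That one calculated": {"det=demonstrative pronoun", "regne='calculate' (verb)"},
    "That rain":           {"det=determiner",            "regnet='the rain' (noun)"},
}
# Accepting a single discriminant already leaves exactly one analysis:
print(decide(analyses, "det=expletive", accepted=True))   # {'It rained': ...}
```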

  43. [figure: analyses 1 and 2 of «Det regnet»]

  44. [figure: analyses 3 and 4 of «Det regnet»]

  45. Packed representations and discriminants (Paul Meurer)

  46. Clicking on one discriminant is in this case sufficient to select a unique solution.
