
An Intuitive Representation of Human Languages for Translation

Dive into linguistic representation problems and translation methods discussed at the Kalmár Workshop 2003. Explore syntax, semantics, open classes, and more to understand the complexity of NL grammars.


Presentation Transcript


  1. An Intuitive Representation of Human Languages for Translation • Gábor Prószéky, MorphoLogic & Faculty of Information Technology, Pázmány University • Kalmár Workshop, Szeged, October 1-2, 2003

  2. Contents • Some words on Prof. Kalmár’s activity in computational linguistics • Problems of human language description with formal tools • A new representation with patterns • Introduction to machine translation methods • Application of patterns to translation Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

  3. Kalmár & languages • Kalmár’s paper in formal language theory: “An Intuitive Representation of Context-Free Languages” • Kalmár’s activity in machine translation (conference in 1962): “Representation of Languages with the Help of Mathematical Structures”

  4. Linguistic representation problems of the 60’s • Dependency structure • Constituent structure • X-bar theory: X’ → (P) X (Q) • Related structures • Using transformations

  5. Structured symbols • Linguistic categories: atomic symbols • Not enough: subcategorization • Semantic features: ± alive, ... • Syntactic features: ± countable, ... • Rule sets instead of rules • ID/LP

  6. Feature structures • DAGs • Unification problems • Feature geometry, typed features • LFG, GPSG, HPSG • Parsing: CF-skeleton + features or feature structures only?
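
The unification operation mentioned on this slide can be illustrated with a minimal Python sketch. It assumes feature structures are plain nested dictionaries with atomic string or integer values; the function name `unify` and the sample features (`cat`, `agr`, `num`, `per`) are illustrative and do not come from the talk. Shared substructures (true DAG reentrancy) are deliberately left out.

```python
def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if two atomic values clash.
    Reentrancy (shared substructures in a DAG) is not modelled here.
    """
    result = dict(fs1)  # shallow copy, so the inputs are not modified
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)
            if sub is None:
                return None  # clash somewhere below this feature
            result[feature] = sub
        elif result[feature] != value:
            return None  # atomic clash (or atom versus structure)
    return result


noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "per": 3}}))
# {'cat': 'N', 'agr': {'num': 'sg', 'per': 3}}
print(unify(noun, {"agr": {"num": "pl"}}))
# None
```

Compatible structures merge into a more specific one, while a single conflicting value makes the whole unification fail.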

  7. Complexity of NL grammars • RG/FSA: not enough • CF/RTN: not enough • CS? • 0/ATN: Turing Machine • Transformations and metarules • Arguments for and against

  8. NL grammar formalisms • Competence and performance? • Kornai number (left-recursion, center-embedding, “respectively” construction) • Gradually from unrestricted to regular • (i) a^n b^n → a*b* (n is lost!) • (ii) a^n b^n → {ε, ab, aabb, aaabbb} • “Finitization” by length • No structure in FSA; finite systems, however, can produce structural output
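
The two approximations (i) and (ii) can be compared in a short Python sketch. It assumes the bound n ≤ 3 shown on the slide; the names `finitize_anbn` and `REGULAR_APPROX` are made up for illustration.

```python
import re


def finitize_anbn(max_n):
    """Approximation (ii): list the strings a^n b^n for n = 0..max_n."""
    return ["a" * n + "b" * n for n in range(max_n + 1)]


# Approximation (i): the regular language a*b* contains every a^n b^n,
# but it no longer checks that the two counts agree.
REGULAR_APPROX = re.compile(r"a*b*")

finite = finitize_anbn(3)
print(finite)                                 # ['', 'ab', 'aabb', 'aaabbb']
print(bool(REGULAR_APPROX.fullmatch("aab")))  # True: n is lost
print("aab" in finite)                        # False: the finite set keeps n
```

The regular approximation over-generates, whereas the finite list stays exact up to the chosen length and simply has nothing to say beyond it.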

  9. Syntax and semantics • Logical representations (e.g. λx.dog(x), λx.run(x)) • World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF) • Categorial grammar: early logical representations of syntax (Kalmár) • DCG: interpretation & representation • Rule-to-rule hypothesis

  10. Conflict handling • Lexicon meets syntax: who is right? • Lexicon: off-line info coming from past experiences • Which is more important in a specific situation?

  11. Open classes • Open vs. closed classes: that is, features can or cannot be overridden • Proper names, jabbers, folk etymology, loanwords, ... • Grammar of closed classes: minimal grammar

  12. Finite morphology • Finite patterns • Finite number of entries • Descriptions assigned to entries • Finite & open vs. infinite & closed • Underspecified entries for guessing

  13. Finite syntax • “Item and arrangement” (as in morphology) • “Arrangement” describes a rather free constituent order • Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ → ‘DAN’ • Cascades without loop
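
One possible reading of metawords and loop-free cascades is sketched below in Python. Everything in it is an assumption made for illustration: the one-letter category codes (D = Det, A = Adj, N = noun, V = verb), the output codes P (noun phrase) and S (clause), the two-level `CASCADE`, and the longest-match strategy. The slide only gives the example of ‘DAN’ as a metaword.

```python
# Each level is a finite meta-dictionary mapping a metaword (a string of
# category codes) to the code of the unit it forms. All entries are hypothetical.
CASCADE = [
    {"DAN": "P", "DN": "P", "AN": "P", "N": "P"},  # level 1: noun phrases
    {"PVP": "S", "PV": "S"},                       # level 2: clauses
]


def apply_level(codes, metadict):
    """Scan left to right, replacing the longest metaword found at each position."""
    out, i = [], 0
    longest = max(len(word) for word in metadict)
    while i < len(codes):
        for size in range(min(longest, len(codes) - i), 0, -1):
            chunk = codes[i:i + size]
            if chunk in metadict:
                out.append(metadict[chunk])
                i += size
                break
        else:
            out.append(codes[i])  # no metaword starts here; keep the code
            i += 1
    return "".join(out)


def parse(codes):
    """Run every level exactly once, in order: a cascade without loops."""
    for metadict in CASCADE:
        codes = apply_level(codes, metadict)
    return codes


print(parse("DANVDN"))  # S
print(parse("DAN"))     # P
```

Because each level is a finite dictionary lookup and no level is revisited, the whole analysis stays finite-state, in the same "item and arrangement" spirit as a morphological lexicon.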

  14. The “plastic box” • John is a boy. • “John” is a noun. • Go is a verb. • “Go” is a verb. • □ is a sign. • “□” is a sign. • □ is a □. (where □ is a “plastic box”)

  15. Real examples • (a) Unusual use: Go is a verb. POS [np] vs. POS [v] • (b) Metaphor: My car drinks a lot. ANIMATE [+] vs. ANIMATE [-] • (c) Unknown entry: Kalmár is a family name. POS [np]
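
The three cases can be mimicked with a small Python sketch of lexicon/context conflict handling. It assumes that the first feature value in each example is what the context requires and the second is what the lexicon stores; the toy `LEXICON`, its `open` flag, and the function `features_in_context` are hypothetical and serve only to illustrate the idea that open-class features may be overridden while closed-class ones may not.

```python
# Toy lexicon: "open" marks open-class entries whose features may be overridden.
LEXICON = {
    "go": {"open": True, "features": {"POS": "v"}},
    "car": {"open": True, "features": {"POS": "n", "ANIMATE": "-"}},
    "the": {"open": False, "features": {"POS": "det"}},
}


def features_in_context(word, required):
    """Combine stored features with those the surrounding pattern requires.

    Unknown words take the required features as a guess; open-class
    entries are overridden on conflict; closed-class conflicts return None.
    """
    entry = LEXICON.get(word.lower())
    if entry is None:
        return dict(required)  # (c) unknown entry: guess from context
    features = dict(entry["features"])
    conflict = any(f in features and features[f] != v for f, v in required.items())
    if conflict and not entry["open"]:
        return None  # closed classes cannot be overridden
    features.update(required)  # (a), (b): the context wins
    return features


print(features_in_context("Go", {"POS": "np"}))       # {'POS': 'np'}
print(features_in_context("car", {"ANIMATE": "+"}))   # {'POS': 'n', 'ANIMATE': '+'}
print(features_in_context("Kalmár", {"POS": "np"}))   # {'POS': 'np'}
print(features_in_context("the", {"POS": "np"}))      # None
```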

  16. Linguistic frames • Psychology: “Gestalt” • Morphological complex structures treated as frames by humans • Frames in AI: ‘shopping’, ‘walking’, ... • As ‘high-level parsing’ relates to ‘detailed on-line analysis’

  17. Translation of human languages • old problems (50’s) • direct (60’s) • interlingual (70’s) • transfer (80’s) • examples (90’s)

  18. Patterns: general linguistic information in lexicalized form • Short, fully specified patterns are: lexical entries • Longer, fully specified entries are: multi-word expressions • Partially underspecified patterns are: collocations, phrasal verbs, idioms • Totally underspecified patterns are: linguistic rules • Pattern/interpretation pairs: Translation Description Language

  19. The MetaMorpho principles • No single words, only contextual expressions (in the form of patterns) • Pattern pairs: input/interpretation structure pairs • Single pass: no separate transfer steps • Target structure generation: by-product of parsing

  20. Jabberwocky • ‘Twas brillig, and the slightytoves / Did gyre and gimble in the wabe: / All mimsy were the borogroves, / And the moneraths outgrabe.

  21. □ • ‘Twas □, and the □s / Did □ and □ in the □: / All □ were the □s, / And the □s □.

  22. Translation rules for Jabberwocky • ‘twas □ → □ volt • □, and □ → □, és □ • the □s did □ → a □ok □tak • □ and □ → □ és □ • in the □ → a □ban • all □ → teljesen □ • □ were the □s → □k voltak az □ok • the □s □ → a □ok □tek
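
A few of these pattern pairs can be exercised with the following Python sketch. It is not the MetaMorpho engine: it applies rules one after another by string substitution, whereas the talk describes a single-pass parser. The `□` character stands for the slide's "plastic box" slot (the glyph itself is an assumption), each slot is assumed to match exactly one word, and `RULES`, `compile_rule` and `translate` are illustrative names.

```python
import re

SLOT = "□"  # stands for the slide's placeholder; the glyph is an assumption

# Three source/target pattern pairs taken from the rule list above.
RULES = [
    ("in the □", "a □ban"),
    ("all □", "teljesen □"),
    ("□ and □", "□ és □"),
]


def compile_rule(source, target):
    """Build a regex from a slotted source pattern and a filler for the target."""
    literal_parts = [re.escape(part) for part in source.split(SLOT)]
    # Each slot becomes a one-word capture group; the lookbehind stops
    # a pattern from starting in the middle of a word.
    regex = re.compile(r"(?<!\w)" + r"(\w+)".join(literal_parts), re.IGNORECASE)

    def fill(match):
        words = iter(match.groups())
        # Copy the target, replacing each slot with the next captured word.
        return "".join(next(words) if ch == SLOT else ch for ch in target)

    return regex, fill


def translate(text):
    """Apply every rule in order; unknown words are carried over unchanged."""
    for source, target in RULES:
        regex, fill = compile_rule(source, target)
        text = regex.sub(fill, text)
    return text


print(translate("Did gyre and gimble in the wabe:"))
# Did gyre és gimble a wabeban:
print(translate("All mimsy"))
# teljesen mimsy
```

The nonsense words pass through untouched because only the closed-class material around them is specified in the patterns. The sketch does no vowel harmony or transliteration, so it produces "wabeban" where the final translation in the talk has "vébben".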

  23. □ • ‘Twas □, and the □s / Did □ and □ in the □: / All □ were the □s, / And the □s □. • □ volt, és a □ok / □tak és □tek a □ben: / teljesen □ voltak a □ok / és a □ok □tek.

  24. Translation of Jabberwocky • Dzsebervoki • Brillig volt, és a szlájtitóvok / gájertak és gimbeltek a vébben: / teljesen mimszik voltak a borogróvok / és a mónrátok autgrébtek.

  25. An intuitive representation... • X-bar based structures • Feature-based descriptions • Metarules (used off-line) • Rule-to-rule principle • Lexicon should be finite but open • Closed classes belong to the minimal grammar • Minimal grammar describes “basically” linguistic elements

  26. An intuitive representation... (cont’d) • Linguistic constructions can be described by finite patterns • A huge & finite description set is used rather than a limited & infinite grammar • In case of conflict, lexical information is either redundant or contradicts the actual description • Known constructions need no real-time analysis (Gestalt, frame)

  27. An intuitive representation... (cont’d) • “Broken” frames are analyzed in real time • A structural (source/target) pattern pair is assigned to every frame to be translated • Target structure is computed while parsing source structure
