An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic & Faculty of Information Technology, Pázmány University Kalmár Workshop Szeged, October 1-2, 2003
Contents • Some words on Prof. Kalmár’s activity in computational linguistics • Problems of human language description with formal tools • A new representation with patterns • Introduction to machine translation methods • Application of patterns to translation Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation
Kalmár & languages • Kalmár’s paper in formal language theory: „An Intuitive Representation of Context-Free Languages” • Kalmár’s activity in machine translation (conference in 1962): „Representation of Languages with the Help of Mathematical Structures”
Linguistic representation problems of the 60’s • Dependency structure • Constituent structure • X-bar theory: X’ (P) X (Q) • Related structures • Using transformations
Structured symbols • Linguistic categories: atomic symbols • Not enough: subcategorization • Semantic features: ± alive, ... • Syntactic features: ± countable, ... • Rule sets instead of rules • ID/LP
Feature structures • DAGs • Unification problems • Feature geometry, typed features • LFG, GPSG, HPSG • Parsing: CF-skeleton + features or feature structures only?
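As an illustration of the unification step that feature-structure formalisms rely on, here is a minimal Python sketch. It assumes feature structures are plain nested dictionaries with no shared (reentrant) substructures, so it covers trees rather than full DAGs. The function name `unify` and the sample features are hypothetical and do not come from the slides.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values).

    Returns the merged structure, or None when two atomic values clash.
    Assumption: no reentrancy, so structures are trees, not general DAGs.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None          # clash somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_val      # feature only present in b
        return result
    # Atomic values (or a dict against an atom) unify only if they are equal.
    return a if a == b else None


noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "per": 3}}))
# {'cat': 'N', 'agr': {'num': 'sg', 'per': 3}}
print(unify(noun, {"agr": {"num": "pl"}}))
# None
```

Returning `None` for failure keeps the sketch short; a fuller implementation would need a distinct failure value and structure sharing.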
Complexity of NL grammars • RG/FSA: not enough • CF/RTN: not enough • CS ? • 0/ATN: Turing Machine • Transformations and metarules • Arguments for and against
NL grammar formalisms • Competence and performance? • Kornai number (left-recursion, center-embedding, “respectively” construction) • Gradually from unrestricted to regular • (i) a^n b^n → a*b* (n is lost!) • (ii) a^n b^n → {ε, ab, aabb, aaabbb} • “Finitization” by length • No structure in FSA; finite systems, however, can produce structural output
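The two regular approximations of a^n b^n on this slide can be made concrete with a short Python sketch. The bound of 3 matches the set listed in (ii); the function names are hypothetical.

```python
import re


def anbn_bounded(max_n):
    """(ii) Finitization by length: list a^n b^n for n = 0..max_n."""
    return ["a" * n + "b" * n for n in range(max_n + 1)]


def in_a_star_b_star(s):
    """(i) Regular superset a*b*: the link between the two counts is lost."""
    return re.fullmatch(r"a*b*", s) is not None


finite_language = anbn_bounded(3)
print(finite_language)            # ['', 'ab', 'aabb', 'aaabbb']
print(in_a_star_b_star("aab"))    # True, although "aab" is not of the form a^n b^n
print("aab" in finite_language)   # False
```

Approximation (i) overgenerates, while (ii) stays exact up to the chosen length and simply stops there.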
Syntax and semantics • Logical representations (e.g. λx.dog(x), λx.run(x)) • World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF) • Categorial grammar: early logical representations of syntax (Kalmár) • DCG: interpretation & representation • Rule-to-rule hypothesis
Conflict handling • Lexicon meets syntax: who is right? • Lexicon: off-line info coming from past experiences • Which is more important in a specific situation?
Open classes • Open vs. closed classes: that is, features can or cannot be overridden • Proper names, jabbers, folk etymology, loanwords, ... • Grammar of closed classes: minimal grammar
Finite morphology • Finite patterns • Finite number of entries • Descriptions assigned to entries • Finite & open vs. infinite & closed • Underspecified entries for guessing
Finite syntax • “Item and arrangement” (as in morphology) • “Arrangement” describes a rather free constituent order • Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ ‘DAN’ • Cascades without loops
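One possible reading of "metawords in a meta-dictionary" and "cascades without loops" is sketched below in Python. It is an interpretation, not the author's implementation. Each word is mapped to a one-letter category, the letters are concatenated into a metaword such as `DAN`, and a fixed sequence of finite dictionaries rewrites metawords into phrase symbols. The lexicon, the symbols `P` and `S`, and the two cascade levels are all hypothetical.

```python
# Hypothetical toy lexicon: word -> one-letter category symbol.
LEXICON = {"the": "D", "old": "A", "dog": "N", "barks": "V"}

# Hypothetical cascade: each level is a finite meta-dictionary that maps
# metawords (strings of category symbols) to a single phrase symbol.
CASCADE = [
    {"DAN": "P", "DN": "P", "AN": "P", "N": "P"},  # level 1: nominal groups
    {"PV": "S"},                                   # level 2: clause
]


def metaword(words):
    """Spell a word sequence as a string of category symbols."""
    return "".join(LEXICON[w] for w in words)


def run_cascade(symbols):
    """Apply each level exactly once, in order; no level is ever revisited."""
    for level in CASCADE:
        keys = sorted(level, key=len, reverse=True)  # prefer longer entries
        out, i = [], 0
        while i < len(symbols):
            for key in keys:
                if symbols.startswith(key, i):
                    out.append(level[key])
                    i += len(key)
                    break
            else:
                out.append(symbols[i])  # no entry matches: copy the symbol
                i += 1
        symbols = "".join(out)
    return symbols


print(metaword(["the", "old", "dog", "barks"]))  # DANV
print(run_cascade("DANV"))                       # S
```

Because the number of levels is fixed and every dictionary is finite, the whole device stays finite-state while still producing a structural result.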
The “plastic box” • John is a boy. • “John” is a noun. • Go is a verb. • “Go” is a verb. • □ is a sign. • “□” is a sign. • □ is a □. (where □ is a “plastic box”)
Real examples (a) Unusual use: Go is a verb. POS [np] POS [v] (b) Metaphor: My car drinks a lot. ANIMATE [+] ANIMATE [-] (c) Unknown entry: Kalmár is a family name. POS [np]
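The three cases above can be read as a conflict between what the context requires and what the lexicon stores. The Python sketch below assumes that reading: the first feature value in each example is taken to be the contextual requirement and the second the lexical value. The function `resolve` and its policy (context wins for open-class items, a clash is fatal for closed-class items) are illustrative assumptions.

```python
def resolve(lexical, required, open_class):
    """Combine stored lexical features with features required by the context.

    Assumed policy: for open-class items the context may override a stored
    feature; for closed-class items a clash means the analysis fails (None).
    """
    result = dict(lexical)
    for feature, value in required.items():
        if feature in result and result[feature] != value and not open_class:
            return None
        result[feature] = value
    return result


# (a) Unusual use: "Go" is stored as a verb but used in a noun-phrase slot.
print(resolve({"POS": "v"}, {"POS": "np"}, open_class=True))      # {'POS': 'np'}
# (b) Metaphor: "car" is inanimate, the subject of "drinks" should be animate.
print(resolve({"ANIMATE": "-"}, {"ANIMATE": "+"}, open_class=True))  # {'ANIMATE': '+'}
# (c) Unknown entry: nothing is stored, so the context supplies everything.
print(resolve({}, {"POS": "np"}, open_class=True))                # {'POS': 'np'}
# A closed-class item cannot be overridden.
print(resolve({"POS": "det"}, {"POS": "np"}, open_class=False))   # None
```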
Linguistic frames • Psychology: ”Gestalt” • Morphological complex structures treated as frames by humans • Frames in AI: ‘shopping’, ‘walking’, ... • As ‘high-level parsing’ relates to ‘detailed on-line analysis’
Translation of human languages • old problems (50’s) • direct (60’s) • interlingual (70’s) • transfer (80’s) • examples (90’s)
Patterns: general linguistic information in lexicalized form • Short, fully specified patterns are: lexical entries • Longer, fully specified entries are: multi-word expressions • Partially underspecified patterns are: collocations, phrasal verbs, idioms • Totally underspecified patterns are: linguistic rules • Pattern/interpretation pairs: Translation Description Language
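The scale from lexical entry to linguistic rule can be sketched in Python as a function of how many slots in a pattern are lexically fixed. The `Slot` class, the category labels and the choice of "one slot" as the meaning of "short" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    word: Optional[str] = None  # literal word form, or None if left open
    pos: Optional[str] = None   # category constraint on an open slot


def kind(pattern):
    """Classify a pattern by how many of its slots are fully specified."""
    if not pattern:
        raise ValueError("empty pattern")
    fixed = sum(1 for slot in pattern if slot.word is not None)
    if fixed == len(pattern):
        # Assumption: "short" is taken to mean a single slot.
        return "lexical entry" if len(pattern) == 1 else "multi-word expression"
    if fixed == 0:
        return "linguistic rule"
    return "collocation, phrasal verb or idiom"


print(kind([Slot("dog")]))                                    # lexical entry
print(kind([Slot("kick"), Slot("the"), Slot("bucket")]))      # multi-word expression
print(kind([Slot("give"), Slot(pos="NP"), Slot("up")]))       # collocation, phrasal verb or idiom
print(kind([Slot(pos="Det"), Slot(pos="Adj"), Slot(pos="N")]))  # linguistic rule
```

Under this view a grammar rule and a dictionary entry are the same kind of object, differing only in degree of specification.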
The MetaMorpho principles • No single words but contextual expressions (in the form of patterns) only • Pattern pairs: input/interpretation structure pairs • Single pass: no separate transfer steps • Target structure generation: by-product of parsing
Jabberwocky ‘Twas brillig, and the slightytoves / Did gyre and gimble in the wabe: / All mimsy were the borogroves, / And the moneraths outgrabe.
‘Twas □, and the □s / Did □ and □ in the □: / All □ were the □s, / And the □s □.
Translation rules for Jabberwocky • ‘twas □ → □ volt • , and → , és • the □s did □ → a □ok □tak • and → és • in the □ → a □ban • all → teljesen • □ were the □s → □k voltak az □ok • the □s □ → a □ok □tek
• ‘Twas □, and the □s / Did □ and □ in the □: / All □ were the □s, / And the □s □. • □ volt, és a □ok / □tak és □tek a □ben: / teljesen □k voltak a □ok / és a □ok □tek.
Translation of Jabberwocky: Dzsebervoki / Brillig volt, és a szlájtitóvok / gájertak és gimbeltek a vébben: / teljesen mimszik voltak a borogróvok / és a mónrátok autgrébtek.
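The slot-filling idea behind the Jabberwocky slides can be sketched in Python as pattern pairs whose open slots are copied from source to target. Here `□` is a notation chosen for this sketch to mark an open slot. The four rules are simplified, hypothetical stand-ins for the rules on the slides, applied one after another with regular expressions. The sketch does no phonetic respelling and no vowel harmony, so its output keeps the English stems and differs from the Hungarian text shown above.

```python
import re

SLOT = "□"  # notation assumed here for an open ("plastic box") slot

# Hypothetical, simplified source/target pattern pairs.
RULES = [
    ("'twas □", "□ volt"),
    ("the □s did □ and □", "a □ok □tak és □tek"),
    ("in the □", "a □ban"),
    ("and", "és"),
]


def compile_rule(source, target):
    """Turn a pattern pair into (regex, replacement function)."""
    literal_parts = [re.escape(part) for part in source.split(SLOT)]
    # Each slot matches one unknown word; lookarounds keep matches on word edges.
    regex = re.compile(r"(?<!\w)" + r"(\w+)".join(literal_parts) + r"(?!\w)")
    pieces = target.split(SLOT)

    def fill(match):
        out = [pieces[0]]
        for group, piece in zip(match.groups(), pieces[1:]):
            out.append(group)   # copy the slot filler unchanged
            out.append(piece)
        return "".join(out)

    return regex, fill


def translate(text):
    """Apply every rule once, in order, to the lower-cased text."""
    text = text.lower()
    for source, target in RULES:
        regex, fill = compile_rule(source, target)
        text = regex.sub(fill, text)
    return text


line = "'Twas brillig, and the slightytoves did gyre and gimble in the wabe"
print(translate(line))
# brillig volt, és a slightytoveok gyretak és gimbletek a wabeban
```

The unknown words never have to be in any dictionary: the closed-class frame around them carries enough information to place and inflect them.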
An intuitive representation... • X-bar based structures • Feature-based descriptions • Metarules (used off-line) • Rule-to-rule principle • Lexicon should be finite but open • Closed classes belong to the minimal grammar • Minimal grammar describes ”basically” linguistic elements
An intuitive representation... (cont’d) • Linguistic constructions can be described by finite patterns • A huge & finite description set is used rather than a limited & infinite grammar • In case of conflict, lexical information is either redundant or contradictory to the actual description • Known constructions need no real-time analysis (Gestalt, frame)
An intuitive representation... (cont’d) • ”Broken” frames are analyzed in real time • A structural (source/target) pattern pair is assigned to every frame to be translated • Target structure is computed while parsing the source structure