
Linguistics 187/287 Week 2


odin




Presentation Transcript


  1. Linguistics 187/287 Week 2 Engineering and Linguistic Generalizations

  2. Homework: • Due Friday • Can discuss in class or via email or ask us for office hours • Last assignment: • How much time? • Trouble: access, procedure? • Issues: XLE, LFG, grammar?

  3. Topics for this week • Notation in LFG (more background) • Templates • Lexical rules • Configurations • Feature declaration • Metarulemacro

  4. Grammar engineering for deep processing • Draws on theoretical linguistics, software engineering • Theoretical linguistics => papers • Generalizations, universality, idealization (competence) • Software engineering => programs • Coverage, interface, QA, maintainability, efficiency, practicality • Grammar engineering • Grammar::Theory = Program::Programming language • Reflect linguistic generalizations • Respect special cases of ordinary language • Deal with large-scale interactions • Theory/practice trade-offs

  5. Grammar Engineering and Linguistic Theory • Description vs. representation • Program vs. data • Expressiveness of notation • Regular predicates for c-structure • Boolean combinations (esp. disjunction) • Equality, set-membership • Defaults and marking conventions • Constraining vs. defining, existentials, defaults • Abbreviation and factoring • Templates, macros, lexical rules • Configuration management • Combining rules, templates, lexicons… • Priority of core/specializations/extensions

  6. Description vs. Representation • Complexity trade-off (program vs. data) • Simplify descriptions but complicate representations • Complicate descriptions but simplify representations • Example: Arguments and adjuncts • Different behavior • Arguments are selected by the predicate and are unique • Adjuncts modify the predicate and can have multiple instances • Similar behavior: both can be questioned • Representation solution (HPSG): a new type DEP with subtypes ARG and ADJ, so DEP = ARG ∪ ADJ • Description solution (LFG): keep ARG and ADJ as they are and refer to either one with the disjunction ARG | ADJ

  7. Description vs. Representation • External constraints on representation • Linguistic theory • Applications • Multilingual/cross-grammar similarity

  8. Expressiveness of notation: Regular predicates for c-structure • Simple context-free rules can be collapsed into compact notation • Disjunction: NP --> N and NP --> Pron become NP --> { N | Pron } • Optionality: NP --> N and NP --> Det N become NP --> (Det) N • Both combined: NP --> N, NP --> Det N, and NP --> Pron become NP --> { (Det) N | Pron }
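To make the equivalence concrete, here is a small Python sketch that expands a compact right-hand side back into the simple context-free rules it abbreviates. The tuple encoding ("SEQ", "OPT", "OR") and the function `expand` are hypothetical; XLE handles regular right-hand sides directly rather than expanding them this way, so the sketch only shows that both notations license the same category sequences.

```python
from itertools import product

# Hypothetical encoding of a regular right-hand side:
#   "N"                -> a single category
#   ("SEQ", [a, b])    -> concatenation
#   ("OPT", a)         -> optional element, written (a) in the slides
#   ("OR", [a, b])     -> disjunction, written { a | b } in the slides


def expand(rhs):
    """Return every sequence of categories that the right-hand side allows."""
    if isinstance(rhs, str):
        return [(rhs,)]
    op, arg = rhs
    if op == "OPT":
        # Either nothing, or whatever the optional element allows.
        return [()] + expand(arg)
    if op == "OR":
        return [seq for alt in arg for seq in expand(alt)]
    if op == "SEQ":
        # Every combination of one expansion per element, concatenated.
        return [tuple(cat for part in parts for cat in part)
                for parts in product(*(expand(x) for x in arg))]
    raise ValueError(f"unknown operator {op!r}")


# NP --> { (Det) N | Pron }
np_rhs = ("OR", [("SEQ", [("OPT", "Det"), "N"]), "Pron"])
for seq in expand(np_rhs):
    print("NP -->", " ".join(seq))
```

Running it lists the three simple rules from the slide: NP --> N, NP --> Det N, and NP --> Pron.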

  9. Expressiveness of notation and Representation • Equality: attribute values • Set-membership: sets and elements • Adjuncts: PP: (^ ADJUNCT)=! PP*: ! $ (^ ADJUNCT) • Coordination (more next week) NP --> NP: ! $ ^; CONJ NP: ! $ ^. • Semantic forms • (^ PRED)=‘kick<(^ SUBJ)(^ OBJ)>’ • Semantic relations, instantiation, subcategorization

  10. Defaults and Marking Conventions • Constraining vs. defining • Must be assigned nom: (^ SUBJ CASE)=c nom • Is nom: (^ SUBJ CASE)=nom • Existentials • Must have case: (^ CASE) • Defaults • NTYPE values: proper, pronoun, common; default to common • { (^ NTYPE) (^ NTYPE)~=common | (^ NTYPE)=common } (make choices disjoint)
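The difference between defining and constraining equations can be simulated with a toy solver: defining equations add values to an f-structure, while `=c` and existential constraints are checked only after all defining equations have applied. This Python sketch is a simplification under stated assumptions: f-structures are flattened into dicts keyed by attribute paths, disjunction and negation are ignored, and `solve` is a hypothetical name, not an XLE function.

```python
# Toy model, not XLE's solver: an f-structure is a flat dict keyed by
# attribute paths, e.g. ("SUBJ", "CASE") stands for (^ SUBJ CASE).


def solve(defining, constraining=(), existential=()):
    """Apply defining equations, then check the constraints.

    Returns the resulting f-structure, or None if any equation fails.
    """
    fs = {}
    for path, value in defining:
        # Defining equations add information; a second, different value clashes.
        if fs.get(path, value) != value:
            return None
        fs[path] = value
    for path, value in constraining:
        # (path)=c value only checks what some defining equation supplied.
        if fs.get(path) != value:
            return None
    for path in existential:
        # (path) on its own requires some value, whatever it is.
        if path not in fs:
            return None
    return fs


subj_case = ("SUBJ", "CASE")

# The verb requires a nominative subject, but nothing assigns case: fails.
print(solve([], constraining=[(subj_case, "nom")]))
# The subject NP supplies nom itself, so the constraint is met.
print(solve([(subj_case, "nom")], constraining=[(subj_case, "nom")]))
# Had the verb used (^ SUBJ CASE)=nom instead, it would simply assign nom.
print(solve([(subj_case, "nom")]))
```

The first call fails because a constraining equation cannot supply a value on its own; that is exactly why `=c nom` is used when some other part of the grammar must provide the case.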

  11. Abbreviations and Factoring • Templates • Capture generalizations of annotations • Maintainability: changes, mistakes • Compare: HPSG type hierarchy • Macros • Capture generalizations of rules • Lexical Rules • Theoretical proposal to manipulate predicates • Implemented to expand lexicons consistently

  12. Example: The verb bakes • Belongs to several classes • Third-person, singular, present-tense verb • Transitive or intransitive • Shares • Some properties with falls • Other properties with cooked

  13. The lexicon à la Kiparsky A dumping ground for exceptions “A kind of appendix to the grammar, whose function is to list what is unpredictable and irregular about the words of a language”

  14. The lexicon à la Bresnan A repository of linguistic generalizations • Active and passive forms are related by lexical rules, not syntactic transformations: (^ SUBJ) --> (^ OBL-AG), (^ OBJ) --> (^ SUBJ) • Rules relating lexical items are a prime locus of syntactic generalizations

  15. The lexicon à la Flickinger A hierarchical structure of classes • Each class represents some piece of syntactic information bakes belongs to: • the third-person singular present-tense class (like appears) • the transitive/intransitive class (like cooked) • and others • Classes may be subclasses of other classes • Classes may partition other classes along several dimensions

  16. LFG: Relations between descriptions LFG can encode linguistic generalizations as relations between descriptions of structures • An LFG functional description is a collection of equations • These can be named • The name can stand for those equations in linguistic descriptions • Named descriptions are referred to as templates • Interpretation: simple substitution. The template's description is substituted for the template name wherever another description invokes it

  17. 3SG and PRESENT templates 3SG = (^ SUBJ PERS) = 3 (^ SUBJ NUM) = SG. “3SG names (^ SUBJ PERS)=3 (^ SUBJ NUM)=SG” PRESENT = (^ TENSE) = PRES. • @ marks invocation (in lexicon, rules, templates) • Substitute (^ TENSE)=PRES for @PRESENT in other descriptions

  18. Templates enable hierarchical generalizations • Template definitions can refer to other templates by name • E.g. further divide 3SG into: 3PERS = (^ SUBJ PERS) = 3. SING = (^ SUBJ NUM) = SG. then 3SG = @3PERS @SING. • Hierarchy of references represents inclusion hierarchy of named descriptions • Frequently repeated subdescriptions • specified in one place • effective in many

  19. PRESNOT3SG PRESNOT3SG = ~@3SG @PRESENT. ~@3SG ⇒ ~[@3PERS @SING] ⇒ ~[(^ SUBJ PERS)=3 (^ SUBJ NUM)=SG] [Figure: hierarchy of template invocations, sharing in verb agreement: SING, 3PERS, 3SG, PRESENT, PRES3SG] • Boolean combinations of template references • (just like ordinary descriptions) • Sharing is distinct from mode of combination

  20. Functional description for bakes { (^ PRED)=‘bake<SUBJ,OBJ>’ | (^ PRED)=‘bake<SUBJ>’ } (^ TENSE)=PRES (^ SUBJ PERS)=3 (^ SUBJ NUM)=SG • With agreement template: { (^ PRED)=‘bake<SUBJ,OBJ>’ | (^ PRED)=‘bake<SUBJ>’ } @PRES3SG • Agreement template invoked by other verbs

  21. Templates with parameters: Valency ParGram convention: parameters begin with _ • TRANS-OR-INTRANS(_p) = { (^ PRED) = ‘_p<SUBJ, OBJ>’ | (^ PRED) = ‘_p<SUBJ>’ }. • PRED value as a parameter of the template @TRANS-OR-INTRANS(bake) ⇒ { (^ PRED) = ‘bake<SUBJ, OBJ>’ | (^ PRED) = ‘bake<SUBJ>’ } • Arguments can substitute for any part of an f-description • Attributes • Values • Semantic relation-names • Descriptions

  22. Valency hierarchy TRANS-OR-INTRANS(_p) = { @INTRANSITIVE(_p) | @TRANSITIVE(_p) }. INTRANSITIVE(_p) = (^ PRED)=‘_p<SUBJ>’. TRANSITIVE(_p) = (^ PRED)=‘_p<SUBJ, OBJ>’. [Figure: TRANS-OR-INTRANS invokes INTRANSITIVE and TRANSITIVE]
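Parameterized templates add one step to substitution: bind each parameter to its argument, then replace the parameter wherever it occurs in the body. This Python sketch assumes a hypothetical encoding (NAME maps to a parameter list and a body; bodies may contain `("OR", [...])` disjunctions and `("@", NAME, arg, ...)` invocations) and treats any token beginning with `_` as a parameter, following the ParGram convention mentioned above.

```python
import re

# Hypothetical encoding: NAME -> (parameters, body). Body items are equation
# strings, a disjunction ("OR", [alt, ...]) whose alternatives are bodies,
# or an invocation ("@", NAME, arg, ...).
TEMPLATES = {
    "INTRANSITIVE": (["_p"], ["(^ PRED)='_p<SUBJ>'"]),
    "TRANSITIVE": (["_p"], ["(^ PRED)='_p<SUBJ, OBJ>'"]),
    "TRANS-OR-INTRANS": (["_p"], [("OR", [[("@", "INTRANSITIVE", "_p")],
                                          [("@", "TRANSITIVE", "_p")]])]),
}


def substitute(text, env):
    """Replace parameter tokens such as _p with their bound arguments."""
    return re.sub(r"_\w+", lambda m: env.get(m.group(0), m.group(0)), text)


def expand(body, env=None):
    env = env or {}
    out = []
    for item in body:
        if isinstance(item, str):
            out.append(substitute(item, env))
        elif item[0] == "OR":
            # Expand each alternative, then rebuild the { a | b } notation.
            alts = [" ".join(expand(alt, env)) for alt in item[1]]
            out.append("{ " + " | ".join(alts) + " }")
        else:
            # ("@", NAME, args...): bind the template's parameters, then recurse.
            params, tbody = TEMPLATES[item[1]]
            args = [substitute(a, env) for a in item[2:]]
            out.extend(expand(tbody, dict(zip(params, args))))
    return out


print(expand([("@", "TRANS-OR-INTRANS", "bake")]))
```

The output is the same disjunction as the expanded @TRANS-OR-INTRANS(bake) on the previous slide, with the disjuncts in the order this slide defines them.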

  23. Templates and generalizations: bakes • bakes: @TRANS-OR-INTRANS(bake) @PRES3SG • TRANS-OR-INTRANS(p): shared by eat, cooked,… • PRES3SG: shared by appears, goes, cooks,… • PRESENT: • used by PRES3SG template • shared by bake, laugh, etc.

  24. Lexical sharing [Figure: template hierarchy shared by the entries falls, bakes, and cooked, with nodes 3PERS, SING, PRESENT, 3SG, INTRANSITIVE, TRANSITIVE, PRES3SG, TRANS-OR-INTRANS]

  25. Type hierarchy vs. templates • Templates can play the same role as hierarchical type systems in theories like HPSG • A notational device for factoring descriptions • Interpreted as simple substitution • Not part of a formal ontology • Do not require an elaborate mathematical characterization

  26. Templates also invoked by Rules • Rule annotations can also call templates • Global changes, typo prevention • Example: adjunct annotation • Without a template: PP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP; ADVP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP • Template: ADJ(_T) = ! $ (^ ADJUNCT) (! ADJ-TYPE)=_T. • With the template: PP: @(ADJ VP); PP: @(ADJ NP); ADVP: @(ADJ VP); ADVP: @(ADJ S)

  27. Templates: Rules Example: null pronouns • Push it! They left (in order) to be on time. • Template: NULL-PRON(_P) = (_P PRED)=‘pro’ (_P PRON-TYPE)=null. • Invocation: VPimp --> VP: @(NULL-PRON (^ SUBJ)). • Equivalent to: VPimp --> VP: (^ SUBJ PRED)=‘pro’ (^ SUBJ PRON-TYPE)=null.

  28. Templates: Extend notation • DEFAULT(D V) = { D D~=V | D=V }. e.g. @(DEFAULT (^ NTYPE) common) • IF(P1 P2) = { ~P1 | P2 }. • IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }.
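The notation-extending templates can be checked against ordinary Boolean logic. The Python sketch below reads IF and IFF as truth functions and models DEFAULT for a single attribute whose current value is either unset (`None`) or an atom. It is a simplification under that assumption, not a model of LFG's minimal-solution semantics; the function names simply mirror the template names.

```python
from itertools import product


def IF(p1, p2):
    # IF(P1 P2) = { ~P1 | P2 }: material implication.
    return (not p1) or p2


def IFF(p1, p2):
    # IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }: both true or both false.
    return (p1 and p2) or ((not p1) and (not p2))


for p1, p2 in product([True, False], repeat=2):
    print(p1, p2, "IF:", IF(p1, p2), "IFF:", IFF(p1, p2))


def DEFAULT(current, v):
    """DEFAULT(D V) = { D D~=V | D=V } for one attribute.

    current is the value D already has, or None if D is unset.
    Returns the value D ends up with in each successful disjunct.
    """
    solutions = []
    # First disjunct: D must exist (existential) and differ from V.
    if current is not None and current != v:
        solutions.append(current)
    # Second disjunct: D=V is defining, so it fails only on a clash.
    if current is None or current == v:
        solutions.append(v)
    return solutions


print(DEFAULT(None, "common"))      # unset NTYPE receives the default
print(DEFAULT("proper", "common"))  # an explicit value is kept
```

Because the two DEFAULT disjuncts are disjoint, every input yields exactly one solution, which is the point of the "(make choices disjoint)" remark on the defaults slide.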

  29. Templates and “Principles” • Subject principle: every verb has a subject. • Implementation: VERB = (^ SUBJ). • Put @VERB in every verbal entry, or • Put @VERB in the templates called by the verbal entries.

  30. Lexical Rules • Theoretical construct • Templates can often achieve the same result • Disjunction of several templates • Parameterization of a complex template

  31. Lexical Rules: Example • Active: They ate the cake. (^ PRED)='eat<(^SUBJ)(^OBJ)>' • Passive: The cake was eaten. (^ PRED)='eat<NULL (^SUBJ)>' • Option 1: give the VTRANS template two disjuncts (active and passive) • Option 2: manipulate the PRED with a lexical rule

  32. Lexical Rules: Example • Passive lexical rule _SCHEMA is a subcategorization frame PASSIVE(_SCHEMA) = { _SCHEMA (^ PASSIVE)=- | _SCHEMA (^ SUBJ) --> NULL (^ OBJ) --> (^ SUBJ) (^ PASSIVE)=c +}. • Example calls • TRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)>'). • DITRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)(^OBJ2)>').
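The PASSIVE rule can be mimicked by treating a subcategorization frame as data and rewriting its argument list. This Python sketch is a toy under stated assumptions: a frame is a predicate name plus a tuple of grammatical functions, and `passive`, `rename`, and `show` are hypothetical helpers, not part of XLE.

```python
# Toy model of the PASSIVE lexical rule: a subcategorization frame is a
# predicate name plus a tuple of argument functions. Not XLE's implementation.


def rename(frame, mapping):
    """Rewrite each argument once through mapping.

    Mapping every argument in a single pass matches applying the slide's
    rewrites in the order given: (^ SUBJ) --> NULL, then (^ OBJ) --> (^ SUBJ).
    """
    pred, args = frame
    return pred, tuple(mapping.get(a, a) for a in args)


def passive(frame):
    """PASSIVE(_SCHEMA): the unchanged frame with (^ PASSIVE)=-, or the
    rewritten frame with the constraint (^ PASSIVE)=c +."""
    active = (frame, ("PASSIVE", "=", "-"))
    passive_frame = rename(frame, {"SUBJ": "NULL", "OBJ": "SUBJ"})
    return [active, (passive_frame, ("PASSIVE", "=c", "+"))]


def show(frame):
    """Print a frame in the slides' PRED notation."""
    pred, args = frame
    shown = " ".join(a if a == "NULL" else f"(^ {a})" for a in args)
    return f"'{pred}<{shown}>'"


# TRANS(_P) and DITRANS(_P) both call the same rule with different frames.
for frame in [("eat", ("SUBJ", "OBJ")), ("give", ("SUBJ", "OBJ", "OBJ2"))]:
    for result, constraint in passive(frame):
        print(show(result), constraint)
```

For eat this yields 'eat<(^ SUBJ) (^ OBJ)>' for the active and 'eat<NULL (^ SUBJ)>' for the passive, matching the previous slide; OBJ2 of a ditransitive passes through unchanged.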

  33. Lexical Rules: Summary • Lexical rules manipulate arguments of predicates • capture systematic alternations like active-passive • Rename and remove roles • No good implementation for adding roles • causative • complex predicates • benefactives

  34. Configuration Management • Combining rules, templates, lexicons, … • System needs to know where everything is • For large grammars, need modularization (multiple grammar rule files, multiple lexicons) • Priority of core/specializations/extensions • Want to specialize a grammar • e.g., no questions in instruction manuals • e.g., loosen subject-verb agreement • Have lexicons of varying quality

  35. Combining Rules, Templates, Lexicons • XLE: configuration section • Specify what files are called • Specify which rule, template, and lexicon sections are used RULES (TOY ENGLISH). RULES (CORE ENGLISH) (SPECIAL ENGLISH). • Other grammar information

  36. Configurations and Declarations • Configurations • File management • Priority • Declarations • Governable relations and semantics • Features • Global Operators • METARULEMACRO

  37. Files • Priority ordered; rules/entries in later files override those in earlier ones • Example: FILES standard-english-rules.lfg eureka-english-rules.lfg standard-english-lexicon.lfg eureka-english-lexicon.lfg.

  38. Eureka vs. Standard rules STANDARD ENGLISH RULES (1.0) N --> { @NOUN-COMMON |@NOUN-PROPER}. NOUN-COMMON -> … NOUN-PROPER -> … EUREKA ENGLISH RULES (1.0) N --> { @NOUN-COMMON |@NOUN-PROPER |@NOUN-EUREKA | N PL }. NOUN-EUREKA --> { EUR-PART | EUR-NUM }.

  39. Sections Used • All lexicon, rule, and template sections have names and versions*. • These are called in priority order in the config. • Use with the file order to create overrides. RULES (STANDARD RULES) (EUREKA RULES). LEXENTRIES (all all). *Versions allow for future XLE upgrades

  40. Multiple Lexicon Sections LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH). AUTOMATIC ENGLISH LEXICON (1.0) appear V XLE {@(V-TRANS appear) |@(V-INTRANS appear)}. CORRECTED ENGLISH LEXICON (1.0) appear V XLE {@(V-INTRANS appear) |@(V-SUBJ-XCOMP appear)}.
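The override behavior amounts to merging sections in the order the configuration lists them, with later sections winning. The Python sketch below copies the two `appear` entries from the slide and assumes, for simplicity, that a later section's entry for a headword replaces the earlier one wholesale; any finer-grained lexicon editing that XLE may support is not modeled, and `effective_lexicon` is a hypothetical name.

```python
# Entries copied from the slide, keyed by headword. Assumption for this
# sketch: a later section's entry for a headword replaces the earlier one.
AUTOMATIC_ENGLISH = {
    "appear": "V XLE {@(V-TRANS appear) |@(V-INTRANS appear)}",
}
CORRECTED_ENGLISH = {
    "appear": "V XLE {@(V-INTRANS appear) |@(V-SUBJ-XCOMP appear)}",
}


def effective_lexicon(sections):
    """Merge lexicon sections listed in priority order: later entries win."""
    merged = {}
    for section in sections:
        merged.update(section)
    return merged


# LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH).
lex = effective_lexicon([AUTOMATIC_ENGLISH, CORRECTED_ENGLISH])
print(lex["appear"])
```

Listing the hand-corrected section last lets it fix errors in the automatically derived one without editing that file.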

  41. Other Configuration Information • ROOTCAT: default top level category • Standard: ROOT, Eureka: FIELD • Nondistributives for coordination • External attributes for applications • Character encoding • Reparse category and Optimality order for robustness • See XLE documentation for complete list

  42. Declarations • Must declare grammatical and semantic functions for each grammar. • Used for completeness and coherence • GOVERNABLERELATIONS • Functions (features) that must be subcategorized for in the PRED • SUBJ OBJ OBL-?* ?COMP etc. • SEMANTICFUNCTIONS • Functions that must have a PRED • ADJUNCT NMOD

  43. Feature Declaration • List of all the features • Governable grammatical functions (GGFs) and semantic functions need not be listed • All other features must be listed • List of their possible values • atomic • f-structure • Multiple feature declarations • multilingual setting • grammar specialization

  44. Why a feature declaration? • Good engineering practice • Catch typos and old analyses • Grammar easier to read NB: Theory doesn’t have typos

  45. Declaration format STANDARD LANGUAGE FEATURES (1.0) feature1: -> $ { val1 val2 val3 }. feature2: -> $ { val4 val5 }. feature3: -> << [ feature1 feature2 ]. feature4. ----

  46. Sample feature declaration TOY ENGLISH FEATURES (1.0) NUM: -> $ { sg pl }. PERS: -> $ { 1 2 3 }. TNS-ASP: -> << [ TENSE MOOD ASPECT ]. TENSE. MOOD: -> $ { indicative subjunctive }. ASPECT: -> << [ PERF PROG ]. PERF: -> $ { + - }. PROG: -> $ {+ - }.
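To see how a declaration catches typos, here is a Python sketch of a checker for the TOY ENGLISH declaration. It is an illustration under stated assumptions, not XLE's checker: `$ { ... }` is modeled as a set of allowed atomic values, `<< [ ... ]` as a set of attributes allowed inside an f-structure value, a bare declaration such as `TENSE.` accepts any value, and grammatical functions are left out for brevity.

```python
# TOY ENGLISH FEATURES, transcribed into a hypothetical Python encoding.
DECL = {
    "NUM": ("atomic", {"sg", "pl"}),
    "PERS": ("atomic", {"1", "2", "3"}),
    "TNS-ASP": ("fstruct", {"TENSE", "MOOD", "ASPECT"}),
    "TENSE": ("any", None),  # "TENSE." declares the feature, no value list
    "MOOD": ("atomic", {"indicative", "subjunctive"}),
    "ASPECT": ("fstruct", {"PERF", "PROG"}),
    "PERF": ("atomic", {"+", "-"}),
    "PROG": ("atomic", {"+", "-"}),
}


def violations(fs, decl, path=()):
    """Yield a message for every attribute or value the declaration forbids."""
    for attr, value in fs.items():
        where = " ".join(path + (attr,))
        if attr not in decl:
            yield f"{where}: undeclared feature"
            continue
        kind, allowed = decl[attr]
        if kind == "atomic" and value not in allowed:
            yield f"{where}: value {value!r} not declared"
        elif kind == "fstruct":
            if not isinstance(value, dict):
                yield f"{where}: expected an f-structure"
                continue
            for sub in value:
                # Undeclared attributes are reported by the recursive call.
                if sub in decl and sub not in allowed:
                    yield f"{where}: {sub} not allowed inside"
            yield from violations(value, decl, path + (attr,))


# PROGG is a typo for PROG.
fs = {"NUM": "sg", "TNS-ASP": {"TENSE": "pres", "ASPECT": {"PERF": "+", "PROGG": "-"}}}
for msg in violations(fs, DECL):
    print(msg)
```

The only message printed points at the misspelled PROGG inside TNS-ASP ASPECT, the kind of typo or leftover analysis the declaration is meant to catch.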

  47. XLE and the feature declaration • XLE will not load a grammar with a violation of the feature declaration. • To catch violations in the lexicon, the generator must be loaded. • regenerate “some-sentence-to-parse” • parse, then choose “generate” in f-str window • create-generator grammar-name.lfg • print-unused-feature-declarations

  48. Multiple feature declarations • List in priority order in the configuration • FEATURES (STANDARD COMMON) (STANDARD ENGLISH). • New features are listed as usual • Changes to existing features use edit operators: • + adds new values • & intersects the values • ! replaces the feature entirely

  49. Multiple feature declarations STANDARD COMMON FEATURES (1.0) NUM: -> $ { sg pl dual }. CASE: -> $ { nom acc }. TENSE: -> << [ PAST FUTURE ]. PAST: -> $ { + - }. FUTURE: -> $ { + - }. STANDARD ENGLISH FEATURES (1.0), each entry ⇒ resulting declaration: • PERS: -> $ { 1 2 3 }. ⇒ PERS: -> $ { 1 2 3 }. • &NUM: -> $ { sg pl }. ⇒ NUM: -> $ { sg pl }. • +CASE: -> $ { gen }. ⇒ CASE: -> $ { nom acc gen }. • !TENSE: -> $ { pres past fut }. ⇒ TENSE: -> $ { pres past fut }. • !PAST: -> $ { }. • !FUTURE: -> $ { }.
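The edit operators behave like set operations applied in priority order. This Python sketch replays the STANDARD COMMON and STANDARD ENGLISH declarations from the slide; the encoding (kind plus a frozenset of values or attributes) and `apply_edits` are hypothetical. Two further assumptions: an empty replacement such as `!PAST: -> $ { }` is modeled as leaving the feature no legal values, and an unmarked redeclaration of an existing feature is treated as an error, since the slides only show unmarked entries for new features.

```python
def apply_edits(base, edits):
    """Apply feature-declaration edits, in order, to a base declaration.

    base: feature -> (kind, frozenset of values or attributes)
    edits: list of (op, feature, kind, values); op is "", "+", "&" or "!".
    """
    decl = dict(base)
    for op, feat, kind, values in edits:
        values = frozenset(values)
        if op == "+":      # add new values, keep the old ones
            old_kind, old = decl[feat]
            decl[feat] = (old_kind, old | values)
        elif op == "&":    # keep only values present in both
            old_kind, old = decl[feat]
            decl[feat] = (old_kind, old & values)
        elif op == "!":    # replace the feature entirely
            decl[feat] = (kind, values)
        else:              # unmarked: assumed to declare a new feature
            if feat in decl:
                raise ValueError(f"{feat} already declared; use +, & or !")
            decl[feat] = (kind, values)
    return decl


STANDARD_COMMON = {
    "NUM": ("atomic", frozenset({"sg", "pl", "dual"})),
    "CASE": ("atomic", frozenset({"nom", "acc"})),
    "TENSE": ("fstruct", frozenset({"PAST", "FUTURE"})),
    "PAST": ("atomic", frozenset({"+", "-"})),
    "FUTURE": ("atomic", frozenset({"+", "-"})),
}

STANDARD_ENGLISH = [
    ("", "PERS", "atomic", {"1", "2", "3"}),
    ("&", "NUM", "atomic", {"sg", "pl"}),
    ("+", "CASE", "atomic", {"gen"}),
    ("!", "TENSE", "atomic", {"pres", "past", "fut"}),
    ("!", "PAST", "atomic", set()),
    ("!", "FUTURE", "atomic", set()),
]

result = apply_edits(STANDARD_COMMON, STANDARD_ENGLISH)
for feat in sorted(result):
    kind, vals = result[feat]
    print(f"{feat}: {kind} {{ {' '.join(sorted(vals))} }}")
```

The printed declaration matches the right-hand column of the slide: NUM loses dual, CASE gains gen, and TENSE becomes atomic with PAST and FUTURE emptied.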

  50. Using Multiple Feature Decl. • Multilingual contexts • Language universal features • Customize to particular language • Grammar specialization • Add new features for odd constructions • Remove unused choices
