Linguistics 187/287 Week 2 Engineering and Linguistic Generalizations
Homework: • Due Friday • Can discuss in class or via email or ask us for office hours • Last assignment: • How much time? • Trouble: access, procedure? • Issues: XLE, LFG, grammar?
Topics for this week • Notation in LFG (more background) • Templates • Lexical rules • Configurations • Feature declaration • Metarulemacro
Grammar engineering for deep processing • Draws on theoretical linguistics, software engineering • Theoretical linguistics => papers • Generalizations, universality, idealization (competence) • Software engineering => programs • Coverage, interface, QA, maintainability, efficiency, practicality • Grammar engineering • Grammar::Theory = Program::Programming language • Reflect linguistic generalizations • Respect special cases of ordinary language • Deal with large-scale interactions • Theory/practice trade-offs
Grammar Engineering and Linguistic Theory • Description vs. representation • Program vs. data • Expressiveness of notation • Regular predicates for c-structure • Boolean combinations (esp. disjunction) • Equality, set-membership • Defaults and marking conventions • Constraining vs. defining, existentials, defaults • Abbreviation and factoring • Templates, macros, lexical rules • Configuration management • Combining rules, templates, lexicons… • Priority of core/specializations/extensions
Description vs. Representation • Complexity trades (program vs. data) • Simplify descriptions but complicate representations • Complicate descriptions but simplify representations • Example: Arguments and adjuncts • Different behavior • Arguments selected by predicate, unique • Adjuncts modify predicate, multiple instances • Similar behavior: Can both be questioned • Representation solution (HPSG): ARG, ADJ, and a new type DEP covering both ARG and ADJ • Description solution (LFG): ARG, ADJ, and the disjunction ARG | ADJ in descriptions
Description vs. Representation • External constraints on representation • Linguistic theory • Applications • Multilingual/cross-grammar similarity
Expressiveness of notation • Regular predicates for c-structure: simple context-free rules can be collapsed into a compact notation • Optionality: NP --> N and NP --> Det N become NP --> (Det) N • Disjunction: NP --> N and NP --> Pron become NP --> { N | Pron } • Combined: NP --> N, NP --> Det N, and NP --> Pron become NP --> { (Det) N | Pron }
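The collapse of several context-free rules into one regular-expression rule can be checked mechanically by expanding the compact rule back into its alternatives. The following is a minimal Python sketch, not XLE code; the tuple encoding (`"seq"`, `"opt"`, `"alt"`) and the function name `expand` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical mini-representation (not XLE syntax):
#   "N"                  a category symbol
#   ("seq", a, b, ...)   concatenation
#   ("opt", a)           optional element, written (a) on the slide
#   ("alt", a, b, ...)   disjunction, written { a | b } on the slide

def expand(expr):
    """Return every category sequence the expression can spell out."""
    if isinstance(expr, str):
        return [(expr,)]
    kind, *parts = expr
    if kind == "opt":
        # Either nothing, or whatever the optional part expands to.
        return [()] + expand(parts[0])
    if kind == "alt":
        return [seq for part in parts for seq in expand(part)]
    if kind == "seq":
        combos = product(*(expand(part) for part in parts))
        return [sum(combo, ()) for combo in combos]
    raise ValueError(f"unknown operator: {kind}")

# NP --> { (Det) N | Pron }
np_rhs = ("alt", ("seq", ("opt", "Det"), "N"), "Pron")
expansions = expand(np_rhs)
for rhs in expansions:
    print("NP -->", " ".join(rhs))
```

The loop prints the three simple rules that the compact rule abbreviates: NP --> N, NP --> Det N, and NP --> Pron.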
Expressiveness of notation and Representation • Equality: attribute values • Set-membership: sets and elements • Adjuncts: PP: (^ ADJUNCT)=! PP*: ! $ (^ ADJUNCT) • Coordination (more next week) NP --> NP: ! $ ^; CONJ NP: ! $ ^. • Semantic forms • (^ PRED)=‘kick<(^ SUBJ)(^ OBJ)>’ • Semantic relations, instantiation, subcategorization
Defaults and Marking Conventions • Constraining vs. defining • Must be assigned nom: (^ SUBJ CASE)=c nom • Is nom: (^ SUBJ CASE)=nom • Existentials • Must have case: (^ CASE) • Defaults • NTYPE proper pronoun common • { (^ NTYPE) (^ NTYPE)~=common | (^ NTYPE)=common } (make choices disjoint)
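The difference between defining, constraining, and existential statements can be illustrated with a toy solver. This is a hedged Python sketch under strong simplifying assumptions: f-structures are flat dictionaries keyed by a path string, and the function name `solve` and its arguments are hypothetical. It is not how XLE resolves descriptions.

```python
def solve(defining, constraining=(), existential=()):
    """Build a flat f-structure from defining equations, then check the rest.

    defining:     (path, value) pairs that assign values, e.g. (^ CASE)=nom
    constraining: (path, value) pairs that must already hold, e.g. (^ CASE)=c nom
    existential:  paths that must have some value, e.g. (^ CASE)
    Returns the f-structure dict, or None if the description is unsatisfiable.
    """
    fstr = {}
    for path, value in defining:
        if fstr.setdefault(path, value) != value:
            return None  # two defining equations clash
    for path, value in constraining:
        if fstr.get(path) != value:
            return None  # nothing else supplied the required value
    for path in existential:
        if path not in fstr:
            return None  # attribute has no value at all
    return fstr

# The verb only constrains; some other element defines the case.
with_case = solve(defining=[("SUBJ CASE", "nom")],
                  constraining=[("SUBJ CASE", "nom")])

# Same constraint, but no element defines the case: the analysis fails.
without_case = solve(defining=[], constraining=[("SUBJ CASE", "nom")])

print(with_case, without_case)
```

The point of the sketch is that a constraining equation never adds information: it only succeeds when a defining equation elsewhere has already supplied the value.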
Abbreviations and Factoring • Templates • Capture generalizations of annotations • Maintainability: changes, mistakes • Compare: HPSG type hierarchy • Macros • Capture generalizations of rules • Lexical Rules • Theoretical proposal to manipulate predicates • Implemented to expand lexicons consistently
Example: The verb bakes • Belongs to several classes • Third-person, singular, present-tense verb • Transitive or intransitive • Shares • Some properties with falls • Other properties with cooked
The lexicon à la Kiparsky A dumping ground for exceptions “A kind of appendix to the grammar, whose function is to list what is unpredictable and irregular about the words of a language”
The lexicon à la Bresnan A repository of linguistic generalizations • Active and passive forms are related by lexical rules, not syntactic transformations: (^ SUBJ) --> (^ OBL-AG), (^ OBJ) --> (^ SUBJ) • Rules relating lexical items are a prime locus of syntactic generalizations
The lexicon à la Flickinger A hierarchical structure of classes • Each class represents some piece of syntactic information bakes belongs to: • the third-person singular present-tense class (like appears) • the transitive/intransitive class (like cooked) • and others • Classes may be subclasses of other classes • Classes may partition other classes along several dimensions
LFG: Relations between descriptions LFG can encode linguistic generalizations as relations between descriptions of structures • LFG functional description is a collection of equations • These can be named • This name can stand for those equations in linguistic descriptions • Named descriptions are referred to as templates • Interpretation: Simple substitution Template-description is substituted for template-name that appears in (is invoked by) another description
3SG and PRESENT templates • 3SG = (^ SUBJ PERSON) = 3 (^ SUBJ NUM) = SG. “3SG names (^ SUBJ PERSON)=3 (^ SUBJ NUM)=SG” • PRESENT = (^ TENSE) = PRES. • @ marks invocation (in lexicon, rules, templates) • Substitute (^ TENSE)=PRES for @PRESENT in other descriptions
Templates enable hierarchical generalizations • Template definitions can refer to other templates by name • E.g. further divide 3SG into: 3PERS = (^ SUBJ PERSON) = 3. SING = (^ SUBJ NUM) = SG. then 3SG = @3PERS @SING. • Hierarchy of references represents inclusion hierarchy of named descriptions • Frequently repeated subdescriptions • specified in one place • effective in many
Sharing in verb agreement: hierarchy of template invocations • Diagram (node labels): SING, 3PERS, 3SG, PRESENT, PRES3SG, PRESNOT3SG • PRESNOT3SG = ~@3SG @PRESENT. • ~@3SG ⇒ ~[@SING @3PERS] ⇒ ~[(^ SUBJ NUM)=SG (^ SUBJ PERS)=3] • Boolean combinations of template references (just like ordinary descriptions) • Sharing is distinct from mode of combination
Functional description for bakes • { (^ PRED)=‘bake<SUBJ,OBJ>’ | (^ PRED)=‘bake<SUBJ>’ } (^ TENSE)=PRES (^ SUBJ PERS)=3 (^ SUBJ NUM)=SG • With agreement template: { (^ PRED)=‘bake<SUBJ,OBJ>’ | (^ PRED)=‘bake<SUBJ>’ } @PRES3SG • Agreement template invoked by other verbs
Templates with parameters: Valency Pargram convention: Parameters begin with _ • TRANS-OR-INTRANS(_p) = { (^ PRED) = ‘_p<SUBJ, OBJ>’ | (^ PRED) = ‘_p<SUBJ>’ }. • PRED value as a parameter of the template @TRANS-OR-INTRANS(bake) ⇒ { (^ PRED) = ‘bake<SUBJ, OBJ>’ | (^ PRED) = ‘bake<SUBJ>’ } • Arguments can substitute for any part of an f-description • Attributes • Values • Semantic relation-names • Descriptions
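Because templates are interpreted as simple substitution, their expansion can be imitated with plain string rewriting. The sketch below is a Python illustration, not XLE's implementation. The `TEMPLATES` table, the `expand` function, and the regular expression are assumptions; `PRES3SG` is assumed to be `@PRESENT @3SG`, and only simple one-word arguments are handled (not arguments such as `(^ SUBJ)`).

```python
import re

# Matches @(NAME arg ...) or a bare @NAME.
CALL = re.compile(r"@\(([\w-]+)((?:\s+[\w-]+)*)\)|@([\w-]+)")

# name -> (parameter names, body). Parameters start with "_" by convention.
TEMPLATES = {
    "3PERS": ([], "(^ SUBJ PERS)=3"),
    "SING": ([], "(^ SUBJ NUM)=SG"),
    "3SG": ([], "@3PERS @SING"),
    "PRESENT": ([], "(^ TENSE)=PRES"),
    "PRES3SG": ([], "@PRESENT @3SG"),  # assumed definition
    "TRANS-OR-INTRANS": (
        ["_p"],
        "{ (^ PRED)='_p<SUBJ, OBJ>' | (^ PRED)='_p<SUBJ>' }",
    ),
}

def expand(description):
    """Replace every template call by its body, recursively."""
    def substitute(match):
        name = match.group(1) or match.group(3)
        args = (match.group(2) or "").split()
        params, body = TEMPLATES[name]
        if len(args) != len(params):
            raise ValueError(f"{name} expects {len(params)} argument(s)")
        for param, arg in zip(params, args):
            body = body.replace(param, arg)
        return expand(body)  # bodies may call other templates
    return CALL.sub(substitute, description)

bakes = expand("@(TRANS-OR-INTRANS bake) @PRES3SG")
print(bakes)
```

Changing the body of `3PERS` in one place changes every description that reaches it through `3SG` or `PRES3SG`, which is the maintainability argument the slides make for templates.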
Valency hierarchy • TRANS-OR-INTRANS(p) = { @INTRANSITIVE(p) | @TRANSITIVE(p) }. • INTRANSITIVE(p) = (^ PRED)=‘p<SUBJ>’. • TRANSITIVE(p) = (^ PRED)=‘p<SUBJ, OBJ>’. • Diagram: INTRANSITIVE and TRANSITIVE both feed TRANS-OR-INTRANS
Templates and generalizations: bakes • bakes: @TRANS-OR-INTRANS(bake) @PRES3SG • TRANS-OR-INTRANS(p): shared by eat, cooked,… • PRES3SG: shared by appears, goes, cooks,… • PRESENT: • used by PRES3SG template • shared by bake, laugh, etc.
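Lexical sharing • Diagram (template hierarchy): 3SG combines 3PERS and SING; PRES3SG combines 3SG and PRESENT; TRANS-OR-INTRANS combines INTRANSITIVE and TRANSITIVE • Lexical items at the bottom of the hierarchy: falls, bakes, cooked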
Type hierarchy vs. templates • Templates can play the same role as hierarchical type systems in theories like HPSG • A notational device for factoring descriptions • Interpreted as simple substitution • Not part of a formal ontology • Do not require an elaborate mathematical characterization
Templates also invoked by Rules • Rule annotations can also call templates • Global changes, typo prevention • Example: adjunct annotation PP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP ADVP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP ADJ(_T) = ! $ (^ ADJUNCT) (! ADJ-TYPE)=_T. PP: @(ADJ VP) PP: @(ADJ NP) ADVP: @(ADJ VP) ADVP: @(ADJ S)
Templates: Rules Example: null pronouns Push it! They left (in order) to be on time. NULL-PRON(_P) = (_P PRED)=‘pro’ (_P PRON-TYPE)=null. VPimp --> VP: @(NULL-PRON (^ SUBJ)). VPimp --> VP: (^ SUBJ PRED)=‘pro’ (^ SUBJ PRON-TYPE)=null.
Templates: Extend notation • DEFAULT(D V) = { D D~=V | D=V }. e.g. @(DEFAULT (^ NTYPE) common) • IF(P1 P2) = { ~P1 | P2 }. • IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }.
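The intended effect of these three templates can be spelled out with ordinary Boolean logic. The Python sketch below is only an approximation: it assumes a flat dictionary as the f-structure and assumes that every other equation has already applied before the default is evaluated. The function names `default`, `implies`, and `iff` are illustrative.

```python
def default(fstr, attr, value):
    """Mimic { D D~=V | D=V }: keep an existing different value, else set V."""
    result = dict(fstr)
    if attr in result and result[attr] != value:
        return result       # first disjunct: D exists and differs from V
    result[attr] = value    # second disjunct: D=V
    return result

def implies(p1, p2):
    """IF(P1 P2) = { ~P1 | P2 }"""
    return (not p1) or p2

def iff(p1, p2):
    """IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }"""
    return (p1 and p2) or (not p1 and not p2)

# @(DEFAULT (^ NTYPE) common) applied to a proper noun and to a bare noun.
print(default({"NTYPE": "proper"}, "NTYPE", "common"))
print(default({}, "NTYPE", "common"))
```

The two disjuncts of DEFAULT are mutually exclusive, so each noun gets exactly one analysis: either its own NTYPE or the default one.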
Templates and “Principles” • Subject principle: every verb has a subject. • Implementation: VERB = (^ SUBJ). • Put @VERB in every verbal entry. or • Put @VERB in the templates called by the verbal entries.
Lexical Rules • Theoretical construct • Templates can often achieve the same result • Disjunction of several templates • Parameterization of a complex template
Lexical Rules: Example • Active: They ate the cake. (^ PRED)='eat<(^SUBJ)(^OBJ)>' • Passive: The cake was eaten. (^ PRED)='eat<NULL (^SUBJ)>' • Could have VTRANS have two disjuncts • Or: manipulate PRED with lexical rule
Lexical Rules: Example • Passive lexical rule _SCHEMA is a subcategorization frame PASSIVE(_SCHEMA) = { _SCHEMA (^ PASSIVE)=- | _SCHEMA (^ SUBJ) --> NULL (^ OBJ) --> (^ SUBJ) (^ PASSIVE)=c +}. • Example calls • TRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)>'). • DITRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)(^OBJ2)>').
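The effect of the passive lexical rule on a subcategorization frame can be mimicked by renaming grammatical functions in the argument list. This is a hedged Python sketch, not the XLE mechanism: the names `PASSIVE_MAP`, `passive`, `trans`, and `ditrans` are assumptions, and the constraining equation (^ PASSIVE)=c + is recorded simply as the value "+".

```python
# (^ SUBJ) --> NULL and (^ OBJ) --> (^ SUBJ), applied simultaneously.
PASSIVE_MAP = {"SUBJ": "NULL", "OBJ": "SUBJ"}

def passive(pred, args):
    """Return the active and the passive variant of one frame."""
    active = {"PRED": (pred, tuple(args)), "PASSIVE": "-"}
    remapped = tuple(PASSIVE_MAP.get(gf, gf) for gf in args)
    passive_variant = {"PRED": (pred, remapped), "PASSIVE": "+"}
    return [active, passive_variant]

def trans(pred):
    """Analogue of TRANS(_P) on the slide."""
    return passive(pred, ["SUBJ", "OBJ"])

def ditrans(pred):
    """Analogue of DITRANS(_P) on the slide."""
    return passive(pred, ["SUBJ", "OBJ", "OBJ2"])

for variant in trans("eat"):
    print(variant)
```

Each argument is looked up in the original frame, so the old OBJ becomes SUBJ without then being rewritten to NULL. Roles are renamed or removed but never added, which matches the limitation noted for causatives and benefactives.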
Lexical Rules: Summary • Lexical rules manipulate arguments of predicates • capture systematic alternations like active-passive • Rename and remove roles • No good implementation for adding roles • causative • complex predicates • benefactives
Configuration Management • Combining rules, templates, lexicons, … • System needs to know where everything is • For large grammars, need modularization (multiple grammar rule files, multiple lexicons) • Priority of core/specializations/extensions • Want to specialize a grammar • No questions in instruction manuals • Loosen subj-V agreement • Have lexicons of varying quality
Combining Rules, Templates, Lexicons • XLE: configuration section • Specify what files are called • Specify which rule, template, and lexicon sections are used RULES (TOY ENGLISH). RULES (CORE ENGLISH) (SPECIAL ENGLISH). • Other grammar information
Configurations and Declarations • Configurations • File management • Priority • Declarations • Governable relations and semantics • Features • Global Operators • METARULEMACRO
Files • Priority ordered; rules/entries in later files override those in earlier ones • Example: FILES standard-english-rules.lfg eureka-english-rules.lfg standard-english-lexicon.lfg eureka-english-lexicon.lfg.
Eureka vs. Standard rules • STANDARD ENGLISH RULES (1.0) N --> { @NOUN-COMMON | @NOUN-PROPER }. NOUN-COMMON --> … NOUN-PROPER --> … • EUREKA ENGLISH RULES (1.0) N --> { @NOUN-COMMON | @NOUN-PROPER | @NOUN-EUREKA | N PL }. NOUN-EUREKA --> { EUR-PART | EUR-NUM }.
Sections Used • All lexicon, rule, and template sections have names and versions*. • These are called in priority order in the config. • Use with the file order to create overrides. RULES (STANDARD RULES) (EUREKA RULES). LEXENTRIES (all all). *Versions allow for future XLE upgrades
Multiple Lexicon Sections LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH). AUTOMATIC ENGLISH LEXICON (1.0) appear V XLE {@(V-TRANS appear) |@(V-INTRANS appear)}. CORRECTED ENGLISH LEXICON (1.0) appear V XLE {@(V-INTRANS appear) |@(V-SUBJ-XCOMP appear)}.
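The priority ordering of sections can be pictured as a dictionary merge in which later sections win. The Python sketch below is a simplification that replaces a whole entry at once; XLE's actual override behavior may be finer-grained. The function name `combine` and the extra entry for bake are hypothetical.

```python
def combine(sections, priority):
    """Merge lexicon sections; later names in `priority` override earlier ones."""
    merged = {}
    for name in priority:
        merged.update(sections[name])
    return merged

sections = {
    "AUTOMATIC ENGLISH": {
        "appear": ["V-TRANS", "V-INTRANS"],
        "bake": ["V-TRANS", "V-INTRANS"],  # hypothetical extra entry
    },
    "CORRECTED ENGLISH": {
        "appear": ["V-INTRANS", "V-SUBJ-XCOMP"],
    },
}

# LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH).
lexicon = combine(sections, ["AUTOMATIC ENGLISH", "CORRECTED ENGLISH"])
print(lexicon["appear"])
print(lexicon["bake"])
```

The hand-corrected entry for appear replaces the automatically derived one, while entries that exist only in the automatic lexicon remain available.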
Other Configuration Information • ROOTCAT: default top level category • Standard: ROOT, Eureka: FIELD • Nondistributives for coordination • External attributes for applications • Character encoding • Reparse category and Optimality order for robustness • See XLE documentation for complete list
Declarations • Must declare grammatical and semantic functions for each grammar. • Used for completeness and coherence • GOVERNABLERELATIONS • Functions (features) that must be subcategorized for in the PRED • SUBJ OBJ OBL-?* ?COMP etc. • SEMANTICFUNCTIONS • Functions that must have a PRED • ADJUNCT NMOD
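How the governable relations feed completeness and coherence can be sketched as a check of one f-structure against the argument list of its PRED. This Python sketch is an illustration under stated assumptions: the `GOVERNABLE` set is a simplified stand-in for the declared list, f-structures are dictionaries, PRED is a (name, arguments) pair, and embedded f-structures are not checked.

```python
# Simplified stand-in for the GOVERNABLERELATIONS declaration.
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP", "OBL"}

def check(fstr):
    """Return completeness/coherence violations for a single f-structure."""
    _, selected = fstr["PRED"]
    present = {attr for attr in fstr if attr in GOVERNABLE}
    problems = [f"incomplete: missing {gf}"
                for gf in selected if gf not in present]
    problems += [f"incoherent: unselected {gf}"
                 for gf in sorted(present - set(selected))]
    return problems

pro = {"PRED": ("pro", ())}

missing_object = {"PRED": ("bake", ("SUBJ", "OBJ")), "SUBJ": pro,
                  "TENSE": "PRES"}
extra_object = {"PRED": ("fall", ("SUBJ",)), "SUBJ": pro, "OBJ": pro,
                "ADJUNCT": [pro]}

print(check(missing_object))
print(check(extra_object))
```

ADJUNCT and TENSE are ignored by the check because they are not governable relations, which is why the declaration has to say which functions are.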
Feature Declaration • List of all the features • GGF and semantic functions need not be listed • all other features must be listed • List of their possible values • atomic • f-structure • Multiple feature declarations • multilingual setting • grammar specialization
Why a feature declaration? • Good engineering practice • Catch typos and old analyses • Grammar easier to read NB: Theory doesn’t have typos
Declaration format STANDARD LANGUAGE FEATURES (1.0) feature1: -> $ { val1 val2 val3 }. feature2: -> $ { val4 val5 }. feature3: -> << [ feature1 feature2 ]. feature4. ----
Sample feature declaration TOY ENGLISH FEATURES (1.0) NUM: -> $ { sg pl }. PERS: -> $ { 1 2 3 }. TNS-ASP: -> << [ TENSE MOOD ASPECT ]. TENSE. MOOD: -> $ { indicative subjunctive }. ASPECT: -> << [ PERF PROG ]. PERF: -> $ { + - }. PROG: -> $ {+ - }.
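The kind of checking a feature declaration enables can be shown with a small validator. This Python sketch encodes the TOY ENGLISH declaration as a dictionary; the encoding, the name `violations`, and the reading of a bare `TENSE.` as "any value allowed" are assumptions for illustration, not XLE's internal representation.

```python
# ("atoms", values) for  F: -> $ { ... }.
# ("fstr", features) for F: -> << [ ... ].
# None for a bare declaration such as TENSE. (assumed: unrestricted value)
TOY_FEATURES = {
    "NUM": ("atoms", {"sg", "pl"}),
    "PERS": ("atoms", {"1", "2", "3"}),
    "TNS-ASP": ("fstr", {"TENSE", "MOOD", "ASPECT"}),
    "TENSE": None,
    "MOOD": ("atoms", {"indicative", "subjunctive"}),
    "ASPECT": ("fstr", {"PERF", "PROG"}),
    "PERF": ("atoms", {"+", "-"}),
    "PROG": ("atoms", {"+", "-"}),
}

def violations(fstr, declared=TOY_FEATURES, path=""):
    """List features or values in fstr that the declaration does not allow."""
    found = []
    for attr, value in fstr.items():
        where = f"{path}{attr}"
        if attr not in declared:
            found.append(f"{where}: undeclared feature")
            continue
        spec = declared[attr]
        if spec is None:
            continue
        kind, allowed = spec
        if kind == "atoms":
            if not isinstance(value, str) or value not in allowed:
                found.append(f"{where}: bad value {value}")
        elif not isinstance(value, dict):
            found.append(f"{where}: expected an f-structure")
        else:
            for inner in value:
                if inner not in allowed:
                    found.append(f"{where} {inner}: not allowed here")
            found += violations(value, declared, where + " ")
    return found

print(violations({"NUM": "sg", "TNS-ASP": {"TENSE": "pres", "MOOD": "indicative"}}))
print(violations({"NMU": "sg", "NUM": "dual"}))
```

The second call reports both a misspelled feature name and a value outside the declared set, the two kinds of slip the declaration is meant to catch.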
XLE and the feature declaration • XLE will not load a grammar with a violation of the feature declaration. • To catch violations in the lexicon, the generator must be loaded. • regenerate “some-sentence-to-parse” • parse, then choose “generate” in f-str window • create-generator grammar-name.lfg • print-unused-feature-declarations
Multiple feature declarations • List in priority order in the configuration • FEATURES (STANDARD COMMON) (STANDARD ENGLISH). • New features are listed as usual • Changes to features use edit operators + add a new value & intersect the values ! replace the feature entirely
Multiple feature declarations • STANDARD COMMON FEATURES (1.0) NUM: -> $ { sg pl dual }. CASE: -> $ { nom acc }. TENSE: -> << [ PAST FUTURE ]. PAST: -> $ { + - }. FUTURE: -> $ { + - }. • STANDARD ENGLISH FEATURES (1.0), each declaration followed by the resulting feature: • PERS: -> $ { 1 2 3 }. ⇒ PERS: -> $ { 1 2 3 }. • &NUM: -> $ { sg pl }. ⇒ NUM: -> $ { sg pl }. • +CASE: -> $ { gen }. ⇒ CASE: -> $ { nom acc gen }. • !TENSE: -> $ { pres past fut }. ⇒ TENSE: -> $ { pres past fut }. • !PAST: -> $ { }. !FUTURE: -> $ { }. (PAST and FUTURE are left with no values)
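The three edit operators behave like set operations on the values inherited from the earlier declaration. The Python sketch below covers only atomic-valued features; the function name `apply_edits` and the (operator, feature, values) encoding are assumptions made for illustration.

```python
def apply_edits(base, edits):
    """Combine a base declaration with (operator, feature, values) edits.

    ""  declares a new feature       "+" adds values
    "&" intersects the values        "!" replaces the feature entirely
    """
    result = {feature: set(values) for feature, values in base.items()}
    for op, feature, values in edits:
        values = set(values)
        if op == "+":
            result[feature] = result[feature] | values
        elif op == "&":
            result[feature] = result[feature] & values
        elif op in ("", "!"):
            result[feature] = values
        else:
            raise ValueError(f"unknown edit operator: {op}")
    return result

standard_common = {"NUM": {"sg", "pl", "dual"}, "CASE": {"nom", "acc"}}
standard_english = [
    ("", "PERS", {"1", "2", "3"}),   # PERS: -> $ { 1 2 3 }.
    ("&", "NUM", {"sg", "pl"}),      # &NUM: -> $ { sg pl }.
    ("+", "CASE", {"gen"}),          # +CASE: -> $ { gen }.
]

english = apply_edits(standard_common, standard_english)
for feature in sorted(english):
    print(feature, sorted(english[feature]))
```

The base declaration is copied rather than modified, so the same common declaration can be specialized differently for each language or application.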
Using Multiple Feature Decl. • Multilingual contexts • Language universal features • Customize to particular language • Grammar specialization • Add new features for odd constructions • Remove unused choices