This presentation on the elicitation corpus covers three agenda items: tagging corpus sentences with feature vectors or feature structures, combinatorics, and extensions. It discusses how sentences are annotated, which parser features may need filtering, and which features the parser does not supply (derived, constructional, and discourse/semantic/context features). It illustrates the combinatorics problem with subject-verb agreement and with determiners and possessive pronouns. It details the current coverage of the elicitation corpus, including word order, definiteness, animacy, agreement, possessive NPs, and inflectional features. Finally, it lists areas not yet covered, such as subcategorization frames, voice, negation, relative and embedded clauses, coordination, questions, and other constructions.
 
                
Elicitation Corpus April 12, 2003
Agenda • Tagging with feature vectors or feature structures • Combinatorics • Extensions
Annotating the corpus • Feature Vectors: • Maria saw the girls. • snum-sg, stype-prop, sanim-an, scount-na, sdef-def, vtype-perc, vtime-past, onum-pl, odef-def, etc. • Feature Structures: • ((SUBJ ((num sg) (type prop) (anim an) (count na) (def def))) (vtype perc) (vtime past) (OBJ ((etc. • These are easy: they come right out of the parser.
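The relation between the two annotation styles can be sketched in a few lines of Python. This is only an illustration, not the project's tooling: the nested dict standing in for the parser's feature structure, the `PREFIX` table, and the `flatten` function are all hypothetical names, and the object features shown are limited to the two that appear on the slide.

```python
# Hypothetical stand-in for a parser feature structure: nested dicts.
# Assumed prefixes: "s" for subject features, "o" for object features,
# following the snum/onum naming pattern on the slide.
PREFIX = {"SUBJ": "s", "OBJ": "o"}


def flatten(fs):
    """Turn a nested feature structure into a flat list of feature-vector tags."""
    tags = []
    for key, value in fs.items():
        if isinstance(value, dict):
            # Grammatical function: prefix each embedded feature name.
            for feat, val in value.items():
                tags.append(f"{PREFIX[key]}{feat}-{val}")
        else:
            # Clause-level feature such as vtype or vtime.
            tags.append(f"{key}-{value}")
    return tags


# "Maria saw the girls."
maria = {
    "SUBJ": {"num": "sg", "type": "prop", "anim": "an", "count": "na", "def": "def"},
    "vtype": "perc",
    "vtime": "past",
    "OBJ": {"num": "pl", "def": "def"},
}

print(flatten(maria))
```

Because the vector is derived mechanically from the structure, either format could be stored and the other regenerated on demand.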
Adapting parser output • Do we need to filter out irrelevant features? • E.g., features about “have” and “be” to make the English auxiliary system work. • E.g., (AUX-TYPE) = have
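If filtering turns out to be necessary, one possible approach is a recursive pass that drops a list of grammar-internal feature names. The sketch below assumes feature structures are represented as nested dicts; `IRRELEVANT` and `filter_features` are hypothetical names, and `AUX-TYPE` is the only feature taken from the slide.

```python
# Assumed list of English-internal features to drop; only AUX-TYPE
# comes from the slide, and a real list would have to be decided on.
IRRELEVANT = {"AUX-TYPE"}


def filter_features(fs, drop=IRRELEVANT):
    """Return a copy of a nested feature structure without the dropped features."""
    kept = {}
    for name, value in fs.items():
        if name in drop:
            continue  # grammar-internal bookkeeping, not an elicitation feature
        if isinstance(value, dict):
            kept[name] = filter_features(value, drop)
        else:
            kept[name] = value
    return kept


parsed = {"SUBJ": {"num": "sg", "AUX-TYPE": "have"}, "AUX-TYPE": "have", "vtime": "past"}
print(filter_features(parsed))
```

Keeping the drop list as data rather than code would make it easy to revisit which features count as irrelevant.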
Not covered by the parser • Derived features: • does the subject outrank the object in animacy? • Constructional features: • Counterfactual conditional: If I had gone, I would have seen him. • Do we want to extend the parsing grammar to label these automatically? • Discourse/semantic/context features: • Context: Who saw John? • Elicitation sentence: Bill saw John. • Feature: subject is new information. • Elicitation sentence: He must see it. • Feature: evidential or deontic (obligation) • Features that aren’t used in English • Context: we=you and me (inclusive ‘we’) • Elicitation sentence: We are tall.
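A derived feature such as "does the subject outrank the object in animacy?" could in principle be computed from parser output rather than annotated by hand. The following is a minimal sketch under stated assumptions: the two-level `ANIMACY_RANK` scale and the value name `inan` are hypothetical (only `an` appears on the slides), and the feature structure is again modeled as nested dicts.

```python
# Hypothetical animacy scale: a higher number means a higher rank.
# A real scale would likely have more levels.
ANIMACY_RANK = {"inan": 0, "an": 1}


def subject_outranks_object(fs, scale=ANIMACY_RANK):
    """Return True/False, or None when either animacy value is missing or unknown."""
    subj = fs.get("SUBJ", {}).get("anim")
    obj = fs.get("OBJ", {}).get("anim")
    if subj not in scale or obj not in scale:
        return None
    return scale[subj] > scale[obj]


print(subject_outranks_object({"SUBJ": {"anim": "an"}, "OBJ": {"anim": "inan"}}))
```

Constructional and discourse features, by contrast, depend on information the sentence-level parse does not contain, so they would still need to be supplied separately.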
Example of combinatorics: subject-verb agreement • five numbers (singular, plural, dual, trial, paucal) • three genders (masculine, feminine, and neuter, and more for Bantu languages) • four persons (first, second, third, and fourth) • several levels of animacy (animate, inanimate, first and second person, third person) • two levels of definiteness (definite and indefinite) • a huge number of tenses and aspects (present, past, future, non-past, non-future, near past, remote past, near future, remote future, continuous, perfective, etc.) • Two steps? (1) Which features are involved? (2) Which values are involved?
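The size of the problem can be made concrete by counting the cross product of just the values named on this slide. The sketch below is illustrative only: the `FEATURES` table lists only the values spelled out above (so it undercounts genders, tenses, and aspects), and `count_combinations` and `enumerate_combinations` are hypothetical helper names.

```python
from itertools import product
from math import prod

# Only the values explicitly listed on the slide; real inventories are larger.
FEATURES = {
    "number": ["singular", "plural", "dual", "trial", "paucal"],
    "gender": ["masculine", "feminine", "neuter"],
    "person": ["first", "second", "third", "fourth"],
    "animacy": ["animate", "inanimate", "first and second person", "third person"],
    "definiteness": ["definite", "indefinite"],
    "tense_aspect": [
        "present", "past", "future", "non-past", "non-future", "near past",
        "remote past", "near future", "remote future", "continuous", "perfective",
    ],
}


def count_combinations(features, involved=None):
    """Number of value combinations for the chosen features (all by default)."""
    names = list(features) if involved is None else involved
    return prod(len(features[name]) for name in names)


def enumerate_combinations(features, involved):
    """Yield one dict per combination of values for the chosen features."""
    for values in product(*(features[name] for name in involved)):
        yield dict(zip(involved, values))


print(count_combinations(FEATURES))                        # 5280
print(count_combinations(FEATURES, ["number", "person"]))  # 20
```

The second call shows why the two-step approach may help: once step (1) establishes that only number and person take part in agreement, step (2) has 20 combinations to test instead of 5280.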
Example of combinatorics: determiners and possessive pronouns • See handout.
Current Coverage of the Elicitation Corpus • Basic word order: intransitive verb and subject; transitive verb with subject and object; noun phrase with determiners, adjectives, and possessors. • Definiteness and animacy: special treatment of indefinite subjects, inanimate subjects, definite direct objects, animate direct objects, and sentences where the object outranks the subject in definiteness or animacy. • Agreement (in number, gender, person, etc.): subject and verb; object and verb; determiner and noun; adjective and noun; possessor and noun; relative pronoun and noun. • Possessive NPs: with inalienable possession (body parts); kinship terms; alienable possession; pronominal possessors; full NP possessors. • Inflectional Features: gender, number, person, case, tense.
Not covered by the elicitation corpus • Subcategorization frames for major verb classes: stative, change of state, change of location, change of possession, creation, filling and covering, experience, cognition, perception, saying and telling, causatives, etc. • Voice: active, passive, and oblique voices. • Negation: sentences and noun phrases • Relative clauses: inflectional features of the relative pronoun; possible locations of the gap; headed or unheaded, etc. • Embedded clauses: argument clauses; adjunct clauses; nominalized clauses.
Not Covered • Coordination: sentences (switch reference and same subject), noun phrases, and other constituents. • Questions: yes-no questions (positive answer expected and negative answer expected); open questions (possible locations of gaps). • Other constructions: comparatives, conditionals, causatives, desideratives, imperatives, possessor ascension, quantifier float, noun incorporation (polysynthesis). • Each of these has a few parameters to check: e.g., does the causee come out in dative or accusative case; can the incorporated noun take an unincorporated modifier; which NPs can possessors ascend from/quantifiers float from, etc. • Further coverage of tense, aspect, and modality: present, past, and future time; ongoing and completed actions; punctual and non-punctual activities; habituality; iteration; realized and non-realized. • Cross product of these with lexical aspect: state, activity, accomplishment, punctual.
Not Covered • Information structure: treatment of topic (given information) and focus (new information), including clefted and topicalized sentences. • Other meanings that are typically grammaticalized: yet, still, only, distributive (each), etc. • Other noun phrase phenomena: quantification, deictic determiners, classifiers, etc.