
Designing an Elicitation Corpus with Semantic Representations

This research project focuses on creating an elicitation corpus that covers various semantic categories and constructions, with the aim of studying how languages form different constructions and important semantic distinctions. The corpus will be used for learning translation rules and navigating through language features.


Presentation Transcript


  1. Designing an Elicitation Corpus with Semantic Representations • Simon Fung • Advisor: Lori Levin • August 2006

  2. Overview • motivation • elicitation corpus • constraints • issue: definiteness • status

  3. Corpus example Was there an apple? Wasn't there an apple? Will there be an apple? Won't there be an apple? There was an apple. There was not an apple. There will be an apple. There will not be an apple. ...

  4. Same sentences translated into Chinese

  5. What is an elicitation corpus for? • Make a small parallel corpus that can be used for learning translation rules

  6. Motivation • how do languages form various constructions (e.g. relative clauses)? • The student who I saw • <chinese>

  7. what semantic distinctions are important in different languages? • He is falling. <chinese> <french> • They are falling. <chinese> <french> • I ate an apple. <chinese> <french> • I ate apples. <chinese> <french>

  8. Elicitation Corpus • sentences covering various semantic categories/constructions • e.g. number, gender, relative clauses • to be translated into language under study

  9. The MILE (MInor Language Elicitation) Corpus • 10,000-20,000 words • translations done by one person • 7 languages per year for next 5 years • E.g., Thai, Bengali, Punjabi • May have a lot of speakers, but fewer electronic resources

  10. Constraints • maximize range of semantic categories and constructions • minimize corpus size
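The two constraints pull against each other: more coverage usually means more sentences. The slides do not say how this trade-off was handled in practice. As a purely illustrative sketch, the problem can be framed as set cover and approximated greedily; the function name `greedy_cover`, the short feature names, and the candidate sentences below are assumptions made for this example, not part of the MILE tooling.

```python
def greedy_cover(candidates):
    """Pick a small list of sentences whose features cover every feature value.

    candidates maps a sentence to the set of (feature, value) pairs it exercises.
    This is the standard greedy approximation to set cover, used here only as
    an illustration of "maximize range, minimize size".
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered:
        # Take the sentence that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical candidate sentences with simplified feature names.
candidates = {
    "There was an apple.": {("tense", "past"), ("polarity", "positive")},
    "There will not be an apple.": {("tense", "future"), ("polarity", "negative")},
    "There was not an apple.": {("tense", "past"), ("polarity", "negative")},
}
selected = greedy_cover(candidates)
```

Here two of the three sentences already exercise every tense and polarity value, so the third is not selected. Covering single values is not the same as covering interactions between features, which would need pairs of values as the units to cover.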

  11. Constraints • different languages complex in different areas • only one corpus, for this project • ultimate goal: dynamically navigate through features

  12. Method • create semantic representations first (instead of starting with English) • write English sentences based on them • translate sentences into various languages


  14. Example: feature structure srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((actor((np-function fn-actor)(np-general-type interrogative-type) (np-person person-third)(np-number num-dual) (np-biological-gender bio-gender-male)(np-animacy anim-human)(np-pronoun-antecedent antecedent-n/a) (np-specificity specificity-neutral)(np-identifiability identifiability-neutral) (np-distance distance-neutral)(np-pronoun-exclusivity inclusivity-n/a))) (undergoer ((np-person person-third)(np-identifiability unidentifiable)(np-number num-pl) (np-specificity non-specific)(np-animacy anim-inanimate)(np-biological-gender bio-gender-n/a)(np-function fn-undergoer)(np-general-type common-noun-type)(np-pronoun-exclusivity inclusivity-n/a)(np-pronoun-antecedent antecedent-n/a)(np-distance distance-neutral))) (c-polarity polarity-positive) (c-v-absolute-tense future) (c-general-type open-question)(c-question-gap gap-actor)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)(c-comparator-function comparator-n/a)(c-causee-control control-n/a)(c-our-situations situations-n/a)(c-comparand-type comparand-n/a)(c-causation-directness directness-n/a)(c-source source-neutral)(c-causee-volitionality volition-n/a)(c-assertiveness assertiveness-neutral)(c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-adjunct-clause-type adjunct-clause-type-n/a)(c-v-phase-aspect phase-aspect-neutral)(c-v-lexical-aspect activity-accomplishment)(c-secondary-type secondary-neutral)(c-event-modality event-modality-none)(c-function fn-main-clause)(c-minor-type minor-n/a)(c-copula-type copula-n/a)(c-power-relationship power-peer)(c-our-shared-subject shared-subject-n/a))

  15. Example: feature structure srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((ACTOR ((NP-FUNCTION FN-ACTOR) (NP-GENERAL-TYPE INTERROGATIVE-TYPE))) (UNDERGOER ((NP-PERSON PERSON-THIRD) (NP-IDENTIFIABILITY UNIDENTIFIABLE)(NP-NUMBER NUM-PL) (NP-SPECIFICITY NON-SPECIFIC))) (C-POLARITY POLARITY-POSITIVE) (C-V-ABSOLUTE-TENSE FUTURE))

  16. Example: feature structure srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((ACTOR ((NP-FUNCTION FN-ACTOR) (NP-GENERAL-TYPE INTERROGATIVE-TYPE))) (UNDERGOER ((NP-PERSON PERSON-THIRD) (NP-IDENTIFIABILITY UNIDENTIFIABLE)(NP-NUMBER NUM-PL) (NP-SPECIFICITY NON-SPECIFIC))) (C-POLARITY POLARITY-POSITIVE) (C-V-ABSOLUTE-TENSE FUTURE)) Feature name

  17. Example: feature structure srcsent: Who will break windows? context: "Who" refers to two men; spoken to a co-worker; ((ACTOR ((NP-FUNCTION FN-ACTOR) (NP-GENERAL-TYPE INTERROGATIVE-TYPE))) (UNDERGOER ((NP-PERSON PERSON-THIRD) (NP-IDENTIFIABILITY UNIDENTIFIABLE)(NP-NUMBER NUM-PL) (NP-SPECIFICITY NON-SPECIFIC))) (C-POLARITY POLARITY-POSITIVE) (C-V-ABSOLUTE-TENSE FUTURE)) Feature name value
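A feature structure like the abbreviated one above is a nested list of (feature name, value) pairs written in parenthesized form. The following is a minimal sketch of reading it into a nested dictionary; the helper names `tokenize`, `parse`, and `to_dict` are assumptions for this illustration and not part of any tool named in the slides.

```python
import re

EXAMPLE = """
((ACTOR ((NP-FUNCTION FN-ACTOR) (NP-GENERAL-TYPE INTERROGATIVE-TYPE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD) (NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL) (NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)
 (C-V-ABSOLUTE-TENSE FUTURE))
"""


def tokenize(text):
    """Split the text into parentheses and atoms such as NP-NUMBER or n/a."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Parse one parenthesized expression, consuming tokens from the front."""
    token = tokens.pop(0)
    if token != "(":
        return token  # an atom: a feature name or a value
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing parenthesis
    return items


def to_dict(pairs):
    """Turn a list of [name, value] pairs into a dict, recursing into sub-structures."""
    result = {}
    for name, value in pairs:
        result[name] = to_dict(value) if isinstance(value, list) else value
    return result


fs = to_dict(parse(tokenize(EXAMPLE)))
```

After parsing, `fs["UNDERGOER"]["NP-NUMBER"]` holds the value `NUM-PL`, and clause-level features such as `C-POLARITY` sit at the top level next to the `ACTOR` and `UNDERGOER` sub-structures.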

  18. Using semantic representation • Advantage: • specify grammatical features more precisely than text

  19. Method • create semantic representations first (instead of starting with English) • write English sentences based on them • translate sentences into various languages

  20. Corpus example Was there an apple? Wasn't there an apple? Will there be an apple? Won't there be an apple? There was an apple. There was not an apple. There will be an apple. There will not be an apple. ...

  21. Method • create semantic representations first (instead of starting with English) • write English sentences based on them • translate sentences into various languages

  22. Same sentences translated into Chinese

  23. 1. Naturalness • naturalness of sentences vs. holding lexical items constant • minimal pairs are ideal (A tree fell / The tree fell) • but we also want natural sentences • natural → easier to translate → fewer mistakes • e.g. She hurt herself. It ____ itself. • sentences are hand-written • vs. using natural language generators (GenKit)
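In terms of feature structures, a minimal pair is two sentences whose representations differ in exactly one feature. The sketch below finds such pairs in a small hand-built corpus. The function names are assumptions, and so are the feature values `identifiable` and `num-sg`; only `unidentifiable` and `num-pl` appear in the example feature structure.

```python
from itertools import combinations


def differing_features(fs_a, fs_b):
    """Return the sorted names of features whose values differ between two structures."""
    keys = fs_a.keys() | fs_b.keys()
    return sorted(k for k in keys if fs_a.get(k) != fs_b.get(k))


def minimal_pairs(corpus):
    """Return (sentence, sentence, feature) for every pair differing in one feature.

    corpus maps a sentence to a flat dict of feature name -> value.
    """
    pairs = []
    for (s1, f1), (s2, f2) in combinations(corpus.items(), 2):
        diff = differing_features(f1, f2)
        if len(diff) == 1:
            pairs.append((s1, s2, diff[0]))
    return pairs


# Hypothetical annotations; value names other than those on the slides are assumed.
corpus = {
    "A tree fell.": {"np-identifiability": "unidentifiable", "np-number": "num-sg"},
    "The tree fell.": {"np-identifiability": "identifiable", "np-number": "num-sg"},
    "The trees fell.": {"np-identifiability": "identifiable", "np-number": "num-pl"},
}
pairs = minimal_pairs(corpus)
```

"A tree fell." and "The trees fell." differ in two features, so they are not reported as a minimal pair.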

  24. 2. Restrictions • need to find restrictions on combinations of features • some combinations invalid/unnatural • e.g. inclusive and third-person male
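Restrictions of this kind can be written down as sets of feature values that must not co-occur and then checked mechanically. This is a sketch under assumptions: the value name `inclusive` and the names `FORBIDDEN` and `violated_restrictions` are invented for the example; the slides give only the informal case of an inclusive pronoun combined with third person.

```python
# Each restriction is a set of (feature, value) pairs that may not appear together.
# The value "inclusive" is a hypothetical name for the inclusive-pronoun setting.
FORBIDDEN = [
    {("np-pronoun-exclusivity", "inclusive"), ("np-person", "person-third")},
]


def violated_restrictions(features, forbidden=FORBIDDEN):
    """Return every forbidden combination fully present in a flat feature dict."""
    present = set(features.items())
    return [combo for combo in forbidden if combo <= present]


bad_np = {
    "np-person": "person-third",
    "np-pronoun-exclusivity": "inclusive",
    "np-biological-gender": "bio-gender-male",
}
ok_np = {
    "np-person": "person-third",
    "np-pronoun-exclusivity": "inclusivity-n/a",
    "np-biological-gender": "bio-gender-male",
}
```

Calling `violated_restrictions(bad_np)` returns the one matching restriction, while `violated_restrictions(ok_np)` returns an empty list, so invalid combinations can be filtered out before anyone writes an English sentence for them.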

  25. 3. Definition of values • use language-independent semantic categories • e.g. specificity, identifiability • writers need to agree on definitions of values • Intercoder agreement (informal experiment) • each coder needs to be consistent • writers agreed on English forms to use
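The slides mention only an informal intercoder-agreement experiment and do not name a measure. One common choice, assumed here for illustration, is Cohen's kappa, which corrects the raw agreement rate between two coders for agreement expected by chance. The labels in the example are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(labels_a)
    # Proportion of items on which the coders chose the same value.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each coder chose values independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[v] * freq_b[v] for v in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical value throughout
    return (observed - expected) / (1 - expected)


# Made-up np-specificity labels from two hypothetical coders.
coder_1 = ["specific", "specific", "non-specific", "non-specific"]
coder_2 = ["specific", "non-specific", "specific", "non-specific"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this example the coders agree on half the items, which is exactly what chance predicts for their label frequencies, so kappa is 0. Identical label lists give a kappa of 1.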

  26. Avoid language-specific grammatical features • Suppose you want to know about definiteness in the minor language • You think that you can find out about definiteness by checking how they translate “the”, “a” and the null determiner • You get the following data from French • <French data here> • It’s a mess • Sometimes “the” is translated as “le/la” • Sometimes “le/la” occurs where English has a null determiner • etc. • You don’t know why it’s a mess

  27. Avoid language specific… • Have to break it down by function: • Indefinite quantity • Generic • Predicate nominal • A definite noun phrase • Etc.

  28. Definiteness • example of a problem in design of features and values • how to define definiteness, • while avoiding using English definiteness categories?

  29. Criteria for definiteness Lyons (1999): • uniqueness • familiarity • identifiability • specificity • inclusiveness

  30. Definiteness You and I are in a room. I say “The chair is on fire!”

  31. Criteria for definiteness Why did I say “the chair”? • identifiability • I know that you know what chair I’m talking about • specificity • a chair you can single out among chairs you’re imagining

  32. Grammatical feature: specificity “John wants to marry a Norwegian.” Feature: np-specificity • Values: • specific • John wants to marry a (specific) Norwegian. • non-specific • John wants to marry some Norwegian. • specificity-neutral • She is a Norwegian.

  33. Grammatical feature: specificity • Turkish direct objects: • Ali bir kitap okudu. (gloss: Ali one book read) “Ali read a book.” • Ali bir kitab-ı okudu. (gloss: Ali one book-ACC read) “Ali read a (specific) book.”

  34. Definiteness: corner cases • e.g. Who will be the manager? • Not about a specific manager, but it is about a specific role • e.g. She is a teacher. • identifiability-neutral, specificity-neutral • no article here in French • e.g. A dog has four legs. • identifiability-generic, specificity-neutral

  35. Criteria for definiteness • chose the most important criteria: • identifiability • specificity

  36. Layout of Corpus 1. Clause types, negation, and formality 2. Discourse setting/Speaker-hearer features 3. Basic NP features 4. Verbal Tense and Aspect 5. Evidentiality and Modality 6. Causatives 7. Comparatives 8. Modifiers 9. Conjunctions 10. Clause-combining

  37. Layout of Corpus • combine feature values systematically • why combine? • some features interact • e.g. Will the woman be happy? (interrogative, future tense) • what to combine? • some features known to interact • e.g. person, number (I am, we are, he is)
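Systematic combination amounts to taking the cross product of the values of features that are known to interact. The sketch below does this for three clause-level features. The feature names follow the example feature structure, but the values `polarity-negative`, `past`, `declarative`, and `yes-no-question` are assumed names, since the slides show only `polarity-positive`, `future`, and `open-question`.

```python
from itertools import product

# Features chosen because they interact; several value names are assumptions.
FEATURES = {
    "c-general-type": ["yes-no-question", "declarative"],
    "c-v-absolute-tense": ["past", "future"],
    "c-polarity": ["polarity-positive", "polarity-negative"],
}


def all_combinations(features):
    """Return one feature dict per combination of the listed values."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]


combos = all_combinations(FEATURES)
```

Two values for each of three features give eight combinations, one for each of the eight sentences listed in the corpus example ("Was there an apple?" through "There will not be an apple."). Because the count grows multiplicatively, only features that actually interact are worth combining this way.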

  38. Status • delivered (as of two weeks ago)!

  39. e.g. definite feature corresponding to English “the” • definiteness in other languages different • definiteness → familiarity, uniqueness, identifiability, etc.

  40. Steps in corpus creation • Define features and values • tricky to define meanings • e.g. semantics of definiteness • Uniqueness: the president • Familiarity: • Identifiability: • Specificity: • languages have subtly different categories • e.g. definiteness
