
Linguistic Specifications Penn, December 11 2000

Understand how the SIMPLE model harmonizes computational lexicons for multilingual links, enabling semantic multidimensionality for NLP tasks. Explore the structure and benefits for HLT applications.

davel


Presentation Transcript


  1. Linguistic Specifications Penn, December 11 2000

  2. What is SIMPLE?
     A set of harmonised computational lexicons for HLT applications, geared for multilingual links:
     • use of a common model
     • use of a common representation language
     • use of a common methodology for building the lexicon: common Template Types, with default obligatory info (Type-defining) and indication of optional info
     • a subset of the 12 lexicons cross-lingually related:
       • choice of a shared set of SemUs (from EWN)

  3. PAROLE – SIMPLE Architecture
     [Diagram; node labels: MuS, SynU, SemU, Sem Info, TEMPLATE, Sem. Rel, Sem. Feat, Lexical Rel]

  4. Semantic information in SIMPLE
     Word senses are encoded as Semantic Units (SemUs), containing the following information:
     • Semantic type *
     • Domain *
     • Lexicographic gloss *
     • Qualia structure
     • Reg. Polysemy altern.
     • Event type
     • Derivation relations
     • Synonymy
     • Collocations
     • Argument structure for predicative SemUs *
     • Selection restrictions on the arguments *
     • Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *
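
The information types listed on this slide can be pictured as one record per word sense. The following is a minimal sketch, not the SIMPLE representation language: the class name, field names and Python types are assumptions made for illustration, and the sample values are borrowed from the guardare_2 example on slide 12.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemU:
    """One word sense. Field names are illustrative, not the SIMPLE schema."""
    ident: str
    semantic_type: str
    domain: str
    gloss: str
    # Qualia role name -> list of "relation:target" strings (hypothetical encoding).
    qualia: Dict[str, List[str]] = field(default_factory=dict)
    polysemy_alternations: List[str] = field(default_factory=list)
    event_type: Optional[str] = None
    derivation: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    collocates: List[str] = field(default_factory=list)
    # Only predicative SemUs carry a predicate and selection restrictions.
    predicate: Optional[str] = None
    selection_restrictions: Dict[str, str] = field(default_factory=dict)


# Values taken from the guardare_2 (look_2) example on slide 12.
look = SemU(
    ident="guardare_2",
    semantic_type="Perception",
    domain="General",
    gloss="osservare con attenzione",
    qualia={"constitutive": ["instrument:occhio"]},
    event_type="process",
    predicate="guardare(<arg0>, <arg1>)",
    selection_restrictions={"arg0": "Animate", "arg1": "Entity"},
)

print(look.semantic_type, look.selection_restrictions["arg0"])
```

Fields left out of the constructor fall back to empty defaults, which mirrors the idea that some information types are optional for a given sense.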

  5. Some research aspects of the model
     On a large scale, for so many languages:
     • multiple orthogonal dimensions of meaning (GL) for different POS, e.g. qualia roles, made up of various semantic relations/features (also from Genelex & Acquilex, but reorganised in a coherent structure): the extended qualia structure
     • argument structure & selection preferences, linked to the PAROLE syntactic frame
     • a framework for testing and evaluating the maturity of the current state of the art in lexical semantics
     • a potential basis for future European multilingual initiatives for HLT applications

  6. Semantic Multidimensionality and NLP
     Crucial NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Qualia Relations:
     • Is_a_part_of: la pagina del libro (the page of the book)
     • Member_of: il difensore della Juventus (Juventus fullback)
     • Telic: il suonatore di liuto (the lute player)
     • Made_of: il tavolo di legno (the wooden table)

  7. Complexity?
     A constraining, structured model is necessary
     • to enforce uniformity between languages and systematicity in encoding
     Great granularity and detail in the specs (wrt the TA) implied:
     • more work for the Specs Group...
     • a common methodology for the lexicographers, guided by the Templates (also less waste of time)
     Templates as a way to organise and classify relevant “clusters” of information
     • for coherent encoding, across sites and languages (distributed building of harmonised lexicons)
     • for later use/tuning of the information in applications and tasks

  8. Overall Organization
     [Diagram; labels: SemU (Predicate, arguments, Selection restrictions; Qualia; Derivation; Polysemy; Event Type), Template Type, Ontology, Pred. Layer, Instantiation, Greek lexicon, Danish lexicon, Catalan lexicon, Italian lexicon, …]

  9. Template for Semantic Units
     [Diagram; labels: Type System Coordinates, Predicative Layer, Qualia Structure, Contextual/Polysemy Information, “redundancy”]

  10. Perception
      Verb examples: hear, smell, etc.
      Noun examples: sight, look, etc.
      Linguistic Tests: Levin Class: 30.1 (See verbs, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
      Comments: Processes involving an experiencing relation, whereby the perception involves the senses of a living entity. The instrument of perception (e.g. eyes for see) is encoded in the Constitutive quale. Under this template we include both volitional (e.g. look) and non-volitional (e.g. see) events. The difference is expressed as a constitutive feature.

  11. Template for Perception
      SemU: 1
      Usyn:
      BC Number: 105
      Template_Type: [Perception]
      Template_Supertype: [Psychological_event]
      Domain: General
      Semantic Class: Perception
      Gloss: //free//
      Event type: process
      Pred_Rep.: Lex_Pred (<arg0>, <arg1>)
      Derivation: <Nil> or //Erli's Code//
      Selectional Restr.: arg0 = Animate //concept//; arg1: default = [Entity]
      Formal: isa (1, <SemU>: [Perception])
      Agentive: <Nil>
      Constitutive: instrument (1, <SemU>: [Body_part]); intentionality = {yes, no} //optional//
      Telic: <Nil>
      Collocates: Collocates (<SemU1>, ..., <SemUn>)
      Complex: <Nil>

  12. Example
      SemU: <guardare_2> //look_2//
      Usyn:
      BC Number: 105
      Template_Type: [Perception]
      Template_Supertype: [Psychological_event]
      Domain: General
      Semantic Class: Perception
      Gloss: osservare con attenzione //to observe attentively//
      Event type: process
      Pred_Rep.: guardare (<arg0>, <arg1>)
      Derivation: <Nil>
      Selectional Restr.: arg0 = Animate //concept//; arg1: default = [Entity]
      Formal: isa (<guardare_2>, <percepire>: [Psychological_event])
      Agentive: <Nil>
      Constitutive: instrument (<guardare_2>, <occhio>: [body_part]); intentionality = {yes}
      Telic: <Nil>
      Collocates: Collocates (<SemU1>, ..., <SemUn>)
      Complex: <Nil>
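
A template can be read as a set of values that every instance of the type is expected to share, plus optional features with a closed set of values. The sketch below checks the guardare_2 entry against the Perception template under that reading. It is an assumption made for illustration: the slides do not say which template fields are fixed and which are defaults, and the dictionary keys and the function name `template_mismatches` are hypothetical.

```python
# Fields treated here as fixed by the Perception template (an assumption).
PERCEPTION_TEMPLATE = {
    "Template_Type": "Perception",
    "Template_Supertype": "Psychological_event",
    "Domain": "General",
    "Event type": "process",
    "Agentive": None,   # <Nil> on the slide
    "Telic": None,      # <Nil> on the slide
}

# The optional constitutive feature and its permitted values, as on slide 11.
ALLOWED_INTENTIONALITY = {"yes", "no"}

# The entry from slide 12, reduced to the fields checked below.
guardare_2 = {
    "SemU": "guardare_2",
    "Template_Type": "Perception",
    "Template_Supertype": "Psychological_event",
    "Domain": "General",
    "Gloss": "osservare con attenzione",
    "Event type": "process",
    "Agentive": None,
    "Telic": None,
    "intentionality": "yes",
}


def template_mismatches(entry, template):
    """Return the names of fields where the entry departs from the template."""
    problems = [name for name, value in template.items() if entry.get(name) != value]
    intent = entry.get("intentionality")
    if intent is not None and intent not in ALLOWED_INTENTIONALITY:
        problems.append("intentionality")
    return problems


print(template_mismatches(guardare_2, PERCEPTION_TEMPLATE))
```

A check of this kind is one way the templates could support coherent encoding across sites: an entry that departs from its template is flagged before it reaches the shared lexicon.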

  13. Semantic Relations in SIMPLE
      To represent:
      • multiple meaning dimensions in a sense: Qualia Rel.
      • cross-PoS relations (nominalization etc.): Derivation Rel.
      • regular polysemous classes: Polysemy Rel.
      • collocation information: Collocation Rel.
      Requirements of Flexibility & Openness:
      • an extendable framework: to allow coherent future extensions with additional or more specific info
      • multipurpose requirements: to allow tuning for specific applications/text types

  14. Semantic Relations in SIMPLE
      Modular representation of a Semantic Unit (SemU):
      • Pred. Layer: predicate, arguments, selectional restrictions
      • Rel. Layer: relations between SemUs (Qualia, Derivation, Collocation, Polysemy)

  15. Semantic Relations
      [Diagram; hierarchy labels: Top, Telic, Formal, Constitutive, Agentive, Is_a, Is_a_part_of, Property, Created_by, Agentive_cause, Indirect_telic, Activity, ..., Contains, ..., Instrumental, Is_the_habit_of, Used_for, Used_as]
      The targets of relations identify:
      • prototypical semantic information associated with a SemU
      • elements of dictionary definitions of SemUs
      • typical corpus collocates of the SemU

  16. Semantic Relations: Calcina (mortar)
      SemU: 3070, Type: [Artifactual_material]
      Gloss: white substance used as material to build walls
      • Used_for: <costruire> build
      • Used_as: <materiale> material
      • Isa: <sostanza> substance
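
The calcina entry can be held as (source, relation, target) triples and indexed for lookup. This is a minimal sketch rather than the SIMPLE storage format: the triple layout and the helper names `index_by_source` and `targets` are assumptions, and the pairing of each relation with its target follows the order in which they appear on the slide.

```python
from collections import defaultdict

# (source SemU, relation, target SemU), read off slide 16.
RELATIONS = [
    ("calcina", "Used_for", "costruire"),   # build
    ("calcina", "Used_as", "materiale"),    # material
    ("calcina", "Isa", "sostanza"),         # substance
]


def index_by_source(triples):
    """Group triples as source -> relation -> list of targets."""
    index = defaultdict(dict)
    for source, relation, target in triples:
        index[source].setdefault(relation, []).append(target)
    return index


def targets(index, source, relation):
    """Return the targets of one relation for one SemU (empty list if none)."""
    return index.get(source, {}).get(relation, [])


idx = index_by_source(RELATIONS)
print(targets(idx, "calcina", "Used_for"))
```

Keeping each relation as its own triple means new relation types can be added without changing the structure, in line with the extendability requirement on slide 13.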

  17. Semantic Relations: Ala (wing)
      • SemU: 3232, Type: [Part], part of an airplane
      • SemU: 3268, Type: [Part], part of a building
      • SemU: D358, Type: [Body_part], organ of birds for flying
      • SemU: 3467, Type: [Role], role in football
      Relation labels in the diagram: Agentive, Used_for, Is_a_part_of, Isa
      Relation targets in the diagram: <fabbricare> make, <volare> fly, <aeroplano> airplane, <edificio> building, <parte> part, <uccello> bird, <giocatore> player

  18. Relations and Predicates in SIMPLE
      Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>
      • SemU Sell (V)
      • SemU Seller (N): Is_the_agent_of
      • SemU Sale (N): Event_noun
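
The point of this slide is that several SemUs of different parts of speech share one predicate. A minimal sketch of that sharing follows; the class names `Predicate` and `PredicateLink` and the helper `semus_sharing` are hypothetical, and the link type of the verb is left empty because the slide shows no label for it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Predicate:
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class PredicateLink:
    semu: str
    pos: str
    predicate: Predicate
    link_type: Optional[str]  # None where the slide shows no label


PRED_SELL = Predicate("Pred_SELL", ("ARG0", "ARG1", "ARG2", "ARG3"))

LINKS: List[PredicateLink] = [
    PredicateLink("sell", "V", PRED_SELL, None),
    PredicateLink("seller", "N", PRED_SELL, "Is_the_agent_of"),
    PredicateLink("sale", "N", PRED_SELL, "Event_noun"),
]


def semus_sharing(predicate, links):
    """Return the SemUs linked to the given predicate, in listing order."""
    return [link.semu for link in links if link.predicate == predicate]


print(semus_sharing(PRED_SELL, LINKS))
```

Because the argument list lives on the predicate and not on each SemU, the verb and its nominalizations stay aligned on the same argument slots.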

  19. Argument Structure: Comprendere (V), Comprensione (N)
      • SemU: 61725, Type: [Cognitive_event], to understand
      • SemU: 61726, Type: [Cognitive_event], understanding
      • SemU: 6962, Type: [Constitutive_state], to include
      • Comprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]>
      • Comprendere#2 <Arg1 [+group]>, <Arg2>
      Link labels in the diagram: master, verb_nominalization, master

  20. Argument Structure: Difensore (N)
      Examples: il difensore di Clinton (Clinton's defender); il difensore della Juventus (Juventus fullback)
      • SemU: 4125, Type: [Role], defender
      • SemU: 3526, Type: [Role], fullback
      • Difendere#1 <Arg1>, <Arg2>
      Diagram labels: agent_nominalization; Is_a_member_of <squadra> team

  21. Multidimensional Ontology
      Template:
      Usem: 1
      BC number: number
      Template_Type: [Part]
      Template_Supertype: [Constitutive]
      Domain: General
      Semantic Class: Part + <Semantic Class>
      Gloss: //free//
      Pred_Rep.: Part_of (<arg0>)
      Selectional Restr.: arg0 = [Entity]
      Derivation: <Derivational Relation>
      Formal: isa (1, <part> or <hyperonym>)
      Agentive: <Nil>
      Constitutive: is_a_part_of (1, <Usem>: [Constitutive])
      Telic: <Nil>
      Synonymy: <Nil>
      Collocates: Collocates (<Usem1>, ..., <Usemn>)
      Complex: <Nil>
      Ontology:
      1. TELIC [Top]
      2. AGENTIVE [Top]
         2.1. Cause [Agentive]
      3. CONSTITUTIVE [Top]
         3.1. Part [Constitutive]
              3.1.1. Body_part [Part]
         3.2. Group [Constitutive]
              3.2.1. Human_group [Group]
         3.3. Amount [Constitutive]
      4. ENTITY [Top]
         4.1. Concrete_entity [Entity]
              4.1.1. Location [Concrete_entity]
      …
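
Each ontology node on this slide names its supertype in brackets, so the excerpt can be stored as a child-to-parent map and walked upwards. The sketch below covers only the nodes shown on the slide; the capitalisation of the top-level names is normalised, and the function names `ancestors` and `is_subtype` are assumptions made for illustration.

```python
# Child -> parent, as given by the bracketed supertypes on the slide.
PARENT = {
    "Telic": "Top",
    "Agentive": "Top",
    "Cause": "Agentive",
    "Constitutive": "Top",
    "Part": "Constitutive",
    "Body_part": "Part",
    "Group": "Constitutive",
    "Human_group": "Group",
    "Amount": "Constitutive",
    "Entity": "Top",
    "Concrete_entity": "Entity",
    "Location": "Concrete_entity",
}


def ancestors(sem_type):
    """Return the supertypes of sem_type, nearest first, ending at Top."""
    chain = []
    while sem_type in PARENT:
        sem_type = PARENT[sem_type]
        chain.append(sem_type)
    return chain


def is_subtype(sem_type, candidate):
    """True if sem_type equals candidate or inherits from it."""
    return sem_type == candidate or candidate in ancestors(sem_type)


print(ancestors("Body_part"))
print(is_subtype("Body_part", "Constitutive"))
```

A lookup of this kind is what a selection restriction such as arg0 = [Entity] would rely on: an argument filler satisfies the restriction if its type is the restricting type or one of its subtypes.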

  22. SIMPLE wrt EAGLES/ISLE Computational Lexicon WG
      Multilingual Lexicons (US-EU coop.)
      • Last EAGLES work on Lexicon/Semantics used for SIMPLE specifications
      • SIMPLE lexicons chosen as a basis for applying & testing EAGLES/ISLE work on defining common guidelines for Multilingual Lexicons

  23. Basic lexical semantic notions
      • BASE CONCEPTS, HYPONYMY, SYNONYMY: all applications and enabling technologies
      • SEMANTIC FRAMES: MT, IR, IE, & Gen, Pars, MWR, WSD, Coref
      • COOCCURRENCE RELATIONS: MT, Gen, Word Clust, WSD, Par
      • MERONYMY: MT, IR, IE & Gen, PNR
      • ANTONYMY: Gen, Word Clust, WSD
      • SUBJECT DOMAIN: MT, SUM, Gen, MWR, WSD
      • ACTIONALITY: MT, IE, Gen, Par
      • QUANTIFICATION: MT, Gen, Coref

  24. Complementarity wrt EuroWordNet
      • Use of a small EWN subset for all languages
      • Mappable Top Ontology
      • Actual linking of data for a few languages
      • Semantic subcategorisation and linking with syntax
      • Template structure for the description of SemU
      • SemU vs. Synset: basic unit
      • Nodes in the Ontology as structured Sem. Types (bundles of different info types)

  25. From SENSEVAL/ROMANSEVAL
      Which requirements?
      • Common semantic tagset, Gold Standard
      • Criteria for sense discrimination (flexible & adaptable) & sense-granularity
      • Different dimensions of meanings
      • Different disambiguation clues/strategies (interaction syntax & semantics)
      • Underspecified readings (regular polysemy)
      • MultiWords
      • Metaphorical usage

  26. Core Lexicons to be enlarged at the National level
      PAROLE/SIMPLE start providing the common platform
      • Under the subsidiarity principle, the process started at the EU level is continued at the national level:
        • PAROLE/SIMPLE resources are being enlarged within National Projects (e.g. Danish, Greek, Italian, Portuguese, ...)
      • This creates a really large infrastructure of harmonised LRs throughout Europe, impossible without the fundamental role played by the EC Standards and LRs projects
      • A major achievement in Europe, where all the difficulties of building LRs are multiplied by the language factor
