REPORT on Computational Lexicon Working Group on Multilingual Lexicon

REPORT on Computational Lexicon Working Group on Multilingual Lexicon. EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000. The Multilingual ISLE Lexical Entry (MILE). General methodological principles (from EAGLES). Basic requirements for the MILE.


Presentation Transcript


  1. REPORT on Computational Lexicon Working Group on Multilingual Lexicon. EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000

  2. The Multilingual ISLE Lexical Entry (MILE) General methodological principles (from EAGLES): • Basic requirements for the MILE: • Modular and layered • Granular • Allow for underspecification • ISLE should discover and list (the maximal set of) basic notions to be included in the MILE • The leading principle for the design of the MILE should be the edited union of existing lexicons / models (redundancy should not be a problem)

  3. MILE • Objective: definition of MILE, its basic notions, architecture, • such that we can write a DTD • & have a tool to support it • discover a methodology of work towards this

  4. Modularity in MILE • Some advantages: • Flexibility of representation • Easy to customise and update • Easy integration of existing resources • High versatility towards different applications • Modularity in at least three respects: • in the macrostructure and general architecture of the MILE • in the microstructure of the MILE • in the specific microstructure of the MILE word-sense

  5. Modularity in MILE • Modularity in the macrostructure and general architecture of the MILE Meta-information - versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX) Possible architecture(s) of multilingual lexicon(s) - interactions of the different modules within the general structure. Issues related to transfer-based, interlingua-based approaches, and hybrid solutions.

  6. Modularity in MILE • Modularity in the microstructure of the MILE – The MILE could be organized in at least the following modules: Monolingual linguistic representation Collocational information Multilingual apparatus (e.g. transfer conditions and actions)

  7. Monolingual Linguistic Representation • It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language. • It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.

  8. Monolingual Linguistic Representation: a Provisional List • Morphological layer • Grammatical category and subcategory • Gender, number, person, mood • Inflectional class • Modifications of the lemma • Mass/count, 'pluralia tantum' • …

  9. Monolingual Linguistic Representation: a Provisional List • Syntactic layer • Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.) • Attributive vs. predicative function, gradability • List of syntactic positions forming subcategorization frames • Syntactic constraints and properties of the possible 'slot filler' • Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements) • Information on control (subject control, object control, etc.) and raising properties • …

  10. Monolingual Linguistic Representation: a Provisional List • Semantic layer • Characterization of senses through links to an Ontology • Domain information, gloss • Argument structure, semantic roles, selectional preferences • Event type for verbs, to characterize their actionality behaviour • Link to the syntactic realization of the arguments • Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.) • Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure) • Information about regular polysemous alternations • Information concerning cross-part-of-speech relations • …

  11. Collocational Information More or less typical and/or fixed syntactic-semantic patterns • Typical or idiosyncratic syntactic constructions • Typical collocates • Support verb constructions • Phraseological or multiword constructions • Compounds (e.g. noun-noun, noun-PP, adjective-noun, etc.) • Corpus-driven examples of MILE • …

  12. Multilingual Apparatus Transfer conditions and actions • possible starting points: OLIF, GENELEX, etc. • devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated) • identify which conditions must be expressible and which transformation actions are necessary • select which types of information these conditions must access • examine the variability in granularity needed when translating into different languages, and the architectural implications of this • which role for an Interlingua?
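
The idea of pairing a transfer condition with a transfer action can be sketched in code. The following is a minimal illustration only, not the MILE specification: the names `TransferRule` and `apply_transfer`, the `object_type` feature and the `complement` feature are all hypothetical, and the English "know" versus Italian "sapere"/"conoscere" contrast is used purely as a familiar example of a condition that must look at the syntactic context.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TransferRule:
    """Hypothetical pairing of a transfer condition with a transfer action."""
    source_sense: str
    target_sense: str
    condition: Callable[[dict], bool]   # test on source-side information
    action: Callable[[dict], dict]      # transformation of that information


def apply_transfer(rules, sense_id: str, context: dict) -> Optional[Tuple[str, dict]]:
    """Return (target sense, transformed context) for the first matching rule."""
    for rule in rules:
        if rule.source_sense == sense_id and rule.condition(context):
            return rule.target_sense, rule.action(context)
    return None  # no rule applies: transfer is left underspecified


# Illustrative rules (assumed feature names, not taken from any real lexicon).
RULES = [
    TransferRule(
        source_sense="know",
        target_sense="sapere",
        condition=lambda ctx: ctx.get("object_type") == "clause",
        action=lambda ctx: {**ctx, "complement": "che-clause"},
    ),
    TransferRule(
        source_sense="know",
        target_sense="conoscere",
        condition=lambda ctx: ctx.get("object_type") == "np",
        action=lambda ctx: dict(ctx),
    ),
]

clause_result = apply_transfer(RULES, "know", {"object_type": "clause"})
np_result = apply_transfer(RULES, "know", {"object_type": "np"})
```

The condition here reads only one syntactic feature; deciding which information types conditions may access is exactly the open question listed on the slide.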

  13. Modularity in MILE • Modularity in the specific microstructure of the MILE word-sense • Word-senses are the basic units at the multilingual level • Senses should also have a modular structure • Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1) • Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)

  14. MILE (overview diagram) • A. MILE Macrostructure: Meta-information; Architecture • B. MILE Microstructure: 1. Monolingual 2. Collocational 3. Multilingual • C. Word-Sense Microstructure: 1. Coarse-grained 2. Fine-grained
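
The three levels of modularity in the overview can be pictured as nested record types. This is a rough sketch under stated assumptions, not the MILE DTD: the class names, field names and the placement of senses inside an entry are all hypothetical, and every module defaults to empty so that an entry may remain underspecified.

```python
from dataclasses import dataclass, field


@dataclass
class WordSense:
    """C. Word-sense microstructure (hypothetical fields)."""
    sense_id: str
    coarse: dict = field(default_factory=dict)  # general-purpose, prototypical properties
    fine: list = field(default_factory=list)    # collocational/syntagmatic properties


@dataclass
class MileEntry:
    """B. MILE microstructure: three modules, each optional."""
    lemma: str
    language: str
    monolingual: dict = field(default_factory=dict)    # morphology, syntax, semantics
    collocational: list = field(default_factory=list)  # typical patterns and collocates
    multilingual: list = field(default_factory=list)   # transfer conditions and actions
    senses: list = field(default_factory=list)


@dataclass
class MileLexicon:
    """A. MILE macrostructure: meta-information plus the entries."""
    meta: dict
    entries: list = field(default_factory=list)

    def lookup(self, lemma: str, language: str) -> list:
        """Return all entries for a lemma in a given language."""
        return [e for e in self.entries
                if e.lemma == lemma and e.language == language]


# Illustrative content only.
entry = MileEntry(lemma="bank", language="en", senses=[WordSense("bank_1")])
lexicon = MileLexicon(meta={"version": "0.1", "languages": ["en", "it"]},
                      entries=[entry])
found = lexicon.lookup("bank", "en")
```

Keeping each module as a separate field is one way to let existing monolingual resources be loaded first and the collocational and multilingual modules added later.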

  15. Monolingual Linguistic Representation A strategy: • consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models) • evaluate their notions wrt EAGLES recommendations for syntax and semantics • evaluate their usefulness & adequacy for multilingual tasks • evaluate integrability of their notions in a unitary MILE • look for deficient areas. To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?

  16. Collocational Information • Open issues: • what is relevant • what can be generalised and formally characterised • what must be simply listed (but even lists may be partially categorised) • what type of representation and analysis to be provided of these phenomena (e.g. a Mel'cuk style analysis for support verb constructions, FrameNet style description of syntactic-semantic “constructions”, etc.)

  17. Agreed Principles • MILE incorporates previous recommendations: it is the “complete” entry (to be evaluated wrt usefulness & adequacy for multilingual tasks) • MILE builds on the monolingual entry & expands it (at least) with an additional module where correspondences between languages are defined. We consider 2 broad categories of applications: • translation • CLIR (linking module may be simpler) (label info types wrt application)

  18. Paths to discover Basic Notions of MILE • Clues in dictionaries to decide on target equivalent • Guidelines for lexicographers • Clues (to disambiguate/translate) in corpus concordances • Lexical requirements from various types of transfer conditions and actions in MT systems • Lexical requirements from interlingua-based systems • Examined guidelines for bilingual dictionaries provided by SA

  19. Classification of Basic Notions of MILE • For all the notions: • notion already in previous work (Eagles/ Parole/ Simple/ EWN/ Comlex/ Framenet/…) • evaluate if the existing specs are adequate • draw a list of “not yet recommended/adopted” notions: • method of work • priorities • for which applications • assign tasks • need of further development
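
The classification step can be read as simple bookkeeping over notion inventories. The sketch below assumes each existing model is reduced to a set of notion labels; the function name `classify_notions`, the source names `lexicon_A` and `lexicon_B`, and every notion assigned to them are invented placeholders, not statements about what any real lexicon contains.

```python
def classify_notions(inventories, candidates):
    """Split notions into those found in earlier work and those not yet adopted.

    inventories: mapping from source name to the set of notions it represents
    candidates:  notions proposed for the MILE
    """
    covered = {}
    for source, notions in inventories.items():
        for notion in notions:
            covered.setdefault(notion, set()).add(source)

    # Notions already in previous work, each with the sources that contain it.
    already = {notion: sorted(covered[notion]) for notion in sorted(covered)}
    # Candidate notions that no existing source provides.
    not_yet = sorted(n for n in candidates if n not in covered)
    return already, not_yet


# Placeholder inventories, for illustration only.
inventories = {
    "lexicon_A": {"subcategorization frame", "semantic role"},
    "lexicon_B": {"semantic role", "hyponymy"},
}
candidates = {"semantic role", "transfer condition", "sense indicator"}

already, not_yet = classify_notions(inventories, candidates)
```

The `already` mapping is the list whose existing specifications still need to be evaluated for adequacy; `not_yet` is the list that needs a method of work, priorities and task assignment.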

  20. Organisational Proposal • Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple • adopt as starting point the P/S DTD, to be revised & augmented • see Barcelona tool • Evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches

  21. Organisational Proposal The tasks should lead to: • Select a list of critical information types that will compose each module of the MILE • Start an in-depth analysis of each of these areas aiming at identifying: • The most stable solutions adopted in the community • Linguistic specifications and criteria • Possible representational solutions, their compatibility, etc. • An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations) • Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet) • …..

  22. Information Types Semantic relations • Typology (e.g. hyponymy, meronymy, etc.) • Available tests • Representational format(s) • Applicative constraints and needs • Expressive limits • Open issues Argument structure • How to represent it (e.g. frames, a selection of theta-roles, etc.) • Typology of arguments • Representational problems • Applicative constraints and needs • Linking with syntax (how to express it) • Open issues

  23. Information Types Selectional preferences • How to represent them (e.g. features, reference to an ontology, word-senses, etc.) • Different status of the preferences • Criteria to identify them • Expressive limits of existing formal resources Multiword Expressions • Typology • How to represent the “internal” structure of MWEs (e.g. Mel’cuk relations, etc.) • Encoding criteria • Application needs and biases • Open issues Modification relations • Types of modifiers • Representational issues • Open issues

  24. Information Types Collocational Patterns • Typology • How to represent them • Interaction with selectional preferences Ontology • Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.) • Inheritance • Which roles for ontologies in the MILE • Representational issues • Customisation and development criteria • Limits Transfer conditions and actions • Identification of categories of transfer phenomena • Ranking of hard cases • Possible parameterisation wrt language types • How to formalise them • Types of actions

  25. Organisational Proposal • Highlighted some hot issues & assigned tasks: • sense indicators (Issco) • selectional preferences (Thurmair) • argument structure (US?….) • MWE (Pisa) • modifiers (Jock) • semantic relations (Piek?) • transfer conditions (…) • collocational patterns (…) • ontology (…) • ….

  26. Organisational Proposal • Ask the Americans, e.g., to: • evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy,… • analyse some of the above info types • look at other languages (Japanese, Chinese, Korean, …) for transfer conditions • look at transfer-based MT systems • look at interlingua MT systems (e.g. Mikrokosmos): additional info types? • … Joint US & EU meeting, e.g. end of February or beginning of March?

  27. DIET Tool • From ISSCO: • for text annotation (of test suites for semantic annotation) • to be used for evaluation purposes • …. • … • ...

  28. Survey: List of Received Materials

  29. Other Surveys Expected • Surveys from US? • Microsoft • IBM • CMU • NMSU • ISI • Systran • Logos
