
Wordnet, EuroWordNet, Global Wordnet

Wordnet, EuroWordNet, Global Wordnet. Piek Vossen Piek.Vossen@irion.nl http://www.globalwordnet.org. Overview. Princeton WordNet (1980 - ongoing) EuroWordNet (1996 - 1999) The database design The general building strategy Towards a universal index of meaning


Presentation Transcript


  1. Wordnet, EuroWordNet, Global Wordnet Piek Vossen Piek.Vossen@irion.nl http://www.globalwordnet.org

  2. Overview • Princeton WordNet (1980 - ongoing) • EuroWordNet (1996 - 1999) • The database design • The general building strategy • Towards a universal index of meaning • Global WordNet Association (2001 - ongoing) • Other wordnets • BalkaNet (2001 - 2004) • IndoWordnet (2002 - ongoing) • Meaning (2002 - 2005)

  3. WordNet1.5 • Developed at Princeton by George Miller and his team as a model of the mental lexicon. • Semantic network in which concepts are defined in terms of relations to other concepts. • Structure: • organized around the notion of synsets (sets of synonymous words) • basic semantic relations between these synsets • Initially no glosses • Main revision after tagging the Brown corpus with word meanings: SemCor. • http://www.cogsci.princeton.edu/~wn/w3wn.html

  4. Structure of WordNet1.5

  5. EuroWordNet • The development of a multilingual database with wordnets for several European languages • Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328 • March 1996 - September 1999 • 2.5 Million EURO. • URL: http://www.hum.uva.nl/~ewn

  6. Objectives of EuroWordNet • Languages covered: • EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian • EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian. • Size of vocabulary: • EuroWordNet-1: 30,000 concepts - 50,000 word meanings. • EuroWordNet-2: 15,000 concepts - 25,000 word meanings. • Type of vocabulary: • the most frequent words of the languages • all concepts needed to relate more specific concepts

  7. Consortium

  8. The basic principles of EuroWordNet • the structure of the Princeton WordNet • the design of the EuroWordNet database • wordnets as language-specific structures • the language-internal relations • the multilingual relations

  9. Specific features of EuroWordNet • it contains semantic lexicons for languages other than English. • each wordnet reflects the relations as a language-internal system, maintaining cultural and linguistic differences in the wordnets. • it contains multilingual relations from each wordnet to English meanings, which makes it possible to compare the wordnets, tracking down inconsistencies and cross-linguistic differences. • each wordnet is linked to a language-independent top-ontology and to domain labels.

  10. Autonomous & Language-Specific [diagram] • WordNet1.5: object; artifact, artefact (a man-made object); natural object (an object occurring naturally); block; instrumentality; body; box; spoon; bag; device; implement; container; tool; instrument • Dutch Wordnet: voorwerp {object}; blok {block}; lichaam {body}; werktuig {tool}; bak {box}; lepel {spoon}; tas {bag}

  11. Differences in structure • Artificial Classes versus Lexicalized Classes: • instrumentality; natural object • Lexicalization differences of classes: • container and artifact (object) are not lexicalized in Dutch • What is the purpose of different hierarchies? • Should we include all lexicalized classes from all (8) languages?

  12. Linguistic versus Conceptual Ontologies • Conceptual ontology: • A particular level or structuring may be required to achieve a better control or performance, or a more compact and coherent structure. • introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool), • neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise). • What properties can we infer for spoons? • spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

  13. Linguistic versus Conceptual Ontologies • Linguistic ontology: • Exactly reflects the relations between all the lexicalized words and expressions in a language. • It therefore captures valuable information about the lexical capacity of languages: the available fund of words and expressions in a language. • What words can be used to name spoons? • spoon -> object, tableware, silverware, merchandise, cutlery

  14. Separate Wordnets and Ontologies [diagram] • Language-Specific Wordnets: • WordNet1.5: object; container; box • Dutch Wordnet: voorwerp; doos • Language-Neutral Ontology: • ReferenceOntologyClasses: BOX; ContainerProduct; SolidTangibleThing • EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole

  15. Wordnets versus ontologies Wordnets: autonomous language-specific lexicalization patterns in a relational network. Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense-disambiguation. Ontologies: data structure with formally defined concepts. Usage: making semantic inferences.

  16. Wordnets as Linguistic Ontologies • Classical Substitution Principle: • Any word that is used to refer to something can be replaced by its synonyms, hyperonyms and hyponyms: horse -> stallion, mare, pony, mammal, animal, being. • It cannot be referred to by co-hyponyms and co-hyponyms of its hyperonyms: horse -/-> cat, dog, camel, fish, plant, person, object. • Conceptual Distance Measurement: • The number of hierarchical nodes between words is a measurement of closeness, where the level and the local density of nodes are additional factors.
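
The substitution principle and the node-counting distance on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HYPERONYM` table is a toy hierarchy invented here, synonyms are not modelled, and `distance` counts links only, ignoring the level and local-density factors the slide mentions.

```python
from typing import Dict, List, Optional

# Toy hyperonym links (child -> parent); invented for illustration only.
HYPERONYM: Dict[str, str] = {
    "stallion": "horse", "mare": "horse", "pony": "horse",
    "horse": "mammal", "cat": "mammal", "dog": "mammal",
    "mammal": "animal", "fish": "animal",
    "animal": "being", "plant": "being",
}

def ancestors(word: str) -> List[str]:
    """Return the hyperonym chain of `word`, nearest first."""
    chain = []
    while word in HYPERONYM:
        word = HYPERONYM[word]
        chain.append(word)
    return chain

def can_substitute(word: str, other: str) -> bool:
    """True if `other` is a hyperonym or a hyponym of `word`."""
    return other in ancestors(word) or word in ancestors(other)

def distance(a: str, b: str) -> Optional[int]:
    """Count the links on the path through the lowest shared hyperonym."""
    path_a = [a] + ancestors(a)
    path_b = [b] + ancestors(b)
    for steps_up, node in enumerate(path_a):
        if node in path_b:
            return steps_up + path_b.index(node)
    return None  # no shared hyperonym in the toy hierarchy
```

With this toy data, `can_substitute("horse", "pony")` holds while `can_substitute("horse", "cat")` does not, because cat is a co-hyponym of horse rather than a node on its vertical chain.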

  17. Linguistic Principles for deriving relations • 1. Substitution tests (Cruse 1986): • 1 a. It is a fiddle therefore it is a violin. • b It is a violin therefore it is a fiddle. • 2 a. It is a dog therefore it is an animal. • b *It is an animal therefore it is a dog. • 3 a to kill (/a murder) causes to die (/ death) • to kill (/a murder) has to die (/ death) as a consequence • b *to die / death causes to kill • *to die / death has to kill as a consequence

  18. Linguistic Principles for deriving relations • 2. Principle of Economy (Dik 1978): • If a word W1 (animal) is the hyperonym of W2 (mammal) and W2 is the hyperonym of W3 (dog) then W3 (dog) should not be linked to W1 (animal) but to W2 (mammal). • 3. Principle of Compatibility • If a word W1 is related to W2 via relation R1, W1 and W2 cannot be related via relation Rn, where Rn is defined as a distinct relation from R1.
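
The Principle of Economy amounts to removing direct hyperonym links that are already implied by a longer chain. A minimal sketch, assuming the links form an acyclic graph stored as a dictionary of parent sets (the function names and the data layout are hypothetical, not part of EuroWordNet):

```python
from typing import Dict, Set

def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All hyperonyms reachable from `start` by following one or more links."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen

def apply_economy(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop a direct link W3 -> W1 when W1 is reachable via another hyperonym W2."""
    reduced = {}
    for word, parents in graph.items():
        redundant = {
            p for p in parents
            if any(p in reachable(graph, q) for q in parents if q != p)
        }
        reduced[word] = parents - redundant
    return reduced

# The slide's example: dog should link to mammal, not directly to animal.
links = {"dog": {"mammal", "animal"}, "mammal": {"animal"}, "animal": set()}
economical = apply_economy(links)
```

Here `economical["dog"]` keeps only the link to mammal, while the link from mammal to animal is untouched.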

  19. Architecture of the EuroWordNet Data Base [diagram] • Domains: Traffic; Air; Road • Ontology: 2OrderEntity; Location; Dynamic • Inter-Lingual-Index: ILI-record {drive} • Lexical Items Tables (one per language): bewegen, gaan, rijden, berijden; move, go, ride, drive; mover, transitar, conducir, cabalgar, jinetear; andare, muoversi, cavalcare, guidare • Link types: I = Language Independent link; II = Link from Language Specific to Inter-Lingual-Index; III = Language Dependent Link

  20. The mono-lingual design of EuroWordNet

  21. Language Internal Relations • WN 1.5 starting point • The ‘synset’ as a weak notion of synonymy: “two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.” (Miller et al. 1993) • Relations between synsets (relation: POS-combination, example): • ANTONYMY: adjective-to-adjective; verb-to-verb (open/close) • HYPONYMY: noun-to-noun (car/vehicle); verb-to-verb (walk/move) • MERONYMY: noun-to-noun (head/nose) • ENTAILMENT: verb-to-verb (buy/pay) • CAUSE: verb-to-verb (kill/die)

  22. Differences EuroWordNet/WordNet1.5 • Added Features to relations • Cross-Part-Of-Speech relations • New relations to differentiate shallow hierarchies • New interpretations of relations

  23. EWN Relationship Labels • Disjunction/Conjunction of multiple relations of the same type • WordNet1.5 • door1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left") PART OF: doorway, door, entree, entry, portal, room access • door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car") PART OF: car, auto, automobile, machine, motorcar.

  24. EWN Relationship Labels • {airplane} • HAS_MERO_PART: conj1 {door} • HAS_MERO_PART: conj2 disj1 {jet engine} • HAS_MERO_PART: conj2 disj2 {propeller} • {door} • HAS_HOLO_PART: disj1 {car} • HAS_HOLO_PART: disj2 {room} • HAS_HOLO_PART: disj3 {entrance} • {dog} • HAS_HYPERONYM: conj1 {mammal} • HAS_HYPERONYM: conj2 {pet} • {albino} • HAS_HYPERONYM: disj1 {plant} • HAS_HYPERONYM: disj2 {animal} • Default Interpretation: non-exclusive disjunction

  25. EWN Relationship Labels • Disjunction/Conjunction of multiple relations of the same type • {dog} • HAS_HYPONYM: disj1 {poodle} • HAS_HYPONYM: disj1 {labrador} • HAS_HYPONYM: {sheep dog} (Orthogonal) • HAS_HYPONYM: {watch dog} (Orthogonal) • Default Interpretation: non-exclusive disjunction
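
One way to picture the conjunction and disjunction labels is as fields on a relation record. The sketch below is an assumption-laden reading, not the EuroWordNet database format: the `Relation` class, its field names, and the interpretation "all conj groups must hold, any alternative inside a group suffices" are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Relation:
    source: str
    rel_type: str
    target: str
    conj: Optional[int] = None  # conjunction index, e.g. conj2
    disj: Optional[int] = None  # disjunction index, e.g. disj1

# The {airplane} example from the slides.
AIRPLANE = [
    Relation("airplane", "HAS_MERO_PART", "door", conj=1),
    Relation("airplane", "HAS_MERO_PART", "jet engine", conj=2, disj=1),
    Relation("airplane", "HAS_MERO_PART", "propeller", conj=2, disj=2),
]

def satisfied(relations: List[Relation], present: Set[str]) -> bool:
    """Every conj group must be met; inside a group any one target is enough.

    Unlabelled relations share the group `None`, which gives the default
    non-exclusive disjunction.
    """
    groups = defaultdict(list)
    for rel in relations:
        groups[rel.conj].append(rel.target)
    return all(any(t in present for t in targets) for targets in groups.values())
```

Under this reading an airplane needs a door and either a jet engine or a propeller, and having both engine types is also accepted because the disjunction is non-exclusive.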

  26. EWN Relationship Labels • Factive/Non-factive CAUSES (Lyons 1977) • factive (default interpretation): • “to kill causes to die”: • {kill} CAUSES {die} • non-factive: E1 probably or likely causes event E2 or E1 is intended to cause some event E2: • “to search may cause to find”. • {search} CAUSES {find} non-factive

  27. EWN Relationship Labels • Reversed: in the database every relation must have a reverse counterpart, but there is a difference between relations which are explicitly coded as reverse and automatically reversed relations: • {finger} HAS_HOLONYM {hand} • {hand} HAS_MERONYM {finger} • {paper-clip} HAS_MER_MADE_OF {metal} • {metal} HAS_HOL_MADE_OF {paper-clip} reversed • Negation: • {monkey} HAS_MERO_PART {tail} • {ape} HAS_MERO_PART {tail} not
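
The difference between explicitly coded and automatically reversed relations can be sketched as a pass that adds any missing counterpart and marks it as reversed. The `Link` record, the `INVERSE` table and `add_reverses` are hypothetical names for this illustration; only the relation names come from the slides.

```python
from typing import Dict, List, NamedTuple

class Link(NamedTuple):
    source: str
    relation: str
    target: str
    is_reversed: bool = False  # True only for automatically generated links

# Inverse pairs, using relation names that appear on the slides.
_PAIRS = [
    ("HAS_HOLONYM", "HAS_MERONYM"),
    ("HAS_MER_MADE_OF", "HAS_HOL_MADE_OF"),
    ("HAS_HYPERONYM", "HAS_HYPONYM"),
]
INVERSE: Dict[str, str] = {}
for first, second in _PAIRS:
    INVERSE[first] = second
    INVERSE[second] = first

def add_reverses(links: List[Link]) -> List[Link]:
    """Append a 'reversed' counterpart for every link that lacks a coded one."""
    existing = {(l.source, l.relation, l.target) for l in links}
    result = list(links)
    for link in links:
        key = (link.target, INVERSE[link.relation], link.source)
        if key not in existing:
            result.append(Link(*key, is_reversed=True))
            existing.add(key)
    return result

coded = [
    Link("finger", "HAS_HOLONYM", "hand"),
    Link("hand", "HAS_MERONYM", "finger"),      # explicitly coded in both directions
    Link("paper-clip", "HAS_MER_MADE_OF", "metal"),
]
completed = add_reverses(coded)
```

Only the paper-clip link gains a counterpart here, and that counterpart carries the reversed flag, mirroring the slide's example.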

  28. Cross-Part-Of-Speech relations • WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy: • adornment 2 change of state-- (the act of changing something) • adorn 1 change, alter-- (cause to change; make different) • EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations: • {adorn V} XPOS_NEAR_SYNONYM {adornment N}

  29. Cross-Part-Of-Speech relations The advantages of such explicit cross-part-of-speech relations are: • similar words with different parts of speech are grouped together. • the same information can be coded in an NP or in a sentence. By unifying higher-order nouns and verbs in the same ontology it will be possible to match expressions with very different syntactic structures but comparable content • by merging verbs and abstract nouns we can more easily link mismatches across languages that involve a part-of-speech shift. Dutch nouns such as “afsluiting”, “gehuil” are translated with the English verbs “close” and “cry”, respectively.

  30. Entailment in WordNet • WordNet1.5: Entailment indicates the direction of the implication or entailment: • a. + Temporal Inclusion (the two situations partially or totally overlap) • a.1 co-extensiveness (e.g., to limp/to walk): hyponymy/troponymy • a.2 proper inclusion (e.g., to snore/to sleep): entailment • b. - Temporal Exclusion (the two situations are temporally disjoint) • b.1 backward presupposition (e.g., to succeed/to try): entailment • b.2 cause (e.g., to give/to have)

  31. Subevents in EuroWordNet • EuroWordNet: direction of the entailment is expressed by the labels factive and reversed: • {to succeed} is_caused_by {to try} factive • {to try} causes {to succeed} non-factive • Proper inclusion is described by the has_subevent/is_subevent_of relation in combination with the label reversed: • {to snore} is_subevent_of {to sleep} • {to sleep} has_subevent {to snore} reversed • {to buy} has_subevent {to pay} • {to pay} is_subevent_of {to buy} reversed

  32. The interpretation of the CAUSE relation • WordNet1.5: The causal relation only holds between verbs and it should only apply to temporally disjoint situations. • EuroWordNet: the causal relation will also be applied across different parts of speech: • {to kill} V causes {death} N • {death} N is_caused_by {to kill} V reversed • {to kill} V causes {dead} A • {dead} A is_caused_by {to kill} V reversed • {murder} N causes {death} N • {death} N is_caused_by {murder} N reversed

  33. The interpretation of the CAUSE relation • Various temporal relationships between the (dynamic/non-dynamic) situations may hold: • Temporally disjoint: there is no time point when dS1 takes place and also S2 (which is caused by dS1) (e.g. to shoot/to hit); • Temporally overlapping: there is at least one time point when both dS1 and S2 take place, and there is at least one time point when dS1 takes place and S2 (which is caused by dS1) does not yet take place (e.g. to teach/to learn); • Temporally co-extensive: whenever dS1 takes place also S2 (which is caused by dS1) takes place and there is no time point when dS1 takes place and S2 does not take place, and vice versa (e.g. to feed/to eat).
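
The three temporal relationships can be made concrete by modelling each situation as a set of discrete time points. This is a simplified sketch: the function name and the set-based model are assumptions, and "S2 does not yet take place" is approximated as "some point of dS1 lies outside S2" without checking temporal order.

```python
from typing import Set

def temporal_relation(ds1: Set[int], s2: Set[int]) -> str:
    """Classify how a causing situation dS1 and its result S2 relate in time."""
    if not ds1 & s2:
        return "disjoint"       # no time point shared (to shoot / to hit)
    if ds1 == s2:
        return "co-extensive"   # always together (to feed / to eat)
    if ds1 - s2:
        return "overlapping"    # shared points, plus dS1 without S2 (to teach / to learn)
    return "other"              # dS1 lies strictly inside S2; not covered by the slide
```

For example, `temporal_relation({1, 2, 3}, {2, 3, 4})` returns `"overlapping"`.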

  34. Role relations • In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the involved participants. These relations are expressed as follows: • {hammer} ROLE_INSTRUMENT {to hammer} • {to hammer} INVOLVED_INSTRUMENT {hammer} reversed • {school} ROLE_LOCATION {to teach} • {to teach} INVOLVED_LOCATION {school} reversed • These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.

  35. Co_Role relations • guitar player: HAS_HYPERONYM player; CO_AGENT_INSTRUMENT guitar • player: HAS_HYPERONYM person; ROLE_AGENT to play music; CO_AGENT_INSTRUMENT musical instrument • to play music: HAS_HYPERONYM to make; ROLE_INSTRUMENT musical instrument • guitar: HAS_HYPERONYM musical instrument; CO_INSTRUMENT_AGENT guitar player • ice saw: HAS_HYPERONYM saw; CO_INSTRUMENT_PATIENT ice • saw: HAS_HYPERONYM saw; ROLE_INSTRUMENT to saw • ice: CO_PATIENT_INSTRUMENT ice saw REVERSED

  36. Co_Role relations • Examples of the other relations are: • criminal CO_AGENT_PATIENT victim • novel writer/poet CO_AGENT_RESULT novel/poem • dough CO_PATIENT_RESULT pastry/bread • photographic camera CO_INSTRUMENT_RESULT photo

  37. BE_IN_STATE and STATE_OF • Example: the poor are the ones to whom the state poor applies • Effect: • poor N HAS_HYPERONYM person N • poor N BE_IN_STATE poor A • poor A STATE_OF poor N reversed • IN_MANNER and MANNER_OF • Example: to slurp is to eat in a noisy manner • Effect: • slurp V HAS_HYPERONYM eat V • slurp V IN_MANNER noisily Adverb • noisily Adverb MANNER_OF slurp V reversed

  38. Overview of the Language Internal relations in EuroWordnet • Same Part of Speech relations: • NEAR_SYNONYMY apparatus - machine • HYPERONYMY/HYPONYMY car - vehicle • ANTONYMY open - close • HOLONYMY/MERONYMY head - nose • Cross-Part-of-Speech relations: • XPOS_NEAR_SYNONYMY dead - death; to adorn - adornment • XPOS_HYPERONYMY/HYPONYMY to love - emotion • XPOS_ANTONYMY to live - dead • CAUSE die - death • SUBEVENT buy - pay; sleep - snore • ROLE/INVOLVED write - pencil; hammer - hammer • STATE the poor - poor • MANNER to slurp - noisily • BELONG_TO_CLASS Rome - city

  39. Thematic networks [diagram] • Nodes: organisme (organism); wezen (being); persoon (person); arts (doctor); zieke (sick person, patient); ziekte (disease); maagaandoening (stomach disease); orgaan (organ); maag (stomach); genezen (to get well); behandelen (treat); opereren (operate); scalpel • Link labels: Causes; Patient; Agent; Instrument; Part of; Involves

  40. The multi-lingual design of EuroWordNet

  41. The Multilingual Design • Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages; • Index-records are mainly based on WordNet1.5 synsets and consist of synonyms, glosses and source references; • Various types of complex equivalence relations are distinguished; • Equivalence relations from synsets to index records: not on a word-to-word basis; • Indirect matching of synsets linked to the same index items;
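
Indirect matching through the Inter-Lingual-Index can be sketched as a lookup: two synsets from different wordnets are equivalent when they point to the same index record. The link data, the `ili:` identifiers and the function names below are invented for illustration; only the idea of matching via shared index items comes from the slide.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (language, synset, ILI record); toy equivalence links, not real EuroWordNet data.
EQ_SYNONYM_LINKS: List[Tuple[str, str, str]] = [
    ("en", "drive", "ili:drive"),
    ("nl", "rijden", "ili:drive"),
    ("es", "conducir", "ili:drive"),
    ("it", "guidare", "ili:drive"),
    ("en", "spoon", "ili:spoon"),
    ("nl", "lepel", "ili:spoon"),
]

def build_index(links: List[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group (language, synset) pairs by the ILI record they link to."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for lang, synset, ili in links:
        index[ili].add((lang, synset))
    return index

def equivalents(links: List[Tuple[str, str, str]], lang: str,
                synset: str, target_lang: str) -> List[str]:
    """Find target-language synsets that share an ILI record with `synset`."""
    index = build_index(links)
    records = [ili for l, s, ili in links if (l, s) == (lang, synset)]
    found = {s for ili in records for (l, s) in index[ili] if l == target_lang}
    return sorted(found)
```

No wordnet stores a direct Dutch-to-Spanish link in this design; `equivalents(EQ_SYNONYM_LINKS, "nl", "rijden", "es")` finds `conducir` only because both synsets point to the same record.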

  42. EWN Interlingual Relations • EQ_SYNONYM: there is a direct match between a synset and an ILI-record. • EQ_NEAR_SYNONYM: a synset matches multiple ILI-records simultaneously. • HAS_EQ_HYPERONYM: a synset is more specific than any available ILI-record. • HAS_EQ_HYPONYM: a synset can only be linked to more specific ILI-records. • other relations: CAUSES/IS_CAUSED_BY, EQ_SUBEVENT/EQ_ROLE, EQ_IS_STATE_OF/EQ_BE_IN_STATE

  43. Equivalent Near Synonym • 1. Multiple Targets • One sense for Dutch schoonmaken (to clean) which simultaneously matches with at least 4 senses of clean in WordNet1.5: • {make clean by removing dirt, filth, or unwanted substances from} • {remove unwanted substances from, such as feathers or pits, as of chickens or fruit} • {remove in making clean; "Clean the spots off the rug"} • {remove unwanted substances from - (as in chemistry)} • The Dutch synset schoonmaken will thus be linked with an eq_near_synonym relation to all these senses of clean.

  44. Equivalent Near Synonym • 2. Multiple Source meanings • Synsets inter-linked by a near_synonym relation can be linked to the same target ILI-record(s), either with an eq_synonym or an eq_near_synonym relation: • Dutch wordnet: • toestel near_synonym apparaat • ILI-records: {machine}; {device}; {apparatus}; {tool}

  45. Equivalent Hyponymy • has_eq_hyperonym: typically used for gaps in WordNet1.5 or in English: • genuine, cultural gaps for things not known in English culture, e.g. citroenjenever, which is a kind of gin made out of lemon skin, • pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English, e.g.: Dutch hoofd only refers to human head and Dutch kop only refers to animal head, while English uses head for both. • has_eq_hyponym: used when WordNet1.5 only provides narrower terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g.: Spanish dedo can be used to refer to both finger and toe.

  46. Complex mappings across languages [diagram] • Link types: normal equivalence; eq_has_hyponym; eq_has_hyperonym • GB-Net: toe; finger; head • IT-Net: dito • ES-Net: dedo • NL-Net: hoofd; kop • ILI-records: {toe: part of foot}; {finger: part of hand}; {dedo, dito: finger or toe}; {head: part of body}; {hoofd: human head}; {kop: animal head}

  47. The methodologies for building wordnets

  48. Overall Building Process [flowchart] • Inputs: Machine Readable Dictionaries; Wordnets, Taxonomies, Corpora; loaded in local databases • Specification of selection criteria • Subset of word meanings • Encoding of language internal and equivalence relations • Wordnet fragment with links to WordNet1.5 in local database • Improve and extend the wordnet fragments; adjust coverage, improve encoding • Load wordnet in the EuroWordNet Database • Wordnet fragment in EuroWordNet database • Comparing and restructuring the wordnet • Verification by users; Demonstration in Information Retrieval • Verification Report • Stage labels in the diagram: Ia, Ib, Ic, II, III

  49. Main Methods • Expand approach: translate WordNet1.5 synsets to another language and take over the structure • easier and more efficient method • compatible structure with WordNet1.5 • structure is close to WordNet1.5 but also biased by it • Merge approach: create an independent wordnet in another language and align the separate hierarchies by generating the appropriate translations • more complex and labour intensive • different structure from WordNet1.5 • language-specific patterns can be maintained
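
A rough sketch of the expand approach: take the source hierarchy, translate each synset, and carry the hyperonym links over. The data and the function are hypothetical; in particular, re-attaching a word to its nearest translated ancestor when an intermediate class has no translation is an assumption of this sketch, not a rule stated in the slides.

```python
from typing import Dict

# Toy English hyperonym links (child -> parent) and English-to-Dutch translations.
# "container" has no entry, echoing the earlier remark that it is not lexicalized in Dutch.
EN_HYPERONYM: Dict[str, str] = {"box": "container", "bag": "container", "container": "object"}
EN_TO_NL: Dict[str, str] = {"box": "bak", "bag": "tas", "object": "voorwerp"}

def expand(source: Dict[str, str], translations: Dict[str, str]) -> Dict[str, str]:
    """Copy hyperonym links into the target language, skipping untranslated levels."""
    target: Dict[str, str] = {}
    for child in source:
        if child not in translations:
            continue
        parent = source.get(child)
        # Climb until an ancestor with a translation is found.
        while parent is not None and parent not in translations:
            parent = source.get(parent)
        if parent is not None:
            target[translations[child]] = translations[parent]
    return target

nl_fragment = expand(EN_HYPERONYM, EN_TO_NL)
```

The resulting fragment attaches bak and tas directly to voorwerp. The shape of the output is still dictated by the source hierarchy, which is the bias the slide warns about.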

  50. Methods for extracting language-internal relations • editors and database for manually encoding relations; • comparison with WordNet1.5 structure; • definition patterns in monolingual dictionaries; • co-occurrences in corpora; • morphology; • bilingual dictionaries; • lexical semantic substitution tests
