
WordNets and TEI-LEX

John P. McCrae, Thierry Declerck


Presentation Transcript


  1. WordNets and TEI-LEX John P. McCrae, Thierry Declerck

  2. Global WordNet Grid

  3. [Timeline figure] Labels: EuroWordNet, BalkaNet, 3.0, Princeton WordNet, Multi WordNet, Asian WordNet, Indo WordNet, Open Multilingual WordNet, Open WordNet PT. Years marked: 1990, 1998, 2002, 2009, 2012.

  4. Problems
     - Different relations used
     - Different definitions of synonymy (tight, loose)
     - Different interpretations of relations
     - Anglo-Saxon view of Princeton WordNet
     - Updates to Princeton WordNet break all other wordnets
     - Different coverage: “25 synsets shared from 117,677 (0%)” - Open Multilingual WordNet (Bond)

  5. Solution: CILI - Collaborative Interlingual Index
     - All open source wordnets linked to a single ILI
     - Merge of concepts across all languages
     - Princeton WordNet versions linked to ILI
     - Adaptable by the open source wordnet community
     - Available as LOD and downloadable with an open source license: CC-BY, CC-BY-SA

  6. How to define a concept
     - A unique, permanent URI
     - A proper (!) English gloss
     - Linked to at least one other synset

  7. Cross-lingual Mapping [diagram; words shown: Teacher, Professore, Professoressa, Lehrer, Lehrerin]

  8. Interlingual Mapping [diagram; words shown: Teacher, Professore, Professoressa, Lehrer, Lehrerin; identifiers shown: i123456, i123457]
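
The contrast between slides 7 and 8 can be sketched in Python. This is only an illustration: the transcript does not preserve which word is attached to which identifier, so the assignments in `ILI_LINKS` are hypothetical, and `translate` and `pairwise_mapping_count` are names invented for this sketch.

```python
from typing import Dict, List, Set

# Hypothetical links from words to ILI identifiers. The identifiers and words
# come from the slide; which word points to which identifier is an assumption.
ILI_LINKS: Dict[str, Dict[str, Set[str]]] = {
    "en": {"Teacher": {"i123456", "i123457"}},
    "it": {"Professore": {"i123456"}, "Professoressa": {"i123457"}},
    "de": {"Lehrer": {"i123456"}, "Lehrerin": {"i123457"}},
}


def translate(word: str, source: str, target: str) -> List[str]:
    """Find target-language words that share an ILI concept with `word`."""
    concepts = ILI_LINKS[source].get(word, set())
    return sorted(
        candidate
        for candidate, ilis in ILI_LINKS[target].items()
        if ilis & concepts
    )


def pairwise_mapping_count(languages: int) -> int:
    """Number of direct language-to-language mappings needed without a pivot."""
    return languages * (languages - 1) // 2


italian_to_german = translate("Professoressa", "it", "de")
english_to_italian = translate("Teacher", "en", "it")
```

With a shared index, each wordnet needs only its own links to the ILI, so ten wordnets need ten link sets rather than the `pairwise_mapping_count(10)` direct mappings.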

  9. WordNet LMF, JSON and RDF
     - WordNet LMF is an ‘LMF-like’ XML format
     - Converter/validator: http://server1.nlp.insight-centre.org/gwn-converter/
     - WordNet JSON is a JSON-LD based representation
     - A profile of OntoLex
     - Diagram labels: Isomorphic, RDF/XML, Turtle, SPARQL

  10. The GWN WordNet Formats

  11. GlobalWordNet Structure
      LexicalResource
        Lexicon+
          LexicalEntry+
            Lemma
            Form*
            Sense*
              SenseRelation*
          Synset*
            Definition*
            SynsetRelation*
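
The structure on this slide could be mirrored in Python as nested dataclasses. This is a minimal sketch: the class names follow the element names on the slide, while the field names and the nesting (read from the order of the elements and from the XML on the following slides) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SenseRelation:
    rel_type: str
    target: str  # ID of another Sense


@dataclass
class Sense:
    id: str
    synset: str  # ID of a Synset
    relations: List[SenseRelation] = field(default_factory=list)


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    part_of_speech: str
    forms: List[str] = field(default_factory=list)   # Form*
    senses: List[Sense] = field(default_factory=list)  # Sense*


@dataclass
class SynsetRelation:
    rel_type: str
    target: str  # ID of another Synset


@dataclass
class Synset:
    id: str
    part_of_speech: str
    ili: Optional[str] = None
    definitions: List[str] = field(default_factory=list)
    relations: List[SynsetRelation] = field(default_factory=list)


@dataclass
class Lexicon:
    id: str
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)  # LexicalEntry+
    synsets: List[Synset] = field(default_factory=list)        # Synset*


@dataclass
class LexicalResource:
    lexicons: List[Lexicon] = field(default_factory=list)      # Lexicon+


# Values taken from the example XML later in the slides.
entry = LexicalEntry(
    id="w1",
    lemma="paternal grandfather",
    part_of_speech="n",
    senses=[Sense(id="example-en-1-n-1", synset="example-en-1-n")],
)
resource = LexicalResource([Lexicon(id="example-en", language="en", entries=[entry])])
```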

  12. GlobalWordNet XML: Header
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE LexicalResource SYSTEM "http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd">
      <LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">
      Annotations: XML declaration; DTD for validation; root element; Dublin Core namespace

  13. GlobalWordNet XML: The Lexicon
      <Lexicon id="example-en"
               label="Example wordnet (English)"
               version="1.0"
               language="en"
               email="john@mccr.ae"
               citation="CILI: the Collaborative ..."
               license="https://creativecommons.org/publicdomain/zero/1.0/"
               url="http://globalwordnet.github.io/schemas/"
               dc:publisher="Global Wordnet Association">
      Annotations: ID, label, version; language (ISO-639); author/citation; license; homepage; Dublin Core properties

  14. GlobalWordNet XML: Lexical Entries
      <LexicalEntry id="w1">
        <Lemma writtenForm="paternal grandfather" partOfSpeech="n"/>
        <Sense id="example-en-1-n-1" synset="example-en-1-n">
          <SenseRelation relType="derivation" target="example-en-10161911-n-1"/>
        </Sense>
      </LexicalEntry>
      Annotations: unique ID; part-of-speech from fixed list; synset matches ID of a Synset element; fixed list of relations; target matches ID of a Sense element

  15. GlobalWordNet XML: Synsets
      <Synset id="example-en-10161911-n" ili="i90287" partOfSpeech="n">
        <Definition language="en">
          the father of your father or mother
        </Definition>
        <SynsetRelation relType="hypernym" target="example-en-10162692-n"/>
      </Synset>
      Annotations: Interlingual Identifier; (optional) language; synset relations are like sense relations but apply to all synset members
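
As a rough illustration of how the pieces on slides 12 to 15 fit together, the sketch below reads a small WN-LMF-style document with Python's standard `xml.etree.ElementTree` and groups lemmas by ILI identifier. The sample document is assembled for this example and is not the slides' own file: the XML declaration, the DOCTYPE and the relations are omitted, and the synset ID is set to `example-en-1-n` so that the `synset` reference on the Sense resolves. The function name `lemmas_by_ili` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Reduced sample document (an assumption, see the note above the code).
WN_LMF = """\
<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Lexicon id="example-en" label="Example wordnet (English)" version="1.0"
           language="en" email="john@mccr.ae"
           license="https://creativecommons.org/publicdomain/zero/1.0/"
           dc:publisher="Global Wordnet Association">
    <LexicalEntry id="w1">
      <Lemma writtenForm="paternal grandfather" partOfSpeech="n"/>
      <Sense id="example-en-1-n-1" synset="example-en-1-n"/>
    </LexicalEntry>
    <Synset id="example-en-1-n" ili="i90287" partOfSpeech="n">
      <Definition language="en">the father of your father or mother</Definition>
    </Synset>
  </Lexicon>
</LexicalResource>
"""


def lemmas_by_ili(xml_text: str) -> Dict[str, List[str]]:
    """Map each ILI identifier to the lemmas whose senses point at it."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for lexicon in root.iter("Lexicon"):
        # Index synsets by ID so that Sense/@synset can be resolved.
        synsets = {s.get("id"): s for s in lexicon.iter("Synset")}
        for entry in lexicon.iter("LexicalEntry"):
            lemma = entry.find("Lemma").get("writtenForm")
            for sense in entry.iter("Sense"):
                synset = synsets.get(sense.get("synset"))
                if synset is None:
                    continue  # dangling reference: no such Synset in this lexicon
                ili = synset.get("ili")
                if not ili:
                    continue  # synset not linked to the ILI
                result.setdefault(ili, []).append(lemma)
    return result


index = lemmas_by_ili(WN_LMF)
```

Running the same function over wordnets in several languages and merging the resulting dictionaries would give, per ILI identifier, the lemmas each language uses for that concept.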

  16. Beyond Princeton WordNet: Wikipedia Linking

  17. Lexical vs. Encyclopedic
      Yellow (in a dictionary)
      - Is a verb, noun and adjective
      - Secondary synonyms: cowardly, warning (especially in soccer)
      Yellow (in an encyclopedia)
      - A colour
      - 2 books
      - 8 films or TV shows
      - 4 songs
      - A butterfly

  18. Princeton WordNet
      - Is a lexical resource with some encyclopedic information
      - This information is quite biased towards an Anglo-Saxon, American and even North-Eastern US context
      - Paterson, NJ, USA (pop. 147,000): in Princeton WordNet
      - Kawasaki, Japan (pop. 1,500,000): not in Princeton WordNet

  19. Wikipedia
      - Wikipedia is an open encyclopedia and the most widely used one
      - Many lexical concepts are included, e.g., Play (activity): https://en.wikipedia.org/wiki/Play_(activity)

  20. Overlap between lexical and encyclopedic resources [diagram labels: Lexical Resource, Encyclopedic Resource]

  21. Linking between resource types [diagram labels: Lexical, Encyclopedic; Gold-standard, manual linking]

  22. Matching
      Wikipedia Articles (by match type):
      - Exact match to title: Paris
      - Title matches up to first comma: Paris, Texas
      - Title matches except for parentheses: Paris (Mythology)
      - Redirect from matching title: Paris (Romeo and Juliet) → Count Paris
      WordNet Synsets:
      - Paris - sometimes placed in subfamily Trilliaceae
      - Paris - a town in northeastern Texas
      - Paris - (Greek mythology) the prince of Troy who...
      - Paris, capital of France - the capital and largest city of France...
      Statistics:
      - All but 77 (1.0%) synsets have at least one candidate
      - Average of 21.6 Wikipedia articles per synset
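
The four title rules on this slide might be sketched as follows. This is not the authors' implementation: the function names, the case-sensitive comparison, and the choice to apply the comma and parenthesis rules to redirect titles as well are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List, Set


def title_keys(title: str) -> Set[str]:
    """Strings under which an article title may match a lemma."""
    return {
        title,                                      # exact match to title
        title.split(",", 1)[0].strip(),             # up to the first comma
        re.sub(r"\s*\([^)]*\)\s*$", "", title),     # without trailing parentheses
    }


def candidates(lemma: str, titles: Iterable[str], redirects: Dict[str, str]) -> List[str]:
    """Collect candidate articles for a lemma, following redirects."""
    found = set()
    for title in titles:
        if lemma in title_keys(title):
            found.add(title)
    for source, target in redirects.items():
        if lemma in title_keys(source):
            found.add(target)  # redirect from a matching title
    return sorted(found)


# Article titles and the redirect are the examples from the slide,
# plus "Prague" as a title that should not match.
titles = ["Paris", "Paris, Texas", "Paris (Mythology)", "Count Paris", "Prague"]
redirects = {"Paris (Romeo and Juliet)": "Count Paris"}
paris_candidates = candidates("Paris", titles, redirects)
```

Rules this permissive are consistent with the slide's figure of many candidate articles per synset, which is why a later ranking step is needed.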

  23. Category Matches (based on Suchanek’s YAGO)
      - Synset: National Capital; Wikipedia category: Category:Capitals in Europe
      - Synset: Paris, capital of France - the capital and largest city of France...; Wikipedia article: Paris
      - Synset: Prague, Praha, Czech capital - the capital and largest city of the Czech Republic; Wikipedia article: Prague

  24. Length based matches
      Synset i94915, lemmas and the Wikipedia titles they match:
      - Diana: Diana (Mythology)
      - Princess Diana: Princess Diana (comics) → Wonder Woman
      - Princess of Wales: Princess of Wales
      - Lady Diana Frances Spencer: Lady Diana Frances Spencer → Diana, Princess of Wales
      Longer matching strings are less ambiguous

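  25. Ranking Category matches
      $$s(h,c) = \sum_{(i,a) \in P(h,c)} \sigma(i,a)$$
      $$\sigma(i,a) = \begin{cases} l(i,a) - \alpha & \text{if } (i,a) \text{ is unambiguous} \\ -\beta & \text{otherwise} \end{cases}$$
      where $c$ is the Wikipedia category, $h$ the hypernym synset, $P(h,c)$ the instance-article matches, $l(i,a)$ the length of the matching string, $\alpha$ the shortness penalty and $\beta$ the fixed ambiguity penalty.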
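
A small numeric sketch of this score in Python follows. The penalty values, the use of character counts for the match length, and the ambiguity flags are all hypothetical, since the slides give none of them; the two matches reuse the Paris and Prague examples from the category-match slide.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class InstanceArticleMatch:
    matched_string: str  # string shared by instance synset i and article a
    unambiguous: bool    # assumed flag: the string matches only one article


def sigma(match: InstanceArticleMatch, alpha: float, beta: float) -> float:
    """sigma(i,a): match length minus alpha if unambiguous, otherwise -beta."""
    if match.unambiguous:
        return len(match.matched_string) - alpha  # length assumed in characters
    return -beta


def score(matches: Iterable[InstanceArticleMatch], alpha: float, beta: float) -> float:
    """s(h,c): sum of sigma over the instance-article matches P(h,c)."""
    return sum(sigma(m, alpha, beta) for m in matches)


# Hypothetical P(h,c) for h = "National Capital", c = "Category:Capitals in Europe".
matches = [
    InstanceArticleMatch("Paris", unambiguous=False),   # contributes -beta
    InstanceArticleMatch("Prague", unambiguous=True),   # contributes 6 - alpha
]
example_score = score(matches, alpha=3.0, beta=1.0)
```

With these assumed values the ambiguous "Paris" match costs 1.0 and the unambiguous "Prague" match adds 3.0, so the category pair scores 2.0.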

  26. WordNet-Wikipedia Mapping: manually checked linking of 7,687 synsets to Wikipedia, available at https://github.com/jmccrae/wn-wiki-instances

  27. Beyond Princeton WordNet: Colloquial WordNet

  28. Princeton WordNet Release Schedule
      Princeton WordNet is infrequently released:
      - Version 2.1: Mar 2005
      - Version 3.0: Dec 2006
      - Version 3.1: Nov 2012 (fewer synsets)

  29. English 12 years ago: Spreading, Tweeting, Exit

  30. English Now: Manspreading, Tweeting, Brexit
