
WordNet and Extended WordNet

WordNet and Extended WordNet, by Sriram Rajaraman. Objective: introduce the idea of a semantic lexicon ontology, especially WordNet and eXtended WordNet. Focus: Introduction, WordNet, eXtended WordNet, Summary. Reference: WordNet, http://wordnet.princeton.edu/


Presentation Transcript


  1. WordNet and Extended WordNet Sriram Rajaraman

  2. Objective • Introduce the idea of a semantic lexicon ontology, especially WordNet and eXtended WordNet • University of Texas at Dallas, Erik Jonsson School of Engineering and Computer Science

  3. Focus • Introduction • WordNet • eXtended WordNet • Summary

  4. Reference • [1] WordNet: http://wordnet.princeton.edu/ • [2] eXtended WordNet: http://xwn.hlt.utdallas.edu/ • [3] Christiane Fellbaum (ed.), “WordNet: An Electronic Lexical Database”, MIT Press, 1998. • [4] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, “Introduction to WordNet: An On-line Lexical Database”, core working paper. • [5] Rada Mihalcea and Dan I. Moldovan, “eXtended WordNet: Progress Report”, Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, 2001. • [6] Sanda M. Harabagiu, George A. Miller, and Dan I. Moldovan, “WordNet 2 - A Morphologically and Semantically Enhanced Resource”, SIGLEX 1999.

  5. Focus • Introduction • WordNet • eXtended WordNet • Summary

  6. Introduction • Traditional Dictionary • What is available: • spelling • pronunciation • inflected and derivative forms • etymology • part of speech • definitions • illustrative uses of alternative senses • synonyms and antonyms • special usage notes

  7. Tree • Ref: http://www.merriam-webster.com/dictionary/Tree • Main Entry: tree • Pronunciation: \ˈtrē\ • Function: noun • Etymology: Middle English, from Old English trēow; akin to Old Norse trē tree, Greek drys, Sanskrit dāru wood • Date: before 12th century • Definition: a woody perennial plant having a single usually elongate main stem generally with few or no branches on its lower part

  8. Drawbacks of a traditional dictionary • What is missing: • It does not say, for example, that trees have roots, or that they consist of cells with cellulose walls, or even that they are living organisms • The “sense” of the superordinate term, a.k.a. the hypernym (living plant or industrial plant?) • Coordinate terms (bushes, shrubs, …) • Hyponyms, i.e. types of trees (pine, tropical, deciduous, …) • Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis); this is probably information for an encyclopedia

  9. How can we improve? • The missing information is structural: a dictionary entry points upward to its superordinate term (hypernym), but not sideways to its coordinate terms or downward to its hyponyms. • The restriction comes from alphabetical ordering and from budget and size constraints, all of which can be overcome in an electronic lexical database
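
The three directions of navigation named on this slide can be illustrated with a minimal sketch. The tiny hierarchy below is illustrative data chosen to echo the tree example (it is not taken from WordNet), and the function names are assumptions made for this example only.

```python
from collections import defaultdict

# Toy "is-a" links (child -> hypernym). Illustrative data, not taken from WordNet.
HYPERNYM = {
    "tree": "woody plant",
    "shrub": "woody plant",
    "pine": "tree",
    "oak": "tree",
    "woody plant": "plant",
}

# Invert the upward links once so that downward lookups are cheap.
HYPONYMS = defaultdict(list)
for child, parent in HYPERNYM.items():
    HYPONYMS[parent].append(child)


def hypernym(word):
    """Upward link: the superordinate term, or None at the top."""
    return HYPERNYM.get(word)


def hyponyms(word):
    """Downward links: the more specific terms directly below `word`."""
    return sorted(HYPONYMS.get(word, []))


def coordinates(word):
    """Sideways links: other terms that share the same hypernym."""
    parent = hypernym(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


print(hypernym("tree"))     # woody plant
print(hyponyms("tree"))     # ['oak', 'pine']
print(coordinates("tree"))  # ['shrub']
```

A printed dictionary stores only the upward link inside the definition text; an electronic database can store the upward links once and derive the other two directions, as the sketch does.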

  10. Focus • Introduction • WordNet • eXtended WordNet • Summary

  11. What is WordNet? • WordNet is a lexical database for the English language. • WordNet 3.0 has [1]: • 117,097 nouns (the average noun has 1.23 senses) • 11,488 verbs (the average verb has 2.16 senses) • 22,141 adjectives • 4,601 adverbs • Created and maintained at the Cognitive Science Laboratory of Princeton University • Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable) • Interfaces available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RiTa WordNet, pywordnet, …)

  12. WordNet Structure • Words are organized as synsets in WordNet • There are four disjoint kinds of synsets, containing either • Nouns • Verbs • Adjectives • Adverbs

  13. What is a synset? • The basic unit of WordNet • A group of synonymous words that refer to a common semantic concept • A word may belong to more than one synset; its first sense is the most frequent one • Words also include collocations (“eye contact”, “mix up”)

  14. Synset example • “car” as in • {car, auto, automobile, machine, motorcar} • {car, railcar, railway car, railroad car} • “chocolate” as in [Figure: example not preserved in the transcript]
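
To make the “car” example concrete, here is a minimal sketch of a word-to-synset lookup over the two synsets listed on this slide. The identifiers `car.n.01` and `car.n.02` and the function names are labels invented for this illustration; they are not meant to reproduce WordNet's own identifiers or any library API.

```python
# The two synsets from the slide. The keys are made-up labels for this sketch.
SYNSETS = {
    "car.n.01": ["car", "auto", "automobile", "machine", "motorcar"],
    "car.n.02": ["car", "railcar", "railway car", "railroad car"],
}


def synsets_of(word):
    """Return the labels of every synset that contains `word`."""
    return [label for label, words in SYNSETS.items() if word in words]


def synonyms(word):
    """Map each synset containing `word` to the other words in that synset."""
    return {
        label: [w for w in SYNSETS[label] if w != word]
        for label in synsets_of(word)
    }


print(synsets_of("car"))      # ['car.n.01', 'car.n.02']
print(synsets_of("railcar"))  # ['car.n.02']
print(synonyms("auto"))       # {'car.n.01': ['car', 'automobile', 'machine', 'motorcar']}
```

The word “car” appears in both synsets, which is exactly what it means for a word to have more than one sense; multi-word entries such as “railway car” are stored like any other word.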

  15. How are synsets related? • A list of pointers is associated with each synset to express its relationships to other synsets • WordNet defines 17 relations • 10 between synsets • 5 between word senses • “gloss” (between a synset and a sentence, i.e., a textual definition for each synset) • “frame” (between a synset and a verb construction pattern)
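
One way to picture “a list of pointers associated with each synset” is the following sketch. The class layout, the relation names used as plain strings, and the gloss wording are all assumptions made for illustration; this is not WordNet's storage format.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursing through
# the mutually linked pointer lists when two synsets are compared.
@dataclass(eq=False)
class Synset:
    words: list
    gloss: str  # textual definition attached to the synset
    pointers: list = field(default_factory=list)  # (relation name, target Synset)

    def related(self, relation):
        """Follow every pointer of the given relation type."""
        return [target for name, target in self.pointers if name == relation]


# Glosses below are paraphrased placeholders, not quoted from WordNet.
vehicle = Synset(["motor vehicle"], "a self-propelled wheeled vehicle")
car = Synset(
    ["car", "auto", "automobile", "machine", "motorcar"],
    "a motor vehicle with four wheels",
)

# Pointers are typed and directed, so the inverse link is stored explicitly.
car.pointers.append(("hypernym", vehicle))
vehicle.pointers.append(("hyponym", car))

print(car.related("hypernym")[0].words)      # ['motor vehicle']
print(vehicle.related("hyponym")[0].words[0])  # car
```

Relations between word senses (rather than between whole synsets) would need pointers attached to individual words; that refinement is left out of the sketch.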

  16. WordNet relations

  17.

  18. Applications of WordNet • Information Extraction • Information Retrieval • Question Answering • Word Sense Disambiguation • Text Inference • Coreference, coherence and metonymy • Knowledge acquisition • Internet search engines

  19. Limitations of WordNet • Designed as a semantic lexicon, not a knowledge base • Limited connections between topically related words • Lack of morphological relationships (a special algorithm handles morphology) • Lack of selectional restrictions • And more … [6]

  20. Focus • Introduction • WordNet • eXtended WordNet • Summary

  21. eXtended WordNet [2] • A project at the Human Language Technology Research Institute at The University of Texas at Dallas (http://xwn.hlt.utdallas.edu) • Provides several important enhancements (over WordNet 2.0) intended to remedy the present limitations of WordNet • Current version: eXtended WordNet 2.0 (xwn 2.0-1.1)

  22. Objective of eXtended WordNet • Exploit the rich information available in synset glosses (a gloss is a sentence, i.e., a textual definition for each synset) • Add semantic and logical enhancements to WordNet • Increase the connectivity among the synsets by at least one order of magnitude • Enable access to a broader context for each concept

  23. What does eXtended WordNet do? [5] • Preprocessing and Parsing: separation of glosses into definition and examples, tokenization, and identification of compound words • Word Sense Disambiguation: all words in a gloss are tagged with appropriate senses and linked to the corresponding synsets • Logical Form Transformation: gloss → logical forms • Topical Relations: connections are established between words, based on the context/topic
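
The preprocessing step listed first on this slide can be sketched in a few lines. The sketch assumes the usual WordNet gloss layout of a definition followed by quoted example sentences separated by semicolons; the example sentence and the compound list below are invented for illustration, and this is not the eXtended WordNet implementation.

```python
import re


def split_gloss(gloss):
    """Separate a gloss into its definition and its quoted example sentences."""
    examples = re.findall(r'"([^"]*)"', gloss)
    # Drop the quoted examples, then trim the separators left at the ends.
    definition = re.sub(r'"[^"]*"', "", gloss).strip(" ;")
    return definition, examples


def tokenize(text, compounds=()):
    """Lowercase and tokenize, joining known compound words with underscores."""
    lowered = text.lower()
    for compound in compounds:
        lowered = lowered.replace(compound, compound.replace(" ", "_"))
    return re.findall(r"[a-z_]+", lowered)


# The definition is from the tennis-court slide; the example sentence is made up.
gloss = 'a court on which tennis is played; "they met at the tennis court"'
definition, examples = split_gloss(gloss)

print(definition)  # a court on which tennis is played
print(examples)    # ['they met at the tennis court']
print(tokenize(examples[0], compounds=["tennis court"]))
# ['they', 'met', 'at', 'the', 'tennis_court']
```

Sense tagging and logical-form generation operate on the tokens produced here; they require the WordNet sense inventory and a parser, so they are not sketched.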

  24. Extended WordNet example • “Tennis court: A court on which tennis is played.” • [Figure: graph for this gloss connecting “tennis court”, “court”, “play”, and the synset {“tennis”, “lawn tennis”} through edges labeled def, location-of, and object]

  25. eXtended WordNet format • Consists of four XML files, one for each part of speech: • Noun • Verb • Adjective • Adverb • The XML tags contain attributes that specify the relationships
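
As a rough illustration of reading relationships from XML attributes, the sketch below parses a small hand-written fragment with Python's standard `xml.etree.ElementTree` module. Every tag name, attribute name, identifier, and sense number in the fragment is hypothetical and chosen only for this example; the actual eXtended WordNet schema should be checked against the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, attribute names, the synset ID, and the
# sense numbers are all invented for this sketch.
SAMPLE = """
<xwn pos="NOUN">
  <gloss synsetID="n-0001">
    <synonymSet>tennis court</synonymSet>
    <text>a court on which tennis is played</text>
    <wsd>
      <wf lemma="court" pos="NN" wnsn="1">court</wf>
      <wf lemma="tennis" pos="NN" wnsn="1">tennis</wf>
      <wf lemma="play" pos="VBN" wnsn="1">played</wf>
    </wsd>
  </gloss>
</xwn>
"""


def sense_tagged_words(xml_text):
    """Map each synset ID to the (lemma, sense number) pairs found in its gloss."""
    root = ET.fromstring(xml_text)
    tagged = {}
    for gloss in root.iter("gloss"):
        tagged[gloss.get("synsetID")] = [
            (wf.get("lemma"), int(wf.get("wnsn"))) for wf in gloss.iter("wf")
        ]
    return tagged


print(sense_tagged_words(SAMPLE))
# {'n-0001': [('court', 1), ('tennis', 1), ('play', 1)]}
```

Each (lemma, sense number) pair is effectively a link from the glossed synset to another synset, which is how sense-tagged glosses increase connectivity.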

  26. eXtended WordNet: Applications • Core knowledge base for applications such as: • Question Answering • Information Retrieval • Information Extraction • Summarization • Natural Language Generation • Inferences • Other knowledge-intensive applications

  27. Focus • Introduction • WordNet • eXtended WordNet • Summary

  28. Further Reading • W3C RDF/OWL Representation of WordNet: http://www.w3.org/TR/wordnet-rdf/ • eXtended WordNet format/algorithm: http://xwn.hlt.utdallas.edu/wsd.html • Current research at Princeton: http://wordnet.cs.princeton.edu/projects.html • Related projects (APIs, web interfaces, extensions): http://wordnet.princeton.edu/wordnet/related-projects/

  29. Back up
