WORDNET

WORDNET. Approach on word sense techniques. - AKILAN VELMURUGAN. What is WORDNET. Machine readable semantic dictionary interlinked by semantic relations Developed by PRINCETON University Large lexical database for English language


Presentation Transcript


  1. WORDNET Approach on word sense techniques - AKILAN VELMURUGAN

  2. What is WORDNET • Machine-readable semantic dictionary whose entries are interlinked by semantic relations • Developed at Princeton University • Large lexical database for the English language • Language forms a scale-free network with a small average shortest path, with words as nodes and concepts as links Source: http://wordnet.princeton.edu/
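To make "small average shortest path" concrete, here is a minimal sketch that measures it on a tiny hand-written word network. The graph, the word choices, and the function names are illustrative assumptions, not WordNet data, and the sketch does not test for the scale-free property.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: one hub word linked to three other words.
GRAPH = {
    "lure": {"bait", "decoy", "fly"},
    "bait": {"lure"},
    "decoy": {"lure"},
    "fly": {"lure"},
}

def distances_from(graph, start):
    """Breadth-first search: number of links from `start` to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_shortest_path(graph):
    """Mean shortest-path length over all connected pairs of distinct nodes."""
    lengths = []
    for a, b in combinations(sorted(graph), 2):
        dist = distances_from(graph, a)
        if b in dist:
            lengths.append(dist[b])
    return sum(lengths) / len(lengths) if lengths else 0.0

print(average_shortest_path(GRAPH))  # 1.5
```

A hub keeps every pair of words within two links of each other, which is why hub-dominated networks tend to have short average paths.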

  3. Use of wordnet • Easily navigable • Used as an online dictionary for English • Freely available to the public • Structured to show relations in the form of • - noun, verb, adjective, adverb • - synonym • - hypernym (is a kind of …) • - hyponym (… is a kind of) • - troponym (particular ways to …) • - meronym (parts of …) • WORDNET Application Source: http://wordnet.princeton.edu/
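The hypernym and hyponym relations listed above can be illustrated with a minimal sketch. The hierarchy below is a hand-written toy fragment, and `HYPERNYM`, `hypernym_chain`, and `hyponyms` are hypothetical names, not part of any WordNet API.

```python
# Toy "is a kind of" links (child -> parent); not real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
}

def hypernym_chain(word):
    """Follow 'is a kind of' links upward until no parent is recorded."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def hyponyms(word):
    """Direct hyponyms: every entry whose recorded parent is `word`."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)

print(hypernym_chain("dog"))  # ['canine', 'carnivore', 'mammal']
print(hyponyms("canine"))     # ['dog', 'wolf']
```

Hyponym lookup is simply the hypernym relation read in the opposite direction, so one table serves both queries.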

  4. Few representations of WORDNET • Schema representation • Graph Theory • Tree structure • Force graph structure • wordnet explorer • Visual Interface for wordnet

  5. Using RDF Schema and OWL ontology • Wordnet classes and properties are represented as wn:word and wn:wordsense Source: www.w3.org/.../WNET/wordnet-sw-20040713.html

  6. Source: www.w3.org/.../WNET/wordnet-sw-20040713.html

  7. Represented using graph theory • Can be a directed or an undirected graph Source: www. nodebox.net/code/index.php/Graph
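A minimal sketch of the difference between the directed and undirected views, assuming a plain adjacency-set representation. The edge list and the `build_graph` helper are illustrative and do not come from the NodeBox library cited on the slide.

```python
from collections import defaultdict

# Hypothetical "is a kind of" edges: (specific word, more general word).
EDGES = [("dog", "canine"), ("wolf", "canine"), ("canine", "carnivore")]

def build_graph(edges, directed=True):
    """Return an adjacency mapping: node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst]  # make sure the target exists as a node even with no out-links
        if not directed:
            graph[dst].add(src)
    return dict(graph)

directed = build_graph(EDGES, directed=True)
undirected = build_graph(EDGES, directed=False)

print(sorted(directed["canine"]))    # ['carnivore']
print(sorted(undirected["canine"]))  # ['carnivore', 'dog', 'wolf']
```

The directed form keeps the sense of each relation (which word is the more general one), while the undirected form only records that two words are related.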

  8. Source: www. nodebox.net/code/index.php/Graph

  9. Represented using tree structure • Uses tokens and lexical relations Source: www. docs.huihoo.com/nltk/0.9.5/en/ch02.html

  10. Source: www. docs.huihoo.com/nltk/0.9.5/en/ch02.html

  11. Represented using Force Graph Structure • Presentation of words and meanings as graph nodes, and relations as edges between them Source: www. code.google.com/p/synonym/

  12. Source: www. code.google.com/p/synonym/

  13. Represented in WORDNET Explorer • Applies visual design principles to lexical semantics Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm

  14. Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm

  15. Background study on wordsense • Word ontology • Word Sense Disambiguation (WSD) • Variable lexical notation for a concept • i-level generic notation • i-level specific notation • Semantic relatedness in WSD • Experiment results • Thesaurus as a complex network • Visual interface for wordnet • Flow of study: WORDNET – synsets – word ontology – set algebra – rules for representing lexical notations – semantic relatedness between concepts – concept distribution statistics – degree of semantic relatedness :: WSD (Word Sense Disambiguation) – SemCor – test cases – WSD on a complex network – WSD in English thesaurus – future work Source: http://kylescholz.com/projects/wordnet

  16. Wordnet – common sense ontology • Symbols are words • Concept meanings are synsets • Each synset is represented by one or more words • Words that represent the same synset are synonyms • A word that represents more than one synset is polysemous • A synset comprises a list of words and a list of semantic relations to other synsets • Part I – list of words, each with the list of synsets that the word represents • Part II – set of semantic relations between synsets (is-a, part-of, substance-of, member-of)
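The two-part organisation described above might be modelled roughly as follows. This is a sketch under assumptions: the `Synset` class, the identifiers, and the sample entries are invented for illustration and do not reflect WordNet's actual file format.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    ident: str   # hypothetical identifier, not a real WordNet key
    words: list  # synonyms that represent this concept
    relations: dict = field(default_factory=dict)  # relation name -> target idents

SYNSETS = [
    Synset("lure", ["bait", "decoy", "lure"]),
    Synset("fish_lure", ["fisherman's lure", "fish lure"], {"is-a": ["lure"]}),
    Synset("fly_lure", ["fly"], {"is-a": ["fish_lure"]}),
    Synset("fly_insect", ["fly"]),
]

# Part I: each word with the list of synsets it represents.
WORD_INDEX = {}
for synset in SYNSETS:
    for word in synset.words:
        WORD_INDEX.setdefault(word, []).append(synset.ident)

# Part II: semantic relations between synsets as (source, relation, target).
RELATIONS = [
    (synset.ident, name, target)
    for synset in SYNSETS
    for name, targets in synset.relations.items()
    for target in targets
]

print(WORD_INDEX["fly"])  # ['fly_lure', 'fly_insect']
print(RELATIONS)          # [('fish_lure', 'is-a', 'lure'), ('fly_lure', 'is-a', 'fish_lure')]
```

The word "fly" maps to two synsets, which is exactly the polysemy that word sense disambiguation has to resolve.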

  17. WSD: variable lexical notations for a concept • Generic concept notation:
      $D = I \cup J \cup K$
      $\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$
      $J = D \cap (\overline{I} \cap \overline{K})$
      since $B = D \cup E \cup F$,
      $D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$
      $D = B \cap (\overline{E} \cap \overline{F})$
      Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

  18. WSD: variable lexical notations for a concept
      $J = D \cap (\overline{I} \cap \overline{K}) = (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$
      $J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$
      when J = fly, D = fish lure, I = spinner, K = troll
      Introducing Boolean operators: AND for $\cap$, OR for $\cup$, NOT for the overline (complement)
      Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

  19. WSD: variable lexical notations for a concept • (“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ( (NOT “spinner”) AND (NOT “troll”) ) • Then, with B = lure, E = ground bait, F = stool pigeon: • (“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND ( ((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)) ) Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
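The two expansions of "fly" can be reproduced mechanically. The sketch below assumes a hand-written fragment of the lure hierarchy. `WORDS`, `PARENT`, and `generic_notation` are hypothetical names, and the NOT terms are emitted as one flat AND chain, which is equivalent to the nested grouping on the slide because AND is associative.

```python
# Toy data taken from the slide's example; not loaded from WordNet.
WORDS = {
    "lure": ["bait", "decoy", "lure"],
    "fish lure": ["fisherman's lure", "fish lure"],
    "fly": ["fly"],
    "spinner": ["spinner"],
    "troll": ["troll"],
    "ground bait": ["ground bait"],
    "stool pigeon": ["stool pigeon"],
}
PARENT = {  # synset -> its hypernym
    "fly": "fish lure",
    "spinner": "fish lure",
    "troll": "fish lure",
    "fish lure": "lure",
    "ground bait": "lure",
    "stool pigeon": "lure",
}

def generic_notation(synset, level):
    """Rewrite `synset` as the words of its level-th hypernym,
    minus the siblings passed on the way up."""
    excluded = []
    node = synset
    for _ in range(level):
        parent = PARENT[node]
        siblings = [s for s, p in PARENT.items() if p == parent and s != node]
        # Outer (more general) exclusions go first, as on the slide.
        excluded = [w for s in siblings for w in WORDS[s]] + excluded
        node = parent
    include = " OR ".join(f'"{w}"' for w in WORDS[node])
    if not excluded:
        return f"({include})"
    exclude = " AND ".join(f'(NOT "{w}")' for w in excluded)
    return f"({include}) AND ({exclude})"

print(generic_notation("fly", 1))
print(generic_notation("fly", 2))
```

Each step up the hypernym chain widens the OR part with more general words and adds one more group of NOT terms for the newly passed siblings.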

  20. Notation for synset
      • i-level generic notation for a synset: if $S_k$ is a synset and $F_i$ is the synset located i links away from $S_k$ along the hypernym links, then the i-level generic notation for $S_k$ is: [formula not preserved in the transcript]
      • Note: $F_i$ is the parent node of $F_{i-1}$, $F_{i-1}$ is the parent node of $F_{i-2}$, …
      • i-level specific notation for a synset: $J = P \cup Q \cup R$; when $P = T$, $Q = U$, $R = V \cup W$, $\therefore J = T \cup U \cup (V \cup W)$
      • If $S$ is a synset and $L_i$ is the set of synsets $C_{ik}$ located i links away from $S$ along the hyponym links, then the i-level specific regular notation for $S$ is: [formula not preserved in the transcript]
      • Note: if $C_{ik}$ is null, then $C_{(i-1)k}$ is used ($C_{(i-1)k}$ is a leaf node in that case)
      Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
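The specific notation, including the leaf-node fallback from the note, can be sketched as a level-by-level expansion along hyponym links. The `CHILDREN` table encodes only the slide's J, P, Q, R example. The function name and list-based return value are assumptions, since the formula itself is not in the transcript.

```python
# Hyponym links from the slide's example: J = P ∪ Q ∪ R, P = T, Q = U, R = V ∪ W.
CHILDREN = {
    "J": ["P", "Q", "R"],
    "P": ["T"],
    "Q": ["U"],
    "R": ["V", "W"],
}

def specific_notation(synset, level):
    """Return the synsets whose union stands for `synset` after following
    hyponym links `level` times."""
    frontier = [synset]
    for _ in range(level):
        expanded = []
        for node in frontier:
            # A leaf has no hyponyms, so it is carried forward unchanged.
            expanded.extend(CHILDREN.get(node, [node]))
        frontier = expanded
    return frontier

print(specific_notation("J", 1))  # ['P', 'Q', 'R']
print(specific_notation("J", 2))  # ['T', 'U', 'V', 'W']
print(specific_notation("J", 3))  # ['T', 'U', 'V', 'W']
```

At level 3 every synset is already a leaf, so the result repeats level 2 instead of becoming empty.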

  21. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

  22. WSD: Semantic relatedness and word sense disambiguation • Procedure for determining the semantic relatedness of two given wordnet synsets • Conception 1: Concepts that appear more frequently and closer to each other are "more related" to each other than concepts that appear less frequently and farther apart. Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
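One way to turn Conception 1 into a number is sketched below. It is an illustration only: the transcript does not give the paper's scoring formula, so the `1/distance` weighting, the `relatedness` name, and the sample tokens are all assumptions.

```python
def relatedness(tokens, a, b, window=3):
    """Score two terms by how often and how closely they co-occur.

    Every pair of occurrences at most `window` tokens apart adds 1/distance,
    so frequent and nearby co-occurrences both raise the score.
    (Assumed weighting, not the paper's formula.)
    """
    positions_a = [i for i, tok in enumerate(tokens) if tok == a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == b]
    score = 0.0
    for i in positions_a:
        for j in positions_b:
            distance = abs(i - j)
            if 0 < distance <= window:
                score += 1.0 / distance
    return score

TOKENS = ["fly", "lure", "x", "x", "x", "x", "fly", "y", "lure"]
print(relatedness(TOKENS, "fly", "lure"))  # 1.5
```

The adjacent pair contributes 1.0 and the pair two tokens apart contributes 0.5, while pairs outside the window contribute nothing.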

  23. WSD: Semantic relatedness and word sense disambiguation Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

  24. WSD: Tested on four random texts • i-level generic notation: i = 1, 2, 3 • Size of context window (target words vs. context words): 3, 5, 7 Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

  25. Thesaurus as a complex network • As a directed graph: • Sinks: the 73,046 terms with $k_{out} = 0$ • Sources (root words): the 30,260 terms with at least one outgoing link ($k_{out} > 0$) • Absolute source: no incoming links ($k_{in} = 0$) • Normal source: $k_{out} > 0$ and $k_{in} > 0$ • Bridge source: no outgoing links to root words ($k_{out}(\text{source}) = 0$) • Figure legend: 1 – normal source, 2 – bridge source, 3 – absolute source, 4 – sink Source: arXiv:cond-mat/0312586 v1 2003
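The node categories above follow directly from in-degree and out-degree. Here is a minimal sketch on an invented five-entry thesaurus. The `classify` function, the label strings, and the sample words are assumptions, and a node may carry several labels because the slide's source subtypes can overlap.

```python
# Hypothetical thesaurus: root word -> set of terms it links to.
THESAURUS = {
    "happy": {"glad", "content"},
    "glad": {"happy"},
    "elated": {"happy"},
    "merry": {"jolly"},
}

def classify(graph):
    """Label every term as a sink or as a source with its subtypes."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    k_in = {n: 0 for n in nodes}
    for targets in graph.values():
        for t in targets:
            k_in[t] += 1
    roots = {n for n in nodes if graph.get(n)}  # k_out > 0
    labels = {}
    for n in nodes:
        if n not in roots:
            labels[n] = {"sink"}  # k_out = 0
            continue
        tags = {"source", "absolute" if k_in[n] == 0 else "normal"}
        # Bridge: none of its outgoing links reaches another root word.
        if not any(t in roots for t in graph[n]):
            tags.add("bridge")
        labels[n] = tags
    return labels

for term, tags in sorted(classify(THESAURUS).items()):
    print(term, sorted(tags))
```

In this toy data "merry" is both an absolute source and a bridge source, since nothing points to it and its only link leads to a sink.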

  26. Thesaurus as a complex network • Figure panels: frequency of outgoing links; frequency of incoming links Source: arXiv:cond-mat/0312586 v1 2003

  27. Thesaurus as a complex network • Figure panels: incoming vs. outgoing frequency; frequency distribution • $k_{out}$ – for root words • $k_{in}$ – for all words • Plot legend: root words in $k_{out}$; all words in $k_{in}$; root words in $k_{in}$; non-root words in $k_{in}$

  28. Extension of wordnet • Transforming a tree structure to a matrix structure • Wordnet in other languages (Japanese, Korean, Thai) • ImageNet interlinked with wordnet • REBUILDER – a repository of software designs • Retrieves designs using a Bayesian network and wordnet
