Presentation Transcript


  1. Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary Presenter: Chun-Ping Wu Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo 國立雲林科技大學 National Yunlin University of Science and Technology IPM 2007

  2. Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

  3. Motivation • Word sense disambiguation (WSD) is a common problem in natural language processing. • Traditional approaches consider only the co-occurrence probability. Example: I deposit some money in the bank. Options: bank = a financial institution? bank = an embankment or riverside? bank = a row or a set?

  4. Objective • To construct a WSD system that can be easily implemented by learning all polysemous words at once, while covering all polysemous words listed in the MRD (machine-readable dictionary). • To consider the relation between each sense of the context words and the sense of the target word. Example: I deposit some money in the bank. Answer: bank = a financial institution

  5. Methodology • Learning step • Similarity matrix • Word vector • Vector representations of sense definitions in the MRD • Disambiguation step • The definition of the acyclic weighted digraph • Selecting context words • Constructing the acyclic weighted digraph • Searching for the optimal path on the acyclic weighted digraph

  6. Methodology • Learning step • Similarity matrix • Word vector • Vector representations of sense definitions in the MRD (a sketch of this step follows below)
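
The transcript does not preserve the formulas shown on the learning-step slides, so the following is only a minimal Python sketch of the step as the outline describes it: co-occurrence word vectors built from a corpus, a cosine similarity measure between vectors, and a vector for each MRD sense formed from its definition words. The toy corpus, the mrd_senses dictionary, the window size, and every function name are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
import math

# Toy stand-ins for the resources the paper assumes: a tokenized corpus and an
# MRD that maps each sense of a word to its (tokenized) definition text.
corpus = [
    ["i", "deposit", "money", "in", "the", "bank"],
    ["the", "river", "bank", "was", "flooded"],
]
mrd_senses = {
    ("bank", 1): ["institution", "that", "keeps", "money"],
    ("bank", 2): ["land", "along", "the", "side", "of", "a", "river"],
}

def word_vectors(sentences, window=2):
    """Co-occurrence count vectors: vec[w][c] = how often c appears near w."""
    vec = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    vec[w][sent[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def sense_vector(definition, vectors):
    """Represent a sense by summing the vectors of its definition words."""
    sv = Counter()
    for w in definition:
        sv.update(vectors.get(w, {}))
    return sv

vectors = word_vectors(corpus)
sense_vecs = {sense: sense_vector(defn, vectors) for sense, defn in mrd_senses.items()}
# Similarity between the word "money" and each sense of "bank":
for sense, sv in sense_vecs.items():
    print(sense, round(cosine(vectors["money"], sv), 3))
```

With real resources, this learning step would be run once over all polysemous words in the MRD, which is what allows the system to cover every listed sense without per-word training data, as stated in the objective.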

  9. Methodology • Disambiguation step • The definition of the acyclic weighted digraph • Selecting context words • Constructing the acyclic weighted digraph • Searching for the optimal path on the acyclic weighted digraph (a Viterbi-style sketch follows below)
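
The disambiguation slides are likewise reduced to their headings in the transcript, so the following is only a rough sketch of how an acyclic weighted digraph over candidate senses can be searched with a Viterbi-style dynamic program, as the outline and conclusion suggest. The layered-graph layout, the toy similarity table, and all names are assumptions for illustration; the paper's actual graph construction and edge weighting are not reproduced here.

```python
# Minimal Viterbi-style search over a layered (acyclic) weighted digraph.
# Each layer holds the candidate senses of one word in the sentence (the
# target word's senses form one of the layers); edge weights are similarities
# between sense vectors of adjacent layers. The best path assigns a sense to
# every word at once.
def best_sense_path(layers, similarity):
    """layers: list of lists of sense ids, in sentence order.
    similarity: function(sense_a, sense_b) -> edge weight.
    Returns (best total weight, chosen sense per layer)."""
    score = {s: 0.0 for s in layers[0]}   # best path weight ending at sense s
    back = [{} for _ in layers]           # back-pointers for path recovery
    for i in range(1, len(layers)):
        new_score = {}
        for s in layers[i]:
            prev, w = max(((p, score[p] + similarity(p, s)) for p in layers[i - 1]),
                          key=lambda x: x[1])
            new_score[s] = w
            back[i][s] = prev
        score = new_score
    # Trace the optimal path backwards from the best final sense.
    last = max(score, key=score.get)
    path = [last]
    for i in range(len(layers) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return score[last], list(reversed(path))

# Hypothetical usage: three ambiguous words with 2, 3 and 2 candidate senses.
layers = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]
toy_sim = {("a1", "b2"): 0.9, ("b2", "c1"): 0.8}
weight, path = best_sense_path(layers, lambda p, s: toy_sim.get((p, s), 0.1))
print(weight, path)  # the path a1 -> b2 -> c1 should win
```

Because each layer is scored only against the previous one, the search cost grows linearly with the number of context words rather than with the number of full sense combinations, which is the complexity reduction the conclusion attributes to the Viterbi algorithm.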

  13. Experiments • System results

  14. Experiments • Experiment on English • The accuracy of the system is 30.7% on average. • This result is very low; possible reasons are as follows. • The selected context words are not always appropriate, even though context words are crucial: they determine which sense of the target word is judged best. • Mapping English senses to Korean senses through an English-Korean dictionary causes some loss of information. • Errors in the stemming process prevented the system from finding the correct root of the verb in the MRD.

  15. Conclusion • The method considers the relationship between each sense of the context words and the sense of the target word. • The Viterbi algorithm is used to reduce computational complexity. • The system showed poor results on English (30.7% accuracy), but it achieved suitable performance, 76.4% accuracy, on semantically ambiguous Korean words. • Future work: apply the method to other languages by studying their language characteristics.

  16. Comments • Advantage • Considers the relationship between each sense of the context words and the sense of the target word. • Uses the Viterbi algorithm to reduce computational complexity. • Drawback • The system performs well only on Korean; accuracy on English is poor. • Application • Word Sense Disambiguation
