
Evaluation of Utility of LSA for Word Sense Discrimination

  1. Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/

  2. Outline
  • Latent Semantic Analysis (LSA)
  • Word sense discrimination through the Context Group Discrimination paradigm
  • Experiments
    • Sense-based clusters (supervised learning)
    • K-means clustering (unsupervised learning)
    • Homonyms vs. polysemes
  • Conclusions

  3. Latent Semantic Analysis (LSA) (Deerwester '90)
  • Represents words and passages as vectors in the same (low-dimensional) semantic space
  • Similarity in word meaning is defined by similarity of their contexts

  4. LSA Steps
  • Build the document-term co-occurrence matrix (e.g., 1151 documents × 5793 terms)
  • Compute the SVD
  • Reduce dimensionality by keeping the k largest singular values
  • Compute the new vector representations for the documents
  • [Our research] Cluster the new context vectors
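
A minimal sketch of these steps in Python (NumPy assumed; the function name and the default k are illustrative, not from the slides):

```python
import numpy as np

def lsa_document_vectors(X, k=100):
    """Project documents into a k-dimensional latent semantic space.

    X: document-term co-occurrence matrix, e.g. 1151 x 5793 as above.
    k: number of singular values kept (illustrative default).
    """
    # SVD of the co-occurrence matrix: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors
    U_k, s_k = U[:, :k], s[:k]
    # Row i is the new LSA vector for document i
    return U_k * s_k
```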

  5. Context Group Discrimination Paradigm (Schütze '98)
  • Inducing senses of ambiguous words from their contextual similarity
  [Figure: context vectors of an ambiguous word]

  6. Context Group Discrimination Paradigm (Schütze '98)
  1. Cluster the context vectors
  2. Compute the centroids (sense vectors)
  3. Classify new contexts based on their distance to the centroids
  [Figure: a new context is assigned to Sense 1 when its distance a to the Sense 1 centroid is smaller than its distance b to the Sense 2 centroid, i.e., a < b]
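
A sketch of the three steps, assuming scikit-learn's KMeans and LSA context vectors as input; unit-normalizing the rows is one common way to make Euclidean k-means approximate the cosine measure:

```python
import numpy as np
from sklearn.cluster import KMeans

def discriminate_senses(contexts, new_contexts, n_senses):
    """Cluster contexts, derive sense vectors, classify new contexts."""
    # Unit-normalizing rows lets Euclidean k-means approximate cosine
    normalize = lambda M: M / np.linalg.norm(M, axis=1, keepdims=True)
    C = normalize(contexts)

    # 1. Cluster the context vectors of the ambiguous word
    km = KMeans(n_clusters=n_senses, n_init=10).fit(C)
    # 2. The cluster centroids act as the sense vectors
    sense_vectors = km.cluster_centers_
    # 3. Classify new contexts by their distance to the centroids
    labels = km.predict(normalize(new_contexts))
    return labels, sense_vectors
```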

  7. Experiments

  8. Experimental Setup
  • Corpus (Leacock '93):
    • Line (3 senses, 1151 instances)
    • Hard (2 senses, 752 instances)
    • Serve (2 senses, 1292 instances)
    • Interest (3 senses, 2113 instances)
  • Context size: full document (a small paragraph)
  • Number of clusters = number of senses

  9. Research Objective
  • How well are the different senses of ambiguous words separated in the LSA-based vector space?
  • Parameters:
    • Dimensionality of the LSA representation
    • Distance measure: L1 (city block), L2 (squared Euclidean), cosine
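
The three measures written out in NumPy (a sketch; cosine is expressed as a distance, one minus the similarity):

```python
import numpy as np

def l1(u, v):
    """City block distance."""
    return np.abs(u - v).sum()

def l2_squared(u, v):
    """Squared Euclidean distance."""
    return ((u - v) ** 2).sum()

def cosine_distance(u, v):
    """1 - cosine similarity: small when vectors point the same way."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```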

  10. Sense-based Clusters
  • An instance of supervised learning
  • An upper bound on the unsupervised performance of K-means or EM
  • Not influenced by the choice of clustering algorithm
  [Figure: best-case and worst-case separation]

  11. Sense-based Clusters: Accuracy
  • Training: compute the sense vectors from 90% of the data
  • Testing: assign the remaining 10% of the data to the closest sense vector and evaluate by comparing this assignment to the sense tags
  • Random selection, cross-validation
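
A sketch of this train/test protocol (NumPy assumed; the helper name and the cosine-based assignment are illustrative choices, not specified on the slide):

```python
import numpy as np

def discrimination_accuracy(X, senses, train_idx, test_idx):
    """Train sense vectors on 90% of the contexts, test on the rest.

    X: LSA context vectors; senses: gold sense tag per context (array).
    """
    # Sense vector = centroid of the training contexts of each sense
    sense_ids = np.unique(senses[train_idx])
    centroids = np.stack([X[train_idx][senses[train_idx] == s].mean(axis=0)
                          for s in sense_ids])
    # Assign each test context to the closest sense vector (cosine)
    Xn = X[test_idx] / np.linalg.norm(X[test_idx], axis=1, keepdims=True)
    Cn = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    predicted = sense_ids[np.argmax(Xn @ Cn.T, axis=1)]
    return np.mean(predicted == senses[test_idx])
```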

  12. Evaluating Clustering Quality: Tightness and Separation
  • Dispersion: intra-cluster tightness (the quantity K-means minimizes)
  • Silhouette: combines tightness and separation
    • a(i): average distance of point i to all other points in the same cluster
    • b(i): average distance of point i to the points in the closest other cluster
    • s(i) = (b(i) - a(i)) / max(a(i), b(i))
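
Computed directly from these definitions, one point at a time (a sketch; dist can be any of the measures from slide 9):

```python
import numpy as np

def silhouette(i, X, labels, dist):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) for point i.

    X: data points; labels: cluster label array; dist: distance function.
    """
    own = labels[i]
    # a(i): average distance to the other points of i's own cluster
    a = np.mean([dist(X[i], X[j])
                 for j in np.where(labels == own)[0] if j != i])
    # b(i): average distance to the points of the closest other cluster
    b = min(np.mean([dist(X[i], X[j])
                     for j in np.where(labels == c)[0]])
            for c in np.unique(labels) if c != own)
    return (b - a) / max(a, b)
```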

  13. More on Silhouette Value
  • s(i) near 1: points are perfectly clustered
  • s(i) near 0: points could belong to one cluster or another
  • s(i) negative: points belong to the wrong cluster
  [Figure: for a point i, a(i) is the average of the distances to points in its own cluster (blue lines) and b(i) is the average of the distances to points in the closest other cluster (yellow lines)]

  14. Evaluating Clustering Quality: Tightness and Separation
  Average silhouette value:
  • Best-case separation: cosine 0.9639, L1 0.7355, L2 0.9271
  • Worst-case separation: cosine -0.0876, L1 -0.0504, L2 -0.0879

  15. Sense-based Clusters: Discrimination Accuracy
  • Baseline: percentage of the majority sense

  16. Sense-based Clusters: Average Silhouette Value

  17. Sense-based Clusters: Results
  • Good discrimination accuracy
  • Low silhouette value
  • How is that possible?

  18. Unsupervised Learning with K-means
  • Cosine measure
  [Figure: discrimination accuracy for K-means started from the sense vectors vs. started randomly (most compact result), compared against sense-based clustering with training/testing]
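
A sketch of the two starting conditions using scikit-learn's KMeans (C, sense_vectors, and n_senses are assumed from the earlier sketches):

```python
from sklearn.cluster import KMeans

# Start from the sense vectors (supervised initialization)
km_sense = KMeans(n_clusters=n_senses, init=sense_vectors, n_init=1).fit(C)

# Start randomly: 10 restarts, keeping the most compact result
# (scikit-learn returns the run with the lowest inertia)
km_random = KMeans(n_clusters=n_senses, init="random", n_init=10).fit(C)
```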

  19. Unsupervised Learning with K-means

  20. Polysemes vs. Homonyms
  • Polysemes: words with multiple related meanings
  • Homonyms: words with the same spelling but completely different meanings

  21. Pseudo Words as Homonyms (Schütze '98)
  Every occurrence of each target word is replaced by the same token x, creating an artificial ambiguous word whose true senses are known:
  • … find it hard to believe … → … find it x to believe …
  • … exactly how to say a line and … → … exactly how to say a x and …
  • … about 30 minutes and serve warm … → … about 30 minutes and x warm …
  • … set the interest rate on the … → … set the x rate on the …
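
A sketch of the pseudo-word construction (the target set and function name are illustrative):

```python
# Hypothetical target set: the four corpus words merged into one pseudoword
TARGETS = {"hard", "line", "serve", "interest"}

def make_pseudoword(sentence):
    """Replace every target word with the shared token 'x'.

    The replaced word is kept as the gold sense label, so
    discrimination accuracy can be scored exactly.
    """
    tokens = sentence.split()
    gold = [t.lower() for t in tokens if t.lower() in TARGETS]
    masked = " ".join("x" if t.lower() in TARGETS else t for t in tokens)
    return masked, gold

masked, gold = make_pseudoword("find it hard to believe")
# masked == "find it x to believe", gold == ["hard"]
```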

  22. Polysemes vs. Homonyms in LSA Space
  • The correlation between cluster compactness and discrimination accuracy is higher for homonyms than for polysemes
  [Figure: results across dimensions (pseudo words); points on the red lines mark the most compact cluster out of 10 experiments]

  23. Conclusions
  • Good unsupervised sense-discrimination performance for homonyms
  • Major deterioration in the sense discrimination of polysemes in the absence of supervision
  • The benefit of dimensionality reduction is computational only (performance shows no peak at intermediate dimensionality)
  • The cosine measure performs better than L1 and L2
