Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/
Outline
• Latent Semantic Analysis (LSA)
• Word sense discrimination through Context Group Discrimination Paradigm
• Experiments
  • Sense-based clusters (supervised learning)
  • K-means clustering (unsupervised learning)
  • Homonyms vs. Polysemes
• Conclusions
Latent Semantic Analysis (LSA) (Deerwester ’90)
• Represents words and passages as vectors in the same (low-dimensional) semantic space
• Similarity in word meaning is defined by similarity of their contexts.
LSA Steps
• Build the document-term co-occurrence matrix, e.g., 1151 documents × 5793 terms
• Compute the SVD
• Reduce dimension by keeping the k largest singular values
• Compute the new vector representations for documents
• [Our Research] Cluster the new context vectors
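The steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lsa_document_vectors` and the toy count matrix are hypothetical, and scaling the left singular vectors by the singular values is one common choice of document representation.

```python
import numpy as np

def lsa_document_vectors(counts, k):
    """Project documents into a k-dimensional LSA space via truncated SVD."""
    # counts: documents x terms co-occurrence matrix
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # singular values come back in descending order, so keep the first k
    U_k, s_k = U[:, :k], s[:k]
    # each row is one document (context) in the reduced space
    return U_k * s_k

# toy 4 documents x 5 terms count matrix (hypothetical data)
counts = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 0],
], dtype=float)
doc_vectors = lsa_document_vectors(counts, k=2)
print(doc_vectors.shape)
```

The rows of `doc_vectors` are the context vectors that the later clustering steps would operate on.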
Context Group Discrimination Paradigm (Schütze ’98)
• Inducing senses of ambiguous words from their contextual similarity
[Figure: context vectors of an ambiguous word]
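Context Group Discrimination Paradigm (Schütze ’98)
1. Cluster the context vectors
2. Compute the centroids (sense vectors)
3. Classify new contexts based on distance to centroids
[Figure: a new context at distances a and b from the centroids of Sense 1 and Sense 2, with a < b; it is assigned to the nearer centroid]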
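Steps 2 and 3 can be sketched as below, assuming the cluster assignments from step 1 are already available. The function names and toy vectors are hypothetical, and squared Euclidean distance is used only as one of the possible distance measures.

```python
import numpy as np

def sense_centroids(vectors, labels):
    """Step 2: average the context vectors of each cluster."""
    return {int(c): vectors[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(context, centroids):
    """Step 3: assign a new context to the nearest centroid."""
    return min(centroids, key=lambda c: np.sum((context - centroids[c]) ** 2))

# toy context vectors and cluster ids from step 1 (assumed given)
vectors = np.array([[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [0.9, 0.1]])
labels = np.array([1, 1, 2, 2])

centroids = sense_centroids(vectors, labels)
predicted = classify(np.array([0.1, 0.7]), centroids)
print(predicted)
```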
Experimental Setup
• Corpus – Leacock ’93
  • Line (3 senses – 1151 instances)
  • Hard (2 senses – 752 instances)
  • Serve (2 senses – 1292 instances)
  • Interest (3 senses – 2113 instances)
• Context size: full document (small paragraph)
• Number of clusters = Number of senses
Research Objective
• How well are the different senses of ambiguous words separated in the LSA-based vector space?
• Parameters:
  • Dimensionality of LSA representation
  • Distance measure
    • L1: City Block
    • L2: Squared Euclidean
    • Cosine
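The three distance measures can be written out as below. This is a sketch: the function names are hypothetical, and treating "Cosine" as one minus the cosine similarity is an assumption about how the measure was turned into a distance.

```python
import numpy as np

def city_block(u, v):
    """L1: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(u - v)))

def squared_euclidean(u, v):
    """L2: sum of squared coordinate differences."""
    return float(np.sum((u - v) ** 2))

def cosine_distance(u, v):
    """Cosine: 1 - cos(angle between u and v) (assumed convention)."""
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 2.0])
print(city_block(u, v), squared_euclidean(u, v), cosine_distance(u, v))
```

Unlike L1 and L2, the cosine measure ignores vector length and depends only on direction.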
Sense-based Clusters
• An instance of supervised learning
• An upper bound on unsupervised performance of K-means or EM
• Not influenced by the choice of clustering algorithm
[Figure panels: Best Case Separation; Worst Case Separation]
Sense-based Clusters: Accuracy
• Training: finding sense vectors based on 90% of the data
• Testing: assigning the remaining 10% of the data to the closest sense vectors and evaluating by comparing this assignment to the sense tags
• Random selection, cross validation
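The training and testing procedure can be sketched as repeated random 90/10 splits. This is an illustration under stated assumptions: the function name, the number of rounds, the squared Euclidean distance, and the synthetic two-sense data are all hypothetical choices, not details taken from the experiments.

```python
import numpy as np

def sense_based_accuracy(vectors, senses, n_rounds=10, test_fraction=0.1, seed=0):
    """Average accuracy of nearest-sense-vector assignment over random splits."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    n_test = max(1, int(round(test_fraction * n)))
    accuracies = []
    for _ in range(n_rounds):
        order = rng.permutation(n)
        test, train = order[:n_test], order[n_test:]
        tags = np.unique(senses[train])
        # training: one sense vector (centroid) per sense tag
        centroids = np.stack(
            [vectors[train][senses[train] == t].mean(axis=0) for t in tags]
        )
        # testing: squared Euclidean distance from each held-out context to each centroid
        d = ((vectors[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        predicted = tags[d.argmin(axis=1)]
        accuracies.append(np.mean(predicted == senses[test]))
    return float(np.mean(accuracies))

# synthetic, well-separated two-sense data (hypothetical)
rng = np.random.default_rng(1)
vectors = np.vstack([rng.normal(0.0, 0.1, (50, 3)), rng.normal(5.0, 0.1, (50, 3))])
senses = np.array([0] * 50 + [1] * 50)
accuracy = sense_based_accuracy(vectors, senses)
print(accuracy)
```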
Evaluating Clustering Quality: Tightness and Separation
• Dispersion: intra-cluster (within-cluster) spread, which K-means minimizes
• Silhouette: compares intra-cluster tightness with separation from the closest other cluster
  • a(i): average distance of point i to all other points in the same cluster
  • b(i): average distance of point i to the points in the closest other cluster
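More on Silhouette Value
• Silhouette value near 1: points are perfectly clustered
• Silhouette value near 0: points can belong to one cluster or another
• Silhouette value near −1: points belong to the wrong cluster
[Figure: point i, its own cluster, and the closest cluster; a(i) is the average of all blue lines, b(i) is the average of all yellow lines]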
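A sketch of the silhouette computation follows, using the standard definition s(i) = (b(i) − a(i)) / max(a(i), b(i)), which ranges from −1 to 1. The function name and toy points are hypothetical, squared Euclidean distance is an arbitrary choice among the measures considered, and at least two clusters are assumed.

```python
import numpy as np

def silhouette_values(vectors, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    # pairwise squared Euclidean distances (another measure could be swapped in)
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(vectors))
    for i in range(len(vectors)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        if not same.any():
            continue  # singleton cluster: leave the silhouette at 0
        a = dist[i, same].mean()
        # b(i): smallest average distance to the points of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_values(points, np.array([0, 0, 1, 1]))  # tight, well separated
bad = silhouette_values(points, np.array([0, 1, 0, 1]))   # clusters cut across the groups
print(good.mean(), bad.mean())
```

In this toy case the natural grouping gives an average close to 1, while the grouping that cuts across it gives a negative average.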
Evaluating Clustering Quality: Tightness and Separation
Average Silhouette Value
• First set: Cosine 0.9639, L1 0.7355, L2 0.9271
• Second set: Cosine -0.0876, L1 -0.0504, L2 -0.0879
Sense-based Clusters: Discrimination Accuracy
Baseline: Percentage of the majority sense
Sense-based Clusters: Results
• Good discrimination accuracy
• Low silhouette value
• How is that possible?
Unsupervised Learning with K-means
• Cosine measure
[Figure legend: Start with sense vector; Most compact result; Start randomly; Sense-based clustering Training/Testing]
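The two K-means starting conditions named in the legend can be sketched as below. This is an illustration only: it uses a simple K-means on unit-normalized vectors as a stand-in for K-means with the cosine measure, reads "most compact" as the lowest total within-cluster cosine distance, and all function names, the number of runs, and the toy data are assumptions.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine_kmeans(vectors, init_centroids, n_iter=50):
    """K-means on unit vectors; returns (labels, total within-cluster cosine distance)."""
    x = normalize(vectors)
    centroids = normalize(init_centroids)
    for _ in range(n_iter):
        dist = 1.0 - x @ centroids.T  # cosine distance for unit vectors
        labels = dist.argmin(axis=1)
        for c in range(len(centroids)):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                m = members.mean(axis=0)
                centroids[c] = m / np.linalg.norm(m)
    dist = 1.0 - x @ centroids.T
    labels = dist.argmin(axis=1)
    dispersion = dist[np.arange(len(x)), labels].sum()
    return labels, dispersion

def most_compact_of_random_starts(vectors, k, n_runs=10, seed=0):
    """Run K-means from random starting points and keep the most compact result."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_runs):
        start = vectors[rng.choice(len(vectors), size=k, replace=False)]
        runs.append(cosine_kmeans(vectors, start))
    return min(runs, key=lambda r: r[1])

# toy data: two groups pointing in different directions (hypothetical)
rng = np.random.default_rng(1)
group_a = np.array([1.0, 0.0]) + rng.normal(0, 0.05, (30, 2))
group_b = np.array([0.0, 1.0]) + rng.normal(0, 0.05, (30, 2))
vectors = np.vstack([group_a, group_b])
senses = np.array([0] * 30 + [1] * 30)

# start from the sense vectors (centroids of the tagged senses)
sense_vectors = np.stack([vectors[senses == s].mean(axis=0) for s in (0, 1)])
labels_sense, disp_sense = cosine_kmeans(vectors, sense_vectors)

# start randomly and keep the most compact of 10 runs
labels_rand, disp_rand = most_compact_of_random_starts(vectors, k=2)
print(disp_sense, disp_rand)
```

Cluster ids from random starts are arbitrary, so comparing `labels_rand` with sense tags requires matching clusters to senses first.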
Polysemes vs. Homonyms
• Polysemes: words with multiple related meanings
• Homonyms: words with the same spelling but completely different meanings
Pseudo Words as Homonyms (Schütze ’98)
Original contexts:
• … find it hard to believe …
• … exactly how to say a line and …
• … about 30 minutes and serve warm …
• … set the interest rate on the …
Contexts with the pseudo word x:
• … find it x to believe …
• … exactly how to say a x and …
• … about 30 minutes and x warm …
• … set the x rate on the …
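The construction of pseudo words can be sketched as a simple substitution that keeps the replaced word as the known "sense" label. The function name is hypothetical, and merging all four words into a single token x follows the example contexts only; the actual grouping used in the experiments is not stated here.

```python
import re

def make_pseudo_word(contexts, words, pseudo="x"):
    """Replace any of `words` with `pseudo`; return (new context, original word) pairs."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    tagged = []
    for text in contexts:
        match = pattern.search(text)
        if match:
            # the replaced word serves as the known "sense" of the pseudo word
            tagged.append((pattern.sub(pseudo, text), match.group(1)))
    return tagged

contexts = [
    "find it hard to believe",
    "exactly how to say a line and",
    "about 30 minutes and serve warm",
    "set the interest rate on the",
]
pseudo_contexts = make_pseudo_word(contexts, ["hard", "line", "serve", "interest"])
for text, sense in pseudo_contexts:
    print(sense, "->", text)
```

Because the original words are unrelated, the resulting pseudo word behaves like a homonym whose true sense is known for every context.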
Polysemes vs. Homonyms: In LSA Space
[Figure: results plotted against Dimensions, including a panel for Pseudo Words; points on red lines are the most compact cluster out of 10 experiments]
• The correlation between compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes
Conclusions
• Good unsupervised sense discrimination performance for homonyms
• Major deterioration in sense discrimination of polysemes in the absence of supervision
• The benefit of dimensionality reduction is computational only (no peak in performance)
• Cosine measure performs better than L1 and L2