Evaluation of Utility of LSA for Word Sense Discrimination

Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/

Outline • Latent Semantic Analysis (LSA) • Word sense discrimination through Context Group Discrimination Paradigm • Experiments • Sense-based clusters (supervised learning) • K-means clustering (unsupervised learning) • Homonyms vs. Polysemes • Conclusions

Latent Semantic Analysis (LSA) (Deerwester ’90) • Represents words and passages as vectors in the same (low-dimensional) semantic space • Similarity in word meaning is defined by similarity of their contexts

LSA Steps • Document-Term Co-occurrence Matrix e.g., 1151 documents X 5793 terms • Compute SVD • Reduce dimension by taking k largest singular values • Compute the new vector representations for documents • [Our Research] Clustering the new context vectors
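The steps above can be sketched with NumPy. This is only a minimal illustration on a made-up toy corpus, not the authors' pipeline: the function name `lsa_document_vectors`, the raw-count weighting, and the choice k = 2 are all assumptions.

```python
import numpy as np

def lsa_document_vectors(docs, k):
    """Return (vocabulary, k-dimensional document vectors) for tokenized docs."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    # Step 1: document-term co-occurrence matrix (raw counts are an assumption).
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d:
            X[i, index[w]] += 1
    # Step 2: singular value decomposition, X = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Step 3: keep the k largest singular values (NumPy sorts them descending).
    # Step 4: the new document representations are the rows of U_k scaled by s_k.
    return vocab, U[:, :k] * s[:k]

# Hypothetical contexts of the ambiguous word "line".
docs = [
    "the actor forgot his line on stage".split(),
    "she delivered every line of the script".split(),
    "customers waited in a long line".split(),
    "the line at the store was long".split(),
]
vocab, doc_vecs = lsa_document_vectors(docs, k=2)
print(doc_vecs.shape)  # (4, 2)
```

The rows of `doc_vecs` are the reduced context vectors that the clustering step would operate on.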

Context Group Discrimination Paradigm (Schütze ’98) • Inducing senses of ambiguous words from their contextual similarity [Figure: context vectors of an ambiguous word]

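Context Group Discrimination Paradigm (Schütze ’98) 1. Cluster the context vectors 2. Compute the centroids (sense vectors) 3. Classify new contexts based on distance to centroids [Figure: a new context at distances a and b from the centroids of Sense 1 and Sense 2, with a < b; it is assigned to the nearer centroid]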
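Steps 2 and 3 can be sketched in plain Python. The clusters below are hypothetical 2-D vectors standing in for the output of step 1, and the Euclidean distance is chosen only for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(context, sense_vectors, dist=euclidean):
    """Return the label of the sense vector closest to `context`."""
    return min(sense_vectors, key=lambda label: dist(context, sense_vectors[label]))

# Hypothetical clusters of context vectors (step 1 is assumed to be done).
clusters = {
    "sense1": [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]],
    "sense2": [[0.0, 1.0], [0.2, 0.9], [0.1, 0.8]],
}
# Step 2: the centroids act as sense vectors.
sense_vectors = {label: centroid(vs) for label, vs in clusters.items()}
# Step 3: a new context goes to the nearer centroid.
new_context = [0.7, 0.3]
predicted = classify(new_context, sense_vectors)
print(predicted)  # sense1
```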

Experiments

Experimental Setup • Corpus – Leacock ’93 • Line (3 senses – 1151 instances) • Hard (2 senses – 752 instances) • Serve (2 senses – 1292 instances) • Interest (3 senses – 2113 instances) • Context size: full document (small paragraph) • Number of clusters = Number of senses

Research Objective • How well are the different senses of ambiguous words separated in the LSA-based vector space? • Parameters: dimensionality of the LSA representation; distance measure (L1: City Block, L2: Squared Euclidean, Cosine)
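For reference, the three distance measures can be written out as follows. Treating the cosine measure as a distance of one minus the cosine similarity is an assumption made here; the slides do not say how it was turned into a distance.

```python
import math

def city_block(u, v):
    """L1: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def squared_euclidean(u, v):
    """L2 (as named on the slide): sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (both non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

u, v = [3.0, 4.0], [4.0, 3.0]
print(city_block(u, v))         # 2.0
print(squared_euclidean(u, v))  # 2.0
print(cosine_distance(u, v))    # approximately 0.04
```

Unlike L1 and L2, the cosine measure ignores vector length, so two contexts that point in the same direction count as identical regardless of their size.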

Sense-based Clusters • An instance of supervised learning • An upper bound on unsupervised performance of K-means or EM • Not influenced by the choice of clustering algorithm [Figure panels: Best Case Separation; Worst Case Separation]

Sense-based Clusters: Accuracy • Training: finding sense vectors based on 90% of the data • Testing: assigning the remaining 10% of the data to the closest sense vectors and evaluating by comparing this assignment to the sense tags • Random selection, cross validation
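The training and testing procedure can be sketched as below. The data is synthetic and well separated, the helper name `sense_based_accuracy` is hypothetical, and repeating the random 90/10 split ten times is one reading of "random selection, cross validation", not a confirmed detail of the experiments.

```python
import random

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sense_based_accuracy(vectors, tags, rng, test_fraction=0.1):
    """Train sense vectors on a random split and score the held-out contexts."""
    order = list(range(len(vectors)))
    rng.shuffle(order)
    n_test = max(1, int(len(order) * test_fraction))
    test_ids, train_ids = order[:n_test], order[n_test:]
    # Training: one centroid (sense vector) per sense tag.
    by_sense = {}
    for i in train_ids:
        by_sense.setdefault(tags[i], []).append(vectors[i])
    sense_vectors = {tag: centroid(vs) for tag, vs in by_sense.items()}
    # Testing: assign each held-out context to the closest sense vector.
    correct = 0
    for i in test_ids:
        guess = min(sense_vectors,
                    key=lambda t: squared_euclidean(vectors[i], sense_vectors[t]))
        correct += guess == tags[i]
    return correct / n_test

# Synthetic data standing in for LSA context vectors with sense tags.
rng = random.Random(0)
vectors, tags = [], []
for tag, center in (("A", (0.0, 0.0)), ("B", (5.0, 5.0))):
    for _ in range(50):
        vectors.append([center[0] + rng.uniform(-1, 1),
                        center[1] + rng.uniform(-1, 1)])
        tags.append(tag)

# Repeat the random 90/10 split several times and average the accuracy.
scores = [sense_based_accuracy(vectors, tags, rng) for _ in range(10)]
mean_accuracy = sum(scores) / len(scores)
print(mean_accuracy)
```

Because the sense tags are used to build the centroids, this is the supervised setting that the slides treat as an upper bound for the unsupervised runs.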

Evaluating Clustering Quality: Tightness and Separation • Dispersion: intra-cluster (K-means minimizes it) • Silhouette: compares intra-cluster and inter-cluster distances • a(i): average distance of point i to all other points in the same cluster • b(i): average distance of point i to the points in the closest cluster

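More on Silhouette Value • s(i) = (b(i) − a(i)) / max(a(i), b(i)) • a(i): average of all blue lines (point i to the other points in its own cluster) • b(i): average of all yellow lines (point i to the points in the closest cluster) • s(i) close to 1: points are perfectly clustered • s(i) close to 0: points can belong to one cluster or another • s(i) close to -1: points belong to the wrong cluster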
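A small sketch of the silhouette computation, using the standard definition s(i) = (b(i) − a(i)) / max(a(i), b(i)). The four points and both labelings are made up for illustration, and the function assumes at least two clusters and no duplicate-only data.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def silhouette(i, points, labels, dist=euclidean):
    """Silhouette value of point i; requires at least two clusters."""
    own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
    if not own:
        return 0.0  # common convention for a singleton cluster
    # a(i): average distance to the other points in the same cluster.
    a = sum(dist(points[i], points[j]) for j in own) / len(own)
    # b(i): average distance to the points of the closest other cluster.
    b = float("inf")
    for other in set(labels) - {labels[i]}:
        members = [j for j, l in enumerate(labels) if l == other]
        b = min(b, sum(dist(points[i], points[j]) for j in members) / len(members))
    return (b - a) / max(a, b)

points = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 1, 1]  # the second point is put in the wrong cluster

average_good = sum(silhouette(i, points, good_labels) for i in range(4)) / 4
misplaced = silhouette(1, points, bad_labels)
print(average_good)  # close to 1
print(misplaced)     # negative
```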

Evaluating Clustering Quality: Tightness and Separation • Average Silhouette Value, first example: Cosine 0.9639, L1 0.7355, L2 0.9271 • Average Silhouette Value, second example: Cosine -0.0876, L1 -0.0504, L2 -0.0879

Sense-based Clusters: Discrimination Accuracy • Baseline: percentage of the majority sense

Sense-based Clusters: Average Silhouette Value

Sense-based Clusters: Results • Good discrimination accuracy • Low silhouette value • How is that possible?

Unsupervised Learning with K-means • Cosine measure [Figure legend: Start with sense vector; Start randomly; Most compact result; Sense-based clustering (Training/Testing)]

Unsupervised Learning with K-means
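The random-start setting can be sketched as K-means under the cosine measure, repeated several times, keeping the most compact result. This is an assumption-laden illustration: the points are made up, the function name `kmeans_cosine` is hypothetical, and the choice of 10 restarts follows the "10 experiments" mentioned later in the slides.

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(u, v):
    # u and v are unit vectors, so their dot product is the cosine.
    return 1.0 - sum(a * b for a, b in zip(u, v))

def kmeans_cosine(points, k, rng, iterations=50):
    """One K-means run from random seeds; returns (labels, total dispersion)."""
    points = [normalize(p) for p in points]
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: cosine_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    dispersion = sum(cosine_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, dispersion

# Hypothetical context vectors forming two directions in the space.
points = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
rng = random.Random(0)
runs = [kmeans_cosine(points, k=2, rng=rng) for _ in range(10)]
# "Most compact result": the run with the lowest total dispersion.
best_labels, best_dispersion = min(runs, key=lambda run: run[1])
print(best_labels, best_dispersion)
```

Selecting by dispersion needs no sense tags, which is what makes it usable in the unsupervised setting.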

Polysemes vs. Homonyms • Polysemes: words with multiple related meanings • Homonyms: words with the same spelling but completely different meanings

Pseudo Words as Homonyms (Schütze ’98) • … find it hard to believe … → … find it x to believe … • … exactly how to say a line and … → … exactly how to say a x and … • … about 30 minutes and serve warm … → … about 30 minutes and x warm … • … set the interest rate on the … → … set the x rate on the …
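The substitution shown above can be sketched as follows. The helper name `make_pseudo_word` is hypothetical; it keeps the replaced word as the known "sense" of the artificial homonym, and if a context contains several target words it records only the first.

```python
import re

def make_pseudo_word(sentences, words, pseudo="x"):
    """Replace whole-word occurrences of `words` with `pseudo`.

    Returns (rewritten sentence, original word) pairs for the sentences
    that contain at least one of the target words.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    labeled = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            labeled.append((pattern.sub(pseudo, sentence), match.group(1)))
    return labeled

sentences = [
    "find it hard to believe",
    "exactly how to say a line and",
    "about 30 minutes and serve warm",
    "set the interest rate on the",
]
labeled = make_pseudo_word(sentences, ["hard", "line", "serve", "interest"])
for text, sense in labeled:
    print(sense, "->", text)
```

Because the four source words are unrelated, the pseudo word behaves like a homonym whose true senses are known without manual tagging.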

Polysemes vs. Homonyms: In LSA Space • The correlation between compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes • Points on red lines are the most compact result out of 10 experiments [Figure: accuracy vs. Dimensions; panel: Pseudo Words]

Conclusions • Good unsupervised sense discrimination performance for homonyms • Major deterioration in sense discrimination of polysemes in absence of supervision • Dimensionality reduction benefit is computational only (no peak in performance) • Cosine measure performs better than L1 and L2
