
Representing Meaning in Unsupervised Word Sense Disambiguation

Bridget T. McInnes 5 September 2008. Representing Meaning in Unsupervised Word Sense Disambiguation. University of Minnesota Twin Cities. What is WSD?. The culture count doubled. Culture. Anthropological Culture. Laboratory Culture. Sense Inventory. Approaches to WSD. Supervised




Presentation Transcript


  1. Bridget T. McInnes 5 September 2008 Representing Meaning in Unsupervised Word Sense Disambiguation University of Minnesota Twin Cities

  2. What is WSD? • Example: The culture count doubled. • Sense Inventory for Culture: Anthropological Culture, Laboratory Culture

  3. Approaches to WSD • Supervised • Advantages: obtains high accuracy • Disadvantages: requires manually annotated training data for each word to be disambiguated, so it cannot scale • Unsupervised • Advantages: does not require manually annotated training data • Disadvantages: generally does not reach as high an accuracy as supervised approaches

  4. Unsupervised Approaches • Similarity and Relatedness Based

  5. Unsupervised Approaches • Similarity and Relatedness Based • Patwardhan, Banerjee and Pedersen 2005 • Pedersen, et al 2006 • Budanitsky and Hirst 2006

  6. Unsupervised Approaches • Similarity and Relatedness based • Vector Based

  7. Unsupervised Approaches • Similarity and Relatedness Based • Vector-based • Mohammad and Hirst, 2006 • Patwardhan, 2003 • Pedersen, et al 2006 • Humphrey, et al 2006

  8. Unsupervised Approaches • Similarity and Relatedness-based • Vector-based • Clustering

  9. Unsupervised Approaches • Similarity and Relatedness Based • Vector-based • Clustering • Pedersen and Bruce, 1997 • Schütze, 1998 • Pedersen and Bruce, 1998 • Purandare and Pedersen, 2004 • Kulkarni and Pedersen, 2005

  10. Road Map • Previous Approaches • Our vector approach • Future Work

  11. Previous Approaches • Similarity and Relatedness Based • SenseRelate (Banerjee and Pedersen, 2003) • Vector-based • Semantic Type Indexing (Humphrey et al 2006) • Clustering • SenseClusters (Kulkarni and Pedersen, 2005)

  12. Banerjee and Pedersen 2003 Sense Relate

  13. SenseRelate • Target Word: Transport • Context: Transport of glutathione S-linked conjugates. • Concept 1: Biological Transport (C0005528) • Concept 2: Patient Transport (C0150390) • Context CUIs: C0017817, C0522529, C0301869 (glutathione S-linked conjugates) • C0005528 = SS + SS + SS = Total SS for Concept 1

  14. SenseRelate • Target Word: Transport • Context: Transport of glutathione S-linked conjugates. • Concept 1: Biological Transport (C0005528) • Concept 2: Patient Transport (C0150390) • Context CUIs: C0017817, C0522529, C0301869 (glutathione S-linked conjugates) • C0005528 = SS + SS + SS = Total SS for Concept 1 • C0150390 = SS + SS + SS = Total SS for Concept 2
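
The scoring shown on the SenseRelate slides, where each candidate concept receives the total of its SS values against the context concepts, might be sketched as below. This is only an illustration: the function names and the `toy_scores` table are hypothetical, the score values are invented, and choosing the candidate with the largest total is an assumption about how the totals are used. In practice the pairwise scores would come from a similarity or relatedness measure.

```python
from typing import Callable, Dict, List, Tuple


def total_score(candidate: str, context: List[str],
                score: Callable[[str, str], float]) -> float:
    """Sum the pairwise scores between one candidate concept and each context concept."""
    return sum(score(candidate, c) for c in context)


def pick_concept(candidates: List[str], context: List[str],
                 score: Callable[[str, str], float]) -> str:
    """Return the candidate concept with the largest total score (assumed decision rule)."""
    return max(candidates, key=lambda cand: total_score(cand, context, score))


# Hypothetical pairwise scores, invented purely for illustration.
toy_scores: Dict[Tuple[str, str], float] = {
    ("C0005528", "C0017817"): 0.5,
    ("C0005528", "C0522529"): 0.5,
    ("C0005528", "C0301869"): 0.5,
    ("C0150390", "C0017817"): 0.25,
    ("C0150390", "C0522529"): 0.125,
    ("C0150390", "C0301869"): 0.125,
}


def toy_score(candidate: str, context_cui: str) -> float:
    """Look up a made-up score; unknown pairs score zero."""
    return toy_scores.get((candidate, context_cui), 0.0)


context = ["C0017817", "C0522529", "C0301869"]   # context CUIs from the slide
candidates = ["C0005528", "C0150390"]            # Biological vs. Patient Transport
best = pick_concept(candidates, context, toy_score)
print(best)
```

With these invented numbers the sketch selects C0005528, but the outcome depends entirely on the score function that is plugged in.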

  15. Humphrey et al, 2006 Semantic Type Indexing for WSD

  16. Semantic Type Indexing (STI) • Target Word: Transport • Context: Transport of glutathione S-linked conjugates. • Concept 1: Biological Transport, Semantic type: Cell Function • Concept 2: Patient Transport, Semantic type: Health Care Activity • [Diagram: Concept 1 Vector (CV1 – JDI vector), Concept 2 Vector (CV2 – JDI vector) and Target Word Vector (TW – JDI vector); Cosine 1 and Cosine 2 compare the Target Word Vector with the Concept 1 and Concept 2 Vectors]

  17. Target Word Vector • Contains the words surrounding the ambiguous word • Example: Transport of glutathione S-linked conjugates.

  18. STI - Target Word Vectors • Contains the words surrounding the ambiguous word • Example: Transport of glutathione S-linked conjugates.

  19. STI - Concept Vectors • The concept vectors are created based on their semantic type(s) • Transport: C0005528: Biological Transport; C0150390: Patient Transport • C0005528 (Cell Function): one-word terms in the Metathesaurus associated with Cell Function • C0150390 (Health Care Activity): one-word terms in the Metathesaurus associated with Health Care Activity

  20. Kulkarni and Pedersen, 2005 SenseClusters

  21. Sense Clusters (SC) Target Word: Transport Transport of glutathione S-linked conjugates. Concept 1: Biological Transport Concept 2: Patient Transport Instance 1 Instance 2 Instance 3 Instance 4 Instance 5 Instance 6 Instance 7 Instance 8 Instance 9 Instance 10 Instance 11 Instance 12 Instance 13 … Concept 1 Concept 2

  22. Sense Clusters (SC) Target Word: Transport Transport of glutathione S-linked conjugates. Concept 1: Biological Transport Concept 2: Patient Transport Instance 1 Instance 2 Instance 3 Instance 4 Instance 5 Instance 6 Instance 7 Instance 8 Instance 9 Instance 10 Instance 11 Instance 12 Instance 13 … Concept 1 Concept 2

  23. Sense Clusters • Target Word: Transport • Context: Transport of glutathione S-linked conjugates. • Concept 1: Biological Transport • Concept 2: Patient Transport • [Diagram: Cosine 1 and Cosine 2 compare the Target Word Vector with the Concept 1 Vector and the Concept 2 Vector]

  24. SC - Vectors • Contain the words surrounding the ambiguous word • Created using: • First order co-occurrences • Second order co-occurrences

  25. First Order Co-occurrence Vectors • [Figure: co-occurrence matrix with rows Word 1, Word 2, …, Word N and columns glutathione, S-linked, conjugates, Target Vector]
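
A minimal sketch of how first-order vectors like those on this slide could be built. It assumes that a word's vector counts how often each other word appears within a fixed window of it in a training corpus, and that the target vector is the average (centroid) of the context words' vectors; neither the window size nor the averaging step is stated on the slide. The corpus and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def first_order_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count, for every word, the words seen within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def centroid(word_vectors: List[Counter]) -> Dict[str, float]:
    """Average a list of count vectors dimension by dimension."""
    if not word_vectors:
        return {}
    total: Counter = Counter()
    for vec in word_vectors:
        total.update(vec)  # Counter.update adds counts
    return {dim: count / len(word_vectors) for dim, count in total.items()}


# Tiny invented corpus standing in for the training data.
sentences = [
    ["glutathione", "transport", "cell"],
    ["glutathione", "conjugates", "cell"],
    ["conjugates", "transport", "membrane"],
]
vectors = first_order_vectors(sentences)

# Target vector for a context containing "glutathione" and "conjugates".
target = centroid([vectors["glutathione"], vectors["conjugates"]])
print(target)
```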

  26. Second Order Co-occurrence Vectors • [Figure: word-by-word co-occurrence matrix with rows and columns Word 1, Word 2, …, Word N, shown with the 1st order and 2nd order vectors for glutathione]

  27. Second Order Co-occurrence Vectors • [Figure: matrix with rows Word 1, Word 2, …, Word N and columns glutathione, S-linked, conjugates, Target Vector]
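
One common way to form a second-order vector is to combine the first-order vectors of the words that co-occur with a given word. The sketch below assumes an unweighted average over those neighbours; the exact combination used in the slides is not stated, and the `first_order` table is invented for illustration.

```python
from typing import Dict

Vector = Dict[str, float]


def second_order_vector(word: str, first_order: Dict[str, Vector]) -> Vector:
    """Average the first-order vectors of every word that co-occurs with `word`."""
    neighbours = [w for w, count in first_order.get(word, {}).items() if count > 0]
    summed: Vector = {}
    for neighbour in neighbours:
        for dim, value in first_order.get(neighbour, {}).items():
            summed[dim] = summed.get(dim, 0.0) + value
    # `summed` is empty when there are no neighbours, so no division by zero occurs.
    return {dim: total / len(neighbours) for dim, total in summed.items()}


# Hypothetical first-order counts.
first_order: Dict[str, Vector] = {
    "glutathione": {"enzyme": 2, "liver": 2},
    "enzyme": {"glutathione": 2, "protein": 4},
    "liver": {"glutathione": 2, "organ": 6},
}

second = second_order_vector("glutathione", first_order)
print(second)
```

The second-order vector picks up "protein" and "organ", which never co-occur directly with "glutathione" in the toy table; capturing such indirect associations is the usual motivation for second-order representations.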

  28. Our unsupervised approach

  29. CuiTools Approach Our approach uses a general vector approach with SenseCluster vectors

  30. CuiTools • Target Word: Transport • Context: Transport of glutathione S-linked conjugates. • Concept 1: Biological Transport (C0005528) • Concept 2: Patient Transport (C0150390) • [Diagram: Cosine 1 and Cosine 2 compare the Target Word Vector with the Concept 1 Vector and the Concept 2 Vector]
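
The comparison pictured on this slide, a cosine between the target word vector and each concept vector, can be sketched as follows. Assigning the concept with the larger cosine is an assumption about the decision rule, and the vectors and function names are hypothetical.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(value * b.get(dim, 0.0) for dim, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(target: Vector, concepts: Dict[str, Vector]) -> str:
    """Return the CUI whose concept vector is closest to the target vector."""
    return max(concepts, key=lambda cui: cosine(target, concepts[cui]))


# Invented vectors for illustration only.
target_vector: Vector = {"cell": 3.0, "membrane": 4.0}
concept_vectors: Dict[str, Vector] = {
    "C0005528": {"cell": 3.0, "membrane": 4.0},  # Biological Transport
    "C0150390": {"ambulance": 5.0},              # Patient Transport
}

chosen = disambiguate(target_vector, concept_vectors)
print(chosen)
```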

  31. CuiTools Approach Our approach uses a general vector approach with SenseCluster vectors • We explore using • First-order co-occurrence vectors • Second-order co-occurrence vectors

  32. Target Word Vector • Contains the words surrounding the ambiguous word • Example: Transport of glutathione S-linked conjugates.

  33. CuiTools - Concept Vectors How to create a vector that can represent the meaning of a concept for word sense disambiguation?

  34. To answer this question We explore information in the UMLS that can be used to represent the meaning of a concept.

  35. CuiTools - Concept Vectors CUI definition • Adjustment • Individual Adjustment • Conceptually broad term referring to a state of harmony between internal needs and external … • Adjustment Action • The act of making necessary corrections or modifications … • Psychological Adjustment • A state of harmony between internal needs and external demands and the processes used …

  36. CuiTools - Concept Vectors CUI definition • Blood Pressure • Blood Pressure • Force exerted by the blood on the walls of the arteries and other vessels. • Blood Pressure Determination • Actions performed to measure the diastolic and systolic pressure of the blood. • Arterial Pressure • NO DEFINITION

  37. CuiTools - Concept Vectors • CUI definition: use the CUI definition, but if it doesn’t exist, use the • PARent definition • Semantic Type definition • SYNonymous terms, for example C0430400: Laboratory Culture: laboratory culture, microbial culture, sample culture

  38. CuiTools - Concept Vectors • CUI definition; if the CUI definition doesn’t exist: • PARent definition • Semantic Type definition • SYNonymous terms • SIBlings, for example C0010453: Anthropological Culture: archeology, family, social groups

  39. CuiTools - Concept Vectors • CUI definition; if the CUI definition doesn’t exist: • PARent definition • Semantic Type definition • SIBlings • SYNonymous terms • TOP 50 most frequent words surrounding the terms associated with the CUI
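
The fallback described on the concept-vector slides (use the CUI definition, otherwise a parent or semantic type definition) could look roughly like this. `ConceptRecord` is a hypothetical container and not a UMLS API, the CUI strings are placeholders, and trying the parent definition before the semantic type definition is an assumption, since the slides do not give an order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptRecord:
    """Hypothetical record of the text available for one concept."""
    cui: str
    definition: Optional[str] = None
    parent_definition: Optional[str] = None
    semantic_type_definition: Optional[str] = None


def definition_text(record: ConceptRecord) -> Optional[str]:
    """Return the first available text: CUI, then parent, then semantic type definition."""
    for text in (record.definition,
                 record.parent_definition,
                 record.semantic_type_definition):
        if text:
            return text
    return None


# Placeholder CUIs; the parent definition text is invented.
with_definition = ConceptRecord(
    cui="CUI-A",
    definition="Force exerted by the blood on the walls of the arteries and other vessels.",
)
without_definition = ConceptRecord(
    cui="CUI-B",
    parent_definition="placeholder parent definition",
)

print(definition_text(with_definition))
print(definition_text(without_definition))
```

The words of the returned text would then feed the concept vector, in the same way the surrounding words feed the target word vector.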

  40. Dataset • National Library of Medicine's Word Sense Disambiguation (NLM-WSD) Dataset • 50 words from the 1998 MEDLINE abstracts • 100 instances for each of the 50 words • The target word was manually assigned a UMLS concept or None • All instances of None were removed • Average number of concepts per ambiguous word is 2.26

  41. Data subsets • Humphrey subset • Humphrey, et al 2006 • 45 out of the 50 words in NLM-WSD • 5 words were excluded because at least two of the possible concepts associated with these words have the same semantic type • Instances that were assigned “None” were removed

  42. Training Data • The training data used to create the 1st and 2nd order co-occurrence vectors is the 2005 MEDLINE baseline

  43. Results

  44. Results

  45. Results of Co-occurrence Vectors

  46. Results of the Representations of Meaning

  47. Results of the Representations of Meaning - CUI • Adding the parent and semantic type definitions decreased the accuracy by 6 and 7 percentage points • Parent and semantic type definitions are too broad to define the meaning of a concept

  48. Results of the Representations of Meaning - SYN • Using the synonymous terms associated with a concept is too narrow to represent the meaning. • Adjustment Action • Adjustment – action • Adjustments • Adjustment, NOS • Adjustment – action qualifier value • Adjustment – action procedure

  49. Results of the Representations of Meaning - SIB • Using the terms associated with the siblings of a concept is too broad to represent the meaning. • Adjustment Action • Biopsy • Cauterisation • Cautery • Cold Therapy • Desiccation • Drainage procedure • Electrolysis

  50. Results of the Representations of Meaning
