
  1. Chapter 20, Part 2: Computational Lexical Semantics. Acknowledgements: these slides include material from Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova

  2. Knowledge-based WSD • Task definition • Knowledge-based WSD = the class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text • Resources used: Machine Readable Dictionaries, raw corpora • Resources not used: manually annotated corpora

  3. Machine Readable Dictionaries • In recent years, most dictionaries have been made available in Machine Readable Dictionary (MRD) format • Oxford English Dictionary • Collins • Longman Dictionary of Contemporary English (LDOCE) • Thesauruses – add synonymy information • Roget's Thesaurus • Semantic networks – add more semantic relations • WordNet • EuroWordNet

  4. MRD – A Resource for Knowledge-based WSD • For each word in the language vocabulary, an MRD provides: • A list of meanings • Definitions (for all word meanings) • Typical usage examples (for most word meanings) • WordNet definitions/examples for the noun plant: 1. buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles" 2. a living organism lacking the power of locomotion 3. something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant" 4. an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience

  5. MRD – A Resource for Knowledge-based WSD • A thesaurus adds: • An explicit synonymy relation between word meanings • A semantic network adds: • Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, etc. • WordNet synsets for the noun "plant": 1. plant, works, industrial plant 2. plant, flora, plant life • WordNet related concepts for the meaning "plant life" {plant, flora, plant life}: hypernym: {organism, being} hyponym: {house plant}, {fungus}, … meronym: {plant tissue}, {plant part} member holonym: {Plantae, kingdom Plantae, plant kingdom}
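These relations are easy to inspect programmatically. Below is a minimal sketch using NLTK's WordNet interface (an assumption: NLTK and its WordNet data are installed; exact sense numbering and relation members vary by WordNet version):

    from nltk.corpus import wordnet as wn

    # List the noun senses (synsets) of "plant" with their definitions
    for syn in wn.synsets('plant', pos=wn.NOUN):
        print(syn.name(), '-', syn.definition())

    # Explore semantic relations of the "plant life" sense
    flora = wn.synset('plant.n.02')   # {plant, flora, plant life} in WordNet 3.0
    print(flora.hypernyms())          # IS-A parents, e.g. organism/being
    print(flora.hyponyms()[:3])       # IS-A children, e.g. house plant
    print(flora.part_meronyms())      # PART-OF parts
    print(flora.member_holonyms())    # e.g. the plant kingdom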

  6. Lesk Algorithm • (Michael Lesk 1986): Identify senses of words in context using definition overlap. That is, disambiguate more than one word. • Algorithm: • Retrieve from MRD all sense definitions of the words to be disambiguated • Determine the definition overlap for all possible sense combinations • Choose the senses that lead to the highest overlap • Example: disambiguate PINE CONE • PINE 1. kinds of evergreen tree with needle-shaped leaves 2. waste away through sorrow or illness • CONE 1. solid body which narrows to a point 2. something of this shape whether solid or hollow 3. fruit of certain evergreen trees • Pine#1 ∩ Cone#1 = 0 • Pine#2 ∩ Cone#1 = 0 • Pine#1 ∩ Cone#2 = 1 • Pine#2 ∩ Cone#2 = 0 • Pine#1 ∩ Cone#3 = 2 • Pine#2 ∩ Cone#3 = 0
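A minimal runnable sketch of this overlap computation in Python (the senses dict is a toy stand-in for a real MRD, and plain whitespace tokenization is an assumption; it reproduces the counts above):

    from itertools import product

    # Toy sense inventory: word -> list of definition strings (stand-in for an MRD)
    senses = {
        'pine': ['kinds of evergreen tree with needle-shaped leaves',
                 'waste away through sorrow or illness'],
        'cone': ['solid body which narrows to a point',
                 'something of this shape whether solid or hollow',
                 'fruit of certain evergreen trees'],
    }

    def overlap(def1, def2):
        # Number of word types shared by two definitions
        return len(set(def1.split()) & set(def2.split()))

    def lesk_pair(w1, w2):
        # Try every sense combination; keep the one with the highest overlap
        combos = product(range(len(senses[w1])), range(len(senses[w2])))
        return max(combos, key=lambda c: overlap(senses[w1][c[0]], senses[w2][c[1]]))

    print(lesk_pair('pine', 'cone'))  # -> (0, 2), i.e. Pine#1 and Cone#3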

  7. Lesk Algorithm for More than Two Words? • I saw a man who is 98 years old and can still walk and tell jokes • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3) • 43,929,600 sense combinations! How to find the optimal sense combination? • Simulated annealing (Cowie, Guthrie, Guthrie 1992) • Let’s review (from CS1571)

  8. Search Types • Backtracking state-space search • Local Search and Optimization • Constraint satisfaction search • Adversarial search

  9. Local Search • Use a single current state and move only to neighbors. • Use little space • Can find reasonable solutions in large or infinite (continuous) state spaces for which the other algorithms are not suitable

  10. Optimization • Local search is often suitable for optimization problems. Search for best state by optimizing an objective function.

  11. Visualization • States are laid out in a landscape • Height corresponds to the objective function value • Move around the landscape to find the highest (or lowest) peak • Only keep track of the current state and its immediate neighbors

  12. Simulated Annealing • Based on a metallurgical metaphor • Start with a temperature set very high and slowly reduce it.

  13. Simulated Annealing • Annealing: harden metals and glass by heating them to a high temperature and then gradually cooling them • At the start, make lots of moves and then gradually slow down

  14. Simulated Annealing • More formally… • Generate a random new neighbor from the current state. • If it's better, take it. • If it's worse, take it anyway with some probability that depends on the temperature and on the delta between the new and old state values.

  15. Simulated annealing • Probability of a move decreases with the amount ΔE by which the evaluation is worsened • A second parameter T is also used to determine the probability: high T allows more worse moves, T close to zero results in few or no bad moves • The schedule input determines the value of T as a function of the completed cycles

  16. Pseudocode:

    function Simulated-Annealing(problem, schedule) returns a solution state
        inputs: problem, a problem
                schedule, a mapping from time to "temperature"
        current ← Make-Node(Initial-State[problem])
        for t ← 1 to ∞ do
            T ← schedule[t]
            if T = 0 then return current
            next ← a randomly selected successor of current
            ΔE ← Value[next] − Value[current]
            if ΔE > 0 then current ← next
            else current ← next only with probability e^(ΔE/T)
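The same loop as a minimal runnable Python sketch (the maximization convention matches the pseudocode; the linear cooling schedule and the toy problem are assumptions for illustration):

    import math
    import random

    def simulated_annealing(initial, neighbors, value, schedule):
        # Maximize value(state); schedule(t) gives the temperature at step t
        current = initial
        t = 1
        while True:
            T = schedule(t)
            if T <= 0:
                return current
            nxt = random.choice(neighbors(current))
            delta = value(nxt) - value(current)
            # Always accept improvements; accept worse moves with prob e^(delta/T)
            if delta > 0 or random.random() < math.exp(delta / T):
                current = nxt
            t += 1

    # Toy usage: find the integer maximizing -(x - 3)^2
    best = simulated_annealing(
        initial=0,
        neighbors=lambda x: [x - 1, x + 1],
        value=lambda x: -(x - 3) ** 2,
        schedule=lambda t: max(0.0, 1.0 - t / 1000.0),  # assumed linear cooling
    )
    print(best)  # typically 3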

  17. Intuitions • The algorithm wanders around during the early parts of the search, hopefully toward a good general region of the state space • Toward the end, the algorithm does a more focused search, making few bad moves

  18. Lesk Algorithm for More than Two Words? • I saw a man who is 98 years old and can still walk and tell jokes • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3) • 43,929,600 sense combinations! How to find the optimal sense combination? • Simulated annealing (Cowie, Guthrie, Guthrie 1992) • Given: W, the set of words we are disambiguating • State: one sense for each word in W • Neighbors of a state: the states that result from changing one word's sense • Objective function: value(state) • Let DWs(state) be the words that appear in the union of the definitions of the senses in state; value(state) = the sum, over the words in DWs(state), of the number of times each appears in that union of definitions • The value is higher the more words appear in multiple definitions • Start state: the most frequent sense of each word • A sketch of the state, neighbor, and objective definitions follows below
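Under the same toy senses dict as in the earlier Lesk sketch, these pieces plug directly into the simulated_annealing sketch from slide 16 (crediting only tokens shared across the pooled definitions is one reading of the DWs-based formula above, chosen so that shared vocabulary scores higher):

    from collections import Counter

    def value(state, senses):
        # state: dict mapping each word to its chosen sense index.
        # Pool all tokens of the chosen definitions, then credit only tokens
        # that occur more than once across them.
        pooled = [tok for w, i in state.items() for tok in senses[w][i].split()]
        counts = Counter(pooled)
        return sum(c for c in counts.values() if c > 1)

    def neighbors(state, senses):
        # All states reachable by changing exactly one word's sense
        return [{**state, w: i}
                for w in state
                for i in range(len(senses[w])) if i != state[w]]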

  19. Lesk Algorithm: A Simplified Version • Original Lesk definition: measure overlap between sense definitions for all words in the text • Identify simultaneously the correct senses for all words in the text • Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between the sense definitions of a word and its context in the text • Identify the correct sense for one word at a time • Search space significantly reduced (the context in the text is fixed for each word instance)

  20. Lesk Algorithm: A Simplified Version • Algorithm for simplified Lesk: • Retrieve from MRD all sense definitions of the word to be disambiguated • Determine the overlap between each sense definition and the context of the word in the text • Choose the sense that leads to the highest overlap • Example: disambiguate PINE in "Pine cones hanging in a tree" • PINE 1. kinds of evergreen tree with needle-shaped leaves 2. waste away through sorrow or illness • Pine#1 ∩ Sentence = 1 • Pine#2 ∩ Sentence = 0
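A sketch of simplified Lesk (toy definitions again; stopwords are not filtered here, which a real implementation would want to do):

    def simplified_lesk(word, context, senses):
        # Pick the sense whose definition shares the most tokens with the context
        context_tokens = set(context.lower().split())
        def score(i):
            return len(set(senses[word][i].lower().split()) & context_tokens)
        return max(range(len(senses[word])), key=score)

    senses = {'pine': ['kinds of evergreen tree with needle-shaped leaves',
                       'waste away through sorrow or illness']}
    print(simplified_lesk('pine', 'Pine cones hanging in a tree', senses))  # -> 0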

  21. Selectional Preferences • A way to constrain the possible meanings of words in a given context • E.g. “Wash a dish” vs. “Cook a dish” • WASH-OBJECT vs. COOK-FOOD • Alternative terminology • Selectional Restrictions • Selectional Preferences • Selectional Constraints

  22. Acquiring Selectional Preferences • From raw corpora • Frequency counts • Information theory measures

  23. Preliminaries: Learning Word-to-Word Relations • An indication of the semantic fit between two words • 1. Frequency counts (in a parsed corpus) • Pairs of words connected by a syntactic relation • 2. Conditional probabilities • Condition on one of the words
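Both steps, sketched over (predicate, relation, argument) triples of the kind a parsed corpus would yield (the triples here are made up for illustration):

    from collections import Counter

    triples = [('eat', 'dobj', 'pizza'), ('eat', 'dobj', 'pizza'),
               ('eat', 'dobj', 'rock'), ('drink', 'dobj', 'water')]

    # 1. Frequency counts of syntactically related word pairs
    pair_counts = Counter((p, r, a) for p, r, a in triples)
    pred_counts = Counter((p, r) for p, r, a in triples)

    # 2. Conditional probability, conditioning on the predicate
    def p_arg_given_pred(a, p, r):
        return pair_counts[(p, r, a)] / pred_counts[(p, r)]

    print(p_arg_given_pred('pizza', 'eat', 'dobj'))  # 2/3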

  24. Learning Selectional Preferences • Word-to-class relations (Resnik 1993) • Quantify the contribution of a semantic class using all the senses subsumed by that class (e.g., the class is an ancestor in WordNet)
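Resnik's class-based score is standardly formulated as a normalized relative-entropy term; below is a sketch of that common formulation, written as a small function over probability tables you would estimate from corpus counts (the table contents are assumptions):

    import math

    def selectional_association(p_class_given_pred, p_class):
        # S_R(P) = sum_c P(c|P) * log(P(c|P) / P(c))    -- preference strength
        # A(P,c) = P(c|P) * log(P(c|P) / P(c)) / S_R(P) -- association with class c
        contrib = {c: p * math.log(p / p_class[c])
                   for c, p in p_class_given_pred.items() if p > 0}
        strength = sum(contrib.values())
        return {c: v / strength for c, v in contrib.items()}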

  25. Using Selectional Preferences for WSD • Algorithm: • Let N be a noun that stands in relationship R to predicate P. Let s1…sk be its possible senses. • For i from 1 to k, compute: • Ci = {c | c is an ancestor of si} • Ai = max over c in Ci of A(P, c, R) • Ai is the score for sense i. Select the sense with the highest score. • For example: letter has 3 senses in WordNet (written message; varsity letter; alphabetic character) and belongs to 19 classes in all. • Suppose the predicate is "write". For each sense, calculate a score by measuring the association of "write" and its direct object with each ancestor of that sense.
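A sketch of this scoring loop over NLTK's WordNet (assoc is a hypothetical callable standing in for A(P, c, R) with the predicate and relation fixed, e.g. "write" plus direct object; it could be backed by the selectional_association sketch above):

    from nltk.corpus import wordnet as wn

    def wsd_by_selectional_preference(noun, assoc):
        # Score each sense by its best-scoring ancestor class; pick the max
        best_sense, best_score = None, float('-inf')
        for sense in wn.synsets(noun, pos=wn.NOUN):
            # hypernym_paths() runs root-to-sense, so this includes the sense itself
            ancestors = {a for path in sense.hypernym_paths() for a in path}
            score = max(assoc(c) for c in ancestors)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense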
