
Lecture 20: Lexical Relations & WordNet



  1. Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 http://www.sims.berkeley.edu/academics/courses/is202/f02/ Lecture 20: Lexical Relations & WordNet SIMS 202: Information Organization and Retrieval

  2. Lecture Overview • Review • Probabilistic Models of IR • Relevance Feedback • Lexical Relations • WordNet • Can Lexical and Semantic Relations be Exploited to Improve IR? Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

  3. Lecture Overview • Review • Probabilistic Models of IR • Relevance Feedback • Lexical Relations • WordNet • Can Lexical and Semantic Relations be Exploited to Improve IR? Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

  4. Probability Ranking Principle • If a reference retrieval system’s response to each request is a ranking of the documents in the collections in the order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data has been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data. Stephen E. Robertson, J. Documentation 1977

  5. Probabilistic Models: Some Unifying Notation • D = All present and future documents • Q = All present and future queries • (Di, Qj) = A document–query pair • x = class of similar documents • y = class of similar queries • Relevance (R) is a relation: R ⊆ D × Q, the set of document–query pairs judged relevant

  6. Probabilistic Models • Model 1 -- Probabilistic Indexing, P(R|y,Di) • Model 2 -- Probabilistic Querying, P(R|Qj,x) • Model 3 -- Merged Model, P(R| Qj, Di) • Model 0 -- P(R|y,x) • Probabilities are estimated based on prior usage or relevance estimation

  7. Probabilistic Models • [Diagram: the space of all queries Q, containing a class of similar queries y and an individual query Qj, beside the space of all documents D, containing a class of similar documents x and an individual document Di]

  8. Logistic Regression • Another approach to estimating probability of relevance • Based on work by William Cooper, Fred Gey and Daniel Dabney • Builds a regression model for relevance prediction based on a set of training data • Uses less restrictive independence assumptions than Model 2 • Linked Dependence

  9. Logistic Regression • [Plot: estimated relevance (0–100) on the vertical axis against term frequency in document (0–60) on the horizontal axis]

  10. Logistic Regression • Probability of relevance is based on logistic regression from a sample set of documents to determine values of the coefficients • At retrieval the probability estimate is obtained from the fitted log-odds, log O(R | Q, D) = c0 + c1X1 + … + c6X6, converted back to a probability • For the 6 X attribute measures shown previously
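
A minimal sketch of that conversion, assuming a hypothetical set of six query–document attribute measures and already-fitted coefficients; the actual attributes and coefficient values used in the Cooper/Gey/Dabney work are not reproduced here.

```python
import math

def relevance_probability(x, coefficients, intercept):
    """Logistic-regression estimate of P(relevant | query, document).

    x            -- list of query-document attribute measures (e.g. six of them)
    coefficients -- fitted coefficient c_i for each attribute
    intercept    -- fitted constant term c_0
    """
    # Log-odds of relevance: c_0 + sum_i c_i * x_i
    log_odds = intercept + sum(c * xi for c, xi in zip(coefficients, x))
    # Convert log-odds back to a probability with the logistic function
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical attribute values and coefficients, for illustration only
x = [2.1, 0.4, 3.0, 1.2, 0.0, 5.5]
coefficients = [0.3, -0.1, 0.2, 0.05, 0.4, 0.01]
print(relevance_probability(x, coefficients, intercept=-3.5))
```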

  11. Relevance Feedback in an IR System • [Diagram of an information storage and retrieval system: interest profiles & queries and documents & data enter the system; queries are formulated in terms of descriptors and documents are indexed (descriptive and subject), both governed by the “rules of the game” = rules for subject indexing + a thesaurus (lead-in vocabulary and indexing language); Store 1 (profiles/search requests) is compared/matched against Store 2 (document representations) to yield potentially relevant documents, and the relevant documents the user selects feed back into the query]

  12. Relevance Feedback • Main Idea: • Modify existing query based on relevance judgements • Extract terms from relevant documents and add them to the query • And/or re-weight the terms already in the query • Two main approaches: • Automatic (pseudo-relevance feedback) • Users select relevant documents • Users/system select terms from an automatically-generated list

  13. Rocchio/Vector Illustration • [Plot: query and document vectors in a two-dimensional term space with axes labeled “Information” (vertical) and “Retrieval” (horizontal)] • Q0 = retrieval of information = (0.7, 0.3) • D1 = information science = (0.2, 0.8) • D2 = retrieval systems = (0.9, 0.1) • Q’ = ½·Q0 + ½·D1 = (0.45, 0.55) • Q” = ½·Q0 + ½·D2 = (0.80, 0.20)
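
A small sketch that reproduces the arithmetic above (allowing for floating-point rounding). The equal ½/½ weighting mirrors the slide's example; the general Rocchio formula also allows separate weights for the original query and for relevant and non-relevant documents.

```python
def rocchio(query, relevant_doc, query_weight=0.5, doc_weight=0.5):
    """Blend the original query vector with a relevant document vector."""
    return tuple(query_weight * q + doc_weight * d
                 for q, d in zip(query, relevant_doc))

q0 = (0.7, 0.3)   # "retrieval of information"
d1 = (0.2, 0.8)   # "information science"
d2 = (0.9, 0.1)   # "retrieval systems"

print(rocchio(q0, d1))  # approx. (0.45, 0.55) -> Q' pulled toward D1
print(rocchio(q0, d2))  # approx. (0.80, 0.20) -> Q'' pulled toward D2
```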

  14. Alternative Notions of Relevance Feedback • Find people whose taste is “similar” to yours • Will you like what they like? • Follow a user’s actions in the background • Can this be used to predict what the user will want to see next? • Track what lots of people are doing • Does this implicitly indicate what they think is good and not good?

  15. Alternative Notions of Relevance Feedback • Several different criteria to consider: • Implicit vs. Explicit judgements • Individual vs. Group judgements • Standing vs. Dynamic topics • Similarity of the items being judged vs. similarity of the judges themselves

  16. Lecture Overview • Review • Probabilistic Models of IR • Relevance Feedback • Lexical Relations • WordNet • Can Lexical and Semantic Relations be Exploited to Improve IR? Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

  17. Syntax • The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language • These rules codify permissible combinations of classes of word forms

  18. Semantics • Semantics is the study of linguistic meaning • Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics): • (1) compositional • (2) relational

  19. Lexical Semantics: Compositional Approach • Compositional lexical semantics, introduced by Katz & Fodor (1963), analyzes the meaning of a word in much the same way a sentence is analyzed into semantic components. The semantic components of a word are not themselves considered to be words, but are abstract elements (semantic atoms) postulated in order to describe word meanings (semantic molecules) and to explain the semantic relations between words. For example, the representation of bachelor might be ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED. The representation of man might be ANIMATE and HUMAN and MALE and ADULT; because all the semantic components of man are included in the semantic components of bachelor, it can be inferred that bachelor → man. In addition, there are implicational rules between semantic components, e.g. HUMAN → ANIMATE, which also look very much like meaning postulates. • George Miller, “On Knowing a Word,” 1999
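
A toy sketch of the componential idea: words as sets of semantic components, with the bachelor → man inference falling out as a subset test. The component names follow the example above; this only illustrates the style of analysis, not Katz and Fodor's actual formalism.

```python
# Words as sets of semantic components (semantic "atoms")
lexicon = {
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER_MARRIED"},
    "man":      {"ANIMATE", "HUMAN", "MALE", "ADULT"},
    "woman":    {"ANIMATE", "HUMAN", "FEMALE", "ADULT"},
}

def entails(word_a, word_b):
    """word_a entails word_b if every component of word_b is also a component of word_a."""
    return lexicon[word_b] <= lexicon[word_a]

print(entails("bachelor", "man"))   # True: all of man's components are in bachelor
print(entails("man", "bachelor"))   # False: man lacks NEVER_MARRIED
```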

  20. Lexical Semantics: Relational Approach • Relational lexical semantics was first introduced by Carnap (1956) in the form of meaning postulates, where each postulate stated a semantic relation between words. A meaning postulate might look something like dog → animal (if x is a dog then x is an animal) or, adding logical constants, bachelor → man and never married [if x is a bachelor then x is a man and not(x has married)] or tall → not short [if x is tall then not(x is short)]. The meaning of a word was given, roughly, by the set of all meaning postulates in which it occurs. • George Miller, “On Knowing a Word,” 1999
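
The relational approach can be sketched the same way, storing meaning postulates as implications and reading entailment off as reachability through them. The dog and bachelor postulates come from the quotation above; the man → human postulate is added here only to show the transitive step.

```python
# Meaning postulates as implications between words (simplified: a list of consequents each)
postulates = {
    "dog": ["animal"],
    "bachelor": ["man", "never_married"],
    "man": ["human"],          # added for illustration of transitivity
}

def entailed_by(word):
    """Everything implied by a word via the meaning postulates (transitive closure)."""
    seen, stack = set(), [word]
    while stack:
        w = stack.pop()
        for consequence in postulates.get(w, []):
            if consequence not in seen:
                seen.add(consequence)
                stack.append(consequence)
    return seen

print(entailed_by("bachelor"))  # {'man', 'never_married', 'human'}
print(entailed_by("dog"))       # {'animal'}
```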

  21. Pragmatics • Deals with the relation between signs or linguistic expressions and their users • Deixis (literally “pointing out”) • E.g., “I’ll be back in an hour” depends upon the time of the utterance • Conversational implicature • A: “Can you tell me the time?” • B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.] • Presupposition • “Are you still such a bad driver?” • Speech acts • Constatives vs. performatives • E.g., “I second the motion.” • Conversational structure • E.g., turn-taking rules

  22. Language • Language only hints at meaning • Most meaning of text lies within our minds and common understanding • “How much is that doggy in the window?” • How much: social system of barter and trade (not the size of the dog) • “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own • “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

  23. Semantics: The Meaning of Symbols • Semantics versus Syntax • add(3,4) • 3 + 4 • (different syntax, same meaning) • Meaning versus Representation • What a person’s name is versus who they are • A rose by any other name... • What the computer program “looks like” versus what it actually does

  24. Semantics • Semantics: Assigning meanings to symbols and expressions • Usually involves defining: • Objects • Properties of objects • Relations between objects • More detailed versions include • Events • Time • Places • Measurements (quantities)
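
One minimal way to make that assignment concrete, using only the notions on this slide: objects as identifiers, properties as attribute/value pairs attached to objects, and relations as tuples over objects. The particular objects and relations are made up for illustration.

```python
# Objects, their properties, and relations between them -- a minimal semantic model
objects = {"fido", "felix", "alice"}

properties = {
    "fido":  {"species": "dog", "color": "brown"},
    "felix": {"species": "cat", "color": "black"},
    "alice": {"species": "human"},
}

relations = {
    ("owns", "alice", "fido"),
    ("chases", "fido", "felix"),
}

# Query: which objects does alice own?
owned = [obj for (rel, subj, obj) in relations if rel == "owns" and subj == "alice"]
print(owned)  # ['fido']
```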

  25. The Role of Context • The concept associated with the symbol “21” means different things in different contexts • Examples? • The question “Is there any salt?” • Asked of a waiter at a restaurant • Asked of an environmental scientist at work

  26. What’s In a Sentence? “A sentence is not a verbal snapshot or movie of an event. In framing an utterance, you have to abstract away from everything you know, or can picture, about a situation, and present a schematic version which conveys the essentials. In terms of grammatical marking, there is not enough time in the speech situation for any language to allow for the marking of everything which could possibly be significant to the message.” Dan Slobin, in Language Acquisition: The state of the art, 1982

  27. Lexical Relations • Conceptual relations link concepts • Goal of Artificial Intelligence • Lexical relations link words • Goal of Linguistics

  28. Major Lexical Relations • Synonymy • Polysemy • Metonymy • Hyponymy/Hyperonymy • Meronymy • Antonymy

  29. Synonymy • Different ways of expressing related concepts • Examples: cat, feline, Siamese cat • Overlaps with basic and subordinate levels • Synonyms are almost never truly substitutable: • Used in different contexts • Have different implications • This is a point of contention

  30. Polysemy • Most words have more than one sense • Homonymy: same word form, different (unrelated) meanings • bank (river) vs. bank (financial) • Polysemy: different, related senses of the same word • That dog has floppy ears. • She has a good ear for jazz. • bank (financial) has several related senses: the building, the institution, the notion of where money is stored

  31. Metonymy • Use one aspect of something to stand for the whole • The building stands for the institution of the bank. • Newscast: “The White House released new figures today.” • Waitperson: “The ham sandwich spilled his drink.”

  32. Hyponymy/Hyperonymy • ISA relation • Related to Superordinate and Subordinate level categories • hyponym(robin,bird) • hyponym(bird,animal) • hyponym(emu,bird) • A is a hypernym of B if B is a type of A • A is a hyponym of B if A is a type of B
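
A sketch of the ISA relation on this slide, stored as hyponym → hypernym links; because ISA is transitive, chaining the links recovers facts like "a robin is an animal".

```python
# hyponym -> hypernym links from the slide
isa = {
    "robin": "bird",
    "emu": "bird",
    "bird": "animal",
}

def hypernym_chain(word):
    """All the increasingly general categories a word belongs to."""
    chain = []
    while word in isa:
        word = isa[word]
        chain.append(word)
    return chain

print(hypernym_chain("robin"))  # ['bird', 'animal']
print(hypernym_chain("emu"))    # ['bird', 'animal']
```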

  33. Basic-Level Categories (review) • Brown 1958, 1965, Berlin et al., 1972, 1973 • Folk biology: • Unique beginner: plant, animal • Life form: tree, bush, flower • Generic name: pine, oak, maple, elm • Specific name: Ponderosa pine, white pine • Varietal name: Western Ponderosa pine • No overlap between levels • Level 3 is basic • Corresponds to genus • Folk biological categories correspond accurately to scientific biological categories only at the basic level

  34. Psychologically Primary Levels • SUPERORDINATE: animal, furniture • BASIC LEVEL: dog, chair • SUBORDINATE: terrier, rocker • Children take longer to learn superordinate terms • Superordinate terms are not associated with mental images or motor actions

  35. Meronymy • Parts-of relation • part of(beak, bird) • part of(bark, tree) • Transitive conceptually but not lexically: • The knob is a part of the door. • The door is a part of the house. • ? The knob is a part of the house ?

  36. Antonymy • Lexical opposites • antonym(large, small) • antonym(big, small) • antonym(big, little) • but not antonym(large, little) • Many antonymous relations can be reliably detected by looking for statistical correlations in large text collections (Justeson & Katz, 1991)
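
A rough sketch of the kind of corpus statistic this points to: how often a candidate pair of adjectives co-occurs in the same sentence, relative to what independence would predict. Justeson and Katz's actual method is more careful than this, and the tiny corpus below is invented purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Tiny invented corpus, for illustration only
sentences = [
    "the big dog chased the little cat",
    "big problems and little problems alike",
    "a large box and a small box",
    "the small child held a large balloon",
]

tokenized = [set(s.split()) for s in sentences]
word_counts = Counter(w for s in tokenized for w in s)
pair_counts = Counter(frozenset(p) for s in tokenized
                      for p in combinations(sorted(s), 2))

def cooccurrence_ratio(w1, w2):
    """Observed same-sentence co-occurrence vs. the count expected by chance."""
    n = len(tokenized)
    expected = (word_counts[w1] / n) * (word_counts[w2] / n) * n
    return pair_counts[frozenset((w1, w2))] / expected if expected else 0.0

print(cooccurrence_ratio("big", "little"))    # antonym pair: well above 1
print(cooccurrence_ratio("big", "balloon"))   # unrelated pair: 0 in this toy corpus
```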

  37. Thesauri and Lexical Relations • Polysemy: same word, different senses of meaning • Slightly different concepts expressed similarly • Synonymy: different words, related senses of meaning • Different ways to express similar concepts • Thesauri help draw all of these together • Thesauri also commonly define a set of relations between terms that is similar to lexical relations: BT (broader term), NT (narrower term), RT (related term)

  38. What is an Ontology? • From Merriam-Webster’s Collegiate: • A branch of metaphysics concerned with the nature and relations of being • A particular theory about the nature of being or the kinds of existence • More prosaically: • A carving up of the world’s meanings • Determine what things exist, but not how they inter-relate • Related terms: • Taxonomy, dictionary, category structure • Commonly used now in CS literature to describe structures that function as Thesauri

  39. Lecture Overview • Review • Probabilistic Models of IR • Relevance Feedback • Lexical Relations • WordNet • Can Lexical and Semantic Relations be Exploited to Improve IR? Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

  40. WordNet • Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University • Can be downloaded for free: • www.cogsci.princeton.edu/~wn/ • “In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation.” • (Miller, 1998, Chapter 1)

  41. Presuppositions of the WordNet Project • Separability hypothesis: • The lexical component of language can be separated and studied in its own right • Patterning hypothesis: • People have knowledge of the systematic patterns and relations between word meanings • Comprehensiveness hypothesis: • Computational linguistics programs need a store of lexical knowledge that is as extensive as that which people have

  42. WordNet: Size • WordNet uses “synsets” – sets of synonymous terms

  POS         Unique Strings   Synsets
  Noun        107,930          74,488
  Verb        10,806           12,754
  Adjective   21,365           18,523
  Adverb      4,583            3,612
  Totals      144,684          109,377

  43. Structure of WordNet [diagram]

  44. Structure of WordNet [diagram]

  45. Structure of WordNet [diagram]

  46. Unique Beginners • Entity, something • (anything having existence (living or nonliving)) • Psychological_feature • (a feature of the mental life of a living organism) • Abstraction • (a general concept formed by extracting common features from specific examples) • State • (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") • Event • (something that happens at a given place and time)

  47. Unique Beginners • Act, human_action, human_activity • (something that people do or cause to happen) • Group, grouping • (any number of entities (members) considered as a unit) • Possession • (anything owned or possessed) • Phenomenon • (any state or process known through the senses rather than by intuition or reasoning)

  48. WordNet Usage • Available online (from Unix) if you wish to try it… • Login to irony and type “wn word” for any word you are interested in • Demo…
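
For readers without a login to the class Unix server, the same kinds of lookups can be done from Python through NLTK's WordNet interface; NLTK is not mentioned in the original course materials, so treat this as one possible substitute for the wn command, along these lines:

```python
# Assumes NLTK is installed and the WordNet data has been fetched once
# with nltk.download("wordnet"); this stands in for the Unix "wn" command.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

# Lexical relations for one noun sense of "bird"
bird = wn.synset("bird.n.01")
print(bird.hypernyms())        # more general synsets (ISA parents)
print(bird.hyponyms()[:5])     # a few more specific synsets (kinds of bird)
print(bird.part_meronyms())    # parts of a bird (e.g. beak, wing)
```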

  49. Lecture Overview • Review • Probabilistic Models of IR • Relevance Feedback • Lexical Relations • WordNet • Can Lexical and Semantic Relations be Exploited to Improve IR? Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

  50. Lexical Relations and IR • Recall that IR research has primarily looked at statistical approaches to inferring the topicality or meaning of documents • I.e., statistics imply semantics • Is this really true or correct? • How has WordNet been used (or how might it be used) to provide more functionality in searching? • What about other thesauri, classification schemes, and ontologies?
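
One way WordNet has been tried in IR is synonym-based query expansion, sketched below with the caveat that naive expansion without sense disambiguation often hurts precision as much as it helps recall. The function name and parameters are illustrative, and NLTK's WordNet interface stands in for whatever lexical resource a real system would use.

```python
from nltk.corpus import wordnet as wn

def expand_query(terms, max_synsets=2):
    """Naive WordNet query expansion: add synonyms from each term's most common senses."""
    expanded = set(terms)
    for term in terms:
        for synset in wn.synsets(term)[:max_synsets]:   # no sense disambiguation here
            for lemma in synset.lemma_names():
                expanded.add(lemma.replace("_", " ").lower())
    return expanded

print(expand_query(["car", "accident"]))
# Likely adds terms such as 'auto' and 'automobile' alongside the originals
```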
