RELEVANCE in information science
Tefko Saracevic, PhD
tefkos@rutgers.edu
http://comminfo.rutgers.edu/~tefko/articles.htm
Fundamental concepts
• Relevance is a fundamental concept or notion in information science
• Every scholarly field has a fundamental, basic notion, concept, idea ... or a few
Two large questions*
• Why? (Part I): Why did relevance become a central notion of information science?
• What? (Part II): What did we learn about relevance through research in information science?
* URLs and references are in Notes – accessible after download
Relevance definitions
“1 a: relation to the matter at hand (emphasis added)
b: practical and especially social applicability: pertinence <giving relevance to college courses>
2: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user.”
What is the “matter at hand”?
• Context in relation to which
• a question is asked
• an information need is expressed as a query
• a problem is addressed
• an interaction is conducted
• No such thing as relevance without a context
• Axiom: One cannot not have a context in information interaction.
Relevance is ALWAYS contextual
Relevance – by any other name ...
• Many names connote relevance, e.g.: pertinent; useful; applicable; significant; germane; material; bearing; proper; related; important; fitting; suited; apropos; ... & nowadays even truthful
• Connotations may differ, but the concept is still relevance
“A rose by any other name would smell as sweet” (Shakespeare, Romeo and Juliet)
Two worlds in information science
• IR systems offer as answers their version of what may be relevant
• by ever improving algorithms
• People go their own way & assess relevance
• by their problem at hand, context & criteria
• The two worlds interact
• Covered here: the human world of relevance
• NOT covered: how IR deals with relevance
Part I Why relevance?
Bit of history
• Vannevar Bush (1890-1974): article “As We May Think,” 1945
• Defined the problem as “... the massive task of making more accessible a bewildering store of knowledge.”
• the problem is still with us & growing
• Suggested a solution, a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
• A technological fix to the problem
Information retrieval (IR) – definition
• The term “information retrieval” was coined & defined by Calvin Mooers (1919-1994), 1951
• “IR: ... intellectual aspects of description of information, ... and its specification for search ... and systems, technique, or machines ... [to provide information] useful to user”
Technological determinant
• In IR the emphasis was not only on organization but even more on searching
• the technology was suitable for searching
• in the beginning, information organization was done by people & searching by machines
• nowadays information organization is done mostly by machines (sometimes by humans as well) & searching almost exclusively by machines
Two important pioneers
Hans Peter Luhn (1896-1964)
• at IBM pioneered many IR computer applications
• first to describe searching using Venn diagrams
Mortimer Taube (1910-1965)
• at Documentation Inc. pioneered coordinate indexing
• first to describe searching as Boolean algebra
Searching & relevance
• Searching became a key component of information retrieval
• extensive theoretical & practical concern with searching
• the technology uniquely suitable for searching
• And searching is about retrieval of relevant answers
Thus RELEVANCE emerged as a key notion
Aboutness in librarianship
• Key notion for bibliographic classifications, subject headings, indexing languages
• used in organizing information records – goes back centuries
• choice of a given classification code, subject heading, index term ... denotes what a document (or part of it) is all about
• Searching is assumed but not addressed
• a given, taken for granted
A bit of history – assumptions related to searching
• IFLA (1998, 2009) defined FRBR (Functional Requirements for Bibliographic Records)
• “four generic user tasks ... in relation to the elementary uses that are made of the data by the user: ... Find, Identify, Select, Obtain”
• Charles Ammi Cutter (1837-1903), in “Rules for a Dictionary Catalog” (1876, 1904), defined “Objects” – objectives of a catalog – “to enable a person to find ... to show what a library has ... to assist in choice ...”
• the FRBR tasks are essentially the same as Cutter's objectives
Why relevance?
Aboutness
• A fundamental notion related to organization of information
• Relates to subject & in a broader sense to epistemology
Relevance
• A fundamental notion related to searching for information
• Relates to problem-at-hand and context & in a broader sense to pragmatism
Relevance emerged as a central notion in information science because of practical & theoretical concerns with searching
Part II What have we learned about relevance?
Claims & counterclaims in IR
• Historically, from the outset: “My system is better than your system!”
• Well, which one is it? A: Let's test it. But:
• what criterion to use?
• what measure(s) based on that criterion?
• Things got settled by the end of the 1950s and remain mostly the same to this day
Relevance & IR testing
• In 1955 Allen Kent (1921- ) & James W. Perry (1907-1971) were the first to propose two measures for testing IR systems:
• “relevance” (later renamed “precision”) & “recall”
• A scientific & engineering approach to testing
Relevance as criterion for measures
Precision
• Probability that what is retrieved is relevant
• conversely: how much junk is retrieved?
Recall
• Probability that what is relevant in a file is retrieved
• conversely: how much relevant stuff is missed?
Both are probabilities of agreement between what the system retrieved/did not retrieve as relevant (systems relevance) & what the user assessed as relevant (user relevance), where user relevance is the gold standard for comparison
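A minimal sketch, in Python, of how the two measures above can be computed; the document IDs and judgments are hypothetical, and the user's judgments are taken as the gold standard, as on the slide:

```python
# Minimal sketch of precision and recall, assuming toy document IDs
# and treating the user's relevance judgments as the gold standard.

def precision(retrieved: set, relevant: set) -> float:
    """Probability that a retrieved document is relevant ("how much junk is retrieved?")."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: set, relevant: set) -> float:
    """Probability that a relevant document is retrieved ("how much relevant stuff is missed?")."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical example: the system retrieves 4 documents; the user judged 5 relevant.
retrieved = {"d1", "d2", "d3", "d7"}
relevant  = {"d1", "d2", "d4", "d5", "d9"}
print(precision(retrieved, relevant))  # 2 of 4 retrieved are relevant -> 0.5
print(recall(retrieved, relevant))     # 2 of 5 relevant are retrieved -> 0.4
```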
First test – law of unintended consequences
• Mid-1950s test of two competing systems:
• subject headings by the Armed Services Technical Information Agency
• uniterms (keywords) by Documentation Inc.
• 15,000 documents indexed by each group, 98 questions searched
• but relevance judged by each group separately
Results:
• First group: 2,200 relevant; second group: 1,998 relevant
• but low agreement
• Then peace talks
• but even after these talks agreement came to only 30.9%
• The test collapsed on relevance disagreements
Learned: never use more than a single judge per query. Since then, to this day, IR tests don't.
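As an illustration only, one simple way to quantify agreement between two judging groups is the overlap of their "relevant" sets; the document IDs and numbers below are invented for the sketch, not the original 1950s test data:

```python
# Hypothetical sketch: overlap (Jaccard-style) agreement between two groups'
# relevance judgments for one query. IDs and counts are made up for illustration.

def agreement(judged_a: set, judged_b: set) -> float:
    """Share of documents judged relevant by either group that both groups agree on."""
    union = judged_a | judged_b
    return len(judged_a & judged_b) / len(union) if union else 1.0

group_a = {"d1", "d2", "d3", "d4"}        # group A's relevant set (hypothetical)
group_b = {"d2", "d3", "d5", "d6", "d7"}  # group B's relevant set (hypothetical)
print(f"{agreement(group_a, group_b):.1%}")  # 2 agreed of 7 judged relevant -> 28.6%
```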
Cranfield tests 1957-1967
• Led by Cyril Cleverdon (1914-1997), funded by NSF
• Controlled testing: different indexing languages, same documents, same relevance judgments
• Used the traditional IR model – non-interactive
• Many results, some surprising
• e.g. simple keywords “high ranks on many counts”
• Developed the Cranfield methodology for testing
• Still in use today, incl. in TREC – started in 1992, still strong in 2013
Tradeoff in recall vs. precision – Cleverdon's law
• Generally, there is a tradeoff:
• recall can be increased by retrieving more, but precision decreases
• precision can be increased by being more specific, but recall decreases
• Some users want high precision, others high recall
[Example from TREC: precision-recall tradeoff chart]
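A small sketch of the tradeoff on a hypothetical ranked result list: as the cutoff k grows, more documents are retrieved, recall rises, and precision falls (the ranking and judgments are invented for illustration, not TREC data):

```python
# Sketch of the recall/precision tradeoff on a hypothetical ranked list:
# retrieving more (a larger cutoff k) raises recall but lowers precision.

ranking  = ["d1", "d2", "d4", "d9", "d3", "d8", "d7", "d5"]  # system output (hypothetical)
relevant = {"d1", "d2", "d3", "d5"}                          # user judgments (hypothetical)

for k in (2, 5, 8):
    retrieved = set(ranking[:k])
    hits = len(retrieved & relevant)
    print(f"k={k}: precision={hits / k:.2f}  recall={hits / len(relevant):.2f}")
# k=2: precision=1.00  recall=0.50
# k=5: precision=0.60  recall=0.75
# k=8: precision=0.50  recall=1.00
```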
Relevance experiments
• First experiments reported in 1960 & 1961
• by an IBM group
• compared the effects of various representations on relevance judgments
• Over the years, about 300 or so experiments
• Little funding
• only two funded by a US agency (1967)
• A variety of factors in human judgments of relevance addressed
Assumptions in Cranfield methodology
• IR, and thus relevance, is static (traditional IR model)
• Further, relevance is:
• topical
• binary
• independent
• stable
• consistent
• if pooling: complete
• Inspired relevance experimentation on every one of these assumptions
• Main finding: none of them holds, but these simplified assumptions enabled rich IR tests and many improvements
IR & relevance: static vs. dynamic
Q: Do relevance inferences & criteria change over time for the same user & task?
A: They do
• For a given task, the user's inferences depend on the stage of the task:
• different stages = differing selections
• different stages = similar criteria but different weights
• increased focus = increased discrimination = more stringent relevance inferences
IR & relevance inferences are highly dynamic processes
Other experiments: Clues – on what basis & by what criteria do users make relevance judgments?
Matching – on what basis & by what criteria do users make relevance judgments to match their context?
Major general finding & conclusion from relevance experiments: relevance is measurable; it became part of general experimentation related to human information behavior
In conclusion
• Information technology & systems will change dramatically
• even in the short run
• and in unforeseeable directions
• But relevance is here to stay!
• and relevance has many faces – some unusual
Unusual [relevant] services: Library therapy dogs U Michigan, Ann Arbor, Shapiro Library
Thank you for inviting me!
Thank you · Gracias · Merci · Hvala · Obrigado · Grazie