Information Filtering & Recommender Systems


Presentation Transcript


  1. Information Filtering & Recommender Systems (Lecture for CS410 Text Info Systems) ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

  2. Short vs. Long Term Info Need • Short-term information need (Ad hoc retrieval) • “Temporary need”, e.g., info about fixing a bug • Information source is relatively static • User “pulls” information • Application example: library search, Web search,… • Long-term information need (Filtering) • “Stable need”, e.g., your hobby • Information source is dynamic • System “pushes” information to user • Applications: news filter, movie recommender, …

  3. Part I: Content-Based Filtering (Adaptive Information Filtering)

  4. Adaptive Information Filtering • Stable & long term interest, dynamic info source • System must make a delivery decision immediately as a document “arrives” [Diagram: a stream of documents flows into the Filtering System, which delivers to the user only the documents matching “my interest”]

  5. A Typical AIF System [Diagram: the profile text initializes a binary classifier; docs from the Doc Source are scored by the classifier, and accepted docs are delivered to the User; user feedback and the accumulated docs drive learning, which updates the User Interest Profile; a utility function guides the decisions] • Linear Utility = 3* #good - 2 *#bad • #good (#bad): number of good (bad) documents delivered to user • Are the coefficients (3, -2) reasonable? What about (10, -1) or (1, -10)?
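The linear utility above is straightforward to turn into code. A minimal sketch (the function name and the list-of-judgments representation are illustrative, not from the lecture):

```python
def linear_utility(delivered_judgments, good_weight=3, bad_weight=-2):
    """Linear utility over delivered documents: 3*#good - 2*#bad by default.

    delivered_judgments: list of booleans, True if a delivered document
    was judged good (relevant), False if it was judged bad.
    """
    n_good = sum(delivered_judgments)
    n_bad = len(delivered_judgments) - n_good
    return good_weight * n_good + bad_weight * n_bad

# 4 good and 3 bad deliveries: 3*4 - 2*3 = 6 under (3, -2),
# but 1*4 - 10*3 = -26 under (1, -10), which would push the
# system to deliver far more conservatively.
print(linear_utility([True, True, True, True, False, False, False]))  # 6
```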

  6. Three Basic Problems in AIF • Making filtering decision (Binary classifier) • Doc text, profile text → yes/no • Initialization • Initialize the filter based on only the profile text or very few examples • Learning from • Limited relevance judgments (only on “yes” docs) • Accumulated documents • All trying to maximize the utility

  7. Extend a Retrieval System for Information Filtering • “Reuse” retrieval techniques to score documents • Use a score threshold for filtering decision • Learn to improve scoring with traditional feedback • New approaches to threshold setting and learning

  8. A General Vector-Space Approach [Diagram: an incoming doc vector is scored against the profile vector; thresholding against the current threshold yields a yes (deliver) or no (discard) decision; feedback information drives utility evaluation, vector learning (updating the profile vector), and threshold learning (updating the threshold)]
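A minimal sketch of the scoring-plus-thresholding core of this architecture, assuming sparse term-weight dictionaries for the doc and profile vectors (the two learning components are left out):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def filter_stream(doc_stream, profile_vector, threshold):
    """Score each arriving doc against the profile; deliver iff the
    score clears the threshold. Only delivered docs can be judged,
    which is exactly the censored-data problem of the next slide."""
    for doc_id, doc_vector in doc_stream:
        score = cosine(doc_vector, profile_vector)
        if score >= threshold:
            yield doc_id, score  # delivered; feedback may arrive later
```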

  9. Difficulties in Threshold Learning • Censored data (judgments only available on delivered documents) • Little/no labeled data • Exploration vs. exploitation [Example: with threshold θ = 30.0, the delivered docs scoring 36.5 (Rel), 33.4 (NonRel), and 32.1 (Rel) have judgments, but no judgments are available for the discarded docs scoring 29.9, 27.3, …]

  10. Empirical Utility Optimization • Basic idea • Compute the utility on the training data for each candidate score threshold • Choose the threshold that gives the maximum utility on the training data set • Difficulty: biased training sample! • We can only get an upper bound for the true optimal threshold • Could a discarded item possibly be interesting to the user? • Solution: heuristic adjustment (lowering) of the threshold
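A minimal sketch of this procedure under the linear utility from slide 5 (the pair-list input format is an assumption):

```python
def best_threshold(scored_training_docs, good_weight=3, bad_weight=-2):
    """Empirical utility optimization: treat each observed score as a
    candidate cutoff and keep the one maximizing training utility.

    scored_training_docs: (score, is_relevant) pairs for documents
    that were delivered, i.e., the only ones with judgments.
    """
    docs = sorted(scored_training_docs, reverse=True)  # descending score
    best_theta, best_util = float("inf"), 0  # delivering nothing scores 0
    n_good = n_bad = 0
    for score, is_rel in docs:
        if is_rel:
            n_good += 1
        else:
            n_bad += 1
        util = good_weight * n_good + bad_weight * n_bad
        if util > best_util:
            best_util, best_theta = util, score
    # The training sample is biased (no judgments below the old
    # threshold), so in practice this threshold is then heuristically
    # lowered to encourage exploration.
    return best_theta
```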

  11. Beta-Gamma Threshold Learning [Plot: utility as a function of the cutoff position K in the descending order of doc scores; θ_optimal is the cutoff that maximizes training utility, θ_zero the cutoff where utility drops to zero] • θ = α·θ_zero + (1-α)·θ_optimal • α = β + (1-β)·e^(-N·γ), with β, γ ∈ [0,1] and N the number of training examples • Encourage exploration up to the zero-utility cutoff • The more examples, the less exploration (closer to optimal)

  12. Beta-Gamma Threshold Learning (cont.) • Pros • Explicitly addresses exploration-exploitation tradeoff (“Safe” exploration) • Arbitrary utility (with appropriate lower bound) • Empirically effective • Cons • Purely heuristic • Zero utility lower bound often too conservative
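A small sketch of the beta-gamma interpolation from slide 11 (the particular β and γ values below are illustrative, not from the lecture):

```python
import math

def beta_gamma_threshold(theta_zero, theta_optimal, n_examples,
                         beta=0.1, gamma=0.05):
    """Interpolate between the zero-utility threshold (exploration end)
    and the empirically optimal threshold (exploitation end).

    alpha shrinks toward beta as the number of training examples N
    grows, so the threshold drifts from theta_zero toward theta_optimal.
    """
    alpha = beta + (1 - beta) * math.exp(-n_examples * gamma)
    return alpha * theta_zero + (1 - alpha) * theta_optimal

# With few examples the threshold stays near theta_zero (more exploration);
# with many examples it approaches theta_optimal.
print(beta_gamma_threshold(25.0, 32.0, n_examples=2))    # ~25.6
print(beta_gamma_threshold(25.0, 32.0, n_examples=100))  # ~31.3
```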

  13. Part II: Collaborative Filtering

  14. What is Collaborative Filtering (CF)? • Making filtering decisions for an individual user based on the judgments of other users • Inferring an individual’s interests/preferences from those of other, similar users • General idea • Given a user u, find similar users {u1, …, um} • Predict u’s preferences based on the preferences of u1, …, um

  15. CF: Assumptions • Users with a common interest will have similar preferences • Users with similar preferences probably share the same interest • Examples • “interest is IR” => “favor SIGIR papers” • “favor SIGIR papers” => “interest is IR” • A sufficiently large number of user preferences is available

  16. CF: Intuitions • User similarity (Kevin Chang vs. Jiawei Han) • If Kevin liked the paper, Jiawei will like the paper • ? If Kevin liked the movie, Jiawei will like the movie • Suppose Kevin and Jiawei viewed similar movies in the past six months … • Item similarity • Since 90% of those who liked Star Wars also liked Independence Day, and, you liked Star Wars • You may also like Independence Day The content of items “didn’t matter”!

  17. The Collaborative Filtering Problem [Ratings matrix: users u1, u2, …, ui, …, um in rows, objects o1, o2, …, oj, …, on in columns; a few entries are known ratings (e.g., 3, 1.5, 2, 1), the rest are unknown (?)] • Unknown function f: U × O → R, with rating X_ij = f(u_i, o_j) • The task • Assume known f values for some (u,o)’s • Predict f values for other (u,o)’s • Essentially function approximation, like other learning problems

  18. Memory-based Approaches • General ideas: • X_ij: rating of object o_j by user u_i • n_i: average rating of all objects by user u_i • Normalized ratings: V_ij = X_ij - n_i • Memory-based prediction of the rating of object o_j by user u_a: X̂_aj = n_a + k · Σ_i w(a,i) · V_ij, with normalizer k = 1 / Σ_i |w(a,i)| • Specific approaches differ in w(a,i) -- the distance/similarity between user u_a and u_i
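A minimal sketch of this prediction rule, assuming ratings are stored as a dict of user -> {object: rating} (the representation and function names are illustrative):

```python
def predict_rating(a, j, ratings, similarity):
    """Memory-based prediction X_aj = n_a + k * sum_i w(a,i) * V_ij.

    ratings: dict user -> dict object -> rating (X_ij)
    similarity: function (user_a, user_i, ratings) -> w(a, i)
    """
    mean = lambda u: sum(ratings[u].values()) / len(ratings[u])
    n_a = mean(a)
    num = denom = 0.0
    for i in ratings:
        if i == a or j not in ratings[i]:
            continue  # only users who actually rated o_j contribute
        w = similarity(a, i, ratings)
        num += w * (ratings[i][j] - mean(i))  # w(a,i) * V_ij
        denom += abs(w)                       # normalizer k = 1 / sum |w|
    return n_a + num / denom if denom else n_a
```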

  19. User Similarity Measures • Pearson correlation coefficient (sum over commonly rated items j): w(a,i) = Σ_j (X_aj - n_a)(X_ij - n_i) / √(Σ_j (X_aj - n_a)² · Σ_j (X_ij - n_i)²) • Cosine measure: w(a,i) = Σ_j X_aj·X_ij / (√(Σ_j X_aj²) · √(Σ_j X_ij²)) • Many other possibilities!
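Continuing the sketch above, a Pearson weight over commonly rated items, plus a tiny usage example (the rating data is invented for illustration):

```python
import math

def pearson_similarity(a, i, ratings):
    """Pearson correlation over the items both users rated."""
    common = set(ratings[a]) & set(ratings[i])
    if not common:
        return 0.0
    n_a = sum(ratings[a].values()) / len(ratings[a])
    n_i = sum(ratings[i].values()) / len(ratings[i])
    num = sum((ratings[a][j] - n_a) * (ratings[i][j] - n_i) for j in common)
    d_a = sum((ratings[a][j] - n_a) ** 2 for j in common)
    d_i = sum((ratings[i][j] - n_i) ** 2 for j in common)
    return num / math.sqrt(d_a * d_i) if d_a and d_i else 0.0

# Predict alice's rating for o3 from bob's and carol's ratings.
ratings = {
    "alice": {"o1": 3.0, "o2": 1.5},
    "bob":   {"o1": 2.0, "o2": 2.0, "o3": 3.0},
    "carol": {"o1": 3.0, "o2": 1.0, "o3": 4.0},
}
print(predict_rating("alice", "o3", ratings, pearson_similarity))  # ~3.58
```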

  20. Improving User Similarity Measures • Dealing with missing values: set to default ratings (e.g., average ratings) • Inverse User Frequency (IUF): similar to IDF
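The slide gives no formula for IUF; a common form, following Breese et al. (an assumption here, not stated in the lecture), weights object j by log(m / m_j) before computing similarities:

```python
import math

def inverse_user_frequency(ratings, j):
    """IUF weight for object j: log(m / m_j), where m is the number of
    users and m_j the number who rated j. Objects rated by everyone get
    weight ~0, just as very common terms do under IDF."""
    m = len(ratings)
    m_j = sum(1 for u in ratings if j in ratings[u])
    return math.log(m / m_j) if m_j else 0.0
```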

  21. Summary • Filtering is “easy” • The user’s expectation is low • Any recommendation is better than none, making it practically useful • Filtering is “hard” • Must make a binary decision, though ranking is also possible • Data sparseness (limited feedback information) • “Cold start” (little information about users at the beginning)

  22. What you should know • Filtering, retrieval, and browsing are three basic ways of accessing information • How to extend a retrieval system to do adaptive filtering (adding threshold learning) • What is exploration-exploitation tradeoff • Know how memory-based collaborative filtering works
