Adaptive Information Filtering

Lanbo Zhang (ISSDM fellow), Yi Zhang (UCSC advisor), Carla Kuiken (LANL mentor)

Outline
  - Introduction
  - Our Research
    - Interactive Retrieval Based on Faceted Feedback (SIGIR 2010)
    - Discriminative Factored Prior Models for Personalized Content-Based Recommendation (CIKM 2010)
  - Future Work
Adaptive Information Filtering. Lanbo Zhang, Yi Zhang (UCSC advisor), Carla Kuiken (LANL mentor)

Why Filtering?
  - In some cases, users want to persistently track certain kinds of information on the Internet
    - CDC (Centers for Disease Control and Prevention) personnel: news reports about H1N1
    - Physicians: new treatments for a disease
    - FBI investigators: potential terrorist threats
    - Financial analysts: news that may influence a stock
  - For these tasks, search engines that require users to actively issue queries are not enough
  - We need an intelligent system that can PUSH the desired information to us whenever it becomes available!

Adaptive Information Filtering
  - The central task: identify the relevant documents from a document stream

The Cold-Start Problem
  - Filtering performance for new users is usually poor because these users have not yet provided enough training data (user feedback)
  - We follow two directions to handle this problem
    - Explore new user interaction mechanisms that encourage more user feedback
    - Research advanced filtering models that can borrow information from other users for new users

Outline
  - Introduction
  - Our Research
    - Direction 1: A New User Feedback Mechanism (Faceted Feedback)
    - Direction 2: A New Filtering Model (Discriminative Factored Prior Model)
  - Future Work

Semi-Structured Documents
  - Semi-structured documents with metadata are proliferating on the Internet
    - Authors, Topic, Publisher, Created Time, etc.
  - Metadata might be useful for filtering

[Figure: example article from the New York Times, showing human-assigned metadata and algorithm-generated metadata]

Definitions
  - Facet
    - Each metadata field is called a facet
    - E.g., Date, Topic, Location, Author, etc.
  - Facet-Value Pair
    - A metadata field with a specific value is called a facet-value pair
    - E.g., Publisher = New York Times

Faceted Feedback
  - Traditional User Feedback Mechanism
    - Allows users to provide feedback on the relevance of documents
    - Doc1: Relevant
    - Doc2: Non-relevant
  - Faceted Feedback
    - Allows users to provide feedback on facet-value pairs
    - Each facet-value pair represents a constraint on the desired documents
    - Topic = FIFA World Cup: Yes
    - Year = 2010: Yes
    - Year = 2006: No

Why Faceted Feedback
  - Users may have clear ideas about some facets of the target documents
    - “FIFA World Cup”: Year = 2010
  - May encourage user feedback
    - Facet-value pairs are short and easy to understand

Research Questions
  - Question 1: How to select a small number of facet-value pair candidates?
  - Question 2: How to make use of faceted feedback?

Q1: Facet-Value Pair Selection
  - Four approaches to rank facet-value pairs
    - Top Document Frequency (TDF): frequency in the top N ranked documents
    - TDF*IDF (IDF: Inverse Document Frequency)
    - Query Likelihood (QL): P(q|f=v)
    - TDF+QL, where TDF corresponds to P(f=v|q) and QL to P(q|f=v)
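The first two ranking approaches can be illustrated with a minimal Python sketch. It assumes each document is represented as a set of (facet, value) tuples, and it uses a smoothed IDF, log((N + 1) / (df + 1)), which is one common variant and not necessarily the one used in the talk. The function name `rank_facet_values` and its parameters are hypothetical.

```python
import math
from collections import Counter


def rank_facet_values(top_docs, all_docs, k=5, use_idf=True):
    """Rank facet-value pairs by TDF, or by TDF*IDF when use_idf is True.

    top_docs: the top N documents retrieved for the query.
    all_docs: the whole collection (used only for document frequencies).
    Each document is a set of (facet, value) tuples (an assumed representation).
    """
    # TDF: number of top-ranked documents that carry each pair.
    tdf = Counter(pair for doc in top_docs for pair in set(doc))
    # DF: number of documents in the collection that carry each pair.
    df = Counter(pair for doc in all_docs for pair in set(doc))
    n = len(all_docs)

    scores = {}
    for pair, freq in tdf.items():
        # Smoothed IDF (assumption); avoids division by zero for unseen pairs.
        idf = math.log((n + 1) / (df[pair] + 1)) if use_idf else 1.0
        scores[pair] = freq * idf

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


collection = [
    {("Topic", "Sports"), ("Year", "2010")},
    {("Topic", "Sports"), ("Year", "2006")},
    {("Topic", "Politics"), ("Year", "2010")},
    {("Topic", "Politics"), ("Year", "2006")},
]
candidates = rank_facet_values(collection[:2], collection, k=3, use_idf=False)
```

The returned list is what would be shown to the user as a short set of candidates for yes/no feedback.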

Q2: How to Use Faceted Feedback?
  - The commonly used method: the Boolean model
  - Problem with the Boolean model
    - Document metadata is not perfect: it can be inaccurate or incomplete
    - Filtering strictly on such metadata may badly hurt retrieval performance

The Soft Model
  - The basic idea
    - Reward documents that contain user-identified facet-value pairs by adding a certain number of credits
    - The number of credits for each facet is learned on training queries
  - Score(d) = original score + rewards for facet matches
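A minimal Python sketch of the contrast between the Boolean model and the soft model follows. The representation (documents as sets of (facet, value) tuples, feedback as a dict from pair to yes/no) and the names `boolean_keep` and `soft_score` are assumptions for illustration. The credits are hard-coded here, whereas in the talk they are learned on training queries. How negative feedback is used by the soft model is not stated on the slide, so this sketch simply ignores it.

```python
def boolean_keep(doc_pairs, feedback):
    """Boolean model: keep a document only if it matches every constraint."""
    required = {pair for pair, liked in feedback.items() if liked}
    banned = {pair for pair, liked in feedback.items() if not liked}
    return required <= doc_pairs and not (banned & doc_pairs)


def soft_score(original_score, doc_pairs, feedback, credits):
    """Soft model: Score(d) = original score + rewards for facet matches.

    credits maps a facet name to its reward (illustrative values only).
    Pairs the user rejected are ignored in this sketch (assumption).
    """
    score = original_score
    for (facet, value), liked in feedback.items():
        if liked and (facet, value) in doc_pairs:
            score += credits.get(facet, 0.0)
    return score


feedback = {
    ("Topic", "FIFA World Cup"): True,
    ("Year", "2010"): True,
    ("Year", "2006"): False,
}
# A relevant document whose Year metadata is missing.
doc = {("Topic", "FIFA World Cup")}

kept = boolean_keep(doc, feedback)
score = soft_score(1.5, doc, feedback, {"Topic": 2.0, "Year": 1.0})
```

With incomplete metadata the Boolean model drops the document outright, while the soft model still rewards the one facet that matches and leaves the original retrieval score in play.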

Experimental Settings
  - Datasets
    - OHSUMED + queries from the TREC (Text REtrieval Conference) 2000 filtering track: 348,566 medical articles, 63 queries
    - RCV1 + queries from the TREC 2002 filtering track: ~810,000 news articles from Reuters, 50 queries
  - User Study
    - We collected users' faceted feedback on Amazon Mechanical Turk

Chosen Facets
  - OHSUMED: MeSH (Medical Subject Headings)
  - RCV1: Region, Industry, Topic

Experimental Results: Overall Performance of Faceted Feedback
  - Faceted feedback significantly improves retrieval performance

Experimental Results: Boolean Models vs. Soft Model
  - [Figure panels: OHSUMED, RCV1]
  - The Boolean models do not work well and sometimes even hurt, while the soft model always performs well

Outline
  - Introduction
  - Our Research
    - Direction 1: A New User Feedback Mechanism (Faceted Feedback)
    - Direction 2: A New Filtering Model (Discriminative Factored Prior Model)
  - Future Work

Existing Filtering Approaches
  - Two categories
    - Retrieval models + threshold setting methods: Rocchio, BM25, Language Models, etc.
    - Standard machine learning models for binary text classification: Naïve Bayes, logistic regression, SVM, neural networks, etc.
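The first category (a retrieval score plus a delivery threshold) can be sketched in a few lines of Python. This is an illustrative toy, not a system from the talk: documents and the profile are assumed to be sparse term-weight dicts, the threshold is fixed instead of being set adaptively, the Rocchio-style weights `beta` and `gamma` are arbitrary illustrative values, and `judge` is a hypothetical stand-in for user feedback on delivered documents.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_update(profile, doc, relevant, beta=0.75, gamma=0.25):
    """Move the profile toward relevant documents and away from others."""
    step = beta if relevant else -gamma
    updated = dict(profile)
    for term, w in doc.items():
        updated[term] = updated.get(term, 0.0) + step * w
    return updated


def filter_stream(profile, stream, threshold, judge):
    """Deliver documents scoring at or above the threshold, then adapt.

    Feedback is only available for delivered documents, which is what
    makes the filtering task adaptive.
    """
    delivered = []
    for doc in stream:
        if cosine(profile, doc) >= threshold:
            delivered.append(doc)
            profile = rocchio_update(profile, doc, judge(doc))
    return delivered, profile


stream = [{"flu": 1.0, "h1n1": 1.0}, {"stock": 1.0}]
delivered, profile = filter_stream({"flu": 1.0}, stream, 0.5, lambda d: True)
```

After the first delivered document is judged relevant, the profile picks up the term "h1n1", so later documents that mention it score higher.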

Characteristics of User Interests
  - For example
    - User 1: “Sports”, “Technology”
    - User 2: “Sports”, “Politics”, “Shopping”
    - User 3: “Politics”, “Technology”, “Travel”
  - Characteristics
    - A single user may have multiple interests
    - Different users may have overlapping interests
  - Existing filtering approaches do not explicitly capture these characteristics

Discriminative Factored Prior Models (DFPM)
  - [Model equations; the symbols were not preserved in the transcript. The labeled quantities are:]
    - The variance matrix
    - The hidden factor matrix
    - The feature vector of the j-th training document of user m
    - The profile/classifier of user m
    - The label of the j-th training document of user m
    - The hidden vector of user m

Advantages
  - As discriminative models, our models can incorporate any kind of feature
    - Textual features (words)
    - Semantic features (very useful), e.g., Topic = Lung Cancer, Source = Cancer Cause and Control
  - Borrow information from other users when learning profiles for new users
    - All user profiles share a common hidden factor matrix
  - Capture a single user’s multiple interests
    - Each user profile follows a factored prior distribution

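Parameter Estimation
  - Assume the variance matrix is diagonal and all of its diagonal entries are equal to a constant value c1; then
  - [Resulting optimization problem; equation not preserved in the transcript]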

Optimization
  - Use an EM-like iterative algorithm to solve the above optimization problem
    - 1: Initialize
    - 2: Conjugate gradient descent
    - 3: Closed-form solution!
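Since the equations on the model, parameter estimation, and optimization slides did not survive, the following is only one plausible formulation that is consistent with the listed quantities and steps. All notation is assumed: w_m is the profile of user m, z_m the hidden vector of user m, F the hidden factor matrix, Σ the variance matrix, and (x_{m,j}, y_{m,j}) the feature vector and label of the j-th training document of user m. The authors' actual equations may differ.

```latex
% Assumed model: logistic likelihood with a factored Gaussian prior on each profile.
\begin{align*}
P(y_{m,j} \mid x_{m,j}, w_m) &= \sigma\!\left(y_{m,j}\, w_m^{\top} x_{m,j}\right),
  \qquad y_{m,j} \in \{-1, +1\} \\
w_m \mid z_m &\sim \mathcal{N}(F z_m, \Sigma),
  \qquad z_m \sim \mathcal{N}(0, I)
\end{align*}

% With \Sigma = c_1 I, MAP estimation becomes:
\begin{equation*}
\min_{F, \{w_m\}, \{z_m\}} \; \sum_m \left[
  \sum_j \log\!\left(1 + e^{-y_{m,j}\, w_m^{\top} x_{m,j}}\right)
  + \frac{1}{2 c_1} \lVert w_m - F z_m \rVert^2
  + \frac{1}{2} \lVert z_m \rVert^2 \right]
\end{equation*}

% Alternating updates under this assumed objective:
% (a) F, z_m fixed: each w_m subproblem is convex and smooth (e.g. conjugate gradient).
% (b) w_m fixed: the remaining terms are quadratic, so each update is closed form.
\begin{align*}
z_m &= \left(F^{\top} F + c_1 I\right)^{-1} F^{\top} w_m \\
F &= \Big(\sum_m w_m z_m^{\top}\Big) \Big(\sum_m z_m z_m^{\top}\Big)^{-1}
\end{align*}
```

Under this reading, a new user with little feedback is pulled toward F z_m, a combination of factors shared across all users, which is how information is borrowed from other users.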

Experimental Settings
  - Dataset
    - Collected from Digg.com, where users can “digg” news articles that interest them to promote the articles’ rankings
    - 15,162 users, 251 relevant documents per user
  - Details
    - Split: 80% (training), 10% (validation), 10% (test)
    - Features: 35,865 words (TF-IDF scores)
    - Metrics: Precision, Recall, Macro-F1
  - Baselines
    - L2-regularized Logistic Regression (L2LR): learns each user profile separately without borrowing information
    - The standard Bayesian Hierarchical model with Logistic Regression (BHLR): uses a standard prior
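The evaluation metrics can be sketched in Python as follows. The slide does not say how Macro-F1 is aggregated, so this sketch assumes it is the unweighted mean of per-user F1 scores; the function names and the set-based representation of delivered and relevant documents are also assumptions.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall and F1 for one user.

    predicted: set of document ids the filter delivered to the user.
    relevant: set of document ids that are truly relevant to the user.
    """
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_f1(per_user):
    """Unweighted mean of per-user F1 (assumed definition of Macro-F1).

    per_user: iterable of (predicted, relevant) pairs, one per user.
    """
    scores = [precision_recall_f1(pred, rel)[2] for pred, rel in per_user]
    return sum(scores) / len(scores) if scores else 0.0


users = [
    ({1, 2}, {1, 2}),  # every delivered document is relevant
    ({1}, {2}),        # nothing delivered is relevant
]
overall = macro_f1(users)
```

Averaging per user gives every user equal weight regardless of how many documents they have, so poor performance on sparse users is not hidden by heavy users.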

Performance Comparison
  - Our models significantly outperform the baselines

Outline
  - Introduction
  - Our Research
    - A New User Interaction Mechanism (Faceted Feedback)
    - A New Filtering Approach (Discriminative Factored Prior Model)
  - Future Work

Future Work
  - Active learning on facet-value pair selection
    - To maximize learning benefits
  - Integrating multiple types of user feedback
    - Feedback on documents
    - Feedback on facets
    - …

Thanks! Comments & Questions?
