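This study explores personalized active learning for collaborative filtering, a method for making recommendations based on the ratings of users with similar interests. Active learning aims to identify a new user's interests efficiently by choosing which items to ask the user to rate. Existing Bayesian selection assumes the user can rate any requested item; the personalized variant also accounts for the probability that the user can actually provide a rating, which improves accuracy in the realistic (unconstrained) setup and reduces the number of failed rating requests.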
Personalized Active Learning for Collaborative Filtering
SIGIR, 2008
Presented by Abhay S. Harpale, Yiming Yang, Carnegie Mellon University
2009-01-22
Summarized by Jaeseok Myung, Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea
Background
• Collaborative Filtering
  • Make a recommendation for a specific user based on the judgments of users with similar interests
  • Identify the training users who share similar interests with the active user
  • Predict the ratings of the active user as the average of the ratings given by those similar training users
[Figure: rating database and active user]
Center for E-Business Technology
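The two steps above (find similar training users, then average their ratings) can be sketched as follows. This is a minimal illustration, not the method used in the paper: the similarity measure (negative mean absolute difference on co-rated items), the neighbourhood size `k`, and the toy rating matrix are all assumptions made for the example.

```python
import numpy as np

def predict_rating(ratings, active_user, item, k=2):
    """Predict a rating as the mean rating of the k most similar users.

    ratings: 2-D float array (users x items), np.nan marks an unrated item.
    Returns None when no other user with co-rated items has rated `item`.
    """
    active = ratings[active_user]
    scored = []
    for u in range(ratings.shape[0]):
        if u == active_user or np.isnan(ratings[u, item]):
            continue
        both = ~np.isnan(active) & ~np.isnan(ratings[u])
        if both.sum() == 0:
            continue
        # Assumed similarity: users who disagree less on co-rated items are closer.
        sim = -np.mean(np.abs(active[both] - ratings[u, both]))
        scored.append((sim, u))
    if not scored:
        return None
    scored.sort(reverse=True)
    neighbours = [u for _, u in scored[:k]]
    return float(np.mean(ratings[neighbours, item]))

# Toy rating database: user 0 is the active user, item 2 is unrated.
nan = np.nan
toy_ratings = np.array([
    [5.0, 4.0, nan, 1.0],
    [5.0, 5.0, 4.0, 1.0],
    [4.0, 4.0, 5.0, 2.0],
    [1.0, 2.0, 1.0, 5.0],
])
prediction = predict_rating(toy_ratings, active_user=0, item=2, k=2)
```

With this toy data the two nearest users are users 1 and 2, so the prediction is the average of their ratings for item 2.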
Background
• Model-based Collaborative Filtering
  • Differs from memory-based (instance-based) CF
  • Makes the intuitive assumption that users/items can be grouped based on their interests
• Aspect Model
  • A probabilistic latent semantic model in which each user is considered to be a mixture of multiple interests, or aspects
[Figure: aspect model connecting user A to movies through latent aspects z1, z2, ..., zk, with ratings R = {1, 2, ..., 5}; the model is trained on known ratings and then used for predicting]
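To make the "mixture of aspects" idea concrete, here is a small sketch. It assumes one common parameterization of the aspect model, P(r | u, m) = sum over z of P(z | u) P(r | z, m); the slides do not say which variant the paper uses, and all numbers below are invented for illustration.

```python
import numpy as np

def rating_distribution(p_z_given_u, p_r_given_zm):
    """P(r | u, m) = sum_z P(z | u) * P(r | z, m) for one user and one movie."""
    return p_z_given_u @ p_r_given_zm

def aspect_posterior(p_z_given_u, p_r_given_zm, r_index):
    """P(z | u, m, r), proportional to P(z | u) * P(r | z, m)."""
    unnorm = p_z_given_u * p_r_given_zm[:, r_index]
    return unnorm / unnorm.sum()

# Hypothetical user who is 70% aspect z1 and 30% aspect z2.
p_z_given_u = np.array([0.7, 0.3])
# Hypothetical P(r | z, m) for one movie; columns are ratings 1..5.
p_r_given_zm = np.array([
    [0.05, 0.05, 0.10, 0.30, 0.50],  # aspect z1 tends to like the movie
    [0.50, 0.30, 0.10, 0.05, 0.05],  # aspect z2 tends to dislike it
])

dist = rating_distribution(p_z_given_u, p_r_given_zm)
expected_rating = float(dist @ np.arange(1, 6))
# Observing a rating of 5 shifts the user's mixture further towards z1.
posterior = aspect_posterior(p_z_given_u, p_r_given_zm, r_index=4)
```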
Background
• Active Learning
  • Targets situations in which unlabeled data is abundant but labeling is expensive
  • Aims to train classifiers/models using the least amount of labeled training data, because obtaining labels can be very costly or infeasible
  • Has been extensively studied for classification, where the goal is to choose which unlabeled instances should be labeled with their class membership
Problem
• How can a user's interests be identified from only a few rated items?
  • To identify the interests of the user, the system needs to ask the user to rate a few items
  • Given that a user is only willing to rate a few items, which items should the system ask about?
=> Active Learning for CF
Active Learning for CF
• Selective Sampling
  • Ask the user to rate the items that best discriminate between users' interests
• A General Strategy
  • Define a loss function that represents the uncertainty in determining the user's interests
  • Choose the item whose rating will result in the largest reduction in the loss function
• Baseline Approaches
  • Random Selection
  • Entropy-based Selection
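The two baselines can be sketched in a few lines. This is a hedged illustration: the slides do not define entropy-based selection precisely, so the sketch assumes it means "ask about the item whose predicted rating distribution has the highest entropy". The item names and distributions are made up.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def entropy_based_selection(pred_dists, candidates):
    """Assumed baseline: pick the candidate whose predicted rating is most uncertain."""
    return max(candidates, key=lambda m: entropy(pred_dists[m]))

def random_selection(candidates, rng):
    """Baseline: pick a candidate uniformly at random."""
    return candidates[rng.integers(len(candidates))]

# Hypothetical predicted rating distributions over ratings 1..5.
pred_dists = {
    "m1": [0.9, 0.025, 0.025, 0.025, 0.025],  # model is already confident
    "m2": [0.2, 0.2, 0.2, 0.2, 0.2],          # model has no idea
    "m3": [0.4, 0.3, 0.1, 0.1, 0.1],
}
candidates = list(pred_dists)
picked_by_entropy = entropy_based_selection(pred_dists, candidates)
picked_at_random = random_selection(candidates, np.random.default_rng(0))
```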
A Bayesian Approach for Active Learning
• A Bayesian approach outperforms the other approaches
  • Rong Jin, Luo Si, A Bayesian Approach toward Active Learning for Collaborative Filtering, UAI 2004
• This approach identifies the item m whose rating moves the updated model fastest towards the true user model
  • The selection criterion is a negated KL divergence, which is maximized when the estimated distribution equals the true distribution being modeled
  • The estimated distribution is the model posterior after retraining the user model with a newly obtained rating r for movie m from the user, i.e. P(z|u, m, r)
  • Since the true user model is unknown beforehand, it is estimated as the expectation over the posterior distribution of the user model
  • The rating r is unknown for unrated movies, so the expected value is used instead
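The building block of this criterion, the negated KL divergence, can be illustrated in isolation. The sketch below is not the full selection rule from Jin and Si (it omits the expectation over the posterior of the user model); it only shows that the score peaks at 0 when the estimate matches the true distribution, and how an unknown rating can be averaged out. The function names and all numbers are assumptions for the example, and every probability is assumed strictly positive.

```python
import numpy as np

def negated_kl(p_true, p_est):
    """-KL(p_true || p_est); always <= 0, and 0 only when the two match."""
    p_true = np.asarray(p_true, dtype=float)
    p_est = np.asarray(p_est, dtype=float)
    return float(np.sum(p_true * np.log(p_est / p_true)))

def expected_negated_kl(p_true, posteriors_by_rating, rating_probs):
    """Average the score over the unknown rating r, weighted by its probability."""
    return sum(
        p_r * negated_kl(p_true, post)
        for p_r, post in zip(rating_probs, posteriors_by_rating)
    )

# Hypothetical true aspect mixture of a user and two candidate estimates.
true_model = [0.7, 0.2, 0.1]
close_estimate = [0.6, 0.25, 0.15]
uniform_estimate = [1 / 3, 1 / 3, 1 / 3]

score_exact = negated_kl(true_model, true_model)
score_close = negated_kl(true_model, close_estimate)
score_uniform = negated_kl(true_model, uniform_estimate)

# Two possible ratings, each leading to a different hypothetical posterior P(z|u, m, r).
score_expected = expected_negated_kl(
    true_model, [close_estimate, uniform_estimate], rating_probs=[0.5, 0.5]
)
```

An estimate closer to the true model scores higher, which is why an item whose rating pulls the posterior towards the true user model is preferred.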
Personalized Bayesian Selection
• A Common Assumption in Existing Approaches
  • Users can provide a rating for any item that the system requests => unrealistic assumption
  • To rate a movie, a person first has to procure the movie and watch it
• Personalized Bayesian Selection
  • Takes into account the probability of getting a rating on the item m from the user u
[Figure: a rating request either returns a rating (e.g. 5) or fails because the user has not seen the movie]
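The effect of the extra probability term can be shown with a toy sketch. The slides do not state how the probability of getting a rating is combined with the informativeness of an item, so the product used below is purely an assumption, as are the item names and numbers.

```python
def plain_selection(info):
    """Pick the most informative item, ignoring whether the user can rate it."""
    return max(info, key=lambda m: info[m])

def personalized_selection(info, p_rate):
    """Assumed combination: weight informativeness by the chance of getting a rating."""
    return max(info, key=lambda m: p_rate[m] * info[m])

# Hypothetical non-negative informativeness scores for two movies.
info = {"obscure_film": 0.9, "popular_film": 0.6}
# Hypothetical probability that this user has seen, and can rate, each movie.
p_rate = {"obscure_film": 0.1, "popular_film": 0.8}

plain_choice = plain_selection(info)
personalized_choice = personalized_selection(info, p_rate)
```

Here the plain rule asks about the obscure film, which the user has most likely not seen, while the personalized rule prefers the slightly less informative film that the user can probably rate.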
Experimental Setup
• Constrained Setup
• Unconstrained Setup
Evaluation Metrics
• Mean Absolute Error (MAE)
  • Computed over the set of test users and, for each user u, the evaluation set for that user
• # of Failures
  • The system solicits ratings for movies from the user, and the user may not provide ratings for some of them
  • On a failure the system cannot be retrained, so it wastes an active-learning cycle and proceeds to the next iteration
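Both metrics are simple to compute. The sketch below assumes that MAE is averaged first within each user's evaluation set and then across test users; the exact normalization used in the paper is not visible in the slides. The data structures and values are hypothetical.

```python
def mean_absolute_error(true_ratings, predicted):
    """MAE over test users; both arguments map user -> {item: rating}."""
    per_user = []
    for user, items in true_ratings.items():
        errors = [abs(r - predicted[user][m]) for m, r in items.items()]
        per_user.append(sum(errors) / len(errors))
    return sum(per_user) / len(per_user)

def count_failures(requested_items, items_user_can_rate):
    """Number of rating requests the user could not answer."""
    return sum(1 for m in requested_items if m not in items_user_can_rate)

# Hypothetical evaluation sets and predictions for two test users.
true_ratings = {"u1": {"a": 4, "b": 2}, "u2": {"a": 5}}
predicted = {"u1": {"a": 3.5, "b": 3.0}, "u2": {"a": 4.0}}
mae = mean_absolute_error(true_ratings, predicted)

# The system asked about three movies; the user has only seen "a" and "b".
failures = count_failures(["a", "c", "d"], {"a", "b"})
```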
Constrained Setup
• The active-selection set is constrained to items the user has already rated (unrealistic!)
Unconstrained Setup
• Personalized Bayesian Selection outperforms the other approaches
• Bayesian Selection (BS) performs even worse than plain Random Selection (RS)
# of Failures
• The most informative items may not have been rated by the user
• Personalized Bayesian Selection substantially reduces the number of failures
Summary
• Collaborative Filtering
  • To learn the preferences of new users, the system asks them to rate some items
  • Users dislike a system that solicits ratings for movies they may never have watched; such a dialog can be frustrating
• Active Learning for Collaborative Filtering
  • The system tries to understand the user's preferences with the least number of training examples
  • Existing approaches are not realistic
• Personalized Active Learning for Collaborative Filtering
  • Considers the probability of getting a rating from the user
  • Performs well: better accuracy in the unconstrained setup and fewer failures
Paper Evaluation
• Pros
  • Good and clear idea
  • Challenges a common assumption
  • A new evaluation metric: # of failures
• Cons
  • No examples