Advancements in Recommender Systems: A Comprehensive Survey of Methods and Techniques
This paper surveys the state-of-the-art in recommender systems, outlining key methodologies such as content-based, collaborative, and hybrid approaches. It addresses the recommendation problem, focusing on estimating user preferences for unseen items based on prior ratings and other available information. The advantages and disadvantages of each method, including issues like user profiling, overspecialization, and the new user challenge, are discussed. Emphasizing the impact of effective recommender systems, the research highlights their role in enhancing user experience and driving sales in online platforms.
Presentation Transcript
Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005 Applied Algorithm Lab Wooram Heo
Outline • Recommender Systems • Problem statement • Survey of Recommender systems • Content-Based Methods • Collaborative Methods • Hybrid Methods
Recommender Systems • Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences. • Many on-line stores provide recommendations (e.g. Amazon, CDNow). • Recommenders have been shown to substantially increase sales at on-line stores.
Recommender Systems • Examples
Problem statement • Recommendation problem is to estimate ratings for the items that have not been seen by a user • Estimation is usually based on the ratings given by the user to other items and on some other information
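Problem statement • $C$: the set of all users • $S$: the set of all possible items that can be recommended • $u : C \times S \rightarrow R$ is the utility function, where $R$ is a totally ordered set (e.g. nonnegative integers or real numbers within a certain range) • For each user $c \in C$, we want to choose the item $s'_c \in S$ that maximizes the user’s utility: $s'_c = \arg\max_{s \in S} u(c, s)$ • Utility needs to be extrapolated to the whole space $C \times S$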
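The selection step in the problem statement can be illustrated with a minimal Python sketch. It assumes the utility of each item for one user has already been estimated and stored in a plain dictionary; the function name `best_unseen_item` and the sample item names are hypothetical and not part of the surveyed paper.

```python
from typing import Dict, Hashable, Optional, Set


def best_unseen_item(
    utility: Dict[Hashable, float], seen: Set[Hashable]
) -> Optional[Hashable]:
    """Return the unseen item with the highest estimated utility for one user.

    `utility` maps item -> estimated utility u(c, s) for a fixed user c.
    `seen` holds items the user has already rated; they are excluded.
    """
    candidates = {s: u for s, u in utility.items() if s not in seen}
    if not candidates:
        return None  # nothing left to recommend
    # argmax over the remaining items
    return max(candidates, key=candidates.get)


# Hypothetical estimated utilities for one user.
estimated = {"book_a": 4.2, "book_b": 3.1, "book_c": 4.8}
best = best_unseen_item(estimated, seen={"book_c"})
```

In practice the hard part is producing the estimates in `utility`; the argmax itself is trivial once the utility has been extrapolated.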
Recommender System Categories • Content-based recommendations • The user will be recommended items similar to the ones the user preferred in the past • Collaborative recommendations • The user will be recommended items that people with similar tastes and preferences liked in the past • Hybrid approaches • These methods combine collaborative and content-based methods
Content-Based Methods • Recommend items similar to those users preferred in the past • User profiling is the key • E.g. in a movie recommender application, • Specific actors • Directors • Genres • etc
Content-Based Methods • Content-based approach has its roots in information retrieval • Documents, web sites (URLs), and news messages • Designed mostly to recommend text-based items • Content is usually described with keywords
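Content-Based Methods • TF-IDF weight for keyword $k_i$ in document $d_j$ is defined as $w_{i,j} = TF_{i,j} \times IDF_i$, where $TF_{i,j} = f_{i,j} / \max_z f_{z,j}$ and $IDF_i = \log(N / n_i)$ ($f_{i,j}$: number of times $k_i$ appears in $d_j$; $N$: total number of documents; $n_i$: number of documents containing $k_i$) • Content of document $d_j$ is defined as $Content(d_j) = (w_{1,j}, \ldots, w_{k,j})$ • Cosine similarity measure: $u(c, s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\| \, \|\vec{w}_s\|}$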
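As a rough illustration of keyword weighting and matching, the sketch below computes TF-IDF vectors for a few tokenized documents and compares them with cosine similarity. It assumes term frequency normalized by the most frequent keyword in the document and IDF = log(N / n_i); other variants exist. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, list of weight vectors)."""
    n_docs = len(docs)
    vocab = sorted({t for d in docs for t in d})
    # n_i: number of documents that contain keyword i
    doc_freq = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        max_count = max(counts.values()) if counts else 1
        # weight = normalized term frequency * inverse document frequency
        vectors.append(
            [(counts[t] / max_count) * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, vectors


def cosine(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy item descriptions (assumed, for illustration only).
docs = [
    ["space", "alien", "ship"],
    ["space", "ship", "war"],
    ["love", "wedding", "comedy"],
]
vocab, vecs = tf_idf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # shares "space" and "ship"
sim_far = cosine(vecs[0], vecs[2])    # shares no keywords
```

A user profile can be built the same way, for example by averaging the vectors of items the user liked, and then scored against candidate items with `cosine`.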
Disadvantages • Not all content is well represented by keywords • Multimedia data • Items represented by the same set of features are indistinguishable • Overspecialization problem • New user problem • No history available
Collaborative Methods • Use other users’ recommendations (ratings) to judge an item’s utility • Key is to find users/user groups whose interests match those of the current user • More users, more ratings: better results • Can also account for items dissimilar to the ones seen in the past
Collaborative Methods • [Figure: the active user’s ratings are matched by correlation against the rating vectors in the user database, and recommendations are extracted from the best-matching users]
Collaborative Methods • Memory-based algorithms • Value of the unknown rating $r_{c,s}$ for user $c$ and item $s$ is usually computed as an aggregate of the ratings of some other users for the same item: $r_{c,s} = \operatorname{aggr}_{c' \in \hat{C}} r_{c',s}$ • Where $\hat{C}$ denotes the set of $N$ users that are the most similar to user $c$ and who have rated item $s$
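One possible aggregate is a similarity-weighted average over the most similar users who rated the item. The sketch below assumes that choice and takes precomputed pairwise similarities as input; the function name `predict_rating`, the data layout, and the sample users are hypothetical.

```python
def predict_rating(ratings, similarity, user, item, n_neighbors=2):
    """Estimate a missing rating as a similarity-weighted average.

    ratings:    {user: {item: rating}}
    similarity: {(user, other_user): similarity score}, assumed precomputed
    Returns None when no neighbor with nonzero similarity rated the item.
    """
    # Only users other than `user` who actually rated `item` can contribute.
    candidates = [
        (similarity.get((user, other), 0.0), other)
        for other, rated in ratings.items()
        if other != user and item in rated
    ]
    # Keep the n most similar of those users.
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n_neighbors]
    norm = sum(abs(sim) for sim, _ in neighbors)
    if norm == 0:
        return None
    return sum(sim * ratings[other][item] for sim, other in neighbors) / norm


# Toy data (assumed, for illustration only).
ratings = {
    "ann": {"m1": 5},
    "bob": {"m1": 4, "m2": 2},
    "cat": {"m1": 5, "m2": 4},
    "dan": {"m2": 1},
}
similarity = {("ann", "bob"): 0.5, ("ann", "cat"): 1.0, ("ann", "dan"): 0.1}
pred = predict_rating(ratings, similarity, "ann", "m2")
```

Here the two nearest neighbors of "ann" who rated "m2" are "cat" and "bob", so the estimate is (1.0 × 4 + 0.5 × 2) / 1.5. A simple unweighted average, or a weighted sum of deviations from each neighbor's mean rating, are other common aggregates.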
Collaborative Methods • Similarity between two users • Pearson correlation coefficient • Cosine similarity
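Both measures can be sketched in a few lines of Python. This version assumes each user's ratings are stored as a dictionary and that both measures are computed only over co-rated items, with Pearson means also taken over those items; implementations differ on these details. The function names are hypothetical.

```python
import math


def co_rated(rx, ry):
    """Items rated by both users, in a stable order."""
    return sorted(set(rx) & set(ry))


def pearson_sim(rx, ry):
    """Pearson correlation of two users' ratings over co-rated items."""
    items = co_rated(rx, ry)
    if len(items) < 2:
        return 0.0  # correlation is undefined with fewer than two points
    mean_x = sum(rx[s] for s in items) / len(items)
    mean_y = sum(ry[s] for s in items) / len(items)
    num = sum((rx[s] - mean_x) * (ry[s] - mean_y) for s in items)
    den = math.sqrt(sum((rx[s] - mean_x) ** 2 for s in items)) * math.sqrt(
        sum((ry[s] - mean_y) ** 2 for s in items)
    )
    return num / den if den else 0.0


def cosine_sim(rx, ry):
    """Cosine of the angle between two users' co-rated rating vectors."""
    items = co_rated(rx, ry)
    num = sum(rx[s] * ry[s] for s in items)
    den = math.sqrt(sum(rx[s] ** 2 for s in items)) * math.sqrt(
        sum(ry[s] ** 2 for s in items)
    )
    return num / den if den else 0.0


# Toy rating vectors (assumed, for illustration only).
x = {"a": 1, "b": 2, "c": 3}
y = {"a": 2, "b": 4, "c": 6, "d": 1}   # same taste as x, different scale
z = {"a": 3, "b": 2, "c": 1}           # opposite ordering to x
```

Pearson correlation subtracts each user's mean, so it distinguishes `x` from `z` sharply (about +1 versus about -1 against `y` and `z`), whereas cosine similarity on raw nonnegative ratings stays positive for both.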
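Collaborative Methods • Model-based algorithm • The unknown rating is estimated as an expected value, $r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'}, s' \in S_c)$, where rating values are integers between 0 and $n$ and $S_c$ is the set of items already rated by user $c$ • Cluster models and Bayesian networks are used to estimate this probability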
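Once a model has produced a probability for each possible rating value, a common way to turn that distribution into a single prediction is to take its expected value. The sketch below shows only that final step and assumes the probabilities come from some already-trained model; the function name and the sample distribution are hypothetical.

```python
def expected_rating(prob_by_value):
    """Expected rating given {rating value: probability}.

    The probabilities are renormalized, so they need not sum exactly to 1.
    """
    total = sum(prob_by_value.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return sum(value * p for value, p in prob_by_value.items()) / total


# Hypothetical model output for one (user, item) pair on a 1-5 scale.
distribution = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2}
estimate = expected_rating(distribution)
```

The modeling effort in cluster models or Bayesian networks goes into estimating `distribution` from the user's known ratings; the aggregation above is the same regardless of which model supplies it.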
Collaborative Methods • Model-based approaches use various machine learning techniques • K-means clustering • Gibbs sampling • Bayesian model • Probabilistic relational model • Linear regression • Maximum entropy model • Markov decision process • Probabilistic latent semantic analysis • Latent Dirichlet allocation • etc
Disadvantages • Finding similar users/user groups is not easy • New user problem: No preferences available • New item problem: No ratings available • Sparsity problem