Item-Based Collaborative Filtering Recommendation Algorithms

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. GroupLens Research Group / Army HPC Research Center, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 2001. Presented Nov. 5, 2008.

Presentation Transcript


  1. Item-Based Collaborative Filtering Recommendation Algorithms
     Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl
     GroupLens Research Group / Army HPC Research Center
     Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 2001
     Presented by Eun-gyeong Kim, IDS Lab., Nov. 5, 2008

  2. Contents
     • Introduction
     • Collaborative Filtering Based Recommender Systems
     • Overview of the Collaborative Filtering Process
     • Challenges of User-based Collaborative Filtering Algorithms
     • Item-based Collaborative Filtering Algorithm
     • Item Similarity Computation
     • Prediction Computation
     • Performance Implications
     • Experimental Evaluation
     • Contributions
     • Discussion & Conclusion
     Center for E-Business Technology

  3. Introduction (What Is Collaborative Filtering?)
     • Now it is time to create technologies that help us sift through all the available information to find what is most valuable to us.
     • One of the most promising such technologies is collaborative filtering (CF).
     • Collaborative filtering (definition from Wikipedia):
        • The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
     • The underlying assumption of the CF approach is that those who agreed in the past tend to agree again in the future.
     • CF systems usually take two steps:
        • Step 1: Look for users who share the same rating patterns with the active user.
        • Step 2: Use the ratings from the like-minded users found in step 1 to calculate a prediction for the active user.
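The two steps can be sketched in Python as follows. This is a minimal sketch, not the presenter's or the paper's code: it assumes Pearson correlation as the user similarity, keeps only positively correlated neighbors, and predicts with a similarity-weighted average of the neighbors' mean-centered ratings (one common user-based formulation). The functions `pearson`, `mean`, and `predict_user_based` and the `k` parameter are hypothetical.

```python
from math import sqrt

# Assumed data layout (hypothetical): user -> {item: rating}


def pearson(ratings, a, b):
    """Pearson correlation between users a and b over the items both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings[a][i] for i in common) / len(common)
    mean_b = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((ratings[b][i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_user_based(ratings, active, item, k=20):
    """Predict the active user's rating of `item` from like-minded users."""
    # Step 1: the k most similar users who rated the item and correlate positively.
    candidates = [(pearson(ratings, active, u), u) for u in ratings
                  if u != active and item in ratings[u]]
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    base = mean(ratings[active].values())
    if not neighbors:
        return base  # no like-minded users: fall back to the active user's mean
    # Step 2: similarity-weighted average of each neighbor's deviation from their own mean.
    num = sum(s * (ratings[u][item] - mean(ratings[u].values())) for s, u in neighbors)
    den = sum(s for s, _ in neighbors)
    return base + num / den
```

With toy ratings where alice rated a, b, c as 5, 3, 4 and bob rated a, b, c, d as 4, 2, 3, 4, bob's pattern matches alice's exactly (correlation 1.0), so alice's predicted rating for d is her mean (4) plus bob's deviation on d (4 − 3.25), i.e. 4.75.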

  4. Two Main Categories of CF Algorithms
     • Memory-based CF algorithms
        • Utilize the entire user-item database to generate a prediction
        • Employ statistical techniques to find the neighbors
     • Model-based CF algorithms
        • First develop a model of user ratings
        • Compute the expected value of a user's prediction, given his/her ratings on other items
        • To build the model:
           • Bayesian networks (probabilistic)
           • Clustering (classification)
           • Rule-based approaches (association rules between co-purchased items)

  5. Recommendation Algorithms
     • User-based collaborative filtering
        • Traditional collaborative filtering
        • Cluster models
     • Item-based collaborative filtering
        • Search-based methods
        • Item-to-item collaborative filtering
     • Source: Amazon.com Recommendations: Item-to-Item Collaborative Filtering, http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf

  6. CF-Based Recommender Systems
     • Provide item recommendations or predictions based on the opinions of other like-minded users

  7. Traditional Collaborative Filtering (1)
     • Represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items
     • For almost all customers, this vector is extremely sparse
     • Generates recommendations based on a few customers (neighbors) who are most similar to the user
     • Measures the similarity of two customers, A and B, commonly as the cosine of the angle between their vectors: similarity(A, B) = cos(A, B) = (A · B) / (‖A‖ ‖B‖)
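A minimal sketch of this customer-to-customer step, assuming each customer vector is stored sparsely as a dict from item to value (for purchases, 1 per bought item). The helper names `cosine` and `nearest_customers` are hypothetical; the point is that only items present in both vectors contribute to the dot product, which is why sparse vectors keep this comparison cheap.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of the angle between two sparse customer vectors (dicts: item -> value)."""
    dot = sum(v * b[item] for item, v in a.items() if item in b)  # shared items only
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_customers(customers, target, k):
    """Return the k customers most similar to `target`, most similar first."""
    scored = [(cosine(customers[target], vec), name)
              for name, vec in customers.items() if name != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

For example, a customer who bought book1 and book2 has cosine 2/√6 ≈ 0.82 with a customer who bought book1, book2, and book3, and cosine 0 with a customer who bought only book4.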

  8. Traditional Collaborative Filtering (2)
     • Generating recommendations
        • A common technique is to rank each item according to how many similar customers purchased it
        • O(MN) in the worst case, where M is the number of customers and N is the number of catalog items
        • In practice, performance tends to be closer to O(M + N), because the average customer vector is extremely sparse
     • Scaling issues
        • Reduce the data size:
           • Reduce M by randomly sampling the customers or discarding customers with few purchases
           • Reduce N by discarding very popular or unpopular items
        • These reductions, however, degrade recommendation quality
     • We need better algorithms that scale to large data sets and, at the same time, produce high-quality recommendations
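The ranking technique described above (count how many similar customers purchased each item) can be sketched as below. This assumes the neighbors have already been found and that items the target customer already owns should not be recommended; `rank_items_by_neighbors` is a hypothetical helper.

```python
from collections import Counter


def rank_items_by_neighbors(purchases, target, neighbors):
    """Rank items the target has not bought by how many neighbors purchased them.

    purchases: customer -> set of purchased items (assumed layout)
    """
    owned = purchases[target]
    counts = Counter(item for n in neighbors for item in purchases[n] if item not in owned)
    # most_common() returns (item, count) pairs ordered from the highest count down
    return [item for item, _ in counts.most_common()]
```

For instance, if a customer bought only b1 and both neighbors bought b2 while only one bought b3, the ranking is b2 then b3.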

  9. Challenges of User-based CF Algorithms
     • Sparsity
        • A person may have purchased well under 1% of the items (1% of 2 million books is 20,000 books)
        • As a result, the accuracy of recommendations may be poor
     • Scalability
        • Computation grows with both the number of users and the number of items
        • Traditional CF does little or no offline computation, and its online computation scales with the number of customers and catalog items
     => The key to item-to-item CF's scalability and performance is that it creates the expensive similar-items table offline

  10. Item-based CF Algorithm
     • Similarity computation between two items i and j
        • First, isolate the users who have rated both of these items
        • Then, apply a similarity computation technique to determine the similarity s_{i,j}
     • Prediction generation
        • Take a weighted average of the target user's ratings on the items most similar to the target item

  11. Item Similarity Computation
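The formulas on these slides did not survive extraction. For reference, the three item-similarity measures compared in the paper can be written as follows, where U is the set of users who rated both items i and j, R_{u,i} is user u's rating of item i, \bar{R}_i is the average rating of item i, \bar{R}_u is the average of user u's ratings, and \vec{i}, \vec{j} are the items' rating vectors in user space. Adjusted cosine differs from correlation only in subtracting the user's mean instead of the item's mean, which compensates for users who rate on different scales.

```latex
\mathrm{sim}_{\cos}(i,j) = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert_2\,\lVert\vec{j}\rVert_2}

\mathrm{sim}_{\mathrm{corr}}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_i)(R_{u,j}-\bar{R}_j)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_i)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_j)^2}}

\mathrm{sim}_{\mathrm{adj}}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_u)(R_{u,j}-\bar{R}_u)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_u)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_u)^2}}
```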

  12. Item Similarity Computation
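A minimal Python sketch of the isolation step followed by the three similarity measures. Assumptions: ratings are stored as user -> {item: rating}; all three measures are computed over the co-rated users only (for plain cosine the paper treats items as full vectors with missing ratings as 0, so restricting it to co-raters is a simplification here); item means are taken over co-raters, user means over all of a user's ratings. The names `co_raters`, `normalized_dot`, `item_similarities`, and the toy `RATINGS` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
RATINGS = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 3, "i2": 3, "i3": 4},
    "u3": {"i1": 4, "i2": 5},
}


def co_raters(ratings, i, j):
    """Isolation step: the users who rated both item i and item j."""
    return [u for u, r in ratings.items() if i in r and j in r]


def normalized_dot(xs, ys):
    """Dot product of two equal-length vectors divided by the product of their norms."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return num / den if den else 0.0


def item_similarities(ratings, i, j):
    """Cosine, correlation, and adjusted-cosine similarity of items i and j over co-raters."""
    users = co_raters(ratings, i, j)
    if not users:
        return {"cosine": 0.0, "correlation": 0.0, "adjusted_cosine": 0.0}
    ri = [ratings[u][i] for u in users]
    rj = [ratings[u][j] for u in users]
    mean_i, mean_j = sum(ri) / len(ri), sum(rj) / len(rj)  # item means over co-raters
    user_mean = {u: sum(ratings[u].values()) / len(ratings[u]) for u in users}
    return {
        # plain cosine of the raw co-rated rating vectors
        "cosine": normalized_dot(ri, rj),
        # Pearson correlation: subtract each item's mean rating
        "correlation": normalized_dot([x - mean_i for x in ri], [y - mean_j for y in rj]),
        # adjusted cosine: subtract each user's mean rating instead
        "adjusted_cosine": normalized_dot([ratings[u][i] - user_mean[u] for u in users],
                                          [ratings[u][j] - user_mean[u] for u in users]),
    }
```

On the toy data, items i1 and i2 are rated by all three users with vectors (5, 3, 4) and (4, 3, 5), giving a cosine of 49/50 = 0.98 but a correlation of only 0.5, which illustrates how much the choice of measure can change the similarity value.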

  13. Prediction Computation
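The formula on this slide did not survive extraction. The paper's weighted-sum prediction of user u's rating on item i has the form below, where N ranges over the items similar to i that u has rated and s_{i,N} is the similarity between items i and N; dividing by the sum of absolute similarities keeps the prediction within the rating scale.

```latex
P_{u,i} = \frac{\sum_{N} s_{i,N}\, R_{u,N}}{\sum_{N} \lvert s_{i,N} \rvert}
```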

  14. Prediction Computation
     • Weighted sum
        • Computes the sum of the ratings given by the user on the items similar to i
        • Each rating is weighted by the corresponding similarity, and the sum is normalized by the sum of the absolute similarity values
     • Regression
        • Similarities computed using cosine or correlation measures may be misleading: two rating vectors may be far apart (in the Euclidean sense) yet still have very high similarity
        • Instead of using the similar item N's "raw" rating values, approximated values based on a linear regression model are used

  15. Weighted Sum Example
     • Let's predict user u4's rating of item i1
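The rating matrix from this example slide did not survive extraction, so the sketch below uses hypothetical values (not the slide's data) to show the weighted-sum computation. `weighted_sum_prediction`, `u4_ratings`, and `sim_to_i1` are hypothetical names; similar items that u4 has not rated (i4 here) are simply skipped.

```python
def weighted_sum_prediction(user_ratings, similarities):
    """Weighted sum: sum(s_{i,N} * R_{u,N}) / sum(|s_{i,N}|) over similar items N the user rated."""
    rated = [n for n in similarities if n in user_ratings]
    den = sum(abs(similarities[n]) for n in rated)
    if den == 0:
        return None  # no overlap between the similar items and the user's ratings
    return sum(similarities[n] * user_ratings[n] for n in rated) / den


# Hypothetical values: u4's ratings, and each item's similarity to the target item i1.
u4_ratings = {"i2": 4, "i3": 2, "i5": 5}
sim_to_i1 = {"i2": 0.9, "i3": 0.5, "i4": 0.3}
print(weighted_sum_prediction(u4_ratings, sim_to_i1))
```

Here only i2 and i3 are both similar to i1 and rated by u4, so the prediction is (0.9 · 4 + 0.5 · 2) / (0.9 + 0.5) = 4.6 / 1.4 ≈ 3.29.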

  16. Item-to-item CF in Amazon.com
     • We could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair
     • However, many product pairs have no common customers, so this approach is inefficient in terms of processing time and memory usage
     • A better approach calculates the similarity between a single product and all related products (those that share at least one customer with it)
        • O(N²M) in the worst case
        • Closer to O(NM) in practice, because most customers have very few purchases
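A minimal sketch of the "single product against all related products" approach: for each item I1, walk the customers who bought it, collect every item I2 those customers also bought, and compute a similarity only for those related pairs. It assumes binary purchase data and uses cosine similarity between the items' customer sets, |A ∩ B| / (√|A| √|B|); `similar_items_table` is a hypothetical helper, not Amazon's implementation.

```python
from collections import defaultdict
from math import sqrt


def similar_items_table(purchases):
    """purchases: customer -> set of items. Returns item -> {related item: cosine similarity}."""
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    table = {}
    for i1, customers in buyers.items():      # for each item I1 in the catalog
        related = set()
        for c in customers:                   # for each customer C who purchased I1
            related.update(purchases[c])      # record every item I2 that C also bought
        related.discard(i1)
        table[i1] = {                         # similarity only for pairs with shared customers
            i2: len(customers & buyers[i2]) / (sqrt(len(customers)) * sqrt(len(buyers[i2])))
            for i2 in related
        }
    return table
```

Pairs with no common customer are never visited, which is exactly the saving the slide describes: an item bought by nobody else's customers ends up with an empty entry instead of a full row of zeros.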

  17. Performance Implications
     • Precompute item-item similarity scores
        • In a typical e-commerce scenario, the set of items is static compared to the set of users, which changes most often
        • Compute all-to-all similarities, then perform a quick table look-up to retrieve the required similarity values
     • Generating predictions for a user u on item i
        • Retrieve the precomputed k most similar items for the target item i
        • Intersect those k items with the items purchased by user u
        • Compute the prediction using the basic item-based CF algorithm
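A minimal sketch of the precomputed model and the online prediction step, assuming the full item-item similarities are already available as item -> {other item: similarity}. `build_model` keeps only the k most similar items per item (the model size), and `predict` intersects them with the user's ratings before applying the weighted sum; both names are hypothetical.

```python
import heapq


def build_model(similarity, k):
    """Offline: keep only the k most similar items for each item (the 'model size')."""
    return {
        item: dict(heapq.nlargest(k, neighbors.items(), key=lambda kv: kv[1]))
        for item, neighbors in similarity.items()
    }


def predict(model, user_ratings, target):
    """Online: intersect the target's k neighbors with the user's rated items, then weighted sum."""
    neighbors = model.get(target, {})
    common = [n for n in neighbors if n in user_ratings]
    den = sum(abs(neighbors[n]) for n in common)
    if not den:
        return None  # the user rated none of the retained neighbors
    return sum(neighbors[n] * user_ratings[n] for n in common) / den
```

Because the model stores only k entries per item, the online work is a dictionary look-up plus at most k multiplications, independent of the number of users. The trade-off is that a rating on an item outside the retained neighbors (i4 in the test with k = 2) no longer contributes to the prediction.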

  18. Experimental Evaluation: Data Set
     • Movie data from MovieLens
        • 943 users (out of 43,000 users)
        • 1,682 movies (out of over 3,500 different movies)
        • 100,000 ratings (only users who had rated 20 or more movies were considered)
     • The database was divided into a training set and a test set
        • x = 0.8 (80% of the data is used as the training set)
     • Sparsity level: 1 − (nonzero entries / total entries) = 1 − 100,000 / (943 × 1,682) ≈ 0.937
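A minimal sketch of the data preparation on this slide, assuming the ratings are a list of (user, item, rating) triples split uniformly at random with training fraction x. `train_test_split` and `sparsity_level` are hypothetical helpers (not scikit-learn's function of the same name), and the fixed seed is only there to make the split reproducible.

```python
import random


def train_test_split(ratings, x=0.8, seed=0):
    """Randomly put a fraction x of the (user, item, rating) triples into the training set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * x)
    return shuffled[:cut], shuffled[cut:]


def sparsity_level(n_ratings, n_users, n_items):
    """1 - (nonzero entries / total entries) of the user-item matrix."""
    return 1 - n_ratings / (n_users * n_items)
```

For the MovieLens subset, `sparsity_level(100_000, 943, 1682)` evaluates 1 − 100,000 / 1,586,126, roughly 0.937: about 94% of the user-item matrix is empty.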

  19. Experimental Evaluation: Evaluation Metrics
     • Statistical accuracy metrics
        • Mean Absolute Error (MAE) measures the deviation of predictions from the true user-specified ratings
        • The lower the MAE, the more accurately the recommendation engine predicts user ratings
     • Decision support accuracy metrics
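MAE over N rating-prediction pairs is (1/N) Σ |p_i − q_i|, where p_i is the predicted rating and q_i the actual rating. A minimal sketch, with `mean_absolute_error` as a hypothetical helper name:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)
```

For predictions (4, 3, 5) against actual ratings (5, 3, 4), the absolute errors are 1, 0, and 1, so the MAE is 2/3.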

  20. Experimental Results (1)
     • Effect of similarity algorithms

  21. Experimental Results (2)
     • Sensitivity of the training/test ratio
     • Experiments with neighborhood size

  22. Experimental Results (3)
     • Quality experiments

  23. Sensitivity of the Model Size
     • High accuracy can be achieved using only a fraction of the items (i.e., keeping only the most similar items for each item)
     • It is therefore useful to precompute the item similarities using only a fraction of the items, while still obtaining good prediction quality
     [Figure labels: 100%, 96%, 98.3%]

  24. Impact of the Model Size on Run-Time and Throughput

  25. Contributions
     • Analysis of item-based prediction algorithms and identification of different ways to implement their subtasks
     • Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations
     • An experimental comparison of the quality of several different item-based algorithms against the classic user-based (nearest-neighbor) algorithms

  26. Discussion & Conclusion
     • Discussion
        • The item-item scheme provides better prediction quality than the user-user scheme
        • The item neighborhood is fairly static, so it can be precomputed, which yields very high online performance
        • It is possible to retain only a small subset of items and still produce reasonably good prediction quality
     • Conclusion
        • Item-based techniques allow CF-based algorithms to scale to large data sets and, at the same time, produce high-quality recommendations

  27. My Comments
     • Lack of explanation of the recommendation process
     • Does the calculated similarity really represent the similarity of items?
     • Lack of explanation of the range of similarity values
     • Can't we precompute the similarity of users?

  28. References
     • Amazon.com Recommendations: Item-to-Item Collaborative Filtering. http://www.win.tue.nl/~laroyo/2L340/resources/Amazon-Recommendations.pdf
     • Item-Based Collaborative Filtering Recommendation Algorithms. http://www.grouplens.org/papers/pdf/www10_sarwar.pdf
