Top-N Recommendation Algorithm Based on Item-Graph Allen, Zhenjiang LIN CSE, CUHK Nov 13, 2007
Outline • 1. Top-N Recommendation Problem • 2. Top-N Recommendation Algorithm • 3. Item-Graph Model and GCP-based Method • Item-Graph Model • Generalized Conditional Probability (GCP)-based Recommendation Algorithm • 4. Preliminary Experimental Results • 5. Conclusion and Future Work
1. Top-N Recommendation Problem • The Top-N Recommendation Problem • Given the preference information of users, recommend to a particular user a set of N items that he might be interested in, based on the items he has already selected. • E-commerce example: Amazon.com, with customers as users and products as items. [Figure: user-item matrix with the active user's basket]
[Figure: Example from Amazon.com, showing the active user's basket and the resulting recommendations]
1. Top-N Recommendation Problem • Challenges in E-commerce Systems • Huge amounts of data: millions of users and/or items; • Results must be returned in real time; • Little preference information is available for new users; • Users' preferences change over time.
2. Top-N Recommendation Algorithm • Two major approaches • Content-based: recommend items based on the content (textual information) of the items. • Fab system [Balabanovic97], Syskill & Webert system [Pazzani97]. • Collaborative Filtering (CF): recommend items by collecting taste information from other users. • Uses collaborative (correlation) information between users. • More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items. • Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
2. Top-N Recommendation Algorithm • CF algorithms classified by how they use the data • Memory-based: make recommendations based on the entire collection of the users' preferences. • No pre-computation is needed, but they suffer from serious scalability problems. • E.g., correlation-based [Resnick94], cosine-based [Breese98]. • Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations. • The model is built off-line, which is more scalable. • E.g., cluster models [Ungar98], Bayesian network model [Breese98], association rule mining approach [Lin00].
2. Top-N Recommendation Algorithm • CF algorithms classified by the objects they focus on • User-centric: first look for similar (like-minded) users, then make recommendations. • Similarity between users is relatively dynamic. • Pre-computing user neighborhoods may lead to poor predictions. • Item-centric: first look for similar (or related) items, then make recommendations. • Similarity between items is relatively static. • Enables pre-computing of item-item similarities. • More scalable.
2. Top-N Recommendation Algorithm • Notation • Item set $I = \{I_1, I_2, \dots, I_m\}$. • User set $U = \{U_1, U_2, \dots, U_n\}$. • Binary user-item matrix $D = (D_{u,i})_{n \times m}$, where $D_{u,i} = 1$ if user $U_u$ has selected item $I_i$ and $0$ otherwise. • Basket of the active user: $B \subseteq I$. • Similarity score of $x$ and $y$: $\mathrm{sim}(x, y)$. • Formal definition of the top-N recommendation problem • Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.
2. Top-N Recommendation Algorithm • Two classical item-item similarity measures • Cosine-based (symmetric): $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$ (1), where $D_{*,i}$ is the $i$-th column of $D$. • Conditional probability (CP)-based (asymmetric): $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(\{I_i, I_j\}) / \mathrm{Freq}(\{I_i\})$ (2), where $\mathrm{Freq}(X)$ is the number of users who have purchased the item set $X$. • The ranking score for item $x$: $RS(x) = \sum_{b \in B} \mathrm{sim}(b, x)$ (3), i.e., the sum of the similarity scores between $x$ and the items in the basket $B$.
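Equations (1) to (3) can be sketched in Python with NumPy as follows. This is a minimal illustration, assuming the binary user-item matrix D is held as a dense n × m array with items indexed 0..m-1; the function names `cosine_sim`, `cp_sim` and `top_n` are hypothetical, and ranking ties are broken by item index (an assumption, since the slide does not specify tie-breaking).

```python
import numpy as np

def cosine_sim(D: np.ndarray) -> np.ndarray:
    """Eq. (1): S[i, j] = cos(D[:, i], D[:, j]) for a binary user-item matrix D (n x m)."""
    D = D.astype(float)
    norms = np.linalg.norm(D, axis=0)         # length of each item column
    norms[norms == 0] = 1.0                   # items nobody bought get similarity 0, not NaN
    return (D.T @ D) / np.outer(norms, norms)

def cp_sim(D: np.ndarray) -> np.ndarray:
    """Eq. (2): S[i, j] = P(I_j | I_i) ~ Freq({I_i, I_j}) / Freq({I_i}); asymmetric."""
    D = D.astype(float)
    co = D.T @ D                              # co[i, j] = Freq({I_i, I_j}); co[i, i] = Freq({I_i})
    freq = np.diag(co).copy()
    freq[freq == 0] = 1.0
    return co / freq[:, None]                 # divide row i by Freq({I_i})

def top_n(S: np.ndarray, basket: list, N: int) -> list:
    """Eq. (3): RS(x) = sum_{b in B} S[b, x]; return at most N items not in B, best first."""
    rs = S[basket, :].sum(axis=0)
    rs[basket] = -np.inf                      # enforce X ∩ B = ∅
    order = np.argsort(-rs, kind="stable")    # stable sort: ties keep item-index order
    return [int(x) for x in order[:N] if np.isfinite(rs[x])]

# Toy example: 4 users x 4 items (not real data).
D = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])
print(top_n(cp_sim(D), [0], 2))      # [1, 2]
print(top_n(cosine_sim(D), [0], 2))  # [1, 3]
```

On this toy matrix the two measures agree on the first recommendation but not the second: CP gives items 2 and 3 the same score (each co-occurs once with item 0), while cosine penalizes the more popular item 2 through its larger column norm.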
4. Preliminary Experimental Results • Dataset • MovieLens (http://www.grouplens.org/data) • A web-based movie recommender system; • Contains multi-valued ratings that indicate how much each user liked a particular movie; • Each user has rated at least 20 movies. • We treat the ratings only as an indication of whether a user has seen a movie (nonzero) or not (zero). • Table 1: The characteristics of the MovieLens dataset. ¹Density: the percentage of nonzero entries in the user-item matrix.
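A minimal sketch of the preprocessing described above, assuming the MovieLens ratings have already been loaded into a dense NumPy array `R` with 0 for unrated movies (loading the files is not shown); `binarize` and `density_percent` are hypothetical helper names.

```python
import numpy as np

def binarize(R: np.ndarray) -> np.ndarray:
    """Treat any nonzero rating as 'seen' (1) and a zero entry as 'not seen' (0)."""
    return (R != 0).astype(np.int8)

def density_percent(D: np.ndarray) -> float:
    """Density: percentage of nonzero entries in the user-item matrix."""
    return 100.0 * np.count_nonzero(D) / D.size

R = np.array([[5, 0, 3, 0],
              [0, 4, 0, 0]])     # toy ratings, not MovieLens data
D = binarize(R)
print(D.tolist())                # [[1, 0, 1, 0], [0, 1, 0, 0]]
print(density_percent(D))        # 37.5
```

Binarizing discards how much a user liked a movie and keeps only whether it was seen, which is exactly the information the binary matrix D in the notation requires.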
4. Preliminary Experimental Results-1 • Evaluation Design • Split the dataset into training and test sets by • randomly selecting one rated movie of each user to be part of the test set, • using the remaining rated movies for training. • Compare the cosine (COS)-based, CP-based and GCP-based methods; results are averaged over 10 runs. • Evaluation Metrics • Hit-Rate (HR): $HR = \#\text{hits} / n$ (6) • Average Reciprocal Hit-Rate (ARHR): $ARHR = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$ (7) • #hits: the number of items in the test set that also appear in the corresponding top-N lists; $n$ is the number of users. $h$ is the number of hits, which occurred at positions $p_1, p_2, \dots, p_h$ within the top-N lists ($1 \le p_i \le N$).
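The split and the two metrics can be sketched as below. This assumes each user's rated movies are stored as a set of integer item IDs keyed by user, and that every user has at least one rated movie (MovieLens guarantees 20); `leave_one_out` and `hr_arhr` are hypothetical names, and the top-N lists are assumed to come from some recommender that is not shown. Because exactly one movie per user is held out, n in (6) and (7) is the number of users.

```python
import random

def leave_one_out(baskets, seed=0):
    """Hold out one randomly chosen rated item per user; the remaining items form the training data."""
    rng = random.Random(seed)
    train, held_out_items = {}, {}
    for user, items in baskets.items():
        held_out = rng.choice(sorted(items))   # sorted() makes the choice reproducible for a seed
        held_out_items[user] = held_out
        train[user] = set(items) - {held_out}
    return train, held_out_items

def hr_arhr(top_n_lists, held_out_items):
    """Eq. (6) HR = #hits / n and eq. (7) ARHR = (sum of 1/p_i over hits) / n, n = number of users."""
    n = len(held_out_items)
    hits, reciprocal = 0, 0.0
    for user, held_out in held_out_items.items():
        ranked = top_n_lists.get(user, [])
        if held_out in ranked:
            hits += 1
            reciprocal += 1.0 / (ranked.index(held_out) + 1)   # positions p_i are 1-based
    return hits / n, reciprocal / n

# Toy check: users 0 and 1 are hits at positions 1 and 2, user 2 is a miss.
held = {0: 5, 1: 7, 2: 9}
lists = {0: [5, 1], 1: [2, 7], 2: [1, 2]}
print(hr_arhr(lists, held))   # HR = 2/3, ARHR = (1 + 1/2) / 3 = 0.5
```

ARHR rewards a hit at position 1 with a full point and a hit at position N with only 1/N, so it separates methods that HR alone would score as equal.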
4. Preliminary Experimental Results-1 • Performance of Top-N Recommendation Algorithms [Figure: left, HR (x-axis: number of top-N items; y-axis: hit-rate over all users); right, ARHR (x-axis: number of top-N items; y-axis: average reciprocal hit-rate over all users). For the GCP-based method, d = 2.]
4. Preliminary Experimental Results-2 • Testing the Parameter d in the GCP Method • Testing the effect of d (d = 1, 2, 3). • Evaluation: Online Shopping Simulation • Randomly select part of the user records to be the training set; • use the remaining user records for testing. • STEP 0: Construct the item-graph from the training set; • STEP 1: For each user in the test set: • randomly remove one item from the user's basket and make recommendations based on the remaining items in the basket; • compute the rank of this item in the recommendation list; • update the item-graph. • STEP 2: Compute the HR and ARHR metrics.
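The simulation protocol can be sketched as follows. Because the item-graph and the GCP scoring rule are not spelled out on this slide, the sketch substitutes a hypothetical `CoOccurrenceModel` that keeps item and pair counts and scores candidates with the CP measure of eqs. (2) and (3); only the loop structure (STEP 0 to STEP 2) mirrors the slide. It also assumes that the users simulated in STEP 1 are those not used to build the initial model, that `train_fraction` is a free parameter, that items are integer IDs, and that only items with a positive score are recommended.

```python
import random
from collections import defaultdict
from itertools import combinations

class CoOccurrenceModel:
    """Hypothetical stand-in for the item-graph: counts Freq({i}) and Freq({i, j}).
    Scores use the CP measure of eqs. (2)-(3), NOT the GCP method."""

    def __init__(self):
        self.freq = defaultdict(int)   # Freq({i})
        self.pair = defaultdict(int)   # Freq({i, j}), keyed by the sorted pair (i, j)

    def add_basket(self, basket):
        for i in basket:
            self.freq[i] += 1
        for i, j in combinations(sorted(basket), 2):
            self.pair[(i, j)] += 1

    def recommend(self, basket, N):
        candidates = [x for x in self.freq if x not in basket]
        scores = {}
        for x in candidates:
            # RS(x) = sum over b in B of P(x | b) ~ Freq({b, x}) / Freq({b})
            s = sum(self.pair.get((min(b, x), max(b, x)), 0) / self.freq[b]
                    for b in basket if self.freq.get(b, 0) > 0)
            if s > 0:                  # assumption: never recommend zero-score items
                scores[x] = s
        return sorted(scores, key=lambda x: (-scores[x], x))[:N]

def simulate(baskets, train_fraction=0.5, N=10, seed=0):
    """Online shopping simulation; returns (HR, ARHR) over the simulated users."""
    rng = random.Random(seed)
    users = sorted(baskets)
    rng.shuffle(users)
    cut = int(len(users) * train_fraction)
    train_users, sim_users = users[:cut], users[cut:]

    model = CoOccurrenceModel()
    for u in train_users:                       # STEP 0: build the model from the training users
        model.add_basket(baskets[u])

    hits, reciprocal = 0, 0.0
    for u in sim_users:                         # STEP 1: simulate each remaining user
        items = sorted(baskets[u])
        held_out = rng.choice(items)            # randomly remove one item from the basket
        ranked = model.recommend(set(items) - {held_out}, N)
        if held_out in ranked:                  # 1-based rank of the removed item
            hits += 1
            reciprocal += 1.0 / (ranked.index(held_out) + 1)
        model.add_basket(baskets[u])            # update the model with the full basket

    n = len(sim_users)                          # STEP 2: HR and ARHR
    return (hits / n, reciprocal / n) if n else (0.0, 0.0)

print(simulate({0: {1, 2}, 1: {1, 2}, 2: {1, 2}, 3: {1, 2}}, train_fraction=0.5, N=1))  # (1.0, 1.0)
```

Updating the model after each simulated user is what makes the evaluation online: users processed later are scored against a model that already includes the baskets of earlier ones.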
4. Preliminary Experimental Results-2 • Performance of Top-N Recommendation Algorithms [Figure: left, HR (x-axis: number of top-N items; y-axis: hit-rate over all users); right, ARHR (x-axis: number of top-N items; y-axis: average reciprocal hit-rate over all users).]
5. Conclusion and Future Work • Conclusion • Top-N recommendation problem and item-centric algorithms • Cosine-based and conditional probability-based • Item-Graph model • Visualizes the relationships among items. • Easy to update. • Generalized Conditional Probability (GCP)-based top-N recommendation algorithm • Item-centric and based on the Item-Graph model • Future Work • Clustering items and measuring item-item similarities based on the Item-Graph model • Speeding up the GCP method.
References • [Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997. • [Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998. • [Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004. • [Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the degree of M.S. in Computer Science. • [Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003. • [Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.