
Yoda: An Accurate and Scalable Web-based Recommendation System

Cyrus Shahabi, Farnoush Banaei-Kashani, Yi-Shin Chen, and Dennis McLeod. Integrated Media Systems Center and Computer Science Department, University of Southern California. E-mail: {shahabi, banaeika, yishinc, mcleod}@usc.edu


Presentation Transcript



  2. Outline
     • Motivation
     • Related Work
       • Content-based Filtering
       • Collaborative Filtering
     • Offline Process: Clustering, Voting, Aggregation
     • Online Process: Classification & Aggregation
     • Performance Evaluation
     • Conclusion & Future Work

  3. Motivation
     • The amount of data on the Web is enormous
     • Users suffer from information overload
     • Recommendation systems can personalize and customize the Web environment in real time
       • Similar to Amazon.com's "real-time" recommendations ("people who bought this book also purchased …")
       • A different approach from association-rule mining
     • Challenges:
       • Scalability: as the numbers of items and users grow, the system must stay efficient
       • Sparsity: not enough information is available about each user

  4. Related Work: Content-Based Filtering
     • Originates in the Information Retrieval community [Maes 1994] [Shardanand and Maes 1995] [Balabanović and Shoham 1997]
     • Based on a comparison between the feature vectors of items in the database (e.g., artist, style) and the user's interest list
     • Major weaknesses [Balabanović and Shoham 1997]:
       • Content limitation: can be applied to only a few kinds of content, and captures only certain aspects of that content
       • Over-specialization: users can only obtain information that matches the content of their profiles
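A minimal sketch of the comparison described above, assuming that both items and the user's interest list are sparse feature-weight vectors compared with cosine similarity. The slide names no similarity function; cosine is simply a common information-retrieval choice, and `cosine`, `recommend_by_content`, and the example features are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors (feature -> weight)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_by_content(profile: dict[str, float],
                         items: dict[str, dict[str, float]],
                         top_n: int = 3) -> list[str]:
    """Rank items by how closely their feature vectors match the user's interest profile."""
    ranked = sorted(items, key=lambda i: (-cosine(profile, items[i]), i))
    return ranked[:top_n]


# Hypothetical example: features such as genre or instrument, weighted in [0, 1].
profile = {"rock": 1.0, "guitar": 0.5}
items = {
    "a": {"rock": 1.0, "guitar": 1.0},
    "b": {"classical": 1.0, "piano": 1.0},
    "c": {"rock": 0.2, "rap": 1.0},
}
print(recommend_by_content(profile, items, top_n=2))  # ['a', 'c']
```

Items that share no features with the profile, such as `b`, always score zero, which illustrates the over-specialization weakness listed above.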

  5. Related Work: Collaborative Filtering (CF)
     • Employs a user's item evaluations (not the actual content) to find other, similar users: nearest-neighbor algorithm [Resnick et al. 1994]
     • Three major weaknesses:
       • Scalability: time complexity O(U*I) (U: number of users; I: number of items)
         • Addressed by: clustering [Breese et al. 2000]; Bayesian network [Kitts et al. 2000]
       • Sparsity: the profile matrix is sparse (i.e., each user has evaluated only a few items)
         • Addressed by: SVD [Sarwar et al. 2000]
       • Synonymy: latent associations between items are not considered
         • Addressed by: content analysis [Balabanović and Shoham 1997]; categorization [Kohrs and Merialdo 2000]
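A minimal user-based nearest-neighbor sketch in the spirit of GroupLens [Resnick et al. 1994], assuming Pearson correlation as the user-user similarity and a mean-centered weighted average as the predictor. The dictionary layout and the names `pearson` and `predict` are hypothetical. Every prediction scans all other users and their rated items, which is where the O(U*I) cost comes from.

```python
import math

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over the items both users have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (den_a * den_b)


def predict(ratings: Ratings, active: str, item: str, k: int = 2) -> float:
    """Predict the active user's rating for `item` from the k most similar users."""
    target = ratings[active]
    mean_active = sum(target.values()) / len(target)
    # Full scan over every other user: the source of the O(U*I) cost.
    neighbors = []
    for user, r in ratings.items():
        if user == active or item not in r:
            continue
        w = pearson(target, r)
        if w > 0.0:  # keep positively correlated users only
            neighbors.append((w, user))
    neighbors.sort(reverse=True)
    neighbors = neighbors[:k]
    if not neighbors:
        return mean_active
    # Mean-centered, similarity-weighted average of the neighbors' ratings.
    num = sum(w * (ratings[u][item] - sum(ratings[u].values()) / len(ratings[u]))
              for w, u in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    return mean_active + num / den


ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 5, "i2": 3, "i3": 4, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}
print(round(predict(ratings, "alice", "i4"), 2))  # 4.75
```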

  6. Offline Process
     [Figure: user navigation behaviors (User 1 … User U) are grouped into clusters with the PPED similarity measure; voting over the item database yields each cluster's favorite property values (PVs), e.g., Rock = High, Classical = Low, Pop = Low, Rap = High; fuzzy aggregation then produces a ranked cluster wish-list (values 0.87, 0.83, 0.72, 0.61, 0.47 in the example).]

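  7. Voting Mechanism
     Property values shown: Rock = High, Classical = Low, Pop = Mid, Rap = High, Blues = Low
     Votes $C_{p,f}(k)$ for cluster k:
                     High   Mid   Low
       Rock            51    22    10
       Classical        7    15    61
       Blues           21    25    37
     Favorite value of each property p: the fuzzy term f ∈ F = {High, Mid, Low} attaining $M_p = \max_{f \in F} \{ C_{p,f}(k) \}$
     Favorite PVs: Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low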
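The voting step can be sketched as follows, assuming each item associated with cluster k (for example, items visited by the cluster's users) casts one vote per property for its fuzzy term, and that ties go to the earlier term in F. The names `favorite_pvs` and `FUZZY_TERMS` are hypothetical. The example rebuilds the slide's counts, which total 83 votes for each property.

```python
from collections import Counter

FUZZY_TERMS = ("High", "Mid", "Low")  # F; ties resolved in this order (an assumption)


def favorite_pvs(cluster_items: list[dict[str, str]]) -> dict[str, str]:
    """Count C_{p,f}(k) and keep, for each property p, the term with the most votes."""
    votes: dict[str, Counter] = {}
    for item in cluster_items:
        for prop, term in item.items():
            votes.setdefault(prop, Counter())[term] += 1
    # max over f in F of C_{p,f}(k); Counter returns 0 for terms with no votes
    return {prop: max(FUZZY_TERMS, key=lambda f: counts[f]) for prop, counts in votes.items()}


# Rebuild the vote counts shown on the slide (83 votes per property).
rock = ["High"] * 51 + ["Mid"] * 22 + ["Low"] * 10
classical = ["High"] * 7 + ["Mid"] * 15 + ["Low"] * 61
blues = ["High"] * 21 + ["Mid"] * 25 + ["Low"] * 37
items = [{"Rock": r, "Classical": c, "Blues": b} for r, c, b in zip(rock, classical, blues)]
print(favorite_pvs(items))  # {'Rock': 'High', 'Classical': 'Low', 'Blues': 'Low'}
```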

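  8. Fuzzy Aggregation and Ranking Items
     [Figure, labeled "Locality Sensitive Hashing algorithm": each item in the item database (e.g., Rock = High, Classical = Low, Pop = Mid, Rap = Mid, Blues = Low) is compared with the cluster's favorite PVs $F_p(k)$ (Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low); fuzzy aggregation gives each item a value $v_k(i)$, and ranking the items by this value yields the cluster wish-list (values from 0.87 down to 0.42).]
     $v_k(i) = f_{\max}\{ (\text{High} * \text{High}), (\text{Mid} * \text{Low}), (\text{Low} * \text{Low}), \ldots \}$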
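A sketch of building a cluster wish-list, assuming that each fuzzy term maps to a numeric membership degree, that `*` is the product of the two degrees, and that f_max is a plain maximum over properties. The degrees in `MEMBERSHIP`, and the names `item_score` and `cluster_wish_list`, are hypothetical; item `x` uses the property values from the slide.

```python
# Hypothetical membership degrees for the fuzzy terms; the slides give no numbers.
MEMBERSHIP = {"High": 0.9, "Mid": 0.5, "Low": 0.1}


def item_score(item_pvs: dict[str, str], favorite: dict[str, str]) -> float:
    """v_k(i): max over properties of (item term * favorite term), reading '*' as
    the product of membership degrees and f_max as max (both assumptions)."""
    return max(
        (MEMBERSHIP[item_pvs[p]] * MEMBERSHIP[f] for p, f in favorite.items() if p in item_pvs),
        default=0.0,
    )


def cluster_wish_list(items: dict[str, dict[str, str]],
                      favorite: dict[str, str]) -> list[tuple[str, float]]:
    """Score every item against the cluster's favorite PVs and rank best-first."""
    scored = [(i, item_score(pvs, favorite)) for i, pvs in items.items()]
    return sorted(scored, key=lambda t: (-t[1], t[0]))


favorite = {"Rock": "High", "Classical": "Low", "Pop": "Low", "Rap": "High", "Blues": "Low"}
items = {
    "x": {"Rock": "High", "Classical": "Low", "Pop": "Mid", "Rap": "Mid", "Blues": "Low"},
    "y": {"Rock": "Low", "Classical": "High", "Pop": "Low", "Rap": "Low", "Blues": "Mid"},
    "z": {"Rock": "Mid", "Classical": "Low", "Pop": "Low", "Rap": "Mid", "Blues": "Low"},
}
print([i for i, _ in cluster_wish_list(items, favorite)])  # ['x', 'z', 'y']
```

Under this reading an item scores well when it is strong in a property the cluster favors as High; matching a Low favorite contributes little, since (Low*Low) is small.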

  9. Optimized Equation
     [Figure: the item's property values (Rock = High, Classical = Low, Pop = Mid, Rap = Mid, Blues = Low) and the favorite PVs (Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low) combined by fuzzy aggregation, $f_{\max}\{ (\text{High} * \text{High}), (\text{Low} * \text{Mid}), \ldots \}$, using $M_{high}(k)$.]
     • Why optimize: evaluating the original equation takes O(#P*I) time (#P: number of properties; I: number of items)
     • Intuition: the $v_k(i)$ value comes from the maximum value among …
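The $M_{high}(k)$ notation and the truncated intuition suggest grouping properties by their favorite term. As a hedged derivation, not the authors' optimized equation, assume $v_k(i)$ is a max-product combination, where $\mu$ maps a fuzzy term to a nonnegative membership degree and $PV_p(i)$ is item i's value for property p (hypothetical notation). Because the sets $M_f(k)$ partition P and $\mu(f) \ge 0$, the maximum can be regrouped so that only |F| terms are combined per item, taking the maximum over an empty group as 0.

```latex
% Assumed form: v_k(i) = \max_{p \in P} \mu(PV_p(i)) \cdot \mu(F_p(k)), with \mu \ge 0.
% Grouping: M_f(k) = \{ p \in P : F_p(k) = f \}, e.g. M_{high}(k) for f = High.
\begin{align*}
v_k(i) &= \max_{p \in P} \mu\big(PV_p(i)\big)\,\mu\big(F_p(k)\big) \\
       &= \max_{f \in F} \; \max_{p \in M_f(k)} \mu\big(PV_p(i)\big)\,\mu(f) \\
       &= \max_{f \in F} \; \mu(f) \cdot \max_{p \in M_f(k)} \mu\big(PV_p(i)\big)
\end{align*}
```

The second line only regroups the properties by favorite term; the third pulls the nonnegative constant $\mu(f)$ out of the inner maximum.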

  10. Optimized Equation
     • Time complexity: O(f*I) (I: number of items; f: number of fuzzy terms)
     • Satisfies a triangular-norm (t-norm) form
     • Time complexity can be further reduced to O(N) (N: a constant) with Fagin's A0 algorithm [Fagin 1996]
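A minimal sketch of Fagin's A0 algorithm [Fagin 1996], often called Fagin's Algorithm (FA), for top-k retrieval under a monotone aggregation function such as min or max. It assumes every sorted list covers all objects and supports random access by object id; the list layout and the name `fagin_a0` are hypothetical, and the transcript does not say how Yoda maps its fuzzy terms onto such lists.

```python
from typing import Callable, Sequence


def fagin_a0(lists: Sequence[Sequence[tuple[str, float]]],
             agg: Callable[[list[float]], float],
             k: int) -> list[tuple[str, float]]:
    """Top-k objects under a monotone aggregation `agg` (e.g. min, max).

    Each list holds (object, grade) pairs sorted by grade, best first,
    and every list covers the same objects."""
    m = len(lists)
    grade = [dict(lst) for lst in lists]      # random access: object -> grade
    seen: dict[str, set[int]] = {}            # object -> lists it was seen in
    depth, n = 0, min(len(lst) for lst in lists)
    # Phase 1: sorted access to all lists in parallel until k objects
    # have been seen in every list.
    while depth < n:
        for j, lst in enumerate(lists):
            seen.setdefault(lst[depth][0], set()).add(j)
        depth += 1
        if sum(1 for s in seen.values() if len(s) == m) >= k:
            break
    # Phase 2: random access to fetch every grade of every seen object.
    scored = [(obj, agg([grade[j][obj] for j in range(m)])) for obj in seen]
    # Phase 3: return the k best aggregate grades.
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:k]


grades = {"a": (0.9, 0.2), "b": (0.8, 0.9), "c": (0.7, 0.8), "d": (0.3, 0.95), "e": (0.1, 0.1)}
lists = [sorted(((o, g[j]) for o, g in grades.items()), key=lambda t: -t[1]) for j in range(2)]
print(fagin_a0(lists, min, k=2))  # [('b', 0.8), ('c', 0.7)]
```

Stopping early is safe because of monotonicity: once k objects have been seen in every list, any unseen object has a grade no higher than theirs in each list, so its aggregate cannot beat theirs. Here object `e` is never touched.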

  11. Online Process
     [Figure: the current user's navigation behavior is compared with each cluster using the PPED similarity measure, giving a list of similarity values; fuzzy aggregation combines the cluster wish-lists with these similarity values to produce the user wish-list.]
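A sketch of the online aggregation, assuming the PPED similarity between the current session and each cluster has already been computed (it is not implemented here) and that the cluster wish-lists are merged by taking, for each item, the maximum of similarity times the item's cluster value. The name `user_wish_list` is hypothetical. The nested loop touches each item of each of the K wish-lists once, matching the original O(K*I) cost given for this phase.

```python
def user_wish_list(similarities: dict[str, float],
                   cluster_lists: dict[str, dict[str, float]],
                   top_n: int = 10) -> list[tuple[str, float]]:
    """Merge cluster wish-lists for the active user.

    similarities: cluster -> similarity to the current session (computed elsewhere).
    cluster_lists: cluster -> {item: v_k(i)}.
    score(i) = max over clusters of similarity * v_k(i)  (assumed max-product form).
    """
    scores: dict[str, float] = {}
    for cluster, sim in similarities.items():                    # K clusters ...
        for item, v in cluster_lists.get(cluster, {}).items():   # ... times I items
            s = sim * v
            if s > scores.get(item, 0.0):
                scores[item] = s
    return sorted(scores.items(), key=lambda t: (-t[1], t[0]))[:top_n]


similarities = {"c1": 0.9, "c2": 0.5}
cluster_lists = {"c1": {"a": 0.87, "b": 0.83}, "c2": {"b": 0.95, "c": 0.72}}
print([i for i, _ in user_wish_list(similarities, cluster_lists)])  # ['a', 'b', 'c']
```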

  12. Optimized Method
     • Original time complexity: O(K*I) (K: number of clusters; I: number of items)
     • Time complexity of the optimized method: O(f*I) (f: number of fuzzy terms)
     • Time complexity can be further reduced to O(N) (N: a constant) with Fagin's A0 algorithm [Fagin 1996]

  13. Experimental Methodology
     [Figure: synthetic data generation from clusters, each cluster's favorite PVs, a ranking of items in each cluster, the item database, a cluster-user similarity matrix, the user set, and user navigation behaviors.]
     • Generate the item database by assigning property values to items: Item-PV = f(Cluster-PV, noise), where noise ~ item rank
     • Generate the user set

  14. Experimental Methodology
     [Figure: the item database, user set, clusters, cluster-user similarity matrix, ranking of items in clusters, and cluster favorite PVs are combined to produce user navigation behaviors, shown as sequences of evaluation labels (H, M, L, F, NF, …).]
     • Assign evaluation values to items: Item-Rating = f(Cluster-Ranking, weight), where weight ~ user-cluster similarities
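One hypothetical reading of the two generation rules, written as a sketch. For Item-PV = f(Cluster-PV, noise) with noise ~ item rank, the item at rank r keeps each cluster PV with probability 1 - r/I and otherwise receives a random fuzzy term. For Item-Rating = f(Cluster-Ranking, weight) with weight ~ user-cluster similarity, the rating is a similarity-weighted average of 1 - rank/I over the clusters. The slides do not give f, so both formulas and the names `generate_item_pvs` and `generate_rating` are assumptions.

```python
import random

TERMS = ("High", "Mid", "Low")


def generate_item_pvs(cluster_pvs: dict[str, str], num_items: int,
                      rng: random.Random) -> list[dict[str, str]]:
    """Hypothetical Item-PV = f(Cluster-PV, noise) with noise growing with item rank."""
    items = []
    for rank in range(num_items):
        noise = rank / num_items  # 0 for the top-ranked item, approaching 1 for the last
        items.append({
            prop: rng.choice(TERMS) if rng.random() < noise else term
            for prop, term in cluster_pvs.items()
        })
    return items


def generate_rating(ranks: dict[str, int], sims: dict[str, float], num_items: int) -> float:
    """Hypothetical Item-Rating = f(Cluster-Ranking, weight): similarity-weighted
    average of (1 - rank / num_items) over the clusters that rank the item."""
    total = sum(sims.get(c, 0.0) for c in ranks)
    if total == 0.0:
        return 0.0
    return sum(sims.get(c, 0.0) * (1 - r / num_items) for c, r in ranks.items()) / total


cluster_pvs = {"Rock": "High", "Classical": "Low"}
print(generate_item_pvs(cluster_pvs, 3, random.Random(0))[0])  # {'Rock': 'High', 'Classical': 'Low'}
print(generate_rating({"c1": 0, "c2": 5}, {"c1": 0.5, "c2": 0.5}, 10))  # 0.75
```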

  15. Experimental Methodology
     [Figure: the generated user navigation behaviors are split into training and testing sets; for the current session of a test user, the system produces a recommendation.]

  16. Accuracy Comparison
     [Figure: harmonic mean (axis 0 to 1.1) and improvement (axis 0 to 0.45) versus number of items (1000 and 5000), for the Nearest Neighbor Method and Yoda, with the improvement plotted separately.]
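Assuming the "Harmonic Mean" axis refers to the harmonic mean of precision and recall over the recommended items (the transcript names only the harmonic mean), the metric can be sketched as below; `precision_recall` and `harmonic_mean` are hypothetical names.

```python
def precision_recall(recommended: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a recommendation list against the items the user actually chose."""
    hits = sum(1 for i in recommended if i in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 measure); 0 when both are 0."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 2 of 4 recommendations are relevant, and 2 of 3 relevant items were found.
p, r = precision_recall(["a", "b", "c", "d"], {"a", "c", "x"})
print(p, r, harmonic_mean(p, r))
```

The harmonic mean stays low unless both precision and recall are reasonably high, which is why it is a common single-number summary for recommendation accuracy.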

  17. Processing Time Comparison
     [Figure: CPU time (milliseconds/user, 0 to 2500) versus number of users (0 to 4500) for Yoda and BNN.]
     • BNN: Basic Nearest Neighbor method
     • Processing time = CPU + I/O
     • BNN process: #Items = 5000; #Users = 1000
     • Yoda process: #Items in each cluster wish-list = 250; #Clusters = 18
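As a rough back-of-the-envelope check, assume the O(U*I) cost stated for nearest-neighbor CF and the O(K*I) cost stated for Yoda's online phase, with I taken as the number of items each method scans (the 250-item cluster wish-lists for Yoda). The per-user comparison counts under the parameters above are then as follows. This ignores the PPED similarity computation, I/O, and constant factors, so it indicates relative scale only, not measured times.

```latex
\text{BNN: } U \cdot I = 1000 \times 5000 = 5{,}000{,}000
\qquad
\text{Yoda: } K \cdot I_{\text{wish}} = 18 \times 250 = 4{,}500
\qquad
\frac{5{,}000{,}000}{4{,}500} \approx 1111
```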

  18. Conclusion
     • Yoda scales as the numbers of users and items grow
     • Yoda achieves higher accuracy than the nearest-neighbor method
     Future Work
     • Compare with other techniques
     • Run more experiments with real data
     • Incorporate the content-based filtering mechanism into the user clustering and classification phases
     • Incorporate user profiles

  19. References
     • [Shardanand and Maes 1995] U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth'," Proceedings of the Conference on Human Factors in Computing Systems, Denver, CO, USA, pp. 210-217, May 1995.
     • [Maes 1994] P. Maes, "Agents that Reduce Work and Information Overload," Communications of the ACM, 37(7), pp. 30-40, 1994.
     • [Balabanović and Shoham 1997] M. Balabanović and Y. Shoham, "Fab: Content-Based, Collaborative Recommendation," Communications of the ACM, 40(3), pp. 66-72, 1997.
     • [Resnick et al. 1994] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proceedings of the ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC, USA, pp. 175-186, 1994.
     • [Sarwar et al. 2000] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Application of Dimensionality Reduction in Recommender System - A Case Study," ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
     • [Kohrs and Merialdo 2000] A. Kohrs and B. Merialdo, "Using Category-Based Collaborative Filtering in the Active WebMuseum," Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, pp. 351-354, 2000.

  20. References
     • [Kitts et al. 2000] B. Kitts, D. Freed, and M. Vrieze, "Cross-Sell: A Fast Promotion-Tunable Customer-Item Recommendation Method Based on Conditionally Independent Probabilities," Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pp. 437-446, August 2000.
     • [Breese et al. 2000] J. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, pp. 43-52, July 1998.
     • C. Shahabi, A.M. Zarkesh, J. Adibi, and V. Shah, "Knowledge Discovery from Users Web-Page Navigation," Proceedings of the IEEE RIDE '97 Workshop, April 1997.
     • C. Shahabi, F. Banaei-Kashani, J. Faruque, and A. Faisal, "Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining," EC-Web 2001, Germany, September 2001.
     • [Fagin 1996] R. Fagin, "Combining Fuzzy Information from Multiple Systems," Proceedings of the Fifteenth ACM Symposium on Principles of Database Systems, Montreal, pp. 216-226, 1996.
     • C. Shahabi and Y. Chen, "A Unified Framework to Incorporate Soft Query into Image Retrieval Systems," International Conference on Enterprise Information Systems, Setubal, Portugal, July 2001.
