Large-scale Recommendations in a Dynamic Marketplace

Large-scale Recommendations in a Dynamic Marketplace. Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh. Meet John Doe. John is interested in an item, “iPhone 5 64gb white”. Should we recommend “iPhone 5 case” or “iPhone 5s gold”? Recommendation on e-marketplace.

Presentation Transcript


  1. Large-scale Recommendations in a Dynamic Marketplace • Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh • LSRS 2013

  2. Meet John Doe • John is interested in an item: “iPhone 5 64gb white”. Should we recommend • “iPhone 5 case” (or) • “iPhone 5s gold”?

  3. Recommendation on e-marketplace • Recommendation “before” purchase: Similar Item Recommendation (SIR) • e.g. iPhone 5S gold • Recommendation “after” purchase: Related Item Recommendation (RIR) • e.g. iPhone 5 case

  4. SIR Example 1

  5. SIR Example 2

  6. Related Item Recommendation • Recommendations for Xbox 360 4GB on Checkout page

  7. Main Idea • Similar Item Clustering (SIC) • Titles • Attributes (Price, etc.) • Images • Recommendation • SIR: same cluster • RIR: neighbor clusters
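
The main idea on this slide can be illustrated with a minimal sketch: similar items come from the seed item's own cluster, and related items come from neighboring clusters. The item names, cluster labels, and function names below are hypothetical toy data for illustration only; the talk does not show how the lookups are actually implemented.

```python
from collections import defaultdict

# Hypothetical toy data: item -> cluster label, and cluster -> neighbor clusters.
ITEM_CLUSTER = {
    "iphone 5 64gb white": "iphone 5",
    "iphone 5 32gb black": "iphone 5",
    "iphone 5 leather case": "iphone 5 case",
}
RELATED_CLUSTERS = {"iphone 5": ["iphone 5 case"]}


def build_cluster_index(item_cluster):
    """Invert the item -> cluster mapping into cluster -> list of items."""
    index = defaultdict(list)
    for item, cluster in item_cluster.items():
        index[cluster].append(item)
    return dict(index)


def similar_to(item, item_cluster, cluster_index):
    """SIR sketch: other items in the same cluster as the seed item."""
    cluster = item_cluster[item]
    return [other for other in cluster_index.get(cluster, []) if other != item]


def related_to(item, item_cluster, cluster_index, related_clusters):
    """RIR sketch: items drawn from clusters that neighbor the seed's cluster."""
    cluster = item_cluster[item]
    results = []
    for neighbor in related_clusters.get(cluster, []):
        results.extend(cluster_index.get(neighbor, []))
    return results


CLUSTER_INDEX = build_cluster_index(ITEM_CLUSTER)
```

With this toy data, `similar_to("iphone 5 64gb white", ITEM_CLUSTER, CLUSTER_INDEX)` yields the other iPhone 5 listing, while `related_to(...)` with `RELATED_CLUSTERS` yields the case.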

  8. Models • Item clusters • Cluster represented by meaningful keywords • “clarks women shoe pumps classics” • “authentic handmade amish quilt” • Cluster-Cluster Relations • “samsung galaxy s4” – “samsung galaxy s4 screen protector” • “wolfgang puck electric pressure cooker” – “kitchenaid food processor”

  9. System Architecture - Overview • Offline Model Generation: Clusters Model Generation, Related Clusters Model Generation • Inputs: Inventory, Clickstream, Transactions, Conceptual Knowledgebase • The Data Store: Clusters, Cluster-Cluster Relations • Real-time Performance System • Similar Items Recommender (SIR): ?similarTo(item) for a Lost Item returns Similar Items • Related Items Recommender (RIR): ?relatedTo(item) for a Bought Item returns Related Items

  10. Cluster Generation (offline)

  11. Data on eBay • Item-item co-occurrences on transaction logs • Large Data • Much bigger data set in both users and inventory than other ecommerce sites. • Scale • More than 300M listings. • More than 10M new items every day

  12. Challenges • Global clustering not feasible • Size bias on different categories • Performance

  13. Model Generation - Clusters • Select a few keywords to represent “big notions”, e.g. iPhone, Handbags, etc. • How to select? • Clustering by K-means • How to set K?

  14. Model Generation - Clusters • Problem: Global clustering not feasible • Solution: Partition input data by user queries • Parallel distributed K-Means in Hadoop MapReduce • Dedupe and merge overlapping clusters (100X reduction in size over inventory with over 90% coverage) • [Diagram: Clusters Model Generation. Inputs: Inventory (items), Conceptual Knowledgebase (concepts, categories), Clickstream (user queries). Stages: Query-Recall Generation, query-to-items, Cluster Generation, new clusters written to Clusters in the Data Store]
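
The "dedupe and merge overlapping clusters" step could look something like the sketch below. It assumes clusters are sets of item ids and that overlap is measured by Jaccard similarity against a threshold; both the measure and the `threshold` value are assumptions made for illustration, since the slide does not say how overlap is defined.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_overlapping(clusters, threshold=0.5):
    """Greedily merge clusters whose item overlap reaches the threshold.

    `clusters` is an iterable of item-id collections. Exact duplicates have
    Jaccard similarity 1.0, so they are deduplicated by the same rule.
    """
    merged = []
    for items in clusters:
        items = set(items)
        absorbed = True
        while absorbed:
            absorbed = False
            for existing in merged:
                if jaccard(items, existing) >= threshold:
                    # Absorb the existing cluster and re-check the grown set.
                    items |= existing
                    merged.remove(existing)
                    absorbed = True
                    break
        merged.append(items)
    return merged
```

For example, `merge_overlapping([{1, 2, 3}, {2, 3, 4}, {9, 10}])` folds the first two clusters together and leaves the third alone. A production version running over MapReduce partitions would need a different structure; this only shows the merge rule on in-memory sets.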

  15. Base Cluster Generation • Base Cluster ≡ Query • Find merge candidates based on query term overlap • E.g. “nike air max tennis shoes” -> “nike air max” • Score candidates using cosine similarity • Term weight: TF-IDF in the query space (document = query) • TF: Query Demand • IDF: Number of Queries
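
One way to read the scoring on this slide is sketched below. It treats each query as a document, takes a term's TF as the summed demand of the queries that contain it, and takes IDF from the number of queries containing the term. That reading, the smoothed `log(1 + N/df)` form, and the demand figures are all assumptions made for illustration; the slide does not give the exact formulas.

```python
import math
from collections import defaultdict


def term_weights(query_demand):
    """TF-IDF weights in the query space.

    `query_demand` maps a query string to its demand count (hypothetical).
    TF is the summed demand of queries containing the term; IDF is
    log(1 + N / df), where df is the number of queries containing the term.
    """
    n_queries = len(query_demand)
    tf = defaultdict(float)
    df = defaultdict(int)
    for query, demand in query_demand.items():
        for term in set(query.split()):
            tf[term] += demand
            df[term] += 1
    return {t: tf[t] * math.log(1 + n_queries / df[t]) for t in tf}


def cosine(query_a, query_b, weights):
    """Cosine similarity between two queries under the given term weights."""
    terms_a, terms_b = set(query_a.split()), set(query_b.split())
    dot = sum(weights.get(t, 0.0) ** 2 for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(weights.get(t, 0.0) ** 2 for t in terms_a))
    norm_b = math.sqrt(sum(weights.get(t, 0.0) ** 2 for t in terms_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_candidates(query, query_demand):
    """Other queries that share at least one term with `query`."""
    terms = set(query.split())
    return [q for q in query_demand if q != query and terms & set(q.split())]


# Hypothetical demand counts, for illustration only.
DEMAND = {"nike air max tennis shoes": 120, "nike air max": 900, "amish quilt": 300}
WEIGHTS = term_weights(DEMAND)
```

Here `merge_candidates("nike air max", DEMAND)` proposes the longer tennis-shoes query, and `cosine` then scores how closely the two queries match; a merge threshold on that score would be a further tuning choice.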

  16. Step 1: Base cluster candidates • Method for choosing the “base clusters” (initial states): • Minimum frequency • Supply threshold (enough inventory) • Min and max token constraint (length of queries) • Heuristic constraints • Queries that have only numbers are not allowed: “10 5” • … • Merge similar clusters into one
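
The selection criteria listed on this slide can be sketched as a simple filter. All threshold values and parameter names below are hypothetical placeholders; the slide names the criteria but not their values, and the elided heuristics ("…") are not filled in here.

```python
def is_base_cluster_candidate(
    query,
    frequency,
    supply,
    min_frequency=10,  # hypothetical minimum query frequency
    min_supply=20,     # hypothetical minimum number of matching listings
    min_tokens=1,      # hypothetical lower bound on query length
    max_tokens=6,      # hypothetical upper bound on query length
):
    """Return True if a query passes the base-cluster selection criteria."""
    tokens = query.split()
    if frequency < min_frequency:
        return False  # minimum frequency
    if supply < min_supply:
        return False  # supply threshold: not enough inventory
    if not (min_tokens <= len(tokens) <= max_tokens):
        return False  # min and max token constraint
    if all(token.isdigit() for token in tokens):
        return False  # numbers-only queries such as "10 5" are not allowed
    return True
```

For instance, `is_base_cluster_candidate("10 5", frequency=500, supply=500)` is rejected by the numbers-only rule even though it clears the frequency and supply thresholds.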

  17. Candidates merge • 4.34M base clusters merged into 1.95M • Example: • phrase(hand,made) phrase(king,s) queen quilt • phrase(hand,made) phrase(pink,s) quilt • phrase(hand,made) phrase(prae,owned) queen quilt • phrase(hand,made) queen quilt • phrase(hand,made) phrase(prae,owned) quilt • phrase(hand,made) quilt size twin • phrase(hand,made) quilt silk • phrase(hand,made) quilt twin • phrase(hand,made) phrase(patch,work) quilt • phrase(hand,made) quilt white • phrase(hand,made) phrase(king,size) quilt • phrase(hand,made) phrase(yo,yo,s) quilt • phrase(hand,made) quilt sale • phrase(hand,made) quilt red • phrase(hand,made) quilt

  18. Step 2: K-Means Clustering • [Diagram labels: Query to Items Data, Transaction Logs, Inventory Logs, Base Cluster Generation, Generate Item Features, Scoring Models, K-Means Clustering of Base Clusters, Split Clusters]

  19. Clusters on Item Signature • Example clusters: • “apple ipod touch 4g clear film protector screen” • “clarks women shoe pumps classics”

  20. Recommendation (online)

  21. Performance System • [Diagram. Data Store: Clusters, Cluster-Cluster Relations, Conceptual Knowledgebase, Inventory. SIR path: ?similarTo(item) for a Lost Item, Cluster Assignment (clusters), SIR Query Formation (query), Item Search (items), Item Selection, SIR Ranking, recommendations (Similar Items). RIR path: ?relatedTo(item) for a Bought Item, Cluster Assignment (related clusters), RIR Query Formation (queries), Item Search (items), Item Selection, RIR Ranking, recommendations (Related Items)]

  22. Items in the same cluster

  23. Similar Item Recommendations

  24. Experimental Results • A/B tests comparing against legacy systems • SIR legacy system • Completely online • Naïve approach of using seed item title as a search query • RIR legacy system • Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, ACM SIGIR 2011 • Collaborative Filtering on stable representations of items • Significant improvements at the 90% confidence level • SIR resulted in 38.18% higher user engagement (CTR) • RIR resulted in 10.5% higher CTR • Statistically significant improvement in site-wide business metrics from both SIR & RIR
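
For readers who want to see how a CTR lift and its significance might be checked, here is a minimal sketch using a two-proportion z-test. The talk does not say which statistical test was used, and the click and view counts in the usage note are invented for illustration; they are not the experiment's data.

```python
import math

# Two-sided critical value for a 90% confidence level (standard normal).
Z_CRITICAL_90 = 1.645


def ctr_lift_and_z(clicks_control, views_control, clicks_test, views_test):
    """Return (relative CTR lift, z statistic) for test vs. control."""
    ctr_control = clicks_control / views_control
    ctr_test = clicks_test / views_test
    lift = (ctr_test - ctr_control) / ctr_control
    # Pooled click-through rate under the null hypothesis of equal CTRs.
    pooled = (clicks_control + clicks_test) / (views_control + views_test)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_control + 1 / views_test))
    z = (ctr_test - ctr_control) / std_err
    return lift, z


def is_significant(z, critical=Z_CRITICAL_90):
    """True when |z| exceeds the two-sided critical value."""
    return abs(z) > critical
```

As a usage example with made-up counts, `ctr_lift_and_z(1000, 100000, 1100, 100000)` gives a 10% relative lift, and `is_significant` on the resulting z statistic reports whether that lift clears the 90% threshold.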

  25. Conclusion • Balance between similarity and quality is crucial in driving user engagement and conversion • Clusters of similar items in the inventory • Local clustering in the coverage set of user queries • Offline models built using MapReduce • Huge input datasets including inventory, clickstream and transactional data • Efficient real-time performance system • Currently deployed on ebay.com

  26. Acknowledgments • Current & Past team members • Kranthi Chalasani • Santanu Kolay • Riyaaz Shaik • Venkat Sundaranatha

  27. Chu-Cheng Hsieh • chsieh@ebay.com • We’re hiring
