Dynamic Covering for Recommendation Systems

Presentation Transcript


  1. Dynamic Covering for Recommendation Systems Ioannis Antonellis Anish Das Sarma Shaddin Dughmi

  2. Outline • Covering & Recommendations • Succinct Dynamic Covering • Results: • Upper Bounds • Lower Bounds

  3. Max k-cover Problem • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • Output: a subset of I of size at most k that maximizes the number of items covered • Example (figure: sets A, B, C and items 1-5): k=1, solution: A (covers 3 items); k=2, solutions: {A,C}, {A,B} or {B,C} (each covers 4 items)

  4. Max k-cover Problem • NP-hard • Greedy Algorithm • pick the set that covers the most still-uncovered items • iterate k times • approximation ratio 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.63 • (figure: same sets A, B, C and items 1-5 as on the previous slide)
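
The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `greedy_max_cover` and `greedy_guarantee` are made up here, and the contents of sets A, B and C are hypothetical, chosen only so that they agree with the cover sizes quoted in the slide's example (the transcript does not list the actual sets).

```python
import math


def greedy_max_cover(sets, k):
    """Pick up to k sets, each time taking the one that covers the most new items."""
    covered = set()
    chosen = []
    for _ in range(k):
        candidates = [name for name in sets if name not in chosen]
        if not candidates:
            break
        # Candidate with the largest number of still-uncovered items.
        best = max(candidates, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds anything
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


def greedy_guarantee(k):
    """Worst-case fraction of the optimum that greedy achieves with k picks."""
    return 1 - ((k - 1) / k) ** k


# Hypothetical sets, consistent with the slide's example sizes (assumption).
sets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 5}}

chosen1, covered1 = greedy_max_cover(sets, 1)
chosen2, covered2 = greedy_max_cover(sets, 2)
print(chosen1, len(covered1))
print(chosen2, len(covered2))
print(greedy_guarantee(2), 1 - 1 / math.e)
```

With these sets, the first pick is A, and a second pick raises the cover to four items, matching the slide. The guarantee is 0.75 for k=2 and decreases toward 1 - 1/e as k grows.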

  5. Max k-cover in Recommendations • Alice views and rates movies • Netflix would like to recommend new movies to Alice for watching • Important problem: • Find users "similar" to Alice • Find users who cover a large set of Alice's likes and dislikes

  6. Netflix example • Each user is identified by the subset of movies they liked/viewed • Alice likes {A, B, C} • Fred likes {A, D} • Bob likes {B, E} • Ben likes {C, F} • Jim likes {A, B, F} • James likes {A, B, F} • Ben and Jim together cover all of Alice's likes • Fred, Bob and Ben together cover all of Alice's likes • Jim and James add the same value
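
The claims on this slide can be checked by brute force. The sketch below uses exactly the like-sets listed above; the helper name `covers_of_size` is an illustrative choice, and exhaustive enumeration is only reasonable for a toy example of this size.

```python
from itertools import combinations

likes = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
alice = {"A", "B", "C"}


def covers_of_size(likes, query, k):
    """All groups of exactly k users whose combined likes contain every query item."""
    return [
        group
        for group in combinations(sorted(likes), k)
        if query <= set().union(*(likes[user] for user in group))
    ]


singles = covers_of_size(likes, alice, 1)
pairs = covers_of_size(likes, alice, 2)
triples = covers_of_size(likes, alice, 3)
print(singles)
print(pairs)
print(("Ben", "Bob", "Fred") in triples)
```

No single user covers Alice. The only covering pairs are Ben with Jim and Ben with James, which also shows that Jim and James are interchangeable.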

  7. k-covering vs nearest neighbor • for k=1, equivalent (dot product similarity) • covering allows for diversifying recommendations • want to cover all genres liked by a user • consider a user that likes 100 thriller movies and 10 comedies • want "similar" users to cover as many movies as possible • k-nearest neighbor attempts to find many similar users, not cover as many movies as possible
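
A small experiment makes the difference concrete. Everything below is hypothetical: the three candidate users U1, U2, U3 and their like-sets are invented to mirror the "100 thrillers and 10 comedies" user on the slide, and the function names are illustrative. Similarity is the size of the overlap with the query, which equals the dot product of the 0/1 indicator vectors.

```python
def top_k_by_overlap(candidates, query, k):
    """Nearest-neighbor style: rank users by |likes & query| and keep the top k."""
    ranked = sorted(candidates, key=lambda u: len(candidates[u] & query), reverse=True)
    return ranked[:k]


def greedy_cover(candidates, query, k):
    """Covering style: repeatedly take the user adding the most uncovered query items."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(candidates, key=lambda u: len((candidates[u] & query) - covered))
        gain = (candidates[best] & query) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    return chosen


thrillers = {f"t{i}" for i in range(100)}
comedies = {f"c{i}" for i in range(10)}
query = thrillers | comedies  # the user who likes 100 thrillers and 10 comedies

# Hypothetical candidate users (assumption, not from the slides).
candidates = {
    "U1": {f"t{i}" for i in range(60)},
    "U2": {f"t{i}" for i in range(58)},
    "U3": {f"t{i}" for i in range(60, 90)} | comedies,
}


def coverage(users):
    """Number of query items liked by at least one of the given users."""
    return len(query & set().union(*(candidates[u] for u in users)))


nn = top_k_by_overlap(candidates, query, 2)
cov = greedy_cover(candidates, query, 2)
print(nn, coverage(nn))
print(cov, coverage(cov))
```

For k=1 both rules pick U1. For k=2 the nearest-neighbor rule adds the near-duplicate U2, while the covering rule adds U3 and picks up the comedies.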

  8. oDesk example • Online labor marketplace • clients post jobs and/or invite contractors • contractors apply to jobs • Contractor recommendations for clients • Bob invites/interviews/hires contractors • find clients "similar" to Bob • Job recommendations for contractors • Alice applies to jobs • find contractors "similar" to Alice

  9. Succinct Dynamic Covering (SDC) • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • query Q, a subset of X • Output: a subset of I of size at most k that maximizes the number of items of Q covered • However, we further constrain the problem: • space-constrained: statically preprocess (X,I) and store a small sketch, using much less than O(mn) space • dynamic: Q is not known a priori during sketch creation

  10. Notice two twists • dynamic • for each user the set of movies that need to be covered is different • covering is not static • space-constrained • real time, interactive recommendations • the whole Netflix graph is huge • 10 million users • 100k movies • popular movies have been viewed many times • cannot process the entire graph at query time

  11. Ad serving • online advertisers • bid on webpages matching relevancy criteria • target certain user demographics • When a user visits a page, ad servers: • have some (imprecise) idea about the demographic of the user (e.g. from click logs) • try to pick a set of ads that cover many user demographics • need to solve the SDC problem

  12. Ad serving • space-constrained: • set system consists of users, webpages and clicks • dynamic: • each user view of each page is associated with a different user demographic • (figure: bipartite graph between items 1-5 and sets A, B, C; labels: User visited pages, Ads, Webpages)

  13. Coverage Oracle • Offline stage: • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • Output: Data Structure D • Dynamic stage: • Input: Query Q, a subset of X • Output: use D to find a subset of I of size at most k that maximizes the number of items of Q covered
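
The two-stage interface can be illustrated with a baseline that ignores the space constraint. In this sketch the data structure D is simply a full copy of the set system, so it uses O(mn) space, which is exactly what a succinct oracle is supposed to avoid. The class name `FullStoreOracle`, its methods, and the example sets are all hypothetical, and the query step uses the standard greedy heuristic rather than anything specified on the slide.

```python
class FullStoreOracle:
    """Baseline coverage oracle: D is a complete copy of the set system."""

    def __init__(self, sets, k):
        # Offline stage: the query is not known yet, so everything is kept.
        self.k = k
        self.sets = {name: frozenset(items) for name, items in sets.items()}

    def query(self, q):
        # Dynamic stage: greedy max k-cover restricted to the query items.
        q = set(q)
        covered, chosen = set(), []
        for _ in range(self.k):
            best = max(self.sets, key=lambda s: len((self.sets[s] & q) - covered))
            gain = (self.sets[best] & q) - covered
            if not gain:
                break
            chosen.append(best)
            covered |= gain
        return chosen, covered


# Hypothetical set system (assumption, for illustration only).
oracle = FullStoreOracle({"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 5}}, k=2)
print(oracle.query({4, 5}))
print(oracle.query({1, 2}))
```

The same oracle returns different sets for different queries: B and C for {4, 5}, but only A for {1, 2}. That query dependence is the "dynamic" part of the problem.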

  14. Outline • Covering & Recommendations • Succinct Dynamic Covering • Results: • Upper Bounds • Lower Bounds

  15. Results • given space limitations • interested in approximate solutions for SDC • space vs approximation ratio tradeoffs • ε ∈ [0, 1/2] • δ1, δ2: non-negative integers, not both zero

  16. Simple Deterministic Algorithm • For every item, "remember" one set • break ties arbitrarily • m/k approximation, linear space • Example (figure: items and sets, full system vs. remembered sets): k=2, OPT = 16, APPROX = 8, ratio = 16/8 = 2
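
One way to read this slide is sketched below. The offline step follows the slide directly: each item keeps a pointer to a single set that contains it, so the sketch has one entry per item. The query step is an assumption, since the slide does not spell it out: each query item votes for its remembered set and the k most-voted sets are returned. The names `build_sketch` and `answer` and the example sets are hypothetical.

```python
from collections import Counter


def build_sketch(sets):
    """Offline: remember one set per item (ties broken by sorted set name)."""
    remembered = {}
    for name in sorted(sets):
        for item in sets[name]:
            remembered.setdefault(item, name)
    return remembered


def answer(remembered, query, k):
    """Dynamic (assumed rule): each query item votes for its remembered set."""
    votes = Counter(remembered[item] for item in query if item in remembered)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical set system (assumption, for illustration only).
sets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 5}}
sketch = build_sketch(sets)
print(sketch)
print(answer(sketch, {1, 2, 4}, 1))
print(answer(sketch, {1, 2, 4}, 2))
```

The sketch stores five entries for five items, regardless of how many sets there are or how large they are.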

  17. Better Deterministic Algorithm • Find the unchosen set containing the most uncovered items. Iterate. • similar to the previous algorithm, but the order is fixed • sqrt(n/k) approximation, linear space • Example (figure: items and sets, full system vs. remembered sets): k=2, OPT = 16, APPROX = 16, ratio = 16/16 = 1
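
A possible reading of this slide: the offline stage runs greedy cover over all items once, and each item remembers the set that first covered it in that fixed greedy order. The sketch below implements that reading. The voting rule at query time is an assumption, as are the function names and the example sets, which are chosen so that a single large set dominates.

```python
from collections import Counter


def build_greedy_sketch(sets):
    """Offline: greedy cover of all items; each item remembers its covering set."""
    uncovered = set().union(*sets.values())
    remaining = dict(sets)
    remembered = {}
    while uncovered and remaining:
        # Unchosen set containing the most uncovered items.
        best = max(remaining, key=lambda s: len(remaining[s] & uncovered))
        newly = remaining.pop(best) & uncovered
        if not newly:
            break
        for item in newly:
            remembered[item] = best
        uncovered -= newly
    return remembered


def answer(remembered, query, k):
    """Dynamic (assumed rule): each query item votes for its remembered set."""
    votes = Counter(remembered[item] for item in query if item in remembered)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical set system (assumption): C contains everything A and B contain.
sets = {"A": {1, 2}, "B": {3, 4}, "C": {1, 2, 3, 4, 5}}
sketch = build_greedy_sketch(sets)
print(sketch)
print(answer(sketch, {1, 2, 3, 4, 5}, 1))
```

Here every item points at C, so a query for all five items with k=1 returns C and covers everything. Remembering sets in arbitrary order could instead have pointed items 1 and 2 at A, giving a worse answer for the same query.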

  18. Randomized Algorithm • m^ε/sqrt(k) approximation • n·m^(1-2ε) space • Find an unchosen set containing at least n/(m^ε sqrt(k)) uncovered items. Choose it and iterate. • For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items
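
To get a feel for the tradeoff, the expressions m^ε/sqrt(k) and n·m^(1-2ε) can be evaluated at the scale quoted on the Netflix slide (10 million users as sets, 100k movies as items). This is only arithmetic on the stated formulas: it ignores any hidden constants or logarithmic factors, the value k=10 is a hypothetical choice, and the function name `tradeoff` is illustrative.

```python
import math


def tradeoff(n, m, k, eps):
    """Evaluate the slide's ratio and space expressions, constants ignored."""
    ratio = m**eps / math.sqrt(k)
    space = n * m ** (1 - 2 * eps)
    return ratio, space


n = 100_000       # items: movies (from the Netflix slide)
m = 10_000_000    # sets: users (from the Netflix slide)
k = 10            # hypothetical number of sets to return

for eps in (0.0, 0.25, 0.5):
    ratio, space = tradeoff(n, m, k, eps)
    print(f"eps={eps:.2f}  ratio~{ratio:.3g}  space~{space:.3g}")
```

At eps = 0.5 the space expression reduces to n, and at eps = 0 it is n·m, the size of the full set system. The ratio expression is only meaningful up to the constants the slide leaves out, so values below 1 should not be read literally.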

  20. Lower Bound • holds for deterministic oracles only • proof somewhat involved, uses the probabilistic method • matches randomized upper bound • Open problem: randomized lower bound

  21. Related work • distance oracles in graphs (Thorup and Zwick) • set cover in the streaming model (sets are streams or items are streams) • nearest neighbor (NN) search: • for k=1, SDC and NN are equivalent using the dot product similarity • no locality sensitive hashing for dot product (Charikar), so no hope for signature schemes for SDC

  22. Summary • Introduced Succinct Dynamic Covering problem • Applications in many real-world recommendation systems • approximation ratio and space tradeoffs • Deterministic and Randomized upper bounds • Deterministic lower bound

  23. Thank you!
