Dynamic Covering for Recommendation Systems Ioannis Antonellis Anish Das Sarma Shaddin Dughmi
Outline • Covering & Recommendations • Succinct Dynamic Covering • Results: • Upper Bounds • Lower Bounds
Max k-cover Problem • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • Output: a subset of I of size at most k that maximizes the number of items covered • [Figure: sets A, B, C linked to items 1-5] • Example: k=1, solution A (covers 3 items); k=2, solutions {A,C}, {A,B}, {B,C} (each covers 4 items)
Max k-cover Problem • NP-hard (the decision version is NP-complete) • Greedy Algorithm • pick the set that covers the most uncovered items • iterate k times • approximation ratio 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.63 • [Figure: sets A, B, C linked to items 1-5] • Example: k=1, solution A (covers 3 items); k=2, solutions {A,C}, {A,B}, {B,C} (each covers 4 items)
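The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `greedy_max_k_cover` and the instance `example` are hypothetical. The instance was chosen only so that the cover sizes match those quoted in the example (3 for k=1, 4 for k=2); it is not necessarily the set system drawn in the figure.

```python
from typing import Dict, List, Set


def greedy_max_k_cover(sets: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k sets, each time taking the one that covers the most new items."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(k):
        # Unchosen sets, sorted so that ties are broken deterministically.
        candidates = [name for name in sorted(sets) if name not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new item
        chosen.append(best)
        covered |= sets[best]
    return chosen


# Hypothetical instance: 5 items and 3 sets (an assumption, not the slide's figure).
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 5}}
print(greedy_max_k_cover(example, 1))
print(greedy_max_k_cover(example, 2))

# Greedy guarantee 1 - (1 - 1/k)^k for a few k; it decreases toward 1 - 1/e.
for k in (1, 2, 5, 50):
    print(k, round(1 - (1 - 1 / k) ** k, 3))
```

The guarantee is a lower bound on the fraction of the optimal cover that greedy achieves, so it is at least 1 - 1/e for every k.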
Max k-cover in Recommendations • Alice views and rates movies • Netflix would like to recommend new movies to Alice for watching • Important problem: • Find users "similar" to Alice • Find users who cover a large set of Alice's likes and dislikes
Netflix example • Each user is identified by the subset of movies they like or have viewed • Alice likes {A, B, C} • Fred likes {A, D} • Bob likes {B, E} • Ben likes {C, F} • Jim likes {A, B, F} • James likes {A, B, F} • Ben and Jim together cover all of Alice's likes • Fred, Bob and Ben together cover all of Alice's likes • Jim and James add the same value
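The three claims on this slide can be checked directly from the listed sets. The snippet below is only a sanity check; the helper name `covered_likes` is hypothetical, while the data is copied from the slide.

```python
# Liked-movie sets exactly as listed on the slide.
likes = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
alice = {"A", "B", "C"}


def covered_likes(users, query=alice):
    """Movies in `query` liked by at least one of `users`."""
    union = set().union(*(likes[u] for u in users))
    return union & query


print(covered_likes(["Ben", "Jim"]) == alice)          # two users suffice
print(covered_likes(["Fred", "Bob", "Ben"]) == alice)  # three users also work
print(covered_likes(["Jim", "James"]))                 # James adds nothing beyond Jim
```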
k-covering vs nearest neighbor • for k=1, equivalent (dot product similarity) • covering allows for diversifying recommendations • want to cover all genres liked by a user • consider a user that likes 100 thriller movies and 10 comedies • want "similar" users to cover as many movies as possible • k-nearest neighbor attempts to find many similar users, not cover as many movies as possible
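A small experiment makes the thriller/comedy point concrete. Everything here is an assumed toy setup: the candidate users `T1`, `T2`, `C1` and the function names are hypothetical, and similarity is taken to be the dot product of 0/1 like-vectors (the overlap size).

```python
thrillers = {f"thriller{i}" for i in range(100)}
comedies = {f"comedy{i}" for i in range(10)}
query = thrillers | comedies  # the user's 110 liked movies

# Hypothetical candidate users: two overlapping thriller fans and one comedy fan.
users = {
    "T1": {f"thriller{i}" for i in range(0, 90)},
    "T2": {f"thriller{i}" for i in range(5, 95)},
    "C1": set(comedies),
}


def top_k_by_overlap(k):
    """k nearest neighbours under dot-product similarity on 0/1 vectors."""
    ranked = sorted(users, key=lambda u: (-len(users[u] & query), u))
    return ranked[:k]


def greedy_cover(k):
    """k users chosen greedily by how many not-yet-covered liked movies they add."""
    covered, chosen = set(), []
    for _ in range(k):
        rest = [u for u in sorted(users) if u not in chosen]
        if not rest:
            break
        best = max(rest, key=lambda u: len((users[u] & query) - covered))
        chosen.append(best)
        covered |= users[best] & query
    return chosen


def coverage(chosen):
    """Number of the user's liked movies covered by the chosen users."""
    return len(set().union(*(users[u] for u in chosen)) & query)


for picker in (top_k_by_overlap, greedy_cover):
    chosen = picker(2)
    print(picker.__name__, chosen, coverage(chosen))
```

In this setup the nearest-neighbour choice returns the two thriller fans and misses every comedy, while the covering choice trades the second thriller fan for the comedy fan and covers more of the user's likes.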
oDesk example • Online labor marketplace • clients post jobs and/or invite contractors • contractors apply to jobs • Contractor recommendations for clients • Bob invites/interviews/hires contractors • find clients "similar" to Bob • Job recommendations for contractors • Alice applies to jobs • find contractors "similar" to Alice
Succinct Dynamic Covering (SDC) • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • query Q, a subset of X • Output: a subset of I of size at most k that maximizes the number of items of Q covered • However, we further constrain the problem: • space-constrained: statically preprocess (X, I) and store a small sketch, much smaller than O(mn) • dynamic: Q is not known a priori when the sketch is created
Notice two twists • dynamic • for each user the set of movies that needs to be covered is different • covering is not static • space-constrained • real-time, interactive recommendations • the whole Netflix graph is huge • 10 million users • 100k movies • popular movies have been viewed many times • cannot process the entire graph at query time
Ad serving • Online advertisers: • bid on webpages matching relevancy criteria • target certain user demographics • When a user visits a page, ad servers: • have some (imprecise) idea about the demographic of the user (e.g. from click logs) • try to pick a set of ads that cover many user demographics • need to solve the SDC problem
Ad serving • space-constrained: • set system consists of users, webpages and clicks • dynamic: • each user view of each page is associated with a different user demographic • [Figure: ads A, B, C linked to webpages 1-5, with the pages a user visited marked]
Coverage Oracle • Offline stage: • Input: • integer k • items: X = {1,2, ..., n} • sets: I = {S1, ..., Sm}, each Si a subset of X • Output: data structure D • Dynamic stage: • Input: query Q, a subset of X • Output: use D to find a subset of I of size at most k that maximizes the number of items of Q covered
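The two stages map naturally onto a constructor and a query method. The sketch below only fixes that interface: the class name `NaiveCoverageOracle` is hypothetical, and this baseline keeps the entire set system, so it uses O(mn) space and is deliberately not a succinct oracle. At query time it runs the standard greedy heuristic restricted to Q.

```python
from typing import Dict, Hashable, List, Set


class NaiveCoverageOracle:
    """Baseline oracle: keeps the whole set system, so it is NOT succinct."""

    def __init__(self, sets: Dict[Hashable, Set[int]], k: int) -> None:
        # Offline stage: here the "data structure D" is just a copy of the input.
        self.k = k
        self.sets = {name: set(items) for name, items in sets.items()}

    def query(self, q: Set[int]) -> List[Hashable]:
        # Dynamic stage: greedy max k-cover restricted to the query items.
        uncovered = set(q)
        chosen: List[Hashable] = []
        while self.sets and len(chosen) < self.k and uncovered:
            best = max(self.sets, key=lambda name: len(self.sets[name] & uncovered))
            if not self.sets[best] & uncovered:
                break  # no set covers any remaining query item
            chosen.append(best)
            uncovered -= self.sets[best]
        return chosen


# Hypothetical usage: preprocess once, then answer different queries.
oracle = NaiveCoverageOracle({"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5}}, k=2)
print(oracle.query({3, 4, 5}))
print(oracle.query({1}))
```

A succinct oracle would keep the same `query` signature but replace the stored copy with a much smaller sketch.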
Outline • Covering & Recommendations • Succinct Dynamic Covering • Results: • Upper Bounds • Lower Bounds
Results • given space limitations • interested in approximate solutions for SDC • space vs approximation ratio tradeoffs • ε ∈ [0, 1/2] • δ1, δ2: non-negative integers, not both zero
Simple Deterministic Algorithm • For every item, "remember" one set • break ties arbitrarily • m/k approximation, linear space • [Figure: two items/sets diagrams] • Example, k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2
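One way to read "remember one set per item" is as a lookup table from item to set, which takes space linear in the number of items. The slide does not say how a query is answered from that table, so the rule in `answer_query` (return the k sets remembered by the most query items) is an assumption made for illustration; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def build_item_to_set(sets: Dict[Hashable, Set[int]]) -> Dict[int, Hashable]:
    """Offline: remember a single set per item (first one seen; ties are arbitrary)."""
    remembered: Dict[int, Hashable] = {}
    for name, items in sets.items():
        for item in items:
            remembered.setdefault(item, name)
    return remembered


def answer_query(remembered: Dict[int, Hashable], q: Set[int], k: int) -> List[Hashable]:
    """Online (assumed rule): return the k sets remembered by the most query items."""
    votes = Counter(remembered[item] for item in q if item in remembered)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical instance: S2 covers four query items, but items 3 and 4 remember S1.
sets = {"S1": {1, 2, 3, 4}, "S2": {3, 4, 5, 6}, "S3": {7}}
remembered = build_item_to_set(sets)
print(answer_query(remembered, {2, 3, 4, 5, 6}, k=1))
```

On this instance the table answers with S1, which covers three query items, whereas S2 covers four; this is the kind of loss the approximation ratio accounts for.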
Better Deterministic Algorithm • Find the unchosen set containing the most uncovered items. Iterate. • similar to the previous algorithm, but the order is fixed • sqrt(n/k) approximation, linear space • [Figure: two items/sets diagrams] • Example, k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1
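A plausible reading of this slide is that the per-item table is filled in greedy order: run greedy set cover once offline, and let each item remember the set that first covered it. The sketch below shows only that offline step under this assumption; the name `greedy_item_assignment` and the instance are hypothetical.

```python
from typing import Dict, Hashable, Set


def greedy_item_assignment(sets: Dict[Hashable, Set[int]]) -> Dict[int, Hashable]:
    """Offline: run greedy set cover once; each item remembers the set that first covered it."""
    remembered: Dict[int, Hashable] = {}
    covered: Set[int] = set()
    unchosen = dict(sets)
    while unchosen:
        # Unchosen set containing the most items that are not yet covered.
        best = max(unchosen, key=lambda name: len(unchosen[name] - covered))
        new_items = unchosen.pop(best) - covered
        if not new_items:
            break  # every remaining set is already fully covered
        for item in new_items:
            remembered[item] = best
        covered |= new_items
    return remembered


# Hypothetical instance: greedy picks S2 first, so item 3 remembers S2 rather than S1.
sets = {"S1": {1, 2, 3}, "S2": {3, 4, 5, 6}, "S3": {6, 7}}
print(greedy_item_assignment(sets))
```

Compared with breaking ties arbitrarily, large sets claim their items first, so a query is more likely to be pointed at a set that covers many of its items.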
Randomized Algorithm • m^ε/sqrt(k) approximation • n·m^(1-2ε) space • Find an unchosen set containing at least n/(m^ε·sqrt(k)) uncovered items. Choose it and iterate. • For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items
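To get a feel for the tradeoff, the expressions on this slide can be evaluated for concrete sizes. The helper below is a hypothetical calculator, not part of the algorithm: it ignores constant and logarithmic factors, and it reads the flattened exponents as m^ε/sqrt(k), n·m^(1-2ε), n/(m^ε·sqrt(k)) and n/m^(2ε). The sizes used are the 10 million users and 100k movies quoted earlier in the talk; k=10 is an arbitrary choice.

```python
import math


def sdc_tradeoff(n: int, m: int, k: int, eps: float):
    """Evaluate the slide's expressions, ignoring constant and log factors."""
    if not 0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 1/2]")
    ratio = m ** eps / math.sqrt(k)        # approximation ratio m^eps / sqrt(k)
    space = n * m ** (1 - 2 * eps)         # sketch size n * m^(1 - 2*eps)
    heavy = n / (m ** eps * math.sqrt(k))  # size threshold for choosing a set outright
    samples = n / m ** (2 * eps)           # items sampled per remaining set
    return ratio, space, heavy, samples


n, m, k = 100_000, 10_000_000, 10  # movies, users, budget (k is an assumed value)
for eps in (0.0, 0.25, 0.5):
    ratio, space, heavy, samples = sdc_tradeoff(n, m, k, eps)
    print(f"eps={eps}: ratio~{ratio:.3g}, space~{space:.3g}, "
          f"threshold~{heavy:.3g}, samples/set~{samples:.3g}")
```

Note that m sets times n/m^(2ε) sampled items each gives n·m^(1-2ε), which is consistent with the stated space bound; larger ε shrinks the sketch at the cost of a weaker approximation.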
Lower Bound • holds for deterministic oracles only • proof somewhat involved, uses the probabilistic method • matches randomized upper bound • Open problem: randomized lower bound
Related work • distance oracles in graphs (Thorup and Zwick) • set cover in the streaming model (sets are streams or items are streams) • nearest neighbor (NN) search: • for k=1, SDC and NN are equivalent using dot product similarity • no locality-sensitive hashing for dot product (Charikar), so no hope for signature schemes for SDC
Summary • Introduced Succinct Dynamic Covering problem • Applications in many real-world recommendation systems • approximation ratio and space tradeoffs • Deterministic and Randomized upper bounds • Deterministic lower bound