The Load Distance Balancing Problem Eddie Bortnikov (Yahoo!) Samir Khuller (Maryland) Yishay Mansour (Google) Seffi Naor (Technion)
The Load-Distance Balancing Problem • Given n clients and k servers s1, s2,…sk we need to assign each client to a server. • Cost for client i assigned to server sj is as follows: Cost(i) = Distance(i,sj) + Delay(j,Lj) Delay(j,Lj) is a FUNCTION of the number of clients Lj assigned to sj. OBJECTIVE: Min Max Cost(i)
An Example s2 s1 s3 C E D A B • Each server has its own delay function (can be arbitrary, we just assume its non-decreasing). • Note that C is closer to s1, but prefers to attach to s2 since Dist(C,s2)+Delay(s2,2)<Dist(C,s1)+Delay(s1,3) • Objective is to Minimize Max Cost for any client
Related Work • Lots of research on locating facilities. Here the facilities are all given; we just have to compute assignment of clients to facilities. • Notion of capacities has been used for various covering problems such as Vertex Cover, K Centers, Facility Location etc.
Main Results • The problem is NP-hard. • We develop a polynomial time 2 approx. • We show that the bound 2 cannot be improved to 2-ε unless NP=P. • With triangle inequality in the distance function the hardness reduces to 5/3-ε. • When all clients and servers are on a line its solvable in polynomial time. • For Min Sum Cost(i), we can solve in polynomial time using Min-Weight Matching.
NP-hardness by Exact Set Cover • Given N elements and a collection S of K sets (each set has size m). Does there exist a subset S’ of S, such that each element belongs to exactly one set in S’? In other words, we need to pick exactly N/m subsets from S, to cover each element once.
Example of Exact Cover • Here we have N=16 elements and 9 sets (m=4). • The FOUR blue sets form an EXACT COVER, and we discard the FIVE orange sets.
Reduction from Exact Cover • Each element is a client. In addition we create a collection of M(K-N/m) dummy clients. • Subset Sj in S corresponds to server sj. • Dist(dummy,server)=d1 • Dist(i,sj) = d2 if i ε Sj, o.w. ∞ • d2 >> d1 d1 Dummy clients sj d2 i Clients (elements)
Reduction from Exact Cover • Delay functions for servers are basically a step function. • Delay(j,Lj) = Δ-d2, when load is at most m. • Delay(j,Lj)=Δ-d1, when load exceeds m, but is at most M. Δ-d1 Δ-d2 m M
Reduction from Exact Cover • Suppose there is a solution to exact cover, then there is a solution to the LDB problem with delay at most Δ. • For each chosen subset Sj, the corresp. server sj gets m clients each at distance d2.Since the delay is Δ-d2, the total cost is at most Δ. • For the remaining subsets, those dummies are all assigned to the remaining servers (K-N/m), each gets M dummies. • The proof in the other direction requires some work!
Proof (cont.) • Each server can support at most M dummy clients if the total cost does not exceed Δ, and no more than m real clients. • Suppose a server supports both real and dummy clients; then the total number of servers with real clients is k’ > N/m. • These serve at most (mk’-N) dummy clients, while the rest can serve only M(k-k’) dummy clients. • Adding the two shows (some algebra needed) that we can only assign < M(k-N/m) dummy clients if M>m.
Hardness Results follow…. • We can set d1=ε and d2=Δ-ε. A solution to EXACT COVER exists if and only if a solution with cost Δ exists for LDB. • If there is no solution to EXACT COVER, then every solution to LDB has cost 2(Δ-ε). • However, here we do violate triangle inequality in the distance function.
Hardness results with triangle inequality • We need to set d1=⅓ Δ, and d2= Δ-ε. • With these parameters, the distance between a (real) client i and a server sj such that i is not in Sj, is at least 5/3Δ- ε. • A solution to EXACT COVER exists if and only if a solution with cost Δ exists for LDB. • If there is no solution to EXACT COVER, then every solution to LDB has cost 5/3Δ-ε.
Approximation Algorithm with factor 2 • Suppose a solution exists with maximum cost Δ. • For each server sj, we can compute an upper bound on the number of clients that can be served with a delay of at most Δ (say L*j). • For each client i we can compute the subset of servers that are within distance Δ (S*i). • Now its just a flow problem to check if an assignment exists where each client i is assigned to a server from S*i and each sj has load at most L*j. • Minimizing Δ gives a trivial 2 approximation.
All servers and clients on a line • Use dynamic programming!
Minimizing the Sum of Costs • We reduce this to min cost matching in a bipartite graph. • Let G=(X,Y,E) where nodes in X correspond to n clients and there are nk nodes in Y. We have n nodes corresp. to each server. • We ask for a min cost matching to find a solution.
Capacitated K Centers • The related problem of choosing K facilities has been considered (each client should be assigned to a closeby facility and the load on the facility should not be too high): [Khuller & Sussman] (K,L,5R) or ((2/c)K, cL, 2R) related to K-centers clustering.
Conclusions • Can we improve the 2 approximation when triangle inequality holds? • Can we improve the 2 approximation when Delay functions satisfy specific properties? What is a natural delay function? • Are there other special cases that can be solved in polynomial time?