Simultaneous Optimization • Ashish Goel, University of Southern California • Joint work with Deborah Estrin, UCLA (Concave Costs) and Adam Meyerson, Stanford/CMU (Convex Costs, Concave Utilities)
Algorithms/Theory of Computation at USC • Len Adleman • Tim Chen • Ashish Goel • Ming-Deh Huang • Mike Waterman • Applications/Variations: • Arbib, Desbrun, Schaal, Sukhatme, Tavare…
My Long Term Research Agenda • Foundations of computer science (computation/interaction/information) • Combinatorial optimization • Approximation algorithms • Discrete Applied Probability • Find interesting connections • Operations research; queueing theory; stochastic processes; functional analysis; game theory • Find interesting application domains • Communication Networks • Self-Assembly
Simultaneous Optimization • Approximately minimize the cost of a system simultaneously for a large family of cost functions • Concave cost functions correspond to economies of scale • Convex cost functions measure congestion on machines, on agents, or in networks • Maximizing utility/profit without knowing the profit function • Allocated bandwidths to customers in a network • Concave profit functions correspond to the "law" of diminishing returns
Example of Concave Utilities: Revenue Maximization in Networks • Let x = ⟨x1, x2, …, xN⟩ denote the bandwidth allocated to N users in a communication network • These bandwidths must satisfy some linear capacity constraints, say Ax ≤ C • Let U(x) denote the total revenue (or the utility) that the network operator can derive from the system • Goal: Maximize U(x), subject to Ax ≤ C • x can be thought of as "wealth" in any resource allocation problem
The Utility Function U • Standard assumptions: • U is concave (law of diminishing returns) • U is non-decreasing (more bandwidth can't harm) • U(0) = 0 • We will also assume that U is symmetric in x1,x2,…,xN • The corresponding optimization problem is easy to solve using Operations Research techniques (e.g. interior point methods) • But what if U is not known? • Simultaneous optimization: Maximize U(x) simultaneously for all utility functions U
Why simultaneous optimization? • Often, the utility function is poorly understood, eg. Customer satisfaction • Often, we might want to promote several different objectives, eg. Fairness • Let us focus on fairness. Should we • Maximize the average utility? • Be Max-min fair (steal from the rich)? • Maximize the average income of the bottom half of society? • Minimize the variance? • Do all of the above?
Fair Allocation Problem: Example • How to split a pie fairly? • But what if there are more constraints? • Alice and Eddy want only apple pie • Frank: allergic to apples • Cathy: equal portions of both pies • David: twice as much apple pie as lemon • What is a "Fair" allocation? • How do we find one?
Simultaneous optimization and Fairness • Consider the following three allocations of bandwidths to two users • ⟨a,b⟩ • ⟨b,a⟩ • ⟨(a+b)/2,(a+b)/2⟩ • If f measures fairness, then intuitively, f(a,b) = f(b,a) ≤ f((a+b)/2,(a+b)/2) • Hence, f is a symmetric concave function • f(0) = 0, and f non-decreasing are also reasonable restrictions, i.e. f is a utility function • All the fairness measures we found in literature are equivalent to maximizing a utility function • Simultaneous optimization also promotes fairness
Can we do Simultaneous Optimization? • Of course not • But, perhaps we can do it "approximately"? • Aha!! Theoretical Computer Science! • More modest goal: • Find x subject to Ax ≤ C, such that for any utility function U, U(x) is a good approximation to the maximum achievable value of U
Approximate Majorization • Given an allocation y of bandwidths to users, let P_i(y) denote the sum of the i smallest components of y • Let P_i* denote the largest possible value of P_i(y) over all feasible allocations y • Definition: x is said to be α-majorized if P_i(x) ≥ P_i*/α for all 1 ≤ i ≤ N • Variant of the notion of majorization • Interpretation: for every K, the K poorest individuals in the allocation x are collectively at least 1/α times as rich as the K poorest individuals in any other feasible allocation
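The definition above can be checked mechanically once the values P_i* are known. The following is a minimal sketch, not the authors' procedure: the list `p_star` is a hypothetical input holding P_1*, ..., P_N*, assumed to have been computed separately (for example, by solving one optimization problem per i), and the function names are illustrative.

```python
def prefix_sums_of_smallest(y):
    """Return [P_1(y), ..., P_N(y)]: P_i(y) is the sum of the i smallest components of y."""
    sums, running = [], 0.0
    for value in sorted(y):
        running += value
        sums.append(running)
    return sums


def smallest_alpha(x, p_star):
    """Smallest alpha >= 1 such that P_i(x) >= P_i*/alpha for every i.

    p_star is an assumed, precomputed list [P_1*, ..., P_N*].
    Returns infinity if some P_i(x) is 0 while the matching P_i* is positive.
    """
    alpha = 1.0
    for p_x, p_opt in zip(prefix_sums_of_smallest(x), p_star):
        if p_opt == 0:
            continue  # this constraint holds for any alpha
        if p_x == 0:
            return float("inf")
        alpha = max(alpha, p_opt / p_x)
    return alpha


# Illustrative numbers only: the allocation <1, 1, 4> against P* = (2, 4, 6).
print(smallest_alpha([1, 1, 4], [2, 4, 6]))
```

In this made-up instance the two poorest users hold half of what they could hold in the best case, so the allocation is 2-majorized and no better.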
Why Approximate Majorization? • Theorem 1: An allocation x is α-majorized if and only if U(x) is an α-approximation to the maximum possible value of U for all utility functions U • i.e. α-majorization results in approximate simultaneous optimization • Proof invokes a classic theorem of Hardy, Littlewood, and Polya from the 1920s
Existence • Theorem 2: For the bandwidth allocation problem in networks, there exists an O(log N)-majorized solution • i.e. Can simultaneously approximate all utility functions up to a factor O(log N) • Results extend to arbitrary linear (even convex) programs, and not just the bandwidth allocation problem
Tractability • Theorem 3: Given arbitrary linear constraints, we can find (in polynomial time) the smallest α such that an α-majorized solution exists • Can also find the corresponding α-majorized solution • This completes the study of approximate simultaneous optimization for linear programs [Goel, Meyerson; Unpublished] [Bhargava, Goel, Meyerson; short abstract in Sigmetrics '01]
Examples of Utility Functions • Min • P_i(x) • Sum/Average • Σ_i f(x_i), where f is a uni-variate utility function • E.g. entropy, Σ_i log(1 + x_i), etc. • Variance is also symmetric convex • Can also approximately minimize the variance • Can simultaneously approximate capitalism, communism, and many other "ism"s.
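A small numeric illustration of three of the utilities listed above, assuming a two-user allocation with made-up values. It only checks the two properties the slides rely on (symmetry, and that averaging the two shares never lowers the utility); the function names are illustrative.

```python
import math


def u_min(x):
    """Max-min style utility: the poorest user's share."""
    return min(x)


def u_sum(x):
    """Total (equivalently, average) utility."""
    return sum(x)


def u_log(x):
    """Sum of log(1 + x_i), a separable concave utility."""
    return sum(math.log(1 + xi) for xi in x)


a, b = 1.0, 3.0                      # illustrative shares
mid = (a + b) / 2
for u in (u_min, u_sum, u_log):
    symmetric = math.isclose(u([a, b]), u([b, a]))
    averaging_helps = u([a, b]) <= u([mid, mid]) + 1e-12
    print(u.__name__, symmetric, averaging_helps)
```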
Open Problem Distributed Algorithms??
Example of Concave Costs: Data Aggregation in Sensor Networks • There is a single sink and multiple sources of information • Need to construct an aggregation tree • Data flows from the sources to the sink along the tree • When two data streams collide, they aggregate • Let f(k) denote the size of k merged streams • Assume f(0) = 0, f is concave, f is non-decreasing • Concavity corresponds to concavity of entropy/information • Canonical Aggregation Functions • The amount of aggregation might depend on nature of information • f is not known in advance
Another Example of Concave Costs: Buy-at-Bulk Network Design • Single source, several sinks (consumers) of information • If a link serves k sinks, its cost is f(k) • Economies of scale => f is concave • Also, assume that f is increasing, and f(0) = 0 • Goal: Construct the cheapest distribution tree • Buy-at-Bulk Network Design • Assume f is not known in advance • Same problem as before • Multicast communication: f(k) = 1 • Unicast communication: f(k) = k
A Simple Algorithmic Trick • Jensen's Inequality • E[f(X)] ≤ f(E[X]) for any concave function f • Hence, given a fractional/multipath solution, randomized rounding can only help • [Figure: a fractional source-to-sink flow with 0.5 on each of two paths, and its rounded outcomes; labels: Prob. ½, Prob. ¼, Prob. ¼]
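Jensen's inequality can be verified exactly on a small discrete distribution. The sketch below uses an assumed example that is not taken from the slides: X is 0 or 2 with probability ½ each, and f is the square root (concave, non-decreasing, f(0) = 0).

```python
import math


def expectation(dist, g=lambda v: v):
    """Expected value of g(X) for a finite distribution given as (value, probability) pairs."""
    return sum(p * g(v) for v, p in dist)


dist = [(0, 0.5), (2, 0.5)]      # X doubles or vanishes, each with probability 1/2
f = math.sqrt                    # an example concave function

lhs = expectation(dist, f)       # E[f(X)]
rhs = f(expectation(dist))       # f(E[X])
print(lhs, rhs, lhs <= rhs)
```

The same "doubles or vanishes" distribution reappears in the random selection step of the algorithm, which is why this inequality is the key tool.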
Notation • Given graph G = (V,E) and • Cost function c : E → ℝ+ on edges • Sink t • Set S of K sources • Cost of supporting j users on edge e is c(e)f(j), where f is an unknown canonical aggregation function • Given an aggregation tree T, C_T(f) = Cost of tree T for function f • C*(f) = min_T {C_T(f)} • R_T(f) = C_T(f)/C*(f)
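To make C_T(f) concrete, here is a minimal sketch that evaluates the cost of a given aggregation tree for a given f. The representation is an assumption for illustration: `parent[v]` is a pair (parent of v, cost c(e) of the edge from v to its parent). Computing C*(f), and hence R_T(f), is not attempted.

```python
from collections import Counter


def tree_cost(parent, sources, sink, f):
    """C_T(f) = sum over tree edges e of c(e) * f(number of sources routed through e)."""
    load = Counter()
    for s in sources:
        node = s
        while node != sink:
            load[node] += 1            # edge (node, parent[node]) is keyed by its child end
            node = parent[node][0]
    return sum(parent[v][1] * f(k) for v, k in load.items())


# Hypothetical tree: sources a and b meet at m, which connects to the sink t.
parent = {"a": ("m", 1.0), "b": ("m", 1.0), "m": ("t", 2.0)}
print(tree_cost(parent, ["a", "b"], "t", lambda k: k))           # unicast-style f(k) = k
print(tree_cost(parent, ["a", "b"], "t", lambda k: min(k, 1)))   # multicast-style f
```

The two calls show how the same tree is priced differently by the two extreme canonical functions.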
Problem Definition • Deterministic Algorithms: Problem D • Construct a tree T and give a bound on max_f {R_T(f)} • Randomized Algorithms: Two possibilities • Problem R1: Bound on max_f {E_T[R_T(f)]} • Problem R2: Bound on E_T[max_f {R_T(f)}] • Will focus on problem R2 (R2 subsumes R1) • Problem R1 does not model "simultaneous" optimization: no one tree needs to be good for all canonical functions. • Problem R1 can be tackled using known techniques • A solution of Problem R2 is likely to result in a solution of problem D using de-randomization techniques
Previous Work • Problem is NP-Hard even when f is known • Randomized O(log K log log K) approximation for problem R1 using Bartal's tree embeddings [Bartal '98; Awerbuch and Azar '97] • Improved to a constant factor [Guha, Meyerson, Munagala '01] • O(log K) approximation when f is known, but can be different for different links [Meyerson, Munagala, Plotkin '00]
Background: Bartal's Result • Randomized algorithm which takes an arbitrary metric space (V,d_V) as input and constructs a tree metric (V,d_T) such that d_V(u,v) ≤ d_T(u,v), and E[d_T(u,v)] ≤ α d_V(u,v), where α = O(log n log log n) • Results in an O(log K log log K) guarantee for problem R1 • No obvious way to extend to problem R2 • Quite complicated
Our Results • Simple Algorithm • Gives a bound of 1 + log K for problem R2 • Interesting rules of thumb • Can be de-randomized using pessimistic estimators and the O(1) approximation algorithm for known f • Quite technical; details omitted
Our Algorithm: Hierarchical Matching • Step 1 (the "Matching" step): Find the minimum cost matching between sources; cost is measured in terms of shortest path distance • Step 2 (the "Random Selection" step): For each matched pair, pick one at random and discard it; pretend that the demand from the discarded node is moved to the remaining node • Step 3: If two or more sources remain, go back to step 1 • At the end, take a union of all the matchings and also connect the single remaining source to the sink
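The steps above can be sketched as follows. This is an illustrative toy, not the authors' implementation: it assumes the number of sources is a power of two, that `dist[u][v]` holds precomputed shortest path distances, and it replaces a real minimum cost matching algorithm with an exponential brute-force search that is only usable on tiny inputs. All names are hypothetical.

```python
import random
from functools import lru_cache


def min_cost_matching(nodes, dist):
    """Brute-force minimum cost perfect matching (exponential time; small even-sized inputs only)."""

    @lru_cache(maxsize=None)
    def solve(remaining):
        if not remaining:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        best = None
        for i, partner in enumerate(rest):
            cost, pairs = solve(rest[:i] + rest[i + 1:])
            cost += dist[first][partner]
            if best is None or cost < best[0]:
                best = (cost, ((first, partner),) + pairs)
        return best

    return solve(tuple(nodes))


def hierarchical_matching(sources, sink, dist, rng=random):
    """Return a list of levels (demand per source, total edge cost, pairs connected)."""
    active = list(sources)
    n = len(active)
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes the number of sources is a power of two")
    demand, levels = 1, []
    while len(active) >= 2:
        cost, pairs = min_cost_matching(active, dist)    # the "Matching" step
        levels.append((demand, cost, pairs))
        active = [rng.choice(pair) for pair in pairs]    # "Random Selection": keep one of each pair
        demand *= 2                                      # the survivor carries both demands
    # Finally connect the single remaining source to the sink.
    levels.append((demand, dist[active[0]][sink], ((active[0], sink),)))
    return levels


def total_cost(levels, f):
    """Cost of the resulting tree for aggregation function f: sum of (level cost) * f(demand)."""
    return sum(cost * f(demand) for demand, cost, _ in levels)
```

A possible usage, on four made-up sources placed on a line: build `dist` from the positions, call `hierarchical_matching(["a", "b", "c", "d"], "t", dist, random.Random(0))`, then price the same output with several functions f via `total_cost`. The point of the design is that the tree is built without ever looking at f.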
Example (figures omitted) • Start: demands are all 1 • Matching 1 • Random Selection Step: demands are all 2 • Matching 2 • Random Selection Step: demands are all 4 • Matching 3 • Random Selection Step: demands are all 8
Example: The Final Solution • M_i = Total cost of edges in i-th matching • C_T(f) = Σ_i M_i f(2^(i-1)) • [Figure: the final tree, with edges labeled by the demands 1, 2, 4, 8 they carry]
Bounding the Cost: Matching Step • C_i*(f) = Cost of optimal tree for function f in the residual problem after i iterations • Claim: Matching cost in step i = M_i · f(2^(i-1)) ≤ C_i*(f) • [Figure: optimal aggregation tree for six sources (t1, t2, t3, t4, t5, t6) and the sink]
Bounding the Cost: Random Selection • Consider any edge e in the optimum aggregation tree for function f • Let k(e) be the number of sources which use e • Focus on the "Random selection" step for one matched pair (u,v) • k'(e) = total demand routed on edge e after this step • For each of u and v, the demand is doubled with probability ½ and becomes 0 otherwise => E[k'(e)] = k(e) • By Jensen's inequality: E[f(k'(e))] ≤ f(E[k'(e)]) = f(k(e))
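The expectation argument above can be written out in one place. The symbols d_w (demand at source w before the step), d'_w (demand after the step) and path(w) (the route from w to the sink in the optimum tree) are notation introduced here for illustration; they do not appear on the slides.

```latex
\begin{align*}
\mathbb{E}[d'_w] &= \tfrac12 \cdot 2d_w + \tfrac12 \cdot 0 = d_w
  && \text{for each matched source } w, \\
\mathbb{E}[k'(e)] &= \sum_{w \,:\, e \in \mathrm{path}(w)} \mathbb{E}[d'_w]
  = \sum_{w \,:\, e \in \mathrm{path}(w)} d_w = k(e)
  && \text{by linearity of expectation}, \\
\mathbb{E}\bigl[f(k'(e))\bigr] &\le f\bigl(\mathbb{E}[k'(e)]\bigr) = f(k(e))
  && \text{by Jensen's inequality, since } f \text{ is concave}.
\end{align*}
```

Multiplying by c(e) and summing over the edges of the optimum tree suggests why the expected cost of the residual problem does not increase.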
A Bound for Problem R1 • Residual cost of opt. soln. is a super-martingale • E[C_i*(f)] ≤ C*(f) • Expected matching cost in each matching step ≤ E[C_i*(f)] ≤ C*(f) • In each matching step, the number of sources goes down by half => 1 + log K matching steps => Theorem 1: E[R_T(f)] ≤ 1 + log K • Marginal improvement of O(log log K) over Bartal's embeddings for this problem • Not very interesting
Bound for Problem R2 • Atomic aggregation functions • A_i(x) = min{x, 2^i} • A_i is a canonical aggregation function • Main Idea: Suffices to study the performance of our algorithm just for the atomic functions • Details complicated; Omitted • [Figure: plot of A_i(x) against x, level at 2^i]
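As a quick sanity check of the claim that each A_i is a canonical aggregation function, the sketch below tests the three defining properties (f(0) = 0, non-decreasing, concave) on integer arguments. The helper names and the tested range are assumptions made for illustration.

```python
def atomic(i):
    """Atomic aggregation function A_i(x) = min(x, 2**i)."""
    return lambda x: min(x, 2 ** i)


def is_canonical_on(f, max_k):
    """Check f(0) = 0, non-decreasing, and concave (non-increasing increments) on 0..max_k."""
    values = [f(k) for k in range(max_k + 1)]
    steps = [b - a for a, b in zip(values, values[1:])]
    return (values[0] == 0
            and all(s >= 0 for s in steps)
            and all(later <= earlier for earlier, later in zip(steps, steps[1:])))


print(all(is_canonical_on(atomic(i), 64) for i in range(6)))
print(is_canonical_on(lambda k: k * k, 8))   # a convex function fails the concavity test
```

Note that A_0 matches the multicast cost (1 for any positive number of streams), while on 0..K any A_i with 2^i ≥ K coincides with the unicast cost f(k) = k.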
Open Algorithmic Problems • Multicommodity version (many source-sink pairs) • Preliminary Progress: Can obtain O(log n log K log log n) guarantee using Bartal's embeddings combined with our analysis • Lower Bounds? • Conjecture: Ω(log K) • Handle arbitrary demands at sinks • Our algorithm yields 1 + log K + log D guarantee for problem R2 where D is the maximum demand
Open Modeling Problems • Realistic models of more general aggregation functions • Information cancellation • One node senses a pest infestation and sends an alarm. • Another node senses high pesticide levels in the atmosphere, and sends another alarm. • An intermediate node might receive both pieces of information and suppress both alarms. • Amount of aggregation may depend on the set of nodes being aggregated rather than just the number • Concave function f(S_e) as opposed to f(k_e) • Bartal's algorithm still gives an O(log K log log K) guarantee for problem R1
Moral • Why settle for one cost function when you can approximate them all? • Argument against approximate modeling of aggregation functions • Particularly useful for poorly understood or inherently multi-criteria problems • "Information independent" aggregation