CS584 - Software Multiagent Systems
Lecture 12
Distributed constraint optimization II: Incomplete algorithms and recent theoretical results
Distributed Constraint Optimization
• DCOP (Distributed Constraint Optimization Problem)
  • Agents cooperate to assign values to variables, subject to constraints, to maximize a global objective.
  • R(a) = ∑_S R_S(a), where the sum runs over all constraints S
• Example domains: personal assistant agents for scheduling, sensor networks, multi-spacecraft coordination
University of Southern California
Algorithms for DCOP
• Complete algorithms
  • Pro: find the optimal solution
  • Con: not always feasible (exponential in time or space)
• Local algorithms (today)
  • Pro: usually find a high-quality solution quickly
  • Con: not optimal (but can guarantee a solution within a percentage of optimal)
k-optimality
• k-optimal solution (k-optimum)
  • No deviation by ≤ k agents can increase solution quality
  • A local optimum
• Example (three binary variables):
  • Globally optimal solution: 000
  • 1-optimal solutions: 000, 111
• k-optimal algorithms: DSA (k=1), MGM-2 (k=2), ...
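The definition can be checked by brute force on a tiny problem. The sketch below assumes a hypothetical three-agent chain x1 - x2 - x3 with made-up rewards (the slide's actual reward tables are not available); the rewards were chosen so that the result matches the slide: 000 is the global optimum, and 000 and 111 are the 1-optima. The names `REWARD`, `is_k_optimal`, and `k_optima` are illustrative only.

```python
from itertools import combinations, product

# Hypothetical rewards (NOT taken from the slide): chain x1 - x2 - x3,
# binary values, same reward table on both constraints.
EDGES = [(0, 1), (1, 2)]
REWARD = {(0, 0): 10, (1, 1): 5, (0, 1): 0, (1, 0): 0}
N = 3


def total_reward(a):
    """R(a): sum of the rewards on all constraints."""
    return sum(REWARD[(a[i], a[j])] for i, j in EDGES)


def is_k_optimal(a, k):
    """True if no group of at most k agents can change values and raise R."""
    base = total_reward(a)
    for size in range(1, k + 1):
        for group in combinations(range(N), size):
            for new_vals in product((0, 1), repeat=size):
                b = list(a)
                for agent, v in zip(group, new_vals):
                    b[agent] = v
                if total_reward(b) > base:
                    return False
    return True


def k_optima(k):
    """All assignments that are k-optimal, in lexicographic order."""
    return [a for a in product((0, 1), repeat=N) if is_k_optimal(a, k)]


print(k_optima(1))  # [(0, 0, 0), (1, 1, 1)]
print(k_optima(3))  # [(0, 0, 0)]
```

With k = n = 3 the only k-optimum left is the global optimum, which is what the definition implies.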
Approach
• Decompose the DCOP.
• Each agent sees only its local constraints.
• Agents maximize individual utility (each agent's individual view of the team utility).
1-optimal algorithms
• Algorithm run by each agent:
  • Broadcast current value to all neighbors and receive neighbors' values.
  • Find the new value that gives the highest gain in utility, assuming that neighbors stay fixed.
  • Decide whether or not to change value, and act accordingly.
MGM – Maximal Gain Message
• Monotonic algorithm, but gets stuck in a local optimum!
• Only one agent in a neighborhood moves at a time
[Figure: three-agent example (x1, x2, x3); numbers shown: 8, 10, 8, then 0, 0, 0]
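A minimal synchronous sketch of one way to realize MGM, under stated assumptions: the three-agent chain and its rewards are hypothetical (not the slide's figure), and ties between equal gains are broken in favor of the lower agent index, which is a choice made here for determinism. An agent moves only if its gain is positive and beats every neighbor's gain.

```python
# Hypothetical chain x1 - x2 - x3 with binary values (rewards are made up).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}
REWARD = {(0, 0): 10, (1, 1): 5, (0, 1): 0, (1, 0): 0}


def local_utility(agent, a):
    """Sum of rewards on the constraints that involve `agent`."""
    return sum(REWARD[(a[agent], a[nb])] for nb in NEIGHBORS[agent])


def best_move(agent, a):
    """Best unilateral change (gain, value), assuming neighbors stay fixed."""
    best_gain, best_val = 0, a[agent]
    for v in (0, 1):
        b = list(a)
        b[agent] = v
        gain = local_utility(agent, b) - local_utility(agent, a)
        if gain > best_gain:
            best_gain, best_val = gain, v
    return best_gain, best_val


def mgm_round(a):
    """One MGM round: exchange gains, only the neighborhood maximum moves."""
    moves = {i: best_move(i, a) for i in NEIGHBORS}
    new = list(a)
    for i, (gain, val) in moves.items():
        # Assumption: equal gains are resolved in favor of the lower index.
        wins = all((gain, -i) > (moves[nb][0], -nb) for nb in NEIGHBORS[i])
        if gain > 0 and wins:
            new[i] = val
    return tuple(new)


def run_mgm(a, max_rounds=20):
    """Repeat rounds until no agent moves; return the visited assignments."""
    history = [a]
    for _ in range(max_rounds):
        nxt = mgm_round(a)
        if nxt == a:
            break
        a = nxt
        history.append(a)
    return history


print(run_mgm((1, 0, 1)))  # [(1, 0, 1), (0, 0, 1), (0, 0, 0)]
print(run_mgm((1, 1, 1)))  # [(1, 1, 1)]  (stuck in a local optimum)
```

Because two neighbors never move in the same round, each move raises the global reward by exactly the mover's gain, which is where the monotonicity comes from. The second run shows the other bullet: starting at 111, no single agent can improve, so MGM stays there.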
DSA – Distributed Stochastic Algorithm
• One possible path (say p = 0.5):
[Figure: three-agent example (x1, x2, x3); numbers shown: 8, 10, 8 with .1, .2, .3; then 5, 16, 5 with .6, .1, .7; then 0, 0, 0]
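A minimal sketch of a DSA round, assuming the variant in which an agent with a strictly positive gain switches to its best value with probability p (other DSA variants also allow moves at zero gain). The chain and rewards are hypothetical, and `run_dsa` and its `seed` parameter are names introduced here.

```python
import random

# Hypothetical chain x1 - x2 - x3 with binary values (rewards are made up).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}
REWARD = {(0, 0): 10, (1, 1): 5, (0, 1): 0, (1, 0): 0}


def local_utility(agent, a):
    """Sum of rewards on the constraints that involve `agent`."""
    return sum(REWARD[(a[agent], a[nb])] for nb in NEIGHBORS[agent])


def best_move(agent, a):
    """Best unilateral change (gain, value), assuming neighbors stay fixed."""
    best_gain, best_val = 0, a[agent]
    for v in (0, 1):
        b = list(a)
        b[agent] = v
        gain = local_utility(agent, b) - local_utility(agent, a)
        if gain > best_gain:
            best_gain, best_val = gain, v
    return best_gain, best_val


def dsa_round(a, p, rng):
    """Every agent with a positive gain moves independently with prob. p."""
    new = list(a)
    for i in NEIGHBORS:
        gain, val = best_move(i, a)
        if gain > 0 and rng.random() < p:
            new[i] = val
    return tuple(new)


def run_dsa(a, p=0.5, rounds=200, seed=0):
    """Run a fixed number of rounds with a seeded generator."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = dsa_round(a, p, rng)
    return a


# With p = 1 neighbors move simultaneously and quality can drop:
# 011 has reward 5, but x1 and x2 both move and the result has reward 0.
print(dsa_round((0, 1, 1), 1.0, random.Random(0)))  # (1, 0, 1)
```

Unlike MGM, neighbors may move in the same round, so solution quality is not monotonic. On this small example a run with p = 0.5 ends, with overwhelming probability, in one of the two 1-optima (000 or 111).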
Experimental Domains
• Regular graph coloring (~sensor network)
  • Cost if neighbors choose the same value.
• Randomized DCOP
  • Each combination of neighbors' values gets a uniform random reward.
• High-stakes scenario (~UAVs)
  • Large cost if neighbors choose the same value.
  • Otherwise, a small uniform random reward is given.
  • Add a "safe" value where all agents start. No reward or penalty if neighbors choose this value.
DSA vs. MGM
• Graph coloring and randomized DCOP:
  • DSA gives higher solution quality than MGM.
  • DSA improves more quickly than MGM.
• High-stakes scenario:
  • DSA and MGM give the same solution quality.
  • MGM generally improves more quickly than DSA.
• But these graphs are averages...
DSA vs. MGM
• MGM's solution quality increases monotonically
• Much better as an anytime algorithm and for high-stakes domains
Algorithms with higher k
• Until now (DSA, MGM), agents have acted based only on their own local constraints
  • a myopic worldview
• Now we look at algorithms where agents form groups and act based on all constraints in the group
  • enlarging the worldview
• First step: groups of 2
  • "2-optimality"
  • Maheswaran, Pearce, and Tambe '04
Coordinated Algorithms
• Each agent randomly takes the role of offerer or receiver, governed by a probability q.
• Offerers:
  • Pick a neighbor j at random, and calculate my gains from all combinations of values for myself and j.
  • Send this information (several offers) as a message to j:
    < <myGain1, myValue1, yourValue1>, <myGain2, myValue2, yourValue2>, … >
• Receivers:
  • Accept the offer that makes my group's gain the highest, or just move alone instead.
  • groupGain = offerersGain + receiversGain - gain in common link
  • If I accept an offer, tell the offerer which one I am accepting, and how much our group will gain.
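The groupGain formula subtracts the common link once because both the offerer's and the receiver's local gains include it. A small sketch, again on a hypothetical three-agent chain with made-up rewards (`group_gain` and its parameters are illustrative names):

```python
# Hypothetical chain x1 - x2 - x3 with binary values (rewards are made up).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}
REWARD = {(0, 0): 10, (1, 1): 5, (0, 1): 0, (1, 0): 0}


def local_utility(agent, a):
    """Sum of rewards on the constraints that involve `agent`."""
    return sum(REWARD[(a[agent], a[nb])] for nb in NEIGHBORS[agent])


def group_gain(offerer, receiver, a, new_off, new_rec):
    """groupGain = offerersGain + receiversGain - gain in common link."""
    b = list(a)
    b[offerer], b[receiver] = new_off, new_rec
    offerers_gain = local_utility(offerer, b) - local_utility(offerer, a)
    receivers_gain = local_utility(receiver, b) - local_utility(receiver, a)
    # The shared constraint is counted in both local gains; remove one copy.
    common = (REWARD[(b[offerer], b[receiver])]
              - REWARD[(a[offerer], a[receiver])])
    return offerers_gain + receivers_gain - common


# From 110, x1 (offerer) and x2 (receiver) jointly move to 0, 0:
# offerer gains 5, receiver gains 15, common link gains 5.
print(group_gain(0, 1, (1, 1, 0), 0, 0))  # 15
```

The result, 15, equals the change in total reward from 110 (reward 5) to 000 (reward 20), so the receiver can compare offers using only information local to the pair.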
2-optimal algorithms
• To improve solution quality, agents can form groups of 2
• Groups move according to group utility
  • sum of all constraints on any group member
• 2-optimal algorithm
  • any connected group of 2 agents can coordinate to make a joint move
• 2-optimum
  • state at which no group of up to 2 agents can make a joint move that will increase group reward
• Any 2-optimum is also a 1-optimum
MGM-2
• Form groups of 2 agents and then do:
  • Send my gain (can be group gain) to all my neighbors.
  • Receive gain messages from my neighbors.
  • If I am involved in an accepted offer:
    • If my gain > neighbors' gain (not counting my partner), send "yes" to my partner.
    • If not, then send "no" to my partner.
    • If I sent "yes" and got "yes", then make the move in the offer.
  • If I am not involved in an offer:
    • If my gain > neighbors' gain, then make my best move.
• 5 message cycles per move (offer, accept, gain, confirm, move)
• Monotonically increasing solution quality
MGM-2 Example
[Figure: three-agent example (x1, x2, x3) with offerer and receiver roles. Labels shown: "x1, x2 gain=7", "x2, x3 gain=7", "x2, x3 gain=12"; "accepts x1, x2 group gain=2"; "accepts x2, x3 group gain=12"; final panel "no gains".]
SCA-2 (Stochastic Coordination Algorithm)
• Based on DSA
• If offerer:
  • Send out offers to a randomly chosen neighbor.
  • If offer accepted, prepare to do the move in the offer.
  • If offer not accepted, prepare to move alone (pick the move with the highest individual gain).
• If receiver:
  • If accepting an offer, send an acceptance message back to the offerer, and prepare to do the move in the offer.
  • Else, prepare to move alone.
• Move, according to probability p.
• 3 message cycles per move (offer, accept, move)
Experimental Trends
[Figure: two panels, "Monotonic (1-opt, 2-opt)" and "Stochastic (1-opt, 2-opt)"]
Guarantees on Solution Quality
• Guarantee on the quality of a k-optimum as a % of the global optimum
• Factors:
  • k (how local an optimum)
  • m (maximum arity of constraints)
  • n (number of agents)
  • constraint graph structure (if known)
• Note: actual costs/rewards on constraints
  • distributed among agents, not known a priori
Guarantees on Solution Quality
• Three results, giving guarantees for:
  • Fully connected DCOP graphs
    • Applies to all graphs (i.e., when the graph is unknown)
    • Closed-form equation
  • Particular graphs
    • Stars
    • Rings
    • Closed-form equation
  • Arbitrary DCOP graphs
    • Linear program
Fully-Connected Graph
• Reward of k-optimum in terms of global optimum
  • Independent of rewards
  • Independent of domain size
  • Provably tight (in paper)
• One assumption: rewards are non-negative
• For binary graph (m=2): [equation not recoverable from the slide text]
Proof sketch / example
• Fully connected graph: n = 5 agents, m = 2 (binary constraints), k = 3
• a* = 11111 (global opt), a = 00000 (3-opt)
• Goal: express R(a) in terms of R(a*).
• a dominates: Â = {11100, 11010, 11001, 10110, 10101, 10011, 01110, 01101, 01011, 00111}
• 10 R(a) ≥ ∑ R(â), summed over all â in Â
• 10 R(a) ≥ 3 R(a*) + 1 R(a)
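The counting step in this sketch can be written out explicitly. Each â in Â lets 3 of the 5 agents switch to their a* values. A constraint contributes at least R_S(a*) to R(â) when both of its agents deviate, at least R_S(a) when neither deviates, and at least 0 otherwise (rewards are non-negative). Each constraint has both agents inside the deviating group in C(3,1) = 3 of the 10 assignments and both outside in C(3,3) = 1 of them. The general binary form in the last line is obtained here by repeating the same count for arbitrary n and k; it is a derivation from the sketch, not a formula copied from the slide.

```latex
\begin{align*}
10\,R(a) \;\ge\; \sum_{\hat a \in \hat A} R(\hat a)
  &\;\ge\; 3\,R(a^*) + 1\,R(a)
  \quad\Longrightarrow\quad
  R(a) \;\ge\; \tfrac{3}{9}\,R(a^*) = \tfrac{1}{3}\,R(a^*), \\[4pt]
\binom{n}{k} R(a)
  &\;\ge\; \binom{n-2}{k-2} R(a^*) + \binom{n-2}{k} R(a)
  \quad\Longrightarrow\quad
  R(a) \;\ge\; \frac{\binom{n-2}{k-2}}{\binom{n}{k} - \binom{n-2}{k}}\,R(a^*)
  \;=\; \frac{k-1}{2n-k-1}\,R(a^*).
\end{align*}
```

For n = 5 and k = 3 the closed form gives (3 - 1)/(10 - 3 - 1) = 1/3, in agreement with the first line.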
Other graph types
• Ring and star: [equations not recoverable from the slide text]
• Similar analysis, but exploit graph structure
• Only consider Â where connected subsets of k agents deviate
Proof sketch / example
• Ring graph: n = 5 agents, m = 2 (binary constraints), k = 3
• a* = 11111 (global opt), a = 00000 (3-opt)
• Goal: express R(a) in terms of R(a*).
• a dominates: Â = {11100, 01110, 00111, 10011, 11001}
• 5 R(a) ≥ ∑ R(â), summed over all â in Â
• 5 R(a) ≥ 2 R(a*) + 1 R(a)
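The coefficients in both proof sketches (10, 3, 1 for the fully connected graph and 5, 2, 1 for the ring) can be reproduced by enumerating deviating groups and counting, for each constraint, how often both endpoints deviate and how often neither does. The sketch below is a hypothetical helper for binary constraints only. It takes the minimum count over constraints, which is exact for these symmetric graphs and still a valid, though possibly loose, bound otherwise; `counting_bound` is a name introduced here.

```python
from fractions import Fraction
from itertools import combinations


def complete_edges(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def ring_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]


def star_edges(n):
    return [(0, i) for i in range(1, n)]  # agent 0 is the center


def is_connected(group, edges):
    """True if `group` induces a connected subgraph."""
    group = set(group)
    start = next(iter(group))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for i, j in edges:
            for x, y in ((i, j), (j, i)):
                if x == u and y in group and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == group


def counting_bound(n, k, edges, connected_only=False):
    """Return (|groups|, c_in, c_out, bound) with
    |groups| * R(a) >= c_in * R(a*) + c_out * R(a)."""
    groups = [g for g in combinations(range(n), k)
              if not connected_only or is_connected(g, edges)]
    c_in = min(sum(1 for g in groups if i in g and j in g)
               for i, j in edges)
    c_out = min(sum(1 for g in groups if i not in g and j not in g)
                for i, j in edges)
    return len(groups), c_in, c_out, Fraction(c_in, len(groups) - c_out)


print(counting_bound(5, 3, complete_edges(5)))
# (10, 3, 1, Fraction(1, 3))
print(counting_bound(5, 3, ring_edges(5), connected_only=True))
# (5, 2, 1, Fraction(1, 2))
```

For the ring, restricting to connected groups improves the guarantee from 1/3 to 1/2. Running the same count on `star_edges(5)` with `connected_only=True` gives 6 groups, every one containing the center, and also a bound of 1/2.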
Arbitrary graph
• Arbitrary graph = linear program
• Minimize R(a)/R(a*) such that:
  • for all dominated assignments â, R(a) - R(â) ≥ 0.
• Each constraint S in the DCOP = 2 variables in the LP:
  • 1: R_S(a) for the reward on S in the k-optimal solution
  • 2: R_S(a*) for the reward on S in the global optimum
• All other rewards on S taken as 0 (as before). Why is this OK?
  • R(a) and R(a*) don't change
  • a is still k-optimal (no k agents would change)
  • a* is still globally optimal
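One way to write this program out, as a sketch under stated assumptions: let x_S stand for R_S(a) and y_S for R_S(a*), and let D range over the deviating groups of at most k agents (connected groups suffice when the graph is known). The ratio objective is made linear here by fixing R(a*) = 1, a standard normalization that is an assumption of this sketch and is not stated on the slide. With all other rewards taken as 0, R(â) for group D collects y_S from constraints lying entirely inside D and x_S from constraints lying entirely outside D.

```latex
\begin{align*}
\min_{x,\,y}\quad & \sum_{S} x_S \\
\text{s.t.}\quad
 & \sum_{S} y_S = 1, \\
 & \sum_{S} x_S \;-\; \Bigl(\,\sum_{S \subseteq D} y_S
     \;+\; \sum_{S \cap D = \emptyset} x_S \Bigr) \;\ge\; 0
   \qquad \text{for every deviating group } D, \\
 & x_S \ge 0, \quad y_S \ge 0
   \qquad \text{for every constraint } S.
\end{align*}
```

The optimal value is then the guaranteed fraction R(a)/R(a*) for that graph and that k.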
Experimental Results
• Designer can choose appropriate k or topology!
Conclusions
• Guarantees for k-optima in DCOPs as % of optimum
  • Despite not knowing constraint rewards
  • Helps choose algorithm to use
  • Helps choose topology to use
• Big idea:
  • Single agent: Rationality -> Bounded Rationality
  • Multi agent: Global Optimality -> k-Optimality
  • Ability to centralize information (coordinate) is bounded (only groups of k agents)
  • Guarantees on performance of "bounded coordination"
Readings
• J. P. Pearce and M. Tambe, "Quality Guarantees on k-Optimal Solutions for Distributed Constraint Optimization Problems," in IJCAI-07.
• R. T. Maheswaran, J. P. Pearce, and M. Tambe, "Distributed Algorithms for DCOP: A Graphical-Game-Based Approach," in PDCS-04.
  • (just read the algorithms; no need to read the proofs)
• W. Zhang, Z. Xing, G. Wang, and L. Wittenburg, "An Analysis and Application of Distributed Constraint Satisfaction and Optimization Algorithms in Sensor Networks," in AAMAS-03.