Software Multiagent Systems: Lecture 10
In this lecture, we explore Distributed Constraint Optimization Problems (DCOP) and the Branch and Bound search method within multiagent systems. We discuss the unique challenges of asynchronous backtracking, the prioritization of agents, and strategies for efficient solution revisiting. Key concepts include weak backtracking, lower bounds, and the significance of agent communication in achieving optimal solutions. By understanding these dynamics, students will gain insights into enhancing collaboration among autonomous agents and tackling complex decision-making problems.
Software Multiagent Systems: Lecture 10 Milind Tambe University of Southern California tambe@usc.edu
Announcements • From now on, slides will be posted on our class web site (password: teamcore) • Homework answers will be sent out by email next week
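DCOP Definition • Variables {x1, x2, …, xn} distributed among agents • Domains D1, D2, …, Dn • Link functions $f_{ij}: D_i \times D_j \to \mathbb{N}$ • Find assignment $A^*$ such that $F(A^*)$ is minimal, where $F(A) = \sum f_{ij}(d_i, d_j)$ with $x_i \leftarrow d_i$, $x_j \leftarrow d_j$ in $A$ [Table: link costs f(di,dj) of 1, 2, 2, 0 for the four value pairs; the value labels were graphical and did not survive extraction] [Figure: three example assignments of x1, x2, x3, x4 on the constraint graph, with total costs 0, 4, and 7]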
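The objective F(A) can be made concrete with a small sketch. The transcript lost both the constraint graph and the value labels of the cost table, so the four links and the pairing of the costs 1, 2, 2, 0 with value pairs below are assumptions chosen for illustration; the values 0 and 1 stand in for the two graphical values on the slide.

```python
from itertools import product

# Assumed constraint graph: the transcript does not list the links.
LINKS = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x2", "x4")]

# Assumed pairing of the four listed costs (1, 2, 2, 0) with value pairs.
COST = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 0}

VARS = ["x1", "x2", "x3", "x4"]


def total_cost(assignment):
    """F(A): sum of link costs f_ij(d_i, d_j) over all links."""
    return sum(COST[(assignment[i], assignment[j])] for i, j in LINKS)


def brute_force_optimum(variables, domain=(0, 1)):
    """Enumerate every complete assignment and return a minimum-cost one."""
    best = None
    for values in product(domain, repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if best is None or total_cost(candidate) < total_cost(best):
            best = candidate
    return best
```

Under these assumptions, the all-1 assignment costs 0 and the all-0 assignment costs 4, which is consistent with two of the three costs shown in the figure. Exhaustive enumeration is exponential in the number of variables and is only meant to pin down what "F(A*) is min" means.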
Branch and Bound Search • Familiar with branch and bound search?
Synchronous Branch and Bound (Hirayama97) • Agents prioritized into a chain • Choose value, send partial solution (with cost) to child • When cost exceeds upper bound, backtrack • Agent explores all its values before reporting to parent • Concurrency? Asynchrony? [Figure: the chain x1, x2, x3, x4 shown over successive steps, annotated with partial-solution costs 0, 1, 3 and 4 = UB]
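A sequential simulation can illustrate the chain-ordered search described above. This is a minimal sketch, not Hirayama's distributed protocol: one recursive call stands in for "send partial solution to child", and the links, cost table, and chain order are assumptions because the transcript does not preserve them.

```python
import math

# Assumptions for illustration: chain order, links, and cost pairing.
ORDER = ["x1", "x2", "x3", "x4"]
LINKS = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x2", "x4")]
COST = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 0}


def added_cost(partial, var, value):
    """Cost of links between `var` and variables already assigned in the chain."""
    cost = 0
    for i, j in LINKS:
        if j == var and i in partial:
            cost += COST[(partial[i], value)]
        elif i == var and j in partial:
            cost += COST[(value, partial[j])]
    return cost


def sync_bnb(depth=0, partial=None, cost_so_far=0, best=(math.inf, None)):
    """Return (upper bound, assignment) after exploring the chain depth-first."""
    if partial is None:
        partial = {}
    if depth == len(ORDER):
        # Complete assignment that survived pruning: it is the new upper bound.
        return (cost_so_far, dict(partial))
    var = ORDER[depth]
    for value in (0, 1):
        cost = cost_so_far + added_cost(partial, var, value)
        if cost >= best[0]:
            continue  # cannot improve on the upper bound: backtrack
        partial[var] = value
        best = sync_bnb(depth + 1, partial, cost, best)
        del partial[var]
    return best
```

With these assumed inputs the first complete assignment found has cost 4, which becomes the initial upper bound, and later branches are cut as soon as their partial cost reaches it. Only one agent is active at a time, which is the lack of concurrency the slide asks about.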
DCOP before ADOPT • Branch and Bound: backtracks when cost exceeds the upper bound; problem: sequential, synchronous • Asynchronous Backtracking: backtracks when a constraint is unsatisfiable; problem: only hard constraints allowed • Observation: both approaches backtrack only when sub-optimality is proven
Adopt: Idea #1 • Weak backtracking: backtrack when the lower bound gets too high • Why lower bounds? They allow asynchrony, yet still allow quality guarantees • Downside? Backtracking happens before sub-optimality is proven, so solutions can't be thrown away; they need to be revisited!
Adopt: Idea #2 • Solutions need revisiting • How could we do that? • Remember all previous solutions • Efficient reconstruction of abandoned solutions
Adopt Overview • Agents are ordered in a DFS tree • Constraint graph need not be a tree [Figure: agents x1, x2, x3, x4 arranged in a DFS tree]
Adopt Overview • Agents concurrently choose values • VALUE messages sent down • COST messages sent up, only to parent • THRESHOLD messages sent down, only to child [Figure: constraint graph over x1, x2, x3, x4 with VALUE, COST, and THRESHOLD message arrows]
Asynchronous, concurrent search • Each variable has two values: b and w • Each value is initialized with a lower bound of 0 [Figure: constraint graph over x1, x2, x3, x4]
Asynchronous, concurrent search • Agents concurrently choose values and send them to descendants • Agents concurrently report local costs, with context (e.g. x3 sends cost 2 with context x1=b, x2=b) • x1 switches to a “better?” value • x2 and x3 switch to their best values and report cost, with context • x2 disregards x3's report (context mismatch) • … until the optimal solution is reached [Figure: successive snapshots of the x1, x2, x3, x4 constraint graph annotated with reported costs]
Asynchronous, concurrent search: Algorithm • Agents are prioritized into a tree • Each agent initializes the lower bounds of its values to zero • Agents concurrently choose values and send them to all connected descendants • Each agent chooses the best value given what its ancestors chose and immediately sends a cost message to its parent, where cost = lower bound + cost with ancestors • Costs reach the parent asynchronously, so each cost message carries the context it was computed under
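The per-agent computation in this algorithm can be sketched as follows. The function names, the `child_lb` bookkeeping structure, and the cost pairing are hypothetical choices for illustration; message passing itself is left out, and 0/1 stand in for the slide's two values.

```python
# Assumed pairing of the listed costs with value pairs.
COST = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 0}


def compatible(ctx_a, ctx_b):
    """Contexts are compatible when they agree on every variable they share."""
    return all(ctx_b[v] == d for v, d in ctx_a.items() if v in ctx_b)


def cost_with_ancestors(value, context, linked_ancestors):
    """Sum of link costs between this agent's value and its ancestors' values."""
    return sum(COST[(context[a], value)] for a in linked_ancestors if a in context)


def best_value(domain, context, linked_ancestors, child_lb):
    """Return (value, cost) minimizing lower bound + cost with ancestors.

    child_lb maps value -> {child: lower bound reported under a compatible
    context}; missing entries count as the initial lower bound of zero.
    """
    def cost(value):
        reported = sum(child_lb.get(value, {}).values())
        return reported + cost_with_ancestors(value, context, linked_ancestors)

    choice = min(domain, key=cost)
    return choice, cost(choice)
```

A receiving agent would call `compatible` on the context attached to a cost message and its own current context, and discard the report on a mismatch, which is how an agent can disregard a stale report from a child.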
Weak Backtracking • Suppose parent has two values, “white” and “black” • Explore “white” first: LB(w) = 0, LB(b) = 0 • Receive cost msg: LB(w) = 2, LB(b) = 0 • Now explore “black”: LB(w) = 2, LB(b) = 0 • Receive cost msg: LB(w) = 2, LB(b) = 3 • Go back to “white”: LB(w) = 2, LB(b) = 3 • … • Termination condition true: LB(w) = 10 = UB(w), LB(b) = 12
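The white/black walkthrough can be replayed with a small sketch. It assumes the rule "always move to a value with the smallest lower bound, staying put on ties" and a termination test read off the last panel; both function names are hypothetical.

```python
def choose_value(lower_bounds, current):
    """Keep `current` while no other value has a smaller lower bound."""
    best = min(lower_bounds, key=lower_bounds.get)
    if lower_bounds[current] == lower_bounds[best]:
        return current
    return best


def can_terminate(lower_bounds, upper_bounds, current):
    """Assumed test: the current value's bounds meet and no other value has a smaller lower bound."""
    return (lower_bounds[current] == upper_bounds[current]
            and lower_bounds[current] == min(lower_bounds.values()))
```

For example, with LB(w) = 2 and LB(b) = 0 the agent leaves white for black even though white has not been proven sub-optimal, and with LB(w) = 2 and LB(b) = 3 it returns to white.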
Key Lemma for soundness/correctness • Lemma: Assuming no context change, an agent's report of cost is non-decreasing and is never greater than the actual cost. • Inductive proof sketch: Leaf agents never overestimate cost. Each agent sums the costs from its children, chooses its best choice, and reports to its parent. • Example: x2 receives costs from its children and computes a total cost of 2 + 1 + 2 = 5, but 5 is an OVERestimate! Instead, x2 switches to an unexplored value and reports the lower bound. [Figure: snapshots of the x1, x2, x3, x4 graph annotated with costs]
Revisiting Abandoned Solutions • Problem: reconstructing from scratch is inefficient; remembering solutions is expensive • Solution: remember only lower bounds (polynomial space) and use lower bounds to efficiently re-search [Figure, chain ordering: parent with lower bound = 10 and threshold = 10 above its single child]
Revisiting Abandoned Solutions • Solution: remember only lower bounds (polynomial space) and use lower bounds to efficiently re-search • Suppose parent has two values, “a” and “b” • Explore “a” first, then explore “b”, then return to “a” [Figure: three parent/single-child panels, annotated with LB(a) = 10, LB(b) = 0; LB(a) = 10, LB(b) = 11; and threshold = 10]
Backtrack Thresholds • Agent i received threshold = 10 from parent • Explore “white” first: LB(w) = 0, LB(b) = 0, threshold = 10 • Receive cost msg: LB(w) = 2, LB(b) = 0, threshold = 10 • Stick with “white”: LB(w) = 2, LB(b) = 0, threshold = 10 • Receive more cost msgs: LB(w) = 11, LB(b) = 0, threshold = 10 • Now try “black”: LB(w) = 11, LB(b) = 0, threshold = 10 • Key point: don't change value until LB(current value) > threshold.
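The key point translates into a one-line guard. This is a minimal sketch of that rule only; `next_value` is a hypothetical name, and the way Adopt itself maintains and updates thresholds is not shown.

```python
def next_value(current, lower_bounds, threshold):
    """Stay on `current` until its lower bound exceeds the backtrack threshold."""
    if lower_bounds[current] > threshold:
        # Only now is it worth switching: move to the most promising value.
        return min(lower_bounds, key=lower_bounds.get)
    return current
```

With threshold = 10, the agent sticks with white at LB(w) = 2 even though LB(b) = 0 is smaller, and switches to black only once LB(w) reaches 11. Compared with pure weak backtracking, this avoids abandoning a partial solution that the parent already knows must cost at least 10.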
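Tree Ordering • Parent with lower bound = 10 and multiple children: which threshold (thresh = ?) should each child receive? • Idea: rebalance thresholds [Figure: at time T1 the two children have thresh = 5 and thresh = 5; at T2 one child reports cost = 6; at T3 the thresholds are rebalanced to 6 and 4]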
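The rebalancing idea can be sketched as a function that raises one child's threshold to its reported cost and takes the difference from its siblings, so the total allocated by the parent stays the same. The function name, the dictionary layout, and the floor of zero on sibling thresholds are simplifying assumptions for illustration.

```python
def rebalance(thresholds, child, reported_cost):
    """Return new per-child thresholds after `child` reports `reported_cost`."""
    new = dict(thresholds)
    deficit = reported_cost - new[child]
    if deficit <= 0:
        return new  # the report fits within the child's current threshold
    new[child] = reported_cost
    for other in new:
        if other == child:
            continue
        take = min(deficit, new[other])  # assumed floor: never go below zero
        new[other] -= take
        deficit -= take
        if deficit == 0:
            break
    return new
```

Starting from thresholds of 5 and 5, a reported cost of 6 from the first child yields 6 and 4, matching the numbers in the figure, and the sum remains 10.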
Evaluation of Speedups • Conclusions • Adopt's asynchrony and parallelism yield significant efficiency gains • Sparse graphs (density 2) are solved optimally and efficiently by Adopt.
Metric: Cycles • Cycle = one unit of algorithm progress in which all agents receive incoming messages, perform computation, and send outgoing messages • Independent of machine speed, network conditions, etc.
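A cycle counter can be sketched as a lock-step loop. This is a hypothetical simulator, not the one used for the reported experiments: each agent is modeled as a function from its incoming messages to a list of (recipient, message) pairs, and the run ends when no messages remain in flight.

```python
def run_cycles(agents, inboxes, max_cycles=100):
    """Count cycles until no agent has incoming messages.

    agents:  name -> step(messages) returning [(recipient, message), ...]
    inboxes: name -> list of messages waiting at the start
    """
    cycles = 0
    while any(inboxes.values()) and cycles < max_cycles:
        outgoing = {name: [] for name in agents}
        for name, step in agents.items():
            # Receive, compute, and send: one unit of progress per agent.
            for recipient, message in step(inboxes[name]):
                outgoing[recipient].append(message)
        inboxes = outgoing  # messages sent now are delivered next cycle
        cycles += 1
    return cycles


def relay(to):
    """Toy agent that forwards every message to `to`."""
    return lambda messages: [(to, m) for m in messages]


def sink(messages):
    """Toy agent that consumes messages and sends nothing."""
    return []
```

For instance, passing one token along a three-agent chain takes three cycles regardless of how long each agent's computation takes in wall-clock time, which is what makes the metric independent of machine speed.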
Number of Messages • Conclusion • Communication grows linearly • only local communication (no broadcast)
Bounded error approximation • Motivation: quality control for approximate solutions • Problem: user provides error bound b • Goal: find any solution S where cost(S) ≤ cost(optimal soln) + b • Adopt's ability to provide quality guarantees naturally leads to bounded error approximation! [Figure: root with lower bound = 10 and thresh = 10 + b]
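The guarantee follows from a short argument: a lower bound never exceeds the optimal cost, so any solution whose cost is at most lower bound + b is also at most optimal + b. A minimal sketch of that stopping test, with a hypothetical function name:

```python
def can_stop(lower_bound, upper_bound, b):
    """True when the best known solution is provably within b of optimal.

    lower_bound <= optimal cost, and upper_bound is the cost of a known
    solution, so upper_bound <= lower_bound + b implies
    upper_bound <= optimal + b.
    """
    return upper_bound <= lower_bound + b
```

With b = 0 the test reduces to lower bound = upper bound, which is the exact termination condition, so larger values of b simply allow the search to stop earlier.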
Evaluation of Bounded Error • Conclusion • Varying b is an effective method for doing time-to-solution/solution-quality tradeoffs.
Adopt summary – Key Ideas • First-ever optimal, asynchronous algorithm for DCOP • polynomial space at each agent • Weak Backtracking • lower bound based search method • Parallel search in independent subtrees • Efficient reconstruction of abandoned solutions • backtrack thresholds to control backtracking • Bounded error approximation • sub-optimal solutions faster • bound on worst-case performance
Discussion • Can we improve Adopt efficiency? • Can we allow n-ary constraints in Adopt? • Does Adopt preserve privacy? • What are some key applications of Adopt?
New Ideas for Efficiency • Communication Structure • Idea: Reach a solution faster if end-to-end messaging is shorter • Application: Shorter depth trees in ADOPT • Intelligent Preprocessing of Bounds • PASSUP heuristic: bounds via one-time message up the tree • PASSUP extended via a framework of several preprocessing heuristics
Performance (EAV) Orders of Magnitude Speedup!
OptAPO 2004 • OPTAPO
J. Davin, P. J. Modi, "Impact of Problem Centralization in Distributed Constraint Optimization Algorithms," Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2005.
Defining DCOP Centralization • Centralization: aggregating problem information into a single agent, where • the information was initially distributed among multiple agents, and • aggregation results in a larger local search space. • For example, constraints on external variables can be centralized.
Motivation • Adopt and OptAPO: • Adopt does no centralization. • OptAPO does partial centralization. • OptAPO completes in fewer cycles than Adopt for graph coloring • But cycles do not capture performance differences when algorithms use different levels of centralization.
Metric: Cycles • What is missing in measuring cycles?
Key Questions • How do we measure performance of DCOP algorithms that differ in their level of centralization? • How do Adopt and OptAPO compare when we use such a measure?
Results • Tested on graph coloring problems, |D|=3 (3-coloring). • # Variables = 8, 12, 16, 20, with link density = 2n or 3n. • 50 randomly generated problems for each size. • Conclusion: OptAPO takes fewer cycles, but more constraint checks. [Charts: CCC; Cycles]
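A test set of this shape could be generated roughly as follows. This is a hypothetical generator, not the one used in the reported experiments: it assumes "link density = 2n or 3n" means the number of links is 2 or 3 times the number of variables, and it assumes a cost of 1 for each link whose endpoints share a color.

```python
import random
from itertools import combinations


def random_coloring_problem(n, density, seed=None):
    """Return (variables, links) with density * n distinct random links."""
    rng = random.Random(seed)
    variables = [f"x{i}" for i in range(1, n + 1)]
    all_pairs = list(combinations(variables, 2))
    # rng.sample raises ValueError if more links are requested than pairs exist.
    links = rng.sample(all_pairs, density * n)
    return variables, links


def conflicts(assignment, links):
    """Assumed cost: number of links whose endpoints have the same color."""
    return sum(assignment[i] == assignment[j] for i, j in links)
```

For n = 8 and density 3 this draws 24 of the 28 possible links. Passing a different `seed` for each of the 50 problems per size would make the set reproducible.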