Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents

Prasanna Velagapudi, Pradeep Varakantham, Paul Scerri, Katia Sycara

Presentation Transcript


  1. Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents
     Prasanna Velagapudi, Pradeep Varakantham, Paul Scerri, Katia Sycara
     D-TREMOR - AAMAS2011

  2. Motivation
     Example domains: Search & Rescue, Military C2, Convoy Planning, Disaster Response
     • 100s to 1000s of robots, agents, people
     • Complex, collaborative tasks
     • Dynamic, uncertain environment
     • Offline planning

  3. Motivation
     • Exploit three characteristics of these domains
       • Explicit interactions
         • Specific combinations of states and actions where effects depend on more than one agent
       • Sparsity of interactions
         • Many potential interactions could occur between agents
         • Only a few will occur in any given solution
       • Distributed computation
         • Each agent has access to local computation
         • A centralized algorithm has access to 1 unit of computation
         • A distributed algorithm has access to N units of computation

  4. Review: Dec-POMDP
     [Diagram: agents 1 and 2]
     • Joint transition
     • Joint reward
     • Joint observation

  5. Distributed POMDP with Coordination Locales [Varakantham, et al 2009]
     A coordination locale (CL) is defined by the following labeled components:
     • Nature of time constraint (e.g. affects only same-time, affects any future-time)
     • Time constraint
     • Relevant region of joint state-action space

  6. Distributed POMDP with Coordination Locales [Varakantham, et al 2009]
     [Two definitions of the form “CL = …”; the symbols did not survive in the transcript]

  7. D-TREMOR (extending TREMOR [Varakantham, et al 2009])
     • Task Allocation: decentralized auction
     • Local Planning: EVA POMDP solver
     • Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
     • Model Shaping: prioritized/randomized reward and transition shaping

  8. D-TREMOR: Task Allocation
     • Assign “tasks” using decentralized auction
       • Greedy, nearest allocation
     • Create local, independent sub-problem
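The greedy, nearest allocation named on this slide might look roughly like the sketch below. It assumes grid positions, Manhattan distance as the bid, and a single process standing in for the decentralized auction. `greedy_nearest_allocation` and its arguments are hypothetical names for illustration, not taken from the D-TREMOR implementation.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    """Grid distance used as the bid (an assumption for this sketch)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_nearest_allocation(
    agents: Dict[str, Cell], tasks: Dict[str, Cell]
) -> Dict[str, List[str]]:
    """Give each task to the agent that submits the smallest distance bid."""
    allocation: Dict[str, List[str]] = {name: [] for name in agents}
    for task, task_pos in sorted(tasks.items()):
        # Each agent could compute its own bid locally; the lowest bid wins.
        bids = {name: manhattan(pos, task_pos) for name, pos in agents.items()}
        # Ties are broken alphabetically by agent name.
        winner = min(sorted(bids), key=bids.get)
        allocation[winner].append(task)
    return allocation


if __name__ == "__main__":
    agents = {"a1": (0, 0), "a2": (5, 5)}
    tasks = {"v1": (1, 0), "v2": (4, 5)}
    print(greedy_nearest_allocation(agents, tasks))
```

Each agent's task list then defines its local, independent sub-problem.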

  9. D-TREMOR: Local Planning
     • Solve using off-the-shelf algorithm (EVA)
     • Result: locally-optimal policies

  10. D-TREMOR: Interaction Exchange
      • Find Pr_CLi and Val_CLi [Kearns 2002]:
        • Entered corridor in 95 of 100 runs: Pr_CLi = 0.95
        • No collision: +1; collision: -6; Val_CLi = -7
      • Send CL messages to teammates
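The two quantities on this slide can be estimated from sampled policy runs. The sketch below assumes Pr_CLi is the fraction of runs in which the agent entered the coordination locale, and Val_CLi is the mean sampled value with the interaction minus the mean value without it. `estimate_cl` and its parameter names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def estimate_cl(
    entered: List[bool],
    value_without: List[float],
    value_with: List[float],
) -> Tuple[float, float]:
    """Return (Pr_CLi, Val_CLi) estimated from sampled runs of the local policy."""
    # Frequency with which the sampled runs reached the coordination locale.
    pr_cl = sum(entered) / len(entered)
    # Change in local policy value if the interaction occurs.
    val_cl = mean(value_with) - mean(value_without)
    return pr_cl, val_cl


if __name__ == "__main__":
    # Numbers from the slide: 95 of 100 runs enter the corridor,
    # value +1 without a collision and -6 with one.
    runs = [True] * 95 + [False] * 5
    print(estimate_cl(runs, value_without=[1.0], value_with=[-6.0]))
```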

  11. D-TREMOR: Model Shaping
      • Shape local model rewards/transitions based on interactions
      [Equation labels: probability of interaction; interaction model functions; independent model functions]
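One reading of the three labels on this slide is that each shaped reward or transition is a mixture of the interaction model and the independent model, weighted by the probability of interaction. The slide's equation did not survive in the transcript, so the mixture form below is an assumption, and `shape_value` and `shape_distribution` are hypothetical helper names.

```python
from typing import Dict


def shape_value(pr_cl: float, interaction: float, independent: float) -> float:
    """Assumed shaping rule: probability-weighted mix of the two model values."""
    return pr_cl * interaction + (1.0 - pr_cl) * independent


def shape_distribution(
    pr_cl: float,
    interaction: Dict[str, float],
    independent: Dict[str, float],
) -> Dict[str, float]:
    """Apply the same mix to a next-state distribution (state name -> probability)."""
    states = sorted(set(interaction) | set(independent))
    return {
        s: shape_value(pr_cl, interaction.get(s, 0.0), independent.get(s, 0.0))
        for s in states
    }


if __name__ == "__main__":
    # Reward for entering a corridor: -6 if a teammate is there, +1 otherwise.
    print(shape_value(0.5, -6.0, 1.0))
    # Transition: stay stuck on interaction, move into the corridor otherwise.
    print(shape_distribution(0.5, {"stuck": 1.0}, {"corridor": 1.0}))
```

A mixture of two valid probability distributions is itself a valid distribution, which is one reason this form is convenient for transition shaping.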

  12. D-TREMOR: Local Planning (again)
      • Re-solve shaped local models to get new policies
      • Result: new locally-optimal policies → new interactions

  13. D-TREMOR: Adv. Model Shaping
      • In practice, we run into three common issues faced by concurrent optimization algorithms:
        • Slow convergence
        • Oscillation
        • Local optima
      • We can alter our model-shaping to mitigate these by reasoning about the types of interactions we have

  14. D-TREMOR: Adv. Model Shaping
      • Slow convergence → Prioritization
        • Assign priorities to agents, only model-shape collision interactions for higher priority agents
        • Can quickly resolve purely negative interactions
        • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs
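A minimal sketch of the prioritization rule, assuming each CL message carries its sender's priority and that a larger number means higher priority. Under that reading, an agent shapes its model for a collision only when the message comes from a higher-priority agent, so the lower-priority agent is the one that yields. `CLMessage` and `messages_to_apply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CLMessage:
    sender_priority: int  # larger value = higher priority (assumed convention)
    kind: str             # e.g. "collision" or "debris"
    pr_cl: float
    val_cl: float


def messages_to_apply(my_priority: int, inbox: List[CLMessage]) -> List[CLMessage]:
    """Keep collision messages only from higher-priority senders; keep all others."""
    return [
        m for m in inbox
        if m.kind != "collision" or m.sender_priority > my_priority
    ]


if __name__ == "__main__":
    inbox = [
        CLMessage(3, "collision", 0.95, -7.0),  # from a higher-priority agent
        CLMessage(1, "collision", 0.50, -7.0),  # from a lower-priority agent
        CLMessage(1, "debris", 0.40, 2.0),      # not a collision, always kept
    ]
    for message in messages_to_apply(2, inbox):
        print(message)
```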

  15. D-TREMOR: Adv. Model Shaping
      • Oscillation → Probabilistic shaping
        • Often caused by time dynamics between agents
          • Agent 1 shapes based on Agent 2’s old policy
          • Agent 2 shapes based on Agent 1’s old policy
        • Each agent only applies model-shaping with probability δ [Zhang 2005]
        • Breaks out of cycles between agent policies
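The oscillation described on this slide can be reproduced in a toy setting. The sketch below assumes two agents that each pick one of two corridors and switch whenever the other agent's previous choice matches their own. With δ = 1 both switch together forever; with δ < 1 the symmetry eventually breaks. The scenario and the name `iterations_until_resolved` are illustrative assumptions, not part of D-TREMOR.

```python
import random
from typing import Optional


def iterations_until_resolved(
    delta: float, rng: random.Random, max_iters: int = 50
) -> Optional[int]:
    """Return the iteration at which the agents stop colliding, or None."""
    choice = [0, 0]  # both agents initially plan to use corridor 0
    for iteration in range(1, max_iters + 1):
        old = list(choice)  # each agent reacts to the other's old policy
        for agent in (0, 1):
            other = 1 - agent
            conflict = old[agent] == old[other]
            # Apply the shaping step (switch corridor) only with probability delta.
            if conflict and rng.random() < delta:
                choice[agent] = 1 - old[agent]
        if choice[0] != choice[1]:
            return iteration
    return None


if __name__ == "__main__":
    print(iterations_until_resolved(1.0, random.Random(0)))  # always shaping
    print(iterations_until_resolved(0.5, random.Random(0)))  # probabilistic shaping
```

In this toy model each iteration resolves the conflict with probability 2δ(1 − δ), which is zero at δ = 1 and largest at δ = 0.5.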

  16. D-TREMOR: Adv. Model Shaping
      • Local optima → Optimistic initialization
        • Agents cannot detect mixed interactions (e.g. debris)
          • Rescue agent policies can only improve if debris is cleared
          • Cleaner agent policies can only worsen if they clear debris
      [Speech bubbles: “I’m not going near the debris”; “I’m not clearing the debris”; “If no one is going through debris, I won’t clear it”]

  17. D-TREMOR: Adv. Model Shaping
      • Local optima → Optimistic initialization
        • Agents cannot detect mixed interactions (e.g. debris)
          • Rescue agent policies can only improve if debris is cleared
          • Cleaner agent policies can only worsen if they clear debris
        • Let each agent solve an initial model that uses an optimistic assumption of interaction condition

  18. Experimental Setup
      • D-TREMOR policies
        • Max-joint-value
        • Last iteration
      • Comparison policies
        • Independent
        • Optimistic
        • Do-nothing
        • Random
      • Scaling:
        • 10 to 100 agents
        • Random maps
      • Density:
        • 100 agents
        • Concentric ring maps
      • 3 problems/condition
      • 20 planning iterations
      • 7 time step horizon
      • 1 CPU per agent
      D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. (with some caveats)

  19. Experimental Datasets
      [Figures: Scaling Dataset; Density Dataset]

  20. Experimental Results: Scaling
      [Plot groups: D-TREMOR policies; naïve policies]

  21. Experimental Results: Density
      • D-TREMOR rescues the most victims (+10 ea.)
      • D-TREMOR does not resolve every collision (-5 ea.)

  22. Experimental Results: Time
      • Increase in time related to # of CLs, not # of agents
      [Plot label: # of CLs active]

  23. Conclusions
      • D-TREMOR: Decentralized planning for sparse Dec-POMDPs with many agents
      • Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
      • Empirical results in simulated search and rescue domain

  24. Future Work
      • Generalized framework for distributed planning under uncertainty through iterative message exchange
      • Optimality/convergence bounds
      • Reduce necessary communication
      • Better search over task allocations
      • Scaling to larger team sizes

  25. Questions?

  27. Motivation
      • Scaling planning to large teams is hard
        • Need to plan (with uncertainty) for each agent in team
        • Agents must consider the actions of a growing number of teammates
        • Full, joint problem has NEXP complexity [Bernstein 2002]
      • Optimality is going to be infeasible
      • Find and exploit structure in the problem
      • Make good plans in reasonable amount of time

  29. Experimental Results: Density
      • Do-nothing does the best?
      • Ignoring interactions = poor performance

  30. Experimental Results: Time
      • Why is this increasing?

  31. Related Work
      [Chart: planners placed along Scalability, Generality, and Optimality axes: D-TREMOR, Prioritized Planning, TREMOR, OC-Dec-MDP, Dynamic Networks, SPIDER, DPC, EDI-CR, TD-Dec-POMDP, Optimal Decoupling, JESP]
      Structured Dec-(PO)MDP planners
      • JESP [Nair 2003]
      • TD-Dec-POMDP [Witwicki 2010]
      • EDI-CR [Mostafa 2009]
      • SPIDER [Marecki 2009]
      • Restrict generality slightly to get scalability
      • High optimality

  32. Related Work
      Heuristic Dec-(PO)MDP planners
      • TREMOR [Varakantham 2009]
      • OC-Dec-MDP [Beynier 2005]
      • Sacrifice optimality for scalability
      • High generality

  33. Related Work
      Structured multiagent path planners
      • DPC [Bhattacharya 2010]
      • Optimal Decoupling [Van den Berg 2009]
      • Sacrifice generality further to get scalability
      • High optimality

  34. Related Work
      Heuristic multiagent path planners
      • Dynamic Networks [Clark 2003]
      • Prioritized Planning [Van den Berg 2005]
      • Sacrifice optimality to get scalability

  35. Related Work
      Our approach:
      • Fix high scalability and generality
      • Explore what level of optimality is possible

  36. A Simple Rescue Domain
      [Figure legend: Unsafe Cell, Rescue Agent, Clearable Debris, Narrow Corridor, Cleaner Agent, Victim]

  37. A Simple (Large) Rescue Domain

  38. Distributed POMDP with Coordination Locales (DPCL) [Varakantham, et al 2009]
      • Often, interactions between agents are sparse
      [Figure labels: “Only fits one agent”; “Passable if cleaned”]

  39. Distributed, Iterative Planning
      • Inspiration:
        • TREMOR [Varakantham 2009]
        • JESP [Nair 2003]
      • Reduce the full joint problem into a set of smaller, independent sub-problems
      • Solve independent sub-problems with local algorithm
      • Modify sub-problems to push locally optimal solutions towards high-quality joint solution

  40. Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
      • Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
      • Solve independent sub-problems with existing state-of-the-art algorithms
      • Modify sub-problems such that the locally optimal solution approaches a high-quality joint solution
      Phases: Task Allocation, Local Planning, Interaction Exchange, Model Shaping
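The four phases on this slide can be read as one iterative loop: allocate once, then repeatedly plan, exchange, and shape. The sketch below runs that loop in a single process with each phase passed in as a callable. The function name `d_tremor_loop`, the callable signatures, and the sequential execution are all assumptions, since in the described system each agent runs on its own CPU.

```python
from typing import Any, Callable, Dict, List


def d_tremor_loop(
    agents: List[str],
    allocate: Callable[[List[str]], Dict[str, Any]],
    plan: Callable[[Any], Any],
    exchange: Callable[[Dict[str, Any]], Dict[str, List[Any]]],
    shape: Callable[[Any, List[Any]], Any],
    iterations: int = 20,
) -> Dict[str, Any]:
    """Task allocation, then repeated local planning, exchange, and shaping."""
    models = allocate(agents)  # one independent local model per agent
    policies: Dict[str, Any] = {}
    for _ in range(iterations):
        # Local planning: every agent solves only its own model.
        policies = {a: plan(models[a]) for a in agents}
        # Interaction exchange: CL messages addressed to each agent.
        inbox = exchange(policies)
        # Model shaping: fold received messages into the local model.
        models = {a: shape(models[a], inbox.get(a, [])) for a in agents}
    return policies


if __name__ == "__main__":
    # Stub phases: a "model" is just a number that shaping increments.
    result = d_tremor_loop(
        agents=["a1", "a2"],
        allocate=lambda names: {n: 0 for n in names},
        plan=lambda model: model,
        exchange=lambda policies: {n: [1] for n in policies},
        shape=lambda model, msgs: model + sum(msgs),
        iterations=3,
    )
    print(result)
```

The default of 20 iterations matches the planning-iteration count listed in the experimental setup.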

  41. D-TREMOR (extending [Varakantham, et al 2009])
      • Task Allocation: decentralized auction
      • Local Planning: EVA POMDP solver
      • Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
      • Model Shaping: prioritized/randomized reward and transition shaping

  44. D-TREMOR: Interaction Exchange
      Finding Pr_CLi [Kearns 2002]:
      • Evaluate local policy
      • Compute frequency of associated s_i, a_i
      • Example: entered corridor in 95 of 100 runs: Pr_CLi = 0.95

  45. D-TREMOR: Interaction Exchange
      Finding Val_CLi [Kearns 2002]:
      • Sample local policy value with/without interactions
        • Test interactions independently
      • Compute change in value if interaction occurred
      • Example: no collision: +1; collision: -6; Val_CLi = -7

  46. D-TREMOR: Interaction Exchange
      • Send CL messages to teammates
      • Sparsity → Relatively small # of messages

  47. D-TREMOR: Model Shaping
      • Shape local model rewards/transitions based on remote interactions
      [Equation labels: probability of interaction; interaction model functions; independent model functions]

  50. D-TREMOR: Adv. Model Shaping
      • Slow convergence → Prioritization
        • Majority of interactions are collisions
        • Assign priorities to agents, only model-shape collision interactions for higher priority agents
        • From DPP: prioritization can quickly resolve collision interactions
        • Similar properties for any purely negative interaction
        • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs
