This work presents an accelerated gradient method for multi-agent planning in factored Markov Decision Processes (MDPs). We seek an efficient, distributed solver in which each agent optimizes its individual objective subject to its own constraints and to piece-wise linear constraints on shared resources. Using Lagrangian relaxation of the shared constraints, each agent's subproblem becomes an MDP with a linear reward that admits a fast solver, value iteration. We then introduce FISTA for factored MDPs by augmenting the linear objective with causal entropy regularization, a strongly convex term that also pushes policies toward uniformity, retaining a fast individual planner (softmax value iteration) at the cost of a smoothing error. Our findings show that the gain in convergence can outweigh this smoothing error.
An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs
Sue Ann Hong, Geoff Gordon
Carnegie Mellon University
Multi-agent planning
• Each agent optimizes its individual objective subject to its individual constraints
• Agents are coupled through shared constraints on shared resources
Factored MDPs [Guestrin et al., 2002]
• Want: an efficient, distributed solver
• Shared constraints: piece-wise linear constraints on shared resources
• Individual objective and constraints: each agent's problem is an MDP that maximizes a linear reward
• Fast solver: value iteration (a minimal sketch follows below)
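To make the fast individual solver concrete, here is a minimal value-iteration sketch for a single agent's MDP with a tabular linear reward. The array names and shapes (P, R) are illustrative assumptions, not the authors' implementation.

```python
# Minimal value-iteration sketch (illustrative shapes, not the poster's code).
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P: transitions of shape (A, S, S); R: rewards of shape (S, A)."""
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # value function and greedy policy
        V = V_new
```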
Distributed optimization: Lagrangian relaxation
• Put a price on each shared resource (e.g., Resource 1 @ $100); each agent then plans independently given the prices, so the coupled problem is solved in a distributed fashion
• How to set the prices? Gradient-based methods (sketched below)
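As a rough illustration of the price-setting loop, the sketch below runs a plain dual (sub)gradient update on the resource prices. The agent.plan(prices) call is a hypothetical stand-in for each agent solving its own MDP under the current prices and reporting its expected resource usage.

```python
# Sketch of Lagrangian price updates; agent.plan(prices) is hypothetical.
import numpy as np

def price_updates(agents, capacities, eta=0.1, iters=200):
    prices = np.zeros_like(capacities, dtype=float)
    for _ in range(iters):
        # distributed step: every agent plans independently given current prices
        demand = sum(agent.plan(prices) for agent in agents)
        # dual (sub)gradient step: raise prices on over-used resources,
        # lower them otherwise, keeping prices nonnegative
        prices = np.maximum(0.0, prices + eta * (demand - capacities))
    return prices
```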
FISTA for factored MDPs
• The objective is linear: augment it with a strongly convex function, causal entropy [Ziebart et al., 2010]
• Causal entropy is usually used as regularization towards a more uniform policy
• Retains a fast individual planner (softmax value iteration)
• Introduces a smoothing error (to the linear objective)
• We show that the gain in convergence can outweigh the approximation (smoothing) error (see the sketch below)
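The sketch below illustrates the two ingredients named above under assumed tabular shapes: softmax (entropy-smoothed) value iteration as the individual planner, and FISTA-style accelerated updates on the nonnegative resource prices. The dual_grad callback and Lipschitz estimate L are placeholders, not the poster's exact formulation.

```python
# Softmax value iteration + FISTA-style price updates (illustrative sketch).
import numpy as np

def softmax_value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Replace the max backup with log-sum-exp; the greedy policy becomes softmax."""
    S, A = R.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        m = Q.max(axis=1, keepdims=True)
        V_new = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, np.exp(Q - V_new[:, None])  # softmax policy pi(a | s)
        V = V_new

def fista_prices(dual_grad, L, lam0, iters=100):
    """Accelerated projected-gradient (FISTA) steps on nonnegative prices lam;
    dual_grad(lam) returns the gradient of the smoothed dual."""
    lam, y, t = lam0.copy(), lam0.copy(), 1.0
    for _ in range(iters):
        lam_next = np.maximum(0.0, y - dual_grad(y) / L)  # projected 1/L step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = lam_next + ((t - 1.0) / t_next) * (lam_next - lam)
        lam, t = lam_next, t_next
    return lam
```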