Coordination of heterogeneous teams in uncertain environments requiring real-time adaptation and human supervision. Includes tasks with different resources and unpredictable events, addressing complex control challenges with scalable algorithms.
Hierarchical mission control of automata with human supervision
Prof. David A. Castañon, Boston University
Problem of Interest
• Coordination of heterogeneous teams to accomplish tasks in uncertain, risky environments
• Vehicles with different capabilities, resources
  • Some resources are renewable (sensors), others are not
• Tasks are spatially distributed, require combinations of capabilities
• Successful completion of tasks not guaranteed
  • Likelihood of success depends on resources assigned
• Tasks arrive, depart randomly
  • Task types may be unknown until observed
• Vehicles may fail randomly, depending on trajectories
• Key aspect: real-time adaptation to events
• Human supervision
  • Determine task priority/value
  • Modify individual vehicle task assignments when desired
  • Determine specific vehicle schedules when desired
Experiment model
• Multiple robots search for and perform tasks at BU’s Mechatronics Lab
Why is this a hard problem?
• Uncertain environment and dynamics
  • Unknown targets
  • Uncertain effectiveness of sensing, actions
  • Requires a highly adaptive system, anticipative of and responsive to new information
  • Hedge against loss of assets, new arrivals, action failures, …
• Diverse set of vehicles with multiple capabilities
  • Dynamic role selection, ad hoc teaming
• Dual control problems: manage both information acquisition and action
  • Trade off search and sensing versus actions
  • Dynamic coupling of available capabilities to achieve desired effects
• Support and adapt to human control inputs
  • Goals, constraints, fixed decisions
  • Provide information to assess effects of changes
Classes of algorithms
• Operations Research
  • Deterministic and stochastic multi-vehicle task assignment and scheduling
    • Large vehicles, small tasks, limited cooperation, homogeneous activities
    • No risk, uncertainty limited to new task arrivals; departures independent of vehicle actions
  • Search theory and sensor management
  • Large-scale resource allocation and integer programming
• Stochastic Control
  • Control of stochastic queuing systems in communications
  • Single-vehicle routing and low-level vehicle trajectory control
  • Swarm control approaches with stability and performance guarantees
    • Homogeneous vehicles
  • Approximate dynamic programming techniques
    • Not focused on combinatorial optimization in general; rare exceptions
  • Model predictive control of complex stochastic systems
• Artificial Intelligence/Computer Science
  • Constraint satisfaction, temporal planning systems
    • Non-real-time, off-line combinatorial constraint-based search
    • Limited incorporation of risk/reward, information dynamics
  • Behavioral control in robotics for simple tasks
  • Reinforcement learning for stochastic planning in well-defined repeated environments (e.g., games)
Proposed Approach: Hierarchical Model Predictive Control
• Hierarchical approach: avoid combinatorial explosion of complexity through decomposition
  • Team strategy selection: address uncertainty
    • Allocate team capabilities to tasks, hedging against task type uncertainty, new task arrivals, action success probabilities
    • Simplify distribution of resources across vehicles
  • Team activity scheduling: address combinatorial complexity
    • Allocate team activities to platforms
    • Select schedules and routes
• Model Predictive Control: re-solve in response to new information or human directives
  • Receding horizon control
  • Respond to new tasks, changes in task status, platform loss, …
  • Adapt to human guidance and constraints
• Requires fast algorithms for real-time control (a high-level sketch of the loop follows below)
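As a rough illustration of the two-level receding-horizon loop described above, the following Python sketch shows how the pieces fit together. The callables passed in (select_strategy, schedule_activities, execute, observe) are hypothetical placeholders standing in for the team strategy selection and team activity scheduling algorithms; this is a sketch of the control pattern, not the authors' implementation.

```python
# Hypothetical sketch of the hierarchical MPC loop described on this slide.
# The callables are illustrative placeholders, not an API from this work.

def hierarchical_mpc_loop(state, select_strategy, schedule_activities,
                          execute, observe, horizon=3):
    """Receding-horizon control: plan over `horizon` stages, act on the first
    stage only, then re-plan when new information or operator input arrives."""
    while not state.get("mission_complete", False):
        # Upper level: allocate team capabilities to tasks, hedging against
        # task-type uncertainty, new arrivals, and uncertain action outcomes.
        strategy = select_strategy(state, horizon)

        # Lower level: assign activities to platforms; choose routes and schedules.
        schedule = schedule_activities(strategy, state)

        # Implement only the first-stage decisions (model predictive control),
        # then observe outcomes, new tasks, platform losses, operator guidance.
        execute(schedule[0])
        state = observe(state)
    return state
```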
Team Strategy Selection
• Stochastic dynamic programming formulation
• Multistage formulation, with outcomes observed after each stage
[Diagram: resources allocated to tasks 1…N (types 1–4) across Stages 1–3, with new tasks N+1…N+M arriving at later stages]
Notation
• N tasks, i = 1, …, N
• M resource types, j = 1, …, M
• Assume independence of all task completion events
Example: Two-Stage, Single-Resource Problem
• Define a task completion state after each stage
  • Task completion state observed after each stage
• Decisions are now feedback policies
• Task completion state dynamics: controlled Markov chain (illustrated in the sketch below)
  • Resources assigned determine transition probabilities
  • Independence of completion event outcomes decouples transition dynamics across tasks
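One simple way to instantiate these per-task completion dynamics, purely for illustration (the symbols x_i^k, u_i^k, p_i and this particular success model are assumptions, not taken from the slides): if task i receives u_i^k identical resources in stage k and each resource independently succeeds with probability p_i, then

```latex
% Hedged illustration of a controlled Markov chain for one task's completion state.
% x_i^k \in \{0,1\}: completion state of task i after stage k
% u_i^k \ge 0: resources assigned to task i in stage k
% p_i: single-resource success probability (assumed, for illustration)
\begin{aligned}
\Pr\!\left(x_i^{k+1} = 1 \mid x_i^{k} = 0,\, u_i^{k}\right) &= 1 - (1 - p_i)^{u_i^{k}},\\
\Pr\!\left(x_i^{k+1} = 1 \mid x_i^{k} = 1\right) &= 1
\quad\text{(completed tasks stay completed).}
\end{aligned}
```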
Two-Stage Problem Statement
• Objective: minimize expected uncompleted task value plus expected resource use costs
• Constraints: resource limits
(a hedged formulation sketch follows)
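Under the illustrative completion model above, the two-stage problem can be written roughly as follows. The symbols V_i (task value), c (per-unit resource cost), and R^1, R^2 (stage resource budgets) are assumed names for the quantities the slide describes in words, not notation taken from the original.

```latex
% Hedged sketch of the two-stage, single-resource problem statement.
\begin{aligned}
\min_{u^{1},\; \mu^{2}(\cdot)} \quad
& \mathbb{E}\!\left[\sum_{i=1}^{N} V_i \bigl(1 - x_i^{2}\bigr)
  \;+\; c \sum_{i=1}^{N} \bigl(u_i^{1} + u_i^{2}\bigr)\right]
&& \text{(uncompleted value + resource cost)}\\
\text{s.t.}\quad
& \sum_{i=1}^{N} u_i^{1} \le R^{1},
\qquad \sum_{i=1}^{N} u_i^{2} \le R^{2}
\;\;\text{for every realization of } x^{1}
&& \text{(resource limits)}\\
& u_i^{2} = \mu_i^{2}(x^{1}) \in \mathbb{Z}_{\ge 0}
&& \text{(second-stage decisions are feedback policies)}
\end{aligned}
```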
Relaxed Two-Stage Problem
• Original problem is a stochastic integer program
  • PSPACE-complete, hard
• Expand set of admissible feedback strategies in second stage
  • Generates lower bound to optimal value function
• New constraint on average number of resources
  • Relaxes exponential number of constraints to a single constraint (see the sketch below)
• Simple result: all feasible strategies in original problem are feasible in current problem
  • Lower bound on original performance
• Idea: select optimal strategies for lower bound
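Concretely, using the same assumed notation as in the sketch above, the per-realization second-stage budget constraints can be replaced by a single constraint on the expected resource usage:

```latex
% Hedged sketch: replace the second-stage budget, which must hold for every
% realization of the first-stage outcome x^1, by one expected-value constraint.
\sum_{i=1}^{N} \mu_i^{2}(x^{1}) \le R^{2} \;\;\text{for all } x^{1}
\quad\Longrightarrow\quad
\mathbb{E}\!\left[\sum_{i=1}^{N} \mu_i^{2}(x^{1})\right] \le R^{2}.
```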
Characterization of Optimal Strategies
• Important concept: mixed local strategies
  • Local strategies: feedback strategies such that the actions on a given task depend only on the state of that task
  • Mixed strategy: random combination of pure strategies
  • Mixed strategies may achieve better performance than pure strategies in the relaxed problem
• Theorem: in the relaxed problem, for every pure strategy, there is a mixed local strategy which uses the same resources and achieves the same expected performance
  • Proven by construction
  • Restricts search to local mixed strategies
• Fast algorithm for solution of optimal strategies using convex optimization principles!
  • Can solve exactly with complexity O((M₁+N) log N)
(an illustrative allocation sketch follows)
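To make the flavor of such fast allocation concrete, here is a generic greedy marginal-return scheme for a single-stage, single-resource-type allocation under the illustrative completion model introduced earlier. With that model the marginal returns are decreasing, so greedy allocation solves the separable relaxed problem; this is a generic sketch, not necessarily the algorithm developed in this work, and the function and parameter names are hypothetical.

```python
import heapq

# Hedged illustration: greedily allocate `budget` identical resources across N tasks
# by marginal expected value reduction, assuming P(success) = 1 - (1 - p_i)^u.

def allocate_resources(values, probs, budget, unit_cost=0.0):
    """values[i]: task value V_i; probs[i]: per-resource success probability p_i;
    budget: total resources available. Returns per-task integer allocations."""
    n = len(values)
    alloc = [0] * n
    # Max-heap of marginal gains: adding the (u+1)-th resource to task i reduces
    # expected uncompleted value by V_i * p_i * (1 - p_i)**u.
    heap = [(-(values[i] * probs[i]), i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(budget):
        gain, i = heapq.heappop(heap)
        if -gain <= unit_cost:          # marginal gain no longer covers resource cost
            break
        alloc[i] += 1
        next_gain = values[i] * probs[i] * (1 - probs[i]) ** alloc[i]
        heapq.heappush(heap, (-next_gain, i))
    return alloc

# Example: 3 tasks, 5 resources
# print(allocate_resources([10.0, 6.0, 3.0], [0.5, 0.7, 0.9], 5))
```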
Comments and Extensions
• MPC approach guarantees feasibility of approximate problem solution in terms of original problem
  • Obtain approximate solution, but implement only first-stage allocations
  • Re-solve problem when new observations are available, with receding horizon
  • Fast algorithm allows for rapid computation
• Main extensions:
  • Multiple stages
  • Multiple resource types
    • Multiple renewable and non-renewable resources
    • Solution NP-hard, but can solve approximately
  • Multiple task types: sensing and action
    • Must sense to observe outcomes
  • New task arrivals, discovered by searching
  • Unknown task types: detect presence, but must observe to determine task type
  • Task departures, deadlines
Team Activity Scheduling
• Inputs from team strategy selection
  • Desired resources assigned to each task in current period
  • Desired resources held in reserve until future information is collected
• Guidance and constraints from human operators
  • Task values, selected platform task assignments, selected task resource assignments
• Known parameters
  • Vehicle locations, resources in each vehicle, task locations
• Problem: assign resource deliveries for tasks to individual vehicles, and select the sequence of activities for each vehicle
  • Deterministic multi-vehicle routing problem (VRP)
  • NP-hard, with many useful approximate approaches available
Team Activity Assignment Formulation
• VRP is an NP-hard problem (traveling salesman) wrapped in an NP-hard problem (bin packing)
• Classical application: truck routing
• Formulation: minimize a discounted cost, subject to visit-customer constraints, routing of the N vehicles, and integrality of the decision variables (a hedged sketch follows)
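The formulation itself did not survive extraction from the slide; only the labels (discounted cost, visit customers, N vehicles to route, integrality) remain. The following is a standard multi-vehicle routing formulation matching those labels, offered only as a representative sketch: the symbols x_{ijk}, c_{ij}, and N_v are assumed notation, not taken from the original.

```latex
% Hedged sketch of a standard VRP formulation matching the slide's labels.
% x_{ijk} = 1 if vehicle k travels directly from node i to node j; node 0 is the depot;
% c_{ij} is the (possibly discounted) travel cost; N_v is the number of vehicles.
\begin{aligned}
\min_{x}\quad & \sum_{k=1}^{N_v}\sum_{i}\sum_{j} c_{ij}\, x_{ijk}
&& \text{(discounted cost)}\\
\text{s.t.}\quad
& \sum_{k=1}^{N_v}\sum_{i} x_{ijk} = 1 \quad \forall\, j \ne 0
&& \text{(visit customers)}\\
& \sum_{j} x_{0jk} = 1, \qquad
  \sum_{i} x_{ihk} - \sum_{j} x_{hjk} = 0 \quad \forall\, h,\, k
&& \text{($N_v$ vehicles to route, flow conservation)}\\
& x_{ijk} \in \{0,1\}
&& \text{(integrality)}
\end{aligned}
```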
Team Activity Assignment Algorithm
• Candidate algorithm: tabu search
  • Locally perturbs trial solutions
  • Uses “tabu” list to avoid local minima
  • Evaluated by AFIT for UAV routing
  • Fast replanning leads to rapid response to events
  • Handles time window constraints instead of precedence constraints
• Significant extensions to date
  • Multiple task types
  • Multiple resource types
  • Compound tasks involving multiple vehicles
• Alternative algorithms (AFOSR-sponsored)
  • Mixed integer-linear programming, J. How, MIT
  • Receding horizon controller, C. Cassandras, BU
(a minimal tabu search sketch follows)
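To illustrate the perturb-and-tabu idea only, here is a minimal, generic tabu search sketch for a single-vehicle routing tour. It is not the AFIT UAV-routing implementation and does not handle time windows or multiple resource types; parameters and names are illustrative.

```python
import math, random

# Minimal, generic tabu-search sketch for a routing tour: sample 2-opt style
# perturbations of the current tour, forbid recently used moves via a tabu list,
# and keep the best tour found. Purely illustrative.

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def tabu_search(pts, iters=500, tabu_len=20, seed=0):
    rng = random.Random(seed)
    best = cur = list(range(len(pts)))
    best_cost = cur_cost = tour_length(cur, pts)
    tabu = []                                  # recently used (i, j) segment reversals
    for _ in range(iters):
        candidates = []
        for _ in range(30):
            i, j = sorted(rng.sample(range(len(pts)), 2))
            neighbor = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]   # reverse a segment
            cost = tour_length(neighbor, pts)
            # Skip tabu moves unless they beat the best tour so far (aspiration).
            if (i, j) not in tabu or cost < best_cost:
                candidates.append((cost, neighbor, (i, j)))
        if not candidates:
            continue
        cur_cost, cur, move = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        if len(tabu) > tabu_len:
            tabu.pop(0)                        # forget old moves
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
    return best, best_cost

# Example: 10 random task locations
# pts = [(random.random(), random.random()) for _ in range(10)]
# print(tabu_search(pts))
```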
Comments
• Algorithms available for dynamic control of automata performing tasks in uncertain, risky environments
  • Fast generation of desired courses of action
  • Hedge against uncertain outcomes, adapt to new information
• Operator interaction through value structure, plus fixed decision variables and constraints
  • Allows for “micro”-management
  • Very limited insight into effects of operator inputs on automata behavior and performance
• Fundamental problem for this MURI research: prediction of course of action in the presence of uncertainty
  • Not a single plan, but a contingency tree of possible actions/responses
  • Hard to modify, approve
Experimental Platform for Research
• Multiple robots search for and perform tasks at BU’s Mechatronics Lab
• Can provide operator control of some platforms: human-automata teams
• Can control the information displayed and the risk presented to each operator using video
Future Activities
• Implement research experiments involving tasks with performance uncertainty in test facility
  • Vary tempo, size, uncertainty, information
• Develop algorithms to interact with operators in alternative roles
  • Supervisory control
  • Team partners
• Extend existing algorithms to different classes of tasks
  • Area search, task discovery, risk to platforms
• Develop algorithms to assist operators in predicting behavior of automata teams in uncertain environments
• Collaborate with MURI team to design and analyze experiments involving alternative structures for human-automata teams