DARPA ITO/MARS Program

Presentation Transcript


  1. DARPA ITO/MARS Program • Control and Coordination of Multiple Autonomous Robots • Vijay Kumar • GRASP Laboratory • University of Pennsylvania • http://www.cis.upenn.edu/mars

  2. Motivation • We are interested in coordinated control of robots • manipulation • vision-based control • Large number of modes • Scalability • Individual modes (behaviors) are well understood, but the interactions between them are not • Software design: modes are designed bottom-up, protocols top-down

  3. MARS system architecture (block diagram) • components: CHARON code (high-level language), CHARON-to-Java translator, Java libraries, drivers, Java code, control code generator, simulator code generator, human interface, analysis, learning algorithms

  4. Outline of the Talk • Language and software architecture • CHARON • agents and modes • examples • Reactive control algorithms • mode switching • hierarchical composition of reactive control algorithms • results • From reactive to deliberative schemes • Simulation • Reinforcement learning • learn mode switching and composition rules • Future work

  5. Participants • Rajeev Alur • Aveek Das • Joel Esposito • Rafael Fierro • Radu Grosu • Greg Grudic • Yerang Hur • Vijay Kumar • Insup Lee • Ben Southall • John Spletzer • Camillo Taylor • Lyle Ungar

  6. Architectural Hierarchy in CHARON • Each agent can be represented as a parallel composition of sub-agents • (Diagram: an Agent composed of sub-agents Agent1 and Agent2, each containing sensor, processor, and actuator components connected through input and output ports.)

  7. Behavioral Hierarchy in CHARON • Each agent consists of modes (behaviors) • Modes can in turn consist of submodes • (Diagram: a top-level mode main with submodes awayTarget and atTarget and sensing/control submodes, connected through entry and exit ports.)

  8. CHARON • Individual components described as agents • Composition, instantiation, and hiding • Individual behaviors described as modes • Encapsulation, instantiation, and scoping • Support for concurrency • Shared variables as well as message passing • Support for discrete and continuous behavior • Well-defined formal semantics

  9. Reactive Behaviors Based on Vision (block diagram) • sensing: frame grabber, edge detector, color blob finder • estimation: range mapper, robot position estimator, target detector, collision detector • behaviors: pursue, avoid obstacle, collision recovery, motion controller • outputs drive the actuators

  10. Robot Agent
robotController = rmapper || cdetector || explore
rmapper = rangeMapper() [rangeMap = rM];
cdetector = collisionDetector() [collisionDetected = cD];
explore = obstacleAvoider() [collisionDetected, rangeMap = cD, rM];
agent explore() { mode top = exploreTopMode() }
agent rangeMapper() { mode top = rangeMapperTopMode() }
agent collisionDetector() { mode top = collisionDetectorTopMode() }

  11. Collision Recovery
mode collisionRecoveryMode(real recoveryDuration, real c) {
  entry enPt; exit exPt;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  readWrite diff analog real recoveryTimer;
  diffEqn dRecovery { d(x) = -c; d(phi) = 0; d(recoveryTimer) = 1.0 }
  inv invRecovery { 0.0 <= recoveryTimer && recoveryTimer <= recoveryDuration }
} // end of mode collisionRecoveryMode

  12. Obstacle Avoidance
mode obAvoidanceMode() {
  entry enPt; exit exPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  diffEqn dObAvoidance { d(x) = computeSpeed(rangeMap); d(phi) = computeAngle(rangeMap) }
  inv invObAvoidance { collisionDetected == false }
  initTrans from obAvoidanceMode to obAvoidanceMode when true do { x = 0.0; phi = 0.0 }
}

  13. Explore
mode exploreTopMode() {
  entry enPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  private diff analog real recoveryTimer;
  mode obAvoidance = obAvoidanceMode()
  mode collisionRecovery = collisionRecoveryMode(recoveryDuration, c)
  initTrans from exploreTopMode to obAvoidance
    when true do { recoveryDuration = 10.0; c = 1.0 } // initialization
  trans OaToCr from obAvoidance.exPt to collisionRecovery.enPt
    when (collisionDetected == true) do {}
  trans CrToOa from collisionRecovery.exPt to obAvoidance.enPt
    when (recoveryTimer == recoveryDuration) do { recoveryTimer = 0.0 } // reset the timer
}
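
A minimal Python sketch (not generated by the CHARON toolchain; the step size, constants, and helper names such as compute_speed are hypothetical stand-ins) of the two-mode explore automaton from slides 11-13: obstacle avoidance runs until a collision is detected, then the recovery mode backs up for recoveryDuration seconds before returning.

# Hypothetical sketch of the explore automaton (obAvoidance <-> collisionRecovery).
DT = 0.02                 # integration step (s)
RECOVERY_DURATION = 10.0  # as initialized in exploreTopMode
C = 1.0                   # backup speed during recovery

def compute_speed(range_map):    # stand-in for the CHARON helper
    return min(range_map)        # slow down near obstacles (placeholder policy)

def compute_angle(range_map):    # stand-in: steer toward the most open direction
    return 0.1 * (range_map.index(max(range_map)) - len(range_map) // 2)

def explore_step(state, range_map, collision_detected):
    """One simulation round of the explore mode; state = (mode, x, phi, timer)."""
    mode, x, phi, timer = state
    if mode == "obAvoidance":
        if collision_detected:                    # guard of trans OaToCr
            return ("collisionRecovery", x, phi, 0.0)
        x += compute_speed(range_map) * DT        # d(x) = computeSpeed(rangeMap)
        phi += compute_angle(range_map) * DT      # d(phi) = computeAngle(rangeMap)
    else:  # collisionRecovery
        if timer >= RECOVERY_DURATION:            # guard of trans CrToOa
            return ("obAvoidance", x, phi, 0.0)   # reset the timer
        x -= C * DT                               # d(x) = -c, d(phi) = 0
        timer += DT                               # d(recoveryTimer) = 1
    return (mode, x, phi, timer)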

  14. Explore (hybrid automaton diagram) • inputs: rangeMap, collisionDetected • mode obAvoidance with dynamics dObAvoidance (x' = k1 r, phi' = -k2 phi) • mode collisionRecovery with dynamics dRecovery (x' = -c, phi' = 0, recoveryTimer' = 1) • transitions: collision (obAvoidance to collisionRecovery) and timeOut (collisionRecovery to obAvoidance)

  15. Vision-Based Control with Mobile Robots • mode switching • multiple levels of abstraction of sensor data • parallel composition of software agents • example behaviors: obstacle avoidance, wall following

  16. Explore: Wall Following with Obstacles

  17. Explore, Search, and Pursue

  18. Multiagent Control (hybrid automaton diagram for Robot1) • continuous dynamics: pos.x' = v cos(phi), pos.y' = v sin(phi), omega = k (theta - phi), timer' = 1 • guards/resets: pos = target (arrive), timer / updateFreq = 0 • shared position estimates r1Est1, r1Est2, r2Est1, r2Est2 exchanged between robots • modes and labels: awTarget (dPlan, iAway), atTarget (dStop, iAt), moving (dSteer, aOmega, iFreq), sensing (dStop, iConst) • transitions: arrive, move, sense
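
A minimal Python sketch of the continuous dynamics on this slide (the gains v and k, the step size, and the arrival tolerance are illustrative, not values from the deck): a unicycle robot whose heading is steered toward the bearing of the target with omega = k (theta - phi).

# Hypothetical sketch: unicycle robot steering toward a target (slide 18 dynamics).
import math

def steer_step(x, y, phi, target, v=0.5, k=2.0, dt=0.02):
    """One Euler step of pos.x' = v cos(phi), pos.y' = v sin(phi), phi' = omega."""
    theta = math.atan2(target[1] - y, target[0] - x)   # bearing to the target
    omega = k * math.atan2(math.sin(theta - phi),      # omega = k * (theta - phi),
                           math.cos(theta - phi))      # wrapped to (-pi, pi]
    x += v * math.cos(phi) * dt
    y += v * math.sin(phi) * dt
    phi += omega * dt
    return x, y, phi

# usage: drive toward (2, 1) until within 5 cm (the 'arrive' guard pos = target)
x, y, phi = 0.0, 0.0, 0.0
while math.hypot(2 - x, 1 - y) > 0.05:
    x, y, phi = steer_step(x, y, phi, (2.0, 1.0))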

  19. Multiagent Control

  20. Modular Simulation • Goal: simulation that is efficient and accurate • Integration of modes at different time scales • Integration of agents at different time scales • Modes are simulated using local information • Submodes are regarded as black boxes • Submodes are simulated independently of one another • Agents are simulated using local information • Agents are regarded as black boxes • Agents are simulated independently of one another

  21. Time Round of a Mode (Agent) • 1. Get the integration time d and the invariants from the supermode (or the scheduler). • 2. Starting from t = 0, while t <= d: simplify all invariants; predict the integration step dt based on d and the invariants; execute the time round of the active submode, obtaining its state s and the elapsed time e; integrate for time e and get the new state s; return s and t + e if an invariant was violated; otherwise increment t = t + e. • 3. Return s and d. • (Diagram: trajectories of variables x, y, z over one round; d and the invariants are passed down, the state s and elapsed time are passed back up.)
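
A minimal Python sketch (the data structures and names are illustrative, not the CHARON simulator's API) of this recursive time round: a mode asks its active submode to advance as a black box, integrates its own continuous variables for the time the submode actually consumed, and stops early if one of its invariants is violated.

# Hypothetical sketch of a modular time round (slide 21).
class Mode:
    def __init__(self, diff_eqn, invariants, submode=None):
        self.diff_eqn = diff_eqn        # {var: f(state)} giving dvar/dt for this mode
        self.invariants = invariants    # list of predicates over the state
        self.submode = submode          # active submode, treated as a black box

    def time_round(self, state, d, dt_max=0.01):
        """Advance up to d seconds; return (state, elapsed)."""
        t = 0.0
        while t < d:
            dt = min(dt_max, d - t)                       # predicted step from d
            if self.submode is not None:
                state, e = self.submode.time_round(state, dt)   # submode simulated locally
            else:
                e = dt
            for var, f in self.diff_eqn.items():          # integrate own variables for e
                state[var] += f(state) * e
            t += e
            if not all(inv(state) for inv in self.invariants):
                return state, t                           # invariant violated: return early
        return state, t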

  22. Modular Simulation - Global Execution • 1. Pick the agents with the minimum and second-minimum reached times. • 2. Compute the time-round interval d for the minimum agent (e.g. A2) so that its absolute time exceeds the time reached by the second-minimum agent (e.g. A1) by at most dt. • 3. The time round may end before the interval d is consumed if the invariants of A2 are violated; the actual time increment is then some e. • 4. Agent A2 executes an update round to synchronize its discrete variables with the analog ones. • 5. The new state of A2 becomes visible to the other agents. • (Diagram: agents A1, A2, A3 on a time axis with markers t, t + dt, d, e.)
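
A minimal Python sketch (the agent interface is an assumption for illustration) of this global scheduling loop: the most-behind agent is advanced by a time round that may overtake the second-most-behind agent by at most dt_max, then its discrete variables are synchronized and its new state becomes visible.

# Hypothetical sketch of the global scheduling loop (slide 22).
def simulate(agents, t_end, dt_max=0.05):
    """agents: objects with .time (reached time), .time_round(d) -> elapsed, .update_round()."""
    while min(a.time for a in agents) < t_end:
        order = sorted(agents, key=lambda a: a.time)
        lagging = order[0]                                   # minimum reached time (A2)
        next_lagging = order[1] if len(order) > 1 else order[0]   # second minimum (A1)
        # allow the lagging agent to run at most dt_max past the second-most-behind one
        d = (next_lagging.time + dt_max) - lagging.time
        e = lagging.time_round(d)        # may return early if an invariant is violated
        lagging.time += e
        lagging.update_round()           # sync discrete variables with the analog ones
        # the lagging agent's new state is now visible to the other agents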

  23. Modular Multi-rate Simulation • Use a different time step for each component to exploit multiple time scales and increase efficiency • “Slowest-first” order of integration • Coupling is accommodated by using interpolants for the slow variables • Tight error bound: O(h^(m+1)) • (Diagram: variables x1, x2, x3 integrated with different step sizes; the ratio of largest to smallest step size is held constant; arrows indicate coupling.)
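
A minimal Python sketch (the step sizes and the example system are illustrative) of slowest-first multi-rate integration: the slow variable is integrated first over a macro step, then the fast variable is integrated with a smaller step, reading the slow variable through a linear interpolant over the macro step.

# Hypothetical sketch of slowest-first multi-rate integration (slide 23).
def multirate_step(xs, xf, H, h, f_slow, f_fast):
    """Advance slow variable xs with macro step H and fast variable xf with step h (h <= H)."""
    # 1. Slowest first: integrate the slow variable over the whole macro step.
    xs_new = xs + H * f_slow(xs, xf)
    # 2. Build a linear interpolant for the slow variable on [0, H].
    interp = lambda t: xs + (t / H) * (xs_new - xs)
    # 3. Integrate the fast variable with the small step, sampling the interpolant.
    t = 0.0
    while t < H:
        dt = min(h, H - t)
        xf = xf + dt * f_fast(interp(t), xf)
        t += dt
    return xs_new, xf

# usage: a weakly coupled slow/fast pair
f_slow = lambda xs, xf: -0.1 * xs + 0.01 * xf
f_fast = lambda xs, xf: -10.0 * xf + xs
xs, xf = 1.0, 1.0
for _ in range(100):
    xs, xf = multirate_step(xs, xf, H=0.1, h=0.01, f_slow=f_slow, f_fast=f_fast)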

  24. Simulation and Analysis of Hierarchical Systems with Mode Switching • synthesis of controllers: include models of uncertainty • sensor fusion: include models of noise • modular simulation • automatic detection of events: mode-switching transitions, guards

  25. Traditional Model of Hierarchy • NASREM architecture [Albus, 80] • implementations: Demo III, NASA robotic systems

  26. Event Detection • Given an event function g(x) and a trajectory x(t), we re-parameterize time by controlling the integration step size • Using feedback linearization, we select our “speed” (step size) along the integral curves so that the simulation converges to the event surface g(x) = 0 • (Diagram: system dynamics with input and output, and the detected event.)
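
A minimal Python sketch (the dynamics, event function, and gain lam are illustrative) of this idea: the time step is chosen by feedback linearization of the event function, dt = -lam * g(x) / (grad g(x) . f(x)), so that g decays exponentially along the integration and the state lands on the event surface g(x) = 0 without overshooting.

# Hypothetical sketch: event detection by re-parameterizing time (slide 26).
def f(x):                       # example dynamics: harmonic oscillator
    return [x[1], -x[0]]

def g(x):                       # example event function: crossing x1 = 0.5
    return x[0] - 0.5

def grad_g(x):
    return [1.0, 0.0]

def detect_event(x, lam=0.2, tol=1e-8, max_iter=200):
    """Integrate until g(x) = 0 by choosing the step size via feedback linearization.
    Assumes the trajectory actually approaches the surface (grad g . f != 0)."""
    t = 0.0
    for _ in range(max_iter):
        if abs(g(x)) < tol:
            return x, t                                  # landed on the event surface
        fx = f(x)
        lie = sum(gi * fi for gi, fi in zip(grad_g(x), fx))   # dg/dt along the flow
        dt = -lam * g(x) / lie                           # "speed" so that g decays to 0
        x = [xi + fi * dt for xi, fi in zip(x, fx)]      # one Euler step of length dt
        t += dt
    return x, t

x_event, t_event = detect_event([0.0, 1.0])              # starts below the surface, moving up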

  27. Hysteresis (example, hybrid automaton diagram) • agents Env and Hyst connected by u and y • dynamics include y' = 2u and x1' = u • modes strMinus, up, strPlus with differential constraints dY, dX1 and invariants/actions iStrM/aStrM, iUp/aUp, iStrP/aStrP • guards and thresholds involve x1 < a, x2 = -1, a, -a, a+2, -(a+2), u = 1, u = -1, and the inc/dec, s2u/u2p transitions

  28. Global versus Modular Simulation • hysteresis example • 2 levels of hierarchy • global state is two-dimensional • significant potential for more complex systems

  29. Modular Simulation Error

  30. Current Implementation Status • Work to date: CHARON semantics, parser for CHARON, internal representation, type checker • Current work: modular simulation scheme, internal representation generator • (Toolchain diagram: CHARON specification, CHARON parser, syntax tree, type checker, internal representation generator, internal representation, control code generator, simulator generator, model checker.)

  31. Reactive to Deliberative Schemes • Reactive scheme is a composition of: go to target, collision avoidance • Deliberative scheme: preplanned path around the nominal model • Reactive schemes: robust and easy to implement, but may be limited in the complexity of tasks they can accomplish, and may not compare favorably to recursive implementations of deliberative controllers • (Figure: nominal model with an obstacle.)

  32. Toward a composition of reactive and deliberative decision making • u1 - vector field specified by a reactive planner • u2 - vector field specified by a deliberative planner • If u1 ∈ U and u2 ∈ U, then α u1 + (1 - α) u2 ∈ U for α ∈ [0, 1]
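
A minimal Python sketch (the example fields and blending weight are illustrative) of this convex blend: the commanded velocity is a pointwise combination of a reactive (obstacle-repulsive) field and a deliberative (path-following) field.

# Hypothetical sketch: convex blend of reactive and deliberative vector fields (slide 32).
def u_reactive(x, y, obstacle=(1.0, 0.0), gain=0.3):
    """Repulsive field pushing away from a nearby obstacle."""
    dx, dy = x - obstacle[0], y - obstacle[1]
    d2 = dx * dx + dy * dy + 1e-6
    return (gain * dx / d2, gain * dy / d2)

def u_deliberative(x, y, waypoint=(2.0, 2.0), gain=0.5):
    """Attractive field toward the next waypoint of a preplanned path."""
    return (gain * (waypoint[0] - x), gain * (waypoint[1] - y))

def u_blend(x, y, alpha=0.4):
    """alpha * u1 + (1 - alpha) * u2; stays in U when U is convex and u1, u2 are in U."""
    u1, u2 = u_reactive(x, y), u_deliberative(x, y)
    return (alpha * u1[0] + (1 - alpha) * u2[0],
            alpha * u1[1] + (1 - alpha) * u2[1])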

  33. Composition of reactive and deliberative planners • Framework for decision making • U is the set of available control policies • Ψ is the uncertainty set • uncertainty in the environment model • uncertainty in dynamics • uncertainty in localization • Best decision under the worst uncertainty
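
Written out (a standard worst-case formulation consistent with "best decision under the worst uncertainty"; the cost function J is an assumed name, not given on the slide):

u^* = \arg\min_{u \in U} \; \max_{\psi \in \Psi} J(u, \psi)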

  34. Results • Minimization: weighting prior information against current information; resolving the discrepancy between prior and current plans • Closure property of “basis behaviors” ensures robustness • Requires a priori calculation of the roadmap • (Figure: worst-case outcome vs. better-than-worst-case outcomes.)

  35. Detailed Analysis • A global saddle point does not exist because the solution is non-smooth • (Figure: cost function with min-max and max-min cross-sections.)
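
For reference (a standard fact, not stated explicitly on the slide): the max-min value never exceeds the min-max value, and the two coincide only when a saddle point exists:

\max_{\psi \in \Psi} \min_{u \in U} J(u, \psi) \;\le\; \min_{u \in U} \max_{\psi \in \Psi} J(u, \psi)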

  36. More Results • (Figure panels: open loop; recursive, best under worst-case uncertainty; open loop.)

  37. Deliberative and Reactive Behaviors in a Dynamic Setting • The obstacle dynamics are known, but the exact inputs are not • (Figure: robot, moving obstacle, and target.)

  38. Paradigm for Learning • Hierarchical structure allows learning at several levels • Lowest level: parameter estimation within each mode; algorithms for extracting the information state (features, position and velocity, high-level descriptors) • Intermediate level: select the best mode for a situation; determine the best partitioning of states for a given information state • Advanced level: transfer knowledge (programs, behaviors) between robots and humans • Learning at any level forces changes at the others • (Diagram: sensory information, information state, situation partitions, modes, action space.)

  39. Reinforcement Learning and Robotics • Successful RL (Kaelbling et al., 96): low-dimensional discrete state space; hundreds of thousands of training runs necessary; stochastic search required • Robotic systems: large, continuous state space; a large number of training runs (e.g., 100,000) may not be practical; stochastic search not desirable

  40. Boundary Localized Reinforcement Learning • Our approach to robot control • Noisy state space • Deterministic modes • Our approach to Reinforcement Learning • Search only mode boundaries • Ignore most of the state space • Minimize stochastic search • RL using no stochastic search during learning

  41. Mode Switching Controllers • (Figure: state space partitioned by mode boundaries; in each region a mode of operation is active and its action ai is executed; the boundaries are parameterized.)

  42. Reinforcement Learning for Mode Switching Controllers • (Figure: an initial guess of the boundary parameterization (prior knowledge) is refined by reinforcement feedback R into an “optimal” parameterization.)

  43. Reinforcement Learning • Markov decision process • Policy: a parameterized mapping from states to actions • Reinforcement feedback from the environment: reward r_t • Goal: modify the policy to maximize performance • Policy gradient formulation
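
A minimal Python sketch (the linear-Gaussian policy, toy environment, and learning rate are illustrative, not the formulation on the slide) of a policy-gradient update: roll out the θ-parameterized stochastic policy, observe the reward r_t, and step θ along an estimate of the performance gradient.

# Hypothetical sketch of a policy-gradient (REINFORCE-style) update (slide 43).
import random

SIGMA, LR = 0.3, 0.02              # policy noise and learning rate (illustrative)

def policy_mean(theta, s):
    return theta[0] * s + theta[1]           # linear-Gaussian policy mean

def episode(theta):
    """One-step episode: state s, action a ~ pi_theta, reward r = -(s + a)^2."""
    s = random.uniform(-1.0, 1.0)
    mu = policy_mean(theta, s)
    a = random.gauss(mu, SIGMA)
    r = -(s + a) ** 2                         # the best action cancels the state
    return s, a, mu, r

theta, baseline = [0.0, 0.0], 0.0
for _ in range(3000):
    s, a, mu, r = episode(theta)
    baseline += 0.01 * (r - baseline)         # running-average baseline (variance reduction)
    score = (a - mu) / SIGMA ** 2             # d log N(a; mu, sigma) / d mu
    theta[0] += LR * (r - baseline) * score * s   # gradient wrt theta[0]: dmu/dtheta0 = s
    theta[1] += LR * (r - baseline) * score       # gradient wrt theta[1]: dmu/dtheta1 = 1
# theta[0] should drift toward -1 (action cancels the state), theta[1] toward 0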

  44. Why Policy Gradient RL? • Computation linear in the number of parameters θ • avoids blow-up from discretization as with other RL methods • Generalization in state space is implicitly defined by the parametric representation • generalization is important for high dimensional problems

  45. Key Result #1 • Any θ-parameterized probabilistic policy can be transformed into an approximately deterministic policy parameterized by θ • Deterministic everywhere except near mode boundaries.

  46. Key Result #2 • Convergence to a locally optimal mode-switching policy is obtained by searching near mode boundaries • All other regions of the state space can be ignored • This significantly reduces the search space

  47. Stochastic Search Localized to Mode Boundaries • (Figure: state space with stochastic search regions confined to narrow bands around the mode boundaries.)

  48. Key Result #3 • Reinforcement learning can be applied to locally optimizing deterministic mode switching policies without using stochastic search if • robot takes small steps • value of executing actions (Q) is smooth w.r.t. state • These conditions are met almost everywhere in typical robot applications

  49. Deterministic Search at Mode Boundaries • (Figure: state space with deterministic search regions along the mode boundaries.)

  50. Simulation • State Space: Robot Position • Boundary Definitions: Gaussian centers and widths • 2 parameters per Gaussian for each dimension. • 20 parameters. • 2 types of modes: Toward a goal, away from an obstacle. • Reward: +1 for reaching a goal, -1 for hitting an obstacle.
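
A minimal Python sketch (the dimensions, parameter values, and switching rule are illustrative assumptions) of the boundary parameterization described here: five Gaussians over the 2-D position, each with a center and width per dimension (2 x 2 x 5 = 20 parameters), where the mode whose Gaussian responds most strongly at the current position is executed, and the reward is +1 at the goal and -1 at an obstacle.

# Hypothetical sketch of Gaussian-parameterized mode switching (slide 50).
import math

# (cx, cy, wx, wy, mode): mode 0 = toward the goal, mode 1 = away from the obstacle
gaussians = [
    (0.5, 0.5, 0.3, 0.3, 1), (2.0, 2.0, 0.5, 0.5, 0),
    (1.0, 2.5, 0.4, 0.4, 0), (2.5, 1.0, 0.4, 0.4, 0), (1.5, 1.5, 0.3, 0.3, 1),
]
GOAL, OBSTACLE = (3.0, 3.0), (0.5, 0.5)

def activation(g, x, y):
    cx, cy, wx, wy, _ = g
    return math.exp(-((x - cx) / wx) ** 2 - ((y - cy) / wy) ** 2)

def select_mode(x, y):
    """Mode boundaries are where the strongest Gaussian changes."""
    return max(gaussians, key=lambda g: activation(g, x, y))[4]

def step(x, y, dt=0.05):
    if select_mode(x, y) == 0:      # move toward the goal
        dx, dy = GOAL[0] - x, GOAL[1] - y
    else:                           # move away from the obstacle
        dx, dy = x - OBSTACLE[0], y - OBSTACLE[1]
    n = math.hypot(dx, dy) + 1e-9
    return x + dt * dx / n, y + dt * dy / n

def reward(x, y):
    if math.hypot(x - GOAL[0], y - GOAL[1]) < 0.1:
        return +1                   # reached the goal
    if math.hypot(x - OBSTACLE[0], y - OBSTACLE[1]) < 0.1:
        return -1                   # hit the obstacle
    return 0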
