
Planning Under Uncertainty


Presentation Transcript


  1. Planning Under Uncertainty

  2. Sensing error • Partial observability • Unpredictable dynamics • Other agents

  3. Three Strategies for Handling Uncertainty • Reactive strategies • Replanning • Proactive strategies • Anticipating future uncertainty • Shaping future uncertainty (active sensing)

  4. Uncertainty in Dynamics • This class: the robot has uncertain dynamics • Calibration errors, wheel slip, actuator error, unpredictable obstacles • Perfect (or good enough) sensing

  5. Modeling imperfect dynamics • Probabilistic dynamics model • P(x’|x,u): a probability distribution over successors x’, given state x, control u • Markov assumption • Nondeterministic dynamics model • f(x,u) -> a set of possible successors
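To make the two models concrete, here is a minimal Python sketch (not from the lecture) for a grid-world robot: the probabilistic model returns a distribution P(x'|x,u) over successors, the nondeterministic model returns only the set of possible successors, and the slip probabilities (0.8/0.2) are purely illustrative assumptions.

```python
# Minimal sketch (hypothetical grid-world setting) contrasting the two models.
import random

def probabilistic_dynamics(x, u):
    """P(x'|x,u): a dict mapping successor states to probabilities.
    Illustrative assumption: the commanded move succeeds with prob. 0.8,
    otherwise the wheels slip and the robot stays put."""
    intended = (x[0] + u[0], x[1] + u[1])
    return {intended: 0.8, x: 0.2}

def nondeterministic_dynamics(x, u):
    """f(x,u): the *set* of possible successors, with no probabilities attached."""
    intended = (x[0] + u[0], x[1] + u[1])
    return {intended, x}

def sample_successor(x, u):
    """Sample from the probabilistic model (Markov: depends only on x and u)."""
    dist = probabilistic_dynamics(x, u)
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]

print(probabilistic_dynamics((2, 3), (1, 0)))     # {(3, 3): 0.8, (2, 3): 0.2}
print(nondeterministic_dynamics((2, 3), (1, 0)))  # {(3, 3), (2, 3)}
```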

  6. Target Tracking • The robot must keep a target in its field of view • The robot has a prior map of the obstacles • But it does not know the target’s trajectory in advance

  7. Target-Tracking Example • Time is discretized into small steps of unit duration • At each time step, each of the two agents moves by at most one increment along a single axis • The two moves are simultaneous • The robot senses the new position of the target at each step • The target is not influenced by the robot (non-adversarial, non-cooperative target)

  8. Time-Stamped States (no cycles possible) • State = (robot-position, target-position, time) • In each state, the robot can execute 5 possible actions: {stop, up, down, right, left} • Each action has 5 possible outcomes (one for each possible action of the target), with some probability distribution • Example: from state ([i,j], [u,v], t), the robot action “right” leads to one of ([i+1,j], [u,v], t+1), ([i+1,j], [u-1,v], t+1), ([i+1,j], [u+1,v], t+1), ([i+1,j], [u,v-1], t+1), ([i+1,j], [u,v+1], t+1), depending on the target’s move [potential collisions are ignored to simplify the presentation]
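As a concrete sketch of this representation (hypothetical code, not from the lecture), the helper below enumerates the successors of a time-stamped state for one robot action, with one outcome per possible target move; the uniform outcome probabilities are a placeholder for the slide's "some probability distribution".

```python
# Sketch of successor enumeration for time-stamped states
# (robot-position, target-position, time). Collisions are ignored, as in the
# slides; the uniform target-move probabilities are an illustrative assumption.

MOVES = {            # the 5 moves available to either agent at each step
    "stop":  (0, 0),
    "up":    (0, 1),
    "down":  (0, -1),
    "right": (1, 0),
    "left":  (-1, 0),
}

def successors(state, robot_action):
    """Return [(next_state, probability), ...] for one robot action:
    one outcome per possible target move."""
    (ri, rj), (ti, tj), t = state
    dri, drj = MOVES[robot_action]
    new_robot = (ri + dri, rj + drj)
    return [((new_robot, (ti + dti, tj + dtj), t + 1), 1.0 / len(MOVES))
            for dti, dtj in MOVES.values()]

# From state ([i,j]=(2,2), [u,v]=(5,5), t=0), action "right" gives 5 outcomes at t=1.
for nxt, p in successors(((2, 2), (5, 5), 0), "right"):
    print(nxt, p)
```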

  9. Rewards and Costs • The robot must keep seeing the target as long as possible • Each state where it does not see the target is terminal • The reward collected in every non-terminal state is 1; it is 0 in each terminal state [the sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target] • No cost for moving vs. not moving
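A minimal sketch of this reward structure, assuming a hypothetical visible(robot, target) predicate (e.g., a line-of-sight test against the known obstacle map); summing these rewards over a run gives exactly the number of steps the target stayed in view.

```python
# Reward structure from the slide; `visible` is an assumed visibility test.

def is_terminal(state, visible):
    robot, target, t = state
    return not visible(robot, target)        # target out of view => terminal

def reward(state, visible):
    # 1 in every non-terminal state, 0 in terminal states; no cost for moving
    return 0.0 if is_terminal(state, visible) else 1.0
```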

  10. Expanding the state/action tree • [Figure: the state/action tree is expanded from the current state at horizon 1 down to horizon h]

  11. Assigning rewards • Terminal states: states where the target is not visible • Rewards: 1 in non-terminal states; 0 in others • But how to estimate the utility of a leaf at horizon h?

  12. Estimating the utility of a leaf • Compute the shortest distance d the target must travel to escape the robot’s current field of view • If the maximal velocity v of the target is known, estimate the utility of the leaf state as d/v [conservative estimate]

  13. Selecting the next action • Compute the optimal policy over the state/action tree, using the estimated utilities at the leaf nodes • Execute only the first step of this policy • Repeat everything again at t+1 (sliding horizon) • Real-time constraint: h is chosen so that a decision can be returned in unit time [a larger h may result in a better decision that arrives too late!]
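Putting slides 10 through 13 together, here is a hedged sketch of the sliding-horizon loop, reusing MOVES, successors, and is_terminal from the sketches above; escape_distance and v_max stand in for the d/v leaf estimate and are assumptions, not code from the lecture.

```python
# Sliding-horizon action selection: expand to depth h, estimate leaf utilities,
# back up expected values, execute only the first action, then replan at t+1.

def leaf_utility(state, escape_distance, v_max):
    # Conservative estimate: shortest time the target needs to escape the field of view
    return escape_distance(state) / v_max

def value(state, depth, visible, escape_distance, v_max):
    """Expected utility of `state` with `depth` lookahead steps remaining."""
    if is_terminal(state, visible):
        return 0.0                              # target lost: no further reward
    if depth == 0:
        return leaf_utility(state, escape_distance, v_max)
    best = max(
        sum(p * value(nxt, depth - 1, visible, escape_distance, v_max)
            for nxt, p in successors(state, a))
        for a in MOVES)
    return 1.0 + best                           # reward 1 for this visible step

def select_action(state, h, visible, escape_distance, v_max):
    """Return only the first action of the optimal h-step policy (replanned each step).
    h should be small enough that this returns within one time step."""
    return max(MOVES, key=lambda a: sum(
        p * value(nxt, h - 1, visible, escape_distance, v_max)
        for nxt, p in successors(state, a)))
```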

  14. Pure Visual Servoing

  15. Computing and Using a Policy

  16. Markov Decision Process • Finite state space X, action space U • Reward function R(x) • Optimal value function V(x), defined by the Bellman equations: • If x is terminal: V(x) = R(x) • If x is non-terminal: V(x) = R(x) + max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x') • Optimal policy: π*(x) = arg max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x')

  17. Solving Finite MDPs • Value iteration • Improve V(x) by repeatedly “backing up” the value of the best action • Policy iteration • Improve π(x) by repeatedly computing the best action in each state • Linear programming
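For reference, here is a minimal value-iteration sketch for a finite MDP in the tabular form of slide 16; the dictionary representation (P[x][u] as a list of (successor, probability) pairs) and all names are assumptions. The slide's Bellman equation is undiscounted, so convergence relies on every policy eventually reaching a terminal state; otherwise add a discount factor.

```python
# Value iteration for a finite MDP (assumed tabular representation):
#   P[x][u] = [(x_next, prob), ...] for non-terminal x,  R[x] = reward of x,
#   terminals = set of terminal states. Undiscounted, as on slide 16.

def value_iteration(P, R, terminals, eps=1e-6):
    V = dict(R)                                   # start from V(x) = R(x); exact for terminals
    while True:
        delta = 0.0
        for x in P:
            if x in terminals:
                continue
            best = max(sum(p * V[xn] for xn, p in P[x][u]) for u in P[x])
            new_v = R[x] + best                   # Bellman backup of the best action
            delta = max(delta, abs(new_v - V[x]))
            V[x] = new_v
        if delta < eps:
            return V

def greedy_policy(P, V, terminals):
    """pi*(x) = arg max_u  sum_x' P(x'|x,u) V(x')."""
    return {x: max(P[x], key=lambda u: sum(p * V[xn] for xn, p in P[x][u]))
            for x in P if x not in terminals}
```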

  18. Continuous Markov Decision Processes • Discretize the state and action spaces • E.g., grids, random samples • => Then perform value iteration, policy iteration, or LP on the discrete space • Or approximate the value function • Basis functions • => Iterative least-squares approaches
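One concrete reading of the second bullet (an assumed sketch, not the lecture's code): approximate V(x) ≈ w·φ(x) with basis functions φ and repeatedly refit the weights w to Bellman backups by least squares, i.e., a simple fitted value iteration.

```python
# Fitted value iteration with linear basis functions (illustrative sketch).
# `phi`, `sample_next`, and `reward` are assumed user-supplied functions;
# the discount factor is added for stability and is not part of slide 16.
import numpy as np

def fitted_value_iteration(states, actions, phi, sample_next, reward,
                           gamma=0.95, n_samples=20, iters=50):
    Phi = np.array([phi(x) for x in states])      # feature matrix over sampled states
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        targets = []
        for x in states:
            # Monte Carlo estimate of max_u E[V(x')] under the current weights
            q = [np.mean([np.dot(w, np.asarray(phi(sample_next(x, u))))
                          for _ in range(n_samples)])
                 for u in actions]
            targets.append(reward(x) + gamma * max(q))
        # Least-squares refit of the value-function weights
        w, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return w
```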

  19. Extra Credit for Open House • +5% of final project grade • Preliminary demo & description by next Tuesday • Open House on April 16, 4-7pm

  20. Upcoming Subjects • Sensorless planning (Goldberg, 1994) • An example of a nondeterministic uncertainty model • Planning to sense (Gonzalez-Banos and Latombe 2002)

  21. Comments on Uncertainty • Reasoning with uncertainty requires representing and transforming sets of states • This is challenging in high-dimensional spaces • Hard to get an accurate, sparse representation • Multi-modal, nonlinear, degenerate distributions • Breakthroughs would have a dramatic impact
