460 likes | 579 Vues
This work focuses on the execution of manipulation programs that utilize belief-based planning to achieve goals under uncertainty. It discusses discrete POMDP (Partially Observable Markov Decision Process) formulation, handling current states, and action selection based on observed information. The paper highlights the challenges of continuous state spaces, complex models, and how to achieve effective robot grasping in uncluttered environments. Key strategies include online belief-space search, using temporally extended actions, and parameterizing actions based on current beliefs to optimize decision-making processes.
E N D
Robust Belief-based Execution ofManipulation Programs Kaijen Hsiao Tomás Lozano-Pérez Leslie Pack Kaelbling MIT CSAIL
Achieving Goals under Uncertainty • Two kinds of uncertainty: • current state: • need to plan in information space • results of future actions: • search branches on outcomes as well as actions • Choice of action must be dependent on current information state
Discrete POMDP Formulation • states • actions • observations • transition model • observation model • reward
POMDP Controller Controller • State estimation is discrete Bayesian filter • Policy maps belief states to actions belief SE sensing action Environment
Action selection in POMDPs • Off-line optimal policy generation • Intractable for large spaces • On-line search: finite-depth expansion of belief-space tree from current belief state to select single action • Tractable in broad subclass of problems
Challenges for action selection • Continuous state spaces • Requirement to select action for any belief state • Long horizon • Action branching factor • Outcome branching factor • Computationally complex observation and transition models
Grasping in uncluttered environments • Points of leverage: • Robot pose is approximately observable • Robot dynamics are nearly deterministic • Bounded uncertainty over unobserved object parameters • Room to maneuver
Online belief-space search • Continuous state space: discretize object state space
Discretize object configuration space workspace configuration space belief state
Online belief-space search • Continuous state space: discretize object state space • Action for any belief: search forward from current belief state
Search forward from current belief • Low entropy belief states enable reliable grasp • Use entropy as static evaluation function at leaves • Actions can be useful for information gathering
Online belief-space search • Continuous state space: discretize object state space • Action for any belief: search forward from current belief state • Long horizon: use temporally extended actions
Use temporally extended actions • Primitive actions Entire trajectories • Reduce horizon Observations at end
Online belief-space search • Continuous state space: discretize object state space • Action for any belief: search forward from current belief state • Long horizon: use temporally extended actions • Large action branching factor: parameterize small set of action types by current belief
Parameterize actions with belief • Actions are entire world-relative trajectories • In current belief state, • execute with respect to most likely object configuration • terminate on contact or end of trajectory
Online belief-space search • Continuous state space: discretize object state space • Action for any belief: search forward from current belief state • Long horizon: use temporally extended actions • Large action branching factor: parameterize small set of action types by current belief • Computationally complex observation and transition models: precompute models
Precompute models • Execute WRT • with respect to estimated state e • in world state w • Expected observation,transition • Based on geometric simulation
Online belief-space search • Continuous state space: discretize object state space • Action for any belief: search forward from current belief state • Long horizon: use temporally extended actions • Large action branching factor: parameterize small set of action types by current belief • Computationally complex observation and transition models: precompute models • Large observation branching factor: canonicalize observations for each discrete state and action
Canonicalize observations • Any (e, w) pair with same relative transformation has same world-relative outcomes and observations • Only sample for one e with w varying within initial range of uncertainty • Cluster observations and represent each bin of object configurations by a single representative one • Only branch on canonical observations
Algorithm • Off-line: • plan WRTs for grasping and info gathering • compute models • On-line: • while current belief state doesn’t satisfy goal • compute expected info gain of each WRT • execute best WRT until termination • use observation to update current belief • return to initial pose • execute final grasp trajectory
Application to grasping with simulated robot arm • Initial conditions (ultimately from vision) • Object shape is roughly known (contacted vertices should be within ~1 cm of actual positions) • Object is on table and pose (x, y, rotation) is roughly known (center of mass std ~5 cm, 30 deg) • Achieve specific grasp of object
Observations • Fingertips: 6-axis force/torque sensors • position • normal • Additional contact sensors: • just contact • Swept non-colliding path rules out poses that would have generated contact
Grasping a Box Most likely robot-relative position Where it actually is
Trying again, with new belief Back up Try again
Final state and observation Observation probabilities Grasp
Updated belief state: Success! Goal: variance < 1 cm x, 15 cm y, 6 deg theta
Simulation Experiments • Methods tested: • Single open-loop execution of goal-achieving WRT with respect to the most likely state • Repeated execution of goal-achieving WRT with respect to the most likely state • Online selection of information-gathering and goal-achieving grasps (1-step lookahead)
Box experiments • Allowed variation in goal grasp: 1 cm, 1 cm, 5 deg • Initial uncertainty: 5 cm, 5 cm, 30 deg
Cup experiments • Goal 1 cm x, 1 cm y, rotation doesn’t matter (no info-grasps used) • Start uncertainty 30 deg theta (x,y varies) Increasing uncertainty
Grasping a Brita Pitcher Target grasp: Put one finger through the handle and grasp
Brita Pitcher results Increasing uncertainty
Other recent probabilistic approaches to manipulation • Off-line POMDP solution for grasping (Hsiao et al. 2007) • Bayesian state estimation using tactile sensors to locate object before grasping (Petrovskaya et al. 2006) • Finding a fixed trajectory that is most likely to succeed under uncertainty (Alterovitz et al. 2007, Burns and Brock 2007)
Timing For Brita Pitcher • (2.16 GHz processor, 3.24 GB RAM running Python, times in seconds)
Creating Information-gain Trajectories • Trajectory generation • Generate endpoints, use randomized planner (such as OpenRAVE) to find nominal collision-free path • Sweep through entire workspace • Choose a small set based on information gain from start uncertainty