
Uncertainty in Sensing (and action)


Presentation Transcript


  1. Uncertainty in Sensing (and action)

  2. Agenda • Planning with belief states • Nondeterministic sensing uncertainty • Probabilistic sensing uncertainty

  3. Belief State • A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions (e.g., the states pictured on the slide; figures omitted) • To plan a course of actions, the agent searches a space of belief states, instead of a space of states

  4. Sensor Model (definition #1) • State space S • The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s) • Example: assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can't sense whether there is dust in the other room (figures showing SENSE(·) on two example states omitted)
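A minimal sketch of definition #1 for the two-room vacuum world (not from the slides), assuming states are represented as frozensets of atom strings such as "In(R1)" and "Clean(R1)"; the helper name sense_def1 is illustrative:

```python
# Sensor model, definition #1: SENSE maps a state to the belief state (a set
# of states) the agent would consider possible when actually in that state.
# The robot senses which room it is in and whether that room is clean, but
# not whether the other room is clean.

def sense_def1(s):
    """Return the belief state (set of frozenset states) for actual state s."""
    room = "R1" if "In(R1)" in s else "R2"
    other = "R2" if room == "R1" else "R1"
    observed = frozenset(a for a in s if a != f"Clean({other})")
    # The other room may or may not be clean; both completions remain possible.
    return {observed, observed | {f"Clean({other})"}}
```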

  5. Sensor Model (definition #2) • State space S, percept space P • The sensor model is a function SENSE: S → P that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s) • We can then define the set of states consistent with a percept p: CONSISTENT(p) = {s | SENSE(s) = p} (figures showing SENSE(·) and CONSISTENT(·) on example states omitted)
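In the same illustrative representation, definition #2 might look like the sketch below, where a percept is the pair (current room, whether that room is clean) and CONSISTENT filters a given set of candidate states (names are assumptions, not from the slides):

```python
# Sensor model, definition #2: SENSE maps a state to a percept; CONSISTENT(p)
# is the set of states whose percept equals p.

def sense_def2(s):
    """Percept obtained in state s: (room the robot is in, that room is clean)."""
    room = "R1" if "In(R1)" in s else "R2"
    return (room, f"Clean({room})" in s)

def consistent(percept, states):
    """All states among `states` that would produce `percept`."""
    return {s for s in states if sense_def2(s) == percept}
```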

  6. Vacuum Robot Action and Sensor Model • State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2) (notation: + adds an attribute, - removes an attribute) • Right: applicable if In(R1) holds in s; outcomes {s1 = s - In(R1) + In(R2), s2 = s} [Right does either the right thing, or nothing] • Left: applicable if In(R2) holds in s; outcomes {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)} [Left always moves the robot to R1, but it may occasionally deposit dust in R2] • Suck(r): applicable if In(r) holds in s; outcome {s1 = s + Clean(r)} [Suck always does the right thing] • The robot perfectly senses the room it is in and whether there is dust in it, but it can't sense whether there is dust in the other room
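The action model above could be written as a nondeterministic successor function, continuing the frozenset-of-atoms sketch (the encoding and the action strings "Right", "Left", "Suck(R1)", "Suck(R2)" are illustrative assumptions):

```python
# Nondeterministic successor function SUCC(s, action): the set of states that
# may result from applying `action` in state s; empty if not applicable.

def succ(s, action):
    if action == "Right" and "In(R1)" in s:
        moved = (s - {"In(R1)"}) | {"In(R2)"}
        return {moved, s}                                   # moves right, or does nothing
    if action == "Left" and "In(R2)" in s:
        moved = (s - {"In(R2)"}) | {"In(R1)"}
        return {moved, moved - {"Clean(R2)"}}               # may deposit dust in R2
    if action.startswith("Suck"):
        room = action[5:-1]                                  # "Suck(R1)" -> "R1"
        if f"In({room})" in s:
            return {s | {f"Clean({room})"}}                  # always cleans its room
    return set()                                             # action not applicable
```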

  7. Transition Between Belief States (states and belief states shown as figures, omitted) • Suppose the robot is initially in the state shown • After sensing this state, its belief state is the set shown • Just after executing Left, its belief state will be the set shown • After sensing the new state, its belief state will be one of two sets: one if there is no dust in R1, the other if there is dust in R1

  8. Transition Between Belief States (build of the previous slide with the figures annotated: the Left action and the Clean(R1) percepts; figures omitted)

  9. Transition Between Belief States • How do you propagate the action/sensing operations to obtain the successors of a belief state? (figure omitted)

  10. Computing the Transition between belief states • Given an action A and a belief state S = {s1, …, sn} • Result of applying the action, without sensing: • Take the union of all SUCC(si, A) for i = 1, …, n • This gives us a pre-sensing belief state S' • Possible percepts resulting from sensing: • {SENSE(si') for si' in S'} (using SENSE definition #2) • This gives us a percept set P • Possible states both in S' and consistent with each possible percept pj in P: • Sj = {si' ∈ S' | SENSE(si') = pj}, i.e., Sj = CONSISTENT(pj) ∩ S'
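The three steps above could be sketched as follows, reusing the illustrative succ() and sense_def2() helpers from the earlier slides:

```python
# Belief-state transition: apply the action to every state in the belief
# state, then split the resulting pre-sensing belief by possible percept.

def belief_successors(belief, action):
    """Map each possible percept p_j to its successor belief state S_j."""
    # Step 1: pre-sensing belief S' = union of SUCC(s, action) over s in S.
    pre = set()
    for s in belief:
        pre |= succ(s, action)
    # Steps 2-3: possible percepts, and the states of S' consistent with each.
    percepts = {sense_def2(s) for s in pre}
    return {p: {s for s in pre if sense_def2(s) == p} for p in percepts}
```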

  11. AND/OR Tree of Belief States (figure: search tree with Left, Right, and Suck branches, loop and goal leaves; omitted) • An action is applicable to a belief state B if its precondition is achieved in all states in B • A goal belief state is one in which all states are goal states
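A compact AND/OR search over belief states in the spirit of this slide could look like the sketch below: an OR choice over applicable actions and an AND over the percepts each action may produce. It builds on the illustrative belief_successors() and succ() helpers, and it simply fails on revisited beliefs instead of constructing the labeled loops shown in the figure:

```python
# AND/OR search over belief states: returns a conditional plan of the form
# (action, {percept: subplan}), the string "done" at goal beliefs, or None on failure.

def and_or_plan(belief, goal_test, actions, path=()):
    if all(goal_test(s) for s in belief):            # goal belief state
        return "done"
    if belief in path:                               # cycle: give up on this branch
        return None
    for action in actions:
        # Applicable only if the precondition holds in every state of the belief.
        if any(not succ(s, action) for s in belief):
            continue
        subplans = {}
        for percept, b in belief_successors(belief, action).items():
            subplans[percept] = and_or_plan(b, goal_test, actions, path + (belief,))
            if subplans[percept] is None:
                break
        else:
            return (action, subplans)                # every percept branch has a plan
    return None
```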

  12. Planning With Probabilistic Uncertainty in Sensing (figures labeled "No motion" and "Perpendicular motion" omitted)

  13. Partially Observable MDPs • Consider the MDP model with states s ∈ S, actions a ∈ A • Reward R(s) • Transition model P(s'|s,a) • Discount factor γ • With sensing uncertainty, the initial belief state is a probability distribution over states: b(s) • b(si) ≥ 0 for all si ∈ S, Σi b(si) = 1 • Observations are generated according to a sensor model • Observation space o ∈ O • Sensor model P(o|s) • The resulting problem is a Partially Observable Markov Decision Process (POMDP)
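The ingredients listed on this slide might be collected into a small container like the sketch below, assuming finite, enumerated states, actions, and observations, and numpy arrays (all names and conventions here are illustrative):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    R: np.ndarray      # R[s] = reward in state s, shape (|S|,)
    T: dict            # T[a][s2, s] = P(s2 | s, a), one (|S|, |S|) matrix per action
    O: dict            # O[o][s] = P(o | s), one (|S|,) vector per observation
    gamma: float       # discount factor
    b0: np.ndarray     # initial belief: b0[s] >= 0 and b0.sum() == 1
```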

  14. POMDP Utility Function • A policy π(b) is defined as a map from belief states to actions • Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t • P(S0=s) = b0(s) • P(S1=s) = ?

  15. POMDP Utility Function • A policy π(b) is defined as a map from belief states to actions • Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t • P(S0=s) = b0(s) • P(S1=s) = P(s|π(b0),b0) = Σs' P(s|s',π(b0)) P(S0=s') = Σs' P(s|s',π(b0)) b0(s')

  16. POMDP Utility Function • A policy π(b) is defined as a map from belief states to actions • Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t • P(S0=s) = b0(s) • P(S1=s) = Σs' P(s|s',π(b0)) b0(s') • P(S2=s) = ?

  17. POMDP Utility Function • A policy π(b) is defined as a map from belief states to actions • Expected discounted reward with policy π: Uπ(b) = E[Σt γ^t R(St)], where St is the random variable indicating the state at time t • P(S0=s) = b0(s) • P(S1=s) = Σs' P(s|s',π(b0)) b0(s') • What belief states could the robot take on after 1 step?

  18. (Figure: belief update diagram) b0 → choose action π(b0) → b1 • Predict: b1(s) = Σs' P(s|s',π(b0)) b0(s')

  19. (Figure continued) • Predict: b1(s) = Σs' P(s|s',π(b0)) b0(s') • Receive observation: oA, oB, oC, or oD

  20. (Figure continued) • Receive observation oA, oB, oC, or oD with probability P(oA|b1), P(oB|b1), P(oC|b1), P(oD|b1), leading to beliefs b1,A, b1,B, b1,C, b1,D

  21. (Figure continued) • Update belief: b1,A(s) = P(s|b1,oA), b1,B(s) = P(s|b1,oB), b1,C(s) = P(s|b1,oC), b1,D(s) = P(s|b1,oD)

  22. (Figure continued) • Observation probability: P(o|b) = Σs P(o|s) b(s) • Belief update: P(s|b,o) = P(o|s) P(s|b) / P(o|b) = (1/Z) P(o|s) b(s)

  23. Belief-space search tree • Each belief node has |A| action node successors • Each action node has |O| belief successors • Each (action, observation) pair (a,o) requires a predict/update step similar to HMMs • Matrix/vector formulation: • b(s): a vector b of length |S| • P(s'|s,a): a set of |S|×|S| matrices Ta • P(ok|s): a vector ok of length |S| • ba = Ta b (predict) • P(ok|ba) = ok^T ba (probability of observation) • ba,k = diag(ok) ba / (ok^T ba) (update) • Denote this operation as ba,o
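A sketch of the predict/update step in numpy (assuming the transition matrix is stored so that ba = Ta b, i.e. Ta[s', s] = P(s'|s,a); the helper names are illustrative):

```python
import numpy as np

def predict(b, T_a):
    """Prediction step: b_a = T_a b (belief after acting, before observing)."""
    return T_a @ b

def update(b_a, o_k):
    """Bayes update on observation k: b_{a,k} = diag(o_k) b_a / (o_k^T b_a).

    Returns (b_{a,k}, P(o_k | b_a)).
    """
    p_obs = float(o_k @ b_a)                 # P(o_k | b_a) = o_k^T b_a
    if p_obs == 0.0:
        return np.zeros_like(b_a), 0.0       # observation impossible under b_a
    return (o_k * b_a) / p_obs, p_obs
```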

  24. Receding horizon search • Expand the belief-space search tree to some depth h • Use an evaluation function on leaf beliefs to estimate utilities • For internal nodes, back up estimated utilities: U(b) = E[R(s)|b] + γ maxa∈A Σo∈O P(o|ba) U(ba,o)
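A sketch of that backup as a recursive receding-horizon search, assuming the illustrative POMDP container and predict()/update() helpers above, with an evaluation function f(b) supplied for the leaves:

```python
def rh_value(pomdp, b, depth, f):
    """Estimate U(b) by expanding the belief tree to `depth` and evaluating leaves with f."""
    if depth == 0:
        return f(b)
    exp_reward = float(pomdp.R @ b)          # E[R(s) | b]
    best = float("-inf")
    for T_a in pomdp.T.values():             # max over actions
        b_a = predict(b, T_a)
        q = 0.0
        for o_k in pomdp.O.values():         # expectation over observations
            b_ao, p_obs = update(b_a, o_k)
            if p_obs > 0.0:
                q += p_obs * rh_value(pomdp, b_ao, depth - 1, f)
        best = max(best, q)
    return exp_reward + pomdp.gamma * best
```

At the root, choosing the action whose branch attains the max gives the receding-horizon policy for the current belief.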

  25. QMDP Evaluation Function • One possible evaluation function is to compute the expectation of the underlying MDP value function over the leaf belief states • f(b) = Σs UMDP(s) b(s) • "Averaging over clairvoyance" • Assumes the problem becomes instantly fully observable after 1 action • Is optimistic: U(b) ≤ f(b) • Approaches the POMDP value function as state and sensing uncertainty decrease • In the extreme h = 1 case, this is called the QMDP policy
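A sketch of this evaluation function, assuming U_MDP is a length-|S| numpy array obtained by solving the underlying fully observable MDP (e.g. by value iteration, not shown here):

```python
def qmdp_eval(U_MDP, b):
    """QMDP leaf evaluation ("averaging over clairvoyance"): f(b) = sum_s U_MDP(s) b(s)."""
    return float(U_MDP @ b)
```

Passed as the leaf evaluator f to the receding-horizon sketch above with depth h = 1, this corresponds to the QMDP policy named on the slide.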

  26. QMDP Policy (Littman, Cassandra, Kaelbling 1995)

  27. Worst-case Complexity • Infinite-horizon undiscounted POMDPs are undecidable (reduction to the halting problem) • Exact solutions to infinite-horizon discounted POMDPs are intractable even for small |S| • Finite horizon: O(|S|^2 |A|^h |O|^h) • Receding horizon approximation: one-step regret is O(γ^h) • Approximate solutions: becoming tractable for |S| in the millions • α-vector point-based techniques • Monte Carlo tree search • …Beyond scope of course…

  28. Schedule • 11/29: Robotics • 12/1: Guest lecture: Mike Gasser, Natural Language Processing • 12/6: Review • 12/8: Final project presentations, review
