
Probabilistic Robotics: A Tutorial


Presentation Transcript


  1. Probabilistic Robotics: A Tutorial Juan Antonio Fernández Madrigal October 2004 System Engineering and Automation Dpt. University of Málaga (Spain)

  2. Contents
1. Introduction
   1.1 Probabilistic Robotics?
   1.2 The Omnipresent Core: Bayes' Rule
   1.3 Let's Filter! (Bayes Filter)
2. You Will Find Them Everywhere: Basic Mathematical Tools
   2.1 A Visit to the Casino: MonteCarlo Methods
   2.2 Partially Unknown Uncertainty: the EM Algorithm
   2.3 Approximating Uncertainty Efficiently: Particle Filters
3. The Foundations: The Common Bayesian Framework
   3.1 Graphs plus Uncertainty: Graphical Models
   3.2 Arrows on Arcs: Bayesian Networks
   3.3 Let it Move: Dynamic Bayesian Networks (DBNs)
4. Forgetting the Past: Markovian Models
   4.1 It is Easy if it is Gaussian: Kalman Filters
   4.2 On the Line: Markov Chains
   4.3 What to Do?: Markov Decision Processes (MDPs)
   4.4 For Non-Omniscient People: Hidden Markov Models (HMMs)
   4.5 For People that Do Things: POMDPs

  3. 1. Introduction
1.1 Probabilistic Robotics?
What is Probabilistic Robotics?
- Robotics that uses probability calculus for modeling and/or reasoning about robot actions and perceptions.
- State-of-the-art robotics.
Why Probabilistic Robotics?
- To cope with uncertainty in the robot's environment.
- To cope with uncertainty / noise in the robot's perceptions.
- To cope with uncertainty / noise in the robot's actions.

  4. 1. Introduction
1.2 The Omnipresent Core: Bayes' Rule (~1750)
- A rule for updating your existing belief (the probability of some variable) given new evidence (the occurrence of some event).
- In spite of its simplicity, it is the basis for most probabilistic approaches in robotics and other sciences.

    P(R=r | e) = P(e | R=r) P(R=r) / P(e),   where   P(e) = sum over all r of P(e | R=r) P(R=r)

- P(R=r | e): posterior probability (the probability that the random variable R takes the value r, given that the event e has occurred).
- P(e | R=r): conditional probability (the probability that the event e occurs if the random variable R takes the value r).
- P(R=r): prior probability, or belief (the probability that R would take the value r anyway).
- P(e): normalizing factor, so that the posterior adds up to 1 over all r.
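
As a minimal illustration (not from the slides), the following Python sketch applies Bayes' Rule to a discrete variable; the prior and likelihood numbers are made up for the example:

    # Bayes' rule for a discrete random variable R and an observed event e.
    # The numbers below are illustrative only.
    prior = {"clean": 0.7, "dirty": 0.3}          # P(R = r)
    likelihood = {"clean": 0.2, "dirty": 0.9}     # P(e | R = r), e.g. a "dirt detected" reading

    # Normalizing factor P(e) = sum_r P(e | R = r) P(R = r)
    p_e = sum(likelihood[r] * prior[r] for r in prior)

    # Posterior P(R = r | e)
    posterior = {r: likelihood[r] * prior[r] / p_e for r in prior}
    print(posterior)   # e.g. {'clean': 0.34..., 'dirty': 0.65...}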

  5. 1. Introduction
1.3 Let's Filter! (Bayes Filter)
- Bayes' Rule can be iterated for improving the estimate over time:

    P(R=r at t2 | e from t1 to t2) = P(e at t2 | R=r at t2) P(R=r at t2) / sum over all r of [ P(e at t2 | R=r at t2) P(R=r at t2) ]

- In general, given evidence known from t1 to t2, there are the following possibilities, depending on the instant t_estimate at which the state is estimated:
  - t_estimate = t2 (the time of the latest evidence): this is called "filtering" (the Bayes Filter).
  - t_estimate > t2 (beyond the known evidence): this is called "prediction".
  - t_estimate < t2 (within the interval of known evidence): this is called "fixed-lag smoothing".
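
The following minimal Python sketch (not from the slides) iterates Bayes' Rule as a discrete Bayes Filter; the door-state scenario and its probabilities are illustrative assumptions:

    # Minimal discrete Bayes filter: iterate Bayes' rule as new evidence arrives.
    # Scenario and numbers are illustrative only.
    def bayes_filter_update(belief, likelihood):
        """One filtering step: belief is P(R=r), likelihood is P(e_t | R=r)."""
        unnormalized = {r: likelihood[r] * belief[r] for r in belief}
        eta = sum(unnormalized.values())          # normalizing factor P(e_t)
        return {r: p / eta for r, p in unnormalized.items()}

    belief = {"open": 0.5, "closed": 0.5}         # initial belief about a door
    sensor_model = {"open": 0.6, "closed": 0.2}   # P(sensor says "open" | state)

    for _ in range(3):                            # three consecutive "open" readings
        belief = bayes_filter_update(belief, sensor_model)
        print(belief)                             # belief in "open" grows with each reading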

  6. 2. You'll Find Them Everywhere: Basic Mathematical Tools
2.1 A Visit to the Casino: MonteCarlo Methods (1946)
- A set of methods based on statistical sampling for approximating some value(s) (any quantitative data) when analytical methods are not available or computationally unsuitable.
- The error of the approximation does not depend on the dimensionality of the data.
- In its general form, it is a way to approximate integrals. Given a difficult integral I = integral of f(u) du over some region (the function f is known), it can be approximated by the following steps:
  1) Take a uniform distribution U over the region of integration.
  2) Calculate the expectation of f(U) by statistical sampling (m samples): E[f(U)] ~ (1/m) sum from i=1 to m of f(u_i).
  3) It follows from probability calculus that E[f(U)] = integral of f(u) p(u) du, where p(u) is the probability density function of U.
  4) Since U is uniform, p(u) = 1, so the sample average approximates the integral I.
  5) The standard error is sigma / sqrt(m), where sigma is the standard deviation of each sample (unknown).
  6) The error diminishes with many samples, but maybe slowly... There are "variance reduction" techniques to also reduce sigma: antithetic variates, control variates, importance sampling, stratified sampling, ...
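
As an illustration of the steps above, the following Python sketch approximates an integral by MonteCarlo sampling; the integrand is an arbitrary example chosen for the demonstration:

    # Monte Carlo approximation of an integral, following the steps above.
    # Example integral (illustrative): I = integral of exp(-x^2) dx over [0, 1].
    import math, random

    def monte_carlo_integral(f, m=100_000):
        samples = [f(random.random()) for _ in range(m)]   # u_i ~ Uniform(0, 1)
        mean = sum(samples) / m                            # approximates E[f(U)] = I
        var = sum((s - mean) ** 2 for s in samples) / (m - 1)
        std_error = math.sqrt(var / m)                     # sigma / sqrt(m)
        return mean, std_error

    estimate, err = monte_carlo_integral(lambda x: math.exp(-x * x))
    print(f"I ~ {estimate:.5f} +/- {err:.5f}")             # true value is about 0.74682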

  7. 2. You'll Find Them Everywhere: Basic Mathematical Tools
2.2 Partially Unknown Uncertainty: The EM Algorithm
- The Expectation-Maximization algorithm (1977) can be used, in general, for estimating any probability distribution from real measurements that can be incomplete.
- The algorithm works in two steps (E and M) that are iterated, improving the likelihood of the estimate over time. It can be demonstrated that the algorithm converges to a local optimum.
  1. E-step (Expectation). Given the current estimate of the distribution, calculate the expectation of the (complete-data) likelihood of the measurements with respect to that estimate.
  2. M-step (Maximization). Produce a new estimate that improves (maximizes) that expectation.

  8. 2. You'll Find Them Everywhere: Basic Mathematical Tools
2.2 Partially Unknown Uncertainty: The EM Algorithm
- Mathematical formulation (in the general case):
  Z = (X, Y): all the data. X: measured data. Y: missing or hidden data (not measured).
  p(Z | M) = p(X,Y | M): complete-data likelihood given a model M (we will maximize the expectation of its logarithm; M is the unknown to be optimized).
- In general, E[ h(W) | R=r ] = integral over all w from W of h(w) p(w | r) dw.
- Thus:
    E[ log p(X,Y | M) | X, M(i-1) ] = integral over all y from Y of log p(X,y | M) p(y | X, M(i-1)) dy
- E-step: compute E[ log p(X,Y | M) | X, M(i-1) ], the expectation of the complete-data log-likelihood given the measured data X and the previous model estimate M(i-1).
- M-step: M(i) = argmax (on M) E[ log p(X,Y | M) | X, M(i-1) ].
- Variation: M(i) = any M that makes the expectation greater than with M(i-1). This is the Generalized EM (GEM), which also converges.
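
A minimal sketch of EM in Python, assuming the hidden data Y is the unknown component that generated each measurement in a 1-D mixture of two Gaussians; the synthetic data and the initial model are illustrative:

    import math, random

    def gaussian_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Synthetic measured data X (two clusters around 0 and 5)
    random.seed(0)
    X = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]

    # Initial model M(0): mixture weights, means, standard deviations
    w, mu, sigma = [0.5, 0.5], [1.0, 4.0], [1.0, 1.0]

    for iteration in range(50):
        # E-step: expected responsibilities p(y = k | x, M(i-1)) for each point
        resp = []
        for x in X:
            p = [w[k] * gaussian_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: new parameters M(i) that maximize the expected log-likelihood
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(X)
            mu[k] = sum(r[k] * x for r, x in zip(resp, X)) / nk
            sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, X)) / nk)

    print("weights:", w, "means:", mu, "sigmas:", sigma)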

  9. 2. You'll Find Them Everywhere: Basic Mathematical Tools
2.3 Approximating Uncertainty Efficiently: Particle Filters
- A MonteCarlo method used as a filter (i.e., iterated over time).
- It is useful due to its efficiency.
- It represents probability distributions by samples ("particles") with associated weights, and yields information from the distributions by computing on those samples.
- As the number of samples increases, the accuracy of the estimate increases.
- There is a diversity of particle filter algorithms, depending on how the samples are selected.
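
A minimal particle filter sketch in Python (not from the slides); the 1-D robot, its motion and sensor models and the noise values are illustrative assumptions:

    import math, random

    NUM_PARTICLES = 1000
    particles = [random.uniform(0.0, 10.0) for _ in range(NUM_PARTICLES)]  # initial samples

    def gaussian_weight(z, expected, sigma=0.5):
        return math.exp(-0.5 * ((z - expected) / sigma) ** 2)

    def particle_filter_step(particles, control, measurement):
        # 1) Prediction: move each particle according to the control, plus noise
        predicted = [p + control + random.gauss(0, 0.1) for p in particles]
        # 2) Correction: weight each particle by the likelihood of the measurement
        weights = [gaussian_weight(measurement, p) for p in predicted]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3) Resampling: draw new particles proportionally to their weights
        return random.choices(predicted, weights=weights, k=len(predicted))

    # One iteration: the robot moves +1.0 and then senses that it is near position 3.0
    particles = particle_filter_step(particles, control=1.0, measurement=3.0)
    print("estimated position:", sum(particles) / len(particles))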

  10. 3. The Foundations: The Common Bayesian Framework
3.1 Graphs plus Uncertainty: Graphical Models
- A common formalism that copes with both uncertainty and complexity, two problems commonly found in applied mathematics and engineering.
- Many specific models derive from it: mixture models, factor analysis, hidden Markov models, Kalman filters, etc.
- A graphical model is a graph with associated probabilities. Nodes represent random variables. An arc between two nodes indicates a statistical dependence between the two variables.
- Three basic types:
  A) Undirected graphs (= Markov Random Fields): used in physics, computer vision, ...
  B) Directed graphs (= Bayesian Networks): used in artificial intelligence, statistics, ...
  C) Mixed graphs (= Chain Graphs).

  11. 3. The Foundations: The Common Bayesian Framework
3.2 Arrows on Graphs: Bayesian Networks
- Nodes (variables) can hold discrete or continuous values. Arcs represent causality (and conditional probability).
- Example (graph): Cloudy (C) is the parent of Sprinkler (S) and Rain (R); both S and R are parents of Wet grass (W).
- The model is completely defined by its graph structure, the values of its nodes (variables) and the conditional probabilities of the arcs:
  P(C=true)=0.5   P(C=false)=0.5
  P(S=true | C=true)=0.1    P(S=false | C=true)=0.9
  P(S=true | C=false)=0.5   P(S=false | C=false)=0.5
  P(R=true | C=true)=0.8    P(R=false | C=true)=0.2
  P(R=true | C=false)=0.2   P(R=false | C=false)=0.8
  P(W=true | S=true, R=true)=0.99    P(W=false | S=true, R=true)=0.01
  P(W=true | S=true, R=false)=0.9    P(W=false | S=true, R=false)=0.1
  P(W=true | S=false, R=true)=0.9    P(W=false | S=false, R=true)=0.1
  P(W=true | S=false, R=false)=0     P(W=false | S=false, R=false)=1

  12. 3. The Foundations: The Common Bayesian Framework
3.2 Arrows on Graphs: Bayesian Networks - Inference
1) Bottom-up Reasoning, or Diagnosis: from effects to causes.
- For example: given that the grass is wet (W=true), which is more likely, the sprinkler being on (S=true) or the rain (R=true)? Here C, S and R are causes and W is the effect.
- We seek P(S=true | W=true) and P(R=true | W=true).
- Using the definition of conditional probability:
    P(S=true | W=true) = P(S=true, W=true) / P(W=true)
- In general, using the chain rule: P(C,S,R,W) = P(C) P(S|C) P(R|S,C) P(W|S,R,C).
- But by the graph structure: P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R).

  13. 3. The Foundations: The Common Bayesian Framework
3.2 Arrows on Graphs: Bayesian Networks - Inference
1) Bottom-up Reasoning, or Diagnosis: from effects to causes (continued).
- For example: given that the grass is wet (W=true), which is more likely, the sprinkler being on (S=true) or the rain (R=true)?
- We seek P(S=true | W=true) and P(R=true | W=true).
- Using the definition of conditional probability:
    P(S=true | W=true) = P(S=true, W=true) / P(W=true)
- By marginalization:
    P(W=true) = sum over all c, s, r of P(C=c, S=s, R=r, W=true)
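
A minimal Python sketch of this inference by enumeration, using the conditional probabilities from slide 11 (the helper names are made up); it should give P(S=true | W=true) of about 0.43 and P(R=true | W=true) of about 0.71:

    from itertools import product

    P_C = {True: 0.5, False: 0.5}
    P_S_given_C = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}   # [c][s]
    P_R_given_C = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}   # [c][r]
    P_W_true_given_SR = {(True, True): 0.99, (True, False): 0.9,
                         (False, True): 0.9, (False, False): 0.0}                   # P(W=true | s, r)

    def joint(c, s, r, w):
        """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R), the factorization given by the graph."""
        pw = P_W_true_given_SR[(s, r)] if w else 1.0 - P_W_true_given_SR[(s, r)]
        return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] * pw

    # P(W=true) by marginalization over all c, s, r
    p_w = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))

    # P(S=true | W=true) and P(R=true | W=true)
    p_s_and_w = sum(joint(c, True, r, True) for c, r in product([True, False], repeat=2))
    p_r_and_w = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))
    print("P(S=true | W=true) =", p_s_and_w / p_w)   # ~0.43
    print("P(R=true | W=true) =", p_r_and_w / p_w)   # ~0.71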

  14. 3. The Foundations: The Common Bayesian Framework
3.2 Arrows on Graphs: Bayesian Networks - Inference
2) Top-down Reasoning, or Causal / Generative Reasoning: from causes to effects.
- For example: given that it is cloudy (C=true), what is the probability that the grass is wet (W=true)?
- We seek P(W=true | C=true).
- The inference is similar.

  15. 3. The Foundations: The Common Bayesian Framework
3.2 Arrows on Graphs: Bayesian Networks - Causality
- It is possible to formalise whether a variable (node) is a cause of another or whether they are merely correlated.
- This would be useful, for example, for a robot to learn the effects of its actions...

  16. 3. The Foundations: The Common Bayesian Framework
3.3 Let it Move: Dynamic Bayesian Networks (DBNs)
- Bayesian Networks extended with time. They are not "dynamic" in the sense that the graph structure or the parameters vary; it is the modeled system that evolves over time.
- (Very) simplified taxonomy of graphical models:
  - Graphical Models
    - Undirected = Markov Random Fields
    - Directed = Bayesian Networks
      - Non-temporal
      - Temporal = Dynamic Bayesian Networks, including Markov Processes (independence of the future w.r.t. all the past, given the present):
          No actions, totally observable: Markov Chains
          No actions, partially observable: Hidden Markov Models (HMMs)
          No actions, Gaussian models: Kalman Filters
          Actions, totally observable: Markov Decision Processes (MDPs)
          Actions, partially observable: Partially Observable Markov Decision Processes (POMDPs)

  17. 4. Forgetting the Past: Markovian Models
4.1 It is Easy if it is Gaussian: Kalman Filters (1960)
- They model dynamic systems with partial observability and Gaussian probability distributions.
- It is of interest to estimate the current state of the system. (The EM algorithm can be thought of as an alternative estimator that is not restricted to Gaussians.)
- Applications: any in which it is needed to estimate the state of a known dynamical system under Gaussian uncertainty / noise: computer vision (tracking), robot SLAM (if the map is considered part of the state), ...
- Extensions: to reduce the computational cost (e.g., when the state has a large description), to cope with more than one hypothesis (e.g., when two indistinguishable landmarks yield a bimodal distribution for the pose of a robot), to cope with non-linear systems (through linearization: the EKF), ...

  18. 4. Forgetting the Past: Markovian Models
4.1 It is Easy if it is Gaussian: Kalman Filters (1960)
- Mathematical formulation. The state transition model, p(x | u, x'), is linear with additive Gaussian noise:
    x = A x' + B u + e_c
  where x is the current state of the system, x' is the last state, u are the actions performed by the system, A and B form the known linear model of the system, and e_c is Gaussian noise in the system actions, with mean m_c = 0 and covariance matrix S_c (white noise).
- The observation model, p(z | x), is also linear with additive Gaussian noise:
    z = C x + e_m
  where z are the current observations of the system, x is the current state, C is the known linear model of the observations, and e_m is Gaussian noise in the observations, with mean m_m = 0 and covariance matrix S_m (white noise).

  19. 4. Forgetting the Past: Markovian Models
4.1 It is Easy if it is Gaussian: Kalman Filters (1960)
- Implemented as a Bayes Filter, each iteration produces the state estimate (mean m_t and covariance S_t):
  Prediction (from the system model):
    m'_t = A m_t-1 + B u_t
    S'_t = A S_t-1 A^T + S_c
  Correction (from the observation z_t):
    K_t = S'_t C^T (C S'_t C^T + S_m)^-1        (Kalman gain)
    m_t = m'_t + K_t (z_t - C m'_t)
    S_t = (I - K_t C) S'_t
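
A minimal Python/numpy sketch of the Kalman Filter iteration following the equations above; the 1-D constant-velocity example and its noise values are illustrative assumptions:

    import numpy as np

    A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
    B = np.array([[0.0], [0.0]])             # no control input in this example
    C = np.array([[1.0, 0.0]])               # only the position is observed
    S_c = 0.01 * np.eye(2)                   # covariance of the system noise
    S_m = np.array([[0.25]])                 # covariance of the observation noise

    def kalman_step(m, S, u, z):
        # Prediction
        m_pred = A @ m + B @ u
        S_pred = A @ S @ A.T + S_c
        # Correction
        K = S_pred @ C.T @ np.linalg.inv(C @ S_pred @ C.T + S_m)
        m_new = m_pred + K @ (z - C @ m_pred)
        S_new = (np.eye(2) - K @ C) @ S_pred
        return m_new, S_new

    m = np.array([[0.0], [0.0]])             # initial mean
    S = np.eye(2)                            # initial covariance
    for z in [1.1, 1.9, 3.2, 4.0]:           # noisy position measurements
        m, S = kalman_step(m, S, np.array([[0.0]]), np.array([[z]]))
        print("estimated position:", m[0, 0], "velocity:", m[1, 0])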

  20. 4. Forgetting the Past: Markovian Models
4.2 On the Line: Markov Chains
- Nodes of the network represent the random variable X at a given instant of time (unknown except for the first node).
- The arc from node Xn to node Xn+1 represents the conditional probability P(Xn+1 | Xn): no other past instant is considered, since the model is Markovian.
- Instants of time are discrete. All the conditionals are known.
- It is of interest: a) causal reasoning: to obtain the distribution of Xn from its past; and b) whether the probability distribution converges over time, which is assured if the chain is ergodic (any state is reachable from any other state and the chain is aperiodic).
- Applications:
  - Direct: physics, computer networks, ...
  - Indirect: as part of more sophisticated models.
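
A minimal Python sketch of a Markov Chain defined by its transition matrix, showing how the state distribution converges over time; the three states and their probabilities are made up:

    import numpy as np

    # T[i, j] = P(X_{n+1} = j | X_n = i)
    T = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])

    p = np.array([1.0, 0.0, 0.0])      # known distribution of the first node X_0

    for n in range(50):                # propagate: p_{n+1} = p_n T
        p = p @ T

    print("distribution after 50 steps:", p)   # approximates the stationary distribution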

  21. 4. Forgetting the Past: Markovian Models
4.3 What to Do?: Markov Decision Processes (MDPs)
- Markov Processes with actions (output arcs) that can be carried out at each node (state), and with some reward obtained as a result of taking a given action in a given state.
- It is of interest to obtain the behaviour that optimizes the reward. A mapping that tells which action to take in each state is called a policy. This is a case of Reinforcement Learning (RL).
- In every MDP, it can be demonstrated that there always exists an optimal policy (one that optimizes the reward).
- Obtaining the optimal policy is expensive (polynomial). There are several algorithms for solving it, some of them reducing that cost (by hierarchies, etc.). The most classical one is value iteration (see the sketch below).
- Applications: decision making in general, robot path planning, travel route planning, elevator scheduling, bank customer retention, autonomous aircraft navigation, manufacturing processes, network switching and routing, ...
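
A minimal Python sketch of value iteration on a tiny made-up MDP (the states, actions, rewards and transition probabilities are illustrative assumptions, not from the slides):

    GAMMA = 0.9
    STATES = ["s0", "s1", "s2"]
    ACTIONS = ["stay", "go"]

    # P[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward
    P = {("s0", "stay"): [("s0", 1.0)],  ("s0", "go"): [("s1", 0.8), ("s0", 0.2)],
         ("s1", "stay"): [("s1", 1.0)],  ("s1", "go"): [("s2", 0.9), ("s1", 0.1)],
         ("s2", "stay"): [("s2", 1.0)],  ("s2", "go"): [("s2", 1.0)]}
    R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
         ("s1", "stay"): 0.0, ("s1", "go"): 1.0,
         ("s2", "stay"): 10.0, ("s2", "go"): 10.0}

    V = {s: 0.0 for s in STATES}
    for _ in range(100):   # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        V = {s: max(R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)])
                    for a in ACTIONS)
             for s in STATES}

    policy = {s: max(ACTIONS, key=lambda a: R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)]))
              for s in STATES}
    print("values:", V)
    print("optimal policy:", policy)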

  22. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models (~1960-70)
- Markov Processes without actions and with partial observability: the states of the network are not directly accessible, except through some stochastic measurement. That is, observations are a probabilistic function of the state.
- It is of interest:
  a) What is the probability of a sequence of observations, given the network? ("How good is a given model?" -- not really interesting for us)
  b) Which states have we most likely visited, given the observations and the network parameters? ("Where are we in the model?" -- little use, since the model is known)
  c) Which network parameters maximize the probability of having obtained those observations? ("Which is the model?" -- robot mapping / localisation)
- Applications: speech processing, robot SLAM, bioinformatics (gene finding, protein modeling, etc.), image processing, finance, traffic, ...

  23. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Elements in an HMM:
  N = number of states in the network (i-th state = s_i).
  M = number of different possible observations (k-th observation = o_k).
  A = matrix (N x N) of state transition probabilities: a_xy = P(q_t+1 = s_y | q_t = s_x).
  B = matrix (N x M) of observation probabilities: b_x(o_k) = P(o_t = o_k | q_t = s_x).
  Pi = vector (1 x N) of initial state probabilities: Pi_x = P(q_0 = s_x).
  lambda = HMM model = (A, B, Pi).

  24. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Solution to Problem a): what is the probability of a sequence of observations (of length T), given the network?
- Direct approach: enumerate all the possible sequences of states (paths) of length T in the network; for each one, calculate the probability that the given sequence of observations is obtained if that path is followed, calculate the probability of the path itself, and thus the joint probability of the path and of the observations along it; finally, sum over all the possible paths in the network.
- Its cost depends on the number of paths of length T: O(2T N^T), infeasible for T long enough.

  25. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Efficient approach: the forward-backward procedure.
- A forward variable is defined as alpha_t(i) = P(O_1, O_2, ..., O_t, q_t = s_i | lambda): the probability of observing the sequence O_1, O_2, ..., O_t and ending up in state s_i at time t.
- It is calculated recursively:
  1. alpha_1(i) = Pi_i b_i(O_1), for all states i from 1 to N.
  2. alpha_t+1(j) = [ sum from i=1 to N of alpha_t(i) a_ij ] b_j(O_t+1), for all states j from 1 to N.
  3. P(O | lambda) = sum from i=1 to N of alpha_T(i).
- This calculation is O(N^2 T).
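
A minimal Python sketch of the forward procedure for computing P(O | lambda); the 2-state, 2-observation model below is a made-up example:

    import numpy as np

    A = np.array([[0.7, 0.3],      # a_ij = P(q_{t+1} = s_j | q_t = s_i)
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],      # b_i(o_k) = P(o_t = o_k | q_t = s_i)
                  [0.2, 0.8]])
    Pi = np.array([0.5, 0.5])      # initial state probabilities

    O = [0, 1, 1, 0]               # observation sequence (indices of the observations)

    alpha = Pi * B[:, O[0]]                     # step 1: alpha_1(i) = Pi_i b_i(O_1)
    for t in range(1, len(O)):
        alpha = (alpha @ A) * B[:, O[t]]        # step 2: [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    print("P(O | lambda) =", alpha.sum())       # step 3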

  26. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Efficient approach: the forward-backward procedure (continued).
- Alternatively, a backward variable can be defined as beta_t(i) = P(O_t+1, O_t+2, ..., O_T | q_t = s_i, lambda): the probability of the remaining observation sequence O_t+1, O_t+2, ..., O_T given that state s_i has been reached at time t.
- It is calculated recursively (backwards in time):
  1. beta_T(i) = 1, for all states i from 1 to N.
  2. beta_t(i) = sum from j=1 to N of a_ij b_j(O_t+1) beta_t+1(j), for all states i from 1 to N.
  3. P(O | lambda) = sum from i=1 to N of Pi_i b_i(O_1) beta_1(i).
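
A minimal Python sketch of the backward procedure; it assumes the same made-up model (A, B, Pi) and observation sequence O as in the forward sketch above, and both must yield the same P(O | lambda):

    import numpy as np

    def backward_probability(A, B, Pi, O):
        beta = np.ones(A.shape[0])                     # step 1: beta_T(i) = 1
        for t in range(len(O) - 2, -1, -1):
            beta = A @ (B[:, O[t + 1]] * beta)         # step 2: sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        return np.sum(Pi * B[:, O[0]] * beta)          # step 3: P(O | lambda)

    # e.g. backward_probability(A, B, Pi, [0, 1, 1, 0]) matches the forward result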

  27. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Solution to Problem b): which states have we most likely visited, given a sequence of observations of length T and the network?
- There is not a unique solution (unlike problem a)): it depends on the optimality criterion chosen. But once one is chosen, a solution can be found analytically.
- The Viterbi Algorithm finds the single best sequence of states: the one that maximizes the joint probability of the whole state sequence together with the observations.
- Two variables are defined:
    delta_t(i) = max over q_1, q_2, ..., q_t-1 of P(q_1, q_2, ..., q_t = s_i, O_1, O_2, ..., O_t | lambda)
  (the maximum probability, over all state sequences that reach state s_i at time t, of producing the given observations), computed recursively as
    delta_t(j) = [ max over i = 1, ..., N of delta_t-1(i) a_ij ] b_j(O_t)
  and
    psi_t(j) = argmax over i = 1, ..., N of delta_t-1(i) a_ij
  (the predecessor state that maximizes the expression, used later to trace back the best sequence).

  28. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- The algorithm works as follows:
  1. delta_1(i) = Pi_i b_i(O_1) and psi_1(i) = 0, for all states i from 1 to N.
  2. delta_t(j) = [ max over i = 1, ..., N of delta_t-1(i) a_ij ] b_j(O_t) and psi_t(j) = argmax over i = 1, ..., N of delta_t-1(i) a_ij, for all states j from 1 to N and t = 2, ..., T.
  3. P* = max over i of delta_T(i), and q_T* = argmax over i of delta_T(i) (the ending state).
  4. q_t* = psi_t+1(q_t+1*), for t = T-1, ..., 1 (retrieving all the other states in the sequence).
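
A minimal Python sketch of the Viterbi algorithm on the same kind of made-up 2-state model used in the forward example:

    import numpy as np

    A = np.array([[0.7, 0.3], [0.4, 0.6]])    # state transition probabilities
    B = np.array([[0.9, 0.1], [0.2, 0.8]])    # observation probabilities
    Pi = np.array([0.5, 0.5])                 # initial state probabilities
    O = [0, 1, 1, 0]                          # observation sequence

    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    delta[0] = Pi * B[:, O[0]]                                   # step 1
    for t in range(1, T):                                        # step 2
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, O[t]]

    path = [int(np.argmax(delta[T - 1]))]                        # step 3: ending state
    for t in range(T - 1, 0, -1):                                # step 4: backtracking
        path.insert(0, int(psi[t, path[0]]))
    print("best state sequence:", path, "with probability", delta[T - 1].max())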

  29. 4. Forgetting the Past: Markovian Models
4.4 For Non-Omniscient People: Hidden Markov Models
- Solution to Problem c): which network parameters maximize the probability of having obtained the observations?
- Not only is there no unique solution (as in problem b)), but there is no analytical procedure to obtain one: only approximations are available.
- Approximation algorithms that obtain locally optimal models exist. The most popular is EM (Expectation-Maximization), which is called Baum-Welch when adapted to HMMs:
  - The sequence of observations is considered the measured data.
  - The sequence of states that yielded those observations is the missing or hidden data.
  - The matrices A, B and Pi are the parameters to approximate.
  - The number of states is known a priori.
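
A minimal Python sketch of Baum-Welch (EM adapted to HMMs) on a single observation sequence; the starting model and the observations are illustrative guesses:

    import numpy as np

    def baum_welch_step(A, B, Pi, O):
        N, T = A.shape[0], len(O)
        # E-step: forward and backward variables
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = Pi * B[:, O[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        prob_O = alpha[T - 1].sum()
        # Expected state occupancies (gamma) and transitions (xi)
        gamma = alpha * beta / prob_O
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / prob_O
        # M-step: re-estimate the parameters
        Pi_new = gamma[0]
        A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_new = np.zeros_like(B)
        for k in range(B.shape[1]):
            B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
        return A_new, B_new, Pi_new

    # Starting guess and observed sequence (illustrative)
    A = np.array([[0.6, 0.4], [0.5, 0.5]])
    B = np.array([[0.7, 0.3], [0.4, 0.6]])
    Pi = np.array([0.5, 0.5])
    O = [0, 0, 1, 0, 1, 1, 0]
    for _ in range(20):
        A, B, Pi = baum_welch_step(A, B, Pi, O)
    print(A, B, Pi, sep="\n")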

  30. 4. Forgetting the Past: Markovian Models
4.5 For People that Do Things: POMDPs
- "Partially Observable Markov Decision Processes": Markov Processes with both actions and partial observability.
- It is of interest:
  - Modeling and interpreting perceptions: the three problems of HMMs (likelihood of the model, localisation within a given model, and calculation of the model itself).
  - Acting optimally: the basic problem of MDPs (the best policy for obtaining the greatest reward through actions).
- Applications: any in which it is needed both to model some real process (or environment) and to act optimally with that model. Only recently applied to robotics (1996).

  31. References
- Thrun S. (2002), "Robotic Mapping: A Survey", Technical Report CMU-CS-02-111.
- Murphy K. (1998), "A Brief Introduction to Graphical Models and Bayesian Networks", http://www.ai.mit.edu/~murphyk/Bayes/bayes.html.
- Murphy K. (2000), "A Brief Introduction to Bayes' Rule", http://www.ai.mit.edu/~murphyk/Bayes/bayesrule.html.
- Contingency Analysis (2004), "MonteCarlo Method", http://www.riskglossary.com/articles/monte_carlo_method.htm.
- Bilmes J.A. (1998), "A Gentle Tutorial of the EM Algorithm and its Applications to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", International Computer Science Institute, Technical Report, CA (USA).
- Arulampalam S., Maskell S., Gordon N., Clapp T. (2001), "A Tutorial on Particle Filters for On-Line Non-Linear / Non-Gaussian Bayesian Tracking", IEEE Transactions on Signal Processing, vol. 50, no. 2.
- West M. (2004), "Elements of Markov Chain Structure and Convergence", Notes of Fall 2004 course, http://www.stat.duke.edu/courses/Fall04/sta214/Notes/214.5.pdf.
- Moore A. (2002), "Markov Systems, Markov Decision Processes, and Dynamic Programming", teaching slides at CMU.
- Harmon M.E., Harmon S.S. (2000), "Reinforcement Learning: A Tutorial", Reading of New Bulgarian University, www.nbu.bg/cogs/events/2000/Readings/Petrov/rltutorial.pdf.
- Cassandra T. (1999), "POMDP for Dummies", http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html.
- Rabiner L. (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, vol. 77, no. 2.
- Cassandra A.R., Kaelbling L.P., Kurien J.A. (1996), "Acting under Uncertainty: Discrete Bayesian Models for Mobile Robot Navigation", Proceedings of the IROS'96.
