MCMC: Particle Theory

MCMC: Particle Theory By Marc Sobel

Particle Theory: Can we understand it?

The Dynamic Model Setup: Heavy Theory

Particle Filters: Lighter Theory • More familiar dynamic model setting: • Even more familiar: (for unknown g,h)

Particle Filter: Goal • The Goal in particle filter theory is to simulate (all or part of) the (unobserved) posterior distribution of the signal process {Xt: t=1,…}=X1:t • (i.e., (X1:t |Y1:t) ) (as well as additional parameters, if necessary). Specifically, we would like to simulate the signal process at time t (i.e., the posterior distribution of X1:t. The particles referred to above take the form, • These particles are designed to approximate the posterior distribution, (X1:t |Y1:t) through the introduction of appropriate weights (see next).

Particle Filter: Weights • The (normalized) weights W(1),…,W(k) attached to particles are designed so that the posterior probability • at a particle X(i) is given by the weight W(i) (i=1,…,k). Thus, for example, • Since weights are always assumed to be normalized, we need only specify them up to a constant of proportionality.

Convergence • The goal of convergence is that the estimated posterior distribution of xn’s given observations Y1:n • corresponds to what it should be. Typically the posterior distribution is defined via weighting wt[k]. • This induces, using ‘K’ as a kernel, the density,

convergence • We’d like the resulting measures to converge,

Particle Filters: Bootstrap • Assume that the (Y,X)’s are independent of each other. Then we can build particle filters sequentially by defining the weights by: • for particles formed by: • But, by normalization the term drops out. So, we can define the weights on the new Xt’s by

Particles PSLAM (continued) • To get the bootstrap particle filter, we assume that the prior distribution of Xt can be based on simulating X’s from the prior distribution. This leaves us to define the weights by: for Xt’s selected from the prior. We might then cull the particles, choosing only those with reasonably large weights.

Particle Selection: Elementary • Shephard and Pitt (in 1994) proposed dividing up the particles into homogeneous groups (i.e., based on the values of functions of particles), and resampling from each group separately. This has the advantage that we aren’t working with collections of particles which combine apples and oranges.

A Probabilistic Approach • The following algorithms take a probabilistic approach

Full vs. Online SLAM: Full Slam is an example of a particle filter which is not a bootstrap filter. • Full SLAM calculates the robot state over all time up to time t • Online SLAM calculates the robot state for the current time t

Weights • We’ve already said that weights reflect the posterior probability of a particle. In Sequential MCMC we construct these weights sequentially. In bootstrap particle analysis we simulate using the prior, but sometimes this doesn’t make sense. (see examples later). One mechanism is to use another distribution (i.e., less prone to outliers) to generate particle extensions.

Weights (continued) • Liu (2001) recommends using a target distribution q to reconstruct the current posterior. Use ‘q’ to simulate the X’s. The weights then take on the form,

Kernel Density Estimate-based Particle Filters • Use a kernel (gaussian or otherwise) as the proposal density q. This puts likelihood weights on the points selected by the prior:

Doucet Particle Filters • Doucet recommends using the kernel K as a proposal density. His weighting becomes:

Effective Sample Size • The effective sample size of a weighted distribution is the effective number of unique particles. • When it gets too small, we devise a threshold; A particle survives if: • A) it’s weight is above the threshold, or • B) it surves with probability (w/thresh). • All rejected weights are started from time t=0.

Culling or Sampling Particles • We can see that for non-bootstrap particle filters, particles tend to multiply beyond all restriction if no culling is used. Culling (or resampling) removes some (unlikely) particles so they don’t multiply too rapidly. • A) residual resampling: At step t-1, extend particles via with reconstruct weights for the new particles, retain • copies of the particle . Draw m0=k-m(1)-…-m(k) from the particle stream. Residual resampling has the effect of killing particles with small weights and emphasizing particles with large weights.

Cull Particles (continued) Thus, a particle with weight .01 in a stream of size 50, is killed. Project: Should you first ‘extend’ particles and then resample or vice versa? • B) Simple Resampling: Extend particles as above. Sample particles from the full stream according to the new weights. • This has the effect of ‘keeping’ particles with low weights. • Thus, a particle with weight .01 in a stream of size 50, has a .01 chance of being selected (whereas for residual resampling) it has a chance of 0.

Resampling (continued) • C) General Resampling: Define ‘rescaled’ probability weights: (a(1),…,a(k)) (usually related to the wt weights). Choose particles based on these weights and assign the ‘new weights’ wt(*,j)= (wt(j)/a(j)) to the corresponding particles. Rescale the new weights. • D) Effective Sample size together with General Sampling: Sample until the effective sample size is below a threshold. Accept particles whose weights are bigger than an appropriate weight threshold c (i.e., weight median). Accept particles whose weights w are below the threshold c with probability (w/c) and otherwise reject them.

General Resampling: We can generalize residual sampling by choosing only large weights to resample from, and then resample with scaled weights. Suppose some particles have different numbers of continuations than others. In this case we might want to select them with weighted probabilities that take this into account – i.e., make the a’s larger for more continuations and smaller for fewer continuations. If there are e.g., more continuations, once these have been implemented, we reweight by • w*=(w/a).

The Central Problem with Particle Filters • At each time stage ‘t’ we are trying to reconstruct the posterior distribution and we biuld future particles on this estimate. Mistakes at each stage amplify mistakes later on. Posterior distribution is typically done using weights. For this reason, there are many algorithms supporting ways to improve the posterior distribution reconstruction: • A) Use kernel density estimators to reconstruct the posterior. • B) Use Metropolis Hastings to reconstruct the posterior. • C) Divide particles into separate streams, reconstructing the posterior for each one separately.

Pitt – Shephard Particle Theory • Suppose new X’s which are sampled aposteriori are heterogeneous i.e., they can be divided into homogeneous streams. Call the streams s=1,…,k. Define, for each stream s, with ‘Zi=s’ denoting that Xi is in the s’th stream: • We then divide sampling into two parts. First we sample a stream using

Pitt Shephard • Then, we sample using weights: • Suppose we have very heterogeneous particles, and there is a way to divide them into homogeneous streams. Then this makes sense. • (e.g., mixture kalman filter type examples).

Example of a Linear Dynamic Model • In what follows we consider the following linear dynamic model: • We compute • The Kalman Filter gives us a solution to the posterior distribution. We can check this against the bootstrap and other filters.

Comparing Bootstrap Particles with the real thing computing Λt over 500 time periods.

Histogram of the difference between bootstrap particle and real parameters

A More Complicated Example of a Dynamic Model • In what follows we consider the following quadratic dynamic models: • The Kalman Filter again gives us stepwise solutions to the posterior distribution. We can check this against the bootstrap and other filters.

Quadratic Model for κ1=.005. Bootstrap versus real parameters.

Absolute difference between the bootstrap particle filter and the real parameter for κ1=.005

Switch to Harder Model • For the second quadratic model when kappa=.01, the bootstrap particle filter breaks down entirely before time t=50. • We switch to residual resampling.

Residual Resamling versus real parameters when κ2=.01.

Histogram: Note how much larger differences are in this case

Switch to General Sampling • Now we use residual sampling as long as the effective sample size stays above 500. • When it falls below 500, we use only those particles: • A) with weights above .002, or • B) with probability (w/.002) we accept the particles.

General Sampling versus real parameters for the quadratic model: Coefficient=.01.

Histogram of the differences between general sampling and real parameters: Note how small the differences are.

Mixture Kalman Filters: Tracking Models • Define models which have a switching mechanism P(KFi|KFj) in going from Kalman filter KFj to KFi (i,j=1,…,d). Updates use standard Kalman Filters for a given model KFj and then with probability P(KFi|KFj) switch to filter KFi.

Kalman Filters • We have the updating formula: • Which can generate particles consisting of Kalman Filters. • We can use Pitt-Shepard to sample these models. In effect, we divide the Kalman Filters into homogeneous groups, and then sample them as separate streams.

MKF’s • Mixture Kalman Filters are useful for tracking. The Kalman Filters represent different objects at a particular location. The probabilities represent the chance that one object versus others are actually visible subject to the usual occlusion. The real posterior distribution for MKF’s is easy to calculate.

Matlab Code: Part I • % Particle Model • % X[t]=.5X[t-1]+3*V[t] • % Y[t]=2*X[t]+2*W[t] • XP=zeros(100,1000); • X(1,1:1000)=5+sqrt(3)*normrnd(0,1,1,1000); • Y(1,1:1000)=2*X(1,1:1000)+sqrt(2)*normrnd(0,1,1,1000); • W=normpdf(Y(1,:),2*X(1,:),ones(1,1000)); • WW=W/sum(W'); • XBP(1,:)=randsample(X(1,:),1000,true,WW); • mm=((2*Y(1,:)/4)+(5/9))/(1+(1/9)); • sg=sqrt(1/(1+(1/9))); • XP(1,:)=normrnd(mm,sg*ones(1,1000));

Matlab Code: Part II • for tim=2:100 • for jj=1:1000 • X(tim,jj)=.5*X(tim-1,jj)+5+3*randn; • Y(tim,jj)=2*X(tim,jj)+2*randn; • end • W=normpdf(Y(tim,:),2*X(tim,:),ones(1,1000)); • WW=W/sum(W'); • XBP(tim,:)=randsample(X(tim,:),1000,true,WW); • mm=((2*Y(tim,:)/4)+(5/9)+.5*X(tim,:)/9)/(1+(1/9)); • sg=sqrt(1/(1+(1/9))); • XP(tim,:)=normrnd(mm,sg*ones(1,1000)); • end

MCMC: Particle Theory