
Introduction to Monte Carlo Methods



Presentation Transcript


  1. Introduction to Monte Carlo Methods D.J.C. Mackay Rasit Onur Topaloglu Ph.D. candidate rot@ucsd.edu

  2. Outline • Problems Monte Carlo (MC) deals with • Uniform Sampling • Importance Sampling • Rejection Sampling • Metropolis • Gibbs Sampling • Speeding up MC: Hybrid MC and Over-relaxation

  3. Definition of Problems • Problem 1: Generate samples from a given probability distribution • Problem 2: Estimate the expectation of a function under that probability distribution

  4. Monte Carlo for High Dimensions • Solving the first problem => solving the second one • Just evaluate the function at the samples and average the results (a sketch follows below) • The accuracy is independent of the dimensionality of the space sampled
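Below is a minimal Python sketch of this idea (the standard-normal target and the function φ(x) = x² are illustrative choices, not from the slides): once samples can be drawn from P(x), the expectation is estimated by a plain average of φ over the samples.

import random

def estimate_expectation(sampler, phi, n_samples=100_000):
    """Estimate E[phi(x)] by averaging phi over samples drawn from `sampler`."""
    return sum(phi(sampler()) for _ in range(n_samples)) / n_samples

if __name__ == "__main__":
    # Target P(x): standard normal; phi(x) = x**2, whose true expectation is 1.
    estimate = estimate_expectation(lambda: random.gauss(0.0, 1.0), lambda x: x * x)
    print(f"estimated E[x^2] = {estimate:.3f}  (exact value: 1.0)")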

  5. Sampling from P(x) • Assume we can evaluate P(x) only to within a multiplicative constant, i.e. we can evaluate P*(x) where P(x) = P*(x)/Z • Even if P*(x) can be evaluated, this is still a hard problem because: • Z is not known • It is not easy to draw samples in high dimensions (the Gaussian being an exception) • Ex: a sample from a univariate Gaussian can be computed as x = cos(2π u1) √(2 ln(1/u2)), where u1 and u2 are uniform in [0, 1] (a sketch follows below)
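A minimal Python sketch of the transform quoted above, assuming it is the usual Box-Muller-style mapping of two uniform variates to one standard-normal sample:

import math
import random

def gaussian_sample():
    """Turn two uniform variates u1, u2 into one standard-normal sample."""
    u1 = random.random()
    u2 = random.random() or 1e-12  # guard against log(1/0)
    return math.cos(2.0 * math.pi * u1) * math.sqrt(2.0 * math.log(1.0 / u2))

if __name__ == "__main__":
    samples = [gaussian_sample() for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"sample mean ~ {mean:.3f}, sample variance ~ {var:.3f}")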

  6. Difficulty of Sampling at High Dimensions • We can discretize the function (figure on right) and sample from the discrete approximation • This is costly at high dimensions: B^N for B bins and N dimensions

  7. Uniform Sampling (tries to solve the 2nd problem) • Draw samples uniformly from the state space • Z_R is the normalizing constant: Z_R = Σ_r P*(x^(r)) • Estimate by: Φ ≈ Σ_r φ(x^(r)) P*(x^(r)) / Z_R • For distributions that peak in a small region, many points must be sampled before φ(x) is evaluated a reasonable number of times where the probability mass lies => requires a very large number of samples • Thus, uniform sampling is seldom useful (a sketch follows below)
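A minimal Python sketch of the uniform-sampling estimator, with an illustrative unnormalised target P*(x) = exp(−x²/2) and φ(x) = x², sampled uniformly on [−5, 5] (these choices are not from the slides):

import math
import random

def uniform_sampling_estimate(p_star, phi, lo, hi, n_samples=100_000):
    """Estimate E[phi(x)] under P* using uniformly drawn states."""
    xs = [random.uniform(lo, hi) for _ in range(n_samples)]
    weights = [p_star(x) for x in xs]   # unnormalised density at each state
    z_r = sum(weights)                  # empirical normalising constant Z_R
    return sum(w * phi(x) for w, x in zip(weights, xs)) / z_r

if __name__ == "__main__":
    est = uniform_sampling_estimate(lambda x: math.exp(-x * x / 2.0),
                                    lambda x: x * x, -5.0, 5.0)
    print(f"estimated E[x^2] ~ {est:.3f}  (exact value: 1.0)")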

  8. Importance Sampling (tries to solve the 2nd problem) • Introduce a simpler density Q from which samples can be drawn • Values of x where Q(x) > P(x) are over-represented; values of x where Q(x) < P(x) are under-represented => introduce weights w_r = P*(x^(r)) / Q*(x^(r)) and estimate Φ ≈ Σ_r w_r φ(x^(r)) / Σ_r w_r (a sketch follows below)
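A minimal Python sketch of importance sampling under the same illustrative target (unnormalised standard Gaussian, φ(x) = x²), with a wider Gaussian as the proposal Q so that its tails dominate the target's:

import math
import random

def importance_sampling_estimate(p_star, q_star, q_sampler, phi, n_samples=100_000):
    """Estimate E[phi(x)] under P* from samples drawn from Q, using weights P*/Q*."""
    xs = [q_sampler() for _ in range(n_samples)]
    ws = [p_star(x) / q_star(x) for x in xs]   # importance weights w_r
    return sum(w * phi(x) for w, x in zip(ws, xs)) / sum(ws)

if __name__ == "__main__":
    sigma_q = 2.0  # proposal broader than the target
    est = importance_sampling_estimate(
        p_star=lambda x: math.exp(-x * x / 2.0),
        q_star=lambda x: math.exp(-x * x / (2.0 * sigma_q ** 2)),
        q_sampler=lambda: random.gauss(0.0, sigma_q),
        phi=lambda x: x * x)
    print(f"estimated E[x^2] ~ {est:.3f}  (exact value: 1.0)")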

  9. Reliability of Importance Sampling (figure: the estimate Φ vs. number of samples, for a Gaussian sampler and a Cauchy sampler) • An importance sampler should have heavy tails in problems where infrequent samples might be influential

  10. Rejection Sampling • Again, a proposal density Q is assumed • Also assume we know a constant c such that c·Q*(x) ≥ P*(x) for all x • Generate x using Q*(x) • Draw a random variable u uniformly from the interval [0, c·Q*(x)] • If u ≤ P*(x), accept and add x to the list of random numbers (a sketch follows below)
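A minimal Python sketch of rejection sampling; the unnormalised standard-Gaussian target, the flat proposal on [−5, 5] and the envelope constant c = 1 are illustrative choices (chosen so that c·Q*(x) ≥ P*(x) everywhere):

import math
import random

def rejection_sample(p_star, c_q_star, q_sampler, n_accept=10_000):
    """Accumulate accepted samples: keep x whenever u ~ U[0, c*Q*(x)] falls below P*(x)."""
    accepted = []
    while len(accepted) < n_accept:
        x = q_sampler()
        u = random.uniform(0.0, c_q_star(x))   # uniform under the envelope c*Q*(x)
        if u <= p_star(x):
            accepted.append(x)
    return accepted

if __name__ == "__main__":
    samples = rejection_sample(
        p_star=lambda x: math.exp(-x * x / 2.0),
        c_q_star=lambda x: 1.0,                      # flat envelope, >= P*(x) for all x
        q_sampler=lambda: random.uniform(-5.0, 5.0))
    mean = sum(samples) / len(samples)
    print(f"mean of accepted samples ~ {mean:.3f}  (target mean: 0.0)")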

  11. Transition to Markov Chain MC • Importance and rejection sampling work well if Q is similar to P • In complex problems, it is difficult to find a single such Q over the state space

  12. Metropolis Method • The proposal density Q depends on the current state x(t): Q(x'; x(t)) • Compute the acceptance ratio a = P*(x') Q(x(t); x') / [P*(x(t)) Q(x'; x(t))] • Accept the new state if a ≥ 1; else accept it with probability a • In comparison to rejection sampling, rejected points are not discarded and hence influence subsequent samples => samples are correlated => Metropolis may have to be run longer to generate independent samples from P(x)! (a sketch follows below)
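A minimal Python sketch of the Metropolis method with a symmetric Gaussian proposal Q(x'; x(t)) = N(x(t), ε²), for which the acceptance ratio reduces to a = P*(x')/P*(x(t)); the unnormalised standard-Gaussian target and the step size are illustrative:

import math
import random

def metropolis(p_star, x0, eps, n_steps):
    """Run a Metropolis chain with a symmetric Gaussian proposal of scale eps."""
    x = x0
    chain = []
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, eps)   # propose a step of length scale eps
        a = p_star(x_new) / p_star(x)        # acceptance ratio for a symmetric proposal
        if a >= 1.0 or random.random() < a:  # accept with probability min(1, a)
            x = x_new
        chain.append(x)                      # rejected proposals repeat the current state
    return chain

if __name__ == "__main__":
    chain = metropolis(lambda x: math.exp(-x * x / 2.0), x0=0.0, eps=1.0, n_steps=50_000)
    print(f"chain mean ~ {sum(chain) / len(chain):.3f}  (target mean: 0.0)")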

  13. Disadvantages of a Large Step Size (relevant for high dimensions) • Metropolis uses a length scale ε much smaller than the size L of the state space • Because: large steps are highly unlikely to be accepted • => limited or no movement in space! => biased

  14. A Lower Bound for Independent Samples • Metropolis explores the space by a random walk • Random walks take a long time • After T steps of size ε, the state has only moved a distance of about √T·ε • As Monte Carlo is trying to achieve independent samples, it requires ~(L/ε)² steps to obtain a sample independent of the initial condition • This rule of thumb can be used as a lower bound

  15. An Example Using this Bound (figure: the state of the chain at the 100th, 400th and 1200th iterations) • Takes ~10² = 100 steps to reach an end state • Metropolis still provides biased estimates even after a large number of iterations

  16. Gibbs Sampling • As opposed to the previous methods, at least 2-dimensional distributions are required • Q is defined in terms of the conditional distributions of the joint distribution P(x) • The assumption is that P(x) is too complex to evaluate directly, but the conditionals P(x_i | {x_j}, j ≠ i) are tractable

  17. Gibbs Sampling on an Example • Start with x=(x1,x2), fix x2(t) and sample x1 from P(x1|x2) • Fix x1 and sample x2 from P(x2|x1)
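A minimal Python sketch of this two-variable Gibbs sampler for a bivariate Gaussian with unit variances and correlation ρ (an illustrative target for which both conditionals are Gaussian and can be sampled exactly):

import random

def gibbs_bivariate_gaussian(rho, n_steps, x1=0.0, x2=0.0):
    """Alternately draw x1 ~ P(x1|x2) and x2 ~ P(x2|x1) for a correlated Gaussian pair."""
    sd = (1.0 - rho * rho) ** 0.5            # conditional standard deviation
    chain = []
    for _ in range(n_steps):
        x1 = random.gauss(rho * x2, sd)      # sample x1 from P(x1 | x2)
        x2 = random.gauss(rho * x1, sd)      # sample x2 from P(x2 | x1)
        chain.append((x1, x2))
    return chain

if __name__ == "__main__":
    chain = gibbs_bivariate_gaussian(rho=0.9, n_steps=50_000)
    mean_x1 = sum(x1 for x1, _ in chain) / len(chain)
    print(f"mean of x1 ~ {mean_x1:.3f}  (target mean: 0.0)")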

  18. Gibbs Sampling with K Variables • Each variable is updated in turn from its conditional given the current values of all the others: x_i(t+1) ~ P(x_i | x_1(t+1), ..., x_{i-1}(t+1), x_{i+1}(t), ..., x_K(t)) • In comparison to Metropolis, every proposal is always accepted • In bigger models, groups of variables can be sampled jointly

  19. Comparison of MC Methods in High Dimensions • Importance and rejection sampling result in very large weights and constants respectively, leading to inaccurate or lengthy simulations => not practical • Metropolis requires at least (σ_max/σ_min)² steps to acquire independent samples => might be lengthy • Gibbs sampling has similar properties to Metropolis, but has no adjustable parameters => most practical

  20. Practical Questions About Monte Carlo • Can we predict how long it takes to reach equilibrium? Use the simple bound proposed above • Can we determine convergence in a running simulation? Yet another difficult problem • Can we speed up convergence time and the time between independent samples?

  21. Reducing Random Walk in Metropolis: Hybrid MC • Most probabilities can be written in the form P(x) = e^(−E(x)) / Z • Introduce a momentum variable p with kinetic energy K(p) = pᵀp / 2 • Create asymptotically exact samples from the joint distribution P_H(x, p) ∝ e^(−E(x) − K(p)) • Pick p randomly from its Gaussian distribution ∝ e^(−K(p)) • Update x and p by simulating the Hamiltonian dynamics of H(x, p) = E(x) + K(p) • The distance travelled then grows linearly with the number of steps rather than as a random walk (a leapfrog sketch follows below)
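A minimal Python sketch of hybrid Monte Carlo with a leapfrog integrator; the energy E(x) = x²/2 (a standard Gaussian target), the step size and the trajectory length are illustrative choices:

import math
import random

def hybrid_mc(e, grad_e, x0, step=0.1, n_leapfrog=20, n_iters=5_000):
    """Alternate Gaussian momentum draws with leapfrog trajectories, accepted on the change in H."""
    x, chain = x0, []
    for _ in range(n_iters):
        p = random.gauss(0.0, 1.0)                      # fresh momentum from exp(-K(p))
        x_new, p_new = x, p
        p_new -= 0.5 * step * grad_e(x_new)             # initial half step in p
        for _ in range(n_leapfrog):
            x_new += step * p_new                       # full step in x
            p_new -= step * grad_e(x_new)               # full step in p
        p_new += 0.5 * step * grad_e(x_new)             # trim the last update back to a half step
        h_old = e(x) + 0.5 * p * p
        h_new = e(x_new) + 0.5 * p_new * p_new
        if h_new <= h_old or random.random() < math.exp(h_old - h_new):   # Metropolis test on H
            x = x_new
        chain.append(x)
    return chain

if __name__ == "__main__":
    chain = hybrid_mc(e=lambda x: 0.5 * x * x, grad_e=lambda x: x, x0=0.0)
    print(f"chain mean ~ {sum(chain) / len(chain):.3f}  (target mean: 0.0)")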

  22. An Illustrative Example for Hybrid MC (figure: hybrid vs. random-walk trajectories) • Over a number of iterations, the hybrid trajectories indicate less correlated samples

  23. Reducing Random Walk in Gibbs: Over-relaxation • The former value x(t) is used as well when computing x(t+1) (a sketch follows below) • Suitable for speeding up the process when variables are highly correlated • Useful for Gaussians; not straightforward for other types of conditional distributions
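A minimal Python sketch of over-relaxation for Gaussian conditionals, using Adler's update x' = μ + α(x − μ) + σ√(1 − α²)·ν with ν ~ N(0, 1) and α close to −1; the bivariate Gaussian target and the value of α are illustrative:

import math
import random

def overrelaxed_update(x, mu, sigma, alpha=-0.98):
    """Over-relaxed draw from N(mu, sigma^2), anti-correlated with the former value x."""
    nu = random.gauss(0.0, 1.0)
    return mu + alpha * (x - mu) + sigma * math.sqrt(1.0 - alpha * alpha) * nu

def overrelaxed_gibbs(rho, n_steps, x1=0.0, x2=0.0, alpha=-0.98):
    """Gibbs sweeps over a correlated Gaussian pair, with over-relaxed conditional updates."""
    sd = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    chain = []
    for _ in range(n_steps):
        x1 = overrelaxed_update(x1, rho * x2, sd, alpha)   # over-relaxed P(x1 | x2)
        x2 = overrelaxed_update(x2, rho * x1, sd, alpha)   # over-relaxed P(x2 | x1)
        chain.append((x1, x2))
    return chain

if __name__ == "__main__":
    chain = overrelaxed_gibbs(rho=0.998, n_steps=40)
    print(chain[:3])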

  24. An Illustrative Example for Over-Relaxation (figure: state space of a bivariate Gaussian over 40 iterations) • The over-relaxation samples cover the state space better

  25. Reducing Random Walk: Simulated Annealing • Introduce a parameter T (temperature) and gradually reduce it to 1, sampling from P_T(x) ∝ P*(x)^(1/T) • A high T corresponds to being able to make transitions more easily • As opposed to its use in optimization, T is not reduced to 0 but to 1 (a sketch follows below)
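A minimal Python sketch of simulated annealing in this sampling setting: a Metropolis chain targets P_T(x) ∝ P*(x)^(1/T) while T is cooled geometrically down to 1 (the target, schedule and starting temperature are illustrative):

import math
import random

def annealed_metropolis(p_star, x0, eps, n_steps, t_start=10.0, cooling=0.999):
    """Metropolis sampling of P*(x)**(1/T) while the temperature T decays towards 1."""
    x, t, chain = x0, t_start, []
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, eps)
        a = (p_star(x_new) / p_star(x)) ** (1.0 / t)   # annealed acceptance ratio
        if a >= 1.0 or random.random() < a:
            x = x_new
        t = max(1.0, t * cooling)                      # cool towards T = 1, never below
        chain.append(x)
    return chain

if __name__ == "__main__":
    chain = annealed_metropolis(lambda x: math.exp(-x * x / 2.0),
                                x0=5.0, eps=0.5, n_steps=20_000)
    tail = chain[-5_000:]
    print(f"mean over the final samples ~ {sum(tail) / len(tail):.3f}")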

  26. Applications of Monte Carlo: Differential Equations • Ex: steady-state temperature distribution of an annulus • The finite-difference approximation on a grid of spacing h gives T(x, y) ≈ ¼ [T(x+h, y) + T(x−h, y) + T(x, y+h) + T(x, y−h)] • With probability ¼, one of the four neighbouring states is selected • The value at the boundary that the walk reaches is kept; the mean over many such runs gives the solution (a sketch follows below)
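A minimal Python sketch of this random-walk solver on a square grid rather than an annulus (the grid size and the Dirichlet boundary values are illustrative): each walk steps to one of the four neighbours with probability ¼ and records the boundary value it first reaches.

import random

def boundary_value(i, j, n):
    """Illustrative boundary data: the top edge is held at 100, the other edges at 0."""
    return 100.0 if j == n - 1 else 0.0

def walk_temperature(i, j, n, n_walks=5_000):
    """Estimate the steady-state temperature at grid point (i, j) by averaging many walks."""
    total = 0.0
    for _ in range(n_walks):
        x, y = i, j
        while 0 < x < n - 1 and 0 < y < n - 1:        # walk until a boundary node is hit
            dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            x, y = x + dx, y + dy
        total += boundary_value(x, y, n)              # keep the value found at the boundary
    return total / n_walks

if __name__ == "__main__":
    print(f"estimated T at the centre of a 21 x 21 grid ~ {walk_temperature(10, 10, 21):.1f}")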

  27. Applications of Monte Carlo: Integration • To evaluate an integral, bound the function with a box • The ratio of points falling under the function to all points thrown into the box gives the ratio of the area under the function to the area of the box (a sketch follows below)
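A minimal Python sketch of this hit-or-miss estimator; the integrand f(x) = sin(x) on [0, π] with bounding box height 1 is an illustrative choice (its exact integral is 2):

import math
import random

def hit_or_miss_integral(f, a, b, f_max, n_points=200_000):
    """Estimate the integral of f over [a, b] from the fraction of box points under the curve."""
    hits = 0
    for _ in range(n_points):
        x = random.uniform(a, b)
        y = random.uniform(0.0, f_max)   # a point thrown uniformly into the bounding box
        if y <= f(x):
            hits += 1
    box_area = (b - a) * f_max
    return box_area * hits / n_points

if __name__ == "__main__":
    est = hit_or_miss_integral(math.sin, 0.0, math.pi, 1.0)
    print(f"estimated integral ~ {est:.3f}  (exact value: 2.0)")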

  28. Applications of Monte Carlo: Image Processing, Automatic Eyeglass Removal • MCMC is used instead of gradient-based methods to solve the MAP criterion that locates the points of the eyeglasses [C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, "Automatic Eyeglasses Removal from Face Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, pp. 322-336, Mar. 2004]

  29. Applications of Monte Carlo: Image Segmentation • Data-driven techniques such as edge detection, tracing and clustering are combined with MCMC to speed up the search [Z. Tu and S.-C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 657-673, May 2002]

  30. Forward Discrete Probability Propagation Rasit Onur Topaloglu Ph.D. candidate rot@ucsd.edu

  31. The Problem (figure: formula hierarchy tree with physical parameters at the lowest level and circuit parameters, ex. gm = √(2·k·Id), at the high level) • The tree relates physical parameters to circuit parameters • Structured according to the SPICE formula hierarchy • Given the pdf's for the lowest-level parameters, find the pdf's at the highest level

  32. Motivation for Probability Propagation • Estimation of the distributions of high-level parameters is needed to examine the effects of process variations • The Gaussian assumption attributed to these parameters is no longer accurate in current technologies • Goal: find a novel propagation method with • Determinism: a stochastic output obtained using known formulas • Algebraic tractability: enabling manual applicability • Speed & accuracy: comparable to, or outperforming, Monte Carlo and other parametric methods

  33. Parametric Belief Propagation (figure: calculations handled at each node) • Each node receives and sends messages to its parents and children until equilibrium • Parent to child (π): causal information • Child to parent (λ): diagnostic information

  34. Parametric Belief Propagation • When the arrows in the hierarchy tree indicate linear addition operations on Gaussians, analytic formulations are possible • Not straightforward for other distributions or non-standard distributions

  35. Shortcomings of Monte Carlo • Non-determinism: not manually applicable • Limited to certain distributions: random number generators in most CAD packages only provide certain distributions • Accuracy: may miss points that are less likely to occur due to random sampling, unless a very large number of samples is used; limited by the performance of the random number generator

  36. Monte Carlo – FDPP Comparison: One-to-Many Relationships and Custom pdf's (figure: nodes P1, P2, P3, P4) • Non-standard pdf's are not possible without a custom random number generator • Monte Carlo over-estimates in one-to-many relationships because the same sample is reused

  37. Operations Necessary to Implement FDPP • Analytic operations on continuous distributions are difficult; instead, operations on discrete distributions are implemented: • F (Forward): given a function, estimates the distribution of the next node in the formula hierarchy • Q (Quantize): discretizes a pdf so as to operate on its samples • B (Band-pass): decrements the number of samples for computational efficiency • R (Re-bin): reduces the number of samples for computational efficiency

  38. Necessary Operators (Q, F, B, R) on a Hierarchical Tree (figure: Q applied to NSUB and T, then F and B, R applied to obtain PHIf) • Repeated until we acquire the high-level distribution (ex. G)

  39. Probability Discretization Theory: the QN Operator; p and r Domains (figure: pdf(X) in the p-domain and the corresponding spdf(X) in the r-domain) • QN band-pass filters pdf(X) and divides it into bins • The N in QN indicates the number of bins • Certain operators are easy to apply in the r-domain

  40. Characterizing an spdf (figure: spdf(X) in the r-domain) • We can write spdf(X) as spdf(X) = Σ_i p_i · δ(x − w_i), where: • p_i: probability of the i'th impulse • w_i: value of the i'th impulse

  41. The F Operator (notation: Xi, Y are random variables; pXs is the set of all samples s belonging to X) • The F operator implements a function over spdf's • The function is applied to the individual impulses • The individual probabilities are multiplied

  42. The Band-pass Operator, Be • Margin-based definition: eliminate samples whose values are out of range • Error-based definition: eliminate the samples that are least likely to occur

  43. The Re-bin Operator, RN (figure: impulses after F are united into one bin, giving the resulting spdf(X)) • Samples falling into the same bin are congregated into one impulse, whose probability is the sum of the probabilities of the samples in that bin

  44. Error Analysis • Variance of the quantization error • Distortion caused by representing the samples in a bin by a single sample (m_i: center of the i'th bin) • Total distortion • If the quantizer is uniform and the bin width Δ is small, the quantization-error random variable Q is uniformly distributed

  45. Algorithm Implementing the F Operator
  While there is a random variable whose spdf has not yet been computed
    For each r.v. which has all of its ancestor spdf's computed
      For each sample in X1
        ...
        For each sample in Xr
          Place an impulse with height p1·...·pr at x = f(v1, ..., vr)
      Apply the B and R algorithms to this r.v.

  46. Algorithm for the B and R Operators
  Find the maximum and minimum values wi among the impulses
  Divide this range into M bins
  For each bin
    Place a quantizing impulse at the center of the bin, with a height pi equal to the sum of all impulses within the bin
  Find the maximum probability, pi-max, of the quantized impulses within the bins
  Eliminate impulses within bins whose quantized impulse has a probability smaller than error-rate · pi-max
  Find the new maximum and minimum values wi among the remaining impulses
  Divide this range into N bins
  For each bin
    Place an impulse at the center of the bin, with height equal to the sum of all impulses within the bin
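Below is a minimal Python sketch of the F, B and R operators as described on the preceding slides, with an spdf represented as a list of (value, probability) impulses; the bin counts, the error rate and the example function are illustrative assumptions, not the exact implementation behind FDPP.

def f_operator(func, *spdfs):
    """F: apply func to every combination of impulse values, multiplying their probabilities."""
    combos = [([], 1.0)]
    for spdf in spdfs:                                   # cross product over the input spdf's
        combos = [(vals + [w], p * pi) for vals, p in combos for (w, pi) in spdf]
    return [(func(*vals), p) for vals, p in combos]

def r_operator(spdf, n_bins):
    """R: congregate impulses falling into the same bin into one impulse at the bin center."""
    lo = min(w for w, _ in spdf)
    hi = max(w for w, _ in spdf)
    width = (hi - lo) / n_bins or 1.0
    bins = [0.0] * n_bins
    for w, p in spdf:
        bins[min(int((w - lo) / width), n_bins - 1)] += p
    return [(lo + (i + 0.5) * width, p) for i, p in enumerate(bins) if p > 0.0]

def b_operator(spdf, n_bins, error_rate=0.01):
    """B (error-based): drop impulses whose bin is far less likely than the most likely bin."""
    lo = min(w for w, _ in spdf)
    hi = max(w for w, _ in spdf)
    width = (hi - lo) / n_bins or 1.0
    def bin_index(w):
        return min(int((w - lo) / width), n_bins - 1)
    bin_prob = [0.0] * n_bins
    for w, p in spdf:
        bin_prob[bin_index(w)] += p
    p_max = max(bin_prob)
    return [(w, p) for w, p in spdf if bin_prob[bin_index(w)] >= error_rate * p_max]

if __name__ == "__main__":
    # Two illustrative discretised input pdf's, propagated through f(x, y) = x * y.
    x = [(0.8, 0.2), (1.0, 0.6), (1.2, 0.2)]
    y = [(1.9, 0.3), (2.0, 0.4), (2.1, 0.3)]
    z = r_operator(b_operator(f_operator(lambda a, b: a * b, x, y), n_bins=16), n_bins=8)
    print(z)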

  47. Monte Carlo – FDPP Comparison (figures: pdf of Vth and pdf of ID; solid: FDPP, dotted: Monte Carlo) • A close match is observed after interpolation

  48. Monte Carlo – FDPP Comparison with a Low Sample Count (figures: pdf of PHIf; solid: FDPP with 100 samples, noisy: Monte Carlo with 1,000 and 100,000 samples respectively) • Monte Carlo is inaccurate for a moderate number of samples • Indicates that FDPP can be applied manually without major accuracy degradation

  49. Monte Carlo – FDPP Comparison on a Benchmark Example (figure: pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation) • Edges define a linear sum, ex: n5 = n2 + n3

  50. Faulty Application of Monte Carlo on the Benchmark Example (figure: pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief propagation) • Distributions at the internal nodes n4, n5, n6 should be re-sampled using Monte Carlo • Not optimal for internal nodes with non-standard distributions
