1 / 64

Regularization of energy-based representations

This document outlines the process of regularization for energy-based representations using stabilizing functions such as membrane stabilizers and thin plate stabilizers. The energy function measures compatibility between observations and data. The overall energy is minimized by finding the solution for u, obtained by minimizing E(u).

adrianac
Télécharger la présentation

Regularization of energy-based representations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Regularization of energy-based representations • Minimize total energy lEp(u) + (1-l)Ed(u,d) • Ep(u) : Stabilizing function - a smoothness constraint • Membrane stabilizer: • Ep(u) = 0.5Si,j [(ui,j+1– ui,j)2 + (ui+1,j– ui,j) 2] • Thin plate stabilizer • Ep(u) = 0.5Si,j [(ui,j+1+ ui,j-1– 2ui,j)2 + (ui+1,j+ ui-1,j– 2ui,j)2 + (ui+1,j+1+ ui,j– ui+1,j – ui,j+1)2] • Linear combinations of the two • Ed(u,d) : Energy function, measures compatibility between observations and data • Ed(u,d) = 0.5Si,j ci,j (di,j– ui,j)2 • ci,j is the inverse of the variance in measurement di,j

  2. Stabilizing function – membrane stabilizer Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j

  3. Stabilizing function – membrane stabilizer Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j

  4. Stabilizing function – membrane stabilizer Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j

  5. Stabilizing function – membrane stabilizer ATOM Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j

  6. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui,j+1ui,j+1 – ui,jui,j+1 – ui,j+1ui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j 1 -1

  7. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui+1,jui+1,j – ui,jui+1,j – ui+1,jui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j -1 2 -1

  8. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui,j+1ui,j+1 – ui,jui,j+1 – ui,j+1ui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j -1 -1 3 -1

  9. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui+1,jui+1,j – ui,jui+1,j – ui+1,jui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j -1 -1 4 -1 -1

  10. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui+1,jui+1,j – ui,jui+1,j – ui+1,jui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j -1 -1 4 -1 -1 = u

  11. Stabilizing function – membrane stabilizer ATOM • ui,jui,j + ui+1,jui+1,j – ui,jui+1,j – ui+1,jui,j Ep(u) = 0.5Si,j[ (ui,j+1– ui,j)2 + (ui+1,j– ui,j)2 ] ui,j i j -1 -1 4 -1 -1 = u • Ep(u) = 0.5uTApu • Rows of Ap have the form • 0 0 0 –1 0 0 …. 0 –1 4 –1 0 … 0 0 –1 0 ..

  12. Stabilizing function – thin plate stabilizer Ep(u) = 0.5Si,j{[(ui,j+1– ui,j) + (ui,j-1– ui,j)]2 + [(ui+1,j– ui,j) + (ui-1,j– ui,j)]2+ 2[(ui+1,j+1– ui,j) + (ui,j– ui+1,j) + (ui,j–ui,j+1)]2} ui,j i j

  13. Stabilizing function – thin plate stabilizer Ep(u) = 0.5Si,j{[(ui,j+1– ui,j) + (ui,j-1– ui,j)]2 + [(ui+1,j– ui,j) + (ui-1,j– ui,j)]2+ 2[(ui+1,j+1– ui,j) + (ui,j– ui+1,j) + (ui,j–ui,j+1)]2} ui,j i j

  14. Stabilizing function – thin plate stabilizer Ep(u) = 0.5Si,j{[(ui,j+1– ui,j) + (ui,j-1– ui,j)]2 + [(ui+1,j– ui,j) + (ui-1,j– ui,j)]2+ 2[(ui+1,j+1– ui,j) + (ui,j– ui+1,j) + (ui,j–ui,j+1)]2} ui,j i j

  15. Stabilizing function – thin plate stabilizer Ep(u) = 0.5Si,j{[(ui,j+1– ui,j) + (ui,j-1– ui,j)]2 + [(ui+1,j– ui,j) + (ui-1,j– ui,j)]2+ 2[(ui+1,j+1– ui,j) + (ui,j– ui+1,j) + (ui,j–ui,j+1)]2} ui,j i j

  16. Stabilizing function – thin plate stabilizer Ep(u) = 0.5Si,j{[(ui,j+1– ui,j) + (ui,j-1– ui,j)]2 + [(ui+1,j– ui,j) + (ui-1,j– ui,j)]2+ 2[(ui+1,j+1– ui,j) + (ui,j– ui+1,j) + (ui,j–ui,j+1)]2} ui,j i j 1 -8 2 2 ATOM: 1 -8 -8 1 20 -8 2 2 1 • Ep(u) = 0.5uTApu • Rows of Ap have the form • 0 0 1 0 0 ... 0 2 –8 2 0 0 .. 1 –8 20 –8 1 0 0 …0 0 2 –8 2 0 .. 0 –1 0 ..

  17. Stabilizing function – Examples (1-D) points membrane thin plate thin plate + membrane

  18. Stabilizing function – Examples (2-D) membrane Samples from u thin plate membrane + thin plate

  19. Stabilizing function – Examples (2-D) membrane Samples from u thin plate membrane + thin plate

  20. Energy function • Data on grid • di,j= ui,j +ei,j (ei,j is N(0,s2)) • Ed(u,d) = 0.5Si,j ci,j (di,j– ui,j)2 (ci,j = s-2) • Data off grid • dk = h0,0 ui,j +h0,1 ui,j+1 + h1,0 ui+1,j + h1,1 ui+1,j+1 +ei,j • Ed(u,d) = 0.5Sk ck (dk,– Hku)2 • In all examples here we assume data on grid • Ed(u,d) = 0.5 (u-d)TAd(u-d) • Ad = s-2 I : measurement variance assumed constant for all data

  21. Overall energy • E(u) = lEp(u) + (1-l)Ed(u,d) (l isregularization factor) = 0.5{luTApu + (1-l)(u-d)TAd(u-d)} = 0.5uTAu – uTb + const • Where A = Ap + (1-l)Ad b = (1-l) Ad d • Solution for u can be directly obtained by minimizing E(u) u = A-1 b

  22. Minimizing overall energy 1-D (l = 0.5) From noisy observation membrane No observation noise From noisy observation thin plate No observation noise

  23. Minimizing overall energy 2-D (l = 0.5) Original Noisy Added 0 mean unit variance Gaussian noise to all elements

  24. Minimizing overall energy 2-D (l = 0.5) Original From Noisy membrane thin plate

  25. Minimizing overall energy 2-D (l = 0.5) Original Noisy Added 0 mean unit variance Gaussian noise to all elements

  26. Minimizing overall energy 2-D (l = 0.5) Original From Noisy membrane thin plate

  27. Minimizing energy by Relaxation • Direct computation of A-1is inefficient • Large matrices: for a 256x256 grid, Ahas size 65536 x 65536 • Sparseness of A not utilized: only a small fraction of elements have non zero values • Relaxation replaces inversion of A with many local estimates ui+ = ai,i-1(bi – Sai,juj) • Updates can be done in parallel • All local computations very simple • Can be slow to converge

  28. Minimizing energy by relaxation 1-D (l = 0.5) Membrane 100 iters 500 iters 1000 iters

  29. Minimizing energy by relaxation 1-D (l = 0.5) Thin plate: much slower to converge 1000 iters 10000 iters 100000 iters

  30. Minimizing energy by relaxation 2-D (l = 0.5) Original 1000 iters Membrane 10000 iters 100000 iters

  31. Minimizing energy by relaxation 2-D (l = 0.5) Original 1000 iters Thin plate: much slower to converge 10000 iters 100000 iters

  32. Prior Models • A Boltzmann distribution based on the stabilizing function P(u) = K.exp(-Ep(u)/Tp) • K is a normalizing constant, Tp is temperature • Samples can be generated by repeated sampling of local distributions: P(ui|u) • P(ui|u) = Ziexp(-ai,i-1(ui – ui+)/2Tp) • ui+ = ai,i-1(bi – Sai,juj) • This is the local estimate of ui in the relaxation method • The variance of the local sample is Tp/ai,i

  33. Samples from prior distribution 1-D Membrane stabilizer based Boltzmann Thin plate stabilizer based Boltzmann

  34. Samples from prior distribution 2-D Membrane prior

  35. Samples from prior distribution 2-D Thin plate prior

  36. Sampling prior distributions • Samples are fractal • Tend to favour high frequencies • Multi-grid sampling to get smoother samples: Initially generate sample for a very coarse grid

  37. Sampling prior distributions • Samples are fractal • Tend to favour high frequencies • Multi-grid sampling to get smoother samples: Interpolate from coarse grid to finer grid, use the interpolated values to initilize gibbs sampling for a less coarse grid.

  38. Sampling prior distributions • Samples are fractal • Tend to favour high frequencies • Multi-grid sampling to get smoother samples: Repeat process on a finer grid

  39. Sampling prior distributions • Samples are fractal • Tend to favour high frequencies • Multi-grid sampling to get smoother samples: Final sample for entire grid

  40. Multigrid sampling of prior distribution Membrane prior Thin plate prior

  41. Sensor models • Sparse data model • Uses a simple energy function • Assumption: data points are all on grid • Only use sparse data model used in examples • Others such as force field models, optical flow, image intensity etc. not simulated for this presentation • Measurement variance assumed constant for all data points

  42. Posterior model • Simple Bayes’ rule: P(u|d) = K.exp(-Ep(u)/Tp - Ed(u)) • Also a Gibbs distribution • 1/Tp is the equivalent of the regularization factor Tp = (1-l)/ l • In following figures only thin plate prior considered

  43. Sampling the posterior model (T=1)

  44. MAP estimation from the Gibbs posterior • Restate Gibbs posterior distribution as P(u) = K.exp(-E(u)/T) • E(u) is the total energy • T again is temperature • Not to be confused with regularization term Tp • Reduce T with iterations • iteration is defined as a complete sweep through the data • Guaranteed convergence to MAP estimate as T goes to 0, provided T does not go down faster than 1/log(iter), where iter is the iteration number • In practice, much faster cooling is possible • For simple sparse data sensor model, MAP estimate must be identical to that obtained using relaxation or matrix inversion

  45. MAP estimates from posterior 1-D Relaxation 100000 iters Annealed Gibbs sampling 100000 iters

  46. MAP estimates from posterior 2-D Actual MAP solution Annealed Gibbs Sampling based MAP solution

  47. The contaminated Gaussian sensor model • Also a sparse data sensor model • Assumes measurement error has two modes • 1. A high probability, low variance Gaussian • 2. A low probability, high variance Gaussian P(di,j |u) = (1-e)N(ui,j ,s12) + e N(ui,j , s22) • 0.05 < e < 0.1 and s22 >> s12 • Posterior probability is also a mixture of Gaussians (1-e)P1(di,j |u) + eP2(di,j |u)

  48. Samples from posterior using contaminated Gaussian

  49. MAP estimates of contaminated Gaussian 1-D • For contaminated Gaussian there is no closed form estimate MAP estimate • Gibbs sampling provides a MAP estimate MAP estimate using single Gaussian sensor model MAP estimate using contaminated Gaussian sensor model

  50. MAP estimates of contaminated Gaussian 2-D MAP estimate using a single Gaussian sensor model MAP estimate using a contaminated Gaussian sensor model • For contaminated Gaussian MAP estimate obtained using annealed Gibbs sampling

More Related