
Temporal Probabilistic Models

  1. Temporal Probabilistic Models

  2. Reminders • HW5 due on Thursday • HW6 out on Tuesday • ~1 month until final projects due

  3. Motivation • Observing a stream of data • Health monitoring (of people, computer systems, etc) • Surveillance, tracking • Finance & economics • Science • Questions: • Modeling & forecasting • Unobserved variables

  4. Time Series Modeling • Time occurs in steps t=0,1,2,… • Time step can be seconds, days, years, etc • State variable Xt, t=0,1,2,… • For partially observed problems, we see observations Ot, t=1,2,… and do not see the X’s • X’s are hidden variables (aka latent variables)

  5. Modeling Time • Arrow of time • Causality? Bayesian networks to the rescue (causes → effects)

  6. Probabilistic Modeling • For now, assume the fully observable case • What parents should each Xt have? [Diagram: candidate structures over X0, X1, X2, X3]

  7. Markov Assumption • Assume Xt+k is independent of all Xi for i < t, given Xt,…,Xt+k-1: P(Xt+k | X0,…,Xt+k-1) = P(Xt+k | Xt,…,Xt+k-1) • k-th order Markov chain [Diagrams: chains over X0…X3 of order 0, 1, 2, 3]

  8. 1st order Markov Chain • MC's of order k > 1 can be converted into a 1st order MC [left as an exercise] • So w.l.o.g., "MC" refers to a 1st order MC [Diagram: chain X0 → X1 → X2 → X3]
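
As a concrete illustration of this conversion, here is a minimal Python sketch (states and probabilities are made up): a 2nd-order chain over {a, b} becomes a 1st-order chain whose state is the pair (Xt-1, Xt).

```python
# A 2nd-order chain: p2[(x_prev, x_cur)][x_next] = P(X_t+1 = x_next | X_t-1 = x_prev, X_t = x_cur).
p2 = {("a", "a"): {"a": 0.9, "b": 0.1},
      ("a", "b"): {"a": 0.5, "b": 0.5},
      ("b", "a"): {"a": 0.3, "b": 0.7},
      ("b", "b"): {"a": 0.2, "b": 0.8}}

# Equivalent 1st-order chain over pairs: (x_prev, x_cur) -> (x_cur, x_next), same probability.
p1 = {}
for (x_prev, x_cur), dist in p2.items():
    for x_next, p in dist.items():
        p1[((x_prev, x_cur), (x_cur, x_next))] = p

for transition, prob in p1.items():
    print(transition, prob)
```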

  9. Inference in MC • What independence relationships can we read off the BN? Observing X1 makes X0 independent of X2, X3, … • P(Xt|Xt-1) is known as the transition model [Diagram: chain X0 → X1 → X2 → X3]

  10. Inference in MC • Prediction: what is the probability of a future state? • P(Xt) = Σx0,…,xt-1 P(x0,…,xt-1, Xt) = Σx0,…,xt-1 P(x0) Πi=1…t P(Xi|Xi-1) = Σxt-1 P(Xt|xt-1) P(xt-1) • The distribution "blurs" over time and approaches a stationary distribution as t grows • Limited prediction power • The rate of blurring is known as the mixing time
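
A minimal Python sketch of this prediction recursion, using a made-up two-state transition matrix; repeatedly applying it shows the distribution blurring toward the stationary distribution.

```python
import numpy as np

# T[i, j] = P(X_t = j | X_t-1 = i); the states and numbers are illustrative only.
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])

p = np.array([1.0, 0.0])          # P(X_0): start in state 0 with certainty
for t in range(1, 21):
    p = p @ T                     # P(X_t) = sum_{x_t-1} P(X_t | x_t-1) P(x_t-1)
    print(t, p)                   # the belief "blurs" toward the stationary distribution
```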

  11. How does the Markov assumption affect the choice of state? • Suppose we’re tracking a point (x,y) in 2D • What if the point is… • A momentumless particle subject to thermal vibration? • A particle with velocity? • A particle with intent, like a person?

  12. How does the Markov assumption affect the choice of state? • Suppose the point is the position of our robot, and we observe velocity and intent • What if: • Terrain affects speed? • Battery level affects speed? • Position is noisy, e.g. GPS?

  13. Is the Markov assumption appropriate for: • A car on a slippery road? • Sales of toothpaste? • The stock market?

  14. Partial Observability • Hidden Markov Model (HMM) • Roughly equivalent to a Dynamic Bayesian Network (DBN) • Hidden state variables X0…X3, observed variables O1…O3 • P(Ot|Xt) is called the observation model (or sensor model)

  15. Inference in HMMs • Filtering • Prediction • Smoothing, aka hindsight • Most likely explanation [Diagram: HMM with hidden states X0…X3 and observations O1…O3]

  16. Inference in HMMs • Filtering • Prediction • Smoothing, aka hindsight • Most likely explanation [Diagram: the query variable is the most recent state, given the observations so far]

  17. Filtering • Name comes from signal processing • P(Xt|o1:t) = Σxt-1 P(xt-1|o1:t-1) P(Xt|xt-1, ot) • P(Xt|xt-1, ot) = P(ot|xt-1, Xt) P(Xt|xt-1) / P(ot|xt-1) = α P(ot|Xt) P(Xt|xt-1)

  18. Filtering • P(Xt|o1:t) = α Σxt-1 P(xt-1|o1:t-1) P(ot|Xt) P(Xt|xt-1) • Forward recursion • If we keep track of P(Xt|o1:t) => an O(1) update for each t!
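
A minimal Python sketch of this forward recursion for a discrete HMM; the transition table, observation table, and observation sequence are all illustrative.

```python
import numpy as np

# T[i, j] = P(X_t = j | X_t-1 = i), O[j, k] = P(o = k | X_t = j); numbers are made up.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])      # P(X_0)

def filter_step(belief, obs):
    """One O(1)-per-step update: multiply the predicted belief by the observation likelihood."""
    predicted = belief @ T                 # sum over x_t-1 of P(X_t | x_t-1) P(x_t-1 | o_1:t-1)
    unnormalized = O[:, obs] * predicted   # weight by P(o_t | X_t)
    return unnormalized / unnormalized.sum()   # the alpha normalization

belief = prior
for obs in [0, 0, 1, 1]:                   # a made-up observation sequence
    belief = filter_step(belief, obs)
    print(belief)
```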

  19. Inference in HMMs • Filtering • Prediction • Smoothing, aka hindsight • Most likely explanation [Diagram: the query is a future state beyond the last observation]

  20. Prediction • P(Xt+k|o1:t) • 2 steps: compute P(Xt|o1:t), then propagate with P(Xt+k|Xt) • Filter, then predict as with a standard MC
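
A minimal sketch of the two-step prediction, assuming a filtered belief is already available; the transition matrix and belief values are placeholders.

```python
import numpy as np

# Illustrative transition matrix and a pretend filtered belief P(X_t | o_1:t).
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
belief = np.array([0.9, 0.1])

def predict(belief, k):
    """P(X_t+k | o_1:t): apply the transition model k times, with no new observations."""
    for _ in range(k):
        belief = belief @ T
    return belief

print(predict(belief, 5))          # blurs toward the chain's stationary distribution
```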

  21. Inference in HMMs • Filtering • Prediction • Smoothing, aka hindsight • Most likely explanation [Diagram: the query is a past state Xk, k < t]

  22. Smoothing • P(Xk|o1:t) for k < t • P(Xk|o1:k, ok+1:t) = P(ok+1:t|Xk, o1:k) P(Xk|o1:k) / P(ok+1:t|o1:k) = α P(ok+1:t|Xk) P(Xk|o1:k) • The P(Xk|o1:k) factor is standard filtering up to time k

  23. Smoothing • Backward recursion: computing P(ok+1:t|Xk) • P(ok+1:t|Xk) = Σxk+1 P(ok+1:t|Xk, xk+1) P(xk+1|Xk) = Σxk+1 P(ok+1:t|xk+1) P(xk+1|Xk) = Σxk+1 P(ok+2:t|xk+1) P(ok+1|xk+1) P(xk+1|Xk) [Diagram: given the state, what is the probability of the remaining observation sequence?]
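
A minimal Python sketch of the combined forward-backward computation, again with made-up tables; it returns P(Xk|o1:t) for every k.

```python
import numpy as np

# T[i, j] = P(X_k = j | X_k-1 = i), O[j, m] = P(o = m | X_k = j); illustrative numbers.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def smooth(observations):
    """Return P(X_k | o_1:t) for every k by combining a forward and a backward pass."""
    # Forward pass: f_k = P(X_k | o_1:k)
    forwards, f = [], prior
    for o in observations:
        f = O[:, o] * (f @ T)
        f /= f.sum()
        forwards.append(f)
    # Backward pass: b_k = P(o_k+1:t | X_k), using the recursion from the slide
    smoothed, b = [], np.ones_like(prior)
    for k in range(len(observations) - 1, -1, -1):
        s = forwards[k] * b
        smoothed.append(s / s.sum())
        b = T @ (O[:, observations[k]] * b)   # b_k-1(x) = sum_xk P(xk|x) P(o_k|xk) b_k(xk)
    return list(reversed(smoothed))

print(smooth([0, 0, 1, 1]))
```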

  24. Inference in HMMs • Filtering • Prediction • Smoothing, aka hindsight • Most likely explanation: the query returns a path through state space x0,…,x3 [Diagram: HMM with states X0…X3 and observations O1…O3]

  25. MLE: Viterbi Algorithm • Recursive computation of the maximum likelihood of a path reaching each xt in Val(Xt) • mt(Xt) = maxx1:t-1 P(x1,…,xt-1, Xt|o1:t) = α P(ot|Xt) maxxt-1 P(Xt|xt-1) mt-1(xt-1) • Previous ML state: argmaxxt-1 P(Xt|xt-1) mt-1(xt-1) • Does this sound familiar?
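
A minimal Python sketch of the Viterbi recursion with backpointers, using the same style of illustrative tables as above; the α normalization is omitted since it does not change the argmax.

```python
import numpy as np

# T[i, j] = P(X_t = j | X_t-1 = i), O[j, m] = P(o = m | X_t = j); illustrative numbers.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def viterbi(observations):
    """Most likely state sequence: m_t(x) = P(o_t|x) * max_x' P(x|x') m_t-1(x'), with backpointers."""
    m = prior * O[:, observations[0]]
    backptrs = []
    for o in observations[1:]:
        scores = m[:, None] * T                 # scores[x', x] = m_t-1(x') P(x | x')
        backptrs.append(scores.argmax(axis=0))  # best predecessor for each x
        m = scores.max(axis=0) * O[:, o]
    # Reconstruct the path by following backpointers from the best final state
    path = [int(m.argmax())]
    for bp in reversed(backptrs):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

print(viterbi([0, 0, 1, 1]))
```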

  26. MLE: Viterbi Algorithm • Do the “logarithm trick” • log mt(Xt) = log α P(ot|Xt) + maxxt-1 [log P(Xt|xt-1) + log mt-1(xt-1)] • View: log α P(ot|Xt) as a reward, log P(Xt|xt-1) as a cost, log mt(Xt) as a value function • Bellman equation

  28. Applications of HMMs in NLP • Speech recognition • Hidden phones (e.g., ah eh ee th r) • Observed, noisy acoustic features (produced by signal processing)

  29. Phone Observation Models • P(Featurest|Phonet): the model is defined to be robust over variations in accent, speed, pitch, noise • Features are produced by signal processing, e.g. Featurest = (24, 13, 3, 59) [Diagram: Phonet → Featurest]

  30. Phone Transition Models • Good models will capture (among other things): pronunciation of words, subphone structure, coarticulation effects • Triphone models = order-3 Markov chain [Diagram: Phonet → Phonet+1, Phonet → Featurest]

  31. Word Segmentation • Words run together when pronounced • Unigrams P(wi) • Bigrams P(wi|wi-1) • Trigrams P(wi|wi-1,wi-2) • Random 20-word samples from R&N using N-gram models: “Logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is” / “Planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate” / “Planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time”
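
A minimal sketch of estimating bigram probabilities P(wi|wi-1) by counting, on a tiny made-up corpus; real recognizers use far larger corpora plus smoothing.

```python
from collections import defaultdict

# Tiny illustrative corpus; the words are arbitrary.
corpus = "the agent moves the box and the agent stops".split()

prev_counts = defaultdict(int)
pair_counts = defaultdict(int)
for prev, word in zip(corpus, corpus[1:]):
    prev_counts[prev] += 1
    pair_counts[(prev, word)] += 1

def p_bigram(word, prev):
    """Maximum-likelihood estimate of P(word | prev); 0 for unseen pairs (no smoothing)."""
    return pair_counts[(prev, word)] / prev_counts[prev] if prev_counts[prev] else 0.0

print(p_bigram("agent", "the"))   # count("the agent") / count("the" as a predecessor)
```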

  32. Tricks to improve recognition • Narrow the # of variables • Digits, yes/no, phone tree • Training with real user data • Real story: “Yes ma’am”

  33. Kalman Filtering • In a nutshell • Efficient filtering in continuous state spaces • Gaussian transition and observation models • Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras

  34. Linear Gaussian Transition Model • Consider position and velocity xt, vt • Time step h • Without noise: xt+1 = xt + h vt, vt+1 = vt • With Gaussian noise of std σ1: P(xt+1|xt) ∝ exp(−(xt+1 − (xt + h vt))² / (2σ1²)), i.e. xt+1 ~ N(xt + h vt, σ1)
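
A minimal sketch of sampling from this transition model; the step size, noise level, and initial state are made-up values.

```python
import numpy as np

# x_t+1 = x_t + h*v_t + noise, v_t+1 = v_t; h, sigma1, and the initial state are illustrative.
rng = np.random.default_rng(0)
h, sigma1 = 0.1, 0.05
x, v = 0.0, 1.0

for t in range(10):
    x = x + h * v + rng.normal(0.0, sigma1)   # x_t+1 ~ N(x_t + h*v_t, sigma1)
    print(t, round(x, 3))
```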

  35. Linear Gaussian Transition Model • If the prior on position is Gaussian, then the posterior is also Gaussian: N(μ, σ²) → N(μ + vh, σ² + σ1²) (the mean shifts by vh; the variances add)

  36. Linear Gaussian Observation Model • Position observation zt • Gaussian noise of std σ2: zt ~ N(xt, σ2)

  37. Linear Gaussian Observation Model • If the prior on position is Gaussian, then the posterior is also Gaussian • Posterior mean: μ' = (σ²z + σ2²μ) / (σ² + σ2²) • Posterior variance: σ'² = σ²σ2² / (σ² + σ2²) [Figure: position prior, observation probability, posterior probability]
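
A minimal sketch of this one-dimensional measurement update, treating σ² and σ2² as the prior and observation-noise variances; the numbers are illustrative.

```python
# mu, var: Gaussian prior on position; z: observed position; var_z: observation noise variance.
def kalman_update_1d(mu, var, z, var_z):
    """Combine a Gaussian prior N(mu, var) with an observation z ~ N(x, var_z)."""
    mu_post = (var * z + var_z * mu) / (var + var_z)   # precision-weighted average
    var_post = var * var_z / (var + var_z)             # posterior variance shrinks
    return mu_post, var_post

print(kalman_update_1d(mu=0.0, var=1.0, z=0.5, var_z=0.25))
```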

  38. Multivariate Case • Transition matrix F, covariance Σx • Observation matrix H, covariance Σz • μt+1 = Fμt + Kt+1(zt+1 − HFμt) • Σt+1 = (I − Kt+1H)(FΣtF^T + Σx) • where Kt+1 = (FΣtF^T + Σx)H^T (H(FΣtF^T + Σx)H^T + Σz)^-1 • Got that memorized?
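
A minimal sketch of one multivariate predict-plus-update step matching the equations above, demonstrated on a made-up constant-velocity model that observes position only; F, H, Σx, Σz, and the state are all illustrative placeholders.

```python
import numpy as np

def kalman_step(mu, Sigma, z, F, H, Sx, Sz):
    """Return the updated mean and covariance after seeing observation z."""
    P = F @ Sigma @ F.T + Sx                          # predicted covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sz)     # Kalman gain K_t+1
    mu_new = F @ mu + K @ (z - H @ F @ mu)            # correct the predicted mean
    Sigma_new = (np.eye(len(mu)) - K @ H) @ P         # shrink the predicted covariance
    return mu_new, Sigma_new

# Constant-velocity example in 1D: state = [position, velocity], we observe position only.
h = 0.1
F = np.array([[1.0, h], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Sx = 0.01 * np.eye(2)
Sz = np.array([[0.05]])
mu, Sigma = np.zeros(2), np.eye(2)
mu, Sigma = kalman_step(mu, Sigma, np.array([0.12]), F, H, Sx, Sz)
print(mu, Sigma)
```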

  39. Properties of Kalman Filter • Optimal Bayesian estimate for linear Gaussian transition/observation models • Need estimates of covariance… model identification necessary • Extensions to nonlinear transition/observation models work as long as they aren’t too nonlinear • Extended Kalman Filter • Unscented Kalman Filter

  40. Properties of Kalman Filter • Optimal Bayesian estimate for linear Gaussian transition/observation models • Need estimates of covariance… model identification necessary • Extensions to nonlinear systems • Extended Kalman Filter: linearize the models • Unscented Kalman Filter: pass sample points through the nonlinear model to reconstruct a Gaussian • These work as long as the system isn’t too nonlinear

  41. Non-Gaussian distributions • Gaussian distributions are a single “lump” [Figure: Kalman filter estimate drawn as one Gaussian lump]

  42. Non-Gaussian distributions • Integrating continuous and discrete states • Splitting with a binary choice [Figure: a distribution splitting into “up” and “down” branches]

  43. Example: Failure detection • Consider a battery meter sensor • Battery = true level of battery • BMeter = sensor reading • Transient failures: send garbage at time t • Persistent failures: send garbage forever

  44. Example: Failure detection • Consider a battery meter sensor • Battery = true level of battery • BMeter = sensor reading • Transient failures: send garbage at time t • 5555500555… • Persistent failures: sensor is broken • 5555500000…

  45. Dynamic Bayesian Network • (Think of this structure “unrolled” forever…) • BMetert ~ N(Batteryt, σ) [Diagram: Batteryt-1 → Batteryt → BMetert]

  46. Dynamic Bayesian Network • BMetert ~ N(Batteryt, σ) • Transient failure model: P(BMetert = 0 | Batteryt = 5) = 0.03 [Diagram: Batteryt-1 → Batteryt → BMetert]

  47. Results on Transient Failure • Meter reads 55555005555… • Transient failure occurs where the meter reads 0 [Plot: E(Batteryt) over time, with and without the transient failure model]

  48. Results on Persistent Failure • Meter reads 5555500000… • Persistent failure occurs [Plot: E(Batteryt) over time using only the transient failure model]

  49. Persistent Failure Model • BMetert ~ N(Batteryt, σ) • P(BMetert = 0 | Batteryt = 5) = 0.03 • P(BMetert = 0 | Brokent) = 1 [Diagram: Brokent-1 → Brokent and Batteryt-1 → Batteryt; both Brokent and Batteryt feed BMetert]
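
A minimal, simplified sketch of how a persistent-failure variable lets the belief that the sensor is broken grow with consecutive zero readings; it assumes the battery stays at 5, treats the reading only as zero / non-zero, and uses a made-up breakage prior p_break.

```python
# Discretized observation model: a working sensor reads 0 with prob 0.03 (transient glitch),
# a broken sensor always reads 0. p_break is an assumed per-step probability of breaking.
p_break = 0.001      # P(Broken_t | not Broken_t-1); once broken, stays broken

def update(p_broken, reading_is_zero):
    """One Bayes update of the belief that the sensor is broken."""
    prior = p_broken + (1 - p_broken) * p_break           # transition: a working sensor can break
    like_broken = 1.0 if reading_is_zero else 0.0          # broken sensor always reads 0
    like_ok = 0.03 if reading_is_zero else 0.97            # working sensor rarely reads 0
    num = like_broken * prior
    den = num + like_ok * (1 - prior)
    return num / den

belief = 0.0
for r in "5555500000":                                     # the meter readings from the slide
    belief = update(belief, reading_is_zero=(r == "0"))
    print(r, round(belief, 4))                             # belief climbs with consecutive zeros
```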

  50. Results on Persistent Failure • Meter reads 5555500000… • Persistent failure occurs [Plot: E(Batteryt) over time, comparing the transient-only model with the persistent failure model]
