
Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State

This presentation describes a tractable algorithm for efficiently learning linear-linear exponential family predictive state representations (LL-EFPSRs), followed by experiments and conclusions.


Presentation Transcript


  1. Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State. David Wingate* (wingated@mit.edu), University of Michigan; Satinder Singh (baveja@umich.edu), University of Michigan. *Now a postdoc at MIT.

  2. Outline • The Exponential Family PSR • The Linear-Linear Exponential Family PSR • A tractable learning algorithm (> NIPS) • Experiments and conclusions

  3. The Exponential Family PSR • The Exponential Family PSR • The Linear-Linear Exponential Family PSR • A tractable learning algorithm • Experiments and conclusions

  4. Modeling Dynamical Systems. [Timeline diagram: past action-observation pairs ao_{t-2}, ao_{t-1}, ao_t make up the history h_t; future action-observation variables AO_{t+1}, AO_{t+2}, AO_{t+3}, AO_{t+4}, ... extend to the right, with F | h_t denoting the infinite future.] The goal: model the conditional distribution of the infinite future given any history.

  5. Modeling Dynamical Systems. [Same timeline diagram, now over the length-n short-term future F^n | h_t.] Central PSR assumption: the parameters describing the conditional distribution of the short-term future are state.

  6. Examples of PSRs. Central assumption: the parameters describing the conditional distribution of the short-term future are state. • Discrete observations: state is the expectations of a core set of indicator random variables, or the probabilities of specific core tests (HMMs / POMDPs). • Continuous observations: state is the parameters of a Gaussian distribution modeling the next n observations (linear dynamical systems / Kalman filters).

  7. Distribution of the Short-Term Future. Second (EFPSR) assumption: the distribution over the short-term future has an exponential family form. These parameters will be our state!
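
The equation image for this slide did not survive the transcript. A standard exponential family form consistent with the surrounding slides (state s_t as the natural parameters, f as the feature function over the length-n future) would look roughly like the following; treat it as a hedged reconstruction rather than the slide's exact notation:

```latex
p\left(F^{n} \mid h_t\right) \;=\; \exp\left\{ s_t^{\top} f\!\left(F^{n} \mid h_t\right) \;-\; \log Z(s_t) \right\}
```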

  8. Maintaining State. [Diagram: the window of future action-observation variables AO_{t+1}, ..., AO_{t+4} shifts forward by one step as each new observation arrives.] s_t^+ = extend(s_t, θ); s_{t+1} = condition(s_t^+, o_{t+1}).

  9. Learning an EFPSR. Given a trajectory of T data points, there are three things to learn: the model parameters θ, the dimension n, and the features of the future f(F^n). The extension step is s_t^+ = extend(s_t, θ).

  10. The Linear-Linear Exponential Family PSR • The Exponential Family PSR • The Linear-Linear Exponential Family PSR • A tractable learning algorithm • Experiments and conclusions

  11. Linear-Linear EFPSR. Linear extension function: s_t^+ = extend(s_t, θ) = θ s_t. Linear conditioning: s_{t+1} = G(o_{t+1}) s_t^+. The overall state update is linear: s_{t+1} = G(o_{t+1}) θ s_t. Two useful properties: 1) maximum likelihood gradients are easy to derive; 2) linearity makes efficient approximations possible.
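
A minimal numerical sketch of the linear-linear state update described on this slide. The shapes, the dict G mapping observations to conditioning matrices, and the function name are illustrative assumptions, not the paper's code:

```python
import numpy as np

def ll_efpsr_update(s_t, o_next, theta, G):
    """One step of the LL-EFPSR state update (illustrative sketch).

    extend:    s_plus = theta @ s_t              (linear in the state)
    condition: s_next = G[o_next] @ s_plus       (linear, observation-dependent)
    overall:   s_next = G[o_next] @ theta @ s_t
    """
    s_plus = theta @ s_t          # linear extension to the longer window's parameters
    return G[o_next] @ s_plus     # linear conditioning on the new observation

# Toy usage with made-up shapes: 2 possible observations, 4-dimensional state.
theta = np.random.randn(4, 4)
G = {0: np.random.randn(4, 4), 1: np.random.randn(4, 4)}
s = np.random.randn(4)
s = ll_efpsr_update(s, 1, theta, G)
```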

  12. Exact Likelihood for ML Learning. Exact likelihood of the data: [equation not reproduced in the transcript]. Importantly, the model is fully observed: no latent states appear in the expression for the likelihood.
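
One plausible reading of the missing likelihood, consistent with "fully observed, no latent states": score each next observation under the window distribution defined by the current state, marginalizing out the rest of the window. This is a hedged sketch, not the slide's exact expression:

```latex
\log p(o_{1:T}) \;=\; \sum_{t} \log p\left(o_{t+1} \mid h_t\right),
\qquad
p\left(o_{t+1} \mid h_t\right) \;=\; \sum_{o_{t+2},\dots,o_{t+n}} p\left(F^{n} \mid h_t ;\, s_t\right)
```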

  13. Results of Exact ML on POMDPs. Maximum likelihood learning via gradient ascent. Metric: data likelihood under the true model compared to the likelihood under the learned EFPSR. Model quality is the fraction of the gap between the naive and true models that the EFPSR closes. Unfortunately, exact learning is intractable in large domains.
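
One natural way to formalize the "gap closed" metric (a hedged reading of the slide, not necessarily the paper's exact definition):

```latex
\text{quality} \;=\; \frac{\mathrm{LL}_{\text{EFPSR}} - \mathrm{LL}_{\text{naive}}}{\mathrm{LL}_{\text{true}} - \mathrm{LL}_{\text{naive}}}
```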

  14. A Tractable Learning Algorithm • The Exponential Family PSR • The Linear-Linear Exponential Family PSR • A tractable learning algorithm • Experiments and conclusions

  15. Why is Exact ML Intractable? Three obstacles, each with a remedy: exact inference for the gradients cannot be performed (remedy: approximate inference); the naive parameterization of s_t^+ = extend(s_t, θ) = θ s_t yields too many parameters (remedy: a low-rank approximation); and exact inference cannot be done T times per gradient step (remedy: an approximate likelihood).

  16. Approximate Likelihood for ML Learning. Start from the exact likelihood of the data; a double application of Jensen's inequality plus a zero-covariance assumption yields an approximate lower bound on the likelihood of the data. The bound could be used for other models and algorithms.
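
The bound itself is not reproduced in the transcript. Slides 17-19 name its ingredients (the unconditional expected parameters, the unconditional expected features, and the log-partition function at the stationary state), which suggests something of roughly this shape; this is a reconstruction from those hints, not the paper's exact expression:

```latex
\frac{1}{T}\log p(o_{1:T}) \;\gtrsim\; \bar{s}^{\top} \bar{f} \;-\; \log Z(\bar{s}),
\qquad
\bar{s} = \mathbb{E}\!\left[s_t\right], \quad \bar{f} = \mathbb{E}\!\left[f\!\left(F^{n} \mid h_t\right)\right]
```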

  17. Interpretation of the Approximate Likelihood. First ingredient: the unconditional expected parameters. For the EFPSR, this is the stationary distribution of states! For the LL-EFPSR, it can be computed as the solution to a linear system of equations based on the stationary distribution of observations.
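
A sketch of how this stationary expected state might be obtained, assuming the zero-covariance reading of slide 16 so that s_bar ≈ E[G(o)] θ s_bar. The function name, the dict-based inputs, and the eigenvector approach are illustrative assumptions:

```python
import numpy as np

def stationary_state(G, p_obs, theta):
    """Approximate fixed point s_bar with s_bar = E[G(o)] @ theta @ s_bar.

    G:     dict mapping each observation o to its conditioning matrix G(o)
    p_obs: dict mapping o to its stationary probability
    theta: the linear extension matrix
    """
    G_bar = sum(p_obs[o] * G[o] for o in G)     # E[G(o)] under the stationary obs distribution
    eigvals, eigvecs = np.linalg.eig(G_bar @ theta)
    i = np.argmin(np.abs(eigvals - 1.0))        # a fixed point corresponds to eigenvalue 1
    return np.real(eigvecs[:, i])
```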

  18. Interpretation of the Approximate Likelihood. Second ingredient: the unconditional expected features. This is the stationary distribution of features, computed once from the data.
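
And the empirical counterpart: the stationary expected features can be computed once from data by averaging the features of every length-n window of the training trajectory. Names and shapes here are illustrative:

```python
import numpy as np

def empirical_stationary_features(obs, n, feature_fn):
    """Mean feature vector over all length-n windows of a trajectory.

    obs:        (T, d) array of observations
    feature_fn: maps an (n, d) window of future observations to a feature vector
    """
    T = len(obs)
    feats = [feature_fn(obs[t:t + n]) for t in range(T - n + 1)]
    return np.mean(np.stack(feats), axis=0)
```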

  19. Interpretation of the Approximate Likelihood. Third ingredient: the log-partition function, evaluated using the stationary distribution of states.

  20. Interpretation of the Approximate Likelihood. Gradient: the difference between the expected features of the future induced by the model and the expected features of the future observed in the data.
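
In symbols, the gradient described here would look roughly like a moment-matching difference pushed through the Jacobian of the stationary state with respect to θ; again a hedged reconstruction, not the slide's exact formula:

```latex
\nabla_{\theta}\,\widehat{\ell}
\;\propto\;
\left(\frac{\partial \bar{s}}{\partial \theta}\right)^{\!\top}
\left( \bar{f} \;-\; \mathbb{E}_{p(\,\cdot\,;\,\bar{s})}\!\left[ f\!\left(F^{n}\right) \right] \right)
```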

  21. Interpretation of the Approximate Likelihood. Find model parameters θ such that the model's stationary distribution of features matches the empirical stationary distribution of features. This is tractable because the model is fully observed and the state update is linear.

  22. Experiments • The Exponential Family PSR • The Linear-Linear Exponential Family PSR • A tractable learning algorithm • Experiments and conclusions

  23. Evaluating with RL. Approximate inference and an approximate likelihood create a problem: how do you evaluate the learned model? Solution: use reinforcement learning. Test domains: Cheese maze and the 4x3 maze.

  24. Example: Bouncing Ball Problem. Noise-free observations: 110 possible observations, 2nd-order Markov. Noisy observations: 2^110 possible observations, no longer 2nd-order Markov.

  25. Example: Bouncing Ball Problem. Each observation is an array of binary random variables. Features of the future: f(F^2 | h_t), built over O_{t+1} and O_{t+2}.

  26. Example: Bouncing Ball Problem. Each observation is an array of binary random variables. Features of the future: f(F^3 | h_t), built over O_{t+1}, O_{t+2}, and O_{t+3}.
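
A small sketch of what "features of the future" over binary observations could look like for these two slides: stack the next n binary frames and, optionally, add pairwise conjunctions between consecutive frames. The exact feature set used in the paper is not specified in this transcript, so this is purely illustrative:

```python
import numpy as np

def binary_future_features(future_frames, pairwise=True):
    """Features of the next-n binary observations (illustrative only).

    future_frames: (n, d) 0/1 array, one row per future frame O_{t+1}, ..., O_{t+n}
    Returns the raw bits plus, optionally, conjunctions between consecutive frames.
    """
    feats = [future_frames.ravel()]                      # every individual bit
    if pairwise:
        for a, b in zip(future_frames[:-1], future_frames[1:]):
            feats.append(np.outer(a, b).ravel())         # bit-pair conjunctions across frames
    return np.concatenate(feats)

# For a 110-pixel binary image and n = 2 future frames:
frames = np.random.randint(0, 2, size=(2, 110))
phi = binary_future_features(frames)
```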

  27. Bouncing Ball Results. Generalizes across observations; learning is efficient.

  28. The Robot Domain. [Diagram: a sequence of camera images O_{t+1}, O_{t+2}, O_{t+3}.] Observations are camera images, from which about 1,000 binary features are extracted. Setup: n = 3, 200,000 samples, NMF inference, a low-rank approximation, and a rank-aware line search. f(F^n | h_t) constructs about 12,000 features.

  29. Robot Domain Results. Outperforms the best 1st-order Markov and random policies, a significant accomplishment for PSRs.

  30. Conclusions. Learning an LL-EFPSR model is straightforward: the expression for ML is defined in terms of observable quantities, the gradient is analytically tractable, the model is compatible with an approximate likelihood, and it has an interpretation based on stationary distributions. Encouraging experimental results: almost perfect models of small systems, and the ability to start tackling domains larger than any other PSR model. Future work: tractability, approximations, convexity, feature extraction.

  31. Thank you!
