Predictive State Representations
Predictive State Representations
Hui Li
July 7, 2006
Outline
• What are the advantages of predictive state representations (PSRs)?
• What is a predictive state representation?
• How to learn a PSR model
• Conclusions
What are the advantages of PSRs
• PSRs are expressed entirely in terms of observable quantities (actions and observations).
• Learning a PSR model can avoid the local minima and saddle points that arise when learning a POMDP model.
• PSRs are at least as general and at least as compact as POMDPs.
What are predictive state representations (1/9)
Two notions in PSR
• History (h)
  • A history is the sequence of action-observation (ao) pairs that the agent has already experienced, beginning at the first time step.
• Test (t)
  • A test is a sequence of ao pairs that begins immediately after a history.
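What are predictive state representations (2/9)
[Figure: a history h = a1 o1 a2 o2 a3 o3 … aj oj, followed by a test t = a1 o1 a2 o2 … ak ok]
Prediction of a test: $p(t \mid h) = \Pr(o_1, o_2, \ldots, o_k \mid h, a_1, a_2, \ldots, a_k)$, i.e., the probability of observing o1, …, ok in sequence when the actions a1, …, ak are taken after history h.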
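To make histories, tests and p(t|h) concrete, here is a minimal sketch. It assumes a hypothetical 2-state, 2-action, 2-observation POMDP (the matrices `T`, `O` and the initial belief `b0` are invented for illustration), represents histories and tests as tuples of `(action, observation)` pairs, and computes p(t|h) as p(ht)/p(h), with both probabilities taken from the null history and the actions executed open-loop. The helper names `seq_prob` and `predict` are hypothetical.

```python
import numpy as np

# Hypothetical toy POMDP, invented for illustration only.
# T[a][s, s2] = Pr(next state s2 | state s, action a)
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
# O[a][s2, o] = Pr(observation o | next state s2, action a)
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])  # belief at the null history


def seq_prob(b, seq):
    """Probability of observing o1..ok in seq = ((a1, o1), ..., (ak, ok))
    when a1..ak are executed, starting from belief b."""
    v = np.asarray(b, dtype=float)
    for a, o in seq:
        v = (v @ T[a]) * O[a][:, o]  # predict next state, weight by Pr(o | s2, a)
    return float(v.sum())


def predict(t, h):
    """p(t | h) = p(ht) / p(h); assumes p(h) > 0."""
    return seq_prob(b0, tuple(h) + tuple(t)) / seq_prob(b0, h)


h = ((0, 1),)          # history: took action 0, saw observation 1
t = ((1, 0), (0, 1))   # test: take 1 and see 0, then take 0 and see 1
print(predict(t, h))
```

Because each row of `T[a]` and of `O[a]` sums to one, the one-step predictions for a fixed action sum to 1, and the empty test is always predicted with probability 1.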
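What are predictive state representations (3/9)
System-dynamics matrix D: each row corresponds to a history h, each column to a test t, and the entry in row h and column t is the prediction p(t|h). D is infinite, with one row for every possible history and one column for every possible test.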
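What are predictive state representations (4/9)
Order of all possible tests in D: tests are ordered by increasing length, and tests of equal length are ordered lexicographically; the rows h_i are ordered in the same way, starting with the null history.
Properties of the predictions in each row h_i of D:
• $0 \le p(t \mid h_i) \le 1$ for every test t
• $\sum_{o \in O} p(tao \mid h_i) = p(t \mid h_i)$ for every test t and every action a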
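A finite corner of D can be built explicitly to check the ordering and the row properties. This sketch assumes a hypothetical toy POMDP (`T`, `O`, `b0` invented for illustration); `ordered_sequences` and `dynamics_submatrix` are hypothetical helper names. Sequences are generated by length and then lexicographically, which `itertools.product` provides within each length.

```python
import itertools
import numpy as np

# Hypothetical toy POMDP (illustration only): T[a][s, s2], O[a][s2, o].
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])
ACTIONS, OBS = (0, 1), (0, 1)


def seq_prob(seq):
    """Probability of the observations in seq given its actions, from the null history."""
    v = b0
    for a, o in seq:
        v = (v @ T[a]) * O[a][:, o]
    return float(v.sum())


def ordered_sequences(max_len):
    """All ao-sequences up to max_len: by length, then lexicographically."""
    pairs = [(a, o) for a in ACTIONS for o in OBS]
    seqs = []
    for n in range(max_len + 1):
        seqs.extend(itertools.product(pairs, repeat=n))
    return seqs


def dynamics_submatrix(max_h, max_t):
    """Finite corner of D: rows = histories, columns = tests, entry p(t|h) = p(ht)/p(h)."""
    hists = [h for h in ordered_sequences(max_h) if seq_prob(h) > 0]
    tests = ordered_sequences(max_t)
    D = np.array([[seq_prob(h + t) / seq_prob(h) for t in tests] for h in hists])
    return hists, tests, D


hists, tests, D = dynamics_submatrix(max_h=1, max_t=2)
print(D.shape)  # (5, 21)
```

The first column belongs to the null test, so it is all ones; summing the columns of `tao` over o recovers the column of `t`, which is the second row property.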
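What are predictive state representations (5/9)
Relation between PSR and POMDP
• The POMDP belief state is updated according to Bayes' rule: after taking action a and observing o, $b^{ao}(s') = \frac{\Pr(o \mid s', a) \sum_{s} b(s) \Pr(s' \mid s, a)}{\Pr(o \mid a, b)}$
• Constructing D from a POMDP: with $T^{a}$ the transition matrix of action a and $O^{a,o}$ the diagonal matrix of $\Pr(o \mid s', a)$, each entry is $p(t \mid h) = b_h^{\top} T^{a_1} O^{a_1,o_1} \cdots T^{a_k} O^{a_k,o_k} \mathbf{1}$, where $b_h$ is the belief state after history h.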
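The Bayes-rule belief update and the construction of a D entry from a POMDP can be sketched as follows. The POMDP (`T`, `O`, `b0`) is hypothetical and invented for illustration, as are the helper names `belief_update`, `belief_after`, `outcome_prob` and `D_entry`. The sketch filters the belief through the history and then multiplies through the test's transition and observation matrices.

```python
import numpy as np

# Hypothetical toy POMDP (illustration only): T[a][s, s2], O[a][s2, o].
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])


def belief_update(b, a, o):
    """Bayes rule: b'(s2) is proportional to Pr(o | s2, a) * sum_s b(s) Pr(s2 | s, a)."""
    unnormalized = (b @ T[a]) * O[a][:, o]
    return unnormalized / unnormalized.sum()  # the sum equals Pr(o | a, b)


def belief_after(h):
    """Belief state b_h reached by filtering b0 through the history h."""
    b = b0
    for a, o in h:
        b = belief_update(b, a, o)
    return b


def outcome_prob(b, t):
    """b^T T^{a1} O^{a1,o1} ... T^{ak} O^{ak,ok} 1, computed left to right."""
    v = b
    for a, o in t:
        v = (v @ T[a]) * O[a][:, o]
    return float(v.sum())


def D_entry(h, t):
    """Entry of D in row h, column t: p(t | h)."""
    return outcome_prob(belief_after(h), t)


print(D_entry(((0, 0), (1, 1)), ((0, 1),)))
```

Since `outcome_prob` is linear in the belief, filtering first gives the same value as the ratio p(ht)/p(h).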
What are predictive state representations (7/9)
If the rank of D is k, then D has k linearly independent columns and k linearly independent rows, and no larger linearly independent set of either.
• Core tests Q_T
  • The tests corresponding to k linearly independent columns are called core tests.
• Core histories Q_H
  • The histories corresponding to k linearly independent rows are called core histories.
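One simple way to pick out linearly independent columns and rows is a greedy rank test, sketched below with NumPy's `matrix_rank`. This is an illustrative technique, not necessarily the one used in the original work. The function names and the small rank-2 matrix are invented for illustration, and the tolerance `tol` is an assumption: estimated matrices would need a much larger one.

```python
import numpy as np


def independent_columns(M, tol=1e-9):
    """Scan columns left to right, keeping each one that increases the rank."""
    keep = []
    for j in range(M.shape[1]):
        if np.linalg.matrix_rank(M[:, keep + [j]], tol=tol) == len(keep) + 1:
            keep.append(j)
    return keep


def core_indices(D, tol=1e-9):
    """Indices of linearly independent rows (histories) and columns (tests)."""
    return independent_columns(D.T, tol), independent_columns(D, tol)


# Invented rank-2 example: row 2 averages rows 0 and 1,
# column 2 = column 0 - column 1, column 3 = 0.5 * column 1.
D = np.array([[1.0, 0.5, 0.5, 0.25],
              [1.0, 0.2, 0.8, 0.10],
              [1.0, 0.35, 0.65, 0.175]])
rows, cols = core_indices(D)
print(rows, cols)  # [0, 1] [0, 1]
```

Scanning in the order of D (by length, then lexicographically) makes the greedy choice favour short tests and histories.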
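What are predictive state representations (9/9)
Linear PSR model
Definition: D(Q), the columns of D for the core tests Q, is a linear sufficient statistic of the histories, since every column of D is a linear combination of the columns in D(Q). That is, for every test t there is a weight vector $m_t$ such that $p(t \mid h) = p(Q \mid h)^{\top} m_t$ for every history h.
PSR state update: after taking action a and observing o, each core-test prediction is updated as
$p(q_i \mid hao) = \frac{p(aoq_i \mid h)}{p(ao \mid h)} = \frac{p(Q \mid h)^{\top} m_{aoq_i}}{p(Q \mid h)^{\top} m_{ao}}$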
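For a known POMDP, the weight vectors can be solved for directly, which makes the linear update easy to check. This sketch assumes a hypothetical toy POMDP (`T`, `O`, `b0` invented) and hand-picks the null test and the one-step test (a=0, o=0) as core tests `Q`. For a 2-state model whose matrix `U` is invertible this choice works, but it is an assumption of the sketch. Here the weights come from the model rather than from learning; `outcome_vector`, `m` and `psr_update` are hypothetical names.

```python
import numpy as np

# Hypothetical toy POMDP (illustration only): T[a][s, s2], O[a][s2, o].
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])


def outcome_vector(test):
    """u_t[s] = Pr(test t succeeds | current state s), so p(t|h) = b_h @ u_t."""
    u = np.ones(2)
    for a, o in reversed(test):
        u = T[a] @ (O[a][:, o] * u)
    return u


# Assumed core tests for this toy model: the null test and the one-step test (a=0, o=0).
Q = [(), ((0, 0),)]
U = np.column_stack([outcome_vector(q) for q in Q])  # p(Q|h) = b_h @ U


def m(test):
    """Weights with p(t|h) = p(Q|h) @ m_t, solved from U @ m_t = u_t (not learned)."""
    return np.linalg.solve(U, outcome_vector(test))


def psr_update(pq, a, o):
    """p(q_i | hao) = (p(Q|h) @ m_{ao q_i}) / (p(Q|h) @ m_{ao})."""
    ao = ((a, o),)
    M_ao = np.column_stack([m(ao + q) for q in Q])
    return (pq @ M_ao) / (pq @ m(ao))


pq = b0 @ U                # PSR state at the null history
pq = psr_update(pq, 1, 0)  # after taking action 1 and observing 0
print(pq)
```

The updated state agrees with the core-test predictions computed from the Bayes-filtered belief, and its first entry (the null test) stays at 1.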
How to learn PSR model (1/6)
Two subproblems in learning a PSR model
• Discovery: find the core tests Q_T, whose predictions constitute the state (a sufficient statistic).
• Learning: learn the parameters m_aot that define the system dynamics.
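How to learn PSR model (2/6)
The tests and histories corresponding to a set of linearly independent columns and rows of any submatrix of D are subsets of some set of core tests and core histories, respectively. Core tests and core histories can therefore be searched for in a finite, small submatrix instead of the infinite matrix D.
[Figure: the infinite matrix D versus a finite, small submatrix]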
How to learn PSR model (3/6)
Analytical Discovery and Learning algorithm (ADL)
• Assumption: the exact matrix D is available.
• Analytical discovery algorithm (AD)
• Analytical learning algorithm (AL)
How to learn PSR model (4/6)
1. Analytical discovery algorithm (AD)
[Figure: start from all tests and all histories up to length 1; find the linearly independent tests T1 and histories H1; extend them by one step; repeat until convergence]
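The discovery loop can be sketched as follows, under stated assumptions. Exact D entries come from a hypothetical toy POMDP (`T`, `O`, `b0` invented). Linear independence is checked with a greedy rank test. The "extend one step" rule used here (prefix each independent test with every ao pair, append every ao pair to each independent history) and the stopping rule (stop when the rank no longer grows) are simplifications chosen for illustration, not a faithful reproduction of AD. All function names are hypothetical.

```python
import numpy as np

# Hypothetical toy POMDP (illustration only): T[a][s, s2], O[a][s2, o].
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])
PAIRS = [(a, o) for a in (0, 1) for o in (0, 1)]


def seq_prob(seq):
    v = b0
    for a, o in seq:
        v = (v @ T[a]) * O[a][:, o]
    return float(v.sum())


def submatrix(hists, tests):
    """Exact entries p(t|h); ADL assumes the exact D is available."""
    return np.array([[seq_prob(h + t) / seq_prob(h) for t in tests] for h in hists])


def independent_columns(M, tol=1e-9):
    keep = []
    for j in range(M.shape[1]):
        if np.linalg.matrix_rank(M[:, keep + [j]], tol=tol) == len(keep) + 1:
            keep.append(j)
    return keep


def discover(max_iter=10):
    """Grow tests and histories one step at a time until the rank stops increasing."""
    one_step = [(p,) for p in PAIRS]
    tests, hists = list(one_step), [()] + one_step  # all tests / histories up to length 1
    prev_rank, core_h, core_t = -1, [], []
    for _ in range(max_iter):
        M = submatrix(hists, tests)
        rank = np.linalg.matrix_rank(M, tol=1e-9)
        if rank == prev_rank:  # converged
            break
        prev_rank = rank
        core_t = [tests[j] for j in independent_columns(M)]
        core_h = [hists[i] for i in independent_columns(M.T)]
        # Extend one step, dropping duplicates while keeping order.
        tests = list(dict.fromkeys(one_step + [(p,) + t for p in PAIRS for t in core_t]))
        hists = list(dict.fromkeys([()] + one_step + [h + (p,) for h in core_h for p in PAIRS]))
    return core_h, core_t


core_h, core_t = discover()
print(core_h, core_t)
```

For this 2-state model the rank is already 2 after the first pass, so the loop stops on the second pass.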
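How to learn PSR model (5/6)
2. Analytical learning algorithm (AL)
Since $p(t \mid h) = p(Q \mid h)^{\top} m_t$ holds for every history, it holds in particular for each core history in Q_H; stacking these rows gives $D(Q_H, t) = D(Q_H, Q)\, m_t$.
Then, because $D(Q_H, Q)$ is a square, invertible k×k matrix, $m_{aoq_i} = D(Q_H, Q)^{-1} D(Q_H, aoq_i)$ and $m_{ao} = D(Q_H, Q)^{-1} D(Q_H, ao)$.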
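The learning step reduces to one small linear solve per parameter vector. This sketch assumes exact D entries supplied by a hypothetical toy POMDP (`T`, `O`, `b0` invented). The core tests `Q` and core histories `Q_H` were picked by hand for this model and stand in for the output of discovery. `np.linalg.solve` is used rather than forming the inverse explicitly; `p` and `learn_m` are hypothetical names.

```python
import numpy as np

# Hypothetical toy POMDP (illustration only), standing in for the exact D that ADL assumes.
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])


def seq_prob(seq):
    v = b0
    for a, o in seq:
        v = (v @ T[a]) * O[a][:, o]
    return float(v.sum())


def p(t, h):
    """Exact D entry p(t|h) = p(ht) / p(h)."""
    return seq_prob(h + t) / seq_prob(h)


# Assumed output of discovery for this toy model.
Q = [((0, 0),), ((0, 1),)]   # core tests
Q_H = [(), ((0, 0),)]        # core histories
D_QQ = np.array([[p(q, h) for q in Q] for h in Q_H])  # D(Q_H, Q), k x k


def learn_m(t):
    """m_t = D(Q_H, Q)^{-1} D(Q_H, t), via a linear solve."""
    return np.linalg.solve(D_QQ, np.array([p(t, h) for h in Q_H]))


# Parameters for one (a, o) pair: m_ao and M_ao = [m_{ao q_1}, ..., m_{ao q_k}].
ao = ((1, 1),)
m_ao = learn_m(ao)
M_ao = np.column_stack([learn_m(ao + q) for q in Q])
print(m_ao, M_ao)
```

The learned parameters predict correctly from histories that were not used to fit them, which is what the linear PSR definition promises.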
How to learn PSR model (6/6)
Estimating the system-dynamics matrix D
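When the exact D is not available, its entries can be estimated from experience. The sketch below is a simple Monte Carlo estimator. It assumes the system can be reset to its start state and that the actions of h and t are executed open-loop. The environment is a hypothetical toy POMDP (`T`, `O`, `b0` invented), and `run_episode` and `estimate_entry` are hypothetical names. The estimate of p(t|h) is the fraction of episodes in which h was observed that also went on to observe t.

```python
import numpy as np

# Hypothetical toy POMDP (illustration only): 2 states, 2 observations.
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # T[a][s, s2]
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # O[a][s2, o]
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}
b0 = np.array([0.5, 0.5])


def run_episode(actions, rng):
    """Reset, then execute the action sequence open-loop; return the observations."""
    s = int(rng.random() < b0[1])                    # sample the start state
    obs = []
    for a in actions:
        s = int(rng.random() < T[a][s, 1])           # sample the next state
        obs.append(int(rng.random() < O[a][s, 1]))   # sample the observation
    return obs


def estimate_entry(h, t, n_episodes=20000, seed=0):
    """Monte Carlo estimate of p(t|h) = #(h then t observed) / #(h observed)."""
    rng = np.random.default_rng(seed)
    actions = [a for a, _ in h] + [a for a, _ in t]
    want_h, want_t = [o for _, o in h], [o for _, o in t]
    n_h = n_ht = 0
    for _ in range(n_episodes):
        obs = run_episode(actions, rng)
        if obs[:len(h)] == want_h:
            n_h += 1
            if obs[len(h):] == want_t:
                n_ht += 1
    return n_ht / n_h if n_h else float("nan")


print(estimate_entry(((0, 0),), ((1, 1),)))
```

Rows for long or unlikely histories receive few samples, so their estimates are noisy. Any rank test on an estimated D therefore needs a tolerance well above floating-point precision.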
Conclusions
• Predictive state representations (PSRs), a new representation of dynamical systems grounded in actions and observations, were introduced.
• An algorithm for learning PSR models, analytical discovery and learning (ADL), was introduced.