This project explores the effects of state abstraction and of errors within the Markov Decision Process (MDP) and Partially Observable MDP (POMDP) frameworks, in a maze domain. It evaluates how well lookahead search policies perform in the presence of these errors, including errors in the state transition function and in value function approximations. Through a series of experiments with varied state abstractions and with artificial neural network value approximations, the study aims to identify the conditions under which effective policies can still be generated and potential pitfalls mitigated. The findings inform future research directions in the field.
CMPUT 551
Analyzing abstraction and approximation within an MDP/POMDP environment
Magdalena Jankowska (M.Sc. - Algorithms)
Ilya Levner (M.Sc. - AI/ML)
OUTLINE
• Project Overview
• Analytical Results
• Maze Domain
• Experiments
• Results
• Conclusions and Future Work
MDP environment: Maze domain
• states
• actions
• transitions between states
• immediate rewards
• Markov property
• optimal value V* of each state
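A minimal sketch of such a maze MDP, with V* computed by value iteration, may make these ingredients concrete. The 48x48 grid matches the experimental setup later in the deck, but the corner goal, deterministic moves, reward values, and discount factor are our assumptions; the slides do not specify them.

```python
# A minimal maze-MDP sketch (hypothetical layout: empty 48x48 grid,
# corner goal, deterministic moves, goal reward 1, discount 0.95).
import numpy as np

N = 48                                         # maze is an N x N grid
GOAL = (N - 1, N - 1)                          # goal state (assumed)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
GAMMA = 0.95                                   # discount factor (assumed)

def step(s, a):
    """Deterministic transition: move if inside the maze, else stay put."""
    ns = (s[0] + a[0], s[1] + a[1])
    return ns if 0 <= ns[0] < N and 0 <= ns[1] < N else s

def reward(s):
    """Immediate reward: 1 for reaching the goal, 0 elsewhere (assumed)."""
    return 1.0 if s == GOAL else 0.0

def value_iteration(eps=1e-6):
    """Compute the optimal value V* of each state."""
    V = np.zeros((N, N))
    while True:
        V_new = V.copy()
        for s in np.ndindex(N, N):
            if s == GOAL:
                continue                       # goal is absorbing
            V_new[s] = max(reward(step(s, a)) + GAMMA * V[step(s, a)]
                           for a in ACTIONS)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new
        V = V_new
```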
Project Overview
• Analyze an MDP/POMDP domain in the presence of:
  • state abstraction
  • errors in the state transition function
  • errors in the V* function due to state abstraction and machine learning
• Evaluate the effectiveness of a lookahead search policy in the presence of these errors.
Questions
• When is the problem an MDP?
• If it is not an MDP: can we restore the Markov property?
• Limited lookahead: does it help?
No state abstraction, imperfect value function V
• still an MDP
• V can now be used as a heuristic
• limited lookahead: usually requires an admissible heuristic function
• combining lookahead with learning:
  • Learning Real-Time A* (LRTA*)
  • Real-Time Dynamic Programming (RTDP)
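A sketch of the depth-limited lookahead used throughout, reusing the `step`, `reward`, `ACTIONS`, and `GAMMA` helpers from the MDP sketch above; treating V as a callable state-to-value estimate is our convention, not the slides'.

```python
def lookahead_value(s, depth, V, gamma=GAMMA):
    """Best depth-step return from s, backed up from the estimate V
    at the search frontier."""
    if depth == 0:
        return V(s)                            # heuristic at the frontier
    return max(reward(step(s, a)) +
               gamma * lookahead_value(step(s, a), depth - 1, V, gamma)
               for a in ACTIONS)

def lookahead_policy(s, depth, V, gamma=GAMMA):
    """Greedy action under depth-limited lookahead (depth >= 1)."""
    return max(ACTIONS,
               key=lambda a: reward(step(s, a)) +
                   gamma * lookahead_value(step(s, a), depth - 1, V, gamma))

# With a perfect value function the heuristic is just a table lookup:
# V_fn = lambda s: V_star[s]
```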
“Abstracted” value function
• We know where we are, but the value function is the same for all states within an abstracted state.
“Abstracted” value function
• In a given abstracted state, the value is the average of V* over all states in that abstracted state (see the sketch below)
• this heuristic is not admissible
• lookahead may help the agent get outside the abstraction boundary
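A sketch of this averaged value function under a k x k tiling abstraction; the tiling matches the experimental setup later in the deck, and the helper names are ours.

```python
def abstract_state(s, k):
    """Map a maze cell to the k x k tile that contains it."""
    return (s[0] // k, s[1] // k)

def abstracted_V(V, k):
    """Value of each tile = average of V* over the cells inside it."""
    tiles = {}
    for s in np.ndindex(N, N):
        tiles.setdefault(abstract_state(s, k), []).append(V[s])
    return {t: sum(vs) / len(vs) for t, vs in tiles.items()}
```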
Does lookahead always help?
[Maze figures: goal G, lookahead search at depth 1 and at depth 3]
State abstraction
• not Markovian
• a special case of a POMDP
• transitions between abstracted states, and rewards, depend on the history
• in some special cases it is Markovian
How to restore the Markov property?
• If we know the underlying MDP: update a belief over states, giving a fully observable MDP in belief space
  • solve the belief MDP
  • use V* of the underlying states as a heuristic
  • Real-Time Dynamic Programming in belief space
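A minimal sketch of the belief update, under our reading that the observation is the abstracted state containing the true state; it reuses `step` and `abstract_state` from the earlier sketches, and the deterministic transition model is again an assumption.

```python
def belief_update(b, a, obs_tile, k):
    """Posterior belief over underlying states after taking action a
    and observing the abstracted state obs_tile.
    b is a dict mapping state -> probability."""
    b_new = {}
    for s, p in b.items():
        ns = step(s, a)                        # deterministic model (assumed)
        if abstract_state(ns, k) == obs_tile:  # keep only consistent states
            b_new[ns] = b_new.get(ns, 0.0) + p
    total = sum(b_new.values())
    return {s: p / total for s, p in b_new.items()} if total else b_new
```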
How to restore the Markov property?
• If we do not know the underlying MDP: use the history as part of the state description
• How long a history do we need? In general: the whole history; in special cases: only part of it.
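A sketch of folding a fixed-length history into the state; the horizon H is a free parameter, and as the slide notes, in general the whole history may be needed.

```python
from collections import deque

class HistoryState:
    """Augment the observed (abstracted) state with the last H
    action-observation pairs to approximate the Markov property."""
    def __init__(self, horizon):
        self.h = deque(maxlen=horizon)

    def update(self, action, obs):
        self.h.append((action, obs))

    def key(self):
        return tuple(self.h)        # hashable state for planning/learning
```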
Error in the transition function
• can be crucial
• the agent can easily be trapped in loops
Error in the transition function
Example: no state abstraction, perfect V*
[Figure: a corridor of states 1-10 with the goal G at the right end; two actions, left and right]
• real transitions: deterministic (each action succeeds 100% of the time)
• what we think: the actions are stochastic (35% / 65% mixtures of moving left and right)
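The looping failure mode is easy to reproduce. The exact believed percentages in the figure are ambiguous in this rendering, so the sketch below invents a different mismatch (a single state whose action effects the model gets backwards) purely to show the mechanism.

```python
CORRIDOR = range(1, 11)                  # states 1..10, goal G at 10
V = {s: float(s) for s in CORRIDOR}      # perfect V*: higher = closer to G

def real_step(s, a):
    """True dynamics: deterministic move along the corridor."""
    return min(max(s + a, 1), 10)

def believed_step(s, a):
    """Wrong model: in state 5 the agent believes actions are reversed."""
    return real_step(s, -a if s == 5 else a)

def greedy(s):
    """One-step lookahead under the believed (wrong) model."""
    return max((+1, -1), key=lambda a: V[believed_step(s, a)])

s, trace = 3, []
for _ in range(8):
    trace.append(s)
    s = real_step(s, greedy(s))
print(trace)   # [3, 4, 5, 4, 5, 4, 5, 4]: trapped in a loop, never reaches G
```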
Experimental Setup
• 48x48 cell maze
• 3 experiments:
  • state abstraction
  • machine learning (ANN)
  • state abstraction and machine learning
• Error measurements:
  • relative score (global policy error)
  • distance to goal (sample score error)
State Abstraction Error(s)
• abstraction tile size varied: k = 1, 2, 3, 4, 6, 8, 12, 24, 48
• ply depth 1-7, 10 games per ply depth (see the sweep sketch below)
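A hypothetical sketch of this sweep, reusing the earlier helpers; the episode logic, start state, and step cap are our assumptions (the slides specify only the tile sizes, ply depths, and game counts, and with deterministic dynamics the repeated games would need varied start states, another detail the slides leave open).

```python
TILE_SIZES = [1, 2, 3, 4, 6, 8, 12, 24, 48]

def run_episode(V, depth, start=(0, 0), max_steps=500):
    """Walk greedily under depth-limited lookahead; return steps taken,
    a distance-to-goal style score (assumed scoring)."""
    s = start
    for t in range(max_steps):
        if s == GOAL:
            return t
        s = step(s, lookahead_policy(s, depth, V))
    return max_steps                     # episode failed to reach the goal

V_star = value_iteration()
for k in TILE_SIZES:
    V_abs = abstracted_V(V_star, k)
    V_tilde = lambda s, V_abs=V_abs, k=k: V_abs[abstract_state(s, k)]
    for depth in range(1, 8):            # ply depth 1-7
        scores = [run_episode(V_tilde, depth) for _ in range(10)]
```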
Machine Learning Error
• 2-h-100 ANN: inputs (x, y), output V*(s)
• error varied by changing the number of hidden nodes h in the network (1-20)
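A minimal sketch of this value-function approximation, with scikit-learn's MLPRegressor standing in for the original network; the training hyperparameters are assumptions, since the slides specify only the inputs, output, and hidden-node range.

```python
from sklearn.neural_network import MLPRegressor

V_star = value_iteration()
X = np.array(list(np.ndindex(N, N)), dtype=float)       # inputs (x, y)
y = np.array([V_star[s] for s in np.ndindex(N, N)])     # targets V*(s)

for h in range(1, 21):                   # vary hidden nodes, as in the slide
    net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000, random_state=0)
    net.fit(X, y)
    V_hat = lambda s, net=net: float(net.predict([list(map(float, s))])[0])
    # V_hat can be plugged into lookahead_policy as the (learned) heuristic
```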
Conclusion
Most important results:
• analysis of lookahead with an “abstracted” value function, both analytically and, especially, experimentally
• demonstration of the possible adverse effects of errors in the transition function
• answers to the questions about the Markov property, and an investigation of ways to restore it
Future Work
• Improve policy error evaluation measures
• Further analytical work on lookahead