Trade off between Exploration and Exploitation in Satisficing Planning


Presentation Transcript


  1. Trade off between Exploration and Exploitation in Satisficing Planning Fan Xie

  2. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  3. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  4. AI Planning

  5. Satisficing Planning • Deterministic environment • Only sub-optimal solutions are required • Domain-independent planning • Implicit representation of the search space (why not an explicit representation? Impossible in most cases, because of the huge state space) • A planning task consists of: • An initial state: s0 • A set of actions: A • A set of requirements on a goal state: G
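
To make the implicit representation concrete, here is a minimal, illustrative sketch of such a planning task in Python: states are sets of propositions, actions carry preconditions and effects, and the goal is a set of requirements rather than an explicit state. All names here are made up for illustration, not taken from any specific planner.

```python
# Minimal sketch of an implicitly represented planning task.
# States are frozensets of propositions; successors are computed
# on demand, never enumerated as an explicit graph.
from collections import namedtuple

Action = namedtuple("Action", ["name", "preconditions", "add", "delete"])

def applicable(state, action):
    """An action is applicable if all its preconditions hold in the state."""
    return action.preconditions <= state

def apply_action(state, action):
    """Successor state: delete effects removed, add effects inserted."""
    return (state - action.delete) | action.add

def is_goal(state, goal):
    """The goal G is a set of requirements, not a single explicit state."""
    return goal <= state

# Tiny example task: move a package from A to B.
s0 = frozenset({"pkg-at-A", "truck-at-A"})
goal = frozenset({"pkg-at-B"})
actions = [
    Action("load",   frozenset({"pkg-at-A", "truck-at-A"}),
           frozenset({"pkg-in-truck"}), frozenset({"pkg-at-A"})),
    Action("drive",  frozenset({"truck-at-A"}),
           frozenset({"truck-at-B"}), frozenset({"truck-at-A"})),
    Action("unload", frozenset({"pkg-in-truck", "truck-at-B"}),
           frozenset({"pkg-at-B"}), frozenset({"pkg-in-truck"})),
]
```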

  6. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  7. Some Background • What is a heuristic? • Here, it estimates how close a node is to a goal state • Greedy Best-First Search: • When expanding node n, take each successor n' and place it on one list ordered by h(n') • Hill Climbing Search: • Check the neighbor nodes of the current node and select one with a lower h-value than the current node (if several, the lowest) • Terminates when no neighbor has a lower h-value
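
A minimal sketch of the two schemes above, assuming placeholder helpers `successors(s)` and `h(s)`; this is illustrative, not any planner's actual implementation.

```python
# Sketches of Greedy Best-First Search and plain Hill Climbing,
# ordered by heuristic value only (g-cost is ignored in both).
import heapq
import itertools

def greedy_best_first_search(s0, is_goal, successors, h):
    """Expand nodes in order of h(n) only."""
    counter = itertools.count()          # tie-breaker for equal h-values
    open_list = [(h(s0), next(counter), s0)]
    closed = {s0}
    while open_list:
        _, _, s = heapq.heappop(open_list)
        if is_goal(s):
            return s
        for s2 in successors(s):
            if s2 not in closed:
                closed.add(s2)
                heapq.heappush(open_list, (h(s2), next(counter), s2))
    return None

def hill_climbing(s0, is_goal, successors, h):
    """Always move to the lowest-h neighbor; stop when none improves."""
    s = s0
    while not is_goal(s):
        best = min(successors(s), key=h, default=None)
        if best is None or h(best) >= h(s):
            return None                  # local minimum or plateau: stuck
        s = best
    return s
```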

  8. Heuristic Search As Planning • FF Planner • Hill climbing • FF heuristic: not admissible • Enforced Hill Climbing: more exploration in hill climbing to escape from local minima • LAMA Planner • Greedy Best-First Search (WA*) • Mixed heuristic: FF + Landmark
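
Enforced Hill Climbing, as used in FF, escapes a stuck hill climb by breadth-first searching from the current state until it finds any state with a strictly better h-value. A rough sketch, again with the assumed helpers from above:

```python
# Sketch of Enforced Hill-Climbing: when no neighbor improves,
# BFS outward from the current state for ANY strictly better state,
# then commit to it and continue.
from collections import deque

def enforced_hill_climbing(s0, is_goal, successors, h):
    s = s0
    while not is_goal(s):
        target_h = h(s)
        frontier, seen = deque([s]), {s}
        improved = None
        while frontier and improved is None:
            for s2 in successors(frontier.popleft()):
                if s2 in seen:
                    continue
                seen.add(s2)
                if h(s2) < target_h:
                    improved = s2        # escape found; jump there
                    break
                frontier.append(s2)
        if improved is None:
            return None                  # reachable space exhausted: failure
        s = improved
    return s
```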

  9. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  10. Why do we need exploration? • Best-First Search and Hill Climbing mostly perform greedy exploitation. • Problem: local minima and plateaus

  11. Local Minima and Plateaus • Local minimum: a node whose h-value is the best in its local neighborhood, but which is not a goal • Plateau: an area in which all nodes have the same h-value

  12. More Exploration • Several current algorithms and planners directly address the tradeoff between exploration and exploitation: • RRT (not for satisficing planning) • Identidem (stochastic hill climbing) • Diverse Best-First Search (not published yet) • Arvand (Monte-Carlo random walks)

  13. Rapidly-Exploring Random Tree (RRT) • RRT gradually builds a tree in the search space until a path to the goal state is found. At each step, the tree is either expanded towards the goal, which corresponds to exploitation, or towards a randomly selected point in the search space, for exploration.
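
A schematic RRT in a continuous 2D space (RRT is typically used with an explicit space model; the bounds, step size, and goal bias here are illustrative, not from the talk).

```python
# Schematic RRT: grow a tree by repeatedly extending the nearest node
# one step toward either the goal (exploitation) or a random point
# (exploration), until a node lands close enough to the goal.
import math
import random

def rrt(start, goal, goal_bias=0.1, step=0.5, max_iters=10000, tol=0.5):
    tree = {start: None}                       # node -> parent
    for _ in range(max_iters):
        # Exploit with probability goal_bias, otherwise explore.
        target = goal if random.random() < goal_bias else \
                 (random.uniform(0, 10), random.uniform(0, 10))
        near = min(tree, key=lambda p: math.dist(p, target))
        d = math.dist(near, target)
        if d == 0:
            continue
        new = (near[0] + step * (target[0] - near[0]) / d,
               near[1] + step * (target[1] - near[1]) / d)
        tree[new] = near
        if math.dist(new, goal) <= tol:        # close enough: extract path
            path = [new]
            while tree[path[-1]] is not None:
                path.append(tree[path[-1]])
            return path[::-1]
    return None
```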

  14–17. RRT example (figure slides)

  18. RRT • RRT requires a complete model of the environment to generate random points for exploration. • However, current planning domains mostly provide only an implicit representation of the search space. • Random points might be invalid (one possible workaround is to assume they are valid). • The distribution of random points is not uniform.

  19. Identidem • Coles and Smith's Identidem introduces exploration through stochastic local search (SLS). • Algorithm: • Local search • Action sequences are chosen probabilistically from the set of all applicable actions in each state • The FF heuristic is evaluated after each action, and the search immediately jumps to the first state that improves on the start state
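
A hedged sketch of one Identidem-style probe under the description above: actions are chosen at random, h is evaluated after every action, and the probe jumps as soon as a state improves on its start. Helper names and the depth bound are assumptions.

```python
# One stochastic local-search probe in the Identidem style:
# note that h is evaluated after EVERY action taken.
import random

def identidem_style_probe(s, applicable_actions, apply_action, h,
                          max_depth=10):
    """Return the first improving state found, or None if the probe fails."""
    h_start, current = h(s), s
    for _ in range(max_depth):
        acts = applicable_actions(current)
        if not acts:
            return None                       # dead end
        current = apply_action(current, random.choice(acts))
        if h(current) < h_start:              # improvement over probe start
            return current                    # jump immediately
    return None
```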

  20. Diverse Best-First Search (DBFS) • Diversifies search directions by probabilistically selecting a node that does not have the best heuristic estimate (not published yet)

  21. Arvand • Exploration using random walks helps to overcome the problem of local minima and plateaus. • Jumping greedily exploits the knowledge gained by the random walks. • Difference from Identidem: only the end states of random walks are evaluated
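
By contrast with the Identidem probe above, here is a rough sketch of one Arvand-style step: several random walks from the current state, a single heuristic evaluation at each walk's end state, then a greedy jump to the best endpoint. Walk counts and lengths are illustrative.

```python
# One Monte-Carlo random-walk step in the Arvand style:
# h is evaluated ONLY at each walk's end state, unlike Identidem.
import random

def mrw_step(s, applicable_actions, apply_action, h,
             num_walks=10, walk_length=10):
    """Run several walks from s and return the best endpoint to jump to."""
    best_state, best_h = None, float("inf")
    for _ in range(num_walks):
        current = s
        for _ in range(walk_length):
            acts = applicable_actions(current)
            if not acts:
                break                          # dead end: abandon this walk
            current = apply_action(current, random.choice(acts))
        h_end = h(current)                     # single evaluation per walk
        if h_end < best_h:
            best_state, best_h = current, h_end
    return best_state                          # greedy jump target
```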

  22. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  23. Analysis of Arvand • Fast exploration: • Exploration using random walks • Evaluating only end states makes exploration faster (computing heuristic values takes 90% of the time) • Greedy exploitation: • Jump to the best node obtained

  24. Advantages of Arvand • Escapes from local minima and plateaus quickly

  25. Coverage of Arvand (current IPC problems are not hard enough)

  26. Still some problems • Problems: • A lot of the knowledge from random walks is wasted • Sometimes a lot of duplication

  27. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  28. Arvand-LTS: Arvand with Local MCTS • Motivation: • Can we use more of the knowledge we get from random walks? • Selectively grow a search tree while running random walks

  29. Monte-Carlo Random Walk-based Local Tree Search (MRW-LTS)

  30. Framework of MCTS

  31. MRW-LTS • Every local search builds a local search tree • Random walks are run starting from leaf nodes of the search tree • Nodes in the tree store the minimum h-value obtained by random walks starting from their subtrees (not the node's own h-value) • A leaf node is selected by following an ε-greedy strategy at each node
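
A hedged sketch of the leaf selection and backup just described: each tree node caches the minimum h-value found by walks below it, leaves are reached by ε-greedy descent, and a walk's end-state value is propagated back to the root. The `Node` class and ε are illustrative.

```python
# Sketch of MRW-LTS bookkeeping: epsilon-greedy descent to a leaf,
# then backup of the minimum h-value seen by walks in each subtree.
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.min_h = float("inf")     # best h from walks in this subtree

def select_leaf(root, epsilon=0.2):
    """Descend from the root, exploring a random child with prob. epsilon."""
    node = root
    while node.children:
        if random.random() < epsilon:                 # explore
            node = random.choice(node.children)
        else:                                         # exploit best subtree
            node = min(node.children, key=lambda c: c.min_h)
    return node

def backup(node, h_value):
    """Propagate a walk's end-state h-value from the leaf up to the root."""
    while node is not None:
        node.min_h = min(node.min_h, h_value)
        node = node.parent
```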

  32. Some Changes

  33. Outline • What is Satisficing Planning • Heuristic Search in Planning • Why do we need exploration? • Analysis of Arvand • Arvand-LTS: Arvand with Local MCTS • Experiments

  34. Experiments • 1. IPC-2008 • 2. Big search spaces

  35. Coverage on IPC-6

  36. Coverage

  37. Summary • 1. Exploration is important in satisficing planning • 2. A good balance between exploration and exploitation can make a big difference!
