
Intelligent Systems: Advanced Adversarial Search



Presentation Transcript


  1. Notes adapted from lecture notes for CMSC 421 by B.J. Dorr Intelligent Systems: Advanced Adversarial Search Stefan Schlobach With slides from Tom Lenaerts and others

  2. Planet wars IS: games

  3. Players Information (imperfect) Game states (perfect) IS: games

  4. Part 1 Recap: Minmax, Heuristics IS: games

  5. Important: No online search yet While we apply MinMax, the environment does NOT change! IS: Advanced Search

  6. Minimax Algorithm

  function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v

  function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do v ← MAX(v, MIN-VALUE(s))
    return v

  function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do v ← MIN(v, MAX-VALUE(s))
    return v

  IS: games
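The slide's pseudocode can be sketched in Python. The tree representation is an assumption of this sketch (a node is either a numeric leaf utility or a list of child nodes), not part of the slides:

```python
def minimax_value(node, is_max):
    """Return the minimax value of a node.

    A node is either a leaf utility (a number) or a list of child
    nodes. Leaves play the role of TERMINAL-TEST/UTILITY from the
    slide's pseudocode.
    """
    if isinstance(node, (int, float)):  # TERMINAL-TEST
        return node                     # UTILITY
    if is_max:
        return max(minimax_value(c, False) for c in node)
    return min(minimax_value(c, True) for c in node)

# The standard textbook example tree: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, True))  # 3
```

MINIMAX-DECISION then simply picks the child whose value equals the root value.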

  7. Utility versus heuristics • Utility: value based on the quality of the state • Wins with 1 innings and three wickets • Player X gets 33 points, player I 69 • Player X wins with 3 points, by 1 point • Heuristics: value based on an estimate of the quality of the state • 2 pawns and a bishop are stronger than a rook. • Playing the trump ace is better than a random jack (disputable) IS: games

  8. Restrict search depth (and estimate quality of nodes) [tree diagram: alternating MAX/MIN levels; leaf values 2 3 5 9 0 7 4 2 1 5 6, backed-up root value 3]

  9. From perfect to imperfect information • Minimax requires too many leaf-node evaluations. • May be impractical within a reasonable amount of time. • SHANNON (1950): • Sacrifice perfect information for performance Interestingly enough, this is the opposite of what we will do with Phase 1 later this week: turn imperfect information into perfect information, and sample over all belief states IS: games

  10. Heuristic EVAL • Idea: produce an estimate of the expected utility of the game from a given position. • Performance depends on quality of EVAL. • Requirements: • EVAL should order terminal-nodes in the same way as UTILITY. • Computation may not take too long. • For non-terminal states the EVAL should be strongly correlated with the actual chance of winning. • Only useful for quiescent (no wild swings in value in near future) states IS: games

  11. Heuristic EVAL example Addition assumes independence Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s) IS: games
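The weighted sum above is just a dot product of feature values and weights. A minimal sketch; the feature names and weight values below are illustrative assumptions, not from the slides:

```python
def linear_eval(state, features, weights):
    """Weighted linear evaluation: sum of w_i * f_i(state).

    Adding the terms assumes the features are independent,
    as the slide notes.
    """
    return sum(w * f(state) for f, w in zip(features, weights))

# Hypothetical chess-like material features.
features = [
    lambda s: s["pawns"],    # f1: pawn count
    lambda s: s["bishops"],  # f2: bishop count
]
weights = [1.0, 3.0]         # illustrative material values w1, w2

state = {"pawns": 2, "bishops": 1}
print(linear_eval(state, features, weights))  # 5.0
```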

  12. Heuristic difficulties: The immortal game (21 June 1851) IS: games

  13. Horizon effect Fixed depth search thinks it can avoid the queening move IS: games

  14. Week 3: Learning Heuristics IS: games

  15. The good news (Schnapsen phase 2) I X Max Min Max Min Max IS: Problem Solving

  16. The bad news 1 (Schnapsen phase 2) I X Max Min Max 5! * 5! = 14,400 6! * 6! = 518,400 Min Max IS: Problem Solving
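The move-sequence counts on this slide can be checked directly. The n!·n! model assumes each player may play any remaining card at each trick, so each hand of n cards contributes n! orderings:

```python
from math import factorial

# With n cards per hand and free choice at every trick, the number
# of complete move sequences is n! * n!.
for n in (5, 6):
    print(n, factorial(n) ** 2)  # 5 -> 14400, 6 -> 518400
```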

  17. The bad news 2 (Schnapsen phase 1) [tree diagram: Max and Min levels branching over hundreds of unknown cards, shown as “?”] IS: Problem Solving

  18. What’s next? • Trees are too big to search systematically: alpha-beta pruning • Imperfect Information Games via Perfect Information Monte-Carlo Sampling IS: Problem Solving

  19. But before that…. (25 minutes refreshment) IS: games

  20. Part 2 Alpha-beta pruning: efficient Minmax IS: games

  21. The taming of the beast (Part 2) IS: games

  22. The bad news 1 (Schnapsen phase 2) I X Max Min Max 5! * 5! = 14,400 6! * 6! = 518,400 Min Max IS: Problem Solving

  23. Problem of minimax search • The number of game states is exponential in the number of moves. • Solution: do not examine every node • ==> Alpha-beta pruning • Alpha = value of the best choice found so far at any choice point along the path for MAX • Beta = value of the best choice found so far at any choice point along the path for MIN • Revisit example … IS: games

  24. Alpha-Beta Example Do DF-search until first leaf Range of possible values [-∞,+∞] [-∞, +∞] IS: games

  25. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3] IS: games

  26. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3] IS: games

  27. Alpha-Beta Example (continued) [3,+∞] [3,3] IS: games

  28. Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2] IS: games

  29. Alpha-Beta Example (continued) [3,14] [3,3] [-∞,2] [-∞,14] IS: games

  30. Alpha-Beta Example (continued) [3,5] [3,3] [−∞,2] [-∞,5] IS: games

  31. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2] IS: games

  32. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2] IS: games

  33. Pause? IS: games

  34. Alpha-Beta Algorithm

  function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) with value v

  function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
      v ← MAX(v, MIN-VALUE(s, α, β))
      if v ≥ β then return v
      α ← MAX(α, v)
    return v

  IS: games

  35. Alpha-Beta Algorithm

  function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
      v ← MIN(v, MAX-VALUE(s, α, β))
      if v ≤ α then return v
      β ← MIN(β, v)
    return v

  IS: games
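The two pseudocode slides can be sketched as one Python function. As before, the nested-list tree representation (numeric leaves, list-of-children internal nodes) is an assumption of the sketch:

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search over nested-list game trees.

    alpha: best value MAX can already guarantee on the path to the root.
    beta:  best value MIN can already guarantee on that path.
    """
    if isinstance(node, (int, float)):  # TERMINAL-TEST / UTILITY
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:        # MIN above would never allow this branch
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        if v <= alpha:           # MAX above would never allow this branch
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3, the same value plain minimax returns
```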

  36. Comments about Alpha-Beta Pruning • Pruning does not affect the final result • Entire subtrees can be pruned. • Good move ordering improves the effectiveness of pruning • With “perfect ordering,” time complexity is O(b^(m/2)) • Alpha-beta pruning can look twice as far ahead as minimax in the same amount of time IS: games
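The effect of move ordering can be made concrete by counting evaluated leaves. This sketch (same assumed nested-list trees as above) searches the same tree with the best branch first and last:

```python
import math

def ab_leaves(node, is_max, alpha=-math.inf, beta=math.inf):
    """Alpha-beta over nested-list trees; returns (value, leaves evaluated)."""
    if isinstance(node, (int, float)):
        return node, 1
    best = -math.inf if is_max else math.inf
    seen = 0
    for child in node:
        v, n = ab_leaves(child, not is_max, alpha, beta)
        seen += n
        if is_max:
            best = max(best, v)
            if best >= beta:
                break                     # MIN would never allow this
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            if best <= alpha:
                break                     # MAX would never allow this
            beta = min(beta, best)
    return best, seen

good = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # best branch searched first
bad = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best branch searched last
print(ab_leaves(good, True))  # (3, 7)
print(ab_leaves(bad, True))   # (3, 9)
```

Both orderings return the same minimax value, but the good ordering evaluates fewer leaves; with perfect ordering this gain compounds to the O(b^(m/2)) bound on the slide.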

  37. More on milestone 1 • We need to implement Phase 2 extremely efficiently (you will see later why). • So, on top of standard MinMax you should also implement alpha-beta pruning. • (and maybe we will not use either) IS: games

  38. Part 3 Search with no or partial information IS: Advanced Search

  39. Search with no or partial information • Partial knowledge of states and actions: • Contingency problem: percepts provide new information about the current state; often interleave search and execution. (If the uncertainty is caused by the actions of another agent, the problem is adversarial.) • Exploration problem: the states and actions of the environment are unknown. • Sensorless (conformant) problem: the agent may have no idea where it is; the solution (if any) is a sequence. IS: Advanced Search

  40. Sensorless problems • Start in {1,2,3,4,5,6,7,8}; e.g. Right goes to {2,4,6,8}. Solution?? • [Right, Suck, Left, Clean] -> 7 • When the world is not fully observable: reason about the set of states that might be reached = belief state IS: Advanced Search

  41. Sensorless problems • Search in the space of belief states • Solution = a belief state whose members are all goal states. • With S states there are 2^S belief states. IS: Advanced Search
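Belief states are just subsets of the physical states, and actions map a belief state to the image of its members. A quick sketch for the 8-state vacuum world; the transition table for Right below is an assumption based on the usual state numbering, not spelled out on the slides:

```python
from itertools import chain, combinations

def belief_states(states):
    """All subsets of the physical states = all possible belief states."""
    s = list(states)
    return list(chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))

vacuum_states = range(1, 9)               # the 8 vacuum-world states
print(len(belief_states(vacuum_states)))  # 2**8 = 256

# Applying an action to a belief state: take the image of its members.
# Assumed transitions for Right (odd states = left square, even = right).
right = {1: 2, 2: 2, 3: 4, 4: 4, 5: 6, 6: 6, 7: 8, 8: 8}
print(sorted({right[s] for s in vacuum_states}))  # [2, 4, 6, 8]
```

This matches the previous slide: from the full belief state {1,…,8}, Right leads to {2,4,6,8}.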

  42. Belief state of vacuum-world IS: Advanced Search

  43. Part 3 Games with partial information: Schnapsen Phase 1 IS: games

  44. The bad news (Schnapsen phase 1) [tree diagram: Max and Min levels branching over hundreds of unknown cards, shown as “?”] IS: Problem Solving

  45. Uncertainty in Schnapsen • There is no chance (once the cards are distributed), just uncertainty • Uncertainty implies an Imperfect Information Game. IS: Problem Solving

  46. Players Information (imperfect) Game states (perfect) IS: games

  47. Will simple MinMax work? IS: games

  48. Belief states (Many of them) IS: games

  49. The full search tree for Schnapsen? (14 over 5) · 4 Schnapsen: a simple game? A simple problem? IS: games

  50. Perfect Information Monte-Carlo Sampling Phase 1 All possible belief spaces MinMax MinMax MinMax Phase 2 IS: games
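The scheme on this slide — sample fully observable worlds consistent with what you know, solve each with minimax, and combine the results — can be sketched in miniature. This toy is not the Schnapsen implementation: the hand-built `worlds` trees stand in for belief-state determinizations, and `pimc_choose` with its averaging rule is an assumption of the sketch:

```python
import random
from statistics import mean

def minimax_value(node, is_max):
    """Plain minimax over nested-list trees (numeric leaves)."""
    if isinstance(node, (int, float)):
        return node
    vals = [minimax_value(c, not is_max) for c in node]
    return max(vals) if is_max else min(vals)

def pimc_choose(worlds, n_samples, rng):
    """Perfect Information Monte-Carlo sampling.

    Repeatedly sample a possible world (a determinization of the
    belief state), evaluate each available move in it with minimax,
    and return the move with the best average value. All worlds are
    assumed to share the same move set.
    """
    n_moves = len(worlds[0])
    scores = [[] for _ in range(n_moves)]
    for _ in range(n_samples):
        world = rng.choice(worlds)  # sample one possible world
        for m in range(n_moves):
            # After our move m, the opponent (MIN) moves next.
            scores[m].append(minimax_value(world[m], False))
    return max(range(n_moves), key=lambda m: mean(scores[m]))

# Two equally likely worlds; move 1 guarantees 5 in both,
# move 0 is worth at most 1 in either world.
worlds = [
    [[1, 9], [5, 6]],  # world A
    [[0, 2], [5, 7]],  # world B
]
print(pimc_choose(worlds, 50, random.Random(0)))  # 1
```

In Schnapsen, "worlds" would be the possible deals of the unseen cards, and each determinization would be solved with the Phase 2 (alpha-beta) search.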
