
AI in game (IV)






Presentation Transcript


  1. AI in game (IV) Oct. 11, 2006

  2. So far • Artificial Intelligence: A Modern Approach • Stuart Russell and Peter Norvig • Prentice Hall, 2nd ed. • Chapter 1: AI taxonomy • Chapter 2: agents • Chapter 3: uninformed search • Chapter 4: informed search

  3. From now on • Artificial Intelligence: A Modern Approach • Chapter 4: online search (the remainder of informed search) • Chapter 6: adversarial search • Network part • Learning (maybe from the same textbook) • Game AI techniques

  4. Outline • Ch 4. informed search • Online search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions

  5. Offline search vs. online search • Offline search agents • Compute a complete solution before setting foot in the real world • Online search agents • Interleave computation and action • E.g., take an action, then observe the environment, then compute the next action • Necessary for an exploration problem • States and actions are unknown in advance • E.g., a robot in a new building, or a labyrinth

  6. online search problems • Agents are assumed to know only • Actions(s): returns a list of the actions allowed in state s • c(s,a,s’): this step cost cannot be used until the agent knows that s’ is the outcome • Goal-test(s) • The agent cannot access the successors of a state except by actually trying all the actions in that state • Assumptions • The agent can recognize a state that it has visited before • Actions are deterministic • Optionally, an admissible heuristic function
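A minimal sketch of what an online search agent is allowed to query, written as a Python interface; the class and method names are illustrative assumptions for the later examples, not the textbook's code.

class OnlineSearchProblem:
    def actions(self, s):
        """Actions(s): the list of actions allowed in state s."""
        raise NotImplementedError

    def step_cost(self, s, a, s_prime):
        """c(s, a, s'): usable only after the agent has observed that s' is the outcome of a."""
        raise NotImplementedError

    def goal_test(self, s):
        """Goal-test(s): is s a goal state?"""
        raise NotImplementedError

    def h(self, s):
        """Optional admissible heuristic estimate of the cost from s to a goal."""
        raise NotImplementedError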

  7. online search problems • If some actions are irreversible, the agent may reach a dead end • If some goal state is reachable from every reachable state, the state space is “safely explorable”

  8. online search agents • An online algorithm can expand only a node that it physically occupies • Offline algorithms can expand any node in the fringe • So the online agent expands nodes in a local order, following the same principle as DFS

  9. Online DFS
  function ONLINE-DFS-AGENT(s’) returns an action
    inputs: s’, a percept identifying the current state
    static: result, a table of the next state, indexed by action and state, initially empty
            unexplored, a table that lists, for each visited state, the actions not yet tried
            unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
            s, a, the previous state and action, initially null
    if GOAL-TEST(s’) then return stop
    if s’ is a new state then unexplored[s’] ← ACTIONS(s’)
    if s is not null then do
        result[a, s] ← s’
        add s to the front of unbacktracked[s’]
    if unexplored[s’] is empty then
        if unbacktracked[s’] is empty then return stop
        else a ← an action b such that result[b, s’] = POP(unbacktracked[s’])
    else a ← POP(unexplored[s’])
    s ← s’
    return a
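A runnable sketch of the agent above, assuming a small environment interface (actions(s), goal_test(s)) and that the caller executes each returned action and passes back the observed state on the next call; all names are illustrative, not the textbook's code.

class OnlineDFSAgent:
    def __init__(self, actions, goal_test):
        self.actions = actions          # actions(s) -> list of legal actions in s
        self.goal_test = goal_test      # goal_test(s) -> bool
        self.result = {}                # (s, a) -> observed successor state
        self.unexplored = {}            # s -> actions not yet tried in s
        self.unbacktracked = {}         # s -> predecessor states not yet backtracked to
        self.s = None                   # previous state
        self.a = None                   # previous action

    def __call__(self, s_prime):
        # Given the current percept (state), return the next action, or None to stop.
        if self.goal_test(s_prime):
            return None
        if s_prime not in self.unexplored:                  # s' is a new state
            self.unexplored[s_prime] = list(self.actions(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
        if not self.unexplored[s_prime]:
            if not self.unbacktracked.get(s_prime):
                return None                                 # dead end: nowhere left to go
            back_to = self.unbacktracked[s_prime].pop(0)
            # Pick an action already known (from experience) to lead back to that predecessor.
            self.a = next(b for b in self.actions(s_prime)
                          if self.result.get((s_prime, b)) == back_to)
        else:
            self.a = self.unexplored[s_prime].pop()
        self.s = s_prime
        return self.a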

  10. Online DFS, example • Assume a maze problem on a 3x3 grid • s’ = (1,1) is the initial state • result, unexplored (UX), unbacktracked (UB), … are empty • s, a are also empty (maze figure: agent at s’ = (1,1))

  11. Online DFS, example • GOAL-TEST((1,1))? • s’ != G, thus false • (1,1) a new state? • True • ACTIONS((1,1)) → UX[(1,1)] • {RIGHT,UP} • s is null? • True (initially) • UX[(1,1)] empty? • False • POP(UX[(1,1)]) → a • a=UP • s = (1,1) • Return a (maze figure: s’ = (1,1))

  12. Online DFS, example • GOAL-TEST((1,2))? • s’ != G, thus false • (1,2) a new state? • True • ACTIONS((1,2)) → UX[(1,2)] • {DOWN} • s is null? • False (s=(1,1)) • result[UP,(1,1)] ← (1,2) • UB[(1,2)]={(1,1)} • UX[(1,2)] empty? • False • a=DOWN, s=(1,2) • return a (maze figure: s’ = (1,2), s = (1,1))

  13. Online DFS, example • GOAL-TEST((1,1))? • s’ != G, thus false • (1,1) a new state? • False • s is null? • False (s=(1,2)) • result[DOWN,(1,2)] ← (1,1) • UB[(1,1)] = {(1,2)} • UX[(1,1)] empty? • False • a=RIGHT, s=(1,1) • return a (maze figure: s’ = (1,1), s = (1,2))

  14. Online DFS, example • GOAL-TEST((2,1))? • s’ != G, thus false • (2,1) a new state? • True, UX[(2,1)]={RIGHT,UP,LEFT} • s is null? • False (s=(1,1)) • result[RIGHT,(1,1)] ← (2,1) • UB[(2,1)]={(1,1)} • UX[(2,1)] empty? • False • a=LEFT, s=(2,1) • return a (maze figure: s’ = (2,1), s = (1,1))

  15. Online DFS, example • GOAL-TEST((1,1))? • s’ != G, thus false • (1,1) a new state? • False • s is null? • False (s=(2,1)) • result[LEFT,(2,1)] ← (1,1) • UB[(1,1)]={(2,1),(1,2)} • UX[(1,1)] empty? • True • UB[(1,1)] empty? False • a = an action b such that result[b,(1,1)]=(2,1) • b=RIGHT • a=RIGHT, s=(1,1) • Return a • And so on… (maze figure: s’ = (1,1), s = (2,1))

  16. Online DFS • Worst case each node is visited twice. • An agent can go on a long walk even when it is close to the solution. • An online iterative deepening approach solves this problem. • Online DFS works only when actions are reversible.

  17. Online local search • Hill-climbing is already an online algorithm • Only one state is stored • Bad performance due to local maxima • Random restarts are impossible • Solution 1: a random walk introduces exploration • Select one of the available actions at random, with preference for actions not yet tried • Can take exponentially many steps to reach the goal

  18. Online local search • Solution 2: add memory to the hill climber • Store the current best estimate H(s) of the cost to reach the goal • H(s) is initially the heuristic estimate h(s) • Afterward updated with experience (see below) • Learning real-time A* (LRTA*)

  19. Learning real-time A* (LRTA*)
  function LRTA*-COST(s, a, s’, H) returns a cost estimate
    if s’ is undefined then return h(s)
    else return c(s, a, s’) + H[s’]

  function LRTA*-AGENT(s’) returns an action
    inputs: s’, a percept identifying the current state
    static: result, a table of the next state, indexed by action and state, initially empty
            H, a table of cost estimates indexed by state, initially empty
            s, a, the previous state and action, initially null
    if GOAL-TEST(s’) then return stop
    if s’ is a new state (not in H) then H[s’] ← h(s’)
    unless s is null
        result[a, s] ← s’
        H[s] ← min over b ∈ ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
    a ← an action b in ACTIONS(s’) that minimizes LRTA*-COST(s’, b, result[b, s’], H)
    s ← s’
    return a
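A runnable sketch of LRTA*-AGENT under the same illustrative interface as the online-DFS sketch above (actions(s), goal_test(s), plus a heuristic h(s)); unit step costs are an assumption made here for simplicity.

class LRTAStarAgent:
    def __init__(self, actions, goal_test, h):
        self.actions = actions      # actions(s) -> legal actions in s
        self.goal_test = goal_test  # goal_test(s) -> bool
        self.h = h                  # heuristic estimate h(s)
        self.result = {}            # (s, a) -> observed successor
        self.H = {}                 # learned cost-to-goal estimates
        self.s = None
        self.a = None

    def lrta_cost(self, s, s_prime):
        # Estimated cost through s': step cost plus learned estimate of s'.
        if s_prime is None:
            return self.h(s)        # optimism: untried actions look promising
        return 1 + self.H[s_prime]  # unit step costs assumed in this sketch

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return None
        if s_prime not in self.H:
            self.H[s_prime] = self.h(s_prime)
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            # Update the previous state's estimate from experience.
            self.H[self.s] = min(self.lrta_cost(self.s, self.result.get((self.s, b)))
                                 for b in self.actions(self.s))
        # Move toward the apparently best neighbour.
        self.a = min(self.actions(s_prime),
                     key=lambda b: self.lrta_cost(s_prime, self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a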

  20. Outline • Ch 4. informed search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions

  21. Games vs. search problems • The problem-solving agent is not alone any more • Multiagent, conflict • Default: deterministic, turn-taking, two-player, zero-sum game of perfect information • Perfect information vs. imperfect information, or games with chance • "Unpredictable" opponent ⇒ must specify a move for every possible opponent reply • Time limits ⇒ unlikely to find the goal, must approximate • * Environments with very many agents are best viewed as economies rather than games

  22. Game formalization • Initial state • A successor function • Returns a list of (move, state) pairs • Terminal test • Terminal states • Utility function (or objective function) • A numeric value for the terminal states • Game tree • The state space
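The formalization above maps naturally onto a small interface. This is only a sketch; the class and method names below are assumptions used by the later examples, not a standard API.

from typing import Any, Iterable, Tuple

class Game:
    def initial_state(self) -> Any:
        """Board position and player to move at the start of the game."""
        raise NotImplementedError

    def successors(self, state) -> Iterable[Tuple[Any, Any]]:
        """A list of (move, resulting state) pairs legal in this state."""
        raise NotImplementedError

    def terminal_test(self, state) -> bool:
        """True when the game is over in this state."""
        raise NotImplementedError

    def utility(self, state, player) -> float:
        """A numeric value of a terminal state for the given player."""
        raise NotImplementedError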

  23. Tic-tac-toe: Game tree (2-player, deterministic, turns)

  24. Minimax • Perfect play for deterministic games: optimal strategy • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game: only two half-moves

  25. Minimax algorithm
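The slide's algorithm figure is not reproduced in the transcript; below is a minimal minimax sketch written against the illustrative Game interface above, shown for orientation rather than as the textbook's code.

def minimax_decision(game, state, player):
    # Choose the move whose resulting state has the highest minimax value.
    return max(game.successors(state),
               key=lambda ms: min_value(game, ms[1], player))[0]

def max_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return max(min_value(game, s, player) for _, s in game.successors(state))

def min_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return min(max_value(game, s, player) for _, s in game.successors(state))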

  26. Problem of minimax search • The number of game states is exponential in the number of moves • Solution: do not examine every node ⇒ alpha-beta pruning • Remove branches that do not influence the final decision • Revisit example …

  27. Alpha-Beta Example Do DF-search until first leaf Range of possible values [-∞,+∞] [-∞, +∞]

  28. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  29. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  30. Alpha-Beta Example (continued) [3,+∞] [3,3]

  31. Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2]

  32. Alpha-Beta Example (continued) [3,14] [3,3] [-∞,2] [-∞,14]

  33. Alpha-Beta Example (continued) [3,5] [3,3] [−∞,2] [-∞,5]

  34. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2]

  35. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2]

  36. Properties of α-β • Pruning does not affect the final result • Good move ordering improves the effectiveness of pruning • With "perfect ordering," time complexity = O(b^(m/2)), which effectively doubles the depth of search

  37. Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it ⇒ prune that branch • Define β similarly for MIN

  38. The α-β pruning algorithm

  39. The α-β pruning algorithm
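The pruning algorithm itself appears only as figures on these two slides; here is a minimal alpha-beta sketch under the same assumed Game interface as above, included for illustration only.

import math

def alpha_beta_decision(game, state, player):
    best_move, best_val = None, -math.inf
    for move, s in game.successors(state):
        v = min_value(game, s, player, -math.inf, math.inf)
        if v > best_val:
            best_move, best_val = move, v
    return best_move

def max_value(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = -math.inf
    for _, s in game.successors(state):
        v = max(v, min_value(game, s, player, alpha, beta))
        if v >= beta:
            return v            # prune: MIN will never allow this branch
        alpha = max(alpha, v)
    return v

def min_value(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = math.inf
    for _, s in game.successors(state):
        v = min(v, max_value(game, s, player, alpha, beta))
        if v <= alpha:
            return v            # prune: MAX already has a better option
        beta = min(beta, v)
    return v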

  40. Resource limits • In reality, imperfect, real-time decisions are required • Suppose we have 100 secs and explore 10^4 nodes/sec ⇒ 10^6 nodes per move • Standard approach: • cutoff test: • e.g., depth limit • evaluation function • = estimated desirability of a position

  41. Evaluation functions • For chess, typically a linear weighted sum of features: Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • e.g., w1 = 9 for queens, w2 = 5 for rooks, …, wn = 1 for pawns • f1(s) = (number of white queens) – (number of black queens), etc.
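As an illustration of such a weighted sum, here is a tiny material-only evaluation sketch. The queen, rook, and pawn weights come from the slide; the bishop and knight values are conventional additions, and count_pieces is an assumed helper rather than a real chess API.

WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def material_eval(count_pieces):
    # count_pieces(color, piece) -> number of such pieces on the board.
    return sum(w * (count_pieces("white", piece) - count_pieces("black", piece))
               for piece, w in WEIGHTS.items())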

  42. Cutting off search • MinimaxCutoff is identical to MinimaxValue except: • Terminal-Test is replaced by Cutoff-Test • Utility is replaced by Eval • Does it work in practice? b^m = 10^6, b = 35 ⇒ m ≈ 4 • 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov
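A minimal sketch of the cutoff idea: the same recursion as minimax, but the terminal test becomes a depth cutoff and utility becomes the evaluation function. The Game interface and eval_fn are the assumptions used in the earlier sketches.

def h_minimax(game, state, player, eval_fn, depth=0, limit=4, maximizing=True):
    # CUTOFF-TEST: stop at terminal states or at the depth limit.
    if game.terminal_test(state) or depth >= limit:
        return eval_fn(state, player)          # EVAL replaces UTILITY
    values = [h_minimax(game, s, player, eval_fn, depth + 1, limit, not maximizing)
              for _, s in game.successors(state)]
    return max(values) if maximizing else min(values)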

  43. Games that include chance • Chance nodes • Backgammon: move all one’s pieces off the board • Branches leading from each chance node denote the possible dice rolls • Each branch is labeled with the roll and its probability

  44. Games that include chance • Rolls [1,1] and [6,6] each have probability 1/36; all other rolls have probability 1/18 • Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16) • We cannot calculate a definite minimax value, only an expected value

  45. Expected minimax value
  EXPECTED-MINIMAX-VALUE(n) =
      UTILITY(n)                                                     if n is a terminal node
      max over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)        if n is a MAX node
      min over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)        if n is a MIN node
      Σ over s ∈ successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node
  These equations can be backed up recursively all the way to the root of the game tree.
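A minimal sketch of this backup rule. It assumes the illustrative Game interface above is extended with is_chance_node(state), chance_outcomes(state) returning (probability, state) pairs, and to_move(state); all of these names are assumptions for the example.

def expectiminimax(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    if game.is_chance_node(state):
        # Weight each chance outcome (e.g., dice roll) by its probability.
        return sum(p * expectiminimax(game, s, player)
                   for p, s in game.chance_outcomes(state))
    values = [expectiminimax(game, s, player) for _, s in game.successors(state)]
    return max(values) if game.to_move(state) == player else min(values)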

  46. Position evaluation with chance nodes • Left, A1 is best • Right, A2 is best • Outcome of evaluation function (hence the agent behavior) may change when values are scaled differently. • Behavior is preserved only by a positive linear transformation of EVAL.
