Artificial Intelligence CS 165A Thursday, October 18, 2007 Today: • Informed (heuristic) search methods (Ch 4) • Adversarial search – games (Ch 6)
Notes • Reading list typo • Skip Chapter 5, read Chapter 6 • HW#1, problem 4: Tic-Tac-Toe • “It’s not whether you win or lose, it’s how you play the game!” • Implement breadth-first and depth-first approaches looking for your goal (three in a row for you) • If you want to also avoid losing (three in a row for your opponent), you may, but this is optional • In depth-first search, you should stop searching when you find a goal state (i.e., don’t keep looking for the “best” goal)
Review: Informed Search Methods (not all covered in detail) • Best-first search • Greedy best-first search • A* search • Memory bounded search • IDA* • SMA* • Iterated improvement algorithms • Hill-climbing • Simulated annealing • As with blind search, the strategy is defined by choosing the order of node expansion
Review: A* Search • “A* Search” is A Search with an admissible h • h is optimistic – it never overestimates the cost to the goal • h(n) ≤ true cost to reach the goal from n • h is a “Pollyanna” • So f(n) = g(n) + h(n) never overestimates the actual cost of the best solution passing through node n

function A*-SEARCH(problem, h) returns a solution or failure
    return BEST-FIRST-SEARCH(problem, f)
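A minimal Python sketch of the same idea, assuming hashable states and a caller-supplied successors(state) that yields (next_state, step_cost) pairs (these names and the path bookkeeping are illustrative, not from the slides; for guaranteed optimality with this graph-search variant, h should also be consistent):

```python
import heapq
import itertools

def a_star_search(start, goal_test, successors, h):
    """Best-first search ordered by f(n) = g(n) + h(n)."""
    counter = itertools.count()      # tie-breaker so heapq never compares states
    frontier = [(h(start), next(counter), 0, start, [start])]
    explored = set()
    while frontier:
        _f, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g           # solution and its cost
        if state in explored:
            continue
        explored.add(state)
        for nxt, step_cost in successors(state):
            if nxt not in explored:
                g2 = g + step_cost
                heapq.heappush(frontier,
                               (g2 + h(nxt), next(counter), g2, nxt, path + [nxt]))
    return None                      # failure
```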
Thursday Quiz • How does an online search agent operate? • How does it differ from an offline search agent? (See Sec. 4.5)
Simulated Annealing • Similar to hill-climbing, but includes a random element • Sometimes takes “bad” steps to escape local maxima • Motivated by the roughly analogous physical process of annealing: heating and then slowly cooling a substance to obtain a strong crystalline structure • T is temperature, E is energy • A schedule determines the rate at which T is lowered • Want to maximize the energy E • E(state) measures the state’s “goodness” • ΔE measures the increase in “goodness” resulting from the new state
Simulated Annealing (cont.) • For the current state, evaluate the operators/actions • Instead of taking the best action, choose an action at random • If that action results in a higher “goodness” value (ΔE > 0), take it • Otherwise, take it with probability between 0 and 1 [Diagram: candidate actions with ΔE = -1, ΔE = 1, ΔE = 3, ΔE = -2, etc.]
Probabilistic action • ΔE = Energy(action) – Energy(current) • If ΔE > 0, take the action • If ΔE < 0, maybe take it, maybe not (probabilistic action) • P(action) = e^(ΔE/T) (for negative ΔE) • T: determined by the “schedule” • Starts large, decreases towards zero as the algorithm iterates • Large T means ??? • Small T means ??? • What happens when T = 0?
Probabilistic action • What does it mean to take an action with probability P(A) = e^(ΔE/T)? • For action A1: let ΔE = -3 and T = 20. P(A1) = e^(-3/20) ≈ 0.86. Choose a random number r ∈ [0, 1]; if r < 0.86, take action A1, else do nothing. • For action A2: ΔE = -9 and T = 20. P(A2) = e^(-9/20) ≈ 0.64. Choose a random number r ∈ [0, 1]; if r < 0.64, take action A2, else do nothing.
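The acceptance rule can be written directly; here is a minimal Python sketch (maybe_take_action is a hypothetical helper name, not from the course):

```python
import math
import random

def maybe_take_action(delta_E, T):
    """SA acceptance rule from the slides: always take improving actions
    (delta_E > 0); take worsening ones with probability e^(delta_E / T)."""
    if delta_E > 0:
        return True
    if T <= 0:
        return False                 # T = 0 degenerates to pure hill-climbing
    return random.random() < math.exp(delta_E / T)

# The worked example above: with delta_E = -3 and T = 20,
# math.exp(-3 / 20) ≈ 0.86, so A1 is taken roughly 86% of the time.
```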
[Plot: P(action) = e^(ΔE/T) as a function of ΔE, the “badness” of the action, for decreasing values of T; P(action) ranges from 0 to 1]
Intuition of SA • Consider a ball bouncing around a vibrating surface, which you are holding with your eyes closed • Goal: get the ball to the deepest point of the surface • What’s your strategy? • Use vibration to jiggle the ball over the surface • Use lots of vibration at the beginning • Get big jumps, as if the surface were smoother • Use less vibration later • Get smaller jumps as the ball starts to feel the smaller surface bumps • Analogy to annealing • T measures the amount of vibration • E measures the depth of the ball (the goodness of the solution)
Simulated Annealing (cont.) • The trick is coming up with a proper schedule • Where to start (what T)? • How quickly to lower T? • Too quickly? • Too slowly? • SA is an example of a Monte Carlo method • A technique that uses probability-based calculations to find an approximate solution to a complex (perhaps deterministic) problem
Effective Branching Factor • Though informed search methods may have poor worst-case performance, they often do quite well if the heuristic is good • Even if there is a huge branching factor • One way to quantify the effectiveness of the heuristic: the effective branching factor, b* • N: total number of nodes expanded • d: solution depth • N = 1 + b* + (b*)^2 + … + (b*)^d • For a good heuristic, b* is close to 1
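Since N = 1 + b* + (b*)^2 + … + (b*)^d has no closed-form solution for b*, it is usually found numerically; a small bisection sketch in Python (a hypothetical helper, not course code):

```python
def effective_branching_factor(N, d, tol=1e-6):
    """Solve N = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection."""
    def tree_size(b):
        return sum(b ** i for i in range(d + 1))
    lo, hi = 1.0, float(N)           # b* must lie between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tree_size(mid) < N:       # tree too small: b* is larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# e.g. effective_branching_factor(52, 5) ≈ 1.91: expanding 52 nodes to find
# a depth-5 solution corresponds to a nearly binary effective tree.
```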
Example: 8-puzzle problem [Chart: average number of nodes expanded vs. solution length, averaged over 100 trials at each solution length]
Local beam search • In simulated annealing, how many nodes do you keep in memory? • One • Local beam search keeps track of k states rather than just one • Start with a population of k (initially random?) candidates • Choose the k best descendants (see the sketch below) • Not the same as choosing and following k independent solutions • Why? • Analogy to a beam from a flashlight (vs. a single ray of light) • Stochastic forms of beam search • Like genetic algorithms, where successor states are generated by combining two parent states
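One way the k-state bookkeeping might look in Python (all four callables are caller-supplied assumptions, not names from the slides):

```python
def local_beam_search(k, random_state, successors, value, steps=100):
    """Keep the k best states drawn from the pooled successors of the beam."""
    beam = [random_state() for _ in range(k)]
    for _ in range(steps):
        pool = [s2 for s in beam for s2 in successors(s)]
        if not pool:
            break
        # Pooling before selection is what distinguishes this from k
        # independent hill-climbers: one promising state's successors
        # can come to occupy the entire beam.
        beam = sorted(pool, key=value, reverse=True)[:k]
    return max(beam, key=value)
```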
[Three diagrams contrasting search landscapes: “I said that iterative improvement algorithms are like this”; “Hill-climbing and SA are really more like this”; “Local beam search is really more like this”]
Review: Problem solving and search • AI is about solving hard problems, often requiring searching through a problem space • Problem definition • The choice of problem space is very important • Abstraction • The choice of data structure is very important • Can make tasks much easier or harder • Keys: Search graph and list of nodes to be expanded • Toy problems (M&C, etc.) are useful to show and compare algorithms • Real problems are not so simple!
Review: Problem solving and search • Choose a search algorithm depending on the problem requirements • There are various ways to evaluate search algorithms • Completeness • Optimality • Goodness of solution • Cost of finding solution (computation and memory requirements) • Uninformed search is exponential • We can do better, on average, if we know something about the problem (hence, informed search using heuristics) • The real difference: Effective branching factor is usually smaller for informed search methods
But alas, things are not so simple… • All this is fairly straightforward for M&C, N-queens, planning travel to Bucharest, etc. • What about dynamic, uncertain environments; stochastic inputs, etc.? • What if we also know that • Missionaries try to convert cannibals, and sometimes succeed • Converted cannibals don’t eat missionaries • Unconverted cannibals and converted cannibals argue a lot • Even unconverted cannibals would rather eat fish than eat missionaries • Missionaries can catch fish with their fancy traps • Missionaries won’t fish on Sundays and have no spare fish on Fridays • Some cannibals (converted or unconverted) steal the missionaries’ traps • How would we solve M&C now? • What if someone else chooses every other action?
Up next… • This leads us to representing and reasoning about more complex information • But first… • A detour into Game Playing • or, How to do search when someone is trying to make life miserable for you!
Game Playing – Chapter 6 • Why game playing? • Interesting, engaging, people like competition • Games seem to require intelligence, so perhaps a good AI testbed • Games are the Drosophila of AI??? • Nice research domain: • Simplified problems, easy to represent states and actions (usually) • Yet maintains significant elements of real-world problems: • Competition • Limited resources (e.g., time) • Solution strategies • Uncertainty • Due to opponent’s unknown moves, limited time to calculate states, and possible random events (e.g., dice) • Unlike the real world, it’s possible to be fully informed • But even that doesn’t make it easy
Game Playing (cont.) • Game playing is different from straightforward search • Uncertainty due to opponent • Much too hard to solve completely (branching factor is large) • Time limit is very significant • With limited time, a small change in efficiency makes a BIG difference • Because it is unlikely or impossible to calculate exact consequences of a move (find the goal state), it is necessary to approximate
Game Playing (cont.) • Games can be deterministic or non-deterministic • Non-deterministic: contains an element of chance: “probabilistic”, “stochastic” • …and have perfect information or not • “Accessible” – The world state is fully accessible to the program or agent (nothing hidden)

                        Deterministic                               Non-deterministic
Perfect information     Chess, Checkers, Go, Othello, Tic-tac-toe   Backgammon, Monopoly
Imperfect information   Navigating a maze                           Bridge, Poker, Scrabble
Formulating a game strategy • Formal definition of game: • Initial state • Operators • Terminal test • Utility function • One move consists of two half-moves, or two ply • One per player • Our approach to finding a game strategy • Consider how to approach the exact solution (perfect play) • Techniques for choosing a good move when time is limited • Prune the search to reduce costs • Approximate by evaluating moves/states (heuristics)
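The four components of the formal definition can be bundled into a small container type, sketched here in Python (a schematic illustration; the class and field names are assumptions, and this type is reused by the minimax sketch later):

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class GameDefinition:
    """Initial state, operators, terminal test, utility function."""
    initial_state: Any
    operators: Callable[[Any], Iterable[Any]]  # legal moves: state -> successor states
    terminal_test: Callable[[Any], bool]       # is the game over?
    utility: Callable[[Any], float]            # payoff of a terminal state (to MAX)
```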
The Minimax algorithm • Determines the optimal strategy, and thus the best move, for player MAX (whose opponent is MIN) • Generate the whole game tree down to terminal states • Evaluate each terminal state using the utility function • Back up from the leaf nodes toward the root, one layer at a time, applying the MIN and MAX rules alternately • MAX chooses the maximum value • MIN chooses the minimum value • When reaching the root node, MAX’s best move is the child with the highest value • The minimax decision maximizes the utility under the assumption that the opponent seeks to minimize it (and uses the same evaluation function)
2-ply Minimax example • Which move to choose? [Diagram: 2-ply game tree] • The minimax decision is move A1
Another example • In the game, it’s your move. Which move will the minimax algorithm choose – A, B, C, or D? What is the minimax value of the root node and of nodes A, B, C, and D? [Diagram: a MAX root with MIN nodes A, B, C, D and leaf values A: (1, 7, 2), B: (5, 2, 8), C: (9, 4, 6), D: (3, 3, 5); the slide’s backed-up values are A = 1, B = 2, C = 4, D = 3, root = 4, so minimax chooses C]
Multiplayer games • (A, B, C) – utility values for players A, B, and C • Ignores strategies such as alliances… [Diagram: three-player game tree, annotated “Player A could take either move here”]
Minimax search • Generate the tree of minimax values • Then choose the best (maximum) move • Don’t need to keep all values around • Good memory property • How deep to generate the tree? • Generating the full tree is best, but often not feasible • Generate N moves down, then use an evaluation function • The larger N is, the better the program will play • Depth-first search is used to implement minimax • Recursive implementation (Fig. 6.3); a sketch follows below
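A recursive, depth-limited sketch in the spirit of Fig. 6.3, written against the hypothetical GameDefinition above (not the book’s pseudocode verbatim; evaluate stands in for the evaluation function applied at the depth cutoff):

```python
def minimax_decision(state, game, depth, evaluate):
    """Return MAX's best move by backing up values depth-first."""
    def max_value(s, d):
        if game.terminal_test(s):
            return game.utility(s)
        if d == 0:
            return evaluate(s)       # cutoff: approximate with the heuristic
        return max(min_value(s2, d - 1) for s2 in game.operators(s))

    def min_value(s, d):
        if game.terminal_test(s):
            return game.utility(s)
        if d == 0:
            return evaluate(s)
        return min(max_value(s2, d - 1) for s2 in game.operators(s))

    # Depth-first recursion keeps only one path of values live at a time
    # (the "good memory property" noted above).
    return max(game.operators(state), key=lambda s2: min_value(s2, depth - 1))
```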