
269202 Algorithms for iSNE


Presentation Transcript


  1. 269202 Algorithms for iSNE Dr. Kenneth Cosh Week 12

  2. Review • Searching • Uninformed Searches • Breadth First • Depth First • Iterative • Informed Searches • Best First • Heuristics • Hill Climbing • Simulated Annealing

  3. This Week • Adversarial Searches • Investigating how to plan ahead in environments where other agents are planning against us. • Game Theory

  4. Who Stole The Final Exam? • OK, I know one of you stole the final exam… • You have 3 choices: “Admit it”, “Accuse someone else” or “Deny knowledge”. • If you “Deny knowledge” you will get a D, unless everyone denies knowledge, in which case you’ll get a B. • If you “Accuse someone” you will get a C; if everyone else accuses the same person you will get a B, unless that person admits it, in which case you get a D. • If you “Admit it” you get an F, unless everyone else accuses you, in which case you get an A.

  5. Game Theory • That example is similar to the famous “Prisoner’s Dilemma” • “Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both stay silent, the police can sentence both prisoners to only six months in jail for a minor charge. If each betrays the other, each will receive a two-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what choice the other prisoner will make. So the question this dilemma poses is: What will happen? How will the prisoners act?”

  6. Game Theory • A branch of economics that investigates any multi-agent environment where one agent’s actions impact on another agent, whether co-operatively or competitively. • The games considered here are deterministic, turn-taking, two-player, zero-sum games with perfect information. • In other words, a deterministic and fully observable environment with two agents who alternate actions. Utility values at the end of the game are equal and opposite - the winner of chess receives ‘1’, the loser ‘-1’ (or possibly ‘0’ each in stalemate).

  7. Games • Games fit well with A.I. as they are easy to represent: agents are restricted to a small number of actions, and the results of those actions are well defined. • This is in contrast to physical games such as football (or American Rugby) - there the rules are imprecise and the variety of actions is wide. Computer game manufacturers’ attempts at producing such games often result in limited and repetitive versions.

  8. Game Success • AI has had some success at game playing; • Machines are better than humans at Othello and checkers. • Sometimes they can win at chess and backgammon. • Why not more success? • At Go, AI performs only at amateur level. • The average game of chess has a branching factor of 35, with roughly 50 moves per player - which is 35^100, or about 10^154, nodes in the search graph! • So we have to make a decision, even if it isn’t optimal.

  9. Formal Definition • Initial State - the initial board position, indicating whose move it is. • Successor Function - returns a list of legal moves and their resulting states. • Terminal Test - determines whether a state is terminal, i.e. whether the game is over. • Utility Function - an objective function giving a numeric value for terminal states - e.g. -1, 0 or 1.
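As a minimal sketch of these four elements, here is a tiny invented game (players alternately take 1 or 2 sticks; whoever takes the last stick wins) - the game itself is an assumption, not from the slides, chosen only because its full tree is small:

```python
from dataclasses import dataclass

# Hypothetical miniature game: take 1 or 2 sticks, last stick wins.

@dataclass(frozen=True)
class State:
    sticks: int     # sticks remaining on the table
    to_move: str    # 'MAX' or 'MIN'

INITIAL = State(sticks=5, to_move='MAX')           # Initial State

def successors(s):                                 # Successor Function
    """Return (move, resulting state) pairs for the legal moves."""
    nxt = 'MIN' if s.to_move == 'MAX' else 'MAX'
    return [(take, State(s.sticks - take, nxt))
            for take in (1, 2) if take <= s.sticks]

def terminal(s):                                   # Terminal Test
    return s.sticks == 0

def utility(s):                                    # Utility Function
    """+1 if MAX took the last stick, -1 if MIN did."""
    return 1 if s.to_move == 'MIN' else -1
```

Any game matching this interface can be fed to the search algorithms on the following slides.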

  10. Searching • From that definition we can create a game tree, which can be searched for optimal moves. • Normal searches produce a sequence of actions leading to a terminal win state, but in games the other agent has an impact. • Therefore the search needs to find a contingent strategy. • The optimal strategy is one which guarantees an outcome at least as good as any other strategy would against an infallible opponent.

  11. Game Trees • The search tree, even for a game as simple as tic-tac-toe, is very complex. [Figure: a partial tic-tac-toe game tree - X’s initial moves, O’s replies (Move 2), X’s replies (Move 3), and so on down to terminal states with utilities -1, 0 and 1.]

  12. Minimax Algorithm • Assuming the opponent will play optimally, the minimax algorithm is designed to maximise the worst-case outcome - in tic-tac-toe this is a draw. • In this simple example - which action should A take (a1, a2 or a3)? [Figure: a two-ply game tree - A’s actions a1, a2 and a3 lead to opponent nodes B, C and D, with leaf utilities 3, 7, 4 under B; 12, 6, 2 under C; and 2, 10, 11 under D.]

  13. Minimax Algorithm • Even though the best single utility is found under ‘a2’ (12), and the best average is found under ‘a3’ (7.67), ‘a1’ is the choice, as we assume that the opponent will choose their optimum strategy. [Figure: the same tree with minimax values filled in - B, C and D back up their minimum leaf values 3, 2 and 2, so A picks a1.]
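Minimax over this example can be sketched in a few lines of Python. The grouping of the leaf utilities under B, C and D is reconstructed from the slide’s backed-up minimum values (3, 2, 2), its best single utility (12 under a2) and its best average (7.67 under a3):

```python
def minimax(node, maximizing):
    """Return the minimax value of a node.
    Leaves are utility numbers; internal nodes are lists of children."""
    if isinstance(node, (int, float)):      # terminal state
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Reconstructed example tree: A (MAX) chooses among B, C, D (MIN).
tree = [[3, 7, 4],      # B backs up min = 3
        [12, 6, 2],     # C backs up min = 2
        [2, 10, 11]]    # D backs up min = 2

values = [minimax(child, maximizing=False) for child in tree]
best = max(range(len(values)), key=lambda i: values[i])
# values == [3, 2, 2]; best == 0, i.e. action a1
```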

  14. Multiplayer Games • In games with more than 2 players, the utility is scored as a vector containing a utility for each player, e.g. (5, 2, 3). Minimax functions will still choose an action which leads to their best worst-case scenario. • In multiplayer games, ‘alliances’ are often formed, with more than one player joining forces in order to stop another player from reaching a terminal win state.

  15. Minimax Efficiency • If the maximum depth of the tree is ‘m’ and there are ‘b’ legal moves at each point, then time complexity is O(b^m) and space complexity is O(bm) - i.e. exponential time. • We can improve the efficiency, for instance through Alpha-Beta Pruning; • If a player has a better choice ‘m’ either at the parent node of ‘n’ or at any node higher up, then ‘n’ will never be reached, so effectively we can prune it.

  16. Pruning Example • First we examine choice ‘a1’ with depth-first search, finding the worst case ‘3’. [Figure: the tree from the minimax example - B’s leaves 3, 7 and 4 are all examined, giving B the backed-up value 3.]

  17. Pruning Example 2 • Examining choice ‘a2’, we immediately find a worse worst case ‘2’, so there is no need to continue searching under C. • When we look at choice ‘a3’, we have to search all 3 leaves before we find a worse worst case. [Figure: C is pruned after its first leaf ‘2’; D’s leaves 10, 11 and 2 are all examined before its value 2 is found.]
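A sketch of alpha-beta pruning over the same reconstructed tree, with C’s and D’s leaves in the order the slide searches them; the `visited` list exists only to show how many leaves are actually examined:

```python
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf'),
              visited=None):
    """Minimax with alpha-beta pruning.
    Leaves are utility numbers; `visited` collects the leaves examined."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN above will never allow this branch
                break
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:       # MAX above already has something better
            break
    return value

# Same reconstructed tree, leaves listed in search order.
tree = [[3, 7, 4], [2, 12, 6], [10, 11, 2]]
seen = []
value = alphabeta(tree, True, visited=seen)
# value == 3; only 7 of the 9 leaves are examined - C's 12 and 6 are pruned
```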

  18. Pruning Efficiency • Pruning choices in this way has dramatic effects on efficiency. • We only need to examine O(b^(d/2)) nodes. • This assumes that we can examine the successors that are likely to be best first. - You know what ASS U ME means. • Even with random ordering the efficiency improves; however, for many games the time required still makes exhaustive search impractical - a full minimax decision for chess would take longer than the allowed 2 hour game.

  19. Imperfect, Real-Time Decisions • Our algorithm needs to be able to make a decision within a specified time period, based on imperfect information. • Humans (sub-consciously) use a heuristic utility function to judge the value of a position, as humans are even more limited in their searching capabilities - so a similar heuristic is needed here. • I.e. cut the search off at a non-terminal state.

  20. Designing a Heuristic • A heuristic needs to combine various features, weighted by their importance - perhaps using probabilities of success based on previous experience (e.g. 72% of previous instances of this feature have led to success). • Human experience can be used to guide the heuristic, for example in chess; • Material Value - each piece can be given a value, pawn = 1, queen = 8. • Pawn Structure - a good pawn layout might be worth more. • King Safety - a well protected King might be worth a few points. • Two bishops together are worth a little more than twice one bishop. • Bishops are more valuable during the end game than at the start.
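A linear weighted-feature evaluation of this kind might be sketched as follows. Only pawn = 1 and queen = 8 come from the slide; the other piece values, the feature weights and the example position are illustrative assumptions:

```python
# Hypothetical weighted-feature chess evaluation (values largely assumed).
PIECE_VALUES = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 8}

def material(ours, theirs):
    """Material balance from per-piece values and piece counts."""
    return (sum(PIECE_VALUES[p] * n for p, n in ours.items())
            - sum(PIECE_VALUES[p] * n for p, n in theirs.items()))

def evaluate(features, weights):
    """Linear heuristic: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

weights  = {'material': 1.0, 'pawn_structure': 0.3, 'king_safety': 0.5}
features = {
    'material': material({'pawn': 8, 'queen': 1}, {'pawn': 6, 'queen': 1}),
    'pawn_structure': 1,    # slightly better pawn layout
    'king_safety': -1,      # slightly exposed king
}
score = evaluate(features, weights)   # 2*1.0 + 1*0.3 - 1*0.5 = 1.8
```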

  21. Search Cut-Off • Given the sophisticated heuristic function, which can be applied to various states, a cut-off point must be chosen so that the amount of time used is acceptable. The wrong cut-off point could give rise to what is known as the ‘horizon effect’ - where a poor worst case lies just over the horizon of the search.

  22. Alternative Horizons • An alternative vision of the horizon effect comes from this situation. • A computer playing as black almost inexplicably chose to move e7-e8 - why?

  23. Games with Chance • Some games, such as backgammon, combine chance with skill, where chance is controlled by the roll of the dice. • We can’t construct a complete game tree as we don’t know what our opponent’s possible moves will be - but we do know what their likely moves are. • So the game tree needs to be expanded to include the probabilities of the opponent receiving certain dice rolls.
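The expanded tree with chance nodes is handled by the expectiminimax idea: chance nodes back up the probability-weighted average of their children. A minimal sketch, using an invented coin-flip gamble rather than real dice rolls:

```python
def expectiminimax(node):
    """node is a utility number, ('max', children), ('min', children),
    or ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# Invented example: MAX chooses between a certain 3 and a 50/50 gamble
# paying 10 or 0 - the gamble's expected value is 5, so MAX takes it.
tree = ('max', [3, ('chance', [(0.5, 10), (0.5, 0)])])
value = expectiminimax(tree)
```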

  24. Probability • Probability is based on our percepts of the environment - what we know about the environment. • My doctor said golf caused my shoulder injury as soon as he knew I played golf - even though the shoulder injury dates from before I started playing golf. • When we pick a card there is a 1/52 chance it is the Ace of Spades, after we look at it, the chance is either 0 or 1. • Probabilities can change when more evidence is acquired.

  25. Utility Theory • Utility Theory represents preferences for certain states - the quality of a state that makes it useful; • Is it preferable to choose plan C1440 (leaving for the airport 24 hours early), which has a 99.999% probability of success, over plan A60, which has a 95% probability of success? Given the poor utility of the long wait, perhaps not. • Should I stop playing golf to improve my ‘lack of shoulder pain’ utility, and risk lowering my ‘playing golf pleasure’ utility?

  26. Decision Theory • Decision Theory = Probability Theory + Utility Theory. • An agent is rational if they choose the action which yields the highest utility averaged across all possible outcomes of the action. • The principle of Maximum Expected Utility (MEU).
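The MEU principle can be sketched directly. The probabilities and utilities below for the airport plans are invented for illustration (the 24-hour wait is modelled as a flat utility penalty):

```python
def expected_utility(outcomes):
    """Expected utility of an action: sum of probability * utility
    over its possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Invented numbers: utility 100 for catching the flight, -1000 for
# missing it, and an 80-point penalty for C1440's long wait.
plans = {
    'A60':   [(0.95,    100),      (0.05,    -1000)],
    'C1440': [(0.99999, 100 - 80), (0.00001, -1000)],
}
best = max(plans, key=lambda name: expected_utility(plans[name]))
# best == 'A60' (EU 45.0 vs roughly 20.0)
```

With these numbers the MEU agent prefers A60, matching the "rational decision" reached on the later airport slides.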

  27. Acting Under Uncertainty

  28. Acting Under Uncertainty • Suppose we need to check in at the airport and need to choose a time to set off. Plan A60 involves leaving 60 minutes before check-in - we could assume this was a good plan. • We can’t say “Plan A60 will get us to the airport in time”, only “Plan A60 will get us to the airport in time so long as we have enough gas, and there are no accidents, and the car doesn’t break down, and check-in doesn’t close early, and…” • Perhaps Plan B90 would be better?

  29. Acting Under Uncertainty • While Plan B90 increases the ‘degree of belief’ that we will get to the airport on time, it also introduces a likely long, unproductive wait at the airport. • To maximise an agent’s performance, the relative importance of both goals needs to be considered; • ‘getting to the airport on time’ • ‘avoiding a long wait’ • The rational decision is A60, but how do we draw that conclusion?

  30. Diagnosis • Diagnosis is a task aimed at dealing with uncertainty, normally using probability theory to construct a degree of belief in various statements. • Your car won’t start, so there’s an 80% chance the battery is dead. • You’ve got pain in your left calf, so there’s a 70% chance you’ve pulled a muscle, and a 2% chance your leg has fallen off. • Probability provides a way of summarising the uncertainty that comes from our laziness and ignorance.

  31. Domain Random Variables • Boolean Random Variables • Late<True, False> • Discrete Random Variables • Weather<Sunny, Rainy, Snowy, Cloudy> • Continuous Random Variables • Temperature = 30.2

  32. Atomic Events • Atomic Events are a particular combination of states; • Late = True, Weather = Cloudy • Late = False, Weather = Sunny • The existence of certain atomic events can lead to certain understandings; • The event Late = False, Weather = Rainy means that <Rainy => Late> is False.

  33. Prior Probability • Unconditional Probabilities can be assigned to each state - degree of belief with no other information; • P(Weather = Sunny) = 0.8 • P(Weather = Rainy) = 0.1 • P(Weather = Cloudy) = 0.0999 • P(Weather = Snowy) = 0.0001

  34. Joint Probability • The probabilities for a combination of random variables can be stored in a grid - 2 or more dimensional. • Take a simple example with 3 boolean variables: Late, Male, AM.

                   Late                  Not Late
              Male    Not Male      Male    Not Male
    AM        0.22    0.25          0.08    0.02
    not AM    0.10    0.18          0.10    0.05

  35. Joint Probability • The sum of all probabilities adds up to 1. • Suppose we want to know the probability of being male OR late; • P(Male OR Late) = 0.22+0.25+0.1+0.18+0.08+0.1 = 0.93 • Or the probability of being male AND late; • P(Male AND Late) = 0.22+0.1 = 0.32 • Thus we can deduce probabilities given certain inputs.
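These queries can be checked mechanically by summing atomic events. The dictionary below encodes the slide’s joint table, keyed by boolean (AM, Male, Late) tuples:

```python
# The slide's joint distribution over (AM, Male, Late).
P = {
    (True,  True,  True):  0.22, (True,  False, True):  0.25,
    (True,  True,  False): 0.08, (True,  False, False): 0.02,
    (False, True,  True):  0.10, (False, False, True):  0.18,
    (False, True,  False): 0.10, (False, False, False): 0.05,
}

def prob(event):
    """Sum the probabilities of all atomic events where `event` holds;
    `event` is a predicate over (am, male, late)."""
    return sum(p for (am, male, late), p in P.items()
               if event(am, male, late))

p_or  = prob(lambda am, male, late: male or late)    # P(Male OR Late)  = 0.93
p_and = prob(lambda am, male, late: male and late)   # P(Male AND Late) = 0.32
```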

  36. Joint Probability • Suppose we add a further random variable - the weather. • We have to expand our table to four times its size, one copy for each weather condition. • In doing this it is reasonable to ask how the former and latter tables are related; how does P(Late, AM, Male, Weather = sunny) relate to P(Late, AM, Male)?

  37. Product Rule • We can use the ‘Product Rule’ • The probability of a and b is the same as the probability of b multiplied by the probability of a given that b is the case; • P(a^b) = P(a|b)P(b) • Or in our case; • P(Late, AM, Male, Weather = sunny) = P(Weather = sunny | Late, AM, Male)P(Late, AM, Male) • The probability of a man being late on a sunny morning is the same as the probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning.
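The product rule can be verified numerically against the earlier joint table, for example with a = Late and b = Male:

```python
# Values taken from the earlier joint table over (AM, Male, Late).
p_male = 0.22 + 0.08 + 0.10 + 0.10            # marginal P(Male) = 0.50
p_late_and_male = 0.22 + 0.10                 # P(Late ^ Male)   = 0.32
p_late_given_male = p_late_and_male / p_male  # P(Late | Male)   = 0.64

# Product rule: P(Late ^ Male) = P(Late | Male) * P(Male)
assert abs(p_late_given_male * p_male - p_late_and_male) < 1e-9
```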

  38. Hang on a minute! • Let’s review that; • “The probability of a man being late on a sunny morning is the same as the probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning.” • Unless we are a weathermonger, the probability of it being sunny isn’t influenced by a man’s morning tardiness!

  39. Product Rule • The Product Rule can be stated in 2 ways; • P(a^b) = P(a|b)P(b) • P(a^b) = P(b|a)P(a) • Or in our case; • P(Late, AM, Male, Weather = sunny) = P(Late, AM, Male | Weather = sunny)P(Weather = sunny) • The probability of a man being late on a sunny morning is the same as the probability that a man is late in the morning given that it is sunny, multiplied by the probability of it being sunny.

  40. Joint Probabilities • There is intuitively something more satisfactory about this statement - perhaps the weather could be a factor in determining lateness. • We could expand our knowledge about the domain by adding further random variables - perhaps Transport<Car, Bike, Walk> or FavouriteFood<Icecream, Steak, Fish>. • Transport ‘might’ have a direct influence over tardiness - it could be argued that ‘walkers’ should set off earlier - but surely FavouriteFood can be considered ‘independent’. • When independence can be found, the subset of data to be analysed can be greatly reduced.

  41. Bayes’ Law • Given • P(a^b) = P(a|b)P(b) • P(a^b) = P(b|a)P(a) • Then, • P(b|a)P(a) = P(a|b)P(b) • And so, • P(b|a) = P(a|b)P(b) / P(a) • Great! - so what?

  42. Bayes’ Law • Requires 2 unconditional probabilities and 1 conditional probability, to calculate 1 further conditional probability! • But, if we have those probabilities, then it can be very useful. • Meningitis causes patients to have stiff necks 50% of the time. The probability of having meningitis is 1 in 50,000 and the probability that any patient has a stiff neck is 1 in 20. • P(s|m) = 0.5 • P(m) = 1/50000 • P(s) = 1/20 • P(m|s) = P(s|m)P(m)/P(s) = (0.5*(1/50000))/(1/20) = 0.0002 or 1 in 5,000
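The meningitis calculation is a one-liner once the three probabilities are in hand:

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' Law: P(b|a) = P(a|b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# The slide's example: s = stiff neck, m = meningitis.
p_m_given_s = bayes(p_a_given_b=0.5, p_b=1/50000, p_a=1/20)
# p_m_given_s == 0.0002, i.e. 1 in 5,000
```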

  43. Welcome to Wumpus World • Wumpus World is a cave consisting of rooms connected by passageways. Somewhere in the cave is the evil Wumpus beast who eats anyone who enters the room. An agent can kill the Wumpus though by shooting an arrow - the arrow goes straight until it hits the Wumpus or a wall. Wumpus World also contains deathly pits in the floor which trap anyone who enters the room (except the Wumpus). On the positive side there is a big heap of gold that can be won by a successful agent. In rooms surrounding the Wumpus the agent can perceive a stench, in rooms surrounding a pit the agent perceives a breeze, and in rooms surrounding the gold the agent perceives a glitter. Agents can move through the 4*4 grid by turning left or right (90 degrees) or walking forward.

  44. Wumpus World [Figure: a 4*4 grid showing the Wumpus, three pits and the gold, with a stench in the squares adjacent to the Wumpus, a breeze adjacent to each pit, and a glitter adjacent to the gold.]

  45. Wumpus World [Figure: the agent starts in a corner square; perceiving nothing, it marks the adjacent squares OK.]

  46. Wumpus World [Figure: perceiving a breeze, the agent marks the adjacent unvisited squares as possible pits (?P).]

  47. Wumpus World [Figure: combining a stench and a breeze percept, the agent infers the Wumpus’ location (W!) and a pit’s location (P!).]

  48. Wumpus World [Figure: perceiving a glitter, the agent marks the squares that might hold the gold (?G), while avoiding W! and P!.]

  49. Assignment – Part I • To what extent does Bayes’ Law affect the probability of locating a pit? • Consider the layout to the right - what can you deduce about the probability of where the pit is? [Figure: a partial grid in which two visited squares each show a breeze.]

  50. Assignment – Part II • Assuming the obvious robotic and vision constraints have been solved, how could the approaches to game playing that we’ve discussed today be applied to robot pool? Could a robot win a game of pool?
