
Chapter 5 Real Time Heuristic Search




  1. Chapter 5 Real Time Heuristic Search

  2. Introduction • One of the major problems of the algorithm schemas we have seen so far is that they take exponential time to find the optimal solution. • Therefore, these algorithms are usually used only for solving relatively small problems. • However, most often we do not need the optimal solution, and a near-optimal solution can satisfy most “real” problems.

  3. Introduction • A serious drawback of the above algorithms is that they must search all the way to a complete solution before making a commitment to even the first move of the solution. • The reason is that an optimal first move cannot be guaranteed until the entire solution is found and shown to be at least as good as any other solution.

  4. Two Player Games

  5. Two Player Games • Heuristic search in two-player games adopts an entirely different set of assumptions. • Example - a chess game: • the actions are made before the consequences are known • there is a limited amount of time • a move that has been made can't be revoked.

  6. Real-Time Single-Agent Search • Our goal is to apply the assumptions of two-player games to single-agent heuristic search. • So far we needed to check all of the available moves, and in each case of backtracking, every move that was tried was a waste of time from which we gained no information.

  7. Minimin Lookahead Search • Analogous to the minimax search we used in two-player games, we will use an algorithm called minimin for the single problem-solving agent. • This algorithm always looks for the minimal route to the goal by choosing the minimal next node each time, because there is only one player making all of the decisions.

  8. Minimin Lookahead Search • The search proceeds so that we do a minimin lookahead in planning mode, and at the end of the search we execute the best move that was found. From that point we repeat the lookahead search procedure. • There are a few ways to bound the lookahead in this algorithm: • A* heuristic function: f(n) = g(n) + h(n) • Fixed-depth horizon: search to a fixed g(n) cost. • Fixed f(n) cost: search the frontier for the minimal node.

  9. Minimin Lookahead Search • If a goal state is encountered before the search horizon, then the path is terminated and a heuristic value of zero is assigned to the goal. • If a path ends in a non-goal dead end before the horizon is reached, then a heuristic value of infinity is assigned to the dead-end node, guaranteeing that the path will not be chosen.
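The following sketch pulls these rules together. It is a minimal illustration, not code from the chapter; the problem interface (successors(s) returning (cost, child) pairs, h(s), is_goal(s), is_dead_end(s)) is an assumption made for the example.

```python
import math

def minimin(state, depth, g, problem):
    """Minimin lookahead: return the minimal f = g + h value found at
    or below `state`, searching to a fixed-depth horizon.
    Assumes a hypothetical `problem` object with is_goal(s),
    is_dead_end(s), h(s), and successors(s) -> [(cost, child), ...]."""
    if problem.is_goal(state):
        return g                       # goal before the horizon: h = 0
    if problem.is_dead_end(state):
        return math.inf                # dead end: h = infinity
    if depth == 0:
        return g + problem.h(state)    # frontier node: f = g + h
    return min(minimin(child, depth - 1, g + cost, problem)
               for cost, child in problem.successors(state))

def best_move(state, horizon, problem):
    """Planning phase: score each top-level move by minimin, then
    commit to (execute) the best one; the caller repeats from there."""
    cost, child = min(problem.successors(state),
                      key=lambda mc: minimin(mc[1], horizon - 1, mc[0],
                                             problem))
    return child
```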

  10. Branch-and-Bound Pruning • An obvious question is whether every frontier node must be examined to find one of minimum cost. • If we allow heuristic evaluations of interior nodes, then pruning is possible. By using an admissible f function, we can apply the branch-and-bound method in order to reduce the number of nodes checked.
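A sketch of how branch-and-bound pruning can be grafted onto the minimin lookahead above, under the same assumed problem interface; it relies on an admissible h, so that f = g + h at an interior node is a lower bound on every frontier value below it.

```python
import math

def minimin_bnb(state, depth, g, bound, problem):
    """Minimin lookahead with branch-and-bound pruning. `bound` is the
    best (lowest) frontier f found so far; with an admissible h, an
    interior node whose f = g + h already reaches `bound` cannot lead
    to anything better and is pruned."""
    f = g + problem.h(state)
    if f >= bound:
        return math.inf                # prune this subtree
    if problem.is_goal(state):
        return g
    if depth == 0:
        return f
    best = bound
    for cost, child in problem.successors(state):
        best = min(best, minimin_bnb(child, depth - 1, g + cost,
                                     best, problem))
    return best

# Top-level call, starting with an infinite bound:
# minimin_bnb(start, horizon, 0, math.inf, problem)
```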

  11. Efficiency of Branch and Bound [Graph: nodes generated per move (log scale, 10 to 1,000,000) against search depth (10 to 50), comparing brute-force minimin with branch-and-bound pruning on the Eight and Fifteen Puzzles.]

  12. Efficiency of Branch and Bound • From the graph above we can see the advantage of branch-and-bound pruning versus the brute-force minimin search. • For example: at a scale of a million nodes per move in the Eight Puzzle, brute-force search looks ahead 25 moves, whereas branch-and-bound looks ahead 35 moves, about 40% deeper! • We can also see that we get better results for the Fifteen Puzzle than for the Eight Puzzle.

  13. An Analytic Model • Is the former surprising result special to the sliding-tile puzzles, or is it more general? • A model has been defined in which each edge is assigned a cost of zero or one, zero with probability p, with a uniform branching factor and depth. • This model represents the tile puzzle: each movement of a tile either increases or decreases the h value by one.

  14. An Analytic Model • Since for each move the g function increases by one, the f function either increases by 2 or doesn't increase at all. • It has been proved that if the expected number of zero-cost edges below a node (the branching factor times p) is less than one, finding the lowest-cost route takes exponential time, while if it is more than one, the time is polynomial.

  15. An Analytic Model • For example: if the probability is 0.5, then for a binary tree the expected number of zero-cost edges below a node is 2*0.5 = 1, whereas for a ternary tree it is 3*0.5 = 1.5. We see that a ternary tree can be searched more efficiently than a binary tree!

  16. An Analytic Model • But this model is not so accurate, for a number of reasons: • We can predict the results only up to a certain point. • The model applies only to a limited depth, because it assumes the probability of a zero-cost edge is the same for all edges, whereas in the sliding-tile puzzle, beyond some depth, the probability is not the same for each node (the probability of a positive-cost edge increases).

  17. Real-Time-A* (RTA*) • So far, we have only found a solution one move at a time, not for several moves. • The initial idea would be to repeat the single-move procedure several times. But that leads to several problems.

  18. Real-Time-A* (RTA*) Problems: • We might move to a node that has already been visited, and then we would be in an infinite loop. • If we don't allow visits to previously visited nodes, then we may reach a state where all the remaining neighbors have already been visited. • Due to the limited information known in each state, we want to allow backtracking in cases where we won't repeat the same moves from that state.

  19. Real-Time-A* (RTA*) Solution: • We should allow backtracking only if the cost of returning to a previously visited point plus the estimated cost from there is less than the estimated cost from the current point. • Real-Time-A* (RTA*) is an efficient algorithm for implementing this solution.

  20. RTA* Algorithm • In RTA*, the value f(n) = g(n) + h(n) for node n is defined as in A*. • The difference is that g(n) in RTA* is computed differently than in A*: g(n) in RTA* is the distance of node n from the current state, not from the initial state. • The frontier nodes are stored in an open list, and after each move we update the g value of each node in the open list relative to the new current state.

  21. RTA* - The Drawbacks • The time to make a move is linear in the size of the open list. • It is not clear exactly how to update the g values. • It is not clear how to find the path to the next destination node chosen from the open list. But these problems can be solved in constant time per move!
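A minimal sketch of the constant-time-per-move formulation (depth-one lookahead with a hash table of revised heuristic values, as the following slides describe), under the same assumed problem interface as the earlier sketches. The key line stores the second-best f value at the state being left.

```python
import math

def rta_star(start, problem, max_steps=100_000):
    """RTA* with depth-one lookahead. A hash table of revised h values
    grows linearly with the number of moves made; each move costs
    constant time (given a bounded branching factor).
    Assumes every state has at least one successor."""
    h_table = {}                                   # revised h values
    def h(s):
        return h_table.get(s, problem.h(s))
    state, path = start, [start]
    while not problem.is_goal(state):
        if len(path) > max_steps:
            return None                            # give up
        scored = [(cost + h(child), child)         # f = g + h, g = one step
                  for cost, child in problem.successors(state)]
        scored.sort(key=lambda fc: fc[0])
        # Store the SECOND-best f at the state we are leaving: the
        # estimated cost of coming back and solving the problem
        # another way.
        h_table[state] = scored[1][0] if len(scored) > 1 else math.inf
        state = scored[0][1]
        path.append(state)
    return path
```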

  22. RTA* - Example [Figure: a graph with start node a, neighbors b, c, and d, children e and i below b, and further nodes j, k, and m, annotated with the edge costs and heuristic values used on the next slide.]

  23. RTA* - Example • In the example, we start at node a and compute f(b) = 1 + 1 = 2, f(c) = 1 + 2 = 3, f(d) = 1 + 3 = 4. • The problem solver moves to b because it is minimal, and stores at a the second-best value, h(a) = f(c) = 3. Then we generate nodes e and i and compute f(e) = 1 + 4 = 5, f(i) = 1 + 5 = 6, f(a) = 1 + 3 = 4. • The problem solver moves back to a and updates f(b) = 1 + 5 = 6, and so on.

  24. RTA* - Example • As we can see, we won't get into an infinite loop even though we do allow backtracking, since each time we gather more information and use it to decide the next move. • Note: RTA* does not require a good admissible heuristic function and will find a solution in any case (though a good heuristic function will give better results). • RTA*'s running time is linear in the number of moves made, and so is the size of the hash table stored.

  25. Completeness of RTA* • RTA* is complete under the following restrictions: • The problem space must be finite. • A goal must be reachable from every state. • There can't be cycles in the graph with zero or negative cost. • The heuristic values returned must be finite.

  26. Correctness of RTA* • RTA* makes decisions based on limited information, and therefore the quality of each decision it makes is the best relative to the part of the search space it has seen so far. • The nodes that need to be expanded by RTA* are similar to the open list in A*. • The main difference is in the definitions of g and h. • The correctness of RTA* can be proved by induction on the number of moves made.

  27. Solution Quality vs. Computation • We should also consider the quality of the solution that is returned by RTA*. • This depends on the accuracy of the heuristic function and on the search depth. • A choice must be made among families of heuristic functions: some are more accurate but more expensive to compute, while others are less accurate but simpler to compute.

  28. Learning-RTA* (LRTA*) • Until now, RTA* solved the problem for single problem-solving trials. • We would now like to improve the algorithm so that it is also good for multiple problem-solving trials.

  29. Learning-RTA* (LRTA*) • The algorithm is the same, except for one change that makes it suitable for the new setting: each time, the algorithm stores the best value of the heuristic function, instead of the second-best value.
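The change amounts to one line relative to the RTA* sketch above; a hypothetical single-step helper makes it explicit.

```python
def lrta_star_step(state, problem, h_table):
    """One LRTA* move: identical to the RTA* step except that the BEST
    neighbor f value is stored. Storing the best keeps the learned
    values admissible, so h_table can be carried over across repeated
    problem-solving trials."""
    def h(s):
        return h_table.get(s, problem.h(s))
    best_f, best_child = min(((cost + h(child), child)
                              for cost, child in problem.successors(state)),
                             key=lambda fc: fc[0])
    h_table[state] = best_f            # best, not second-best
    return best_child
```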

  30. Convergence of LRTA* • An important advantage of LRTA* is that, through the repetition of the problem-solving trials, the heuristic values become the exact values! • This holds under the following circumstances: • The initial and goal states are chosen randomly. • The initial heuristic values are admissible, i.e. do not overestimate the distance to the nearest goal. • Ties are broken randomly; otherwise, if we find one optimal solution, we might keep finding the same one each time and not find the other paths to the goal.

  31. Convergence of LRTA* • Theorem 5.2: In a finite space with finite positive edge costs and non-overestimating initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.

  32. Conclusion • In real-time large-scale applications, we can't use the standard single-agent heuristic search algorithms, because of their high cost and the fact that they do not return a solution before searching the expanded tree. • Minimin solves the problem for such cases. • Branch-and-bound pruning greatly improves the results given by minimin. • RTA* efficiently solves the problem of abandoning a trail for a better-looking one.

  33. Conclusion • RTA* guarantees finding a solution. • RTA* makes optimal local decisions. • The more lookahead is used, the higher the cost but the better the quality of the solution. • The families of heuristics vary in the accuracy of the solution and the computational complexity. • The optimal level of lookahead depends on the relative costs of simulating vs. executing moves. • LRTA* is an algorithm that solves the repeated problem-solving-trials setting while preserving the completeness of the solution.

  34. Chapter 6 Origin of Heuristic Functions

  35. Heuristic from Relaxed Models • A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem. • This means that we remove some of the constraints of the problem we are dealing with.

  36. Heuristic from Relaxed Models - Example • Consider the problem of navigating in a network of roads from an initial location to a goal location. • A good heuristic is to estimate the cost between two points by the straight-line distance. • We remove the constraint of the original problem that we have to move along the roads and assume that we are allowed to move in a straight line between two points. Thus we get a relaxation of the original problem.

  37. Relaxation example - TSP problem • We can describe the problem as a graph with 3 constraints: • Our tour covers all the cities. • Every node has degree two: • an edge entering the node and • an edge leaving the node. • The graph is connected. • If we remove constraint 2: we get a spanning subgraph, and the optimal solution to this problem is the MST (Minimum Spanning Tree); see the sketch below. • If we remove constraint 3: the graph no longer has to be connected, and the optimal solution to this problem is the solution to the assignment problem.
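A small sketch of the MST relaxation used as a heuristic: Prim's algorithm over the cities gives an admissible lower bound on the optimal tour. The `dist` function and the city representation are assumptions made for the example.

```python
def mst_cost(cities, dist):
    """Prim's algorithm: the cost of a minimum spanning tree over
    `cities`. Any tour visiting all cities contains a spanning tree,
    so this never overestimates the optimal tour cost."""
    cities = list(cities)
    if len(cities) < 2:
        return 0.0
    # best[c] = cheapest edge connecting city c to the tree built so far
    best = {c: dist(cities[0], c) for c in cities[1:]}
    total = 0.0
    while best:
        nxt = min(best, key=best.get)  # cheapest city to attach next
        total += best.pop(nxt)
        for c in best:
            best[c] = min(best[c], dist(nxt, c))
    return total

# Example use with straight-line distances between (x, y) points:
# mst_cost([(0, 0), (3, 4), (6, 0)],
#          lambda a, b: ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5)
```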

  38. Relaxation example - Tile Puzzle problem • One of the constraints in this problem is that a tile can only slide into the position occupied by the blank. • If we remove this constraint, we allow any tile to move to any horizontally or vertically adjacent position, and the cost of solving each tile independently becomes exactly its Manhattan distance to its goal location.
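For concreteness, a minimal sketch of the resulting heuristic; the state representation (a dict mapping each tile to its board index) is an assumption made for the example.

```python
def manhattan(state, goal, width):
    """Sum over all tiles (blank excluded) of the Manhattan distance
    between each tile's position in `state` and its position in `goal`.
    Positions are row-major board indices on a `width`-wide board."""
    total = 0
    for tile, pos in state.items():
        if tile == 0:                  # 0 stands for the blank
            continue
        g = goal[tile]
        total += (abs(pos % width - g % width)
                  + abs(pos // width - g // width))
    return total
```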

  39. The STRIPS Problem Formulation • We would like to derive such heuristics automatically. • In order to do that we need a formal description language that is richer than the problem-space graph. • One such language is called STRIPS. • In this language we have predicates and operators. • Let's see a STRIPS representation of the Eight Puzzle problem.

  40. STRIPS - Eight Puzzle Example • On(x,y) = tile x is in location y. • Clear(z) = location z is clear. • Adj(y,z) = location y is adjacent to location z. • Move(x,y,z) = move tile x from location y to location z. In the language we have: • A precondition list - for example, to execute Move(x,y,z) we must have: On(x,y), Clear(z), Adj(y,z). • An add list - predicates that were not true before the operator was executed and are true after it. • A delete list - a subset of the preconditions that are no longer true after the operator is executed. See the sketch below.
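A sketch of this operator in code, to make the three lists concrete; the string encoding of ground predicates is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    """A STRIPS operator: applicable when all preconditions hold in the
    state; applying it removes the delete list and adds the add list."""
    name: str
    preconditions: frozenset
    add_list: frozenset
    delete_list: frozenset

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.delete_list) | self.add_list

def move(x, y, z):
    """Move(x,y,z) for the Eight Puzzle, built from the lists above."""
    return Operator(
        name=f"Move({x},{y},{z})",
        preconditions=frozenset({f"On({x},{y})", f"Clear({z})",
                                 f"Adj({y},{z})"}),
        add_list=frozenset({f"On({x},{z})", f"Clear({y})"}),
        delete_list=frozenset({f"On({x},{y})", f"Clear({z})"}),
    )
```

Relaxing the problem, as the next slide describes, then corresponds to deleting elements from `preconditions`.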

  41. STRIPS - Eight Puzzle Example • Now in order to construct a simplified or relaxed problem we only have to remove some of the preconditions. • For example - by removing Clear(z) we allow tiles to move to adjacent locations. • In general, the hard part is to identify which relaxed problems have the property that their exact solution can be efficiently computed.

  42. Admissibility and Consistency • The heuristics that are derived by this method are both admissible and consistent. • Note: the cost in the simplified graph should be as close as possible to the cost in the original graph. • Admissibility means that the lowest-cost path in the simplified graph has an equal or lower cost than the lowest-cost path in the original graph. • Consistency means that h(n) ≤ c(n,n') + h(n') for every neighbor n' of n, where h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.
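A one-line justification of the consistency claim, written out as math (not in the original slides): since we only removed constraints, the relaxed graph contains every edge of the original graph with the same cost.

```latex
% h_R^*(n): exact cost-to-goal in the relaxed graph, used as h(n).
% One way to reach the goal from n in the relaxed graph is to cross
% the original edge (n, n') and then follow an optimal relaxed path:
h(n) = h_R^*(n) \le c(n,n') + h_R^*(n') = c(n,n') + h(n').
```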

  43. Heuristic from Multiple Subgoals • We begin by presenting an alternative derivation of the Manhattan distance heuristic for the sliding-tile puzzles. • Any description of this problem is likely to describe the goal state as a set of subgoals, where each subgoal is to correctly position an individual tile, ignoring the interactions with the other tiles.

  44. Enhancing the Manhattan distance • In the Manhattan distance, for each tile we looked for the optimal solution ignoring the other tiles and only counting moves of the tile in question. • Therefore the heuristic function we get isn't accurate. [Figure: the 5x5 Twenty-Four Puzzle board with tiles 1-24 in their goal positions.]

  45. Enhancing the Manhattan distance • We can perform a single search for each tile, starting from its goal position, and record how many moves of that tile are required to move it to every other position. • Doing this for all tiles results in a table which gives, for every possible position of each tile, its Manhattan distance from its goal position. • Then, since each move moves one tile, for a given state we add the Manhattan distances of each tile to get an admissible heuristic for the state.
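A sketch of the per-tile search: a breadth-first search outward from the tile's goal position, counting only moves of that tile, which reproduces its Manhattan distance table since nothing blocks it in the relaxed problem. The row-major board encoding is the same assumed one as above.

```python
from collections import deque

def tile_distance_table(goal_pos, width, height):
    """BFS from one tile's goal position, counting only moves of that
    tile (all other tiles ignored). Returns a dict mapping every board
    position to the tile's distance from its goal, which equals its
    Manhattan distance on an unobstructed board."""
    dist = {goal_pos: 0}
    queue = deque([goal_pos])
    while queue:
        pos = queue.popleft()
        row, col = divmod(pos, width)
        for r, c in ((row - 1, col), (row + 1, col),
                     (row, col - 1), (row, col + 1)):
            if 0 <= r < height and 0 <= c < width:
                npos = r * width + c
                if npos not in dist:
                    dist[npos] = dist[pos] + 1
                    queue.append(npos)
    return dist
```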

  46. Enhancing the Manhattan distance • However, this heuristic function isn't accurate either, since it ignores the interactions between the tiles. • The obvious next step is to repeat the process on all possible pairs of tiles. • In other words, for each pair of tiles, and each combination of positions they could occupy, perform a search to their goal positions and count only moves of the two tiles of interest. We call this value the pairwise distance of the two tiles from their goal locations.

  47. Enhancing the Manhattan distance • Of course the goal is to find the shortest path from the goal state to all possible positions of the two tiles, where only moves of the two tiles of interest are counted. • For almost all pairs of tiles and positions, their pairwise distances will equal the sum of their Manhattan distances from their goal positions. • However, there are three types of cases where the pairwise distance exceeds the combined Manhattan distance.

  48. Enhancing the Manhattan distance - the first case (costs +2 relative to Manhattan) • Two tiles are in the same row or column but are reversed relative to their goal positions. In order to get both tiles to their goal positions, one tile must move up or down to enable the other one to reach its goal location, and then return to the row and go back to its place. [Figure: two reversed tiles, such as 2 and 1, in the same row.]

  49. Enhancing the Manhattan distance - the second case (costs +2 relative to Manhattan) • The corners of the puzzle. • If the 3 tile is in its goal position, but some tile other than the 4 is in the 4 position, the 3 tile will have to move temporarily to correctly position the 4 tile. This requires two moves of the 3 tile, one to move it out of position and another to move it back. Thus the pairwise distance of the two tiles exceeds the sum of their Manhattan distances by two moves. [Figure: a corner configuration with the 3 tile in place and a tile other than 4 in the 4 position.]

  50. Enhancing the Manhattan distance - the third case (costs +2 relative to Manhattan) • The last moves of the solution. A detailed explanation is in the next slide. [Figure: an end-of-solution configuration involving tiles 1 and 5.]
