Search Algorithms for Agents



  1. Search Algorithms for Agents Problems that have been addressed by search algorithms can be divided into three classes: • path-finding problems • constraint satisfaction problems (CSPs) • two-player games

  2. Two-player games Studies of two-player games are clearly related to DAI/multiagent systems in which agents are competitive.

  3. CSP & Path-finding Most algorithms for these classes were originally developed for a single agent. Among them, what kinds of algorithms would be useful for cooperative problem solving by multiple agents?

  4. Search algorithms: graph representation A search problem can be represented using a graph. Some search problems can be solved by accumulating local computations for each node in the graph.

  5. Asynchronous search algorithms: definition An asynchronous search algorithm solves a search problem by accumulating local computations. The execution order of these local computations can be arbitrary or highly flexible, and they can be executed asynchronously and concurrently.

  6. CSP – a quick reminder • A CSP consists of n variables x1,…,xn, whose values are taken from finite, discrete domains D1,…,Dn, respectively, and a set of constraints on their values. • A constraint pk(xk1,…,xkj) is a predicate defined on the Cartesian product Dk1 x … x Dkj. The predicate is true iff the value assignment of these variables satisfies the constraint.

  7. CSP Since constraint satisfaction is NP-complete in general, a trial-and-error exploration of alternatives is inevitable. For simplicity, we will focus our attention on binary CSPs, i.e., all the constraints are between two variables.

  8. Example: binary CSP graph The figure shows three variables x1, x2, x3 and the constraints x1 = x2 and x1 != x3. [Figure: constraint graph with an '=' edge between x1 and x2 and a '!=' edge between x1 and x3.]
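
As a minimal sketch, this example can be written down in Python; the names domains, constraints, and satisfied are illustrative choices, not part of the original formulation:

# Binary CSP of the figure: x1 = x2 and x1 != x3, all domains {1, 2}.
domains = {"x1": {1, 2}, "x2": {1, 2}, "x3": {1, 2}}
constraints = {
    ("x1", "x2"): lambda a, b: a == b,
    ("x1", "x3"): lambda a, b: a != b,
}

def satisfied(assignment):
    # A complete assignment is a solution iff every binary predicate holds.
    return all(pred(assignment[i], assignment[j])
               for (i, j), pred in constraints.items())

print(satisfied({"x1": 1, "x2": 1, "x3": 2}))  # True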

  9. Distributed CSP • Assuming that the variables of a CSP are distributed among agents, solving the distributed CSP consists of achieving coherence among the agents. • Problems like multiagent truth maintenance tasks, interpretation problems, and assignment problems can be formalized as distributed CSPs.

  10. CSP and asynchronous algorithms Each process will correspond to a variable. We assume the following communication model: • Processes communicate by sending messages. • The delay in delivering a message is finite. • Between two processes, messages are received in the order they were sent. Processes that have links to xi are called the neighbors of xi.

  11. Filtering Algorithm A process xi performs the following procedure revise(xi,xj) for each neighboring process xj:

procedure revise(xi, xj)
  for all vi in Di do
    if there is no value vj in Dj such that vj is consistent with vi
      then delete vi from Di;
    end if;
  end do;

• When a value is deleted, the process sends its new domain to its neighboring processes. • When xi receives a new domain from a neighbor xj, the procedure revise(xi,xj) is performed again. The execution order of these processes is arbitrary.
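
A runnable sketch of the algorithm in Python, with message passing simulated by a work queue (the structures domains, neighbors, and consistent are assumed for illustration; real processes would execute revise concurrently):

from collections import deque

# Toy instance: neighboring variables must differ; x2 is already fixed.
domains = {"x1": {1, 2}, "x2": {2}, "x3": {1, 2}}
neighbors = {"x1": ["x2"], "x2": ["x1", "x3"], "x3": ["x2"]}

def consistent(vi, vj):
    # Stand-in binary constraint between neighboring variables.
    return vi != vj

def revise(xi, xj):
    # Delete every value of Di that has no consistent partner in Dj.
    removed = {vi for vi in domains[xi]
               if not any(consistent(vi, vj) for vj in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

# Whenever a domain shrinks, its neighbors re-run revise against it.
queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
while queue:
    xi, xj = queue.popleft()
    if revise(xi, xj):
        queue.extend((xk, xi) for xk in neighbors[xi] if xk != xj)

print(domains)  # {'x1': {1}, 'x2': {2}, 'x3': {1}}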

  12. Filtering example: 3-Queens [Figure: filtering on the 3-queens problem; revise(x1,x2), revise(x2,x3), and revise(x3,x2) are applied, pruning the domains of x1, x2, x3 step by step.]

  13. 3-Queens example – continued [Figure: the filtering continues with revise(x1,x3), further pruning the domains of x1, x2, x3.]

  14. Filtering Algorithm • If the domain of some variable becomes an empty set, the problem is over-constrained and has no solution. • If each domain has a unique value, the remaining values are a solution. • If multiple values remain for some variables, we cannot tell whether the problem has a solution, and further search is required (a small check is sketched below). Filtering should be considered a preprocessing procedure that is invoked before the application of other search methods.
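
These three outcomes can be checked mechanically once filtering terminates; a small illustrative helper (hypothetical, not from the original slides):

def classify(domains):
    # Empty domain: over-constrained, no solution exists.
    if any(len(d) == 0 for d in domains.values()):
        return "no solution"
    # Every domain a singleton: the remaining values form a solution.
    if all(len(d) == 1 for d in domains.values()):
        return "solution found"
    # Otherwise filtering alone cannot decide.
    return "further search required"

print(classify({"x1": {1}, "x2": {2}, "x3": {1}}))  # solution found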

  15. K-Consistency A CSP is k-consistent iff given any instantiation of any k-1 variables satisfying all the constraints among them, it is possible to find an instantiation of any kth variable such that these k variable values satisfy all the constraints among them. If the problem is k-consistent and j-consistent for all j<k, the problem is called strongly k-consistent. Next, we’ll see an algorithm that transforms a given problem into an equivalent strongly k-consistent problem.

  16. Hyper-Resolution-Based Consistency Algorithm The hyper-resolution rule is described as follows (Ai is a proposition such as x1=1):

A1 ∨ A2 ∨ … ∨ Am
¬(A1 ∧ A11 ∧ …)
¬(A2 ∧ A21 ∧ …)
…
¬(Am ∧ Am1 ∧ …)
─────────────────────────────
¬(A11 ∧ … ∧ A21 ∧ … ∧ Am1 ∧ …)

In this algorithm, all constraints are represented as nogoods, where a nogood is a prohibited combination of variable values (example on the next slide).

  17. Graph coloring example • The constraints between x1 and x2 can be represented as the two nogoods {x1=red, x2=red} and {x1=blue, x2=blue}. • By resolving the nogoods {x1=red, x2=red} and {x1=blue, x3=blue} with x1's domain (x1=red ∨ x1=blue), the hyper-resolution rule yields the new nogood {x2=red, x3=blue}. [Figure: triangle of variables x1, x2, x3, each with domain {red, blue}.]
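
A small sketch in Python of this single resolution step (representing nogoods as frozensets of (variable, value) pairs is an illustrative choice):

# One hyper-resolution step on the graph-coloring nogoods above.
domain_clause = [("x1", "red"), ("x1", "blue")]   # x1=red or x1=blue
nogoods = [
    frozenset({("x1", "red"), ("x2", "red")}),
    frozenset({("x1", "blue"), ("x3", "blue")}),
]

def hyper_resolve(clause, nogoods):
    # For each disjunct A_k pick a nogood containing it; the union of the
    # remaining parts is a new nogood. (Assumes each disjunct has a match.)
    new_nogood = set()
    for literal in clause:
        matching = next(ng for ng in nogoods if literal in ng)
        new_nogood |= matching - {literal}
    return frozenset(new_nogood)

print(sorted(hyper_resolve(domain_clause, nogoods)))
# [('x2', 'red'), ('x3', 'blue')]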

  18. Hyper-Resolution-Based Consistency Algorithm • Each process represents its constraints as nogoods. • Each process generates new nogoods by combining the information about its domain and existing nogoods using the hyper-resolution rule. • A newly obtained nogood is communicated to related processes. • If a new nogood is communicated, the process tries to generate further new nogoods using the communicated nogood.

  19. Hyper-Resolution-Based Consistency Algorithm • A nogood is a combination of variable values that is prohibited; therefore, a superset of a nogood cannot be a solution. • If the empty set becomes a nogood, the problem is over-constrained and has no solution. The hyper-resolution rule can generate a very large number of nogoods. If we restrict the application of the rule so that only nogoods whose length is less than k are produced, the problem becomes strongly k-consistent.

  20. Asynchronous Backtracking An asynchronous version of backtracking, the standard method for solving CSPs. The completeness of the algorithm is guaranteed. • The processes are ordered by the alphabetical order of the variable identifiers. Each process chooses an assignment. • Each process maintains the current values of other processes from its viewpoint (the local view). A process changes its assignment if its current value is not consistent with the assignments of higher priority processes. • If there exists no value that is consistent with the higher priority processes, the process generates a new nogood and communicates it to a higher priority process.

  21. Asynchronous Backtracking • The local view may contain obsolete information. Therefore, the receiver of a new nogood must check whether the nogood is actually violated in its own local view. • The main message types communicated among processes are ‘ok?’ to communicate the current value, and ‘nogood’ to communicate a new nogood.

  22. Asynchronous Backtracking example [Figure: x1 (domain {1,2}) and x2 (domain {2}) are both connected to x3 (domain {1,2}) by != constraints. x1 sends (ok?, (x1,1)) and x2 sends (ok?, (x2,2)) to x3, whose local view becomes {(x1,1), (x2,2)}.]

  23. Asynchronous Backtracking example – continued (1) [Figure: x3 has no value consistent with its local view, so it sends (nogood, {(x1,1),(x2,2)}) to x2. Since x1 is not yet a neighbor of x2, a new link is added: x2 requests x1's value, and x2's local view becomes {(x1,1)}.]

  24. Asynchronous Backtracking example – continued (2) [Figure: x2 in turn finds no consistent value and sends (nogood, {(x1,1)}) to x1.]

  25. Asynchronous Backtracking

when received (ok?, (xj, dj)) do
  add (xj, dj) to local_view;
  check_local_view;
end do;

when received (nogood, nogood) do
  record nogood as a new constraint;
  when the nogood contains (xk, dk) where xk is not a neighbor do
    request xk to add xi to its neighbors;
    add xk to neighbors;
    add (xk, dk) to local_view;
  end do;
  check_local_view;
end do;

  26. Asynchronous Backtracking

procedure check_local_view
  when local_view and current_value are not consistent do
    if no value in Di is consistent with local_view then
      resolve a new nogood using the hyper-resolution rule and
        send the nogood to the lowest priority process in the nogood;
      when an empty nogood is found do
        broadcast to other processes that there is no solution;
        terminate this algorithm;
      end do;
    else
      select d in Di where local_view and d are consistent;
      current_value ← d;
      send (ok?, (xi, d)) to neighbors;
    end if;
  end do;
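
A compact single-process view of this value-selection logic in Python (illustrative; the real algorithm distributes it across processes and exchanges ‘ok?’ and ‘nogood’ messages):

# check_local_view for one process xi, reduced to its core decision.
# local_view maps higher-priority variables to their current values.

def consistent(d, local_view):
    # Stand-in constraint: xi must differ from every higher-priority value.
    return all(d != v for v in local_view.values())

def check_local_view(domain, local_view):
    # Return a consistent value, or None to signal that local_view
    # itself becomes the new nogood to send upstream.
    for d in sorted(domain):
        if consistent(d, local_view):
            return d
    return None

print(check_local_view({1, 2}, {"x1": 1, "x2": 2}))  # None -> send a nogood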

  27. Asynchronous Weak-Commitment Search This algorithm introduces a method for dynamically ordering processes so that a bad decision can be revised without an exhaustive search. • For each process, the initial priority is 0. • If there exists no consistent value for xi, the priority of xi is changed to k+1, where k is the largest priority value among the related processes. • The order is defined such that a process with a larger priority value has higher priority. If the priority values of two processes are the same, the order is determined by the alphabetical order of the variables.

  28. Asynchronous Weak-Commitment Search As in asynchronous backtracking, each process concurrently assigns a value to its variable and sends the value to other processes. • The priority value, as well as the current assignment, is communicated through the ‘ok?’ message. • If the current value is not consistent with the local view, the agent changes its value using the min-conflict heuristic, i.e., it selects a value that is consistent with the local view and minimizes the number of constraint violations with the variables of lower priority processes.
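
A small sketch of the min-conflict value choice in Python (the helper names and the conflicts predicate are illustrative assumptions):

def min_conflict_value(domain, higher, lower, conflicts):
    # higher/lower map variables to values; conflicts(d, v) says whether
    # value d for this variable conflicts with value v of another one.
    candidates = [d for d in domain
                  if not any(conflicts(d, v) for v in higher.values())]
    return min(candidates,
               key=lambda d: sum(conflicts(d, v) for v in lower.values()),
               default=None)

# Toy check with inequality constraints against all other variables.
conflicts = lambda d, v: d == v
print(min_conflict_value({1, 2, 3}, {"x1": 1}, {"x3": 2, "x4": 2}, conflicts))
# -> 3 (consistent with x1=1 and zero conflicts below)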

  29. Asynchronous Weak-Commitment Search • Each process records the nogoods that have been resolved. • When xi cannot find a value consistent with its local view, it sends nogood messages to other processes, and increments its priority only if it has created a new nogood.

  30. Asynchronous Weak-Commitment Search example [Figure: 4-queens, priority values in parentheses. (a) All processes x1–x4 start with priority 0. (b) x4 finds no consistent value and its priority becomes 1.]

  31. Asynchronous Weak-Commitment Search example – continued [Figure: (c) x3's priority becomes 2, giving it the highest priority; (d) the search continues under the new ordering.]

  32. Asynchronous Weak-Commitment Search Completeness The completeness of the algorithm is guaranteed by the fact that the processes record all nogoods found so far. Handling a large number of nogoods is time/space consuming. We can restrict the number of recorded nogoods, such that each process records only the most recently found nogoods. In this case theoretical completeness is not guaranteed. Yet, when the number of recorded nogoods is reasonably large, an infinite processing loop rarely occurs.

  33. Path Finding Problem A path-finding problem consists of the following components: • A set of nodes N, each representing a state. • A set of directed links L, each representing an operator available to a problem-solving agent. • A unique node s called the start node. • A set of nodes G, each representing a goal state.

  34. Path Finding Problem More definitions: • h*(i) is the shortest distance from node i to the goal nodes. • If j is a neighbor of i, the shortest distance via j is given by f*(j) = k(i,j) + h*(j), where k(i,j) is the cost of the link between i and j. • If i is not a goal node, then h*(i) = min_j f*(j) holds.

  35. Asynchronous Dynamic Programming Assume the following situation: • For each node i there exists a process corresponding to it. • Each process records h(i), the estimated value of h*(i). The initial value of h(i) is infinity, except for goal nodes. • For each goal node g, h(g) is 0. • Each process can refer to the h values of neighboring nodes. The algorithm: each process updates h(i) by the following procedure. For each neighboring node j, compute f(j) = k(i,j) + h(j), and update h(i) as follows: h(i) ← min_j f(j).
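
A minimal sequential simulation of these updates in Python (the toy graph and the sweep order are illustrative; genuinely asynchronous execution may apply the updates in any order):

INF = float("inf")
# Toy graph (not the figure on the next slide): edge costs k[(i, j)].
k = {("s", "a"): 2, ("s", "b"): 1, ("a", "g"): 4, ("a", "b"): 1,
     ("b", "g"): 3}
neighbors = {"s": ["a", "b"], "a": ["g", "b"], "b": ["g"], "g": []}
h = {"s": INF, "a": INF, "b": INF, "g": 0}  # goals 0, the rest infinity

def update(i):
    # h(i) <- min over neighbors j of k(i,j) + h(j)
    if neighbors[i]:
        h[i] = min(k[(i, j)] + h[j] for j in neighbors[i])

# Repeated sweeps stand in for an arbitrary asynchronous update order.
for _ in range(len(h)):
    for i in h:
        update(i)

print(h)  # {'s': 4, 'a': 4, 'b': 3, 'g': 0} -- converged to h*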

  36. Asynchronous Dynamic Programming Example [Figure: a small weighted graph with start node s, goal node g, and intermediate nodes a, b, c, d; the numbers are the link costs and the h values being updated.]

  37. Asynchronous Dynamic Programming • If the costs of all links are positive, it is proved that for each node i, h(i) converges to the true value h*(i). • In reality, the number of nodes can be huge, and we cannot afford to have processes for all nodes.

  38. Learning Real-Time A* Algorithm (LRTA*) As with asynchronous dynamic programming, each agent records the estimated distance h(i). Each agent repeats the following procedure. • Lookahead: calculate f(j) = k(i,j) + h(j) for each neighbor j of the current node i. • Update: h(i) ← min_j f(j). • Action selection: move to the neighbor j that has the minimum f(j) value.
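
A single LRTA* step in Python (an illustrative sketch; the graph structures are assumed as before):

def lrta_star_step(i, neighbors, k, h):
    # Lookahead: f(j) = k(i,j) + h(j) for every neighbor j.
    f = {j: k[(i, j)] + h[j] for j in neighbors[i]}
    best = min(f, key=f.get)
    h[i] = f[best]   # update: h(i) <- min_j f(j)
    return best      # action selection: move to the best neighbor

h = {"s": 0, "a": 2, "g": 0}
k = {("s", "a"): 1, ("s", "g"): 5}
neighbors = {"s": ["a", "g"]}
print(lrta_star_step("s", neighbors, k, h), h["s"])  # a 3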

  39. LRTA* • The initial value of h is determined using an admissible heuristic function. • By using an admissible heuristic function on a problem with a finite number of nodes, in which all link costs are positive and there exists a path from every node to a goal node, completeness is guaranteed. • Since LRTA* never overestimates, it learns the optimal solutions through repeated trials.

  40. Real-Time A* Algorithm (RTA*) • Similar to LRTA*, except that the updating phase is different: - instead of setting h(i) to the smallest value of f(j), the second smallest value is assigned to h(i); - as a result, RTA* learns more efficiently than LRTA*, but can overestimate heuristic costs. In a finite space with positive edge costs, in which there exists a path from every state to a goal, using non-negative admissible initial heuristic values, RTA* is complete.
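
The LRTA* step sketched above needs only one change; an illustrative sketch with the same assumed structures:

def rta_star_step(i, neighbors, k, h):
    # Sort neighbors by f(j) = k(i,j) + h(j).
    f = sorted(((k[(i, j)] + h[j], j) for j in neighbors[i]))
    best_cost, best = f[0]
    # Store the SECOND-smallest f value (falling back when only one
    # neighbor exists) -- this is where RTA* may overestimate.
    h[i] = f[1][0] if len(f) > 1 else best_cost
    return best

h = {"s": 0, "a": 2, "g": 0}
k = {("s", "a"): 1, ("s", "g"): 5}
neighbors = {"s": ["a", "g"]}
print(rta_star_step("s", neighbors, k, h), h["s"])  # a 5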

  41. Moving Target Search (MTS) • The MTS algorithm is a generalization of LRTA* to the case where the target can move. • We assume that the problem solver and the target move alternately, and each can traverse at most one edge in a single move. • The task is accomplished when the problem solver and the target occupy the same node. • MTS maintains a matrix of heuristic values, representing the function h(x,y) for all pairs of states x and y. • The matrix is initialized to the values returned by the static evaluation function.

  42. MTS To simplify the following discussion, we assume that all edges in the graph have unit cost. When the problem solver moves: 1. Calculate h(xj,yi) for each neighbor xj of xi. 2. Update the value of h(xi,yi) as follows: h(xi,yi) ← max{ h(xi,yi), min_xj{ h(xj,yi) + 1 } }. 3. Move to the neighbor xj with the minimum h(xj,yi).

  43. MTS When the target moves: 1. Calculate h(xi,yj) for the target's new position yj. 2. Update the value of h(xi,yi) as follows: h(xi,yi) ← max{ h(xi,yi), h(xi,yj) - 1 }. 3. Assign yj to yi; yj is the target's new position. MTS completeness: in a finite problem space with positive edge costs, in which there exists a path from every state to the goal state, starting with non-negative admissible initial heuristic values, and with the other assumptions we mentioned, the problem solver will eventually reach the target.
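
Both update rules in Python (an illustrative sketch under the unit-cost assumption; h is a dict keyed by (solver position, target position)):

def solver_move(x, y, neighbors, h):
    # Lookahead, raise h(x, y) against the best neighbor, then move there.
    best = min(neighbors[x], key=lambda xj: h[(xj, y)])
    h[(x, y)] = max(h[(x, y)], h[(best, y)] + 1)
    return best

def target_moved(x, y_old, y_new, h):
    # The target moved from y_old to y_new: relax h(x, y_old) accordingly.
    h[(x, y_old)] = max(h[(x, y_old)], h[(x, y_new)] - 1)
    return y_new

h = {("a", "t"): 1, ("b", "t"): 2, ("c", "t"): 0}
neighbors = {"a": ["b", "c"]}
print(solver_move("a", "t", neighbors, h), h[("a", "t")])  # c 1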

  44. Real-Time Bidirectional Search Algorithm (RTBS) • Two problem solvers starting from the initial and goal states move toward each other. • Each of them knows its current location, and can communicate with the other. The following steps are executed until the solvers meet: • Control strategy: select a forward or backward move. • Forward move: the forward solver moves toward the other. • Backward move: the backward solver moves toward the other.

  45. RTBS There are two categories of RTBS: • Centralized RTBS, where the best action is selected from among all possible moves of the two solvers. • Decoupled RTBS, where the two solvers independently make their own decisions. The evaluation results show that when the heuristic function returns accurate values, decoupled RTBS performs better than centralized RTBS; otherwise, centralized RTBS is better.
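
A sketch of the decoupled variant's outer loop in Python (illustrative and deliberately simplified: each solver runs the lrta_star_step sketched earlier toward the other solver's current position, with its own heuristic table h_f or h_b; a full implementation would, as in MTS, account for the fact that each solver's goal moves):

def rtbs_decoupled(start, goal, neighbors, k, h_f, h_b, max_steps=1000):
    x, y = start, goal
    for _ in range(max_steps):
        if x == y:
            return x                                  # the solvers met
        x = lrta_star_step(x, neighbors, k, h_f)      # forward move
        if x == y:
            return x
        y = lrta_star_step(y, neighbors, k, h_b)      # backward move
    return None  # step budget exhausted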

  46. Is RTBS better than unidirectional search? • The number of moves for centralized RTBS is around 1/2 (in 15-puzzles) to 1/6 (in 24-puzzles) of that for real-time unidirectional search. • In mazes, the number of moves for RTBS is double that for unidirectional search. The key to understanding these results is to view the difference between RTBS and unidirectional search in terms of their problem spaces.

  47. RTBS • We call a pair of locations (x,y) a p-state. • We call the problem space consisting of p-states a combined problem space. • A heuristic depression is a set of connected states whose heuristic values are less than or equal to those of the immediately surrounding states. • The performance of real-time search is sensitive to the topography of the problem space, especially to heuristic depressions.

  48. RTBS Heuristic depressions of the original problem space have been observed to become large and shallow in the combined problem space: - if the original heuristic depressions are deep, they become large, which makes the problem harder to solve; - if the original depressions are shallow, they become very shallow, which makes the problem easier to solve.
