Ram Meshulam 2004 Search Algorithms Overview

Artificial Intelligence Rehearsal Lesson Ram Meshulam 2004

Solving Problems with Search Algorithms • Input: a problem P. • Preprocessing: • Define states and a state space • Define operators • Define a start state and a set of goal states. • Processing: • Run a search algorithm to find a path from the start state to one of the goal states.

Uninformed Search • Uninformed search methods use only information available in the problem definition. • Breadth First Search (BFS) • Depth First Search (DFS) • Iterative-deepening DFS (DFID) • Bi-directional search • Uniform Cost Search (a.k.a. Dijkstra's algorithm)

Breadth-First-Search Attributes • Completeness – yes • Optimality – yes, if the graph is unweighted. • Time Complexity: $O(b^d)$ • Memory Complexity: $O(b^d)$ • Where b is the branching factor and d is the solution depth
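
The BFS attributes above can be illustrated with a minimal sketch. The adjacency-dictionary graph, the function name `bfs` and the node names are assumptions made for this example only; they are not part of the original slides.

```python
from collections import deque

def bfs(graph, start, goals):
    """Return a path with the fewest edges from start to any goal, or None."""
    frontier = deque([[start]])      # FIFO queue of paths (the open list)
    visited = {start}                # nodes already generated
    while frontier:
        path = frontier.popleft()    # shallowest path first
        node = path[-1]
        if node in goals:
            return path
        for succ in graph.get(node, []):
            if succ not in visited:
                visited.add(succ)
                frontier.append(path + [succ])
    return None

# Hypothetical graph: the goal g is reachable in 2 steps via b and in 3 via a, c.
example_graph = {"s": ["a", "b"], "a": ["c"], "b": ["g"], "c": ["g"], "g": []}
shortest = bfs(example_graph, "s", {"g"})
```

Because the queue is first-in first-out, every node at depth k is expanded before any node at depth k+1, which is why the returned path has the fewest edges.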

Depth-First-Search Attributes • Completeness – No. Infinite loops or infinite depth can occur. • Optimality – No. • Time Complexity: $O(b^m)$ • Memory Complexity: $O(bm)$ • Where b is the branching factor and m is the maximum depth of the search tree

Limited DFS Attributes • Completeness – Yes, if d ≤ l • Optimality – No. • Time Complexity: $O(b^l)$ • If d < l, it is larger than in BFS • Memory Complexity: $O(bl)$ • Where b is the branching factor, d is the solution depth and l is the depth limit.

Depth-First Iterative-Deepening (DFID) • [Figure: search tree; the numbers on each node give the order in which DFID generates it.]

Iterative-Deepening Attributes • Completeness – Yes • Optimality – yes, if the graph is unweighted. • Time Complexity: $O(b^d)$ • Memory Complexity: $O(bd)$ • Where b is the branching factor and d is the depth of the shallowest solution
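
A minimal sketch of iterative deepening, built from repeated depth-limited DFS calls. The graph representation, the function names and the `max_depth` cap are assumptions for illustration, not taken from the slides.

```python
def depth_limited(graph, node, goals, limit, path):
    """DFS that does not descend more than `limit` edges below `node`."""
    if node in goals:
        return path
    if limit == 0:
        return None
    for succ in graph.get(node, []):
        if succ not in path:         # avoid cycles along the current branch
            found = depth_limited(graph, succ, goals, limit - 1, path + [succ])
            if found is not None:
                return found
    return None

def iterative_deepening(graph, start, goals, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goals, limit, [start])
        if found is not None:
            return found
    return None

# Hypothetical graph: plain DFS would return s-a-c-g, DFID returns s-b-g.
id_graph = {"s": ["a", "b"], "a": ["c"], "c": ["g"], "b": ["g"], "g": []}
id_path = iterative_deepening(id_graph, "s", {"g"})
```

Only the current branch is stored at any time, which gives the linear memory use, while the increasing limit recovers the shallowest-solution guarantee of BFS.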

State Redundancies • Closed list: a hash table that holds the visited nodes. • For example, BFS: [Figure: BFS tree divided into the closed list and the open list (frontier).]

Uniform Cost Search Attributes • Completeness: yes, for positive weights • Optimality: yes • Time & Memory complexity: $O(b^{c/e})$ • Where b is the branching factor, c is the optimal solution cost and e is the minimum edge cost
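
A minimal sketch of uniform cost search with a binary heap as the priority queue. The weighted adjacency lists and all names below are assumptions chosen for the example.

```python
import heapq

def uniform_cost_search(graph, start, goals):
    """Return (cost, path) of a cheapest path to any goal, or None."""
    frontier = [(0, start, [start])]          # entries are (g, node, path)
    closed = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest g first
        if node in goals:
            return cost, path
        if node in closed:
            continue
        closed.add(node)
        for succ, weight in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(frontier, (cost + weight, succ, path + [succ]))
    return None

# Hypothetical weighted graph: s-a-g costs 5, s-b-g costs 4.
ucs_graph = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
ucs_result = uniform_cost_search(ucs_graph, "s", {"g"})
```

The goal test is applied when a node is removed from the queue, not when it is generated; with positive weights this is what makes the first goal removed a cheapest one.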

Best First Search Algorithms • Principle: Expand the node n with the best evaluation function value f(n). • Implement via a priority queue • Algorithms differ in the definition of f: • Greedy Search: f(n) = h(n) • A*: f(n) = g(n) + h(n) • IDA*: iterative deepening version of A* • Etc.

Best-FS Algorithm Pseudo code • Start with open = [initial-state]. • While open is not empty do • Pick the best node on open. • If it is the goal node then return with success. Otherwise find its successors. • Assign the successor nodes a score using the evaluation function and add the scored nodes to open

General Framework using Closed-list (Graph-Search) • GraphSearch(Graph graph, Node start, Vector goals) • O ← make_data_structure(start) // open list • C ← make_hash_table // closed list • While O not empty loop • n ← O.remove_front() • If goal(n) return n • If n is found on C continue • // otherwise • O ← O ∪ successors(n) • C ← C ∪ {n} • Return null // no goal found
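
The framework above can be sketched in Python as follows. The `lifo` flag, which switches the open list between queue and stack behaviour, and the callback-style `successors` and `is_goal` parameters are assumptions added for this illustration.

```python
from collections import deque

def graph_search(successors, start, is_goal, lifo=False):
    """Generic graph search with a closed list; returns a goal node or None."""
    open_list = deque([start])       # O: the open list
    closed = set()                   # C: the closed list
    while open_list:
        # FIFO removal gives breadth-first order, LIFO gives depth-first order.
        n = open_list.pop() if lifo else open_list.popleft()
        if is_goal(n):
            return n
        if n in closed:
            continue                 # already expanded, skip the duplicate
        open_list.extend(successors(n))
        closed.add(n)
    return None                      # no goal found

# Hypothetical cyclic state space over the integers 0..4.
def ring_successors(n):
    return [(n + 1) % 5, (n + 2) % 5]
```

Without the closed list, the cyclic example would loop forever when no goal exists; with it, each state is expanded at most once.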

Greedy Search Attributes • Completeness: No. Inaccurate heuristics can cause loops (unless using a closed list) or lead down an infinite path • Optimality: No. Inaccurate heuristics can lead to a non-optimal solution. • Time & Memory complexity: $O(b^m)$, where b is the branching factor and m is the maximum depth of the search tree • [Figure: example graph with nodes s, a, b, g, edge costs, and heuristic values h=1 and h=2.]

A* Algorithm (1) • Combines greedy h(n) and uniform cost g(n) approaches. • Evaluation function: f(n)=g(n)+h(n) • Completeness: • In a finite graph: Yes • In an infinite graph: if all edge costs are finite and have a minimum positive value, and all heuristic values are finite and non-negative. • Optimality: • In tree-search: if h(n) is admissible • In graph-search: if it is also consistent

Heuristic Function h(n) • Admissible (underestimate): h(n) never overestimates the actual cost from n to the goal • Consistent/monotonic (desirable): h(m) - h(n) ≤ w(n,m), where m is the parent of n. This ensures f(n) ≥ f(m).

A* Algorithm (2) • Optimally efficient: A* expands the minimal number of nodes possible with any given (consistent) heuristic. • Time and space complexity: • Worst case: h(n) = 0, so the cost function is f(n) = g(n) and A* behaves like uniform cost search • Best case: the heuristic is exact, so the cost function is f(n) = g(n) + h*(n)
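
A minimal sketch of A* in its graph-search form, assuming a consistent heuristic so that a closed list is safe. The graph, the heuristic table `astar_h` and the function name are hypothetical values picked for this example.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path, assuming h is consistent."""
    frontier = [(h[start], 0, start, [start])]    # entries are (f, g, node, path)
    closed = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)   # lowest f = g + h first
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for succ, w in graph.get(node, []):
            if succ not in closed:
                g2 = g + w
                heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None

# Hypothetical graph and heuristic; h satisfies h(m) - h(n) <= w for every edge.
astar_graph = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
astar_h = {"s": 3, "a": 3, "b": 1, "g": 0}
astar_result = a_star(astar_graph, astar_h, "s", "g")
```

Setting every heuristic value to 0 turns this into uniform cost search, which matches the worst case described above.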

Duplicate Pruning • Do not enter the parent of the current state • With or without using a closed list • Using a closed list, check the closed list before entering new nodes to the open list • Note: in A*, h has to be consistent! • Do not remove the original check • Using a stack, check the current branch and stack status before entering new nodes

IDA* Algorithm • Each iteration is a depth-first search that keeps track of the cost evaluation f = g + h of each node generated. • The cost threshold is initialized to the heuristic of the initial state. • If a node is generated whose cost exceeds the threshold for that iteration, its path is cut off.

IDA* Attributes • The cost threshold increases in each iteration to the total cost of the lowest-cost node that was pruned during the previous iteration. • The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold. • Completeness and Optimality: Like A* • Space complexity: • Time complexity*:
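
The threshold rule described above can be sketched as follows. The graph, the heuristic table and the helper name `search` are assumptions for illustration; the cycle check along the current branch is also an added assumption.

```python
import math

def ida_star(graph, h, start, goal):
    """Return (cost, path) found by IDA*, or None if the goal is unreachable."""
    def search(path, g, threshold):
        node = path[-1]
        f = g + h[node]
        if f > threshold:
            return f, None                   # cut off; report f for the next threshold
        if node == goal:
            return f, list(path)
        lowest_pruned = math.inf
        for succ, w in graph.get(node, []):
            if succ in path:                 # skip states on the current branch
                continue
            path.append(succ)
            t, found = search(path, g + w, threshold)
            path.pop()
            if found is not None:
                return t, found
            lowest_pruned = min(lowest_pruned, t)
        return lowest_pruned, None

    threshold = h[start]                     # initial threshold: h of the start state
    while threshold < math.inf:
        threshold, found = search([start], 0, threshold)
        if found is not None:
            return threshold, found
    return None

# Hypothetical graph and admissible heuristic.
ida_graph = {"s": [("a", 1), ("b", 3)], "a": [("g", 4)], "b": [("g", 1)], "g": []}
ida_h = {"s": 3, "a": 3, "b": 1, "g": 0}
ida_result = ida_star(ida_graph, ida_h, "s", "g")
```

In this example the first iteration uses threshold 3 and prunes both successors of s at f = 4, so the second iteration runs with threshold 4 and reaches the goal.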

Local Search – Cont. • To escape local maxima and plateaus, we permit moves to states with lower values with probability p. • The algorithms differ in how they define p.

Hill Climbing • Always move to the best successor • Stop when no improvement is possible • To escape plateaus and local maxima: • Sideways moves • Stochastic hill climbing • Random-restart algorithm
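
A minimal sketch of the basic hill-climbing loop (without sideways moves or restarts). The objective function and the integer neighbourhood are hypothetical choices for the example.

```python
def hill_climb(value, neighbors, state):
    """Move to the best successor until no successor improves the value."""
    while True:
        best = max(neighbors(state), key=value, default=state)
        if value(best) <= value(state):
            return state             # local maximum or plateau reached
        state = best

# Hypothetical objective with a single maximum at x = 3.
def objective(x):
    return -(x - 3) ** 2

def integer_neighbors(x):
    return [x - 1, x + 1]

peak = hill_climb(objective, integer_neighbors, 0)
```

Because the loop stops as soon as no successor is strictly better, it halts on plateaus as well as on local maxima, which is what the listed variations try to address.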

Simulated Annealing – Pseudo code Cont. • Acceptor func. example: • Schedule func. example:
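
A minimal sketch of simulated annealing built around an acceptor function and a schedule function. The specific forms used here (exponential acceptance of worsening moves and geometric cooling), along with every parameter value, are common textbook choices assumed for illustration; they are not necessarily the examples shown on the original slide.

```python
import math
import random

def acceptor(delta, temperature):
    """Assumed form: always accept improvements, otherwise exp(delta / T)."""
    return 1.0 if delta > 0 else math.exp(delta / temperature)

def schedule(step, t0=10.0, alpha=0.95):
    """Assumed form: geometric cooling, T = t0 * alpha**step."""
    return t0 * alpha ** step

def simulated_annealing(value, random_neighbor, state, steps=500, rng=None):
    """Return the best state seen while annealing from `state`."""
    rng = rng or random.Random(0)
    best = state
    for step in range(steps):
        temperature = schedule(step)
        if temperature < 1e-9:
            break                            # effectively frozen
        candidate = random_neighbor(state, rng)
        delta = value(candidate) - value(state)
        if rng.random() < acceptor(delta, temperature):
            state = candidate
        if value(state) > value(best):
            best = state
    return best

def sa_objective(x):
    return -(x - 3) ** 2

def sa_neighbor(x, rng):
    return x + rng.choice([-1, 1])

sa_best = simulated_annealing(sa_objective, sa_neighbor, 0)
```

At high temperature almost every move is accepted; as the temperature falls, the probability p of accepting a worsening move shrinks toward zero and the search behaves like hill climbing.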

Search Algorithms Hierarchy

Exercise • What are the different data structures used to implement the open list in BFS, DFS and Best-FS?

Minimax • Perfect play for deterministic games • Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play • E.g., 2-ply game:

Properties of minimax • Complete? (= will not run forever) Yes (if the tree is finite) • Optimal? (= will find the optimal response) Yes (against an optimal opponent) • Time complexity? $O(b^m)$ • Space complexity? $O(bm)$ (depth-first exploration), $O(bm)$ for saving the optimal response • For chess, b ≈ 35, m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
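
A minimal sketch of minimax on an explicit game tree. The nested-list encoding (inner lists are positions, numbers are leaf payoffs for MAX) and the leaf values are assumptions for this example.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a game tree given as nested lists."""
    if not isinstance(node, list):
        return node                      # leaf: payoff for MAX
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical 2-ply game: MAX moves first, then MIN picks a leaf.
two_ply = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(two_ply)
```

MIN would reply with 3, 2 and 2 in the three subtrees, so MAX's best achievable payoff against best play is 3.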

α-β pruning example
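
Since the original figure is not available, here is a small sketch of α-β pruning on a hypothetical 2-ply tree. The nested-list encoding, the leaf values and the `visited` list (used only to show which leaves are evaluated) are assumptions for illustration.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning; evaluated leaves go into `visited`."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # MIN above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # MAX above already has a better option
    return value

ab_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
evaluated = []
ab_value = alphabeta(ab_tree, visited=evaluated)
```

In the second subtree the first leaf (2) is already worse for MAX than the 3 guaranteed by the first subtree, so the remaining leaves 4 and 6 are never evaluated.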

Planning • Traditional search methods do not scale to large, real-world problems • We want to use general knowledge • We need a general heuristic • Problem decomposition

STRIPS – Representation • States and goals – sentences in FOL. • Operators – consist of 3 parts: • Operator name • Preconditions – a sentence describing the conditions that must hold so that the operator can be executed. • Effect – a sentence describing how the world changes as a result of executing the operator. Has 2 parts: • Add-list • Delete-list • Optionally, a set of (simple) variable constraints
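
A minimal sketch of the operator representation above, simplified to ground (variable-free) propositions stored as strings in sets. The `Operator` class, the helper names and the blocks-world facts are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset   # facts that must hold before execution
    add_list: frozenset        # facts that become true
    delete_list: frozenset     # facts that stop being true

def applicable(state, op):
    """An operator is applicable when all its preconditions hold in the state."""
    return op.preconditions <= state

def apply_operator(state, op):
    """Return the successor state: remove the delete-list, then add the add-list."""
    if not applicable(state, op):
        raise ValueError(f"{op.name} is not applicable")
    return (state - op.delete_list) | op.add_list

# Hypothetical blocks-world example: stack block A on block B.
initial = frozenset({"OnTable(A)", "OnTable(B)", "Clear(A)", "Clear(B)"})
stack_a_b = Operator(
    name="Stack(A,B)",
    preconditions=frozenset({"OnTable(A)", "Clear(A)", "Clear(B)"}),
    add_list=frozenset({"On(A,B)"}),
    delete_list=frozenset({"OnTable(A)", "Clear(B)"}),
)
after_stack = apply_operator(initial, stack_a_b)
```

A planner can then search over states by trying every applicable operator, exactly as the search algorithms earlier expand successors.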

Choosing an attribute • Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" • Patrons? is a better choice

Using information theory • To implement Choose-Attribute in the DTL algorithm • Information Content of an answer (Entropy): $I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$ • For a training set containing p positive examples and n negative examples: $I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$

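Information gain • A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values. • If subset Ei contains $p_i$ positive and $n_i$ negative examples, the entropy remaining after the test is $remainder(A) = \sum_{i=1}^{v} \frac{p_i+n_i}{p+n} I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$ • Information Gain (IG) or reduction in entropy from the attribute test: $IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$ • Choose the attribute with the largest IG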

Information gain • For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit • Consider the attributes Patrons and Type (and others too): • Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
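
The entropy and information-gain computation can be sketched as follows. The per-value (positive, negative) counts passed to `information_gain` are hypothetical splits of a 6-positive, 6-negative training set, chosen for illustration; they are not read from the slides.

```python
import math

def entropy(probabilities):
    """Information content in bits; terms with probability 0 contribute 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

def information_gain(subsets):
    """subsets: one (positives, negatives) pair per attribute value."""
    p = sum(pi for pi, _ in subsets)
    n = sum(ni for _, ni in subsets)
    before = entropy([p / (p + n), n / (p + n)])
    remainder = sum(
        (pi + ni) / (p + n) * entropy([pi / (pi + ni), ni / (pi + ni)])
        for pi, ni in subsets
        if pi + ni > 0                   # ignore empty subsets
    )
    return before - remainder

# Hypothetical splits of a training set with p = n = 6.
informative_split = information_gain([(0, 2), (4, 0), (2, 4)])
useless_split = information_gain([(3, 3), (3, 3)])
```

A split that leaves every subset as mixed as the original set gains nothing, while a split that produces pure subsets removes most of the initial 1 bit of uncertainty.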

Bayes’ Rule • P(B|A) = P(A|B) * P(B) / P(A)

Computing the denominator: • #1 approach – compute relative likelihoods: • If M (meningitis) and W (whiplash) are two possible explanations of S, then P(M|S) / P(W|S) = P(S|M) * P(M) / (P(S|W) * P(W)), and P(S) cancels • #2 approach – using M & ~M: • Checking the probability of M, ~M when S • P(M|S) = P(S|M) * P(M) / P(S) • P(~M|S) = P(S|~M) * P(~M) / P(S) • P(M|S) + P(~M|S) = 1 (must sum to 1) • Hence P(S) = P(S|M) * P(M) + P(S|~M) * P(~M)
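
The second approach (normalizing over M and ~M) can be sketched in a few lines. The function name and all probability values are hypothetical numbers chosen for the example.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(M|S) from P(M), P(S|M) and P(S|~M), normalizing over M and ~M."""
    joint_m = likelihood * prior                      # P(S|M) * P(M)
    joint_not_m = likelihood_given_not * (1 - prior)  # P(S|~M) * P(~M)
    evidence = joint_m + joint_not_m                  # P(S)
    return joint_m / evidence

# Hypothetical numbers: P(M) = 0.5, P(S|M) = 0.8, P(S|~M) = 0.2.
p_m_given_s = posterior(0.5, 0.8, 0.2)
# Posterior of ~M: swap the roles of M and ~M.
p_not_m_given_s = posterior(0.5, 0.2, 0.8)
```

The denominator P(S) never has to be known in advance; it falls out of the requirement that the two posteriors sum to 1.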

Perceptrons • Linear separability • A set of (2D) patterns (x1, x2) of two classes is linearly separable if there exists a line on the (x1, x2) plane • w0 + w1 x1 + w2 x2 = 0 • that separates all patterns of one class from the other class • A perceptron can be built with • 3 inputs x0 = 1, x1, x2 with weights w0, w1, w2 • n-dimensional patterns (x1, …, xn) • Hyperplane w0 + w1 x1 + w2 x2 + … + wn xn = 0 dividing the space into two regions
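
A minimal sketch of a perceptron as a threshold unit over the hyperplane above. The hard-threshold output convention (1 on the positive side, 0 otherwise) and the hand-picked weights for logical AND are assumptions for this example.

```python
def perceptron_output(weights, inputs):
    """Return 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0 (x0 = 1 is implicit)."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

# Hypothetical weights: the line -1.5 + x1 + x2 = 0 separates AND's classes.
and_weights = [-1.5, 1.0, 1.0]
and_table = {(x1, x2): perceptron_output(and_weights, (x1, x2))
             for x1 in (0, 1) for x2 in (0, 1)}
```

AND is linearly separable, so a single unit suffices; XOR is not, which is why the backpropagation example that follows uses a hidden layer.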

Backpropagation example • [Figure: network with inputs x1, x2, hidden units x3, x4 and output x5, connected by weights w13, w14, w23, w24, w35, w45.] • Sigmoid as activation function with slope parameter 3: • g(in) = 1/(1+e^(-3·in)) • g'(in) = 3g(in)(1-g(in))

Adding the threshold • [Figure: the same network with bias units x0 = 1, feeding x3 and x4 through w03 and w04, and x6 = 1, feeding x5 through w65.]

Training Set • Logical XOR (exclusive OR) function • (x1, x2) → output: (0, 0) → 0, (0, 1) → 1, (1, 0) → 1, (1, 1) → 0 • Choose random weights • <w03,w04,w13,w14,w23,w24,w65,w35,w45> = <0.03,0.04,0.13,0.14,-0.23,-0.24,0.65,0.35,0.45> • Learning rate: 0.1 for the hidden layer, 0.3 for the output layer

First Example • Compute the outputs • a0 = 1 , a1= 0 , a2 = 0 • a3 = g(1*0.03 + 0*0.13 + 0*-0.23) = 0.522 • a4 = g(1*0.04 + 0*0.14 + 0*-0.24) = 0.530 • a6 = 1, a5 = g(0.65*1 + 0.35*0.522 + 0.45*0.530) = 0.961 • Calculate ∆5 = 3*g(1.0712)*(1-g(1.0712))*(0-0.961) = -0.108 • Calculate ∆6, ∆3, ∆4 • ∆6 = 3*g(1)*(1-g(1))*(0.65*-0.108) = -0.010 • ∆3 = 3*g(0.03)*(1-g(0.03))*(0.35*-0.108) = -0.028 • ∆4 = 3*g(0.04)*(1-g(0.04))*(0.45*-0.108) = -0.036 • Update weights for the output layer • w65 = 0.65 + 0.3*1*-0.108 = 0.618 • w35 = 0.35 + 0.3*0.522*-0.108 = 0.333 • w45 = 0.45 + 0.3*0.530*-0.108 = 0.433

First Example (cont) • Calculate ∆0, ∆1, ∆2 • ∆0 = 3*g(1)*(1-g(1))*(0.03*-0.028 + 0.04*-0.036) = -0.0003 • ∆1 = 3*g(0)*(1-g(0))*(0.13*-0.028 + 0.14*-0.036) = -0.007 • ∆2 = 3*g(0)*(1-g(0))*(-0.23*-0.028 + -0.24*-0.036) = 0.011 • Update weights for the hidden layer • w03 = 0.03 + 0.1*1*-0.028 = 0.027 • w04 = 0.04 + 0.1*1*-0.036 = 0.036 • w13 = 0.13 + 0.1*0*-0.028 = 0.13 • w14 = 0.14 + 0.1*0*-0.036 = 0.14 • w23 = -0.23 + 0.1*0*-0.028 = -0.23 • w24 = -0.24 + 0.1*0*-0.036 = -0.24
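
The first training example can be re-run with a short script. This is a sketch under the assumptions of the example (sigmoid with slope 3, learning rates 0.1 and 0.3, bias inputs fixed at 1); the function and variable names are hypothetical. It keeps unrounded intermediate values, so its results agree with the hand calculation only to about three decimal places.

```python
import math

def g(x, slope=3.0):
    """Sigmoid activation with slope 3."""
    return 1.0 / (1.0 + math.exp(-slope * x))

def g_prime(x, slope=3.0):
    return slope * g(x, slope) * (1.0 - g(x, slope))

def train_step(w, x1, x2, target, lr_hidden=0.1, lr_output=0.3):
    """One forward and backward pass; returns (updated weights, output a5)."""
    in3 = w["w03"] + w["w13"] * x1 + w["w23"] * x2   # bias input a0 = 1
    in4 = w["w04"] + w["w14"] * x1 + w["w24"] * x2
    a3, a4 = g(in3), g(in4)
    in5 = w["w65"] + w["w35"] * a3 + w["w45"] * a4   # bias input a6 = 1
    a5 = g(in5)
    d5 = g_prime(in5) * (target - a5)                # output delta
    d3 = g_prime(in3) * w["w35"] * d5                # hidden deltas use old weights
    d4 = g_prime(in4) * w["w45"] * d5
    new = dict(w)
    new["w65"] += lr_output * 1.0 * d5
    new["w35"] += lr_output * a3 * d5
    new["w45"] += lr_output * a4 * d5
    new["w03"] += lr_hidden * 1.0 * d3
    new["w04"] += lr_hidden * 1.0 * d4
    new["w13"] += lr_hidden * x1 * d3
    new["w14"] += lr_hidden * x1 * d4
    new["w23"] += lr_hidden * x2 * d3
    new["w24"] += lr_hidden * x2 * d4
    return new, a5

initial_weights = {"w03": 0.03, "w04": 0.04, "w13": 0.13, "w14": 0.14,
                   "w23": -0.23, "w24": -0.24, "w65": 0.65, "w35": 0.35, "w45": 0.45}
updated, a5 = train_step(initial_weights, 0, 0, target=0)
```

Calling `train_step(updated, 0, 1, target=1)` would continue with the next training pattern in the same way.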

Second Example • Compute the outputs • a0 = 1, a1= 0 , a2 = 1 • a3 = g(1*0.027 + 0*0.13 + 1*-0.23) = 0.352 • a4 = g(1*0.036 + 0*0.14 + 1*-0.24) = 0.352 • a6 = 1, a5 = g(0.618*1 + 0.333*0.352 + 0.433*0.352) = 0.935 • Calculate ∆5 = 3*g(0.888)*(1-g(0.888))*(1-0.935) = 0.012 • Calculate ∆6, ∆3, ∆4 • ∆6 = 3*g(1)*(1-g(1))*(0.618*0.012) = 0.001 • ∆3 = 3*g(-0.203)*(1-g(-0.203))*(0.333*0.012) = 0.003 • ∆4 = 3*g(-0.204)*(1-g(-0.204))*(0.433*0.012) = 0.004 • Update weights for the output layer • w65 = 0.618 + 0.3*1*0.012 = 0.622 • w35 = 0.333 + 0.3*0.352*0.012 = 0.334 • w45 = 0.433 + 0.3*0.352*0.012 = 0.434

Second Example (cont) • Calculate ∆0, ∆1, ∆2 • Skipped, we do not use them • Update weights for the hidden layer • w03 = 0.027 + 0.1*1*0.003 = 0.027 • w04 = 0.036 + 0.1*1*0.004 = 0.036 • w13 = 0.13 + 0.1*0*0.003 = 0.13 • w14 = 0.14 + 0.1*0*0.004 = 0.14 • w23 = -0.23 + 0.1*1*0.003 = -0.23 • w24 = -0.24 + 0.1*1*0.004 = -0.24

Bayesian networks • Syntax: • a set of nodes, one per variable • a directed, acyclic graph (link ≈ "directly influences") • a conditional distribution for each node given its parents: P(Xi | Parents(Xi)), stored as a conditional probability table (CPT)

P(x1x2…xn) = Pi=1,…,nP(xi|parents(Xi))  full joint distribution table Calculation of Joint Probability • Given its parents, each node is conditionally independent of everything except its descendants • Thus, • Every BN over a domain implicitly represents some joint distribution over that domain Ram Meshulam 2004
