
Biologically Inspired Intelligent Systems





Presentation Transcript


  1. Biologically Inspired Intelligent Systems Lecture 05 Dr. Roger S. Gaborski

  2. HW#1 Answer: Matlab Function
function [wts, results] = HebbTrainingHW1(inp, tar)
% inp is the input data
% tar is the target data
wts = tar'*inp;          % wts contains the weight matrix
% Check if the response is correct
results = sign(wts*inp')';

  3. Quiz on Thursday • Review lecture material • Review recommended videos (see lecture notes for links) • Closed book except for one page of handwritten notes • Turn in your one page of notes with your quiz (include your name on your notes)

  4. Textbook • Essentials of Metaheuristics • Sean Luke, George Mason University • Available online at no cost: http://cs.gmu.edu/~sean/book/metaheuristics/ • Some lecture material taken from pp. 1-49

  5. General Evolution • Darwin – In 1859, Charles Darwin proposed a model explaining evolution

  6. Principles of Evolution • Diversity of a population is critical for adaptability to the environment: a population of heterogeneous structures whose individual phenotypes express different traits • Populations evolve over many generations. Reproduction of a specific individual depends on the specific conditions of the environment and the organism's ability to survive and produce offspring • Offspring inherit their parents' fitness, but the genetic material of the parents merges, and mutation results in slight variations

  7. Learning • Rote Learning: No inference, direct implantation of knowledge • Learning by Instruction: Knowledge acquired from a teacher or organized source • Learning by Deduction: Deductive, truth-preserving inferences and memorization of useful conclusions • Learning by Analogy: Transformation of existing knowledge that bears similarity to the new desired concept • Learning by Induction: • Learning from examples (concept acquisition): Based on a set of examples and counterexamples, induce a general concept description that explains the examples and counterexamples • Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining observations, without a teacher (Michalski, Carbonell, Mitchell, …)

  8. Evolutionary Algorithms • Inductive learning by observation and discovery • No teacher exists who presents examples; the system develops examples on its own • Creation of new examples (search points) by the algorithm is an inductive guess on the basis of existing knowledge • Knowledge base – the population • If a new example is good, it is added to the population (knowledge added to the population)

  9. Fitness • Evaluate every solution in the population and determine its fitness • Fitness is a measure of how closely the solution matches the problem's objective • Fitness is calculated by a fitness function • Fitness functions are problem dependent • Fitness values are usually positive, with zero being a perfect score (the larger the fitness value, the worse the solution)
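As an illustration (not from the slides), a minimal Matlab fitness function following this error-style convention, where zero is a perfect score and larger values are worse; the names resp and tar are assumptions:

function f = fitnessError(resp, tar)
% Error-style fitness: 0 is perfect, larger is worse.
% resp - candidate solution's response vector
% tar  - desired target vector
f = sum((resp - tar).^2);   % sum of squared errors
end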

  10. [Figure: a small neural network with inputs In1 and In2, a +1 bias, and output y] If we know the structure, can we evolve the weights? There are 7 weights, so a solution is a point in a 7-dimensional space.

  11. Gradient Ascent or Gradient Descent • Find the maximum (or minimum) of a function http://en.wikipedia.org/wiki/File:Gradient_ascent_%28surface%29.png http://en.wikipedia.org/wiki/Gradient_descent

  12. Gradient Ascent • Find the maximum of a function • Start with an arbitrary value x • Add to this value a small fraction of the slope: x = x + h·f'(x), where h < 1 and f'(x) is the derivative • Positive slope: x increases • Negative slope: x decreases • x will continue toward the maximum; at the maximum of f(x) the slope is zero and x will no longer change
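A minimal Matlab sketch of this update rule (the function, step size h, and iteration count are illustrative assumptions, not from the slides):

fprime = @(x) -2*(x - 3);   % derivative of f(x) = -(x-3)^2, maximum at x = 3
h = 0.1;                    % small step fraction, h < 1
x = 0;                      % arbitrary starting value
for k = 1:100
    x = x + h*fprime(x);    % add a small fraction of the slope
end
% x approaches 3; at the maximum the slope is zero and x stops changing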

  13. Gradient Descent • Gradient descent is the same algorithm except the slope is subtracted to find the minimum of the function

  14. Gradient descent: algorithm • Start with a point (guess) • Repeat: • Determine a descent direction • Choose a step size • Update • Until a stopping criterion is satisfied
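A hedged Matlab sketch of this skeleton for descent, with a slope-near-zero stopping criterion (the function and constants are assumptions):

fprime = @(x) 2*(x + 1);         % derivative of f(x) = (x+1)^2, minimum at x = -1
h = 0.1;                         % step size
x = 5;                           % starting point (guess)
while abs(fprime(x)) > 1e-6      % stopping criterion: slope essentially zero
    x = x - h*fprime(x);         % descent direction: subtract the slope
end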

  15. Issues • Convergence time – how large should 'h' be? Too large and we may overshoot the maximum; too small and we may run out of time before the maximum (or minimum) is found • Local maxima (minima) • Saddle points • Newton's Method: use both the first and second derivatives: x = x - h·(f'(x)/f''(x))
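For the same illustrative function as in the descent sketch above, one Newton step with h = 1 (a sketch; the handles are assumptions):

fp  = @(x) 2*(x + 1);    % f'(x) for f(x) = (x+1)^2
fpp = @(x) 2;            % f''(x) is constant for this quadratic
x = 5;
x = x - fp(x)/fpp(x);    % a single step lands exactly on the minimum x = -1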

  16. Gradient Ascent is a local optimization method • Instead of starting at only one x, randomly select several x's and apply gradient ascent at each one, allowing a wider exploration of the solution space

  17. Multidimensional Functions • Replace x with the vector x (underscore indicating a vector) • The slope f'(x) is now the gradient of f at x, ∇f(x) • The gradient is a vector where each element is the slope of f along that dimension
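When the gradient cannot be derived analytically, it can be estimated numerically; a central-difference sketch in Matlab (the step size h and the function handle f are assumptions):

function g = numGradient(f, x)
% Numerical gradient of f at vector x: element d is the slope along dimension d.
h = 1e-6;                                   % small finite-difference step
g = zeros(size(x));
for d = 1:numel(x)
    e = zeros(size(x));
    e(d) = h;                               % perturb only dimension d
    g(d) = (f(x + e) - f(x - e)) / (2*h);   % central difference
end
end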

  18. We are making the assumption that we can calculate the first and possibly the second derivative in each dimension. • What if we can't? • We don't know what the function is • We can • Create inputs • Test the inputs • Assess the results

  19. Metaheuristic Algorithm • Initialization Procedure: Provide one or more initial candidate solutions • Assessment Procedure: Assess the quality of a solution • Modification Procedure: Make a copy of a candidate solution and produce a new candidate that is slightly, randomly different from it (derivatives are not calculated)
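A hedged Matlab sketch of the three procedures for a real-valued vector problem (all names, the range, and the quality measure are assumptions; here larger assessed values are better):

low = 10; high = 20; L = 5;
init   = @() low + (high-low).*rand(L,1);   % Initialization: random candidate
assess = @(v) -sum((v - 15).^2);            % Assessment: a made-up quality measure
modify = @(v) v + 0.1*(2*rand(L,1) - 1);    % Modification: small random tweak, no derivatives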

  20. Hill-Climbing • Somehow create an initial candidate solution • Randomly modify the candidate solution • If the modified candidate is better than the initial solution, replace the initial candidate with the modified candidate • Continue the process until a solution is found or time runs out
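Putting the assumed procedures from the previous sketch together, a minimal hill-climbing loop (a sketch, not the course's BIIShillClimbing scripts):

S = init();                     % create an initial candidate solution
for k = 1:1000                  % or loop until time runs out
    R = modify(S);              % randomly modified copy
    if assess(R) > assess(S)    % better? replace the candidate
        S = R;
    end
end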

  21. Hill Climbing

  22. [Figure]

  23. Pick Initial (x, y) Location: (54, 113)

  24. Simple Implementation • Compare the value of the function at the four adjacent locations – N, S, E, and W of the current position: Current location: (54, 113) N: (54, 114) S: (54, 112) E: (55, 113) W: (53, 113) • Update the location to the largest function value • If no difference, done
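One possible Matlab rendering of this four-neighbor comparison, assuming the function values are stored in a matrix F indexed as F(x, y) and that moves stay in range (both assumptions):

x = 54; y = 113;
moves = [0 1; 0 -1; 1 0; -1 0];      % N, S, E, W offsets as (dx, dy)
done = false;
while ~done
    vals = zeros(4,1);
    for m = 1:4
        vals(m) = F(x + moves(m,1), y + moves(m,2));
    end
    [bestVal, idx] = max(vals);
    if bestVal > F(x, y)              % move to the largest neighboring value
        x = x + moves(idx,1);
        y = y + moves(idx,2);
    else
        done = true;                  % no difference: done
    end
end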

  25. [Figure]

  26. [Figure]

  27. BIIShillClimbing1.m

  28. Variations on Basic Hill Climbing • Make several modifications to the candidate instead of a single modification • Always keep the modified candidate (don't compare), but keep a separate variable called 'best' that always retains the best discovered solution – at the end of the program, return 'best'
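The second variation as a Matlab sketch, reusing the assumed init/assess/modify handles from above:

S = init();
best = S;
for k = 1:1000
    S = modify(S);                  % always keep the modified candidate
    if assess(S) > assess(best)
        best = S;                   % separately retain the best discovered solution
    end
end
% at the end of the program, return 'best'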

  29. Individual Creation • Assume vector length L and a range of valid entries, low and high • Use a random number generator (uniform distribution) and scale the values to lie between low and high:
>> low = 10; high = 20;
>> v = low + (high-low).*rand(5,1)
v = 18.2415 12.1823 10.9964 16.1951 11.0381

  30. Check Range:
>> low = 10;
>> high = 20;
>> test = low + (high-low).*rand(10000,1);
>> min(test)
ans = 10.0012
>> max(test)
ans = 19.9991

  31. Modification of Individual • Add a small amount of uniformly distributed random noise to each component of vector v:
>> u = 2*rand(10000,1) - 1;   % u ranges from -1 to +1
Simply scale u to the desired range -r to +r; let r = 0.1, giving -0.1 to +0.1:
>> r = 0.1;
>> u1 = r*u;
>> min(u1)
ans = -0.1000
>> max(u1)
ans = 0.1000
Then v(i) = v(i) + u1(i); check that v(i) is within bounds [low, high].
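The bounds check can be written as one vectorized clamp, e.g.:

v = min(max(v, low), high);   % force every element of v into [low, high]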

  32.
>> u1(1:5)
ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000
>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v = 14.7211 11.7077 17.7977 19.9797 16.2644
Modified v:
>> v = v + u1(1:5)
v = 14.6735 11.6780 17.8503 19.9158 16.1644

  33. Effects of Range Value • If r is very small, hill climbing will only explore the local region and can get caught at a local optimum • If r is very large, hill climbing will bounce around, and if it's near the peak of the function it may miss it because it overshoots the peak • r controls the degree of Exploration (randomly explore the space) versus Exploitation (exploit the local gradient) in the hill climbing algorithm

  34. Function with Local Maximums

  35. Another view

  36. [Figure]

  37. [Figure]

  38. [Figure]

  39. [Figure]

  40. BIIShillClimbing2.m

  41. Hill Climbing with Random Restarts • Extreme Exploration – random search • Extreme Exploitation – very small r • Combination: • Randomly select a starting place x • Using small r, perform hill climbing for a random amount of time; save the result if it is the best so far • At the end of that time, randomly select a new starting point x • Using small r, perform hill climbing for a random amount of time; save the result if it is the best so far • Repeat until a solution is found
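A hedged Matlab sketch of this combination (the time budgets are illustrative; init/assess/modify are the assumed handles used earlier):

tStart = tic;
totalTime = 10;                              % overall budget in seconds
best = init();
while toc(tStart) < totalTime
    S = init();                              % randomly select a starting place
    tRun = tic;
    runTime = rand;                          % hill climb for a random amount of time
    while toc(tRun) < runTime
        R = modify(S);                       % small r: exploit the local region
        if assess(R) > assess(S), S = R; end
    end
    if assess(S) > assess(best), best = S; end   % save the result if it is the best
end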

  42. Effect of Time Interval • If the random time interval is long, the algorithm effectively becomes a Hill Climbing algorithm • If the random time interval is short, the algorithm effectively becomes a random search • The random time interval drives the algorithm from one extreme to the other • Which is best? It depends…

  43. Random Restarts

  44. >> BIIShillClimbing3

  45. [Figure]

  46. [Figure: a run annotated with alternating RANDOM SEARCH and HILL CLIMBING phases; one hill-climbing phase leads away from the maximum]

  47. Previously, we required a bounded uniform distribution: the range of values was specified. A Gaussian distribution usually generates small numbers, but numbers of any magnitude are possible. The occasional large numbers result in exploration.
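In Matlab, Gaussian noise comes from randn; a mutation sketch with an assumed standard deviation:

sigma = 0.1;             % most draws are small...
g1 = sigma*randn(5,1);   % ...but values of any magnitude are possible
v = v + g1;              % the occasional large draw provides exploration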

  48. Gaussian distribution (note the larger values):
>> g1(1:5)
ans = 0.0280 -0.1634 -0.1019 1.0370 0.1884
Previously, uniform:
>> u1(1:5)
ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000
>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v = 14.7211 11.7077 17.7977 19.9797 16.2644
Modified v (uniform):
>> v + u1(1:5)
ans = 14.6735 11.6780 17.8503 19.9158 16.1644
Modified v (Gaussian):
>> v + g1(1:5)
ans = 14.7491 11.5443 17.6958 21.0167 16.4528

  49. Simulated Annealing • Differs from Hill Climbing in its decision of when to replace the original individual (the parent S) with the modified individual (the child R) • In Hill Climbing, check whether the modified individual is better; if it is, replace the original • In simulated annealing, if the child is better, replace the parent • If the child is NOT better, still replace the parent with the child with a certain probability P(t,R,S): • P(t,R,S) = exp((Quality(R) - Quality(S))/t)

  50. P(t,R,S) = exp((Quality(R) - Quality(S))/t) • Recall, R is worse than S • First, t ≥ 0 • (Quality(R) - Quality(S)) is negative • If R is much worse than S, the exponent is a large negative number and the probability is close to 0 • If R is very close to S, the exponent is near 0, the probability is close to 1, and we will select R with reasonable probability • t is selectable: with t close to 0, the magnitude of the exponent is large and the probability is close to 0 • If t is large, the probability is close to 1
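The acceptance rule as a Matlab sketch (Quality stands in for the slides' quality function; init, modify, and the geometric cooling schedule are assumptions, since the slides do not specify a schedule):

t = 1.0;                                 % initial temperature
S = init();
while t > 1e-3
    R = modify(S);
    % always accept a better child; accept a worse one with probability P(t,R,S)
    if Quality(R) > Quality(S) || rand < exp((Quality(R) - Quality(S))/t)
        S = R;
    end
    t = 0.99*t;                          % slowly lower the temperature toward 0
end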
