Evolving Efficient List Search Algorithms. Kfir Wolfson Moshe Sipper Department of Computer Science Ben Gurion University 2009. Agenda. Introduction Evolutionary Setup Results Less Knowledge – More Automation Related Work Conclusions and Future Work Post-Evolutionary Analysis.
Evolving Efficient List Search Algorithms
Kfir Wolfson, Moshe Sipper
Department of Computer Science, Ben Gurion University, 2009
Agenda • Introduction • Evolutionary Setup • Results • Less Knowledge – More Automation • Related Work • Conclusions and Future Work • Post-Evolutionary Analysis Evolving Efficient List Search Algorithms - ECAL – 09.05.2010
Introduction
• Evolutionary algorithms have been applied to many areas, but there has been limited research on software engineering and algorithmic design
• We introduce: algorithmic design through Darwinian evolution
• We begin with a benchmark case, list search algorithms:
  • Can evolution be applied to finding a search algorithm?
  • Can evolution be applied to finding an efficient search algorithm?
• Yes to both questions, using GP
Binary Search
Knuth (The Art of Computer Programming): “Although the basic idea of binary search is comparatively straightforward, the details can be somewhat tricky, and many good programmers have done it wrong the first few times they tried.”
Representation
• Genotype
  • Koza-style GP
  • Evaluation trees
  • Strongly typed
  • More understandable algorithms
• Function and terminal sets
  • The same sets are used for evolving both linear and sublinear search algorithms
[Figure: example genotype tree with nodes If, =, NOP, INDEX:=, Array[INDEX], KEY, ITER]
Representation
• Phenotype
  • Array search algorithm
  • Searches for a key in a 1-dimensional array
[Figure: example array, with KEY = 18]
Java function:
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n-1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        -> PLUG IN EVOLVING GENOTYPE HERE <-
    }
    return INDEX;
}
Representation (annotations on the search frame)
• Global variables
• iterations is set to:
  • n for linear search
  • log2 n for sublinear search
• The evolving genotype is plugged into the loop body
• INDEX is the return variable; the returned index might be “illegal”
Representation
• Annotation on the slide: “later replaced with an ADF”
Representation - Example
• An example correct solution to the linear search problem:
LISP: (If (= Array[INDEX] KEY) NOP (INDEX:= ITER))
Equivalent Java: if (arr[INDEX] == KEY) ; else INDEX = ITER;
[Figure: the same expression as a tree with nodes If, =, NOP, INDEX:=, Array[INDEX], KEY, ITER]
Let’s plug it into the phenotype frame…
Representation - Example
• An example correct solution to the linear search problem:
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n-1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        if (arr[INDEX] == KEY)
            ;
        else
            INDEX = ITER;
    }
    return INDEX;
}
Representation
Properties of a search call in this frame:
• Always halts
  • No loop functions
  • Only read access to ITER
  • Number of iterations is limited
• Inherently deals with keys not in the array
  • With a wrapper function
• No early termination when the key is found
  • Harder problem: the evolved algorithm will have to learn to retain the correct index. Why?
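The "wrapper function" bullet can be illustrated with a minimal sketch. It assumes the wrapper simply validates the index returned by the search frame and reports -1 for a missing key; the names SearchWrapperSketch, evolvedSearch and searchOrNotFound are hypothetical, and the loop body is the linear-search example from the earlier slides rather than anything the slides specify for the wrapper itself.

```java
final class SearchWrapperSketch {
    // Search frame from the slides with the example linear-search genotype plugged in.
    // iterations is set to n, as described for the linear case.
    // M0 and M1 are omitted because this genotype never uses them.
    static int evolvedSearch(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = n;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] == KEY) {
                // NOP: there is no early termination, so the index must be retained
            } else {
                INDEX = ITER;
            }
        }
        return INDEX; // may be "illegal": wrong, or out of range for an empty array
    }

    // Hypothetical wrapper (an assumption, not taken from the slides):
    // accept the returned index only if it is in range and really holds the key.
    static int searchOrNotFound(int[] arr, int KEY) {
        int idx = evolvedSearch(arr, KEY);
        boolean legal = idx >= 0 && idx < arr.length;
        return (legal && arr[idx] == KEY) ? idx : -1;
    }

    public static void main(String[] args) {
        int[] arr = {30, 18, 25, 41};
        System.out.println(searchOrNotFound(arr, 18));
        System.out.println(searchOrNotFound(arr, 99));
    }
}
```

Dropping the arr[INDEX] == KEY check from the loop body would make the frame return the last index visited, which is why the evolved code has to learn to keep a correct index once it has one.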
Fitness Function
• How do we rate a solution?
  • Many random arrays
  • Of varying array lengths, from minN=2 to maxN=100
  • Search for all keys in all arrays
  • Reward the individual for closeness of the returned indexes
• Key range disjoint from index range
  • Discourages “cheating”
• Sorted and unsorted arrays of positive integers
Fitness Function - Definitions
• Error per single key search
  • Distance between the correct index of KEY and the index returned
  • Elements are unique, so there is no ambiguity
• Hit = finding the precise location of KEY
  • error = 0
[Figure: two example calls of search(arr,key) with KEY = 18: a hit (error=0) and a miss two positions from the correct index (error=2)]
Fitness Function
• The fitness value of an individual is defined as: [formula not preserved in the source]
• This gives a 0.5% bonus reduction for every 1% of correct hits
• For example, if an individual scored 300 hits in 1000 search calls, its fitness will be the average error per call, reduced by 15%
• This bonus
  • encourages perfect answers (“almost” is bad…),
  • increases fitness variation in the population
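The bonus described on this slide can be sketched as follows. The formula is an assumption reconstructed from the bullets only (average error per call, reduced by 0.5% for every 1% of hits, so lower is better); it is not the formula from the slide, and the names FitnessSketch, error and fitness are hypothetical.

```java
final class FitnessSketch {
    // Error of a single search call: distance between the correct and the returned index.
    static int error(int correctIndex, int returnedIndex) {
        return Math.abs(correctIndex - returnedIndex);
    }

    // Assumed fitness (lower is better):
    //   fitness = (sum of errors / calls) * (1 - 0.5 * hits / calls)
    // where a hit is a call with error 0.
    static double fitness(int[] errors) {
        if (errors.length == 0) {
            throw new IllegalArgumentException("at least one search call is required");
        }
        long totalError = 0;
        int hits = 0;
        for (int e : errors) {
            totalError += e;
            if (e == 0) {
                hits++;
            }
        }
        double averageError = (double) totalError / errors.length;
        double hitRate = (double) hits / errors.length;
        return averageError * (1.0 - 0.5 * hitRate);
    }

    public static void main(String[] args) {
        // 300 hits in 1000 calls, as in the slide's example: a 15% reduction.
        int[] errors = new int[1000];
        for (int i = 300; i < 1000; i++) {
            errors[i] = 2;
        }
        System.out.println(fitness(errors));
    }
}
```

Under this reading, an individual with 100% hits has fitness 0 regardless of the bonus, and the bonus mainly separates individuals with similar average error but different hit counts.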
Generality Test
• The best solution of each run was subjected to a stringent generality test, by running it on random arrays of all lengths in the range [2, 5000] ([2, 500] for the linear case).
• Kinnear (1993) noted that: “For any algorithm... that operates on an infinite domain of data, no amount of testing can ever establish generality. Testing can only increase confidence.”
• We included analysis by hand for selected solutions.
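A generality test of this kind can be sketched as a small harness. This is a sketch under assumptions: one random sorted array per length, unique values, and values kept above maxLen so that the key range stays disjoint from the index range. The names GeneralityTestSketch, Search, randomSortedArray and countMisses are hypothetical, and the slides do not say how many arrays were drawn per length.

```java
import java.util.Random;

final class GeneralityTestSketch {
    // Any search routine under test: returns the index it believes holds the key.
    interface Search {
        int search(int[] arr, int key);
    }

    // Sorted array of n unique positive integers, all larger than minValue.
    static int[] randomSortedArray(int n, int minValue, Random rnd) {
        int[] arr = new int[n];
        int value = minValue;
        for (int i = 0; i < n; i++) {
            value += 1 + rnd.nextInt(10); // strictly increasing, hence unique
            arr[i] = value;
        }
        return arr;
    }

    // Searches for every key of one random array per length in [minLen, maxLen]
    // and counts the calls that did not return the exact index.
    static int countMisses(Search s, int minLen, int maxLen, long seed) {
        Random rnd = new Random(seed);
        int misses = 0;
        for (int n = minLen; n <= maxLen; n++) {
            int[] arr = randomSortedArray(n, maxLen, rnd);
            for (int i = 0; i < n; i++) {
                if (s.search(arr, arr[i]) != i) {
                    misses++;
                }
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        Search linearScan = (arr, key) -> {
            for (int i = 0; i < arr.length; i++) {
                if (arr[i] == key) {
                    return i;
                }
            }
            return -1;
        };
        System.out.println(countMisses(linearScan, 2, 500, 42L));
    }
}
```

As the Kinnear quote says, zero misses from such a harness only increases confidence; it does not prove generality.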
GP Operators and Parameters
[Table: not recoverable from the source]
Results - Linear
• It turned out that evolving a linear-time search algorithm was quite easy with the function and terminal sets we designed.
• 46 out of 50 runs (92%) produced perfect solutions, passing the generality testing of arrays up to length 500.
• Our representation rendered the problem easy enough for a perfect individual to appear in the randomly generated generation 0 in three of the runs.
  • The search space was small enough for random search.
Results - Linear
• An example evolved solution:
LISP: (If (= Array[INDEX] KEY) (M1:= [M0+M1]/2) (INDEX:= ITER))
Equivalent Java: if (arr[INDEX] == KEY) M1 = (M0+M1)/2; else INDEX = ITER;
[Figure: the same expression as a tree with nodes If, =, M1:=, INDEX:=, Array[INDEX], KEY, [M0+M1]/2, ITER]
Results - Linear
• An example evolved solution:
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n-1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        if (arr[INDEX] == KEY)
            M1 = (M0+M1)/2;
        else
            INDEX = ITER;
    }
    return INDEX;
}
• The assignment to M1 is irrelevant, but it does not affect the output index
Sublinear Search
• We set iterations to log2 n, and proceeded to evolve sublinear search algorithms, using the same search frame with the evolving genotype plugged into the loop body.
Results - Sublinear
• Unsurprisingly, this case proved a harder problem, but it was also solved by evolution.
• 35 out of 50 runs (70%) produced perfect solutions, passing the generality testing of arrays up to length 5,000.
• 7 runs (14%) produced near-perfect solutions, which failed on a single key in the input arrays (99.96% hits on the generality test).
Results – Sublinear
• An example simplified evolved solution (simplified by hand from a tree of 50 nodes down to 14):
LISP:
(PROGN2 (INDEX:= [M0+M1]/2)
        (If (> KEY Array[INDEX])
            (PROGN2 (M0:= [M0+M1]/2) (INDEX:= M1))
            (M1:= [M0+M1]/2)))
Equivalent Java:
INDEX = (M0+M1)/2;
if (KEY > arr[INDEX]) {
    M0 = (M0+M1)/2;
    INDEX = M1;
} else
    M1 = (M0+M1)/2;
Results - Sublinear
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n-1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        INDEX = (M0+M1)/2;
        if (KEY > arr[INDEX]) {
            M0 = (M0+M1)/2;
            INDEX = M1;
        } else
            M1 = (M0+M1)/2;
    }
    return INDEX;
}
This is a form of Binary Search (with a small twist)
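To try the evolved solution, the frame needs a concrete value for iterations. The sketch below assumes it is log2 n rounded up, which is consistent with the deck's own figure of 7 iterations for n=100; the class name EvolvedBinarySearchSketch and the helper ceilLog2 are hypothetical, and the guard for arrays shorter than 2 is an addition.

```java
final class EvolvedBinarySearchSketch {
    // ceil(log2(n)) for n >= 1, computed without floating point.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // The evolved loop body from the slide, inside the search frame.
    // Expects a sorted array of unique elements that contains KEY.
    static int search(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = (n < 2) ? 0 : ceilLog2(n); // assumption: log2 n rounded up
        int M0 = 0;
        int M1 = n - 1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            INDEX = (M0 + M1) / 2;
            if (KEY > arr[INDEX]) {
                M0 = (M0 + M1) / 2;
                INDEX = M1; // the "twist": point at the upper bound, not the midpoint
            } else {
                M1 = (M0 + M1) / 2;
            }
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = {1003, 1010, 1018, 1025, 1040};
        System.out.println(search(arr, 1018));
    }
}
```

After every iteration INDEX equals M1, and the gap M1 - M0 is at most halved (rounding up) each time, so after ceil(log2 n) iterations the gap is at most 1 and M1 sits on the key.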
Less Knowledge – More Automation
• Re-examining the representation:
  • Most terminals and functions are either
    • general-purpose or
    • problem-specific
  • However, one terminal stands out: [M0+M1]/2
    • It is solution-specific
• We proceed to
  • remove the [M0+M1]/2 terminal
  • add an automatically defined function (ADF)
Adding ADF
[Figure: the main tree's function and terminal sets (PROGN2, If, INDEX:=, M0:=, M1:=, <, =, >, INDEX, Array[INDEX], KEY, ITER, M0, M1, TRUE, FALSE, NOP), with each [M0+M1]/2 terminal replaced by a call to ADF0]
ADF Functions & Terminals: +, -, *, /, 0, 1, 2, M0, M1
Results – Sublinear with ADF
• The sublinear search problem with an ADF naturally proved more difficult than with the [M0+M1]/2 terminal
• 12 out of 50 runs (24%) produced perfect solutions, passing the generality testing of arrays up to length 5,000 (later increased to 20,000)
• In a set of 50 additional runs, without the “2” terminal, the success rate was lower: 8%.
Results – Sublinear with ADF
• Analysis revealed all perfect solutions to be variations of binary search
• The algorithmic idea can be deduced by inspecting the ADFs, all of which turned out to be equivalent to one of the following (all fractions truncated):
  (M0+M1)/2
  (M0+M1+1)/2
  M0/2+(M1+1)/2
  which are reminiscent of the [M0+M1]/2 terminal we dropped
Results – Sublinear with ADF
• An example simplified evolved solution (simplified by hand from a total of 58 nodes down to 26):
LISP:
(PROGN2 (PROGN2 (if (< Array[INDEX] KEY) (INDEX:= ADF0) NOP)
                (if (< Array[INDEX] KEY) (M0:= INDEX) (M1:= INDEX)))
        (INDEX:= ADF0))
ADF0: (/ (+ (+ 1 M0) M1) 2)
Equivalent Java:
if (arr[INDEX] < KEY)
    INDEX = ((1+M0)+M1)/2;
if (arr[INDEX] < KEY)
    M0 = INDEX;
else
    M1 = INDEX;
INDEX = ((1+M0)+M1)/2;
Results – Sublinear with ADF
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n-1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        if (arr[INDEX] < KEY)
            INDEX = ((1+M0)+M1)/2;
        if (arr[INDEX] < KEY)
            M0 = INDEX;
        else
            M1 = INDEX;
        INDEX = ((1+M0)+M1)/2;
    }
    return INDEX;
}
This is another form of Binary Search (with a different twist)
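The ADF-based solution can be exercised the same way. This sketch again assumes iterations is log2 n rounded up, and writes ADF0 as a separate method purely for readability; the names EvolvedAdfSearchSketch, ceilLog2 and adf0 are hypothetical, and the guard for arrays shorter than 2 is an addition.

```java
final class EvolvedAdfSearchSketch {
    // ceil(log2(n)) for n >= 1, computed without floating point.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // ADF0 from the slide: (/ (+ (+ 1 M0) M1) 2), a midpoint rounded up.
    static int adf0(int M0, int M1) {
        return ((1 + M0) + M1) / 2;
    }

    // The evolved main tree from the slide, inside the search frame.
    // Expects a sorted array of unique elements that contains KEY.
    static int search(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = (n < 2) ? 0 : ceilLog2(n); // assumption: log2 n rounded up
        int M0 = 0;
        int M1 = n - 1;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            if (arr[INDEX] < KEY) {
                INDEX = adf0(M0, M1);
            }
            if (arr[INDEX] < KEY) {
                M0 = INDEX;
            } else {
                M1 = INDEX;
            }
            INDEX = adf0(M0, M1);
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = {1003, 1010, 1018, 1025, 1040};
        System.out.println(search(arr, 1040));
    }
}
```

The first conditional only matters in the first iteration, where INDEX is still 0: it lets a key at position 0 collapse M1 to 0 immediately, while any other key starts from the rounded-up midpoint.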
Interesting Results
• Some of the other evolved solutions are worth mentioning
• With minN=2, maxN=100 and main-tree max-depth = 17, linear search algorithms evolved, failing on longer arrays
• How is this possible (in log2 n iterations)?
  • An O(log n) solution has a constant factor, i.e. the algorithm does k·log n operations.
  • We set a limit on the number of iterations, where in each iteration the full genotype code is executed.
  • A linear search could evolve by taking advantage of the constant factor k
Interesting Results
• Linear solution ADF: ADF0 = (M0+1)
• Main tree included 16 occurrences of:
  (If (= Array[INDEX] KEY) NOP (PROGN2 (M0:= ADF0) (INDEX:= M0)))
  i.e. if the key is found, do nothing; else increment INDEX by 1
• For an array of size n=100: log n = 7 (rounded up), so for k=16: k·log n = 16 × 7 = 112 > 100 (enough to traverse the whole array)
• We proceeded to
  • increase minN, maxN (to 200, 300),
  • decrease the maximum k, by lowering max-depth to 10
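The constant-factor trick can be reproduced with a short sketch. It assumes iterations is log2 n rounded up and stands in for the 16 copies of the subtree with an inner loop; the upper-bound guard on INDEX is an addition to keep the sketch safe for missing keys, and the names ConstantFactorLinearSketch, K and ceilLog2 are hypothetical.

```java
final class ConstantFactorLinearSketch {
    static final int K = 16; // occurrences of the subtree in the main tree, per the slide

    // ceil(log2(n)) for n >= 1, computed without floating point.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    static int search(int[] arr, int KEY) {
        int n = arr.length;
        int iterations = (n < 2) ? 0 : ceilLog2(n); // assumption: log2 n rounded up
        int M0 = 0;
        int INDEX = 0;
        for (int ITER = 0; ITER < iterations; ITER++) {
            // The inner loop stands in for K textual copies of the same subtree.
            for (int copy = 0; copy < K; copy++) {
                // The INDEX >= n - 1 guard is an added assumption (not in the slides).
                if (INDEX >= n - 1 || arr[INDEX] == KEY) {
                    // NOP
                } else {
                    M0 = M0 + 1; // ADF0 = M0+1
                    INDEX = M0;
                }
            }
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] arr = new int[200];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = 1000 + i;
        }
        // With n=200 there are only 8 * 16 = 128 steps, so late keys are out of reach.
        System.out.println(search(arr, arr[199]));
    }
}
```

With n=100 the 7 × 16 = 112 available steps cover every position, which is why raising minN and maxN and lowering the tree depth removes this loophole.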
Interesting Results
• An intriguing solution has evolved
• It gains perfect scores (100% hits) up to array length 6,643
• But its ADF is: ADF0 = 2*M1-M0-1
• Analyzing it revealed an interesting algorithm, which makes a series of jumps of exponentially increasing size (1, 2, 4, 8, …, 256), restarting them when the next element is too small
• It was thus able to handle array sizes n such that n ≤ 511 × log2 n, i.e. n ≤ 6643
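The bound n ≤ 6643 can be checked numerically. The sketch assumes that log2 n is rounded up to a whole number of iterations and that each iteration can advance at most 511 positions (1 + 2 + 4 + … + 256), which is how the inequality on the slide reads; the names ExponentialJumpBoundSketch and largestSupportedLength are hypothetical.

```java
final class ExponentialJumpBoundSketch {
    // ceil(log2(n)) for n >= 1, computed without floating point.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    // Largest n in [2, limit] that satisfies n <= reachPerIteration * ceil(log2 n),
    // or -1 if no such n exists.
    static int largestSupportedLength(int reachPerIteration, int limit) {
        int best = -1;
        for (int n = 2; n <= limit; n++) {
            if ((long) n <= (long) reachPerIteration * ceilLog2(n)) {
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largestSupportedLength(511, 100_000));
    }
}
```

For lengths between 4097 and 8192 the iteration count is 13, and 511 × 13 = 6643, so the inequality stops holding at n = 6644.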
Exponential Jumps
• ADF0 = 2*M1-M0-1
• Main tree included 8 occurrences similar to:
  (PROGN2 (if (> Array[INDEX] KEY) (M1:= ADF0) NOP) (INDEX:= ADF0))
• Notice that INDEX is one step ahead of M1, which allows backtracking
• The difference grows by a factor of 2:
  M1’ = 2*M1 - M0 - 1
  M1’’ = 2*M1’ - M0 - 1
  Subtracting: M1’’ - M1’ = 2(M1’ - M1)
[Figure: jumps of size 1, 2, 4, 8, …, 256, each from M1 to 2*M1-M0-1]
Related Work
• No previous work on evolving list search algorithms
• “Closest”: sorting algorithms
  • Loosely related: in both cases, solutions have to be 100% correct
  • We found 10-15 works on evolving sorting algorithms
  • In some of these works, evolution of sorting algorithms was not the goal, but just an example problem.
• Koza’s ADIs (1999), Kirshenbaum’s iteration schema (2001)
  • Not suitable for sublinear search: inherently O(n)
• Loop constructs, such as Koza’s ADL (1999)
  • In search, as opposed to sorting, the heart of the algorithm is the loop contents and not the fact that there is a loop, so the loop is defined outside the genotype.
Conclusions
• We showed that algorithmic design of efficient list search algorithms is possible, using:
  • a high-level fitness function
  • encouraging correct answers
  • within a given number of iterations
• Linear search was very simple with our setup. Sublinear search was much more challenging.
• Evolution produced:
  • many variations of correct binary search,
  • some nearly-correct solutions erring on a mere handful of extreme cases (which one might expect, according to Knuth),
  • and interesting solutions with innovative algorithmic ideas.
Future Work
• Post-evolutionary analysis
  • Tree alignment algorithms from bioinformatics
  • Joint work with M. Ziv-Ukelson and S. Zakov, BGU, Israel.
  • Submitted as a journal paper
• Coevolution of individual main trees and ADFs
  • As in Ahluwalia (2001)
• Turing-complete representation
  • e.g. current phenotypes always halt
• More algorithms, like interpolation search
• Ultimately, we wish to find an algorithmic innovation not yet invented by humans.
Have Your Spaghetti and Eat it Too: Evolutionary Algorithmics and Post-Evolutionary Analysis
Kfir Wolfson, Shay Zakov, Moshe Sipper, Michal Ziv-Ukelson
Department of Computer Science, Ben Gurion University, 2009
Post-Evolutionary Analysis
• GP solutions tend to be bloated
  • Arduous to analyze and comprehend
• We turn to bioinformatics to design a methodology for analyzing and comprehending our GP programs.
• Redefine building blocks (BBs) based on semantics rather than syntax (phenotype instead of genotype)
• Take a task-oriented analysis approach to code reasoning, by identifying semantic BBs and analyzing them as a step towards understanding the entire evolved algorithm.
• Employ a new 3-step analysis tool: G-PEA (GP Post-Evolutionary Analysis)
• Use the Array Search problem as an example
Post-Evolutionary Analysis
Intuitions used as guidelines in this work:
• The task, or function, performed by a GP expression is used to find correlation between sub-trees in the search for building blocks.
• The standard search for identical structural or syntactical motifs is too strict for code created by evolution. We employ a similarity-based measurement in the analysis.
• As in nature, the repetition of similar fragments in a number of evolved individuals suggests the importance of these fragments. (Syntactic) expressions with no observed similar instances are less likely to play a significant role.
• Take advantage of the multitude of separately evolved solutions (common in GP) to understand how each of them works.
Novelty
System based on:
• Similarity (of sub-expressions), not identity
• Semantics, not syntax
• Multiple solutions (trees), not a single individual
G-PEA Methodology
[Figure: three-step pipeline, labeled I, II, III]
• All sub-expressions
• Pairwise functional similarity between sub-expressions
• Bottom-up, O(n^2)
• For each cluster, try to deduce a common task for the expressions within the cluster. This is a candidate semantic building block.
Measuring Expression Similarity
• Distance metric
  • Semantically oriented
  • Language specific (not problem-specific!)
• For tree-based GP: Tree Edit Distance
Tree Edit Distance
• Edit operations
  • Replace node, remove node, add subtree, etc.
  • Cost for each operation
• Edit script: the cost of a script consists of all its operation costs
• Edit distance = cost of the minimum script that transforms T1 to T2
• Normalize to [0,1]: 0 = equivalent, 1 = no similarity detected
• Think recursively, implement iteratively: dynamic programming
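The edit-distance idea can be illustrated with a deliberately simplified sketch. It is not the semantically oriented metric the slides refer to: it swaps in a plain top-down (Selkow-style) tree edit distance with unit costs, where a root can be relabeled and whole child subtrees can be inserted or deleted, and it normalizes by size(T1) + size(T2) - 1, an upper bound on that distance. All names (TreeEditDistanceSketch, Node, distance, normalized) and all cost choices are assumptions.

```java
import java.util.Arrays;
import java.util.List;

final class TreeEditDistanceSketch {
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        int size() {
            int s = 1;
            for (Node c : children) {
                s += c.size();
            }
            return s;
        }
    }

    // Top-down edit distance: relabeling a root costs 1; deleting or inserting a
    // child subtree costs its size; aligned children are compared recursively.
    // The child sequences are aligned with a standard dynamic-programming table.
    static int distance(Node a, Node b) {
        int m = a.children.size();
        int n = b.children.size();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            dp[i][0] = dp[i - 1][0] + a.children.get(i - 1).size();
        }
        for (int j = 1; j <= n; j++) {
            dp[0][j] = dp[0][j - 1] + b.children.get(j - 1).size();
        }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                Node x = a.children.get(i - 1);
                Node y = b.children.get(j - 1);
                int delete = dp[i - 1][j] + x.size();
                int insert = dp[i][j - 1] + y.size();
                int align = dp[i - 1][j - 1] + distance(x, y);
                dp[i][j] = Math.min(align, Math.min(delete, insert));
            }
        }
        int relabel = a.label.equals(b.label) ? 0 : 1;
        return relabel + dp[m][n];
    }

    // 0 = identical trees; 1 = nothing could be reused.
    static double normalized(Node a, Node b) {
        return (double) distance(a, b) / (a.size() + b.size() - 1);
    }

    public static void main(String[] args) {
        Node t1 = new Node("If",
                new Node("=", new Node("Array[INDEX]"), new Node("KEY")),
                new Node("NOP"),
                new Node("INDEX:=", new Node("ITER")));
        Node t2 = new Node("If",
                new Node("=", new Node("Array[INDEX]"), new Node("KEY")),
                new Node("M1:=", new Node("[M0+M1]/2")),
                new Node("INDEX:=", new Node("ITER")));
        System.out.println(normalized(t1, t2));
    }
}
```

The two trees in main are the linear-search genotypes shown earlier in the deck; they differ only in the branch taken when the key is found, so this distance treats them as close.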