
Recent Advances in Differential Evolution



  1. Recent Advances in Differential Evolution Yong Wang Lecturer, Ph.D. School of Information Science and Engineering, Central South University ywang@csu.edu.cn

  2. Outline of My Talk • Introduction to Differential Evolution • The State-of-the-Art of Differential Evolution • Composite Differential Evolution • Orthogonal Crossover based Differential Evolution • Conclusion

  3. Evolutionary Algorithms
  • What are evolutionary algorithms (EAs)?
  • EAs are intelligent optimization and search techniques inspired by nature.
  • Why evolutionary algorithms (EAs)?
  • The framework of EAs: a population of NP individuals (the first individual, the second individual, …, the NPth individual) undergoes Selection of a Parent Set, Crossover and Mutation to produce New Solutions, and Replacement; the loop repeats until the optimal solution (or a satisfactory one) is found.
  (Figures: a one-dimensional f(x) and a two-dimensional f(x, y) landscape marking the optimal solution, and a flowchart of the EA cycle asking "Is it the optimal solution?")

  4. Differential Evolution (1/2) • Differential evolution (DE), proposed by Storn and Price in 1995, is one of the main branches of evolutionary algorithms (EAs). • DE includes three main operators, i.e., mutation operator, crossover operator, and selection operator. • Currently, DE has been successfully used in various fields.

  5. Differential Evolution (2/2)
  • The algorithmic framework of DE: the target vectors of the population are mutated and crossed over to produce trial vectors, and selection keeps the better of each target/trial pair.
  Remark: mutation + crossover = trial vector generation strategy

  6. The Mutation Operators
  • rand/1: v_i = x_r1 + F·(x_r2 − x_r3)
  • rand/2: v_i = x_r1 + F·(x_r2 − x_r3) + F·(x_r4 − x_r5)
  • best/1: v_i = x_best + F·(x_r1 − x_r2)
  • best/2: v_i = x_best + F·(x_r1 − x_r2) + F·(x_r3 − x_r4)
  • current-to-best/1: v_i = x_i + F·(x_best − x_i) + F·(x_r1 − x_r2)
  • current-to-rand/1: v_i = x_i + rand·(x_r1 − x_i) + F·(x_r2 − x_r3)
  In each name, the first part records the fashion in which the base vector is selected and the second part the number of difference vectors; F is the scaling factor multiplying each difference vector.
  Remark: r1, r2, r3, r4, and r5 are distinct indexes uniformly randomly selected from {1, 2, …, NP} and different from i; x_best is the best individual in the current population. The scaling factor F plays a very important role in mutation.
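
To make the operator definitions above concrete, here is a minimal NumPy sketch of three of the mutation operators for a minimization problem; the function names, and the representation of the population as a NumPy array `pop` with an accompanying `fitness` array, are assumptions made for illustration.

```python
import numpy as np

def rand_1(pop, i, F):
    # rand/1: the base vector and the difference vector are all selected at random
    r1, r2, r3 = np.random.choice([k for k in range(len(pop)) if k != i], 3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def best_1(pop, fitness, i, F):
    # best/1: the best individual of the current population serves as the base vector
    best = pop[np.argmin(fitness)]
    r1, r2 = np.random.choice([k for k in range(len(pop)) if k != i], 2, replace=False)
    return best + F * (pop[r1] - pop[r2])

def current_to_best_1(pop, fitness, i, F):
    # current-to-best/1: the target vector is pulled toward the best individual
    best = pop[np.argmin(fitness)]
    r1, r2 = np.random.choice([k for k in range(len(pop)) if k != i], 2, replace=False)
    return pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])
```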

  7. The Characteristics of the Mutation Operators (1/3) • rand/1 • Characteristics • rand/1 is the most commonly used mutation operator in the literature. • All vectors for mutation are selected from the population at random and, consequently, it has no bias to any special search directions and chooses new search directions in a random manner. • It usually demonstrates slow convergence speed and bears stronger exploration capability.

  8. The Characteristics of the Mutation Operators (2/3) • rand/2 • Characteristics • In rand/2, two difference vectors are added to the base vector, which might lead to better perturbation than the strategies with only one difference vector. • It can generate more different trial vectors than the rand/1 mutation operator with respect to the same population. • When using rand/2, the diversity of the population can be kept; however, this has a side effect on the convergence speed of DE.

  9. The Characteristics of the Mutation Operators (3/3) • best/1 • best/2 • current-to-best/1 • Characteristics • best/1, best/2, and current-to-best/1 usually have fast convergence speed and perform well when solving unimodal problems. • They more easily get stuck at a local optimum and thereby lead to premature convergence when solving multimodal problems. • best/1 is a degenerate case of current-to-best/1 with the first scaling factor F being equal to 1.

  10. The Crossover Operators (1/2)
  • Binomial crossover: the trial vector u_i mixes the target vector x_i and the mutant vector v_i dimension by dimension. For each dimension j, u_i,j = v_i,j if rand_j ≤ CR (or j equals a randomly chosen index j_rand), and u_i,j = x_i,j if rand_j > CR.
  • Because dimension j_rand is always inherited from the mutant vector, the trial vector is always different from the target vector.

  11. The Crossover Operators (2/2)
  • Exponential crossover: the trial vector u_i copies a contiguous (circularly wrapped) block of dimensions, starting at a random position, from the mutant vector v_i and takes the remaining dimensions from the target vector x_i. The block length L satisfies Pr(L ≥ v) = CR^(v−1).
  The crossover control parameter CR plays a very important role in crossover.
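
As a rough illustration of the two crossover operators, a NumPy sketch is given below; the function names and the array-based calling convention are assumptions, not part of the talk.

```python
import numpy as np

def binomial_crossover(target, mutant, CR):
    # Each dimension comes from the mutant vector with probability CR; dimension
    # j_rand is always taken from the mutant vector, so the trial vector always
    # differs from the target vector in at least one dimension.
    D = len(target)
    mask = np.random.rand(D) <= CR
    mask[np.random.randint(D)] = True
    return np.where(mask, mutant, target)

def exponential_crossover(target, mutant, CR):
    # A contiguous (circularly wrapped) block of dimensions, starting at a random
    # position, is copied from the mutant vector; the block length L satisfies
    # Pr(L >= v) = CR^(v-1).
    D = len(target)
    trial = target.copy()
    j = np.random.randint(D)
    L = 0
    while True:
        trial[j] = mutant[j]
        j = (j + 1) % D
        L += 1
        if L >= D or np.random.rand() >= CR:
            break
    return trial
```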

  12. The Characteristics of the Crossover Operators • Characteristics • Binomial crossover is similar to discrete crossover in genetic algorithms. • Exponential crossover is functionally equivalent to two-point crossover in genetic algorithms. • Exponential crossover is capable of maintaining the linkage among variables and thus preserving building blocks. • Binomial crossover may destroy building blocks.

  13. DE Variations • By combining different mutation operators and different crossover operators, we can obtain different DE variants. • DE/x/y/z • DE: differential evolution • x: the fashion in which the base vector is selected • y: the number of difference vectors • z: the type of the crossover operator; "bin" represents the binomial crossover and "exp" represents the exponential crossover • DE/rand/1/bin, DE/rand/1/exp, DE/rand/2/bin, …
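
To show how the pieces fit together under the DE/x/y/z notation, here is a minimal, self-contained sketch of DE/rand/1/bin applied to the sphere function; the parameter values (NP=50, F=0.5, CR=0.9) and all names are illustrative choices rather than settings taken from the talk.

```python
import numpy as np

def de_rand_1_bin(f, low, high, NP=50, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Minimal DE/rand/1/bin for box-constrained minimization."""
    rng = np.random.default_rng(seed)
    D = len(low)
    pop = rng.uniform(low, high, size=(NP, D))      # target vectors
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(NP):
            # rand/1 mutation with three mutually distinct indexes, all different from i
            r1, r2, r3 = rng.choice([k for k in range(NP) if k != i], 3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            # binomial crossover with the guaranteed dimension j_rand
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            trial = np.clip(np.where(mask, mutant, pop[i]), low, high)
            # one-to-one selection: the trial vector replaces the target if it is not worse
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Example: 10-dimensional sphere function
sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best = de_rand_1_bin(sphere, np.full(10, -100.0), np.full(10, 100.0))
```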

  14. The Illustrative Graph of DE/rand/1/bin (Figure: the triangle denotes the trial vector.)

  15. On Rotation Invariance (1/3) • Why rotation invariance is very important for optimization algorithms • We have no a priori knowledge about the topological structure of the optimization problems

  16. On Rotation Invariance (2/3) • In DE, the crossover control parameter CR controls the rotation invariance to a certain degree. (Figure: the search behavior for CR=0.0, CR=0.5, and CR=1.0.) S. Das and P. N. Suganthan, "Differential evolution: A survey of the state-of-the-art," IEEE Transactions on Evolutionary Computation, vol. 15, no. 1, pp. 4-31, 2011.

  17. On Rotation Invariance (3/3) • current-to-rand/1 is a rotation-invariant strategy: u_i = x_i + rand·(x_r1 − x_i) + F·(x_r2 − x_r3), i.e., rand/1 mutation followed by arithmetic crossover rather than binomial or exponential crossover. Remark: current-to-rand/1 can be considered as rand/1 + arithmetic crossover, in which the crossover control parameter CR is unnecessary.

  18. Outline of My Talk • Introduction to Differential Evolution • The State-of-the-Art of Differential Evolution • Composite Differential Evolution • Orthogonal Crossover based Differential Evolution • Conclusion

  19. The Current Research Directions of DE • The DE performance mainly depends on two components • trial vector generation strategy (i.e., the mutation and crossover operators) • control parameters (i.e., the population size NP, the scaling factor F, and the crossover control parameter CR) • Much effort has been made to improve the performance of DE • Introduction of new trial vector generation strategies for generating new solutions • Tuning the control parameters (static/deterministic, dynamic/adaptive, and self-adaptive) • Hybridization of DE with other operators or methods • Use of multiple populations (distributed DE)

  20. Six Representative DE Variants • jDE (self-adaptive parameters in DE, IEEE TEC, 2006, 10(6)) • DEahcSPX (DE with adaptive hill-climbing and simplex crossover, IEEE TEC, 2008, 12(1)) • SaDE (DE with strategy adaptation, IEEE TEC, 2009, 13(2)) • JADE (adaptive DE with optional external archive, IEEE TEC, 2009, 13(5)) • DEGL (DE using a neighborhood-based mutation operator, IEEE TEC, 2009, 13(3)) • ODE (opposition-based DE, IEEE TEC, 2008, 12(1))

  21. jDE • Main motivation • How to self-adaptively adjust the scaling factor F and the crossover control parameter CR of DE • Main idea • F and CR are applied at the individual level
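
A minimal sketch of the jDE regeneration rule, assuming the commonly reported settings tau1 = tau2 = 0.1, F resampled in [0.1, 1.0], and CR resampled in [0, 1]; the function name and calling convention are hypothetical.

```python
import numpy as np

def jde_regenerate(F_i, CR_i, tau1=0.1, tau2=0.1, rng=np.random):
    # With probability tau1 the individual's F is re-sampled, otherwise inherited.
    F_new = 0.1 + 0.9 * rng.rand() if rng.rand() < tau1 else F_i
    # With probability tau2 the individual's CR is re-sampled, otherwise inherited.
    CR_new = rng.rand() if rng.rand() < tau2 else CR_i
    # The regenerated values are used to produce the trial vector and are kept only
    # if that trial vector wins the one-to-one selection, so good parameter values
    # tend to survive together with good individuals.
    return F_new, CR_new
```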

  22. DEahcSPX (1/2)
  • Main motivation • Incorporating local search (LS) heuristics is often very useful in designing an effective evolutionary algorithm for global optimization. Two common forms are local improvement process (LIP) oriented LS and crossover-based LS (XLS).
  • Main challenges of XLS • the length of the XLS • the selection of individuals which undergo the XLS • the choice of the other parents which participate in the crossover operation • whether deterministic or stochastic application of XLS should be used

  23. DEahcSPX (2/2) • Main techniques • At each generation, the best individual, together with np other individuals randomly chosen from the population, is first selected to participate in the simplex crossover (SPX). • One offspring is produced; if it is better than the best individual, it replaces the best individual. This SPX-based step is the adaptive hill-climbing (ahc). • Afterward, DE is implemented.

  24. SaDE (1/4) • Main motivation • At different stages of evolution, different trial vector generation strategies coupled with different control parameter settings may be required in order to achieve the best performance. • Main idea • Adaptively adjust the trial vector generation strategies and the control parameters simultaneously by learning from their previous experiences.

  25. SaDE (2/4) • How to adapt the trial vector generation strategy • Use four trial vector generation strategies to construct the strategy candidate pool

  26. SaDE (3/4) • How to adapt the trial vector generation strategy • For each trial vector generation strategy at generation G, SaDE records: • n_k,G: the number of trial vectors generated by the kth strategy • ns_k,G: the number of trial vectors generated by the kth strategy which can enter the next generation • During the first LP generations, each trial vector generation strategy is chosen with the same probability. When the generation number G is larger than LP, the probability p_k,G of applying the kth strategy is calculated as p_k,G = S_k,G / Σ_k S_k,G, where S_k,G = (Σ_{g=G−LP}^{G−1} ns_k,g) / (Σ_{g=G−LP}^{G−1} n_k,g) + ε and the small constant ε avoids all the success rates being equal to zero.

  27. SaDE (4/4) • How to adapt F and CR • The parameter F is approximated by a normal distribution with mean value 0.5 and standard deviation 0.3, denoted by N(0.5, 0.3). • CR obeys a normal distribution with mean value CRm and standard deviation Std = 0.1, denoted by N(CRm, Std), where CRm is initialized as 0.5. • CRMemory_k is used to store those CR values with respect to the kth strategy that have generated trial vectors successfully entering the next generation within the previous LP generations. • During the first LP generations, CR values with respect to the kth strategy are generated by N(0.5, 0.1). • At each generation after LP generations, the median value stored in CRMemory_k is calculated to overwrite CRm_k. Then, CR values are generated according to N(CRm_k, 0.1) when applying the kth strategy.
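
The bookkeeping described on the last two slides might look roughly as follows; the function names, the array shapes (LP generations by K strategies), and the clipping of CR to [0, 1] are simplifying assumptions of this sketch.

```python
import numpy as np

def strategy_probabilities(n, ns, eps=0.01):
    # n and ns are (LP, K) arrays of trial-vector counts and success counts over the
    # learning period; eps keeps a success rate from collapsing to zero.
    S = ns.sum(axis=0) / n.sum(axis=0) + eps
    return S / S.sum()                      # p_k,G for each of the K strategies

def sample_F_and_CR(cr_memory_k, rng=np.random):
    # F ~ N(0.5, 0.3); CR ~ N(CRm_k, 0.1) where CRm_k is the median of the CR values
    # that produced successful trial vectors for the kth strategy (0.5 before any
    # memory has been accumulated).
    F = rng.normal(0.5, 0.3)
    CRm_k = float(np.median(cr_memory_k)) if len(cr_memory_k) > 0 else 0.5
    CR = float(np.clip(rng.normal(CRm_k, 0.1), 0.0, 1.0))
    return F, CR
```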

  28. JADE (1/4) • Main motivation • current-to-best/1 benefits from fast convergence by incorporating best-solution information in the evolutionary search. However, the best-solution information may also cause problems such as premature convergence due to the resultant reduced population diversity. • A well-designed parameter adaptation scheme is usually beneficial to enhance the robustness of an algorithm. Question: how to exploit the advantages and overcome the disadvantages of current-to-best/1, and how to adapt F and CR during the evolution?

  29. JADE (2/4) • A new mutation operator: current-to-pbest/1: v_i = x_i + F_i·(x_pbest − x_i) + F_i·(x_r1 − x_r2) • The characteristics of current-to-pbest/1 • Any of the top 100p% solutions (x_pbest) can be randomly chosen to play the role of the single best solution in DE/current-to-best. • Recently explored inferior solutions, when compared to the current population, provide additional information about the promising progress direction. Denote A as the set of archived inferior solutions and P as the current population. • x_r2 is randomly chosen from the union P ∪ A.

  30. JADE (3/4) • How to adapt CR • Each CR_i is generated from a normal distribution with mean μ_CR and standard deviation 0.1. The mean μ_CR is initialized to 0.5 and then updated at the end of each generation as μ_CR = (1 − c)·μ_CR + c·mean_A(S_CR), where c is a positive constant between 0 and 1, S_CR is the set of all successful crossover probabilities CR_i at generation G, and mean_A(·) is the usual arithmetic mean.

  31. JADE (4/4) • How to adapt F • Each F_i is generated from a Cauchy distribution with location parameter μ_F and scale 0.1. The location parameter μ_F is initialized to 0.5 and then updated at the end of each generation as μ_F = (1 − c)·μ_F + c·mean_L(S_F), where S_F is the set of all successful mutation factors in generation G and mean_L(·) is the Lehmer mean: mean_L(S_F) = (Σ_{F∈S_F} F²) / (Σ_{F∈S_F} F).
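
Putting the two adaptation rules together, a minimal sketch of JADE's parameter update and sampling could look like this; the function names are hypothetical, and the truncation and regeneration details follow my reading of the JADE paper.

```python
import numpy as np

def update_jade_means(mu_CR, mu_F, S_CR, S_F, c=0.1):
    # mu_CR <- (1 - c) * mu_CR + c * mean_A(S_CR)   (arithmetic mean)
    if len(S_CR) > 0:
        mu_CR = (1 - c) * mu_CR + c * float(np.mean(S_CR))
    # mu_F <- (1 - c) * mu_F + c * mean_L(S_F)      (Lehmer mean: sum(F^2) / sum(F))
    if len(S_F) > 0:
        S_F = np.asarray(S_F, dtype=float)
        mu_F = (1 - c) * mu_F + c * float(np.sum(S_F ** 2) / np.sum(S_F))
    return mu_CR, mu_F

def sample_jade_parameters(mu_CR, mu_F, rng=np.random):
    # CR_i ~ N(mu_CR, 0.1), truncated to [0, 1]; F_i ~ Cauchy(mu_F, 0.1),
    # regenerated while non-positive and truncated to 1.
    CR_i = float(np.clip(rng.normal(mu_CR, 0.1), 0.0, 1.0))
    F_i = 0.0
    while F_i <= 0.0:
        F_i = mu_F + 0.1 * rng.standard_cauchy()
    return CR_i, min(F_i, 1.0)
```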

  32. DEGL (1/3) • Main motivation • A proper tradeoff between exploration and exploitation is necessary for the efficiency and effectiveness of a population-based stochastic search method. • The current-to-best/1 of DE favors exploitation only, since all the vectors are attracted by the same best position found so far by the entire population. • As a result of such exploitative tendency, in many cases, the population of DE may lose its global exploration abilities within a relatively small number of generations. How to balance the exploration and exploitation in the current-to-best/1

  33. DEGL (2/3) • Main idea
  • Global mutation model: g_i = x_i + α·(x_best − x_i) + β·(x_r1 − x_r2), where x_best is the best individual of the entire population.
  • Local neighborhood model: L_i = x_i + α·(x_nbest_i − x_i) + β·(x_p − x_q), where x_nbest_i is the best individual in the neighborhood of x_i and p, q are indexes chosen from that neighborhood; the neighborhood is defined on a ring topology over the population indexes.
  • Combined donor vector: v_i = w·g_i + (1 − w)·L_i.
  Remark: w controls the balance between the exploration and exploitation
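
A rough sketch of the combined DEGL mutation on a ring neighborhood, under the simplifying assumption alpha = beta = F; the function name, the neighborhood radius k, and the default values are illustrative choices, not taken from the slides.

```python
import numpy as np

def degl_mutation(pop, fitness, i, w, F=0.8, k=2, rng=np.random):
    NP = len(pop)
    # ring neighborhood of the target vector: indexes i-k, ..., i+k (modulo NP)
    neigh = [(i + off) % NP for off in range(-k, k + 1)]
    nbest = neigh[int(np.argmin([fitness[j] for j in neigh]))]
    p, q = rng.choice([j for j in neigh if j != i], 2, replace=False)
    local = pop[i] + F * (pop[nbest] - pop[i]) + F * (pop[p] - pop[q])
    # the global model uses the best individual of the whole population
    gbest = int(np.argmin(fitness))
    r1, r2 = rng.choice([j for j in range(NP) if j != i], 2, replace=False)
    glob = pop[i] + F * (pop[gbest] - pop[i]) + F * (pop[r1] - pop[r2])
    # the donor vector is a weighted combination of the two models
    return w * glob + (1 - w) * local
```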

  34. DEGL (3/3) • How to set the parameter w • increasing weight factor • linear increment • exponential increment • random weight factor • self-adaptive weight factor (based on the weight factor associated with the best individual of the population)

  35. ODE (1/3) • Main motivation • All population-based optimization algorithms, DE being no exception, suffer from long computational times because of their evolutionary/stochastic nature. • In the absence of a priori information about the solution, we usually start with random guesses. The computation time is related to the distance of these initial guesses from the optimal solution. • Main idea • By using the current solution and its opposite solution, the convergence speed of DE can be enhanced (opposition-based learning). (Figure: on an interval [a, b], the current solution and its opposite solution lie symmetrically about the midpoint (a+b)/2; whichever is closer to the optimal solution is kept.)

  36. ODE (2/3) • Basic definitions of opposition-based learning • Opposite number: Let x ∈ [a, b] be a real number. The opposite number is defined as x̆ = a + b − x. • Opposite solution: Let x = (x_1, …, x_D) be a solution in D-dimensional space, where x_j ∈ [a_j, b_j] for j = 1, …, D. The opposite solution x̆ is completely defined by its components x̆_j = a_j + b_j − x_j. • Opposition-based comparison: Let x be a solution in D-dimensional space and f(x) its objective function value. According to the definition of the opposite solution, x̆ is the opposite solution of x. If f(x̆) is better than f(x), then x can be replaced with x̆.

  37. ODE (3/3) • Opposition-based population initialization • Opposition-based generation jumping
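
A minimal sketch of opposition-based population initialization as defined on the previous slide (opposition-based generation jumping reuses the same idea during the run, triggered with a small jumping rate); the function name and signature are assumptions.

```python
import numpy as np

def opposition_based_init(f, low, high, NP, rng=np.random):
    # Random initial guesses and their opposite solutions x_opp = a + b - x;
    # the NP fittest points out of the 2*NP candidates form the initial population.
    D = len(low)
    pop = rng.uniform(low, high, size=(NP, D))
    opp = low + high - pop
    union = np.vstack([pop, opp])
    fit = np.array([f(x) for x in union])
    keep = np.argsort(fit)[:NP]
    return union[keep], fit[keep]
```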

  38. Outline of My Talk • Introduction to Differential Evolution • The State-of-the-Art of Differential Evolution • Composite Differential Evolution • Orthogonal Crossover based Differential Evolution • Conclusion

  39. Composite Differential Evolution (CoDE) • Motivation • During the last decade, DE researchers have suggested many empirical guidelines for choosing trial vector generation strategies and control parameter settings. • some trial vector generation strategies are suitable for the global search and some others are useful for rotated problems • some control parameter settings can speed up the convergence and some other settings are effective for separable functions • However, these experiences have not yet been systematically exploited in DE algorithm design. Question: whether the performance of DE can be improved by combining several effective trial vector generation strategies with some suitable control parameter settings

  40. Composite Differential Evolution (CoDE) • Main idea
  • Strategy candidate pool: DE/rand/1/bin, DE/rand/2/bin, DE/current-to-rand/1
  • Parameter candidate pool: [F=1.0, CR=0.1], [F=1.0, CR=0.9], [F=0.8, CR=0.2]
  Y. Wang, Z. Cai, and Q. Zhang, "Differential evolution with composite trial vector generation strategies and control parameters," IEEE Transactions on Evolutionary Computation, vol. 15, no. 1, pp. 55-66, 2011.

  41. Composite Differential Evolution (CoDE) • In general, we expect that the chosen trial vector generation strategies and control parameter settings show distinct advantages. • Thus, they can be effectively combined to solve different kinds of problems.

  42. Composite Differential Evolution (CoDE) • Basic properties of the strategy candidate pool • DE/rand/1/bin has stronger global exploration ability, and it is effective when solving multimodal problems. • DE/rand/2/bin may lead to better perturbation than DE/rand/1/bin, since the former uses two difference vectors. • DE/current-to-rand/1 is rotation-invariant and suitable for rotated problems.

  43. Composite Differential Evolution (CoDE) • Basic properties of the parameter candidate pool • A large value of F can make the mutant vectors distribute widely in the search space and can increase the population diversity. • A low value of F makes the search focus on neighborhoods of the current solutions, and thus it can speed up the convergence. • A large value of CR can make the trial vector very different from the target vector. Therefore, the diversity of the offspring population can be encouraged. • A small value of CR is very suitable for separable problems, since in this case the trial vector may be different from the target vector by only one parameter and, as a result, each parameter is optimized independently.

  44. Composite Differential Evolution (CoDE) • Basic properties of the parameter candidate pool • When combined with the three strategies, [F=1.0,CR=0.1] is for dealing with separable problems. • [F=1.0,CR=0.9] is mainly to maintain the population diversity and to make the three strategies powerful in global exploration. • [F=0.8,CR=0.2] encourages the exploitation of the three strategies in the search space and thus accelerates the convergence speed of the population. Conclusion: the selected strategies and parameter settings exhibit distinct advantages and, therefore, they can complement one another for solving optimization problems of different characteristics.

  45. Composite Differential Evolution (CoDE) • The main framework • For each target vector, each of the three trial vector generation strategies is applied once, and each application is combined with one control parameter setting randomly selected from the parameter candidate pool, yielding the first, second, and third trial vectors. • The best of the three trial vectors is then compared with the target vector, and the better one survives.
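
A sketch of CoDE's trial-vector generation for one target vector, assuming a NumPy population and ignoring boundary handling; the helper names are mine. The best of the three returned trial vectors then competes with the target vector in the usual one-to-one selection.

```python
import numpy as np

PARAMETER_POOL = [(1.0, 0.1), (1.0, 0.9), (0.8, 0.2)]      # (F, CR) pairs

def _pick(rng, NP, i, n):
    # n mutually distinct indexes, all different from the target index i
    return rng.choice([k for k in range(NP) if k != i], n, replace=False)

def _bin_crossover(target, mutant, CR, rng):
    mask = rng.rand(len(target)) <= CR
    mask[rng.randint(len(target))] = True
    return np.where(mask, mutant, target)

def code_trial_vectors(pop, i, rng=np.random):
    """Three trial vectors for pop[i], one per strategy, each with a random setting."""
    NP = len(pop)
    trials = []
    # DE/rand/1/bin
    F, CR = PARAMETER_POOL[rng.randint(len(PARAMETER_POOL))]
    r = _pick(rng, NP, i, 3)
    trials.append(_bin_crossover(pop[i], pop[r[0]] + F * (pop[r[1]] - pop[r[2]]), CR, rng))
    # DE/rand/2/bin
    F, CR = PARAMETER_POOL[rng.randint(len(PARAMETER_POOL))]
    r = _pick(rng, NP, i, 5)
    mutant = pop[r[0]] + F * (pop[r[1]] - pop[r[2]]) + F * (pop[r[3]] - pop[r[4]])
    trials.append(_bin_crossover(pop[i], mutant, CR, rng))
    # DE/current-to-rand/1 (arithmetic crossover is built in, so CR is not used)
    F, _ = PARAMETER_POOL[rng.randint(len(PARAMETER_POOL))]
    r = _pick(rng, NP, i, 3)
    trials.append(pop[i] + rng.rand() * (pop[r[0]] - pop[i]) + F * (pop[r[1]] - pop[r[2]]))
    return trials
```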

  46. Composite Differential Evolution (CoDE) • The experimental results • 25 test functions proposed in the IEEE CEC2005 competition were used to study the performance of the proposed CoDE • unimodal functions F1–F5 • basic multimodal functions F6–F12 • expanded multimodal functions F13–F14 • hybrid composition functions F15–F25 • The average and standard deviation of the function error value f(x_best) − f(x*) were recorded for measuring the performance of each algorithm, where x* is the global optimum of the test function and x_best is the best solution found by the algorithm in a run • For each test function, 25 independent runs were conducted with 300,000 function evaluations (FES) as the termination criterion • Wilcoxon's rank sum test at a 0.05 significance level was conducted on the experimental results

  47. Composite Differential Evolution (CoDE) • The experimental results • Comparison with four state-of-the-art DE variants. "-", "+", and "≈" denote that the performance of the corresponding algorithm is worse than, better than, and similar to that of CoDE, respectively. • CoDE is the best on the basic multimodal, expanded multimodal, and hybrid composition functions, and the second best on the unimodal functions. • Overall, CoDE is better than the four competitors.

  48. Composite Differential Evolution (CoDE) • The experimental results • Comparison with CLPSO, CMA-ES, and GL-25. "-", "+", and "≈" denote that the performance of the corresponding algorithm is worse than, better than, and similar to that of CoDE, respectively. • Overall, CoDE significantly outperforms CLPSO, CMA-ES, and GL-25.

  49. Composite Differential Evolution (CoDE) • The experimental results • Random selection of the control parameter settings (CoDE) vs. adaptive selection of the control parameter settings (adaptive CoDE) • The adaptive CoDE outperforms CoDE on one unimodal function, while CoDE wins on another unimodal function and on two hybrid composition functions. • Overall, CoDE is slightly better than the adaptive CoDE.

  50. Outline of My Talk • Introduction to Differential Evolution • The State-of-the-Art of Differential Evolution • Composite Differential Evolution • Orthogonal Crossover based Differential Evolution • Conclusion
