
Presentation Transcript


  1. Does Evolution Require External Information? Some Lessons From Computational Intelligence. Robert J. Marks II, Distinguished Professor of Electrical and Computer Engineering

  2. Abstract. Engineers use models of science to improve quality of life. Computational intelligence is one such useful engineering tool; it can produce unexpected, insightful and clever results. Consequently, an image is often painted of computational intelligence as a free source of information. Although fast computers performing search do add information to a design, the information needed to solve even moderately sized problems is beyond the computational ability of the closed universe. Assumptions concerning the solution must be included. For targeted search, the requirement for added information is well known; the need has been popularized in the last decade by the No Free Lunch theorems. Using classic information theory, we show that the added information for searches can, indeed, be measured. The total information available prior to search is determined by applying Bernoulli's principle of insufficient reason. The added information measures how much of that available information the evolutionary program supplies toward finding the target. Some recently proposed evolutionary models are shown, surprisingly, to offer negative added information to the design process and therefore to perform worse than random sampling.

  3. What is Evolutionary Computation? Simulation of Darwinian evolution on a computer: • Start with a set of possible solutions. • Use a computer model to score how good each solution is. • Survival of the fittest: keep a set of the best solutions. • Duplicate, mutate and cross over the survivors to form the next generation. • Repeat.
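
As a concrete illustration of the loop above, here is a minimal evolutionary search in Python; the target phrase, fitness function, population size and mutation rate are arbitrary choices for the example, not part of the original slides.

```python
import random

TARGET = "METHINKS*IT*IS*LIKE*A*WEASEL"    # example target phrase used later in the talk
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"   # 27 symbols: 26 letters plus '*' for space

def fitness(candidate):
    # How good is each solution? Here: number of characters matching the target.
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate=0.05):
    # Duplicate with random point mutations.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in candidate)

def evolve(pop_size=100, keep=20, generations=2000):
    # A set of possible solutions, initialized at random.
    population = ["".join(random.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for gen in range(generations):
        # Survival of the fittest: keep a set of the best solutions.
        population.sort(key=fitness, reverse=True)
        best = population[:keep]
        if fitness(best[0]) == len(TARGET):
            return gen, best[0]
        # Next generation: duplicate and mutate the survivors.
        population = best + [mutate(random.choice(best)) for _ in range(pop_size - keep)]
    return generations, max(population, key=fitness)

if __name__ == "__main__":
    print(evolve())
```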

  4. Search in Engineering Design. Engineers: • create a parameterized model; • establish a measure of the design's fitness; • search the N-dimensional parameter space. Can we do better? Example: the Yagi-Uda antenna (1954) versus an antenna designed by evolutionary search at NASA (http://ic.arc.nasa.gov/projects/esg/research/antenna.htm). In the space of all parameters, the target T is the set of parameters that give results better than the Yagi-Uda design.

  5. Random Search: you are told only "yes" or "no" (success or no success) as you query for the target.

  6. Directed Search: information about the target is given to you, e.g. • "Warmer!" • steepest descent • conjugate gradient descent • interval halving.

  7. Blind Searches… • Monkeys at a typewriter with 27 keys. • Apply Bernoulli's principle of insufficient reason: "in the absence of any prior knowledge, we must assume that the events have equal probability." Jakob Bernoulli, Ars Conjectandi (The Art of Conjecturing), 1713. • Information-theoretic equivalent: maximum entropy (a good optimization assumption).

  8. Random Searches… • "IT WAS A DARK AND STORMY NIGHT": 30 characters from a 27-key alphabet, so the odds against typing it at random are 27^30 = 8.73 × 10^42 to 1 (143 bits). • Those are the same odds as picking one marked atom out of an enormous mass of iron. Using Avogadro's number: 27^30 atoms × (1 mole / 6.022 × 10^23 atoms) × (55.845 grams per mole) × (1 short ton / 907,185 grams) ≈ 8.9 × 10^14 short tons of iron; for comparison, the Eiffel Tower contains about 7,000 tons of iron.
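
A quick check of the arithmetic above (a minimal sketch; the physical constants are standard values, and the iron-tonnage figure corrects the number quoted on the original slide):

```python
import math

N, L = 27, 30                     # 27 typewriter keys, 30-character phrase
odds = N ** L                     # about 8.73e42 to 1
bits = L * math.log2(N)           # about 143 bits

AVOGADRO = 6.022e23               # atoms per mole
IRON_G_PER_MOL = 55.845           # grams of iron per mole
G_PER_SHORT_TON = 907_185         # grams per short ton

grams = odds / AVOGADRO * IRON_G_PER_MOL
short_tons = grams / G_PER_SHORT_TON

print(f"odds: {odds:.3e} to 1 ({bits:.0f} bits)")
print(f"equivalent mass of iron: {short_tons:.2e} short tons")
```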

  9. Converting Mass to Computing Power. • Minimum energy to perform one irreversible bit operation (the von Neumann-Landauer limit) = ln(2)·kT = 1.15 × 10^-23 joules.¹ • Mass of the universe ≈ 10^53 kg. Converting all the mass in the universe to energy (E = mc²), we could generate 7.83 × 10^92 bits. • Assume the age of the universe is ≈ 13.7 billion years. Repeating the mass-to-energy conversion every nanosecond since the big bang gives only 3.4 × 10^119 bits. ¹ Assuming a background-radiation temperature of 2.76 kelvin.
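
The chain of numbers above can be reproduced directly; a minimal sketch using the per-bit energy quoted on the slide (note that kT·ln 2 at 2.76 kelvin evaluates to about 2.6 × 10^-23 J, which would roughly halve the bit counts without changing the conclusion):

```python
# Upper bound on bits the universe could have processed, following the slide's figures.
E_PER_BIT = 1.15e-23                  # joules per irreversible bit, as quoted on the slide
MASS_UNIVERSE_KG = 1e53               # mass of the universe
C = 3.0e8                             # speed of light, m/s
AGE_UNIVERSE_S = 13.7e9 * 3.156e7     # 13.7 billion years in seconds

energy = MASS_UNIVERSE_KG * C**2      # E = mc^2, joules
bits_once = energy / E_PER_BIT        # ~7.8e92 bits from one full conversion
conversions = AGE_UNIVERSE_S / 1e-9   # one conversion per nanosecond since the big bang
bits_total = bits_once * conversions  # ~3.4e119 bits, i.e. fewer than 10^120

print(f"bits from one conversion: {bits_once:.2e}")
print(f"bits since the big bang:  {bits_total:.2e}")
```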

  10. Random Searches… Converting all the mass in the universe to energy a billion times per second since the big bang yields about 10^120 bits. • A definition of "impossible": any search requiring more than 10^120 bits.

  11. How Long a Phrase? Target: IN THE BEGINNING ... EARTH. Random trials: JFD SDKA ASS SA ... KSLLS; KASFSDA SASSF A ... JDASF; J ASDFASD ASDFD ... ASFDG; JASKLF SADFAS D ... ASSDF; … ; IN THE BEGINNING ... EARTH. For a phrase of L characters drawn from an alphabet of N symbols, the expected number of random queries is N^L.

  12. How Long a Phrase from the Universe? For a random search the probability of success on each query is p = N^-L, so the expected number of queries is N^L and the expected number of bits processed is N^L · log₂(N^L). Setting N^L log₂ N^L = 10^120 bits with N = 27 gives L ≈ 82 characters.
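
Solving for the longest phrase the 10^120-bit budget could find, as implied above (a minimal sketch of the calculation):

```python
import math

N = 27                      # alphabet size: 26 letters plus space
BUDGET_BITS = 1e120         # the "impossible" threshold from the previous slides

def bits_needed(L):
    # Expected queries (N^L) times the bits handled per query (log2 of the space size).
    return N ** L * (L * math.log2(N))

L = 1
while bits_needed(L + 1) <= BUDGET_BITS:
    L += 1

print(L)   # 82: the longest phrase findable within the 10^120-bit budget
```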

  13. Pr[tT ] =  Probability Search Space Pr()=1 Target T t Prescriptive Information in Targeted Search定向搜索中的处方性信息

  14. Fitness. Each point in the parameter space has a fitness; the acceptable solutions form the target T. The search problem is to find a point with good enough fitness.

  15. Search Algorithms: steepest ascent, exhaustive search, Newton-Raphson, Levenberg-Marquardt, tabu search, simulated annealing, particle swarm search, evolutionary approaches. Problem: in order to work better than average, each algorithm implicitly assumes something about the search space and/or the location of the target.
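
For instance, a simple hill climber implicitly assumes that the fitness surface is smooth enough for local improvements to lead toward the target; a minimal sketch (the quadratic fitness function is an arbitrary choice for illustration):

```python
import random

def fitness(x):
    # Illustrative smooth fitness surface; a hill climber exploits this smoothness.
    return -(x - 3.0) ** 2

def hill_climb(steps=1000, step_size=0.1):
    x = random.uniform(-10.0, 10.0)
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if fitness(candidate) > fitness(x):   # keep only improvements
            x = candidate
    return x

print(hill_climb())   # converges near 3.0 only because the surface rewards local moves
```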

  16. No Free Lunch Theorem. With no knowledge of where the target is and no knowledge of the fitness surface, every search performs, on average, as well as any other.

  17. No Free Lunch Theorem Made EZ. Find the value of x that maximizes the fitness y, when nothing is known about the fitness function y(x).

  18. Quotes on the need for added information for targeted search… • "…unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other." Yu-Chi Ho and D.L. Pepyne, "Simple Explanation of the No Free Lunch Theorem," Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida, 2001. • No free lunch theorems "indicate the importance of incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm." David Wolpert and William G. Macready, "No free lunch theorems for optimization," IEEE Trans. Evolutionary Computation 1(1): 67-82, 1997.

  19. Therefore... • Nothing works better, on average, than random search. • For a search algorithm such as evolutionary search to do better, we require prescribed exogenous information.

  20. Evolutionary Search… • Evolutionary search is "able to adapt solutions to new problems and do not rely on explicit human knowledge."* BUT the dominoes of an evolutionary program must be set up before they are knocked down. Recent results (NFL) dictate that there must be implicitly added information in the crafting of an evolutionary program. * Emphasis added. D. Fogel, review of "Computational Intelligence: Imitating Life," IEEE Trans. on Neural Networks, vol. 6, pp. 1562-1565, 1995.

  21. Evolutionary Computing. For example, setting up a search requires formulating a "fitness function" or a "penalty function." Michael Healy, an early pioneer in applied search algorithms, called himself a "penalty function artist."

  22. Can a computer program generate more information than it is given? If a search algorithm does not obey the NFL theorem, it "is like a perpetual motion machine - conservation of generalization performance precludes it." Cullen Schaffer (1994), anticipating the NFL theorem. • Cullen Schaffer, "A conservation law for generalization performance," in Proceedings of the Eleventh International Conference on Machine Learning, W.W. Cohen and H. Hirsh (eds.), San Francisco: Morgan Kaufmann, 1994, pp. 259-265.

  23. Shannon Information Axioms. • Low-probability events should carry more information than high-probability events: "the nice person" (common words → lower information) versus "philanthropist" (rarer word → more information). • Information from two independent events should add: "engineer" → information I₁; "stuttering" → information I₂; "stuttering engineer" → information I₁ + I₂.
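
Both axioms are satisfied by the logarithmic measure defined on the next slide; a small worked instance (the word probabilities are made up purely for illustration):

```latex
% Shannon self-information and its additivity for independent events.
% Illustrative (made-up) probabilities:
%   Pr("engineer")   = 1/64   =>  I_1 = 6 bits
%   Pr("stuttering") = 1/256  =>  I_2 = 8 bits
%   Pr("stuttering engineer") = (1/64)(1/256) = 2^{-14}  =>  I_1 + I_2 = 14 bits
\[
  I(p) = -\log_2 p, \qquad
  I(p_1 p_2) = -\log_2 (p_1 p_2) = I(p_1) + I(p_2).
\]
```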

  24. Shannon Information: I = -log₂ p, the information in bits of an event that occurs with probability p.

  25. Targeted Search. The search space Ω contains the target T. With no prior knowledge of where T lies, apply Bernoulli's principle of insufficient reason, i.e. the maximum entropy assumption: every point of the search space is equally probable.

  26. Available Information. I_Ω = -log₂ Pr[t ∈ T] = -log₂ p, where p is the probability that a single uniform query t lands in the target T. This is all of the information we can get from the search; we can get no more.

  27. Available Information: Interval Halving. Each query answers "no" (0) or "yes" (1) and halves the remaining interval; four queries yield the four bits 0011 and pin down the target. This is a "perfect search".
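
A minimal sketch of interval halving as a yes/no question-and-answer search (the 16-cell search space and the target index are arbitrary choices for the example):

```python
import math

def interval_halving(target, size=16):
    # Each yes/no answer supplies exactly one bit; log2(size) queries suffice.
    lo, hi, bits = 0, size, ""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target >= mid:          # "yes" -> 1: keep the upper half
            bits += "1"
            lo = mid
        else:                      # "no" -> 0: keep the lower half
            bits += "0"
            hi = mid
    return lo, bits

found, answers = interval_halving(target=3)
print(found, answers, f"{math.log2(16):.0f} bits")   # 3 0011 4 bits
```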

  28. Search Probability of Success: choose a search algorithm… Let q be the probability of success of the chosen (e.g. evolutionary) search. From NFL, on average, if there is no added information, q = p. If information has been added (or we're lucky), q > p.

  29. Added Information: Definition. The added information is I₊ = log₂(q/p), measured with blind (uniform) search as the reference. Checks: 1. For a "perfect search" (q = 1), I₊ = -log₂ p: all of the available information.

  30. Added Information. Checks: 2. For a "blind query" (q = p), I₊ = log₂(p/p) = 0: no added information.

  31. Added Information. No added information: the search samples the space uniformly. Added information: the sampling is concentrated on more probable areas around the target T. If the sampling is instead steered away from the target (q < p), the prescriptive information is NEGATIVE.

  32. Examples of Prescriptive Information: • interval halving • random search • partitioned search • FOO search in alphabets and nucleotides • negative added information.

  33. Prescribed Exogenous Information in Random Searches… Let Q be the number of queries (trials), p the probability of success of a single trial, and p_S the chance of one or more successes: p_S = 1 - (1 - p)^Q ≈ Qp for very small p. The added information is therefore I₊ = log₂(p_S/p) ≈ log₂ Q.
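
A quick numerical check of the approximation above (a minimal sketch; the value of p is an arbitrary small probability chosen for the example):

```python
import math

p = 1e-9                        # made-up per-query success probability (only needs to be small)
for Q in (2, 4, 16, 256):
    p_s = 1 - (1 - p) ** Q      # chance of at least one success in Q queries
    added = math.log2(p_s / p)  # added information in bits
    print(Q, round(added, 3))   # ~1, 2, 4, 8 bits: roughly log2(Q), regardless of p
```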

  34. Prescribed Exogenous Information in Random Searches… • The added information is not a function of the size of the space or of the probability of success, but only of the number of queries. • There is a diminishing return: two queries give one bit of added information, four queries give two bits, sixteen give four bits, 256 give eight bits, and so on.

  35. 2. Prescribed Exogenous Information in Partitioned Search… Target: METHINKS*IT*IS*LIKE*A*WEASEL. Trial phrases: • XEHDASDSDDTTWSW*QITE*RIPOCFL • XERXPLEE*ETSXSR*IZAW**LPAEWL • MEQWASKL*RTPLSWKIRDOU*VPASRL • … and so on, until • METHINKS*IT*IS*LIKE*A*WEASEL.

  36. 2. Prescribed Exogenous Information in Partitioned Search… METHINKS*IT*IS*LIKE*A*WEASEL. In partitioned search, each letter that matches the target is frozen and only the remaining letters are re-queried. For random search the added information grows as roughly log₂ Q; the per-letter hints amplify the added information by a factor of L, the length of the phrase.

  37. 2. Prescribed Exogenous Information in Partitioned Search… For a perfect search using the partitioning information, set the added information equal to the available information and compute the number of iterations from the information. For the phrase above, L = 28 characters from an alphabet of 27 symbols, and the required number of queries follows.

  38. 2. Prescribed Exogenous Information in Partitioned Search… Comparison for METHINKS*IT*IS*LIKE*A*WEASEL (L = 28 characters, 27-letter alphabet). Reality: random search would need on the order of 27^28 ≈ 1.2 × 10^40 queries to expect success, while partitioned search expects success in roughly a hundred queries. There is a lot of added information!
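
A minimal simulation contrasting the two searches (the partitioned-search mechanics follow the freeze-the-correct-letters description above; the random-search figure is computed rather than simulated, since 27^28 queries cannot be run):

```python
import random

TARGET = "METHINKS*IT*IS*LIKE*A*WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"

def partitioned_search():
    # Letters that already match the target are frozen; only the rest are re-queried.
    guess = [random.choice(ALPHABET) for _ in TARGET]
    queries = 1
    while any(g != t for g, t in zip(guess, TARGET)):
        guess = [t if g == t else random.choice(ALPHABET)
                 for g, t in zip(guess, TARGET)]
        queries += 1
    return queries

trials = [partitioned_search() for _ in range(200)]
print("partitioned search, mean queries:", sum(trials) / len(trials))  # roughly 100
print("random search, expected queries: %.2e" % 27 ** len(TARGET))     # about 1.2e40
```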

  39. 2. Single Agent Mutation (MacKay). 1. Specify a target bit string of length L. 2. Initialize a string of L random bits. 3. Form two children, each by flipping every bit of the parent independently with mutation probability μ. 4. Keep the better-fit child; kill the parent and the weaker child. If the two children tie, flip a coin. 5. Go to step 3 and repeat. (Without loss of generality, assume the target is all ones.)
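
A minimal implementation of the mutation scheme just described (fitness is simply the number of ones, since the target is taken to be all ones; the values of μ, L and Q match those reported two slides below):

```python
import random

def single_agent_mutation(L=128, mu=0.00005, queries=50_000):
    parent = [random.randint(0, 1) for _ in range(L)]
    for _ in range(queries):
        # Form two children by flipping each bit independently with probability mu.
        children = [[1 - b if random.random() < mu else b for b in parent]
                    for _ in range(2)]
        # Keep the better child; the parent and the weaker child are discarded.
        children.sort(key=sum, reverse=True)
        parent = random.choice(children) if sum(children[0]) == sum(children[1]) else children[0]
    return sum(parent)   # number of correct (one) bits out of L

print(single_agent_mutation())
```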

  40. 2. Single Agent Mutation (MacKay): simulation results (figure).

  41. 2. Single Agent Mutation (MacKay). With k ones among the L bits and mutation rate μ << 1, the probability that a generation gains a one is approximately 2μ(L - k) and the probability that nothing changes is approximately 1 - 2μ(L - k): the process is a Markov birth process.

  42. With μ = 0.00005 and L = 128 bits, 128 bits is the perfect-search information; after Q = 50,000 queries the added information is I₊(Q) = 126.7516 bits.

  43. 3. Prescribed Exogenous FOO Information. FOO = frequency of occurrence; letter frequencies are taken from the Concise Oxford Dictionary (9th edition, 1995). Information of the nth letter: Iₙ = -log₂ pₙ. Average information = entropy: H = -Σₙ pₙ log₂ pₙ.

  44. English Alphabet Entropy. • Uniform (maximum entropy): log₂ N bits per letter. • English FOO: H_FOO = -Σ pₙ log₂ pₙ bits per letter. • Added information per letter = the Kullback-Leibler distance between the FOO distribution and the maximum-entropy (uniform) distribution: I₊ = log₂ N - H_FOO.
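
A sketch of the per-letter bookkeeping above; the four-symbol frequency table is made up purely for illustration and is not the Concise Oxford Dictionary data:

```python
import math

def entropy(freqs):
    # H = -sum p*log2(p) over the frequency-of-occurrence table.
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)

def added_info_per_letter(freqs):
    # Kullback-Leibler distance from the FOO distribution to the uniform one: log2(N) - H.
    return math.log2(len(freqs)) - entropy(freqs)

toy_foo = {"E": 0.4, "T": 0.3, "A": 0.2, "Q": 0.1}   # made-up frequencies, not real English data
print(entropy(toy_foo), added_info_per_letter(toy_foo))
```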

  45. Asymptotic Equipartition Theorem. • A FOO structuring of a long message restricts the search to a subspace (the typical set) over which the probability is nearly uniform. • For a message of L characters from an alphabet of N letters, that subspace contains about 2^(LH) sequences rather than N^L.

  46. Asymptotic Equipartition Theorem. • For the King James Bible using FOO, the prescriptive information is I₊ = 6.169 MB; the available information is I_Ω = 16.717 MB. • Can we add MORE information? Digraphs, trigraphs, …

  47. 4. Example of Negative Prescriptive Information. • The NFL theorem has been useful for addressing the "sometimes outrageous claims that had been made of specific optimization algorithms." S. Christensen and F. Oppacher, "What can we learn from No Free Lunch? A First Attempt to Characterize the Concept of a Searchable Function," Proceedings of the Genetic and Evolutionary Computation Conference (2001).

  48. 4. Example of Negative Prescriptive Information.

  49. Schneider's ev.

  50. A string of 131 nucleotides; 24 weights on [-511, 512]; a bias on [-511, 512]; fixed binding-site locations; fitness measured by recognition error. The search is equivalent to inverting a perceptron:
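
A heavily simplified sketch of what "inverting a perceptron" can mean here, under the assumption that the 24 weights form a 6-position × 4-base matrix scanned along the nucleotide string; the window width, threshold and scoring below are illustrative guesses, not Schneider's actual ev parameters:

```python
import random

BASES = "ACGT"
SITE_WIDTH = 6          # assumed window width, so that 6 positions x 4 bases = 24 weights
GENOME_LEN = 131        # string of 131 nucleotides, as on the slide

def score(weights, bias, window):
    # Perceptron-style score: add the weight for the base seen at each window position.
    return sum(weights[i][BASES.index(b)] for i, b in enumerate(window)) - bias

def errors(genome, weights, bias, binding_sites):
    # Fitness: count positions misclassified as binding / non-binding sites.
    mistakes = 0
    for pos in range(GENOME_LEN - SITE_WIDTH + 1):
        predicted_site = score(weights, bias, genome[pos:pos + SITE_WIDTH]) > 0
        mistakes += predicted_site != (pos in binding_sites)
    return mistakes

# Random illustrative instance; evolving the genome and weights so that `errors`
# reaches zero is the "perceptron inversion" the slide refers to.
weights = [[random.randint(-511, 512) for _ in BASES] for _ in range(SITE_WIDTH)]
bias = random.randint(-511, 512)
genome = "".join(random.choice(BASES) for _ in range(GENOME_LEN))
binding_sites = {10, 40, 75}     # fixed, made-up binding-site locations
print(errors(genome, weights, bias, binding_sites))
```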
