
Hierarchical Bayesian Optimization Algorithm (hBOA)


Presentation Transcript


  1. Hierarchical Bayesian Optimization Algorithm (hBOA) Martin Pelikan University of Missouri at St. Louis pelikan@cs.umsl.edu

  2. Foreword • Motivation • Black-box optimization (BBO) problem • Set of all potential solutions • Performance measure (evaluation procedure) • Task: Find optimum (best solution) • Formulation useful: No need for gradient, numerical functions, … • But many important and tough challenges • This talk • Combine machine learning and evolutionary computation • Create practical and powerful optimizers (BOA and hBOA)

  3. Overview • Black-box optimization (BBO) • BBO via probabilistic modeling • Motivation and examples • Bayesian optimization algorithm (BOA) • Hierarchical BOA (hBOA) • Theory and experiment • Conclusions

  4. Black-box Optimization • Input • What do potential solutions look like? • How to evaluate the quality of potential solutions? • Output • Best solution (the optimum) • Important • We don’t know what’s inside the evaluation procedure • Vector and tree representations common • This talk: Binary strings of fixed length

  5. BBO: Examples • Atomic cluster optimization • Solutions: Vectors specifying positions of all atoms • Performance: Lower energy is better • Telecom network optimization • Solutions: Connections between nodes (cities, …) • Performance: Satisfy constraints, minimize cost • Design • Solutions: Vectors specifying parameters of the design • Performance: Finite element analysis, experiment, …

  6. BBO: Advantages & Difficulties • Advantages • Use the same optimizer for all problems. • No need for much prior knowledge. • Difficulties • Many places to go • 100-bit strings: 2^100 ≈ 1.3 × 10^30 solutions. • Enumeration is not an option. • Many places to get stuck • Local operators are not an option. • Must learn what’s in the box automatically. • Noise, multiple objectives, interactive evaluation, ...

  7. Typical Black-Box Optimizer • Sample solutions • Evaluate sampled solutions • Learn to sample better • (Diagram: Sample → Evaluate → Learn loop.)

  8. Many Ways to Do It • Hill climber • Start with a random solution. • Flip the bit that improves the solution most. • Finish when no more improvement is possible. • Simulated annealing • Introduce the Metropolis criterion (sometimes accept worse solutions). • Evolutionary algorithms • Inspiration from natural evolution and genetics.
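For concreteness, here is a minimal Python sketch of the steepest-ascent bit-flip hill climber just described. The onemax fitness function and all names are illustrative additions, not from the original slides.

import random

def onemax(bits):
    # Illustrative fitness: number of 1s (the ONEMAX problem used later).
    return sum(bits)

def hill_climber(n, fitness):
    # Start with a random solution.
    solution = [random.randint(0, 1) for _ in range(n)]
    value = fitness(solution)
    while True:
        # Flip the bit that improves the solution most.
        best_i, best_gain = None, 0
        for i in range(n):
            solution[i] ^= 1
            gain = fitness(solution) - value
            solution[i] ^= 1
            if gain > best_gain:
                best_i, best_gain = i, gain
        # Finish when no more improvement is possible.
        if best_i is None:
            return solution, value
        solution[best_i] ^= 1
        value += best_gain

print(hill_climber(20, onemax))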

  9. Evolutionary Algorithms • Evolve a population of candidate solutions. • Start with a random population. • Iteration • Selection: Select promising solutions • Variation: Apply crossover and mutation to selected solutions • Replacement: Incorporate new solutions into original population

  10. Estimation of Distribution Algorithms • Replace standard variation operators by • Building a probabilistic model of promising solutions • Sampling the built model to generate new solutions • Probabilistic model • Stores features that make good solutions good • Generates new solutions with just those features
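A hedged Python sketch of this loop: truncation selection, then a model is built from the selected solutions and sampled to create the next population. The build_model and sample_model arguments are placeholders to be filled by a concrete EDA such as the probability vector or BOA discussed next; all names here are illustrative.

import random

def eda(n_bits, fitness, build_model, sample_model,
        pop_size=100, n_generations=50, truncation=0.5):
    # Start with a random population of binary strings.
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        # Selection: keep the best fraction of the population.
        population.sort(key=fitness, reverse=True)
        selected = population[:max(1, int(truncation * pop_size))]
        # Variation replaced by model building and model sampling.
        model = build_model(selected)
        population = [sample_model(model) for _ in range(pop_size)]
    return max(population, key=fitness)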

  11. EDAs • (Diagram: current population → selected population → probabilistic model → new population, illustrated with 5-bit binary strings.)

  12. What Models to Use? • Our plan • Simple example: Probability vector for binary strings • Bayesian networks (BOA) • Bayesian networks with local structures (hBOA)

  13. Probability Vector • Baluja (1995) • Assumes binary strings of fixed length • Stores probability of a 1 in each position. • New strings generated with those proportions. • Example: (0.5, 0.5, …, 0.5) for uniform distribution; (1, 1, …, 1) for generating strings of all 1s
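A minimal Python sketch of this model: estimate the frequency of 1s in each position from the selected solutions and sample each bit independently. These two functions could serve as build_model and sample_model in the EDA skeleton above; the function names and the tiny example population are illustrative.

import random

def build_prob_vector(selected):
    # Probability of a 1 in each position, estimated from selected solutions.
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample_prob_vector(p):
    # Generate a new string with those proportions.
    return [1 if random.random() < p_i else 0 for p_i in p]

selected = [[1, 1, 0, 0, 1], [1, 0, 1, 0, 1], [1, 1, 1, 0, 1]]
p = build_prob_vector(selected)   # approximately [1.0, 0.67, 0.67, 0.0, 1.0]
print(sample_prob_vector(p))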

  14. EDA Example: Probability Vector • (Diagram: current population → selected population → probability vector (1.0, 0.5, 0.5, 0.0, 1.0) → new population, with example 5-bit strings.)

  15. Probability Vector Dynamics • Bits that perform better get more copies. • And are combined in new ways. • But the context of each bit is ignored. • Example problem 1: ONEMAX (fitness = number of 1s in the string) • Optimum: 111…1

  16. Probability Vector on ONEMAX • (Plot: proportions of 1s in each position vs. iteration.)

  17. Probability Vector on ONEMAX • (Plot: proportions of 1s vs. iteration converge to the optimum. Success.)

  18. Probability Vector: Ideal Scale-up • O(n log n) evaluations until convergence • (Harik, Cantú-Paz, Goldberg, & Miller, 1997) • (Mühlenbein & Schlierkamp-Voosen, 1993) • Other algorithms • Hill climber: O(n log n) (Mühlenbein, 1992) • GA with uniform crossover: approx. O(n log n) • GA with one-point crossover: slightly slower

  19. When Does Prob. Vector Fail? • Example problem 2: Concatenated traps • Partition the input string into disjoint groups of 5 bits. • Each group contributes via a trap function of ones = number of 1s in the group: trap(ones) = 5 if ones = 5, otherwise 4 − ones • Concatenated trap = sum of single traps • Optimum: 111…1
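A short Python sketch of this fitness function, assuming the standard 5-bit deceptive trap that is consistent with the averages given on slide 23; the function names are illustrative.

def trap5(ones):
    # Deceptive 5-bit trap: best at five 1s, otherwise rewards more 0s.
    return 5 if ones == 5 else 4 - ones

def concatenated_trap(bits):
    # Partition the string into disjoint 5-bit groups and sum the traps.
    assert len(bits) % 5 == 0
    return sum(trap5(sum(bits[i:i + 5])) for i in range(0, len(bits), 5))

print(concatenated_trap([1] * 10))   # optimum: 5 + 5 = 10
print(concatenated_trap([0] * 10))   # deceptive attractor: 4 + 4 = 8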

  20. Trap • (Plot: trap value vs. number of 1s, with the global optimum at five 1s.)

  21. Probability Vector on Traps • (Plot: proportions of 1s in each position vs. iteration.)

  22. Probability Vector on Traps • (Plot: proportions of 1s vs. iteration move away from the optimum. Failure.)

  23. Why Failure? • ONEMAX: • Optimum is 111…1 • 1 outperforms 0 on average. • Traps: optimum is 11111, but (averaging over the * positions) • f(0****) = 2 • f(1****) = 1.375 • So single-bit statistics are misleading.

  24. How to Fix It? • Consider 5-bit statistics instead of 1-bit ones. • Then, 11111 would outperform 00000. • Learn model • Compute p(00000), p(00001), …, p(11111) • Sample model • Sample 5 bits at a time • Generate 00000 with p(00000), 00001 with p(00001), …
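A hedged Python sketch of these order-5 statistics: estimate the joint distribution of each disjoint 5-bit group from the selected population and sample whole groups at a time, rather than independent bits. Names are illustrative; the functions mirror build_model/sample_model in the earlier skeleton.

import random
from collections import Counter

def build_block_model(selected, block=5):
    # One probability table per disjoint block: p(00000), p(00001), ..., p(11111).
    n = len(selected[0])
    model = []
    for start in range(0, n, block):
        counts = Counter(tuple(s[start:start + block]) for s in selected)
        total = sum(counts.values())
        model.append({config: c / total for config, c in counts.items()})
    return model

def sample_block_model(model):
    # Sample 5 bits at a time, each block configuration with its probability.
    bits = []
    for table in model:
        configs = list(table.keys())
        weights = list(table.values())
        bits.extend(random.choices(configs, weights=weights)[0])
    return bits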

  25. Correct Model on Traps: Dynamics • (Plot: proportions of 1s in each position vs. iteration.)

  26. Correct Model on Traps: Dynamics • (Plot: proportions of 1s vs. iteration converge to the optimum. Success.)

  27. Good News: Good Stats Work Great! • Optimum in O(n log n) evaluations. • Same performance as on ONEMAX! • Others • Hill climber: O(n^5 log n) = much worse. • GA with uniform crossover: O(2^n) = intractable. • GA with one-point crossover: O(2^n) (without tight linkage).

  28. Challenge • If we could learn and use context for each position • Find non-misleading statistics. • Use those statistics as in the probability vector. • Then we could solve problems decomposable into statistics of order at most k with at most O(n^2) evaluations! • And there are many such problems.

  29. Bayesian Optimization Algorithm (BOA) • Pelikan, Goldberg, & Cantú-Paz (1998) • Use a Bayesian network (BN) as a model. • Bayesian network • Acyclic directed graph. • Nodes are variables (string positions). • Conditional dependencies (edges). • Conditional independencies (implicit).

  30. Conditional Dependency • (Diagram: a Bayesian-network fragment showing a conditional dependency among variables X, Y, and Z.)

  31. Bayesian Network (BN) • Explicit: Conditional dependencies. • Implicit: Conditional independencies. • Probability tables

  32. BOA • (Diagram: current population → selected population → Bayesian network → new population.)

  33. BOA Variation • Two steps • Learn a Bayesian network (for promising solutions) • Sample the built Bayesian network (to generate new candidate solutions) • Next • Brief look at the two steps in BOA

  34. Learning BNs • Two components: • Scoring metric (to evaluate models). • Search procedure (to find the best model).

  35. Learning BNs: Scoring Metrics • Bayesian metrics • Bayesian-Dirichlet with likelihood equivalence • Minimum description length metrics • Bayesian information criterion (BIC)
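As one concrete example, a sketch of a BIC-style score for a single node of a candidate network over binary variables: the node's log-likelihood given its parents, minus a penalty of (log N)/2 per free parameter. This is a generic textbook form, not code from the slides; the names and data layout are illustrative.

import math
from collections import Counter

def bic_node(data, node, parents):
    # data: list of binary strings (lists); node and parents: variable indexes.
    n_samples = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the node's conditional distribution.
    log_likelihood = sum(
        c * math.log(c / parent_counts[config]) for (config, _), c in joint.items()
    )
    free_params = 2 ** len(parents)   # one free probability per parent configuration
    return log_likelihood - 0.5 * math.log(n_samples) * free_params

# The score of a whole network is the sum of bic_node over all nodes.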

  36. Learning BNs: Search Procedure • Start with an empty network (like prob. vec.). • Execute primitive operator that improves the metric the most. • Until no more improvement possible. • Primitive operators • Edge addition • Edge removal • Edge reversal.
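A compact Python sketch of this greedy construction, restricted to edge additions for brevity (the slide also allows removals and reversals): starting from the empty network, repeatedly add the edge that most improves a decomposable score such as the BIC sketch above, while keeping the graph acyclic. All names are illustrative.

def creates_cycle(parents, src, dst):
    # Adding src -> dst creates a cycle iff dst is already an ancestor of src.
    stack, seen = list(parents[src]), set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_network(n_vars, data, score, max_parents=3):
    # parents[i] is the current parent set of variable i (empty network to start).
    parents = {i: set() for i in range(n_vars)}
    node_score = {i: score(data, i, sorted(parents[i])) for i in range(n_vars)}
    while True:
        best = None   # (gain, src, dst, new_score)
        for dst in range(n_vars):
            if len(parents[dst]) >= max_parents:
                continue
            for src in range(n_vars):
                if src == dst or src in parents[dst]:
                    continue
                if creates_cycle(parents, src, dst):
                    continue
                new_score = score(data, dst, sorted(parents[dst] | {src}))
                gain = new_score - node_score[dst]
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, src, dst, new_score)
        if best is None:          # no more improvement possible
            return parents
        _, src, dst, new_score = best
        parents[dst].add(src)
        node_score[dst] = new_score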

  37. Sampling BNs: PLS • Probabilistic logic sampling (PLS) • Two phases • Create an ancestral ordering of variables: each variable depends only on its predecessors • Sample all variables in that order using CPTs: repeat for each new candidate solution
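A hedged Python sketch of these two phases. The network is given as a dictionary of parent sets, and cpt[v] is assumed to map a tuple of parent values (in sorted parent order) to P(v = 1 | parents); both the data layout and the names are illustrative.

import random

def ancestral_order(parents):
    # Place a variable only after all of its parents (assumes an acyclic network).
    order, placed = [], set()
    while len(order) < len(parents):
        for v, ps in parents.items():
            if v not in placed and ps <= placed:
                order.append(v)
                placed.add(v)
    return order

def sample_solution(parents, cpt):
    # Sample all variables in ancestral order using the CPTs.
    values = {}
    for v in ancestral_order(parents):
        parent_values = tuple(values[p] for p in sorted(parents[v]))
        values[v] = 1 if random.random() < cpt[v][parent_values] else 0
    return [values[v] for v in sorted(values)]

# Repeat sample_solution once per new candidate solution.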

  38. BOA Theory: Key Components • Primary target: Scalability • Population sizing N • How large populations for reliable solution? • Number of generations (iterations) G • How many iterations until convergence? • Overall complexity • O(N x G) • Overhead: Low-order polynomial in N, G, and n.

  39. BOA Theory: Population Sizing • Assumptions: n bits, subproblems of order k • Initial supply (Goldberg) • Have enough partial sols. to combine. • Decision making (Harik et al, 1997) • Decide well between competing partial sols. • Drift (Thierens, Goldberg, Pereira, 1998) • Don’t lose less salient stuff prematurely. • Model building (Pelikan et al., 2000, 2002) • Find a good model.

  40. BOA Theory: Num. of Generations • Two bounding cases • Uniform scaling • Subproblems converge in parallel • Onemax model (Muehlenbein & Schlierkamp-Voosen, 1993) • Exponential scaling • Subproblems converge sequentially • Domino convergence (Thierens, Goldberg, Pereira, 1998)

  41. Good News • Theory • Population sizing (Pelikan et al., 2000, 2002): O(n) to O(n^1.05) • Initial supply. • Decision making. • Drift. • Model building. • Iterations until convergence (Pelikan et al., 2000, 2002): O(n^0.5) to O(n) • Uniform scaling. • Exponential scaling. • BOA solves order-k decomposable problems in O(n^1.55) to O(n^2) evaluations!

  42. Theory vs. Experiment (5-bit Traps) • (Plot: number of evaluations vs. problem size from 100 to 250 bits, comparing theory and experiment.)

  43. Additional Plus: Prior Knowledge • BOA need not know much about problem • Only set of solutions + measure (BBO). • BOA can use prior knowledge • High-quality partial or full solutions. • Likely or known interactions. • Previously learned structures. • Problem specific heuristics, search methods.

  44. From Single Level to Hierarchy • What if problem can’t be decomposed like this? • Inspiration from human problem solving. • Use hierarchical decomposition • Decompose problem on multiple levels. • Solutions from lower levels = basic building blocks for constructing solutions on the current level. • Bottom-up hierarchical problem solving.

  45. Hierarchical Decomposition • (Diagram: a car decomposed into the engine, braking system, electrical system, and fuel system; components such as valves and the ignition system appear at the next level.)

  46. 3 Keys to Hierarchy Success • Proper decomposition • Must decompose problem on each level properly. • Chunking • Must represent & manipulate large order solutions. • Preservation of alternative solutions • Must preserve alternative partial solutions (chunks).

  47. Hierarchical BOA (hBOA) • Pelikan & Goldberg (2001) • Proper decomposition • Use BNs as in BOA. • Chunking • Use local structures in BNs. • Preservation of alternative solutions • Restricted tournament replacement (niching).

  48. Local Structures in BNs • Look at one conditional dependency. • 2^k probabilities for k parents. • Why not use more powerful representations for conditional probabilities? • (Diagram: a dependency of X3 on X1 and X2.)

  49. Local Structures in BNs • Look at one conditional dependency. • 2^k probabilities for k parents. • Why not use more powerful representations for conditional probabilities? • (Diagram: a decision tree over X1 and X2 storing the conditional probabilities for X3, with leaves 15%, 44%, and 26%.)
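A hedged Python sketch of such a local structure: a small decision tree that stores one probability per leaf instead of a full table with 2^k entries. The split layout and the use of the probabilities 15%, 44%, and 26% for X3 are a guess at the figure on this slide, for illustration only.

class Leaf:
    def __init__(self, p_one):
        self.p_one = p_one                      # P(variable = 1) at this leaf

class Split:
    def __init__(self, var, if_zero, if_one):
        self.var, self.if_zero, self.if_one = var, if_zero, if_one

def lookup(node, assignment):
    # Walk the tree using already-known parent values.
    while isinstance(node, Split):
        node = node.if_one if assignment[node.var] == 1 else node.if_zero
    return node.p_one

# Example: P(X3 = 1) depends on X1, and on X2 only when X1 = 1.
tree_x3 = Split("X1", Leaf(0.15), Split("X2", Leaf(0.44), Leaf(0.26)))
print(lookup(tree_x3, {"X1": 1, "X2": 0}))      # -> 0.44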

  50. Restricted Tournament Replacement • Used in hBOA for niching. • Insert each new candidate solution x like this: • Pick random subset of original population. • Find solution y most similar to x in the subset. • Replace y by x if x is better than y.
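A minimal Python sketch of this insertion rule, using Hamming distance as the similarity measure for binary strings; the function names and the in-place update of parallel population/fitness lists are illustrative choices.

import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, fitnesses, new_solution, new_fitness, window_size):
    # Pick a random subset (window) of the original population.
    window = random.sample(range(len(population)), window_size)
    # Find the window member most similar to the new solution.
    closest = min(window, key=lambda i: hamming(population[i], new_solution))
    # Replace it only if the new solution is better.
    if new_fitness > fitnesses[closest]:
        population[closest] = new_solution
        fitnesses[closest] = new_fitness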
