## BOA (Bayesian Optimization Algorithm)


**BOA (Bayesian Optimization Algorithm) for Dummies**
Hsuan Lee @ NTUEE

**References**
• Pelikan, M.: Hierarchical Bayesian Optimization Algorithm. StudFuzz 170, 31–48 (2005). // BOA
• Pelikan, M. and Goldberg, D. E.: Hierarchical Bayesian Optimization Algorithm. Studies in Computational Intelligence (SCI) 33, 63–90 (2006). // hBOA
• Cooper, G. F. and Herskovits, E. H. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.
• Heckerman, D., Geiger, D., and Chickering, D. M. (1994). Learning Bayesian networks: The combination of knowledge and statistical data. Technical Report MSR-TR-94-09, Microsoft Research, Redmond, WA.
• Friedman, N. and Goldszmidt, M. (1999). Learning Bayesian networks with local structure. In Jordan, M. I. (Ed.), Graphical Models, pp. 421–459. MIT Press, Cambridge, MA.

**Generating Offspring**
• Group reproduction (EDA): use a GROUP of fit chromosomes to build a model, then sample the model to generate an offspring. E.g. DSMGA(?) for spin glass, BOA.
• Asexual reproduction (mutation): use ONE fit chromosome and change it slightly to form an offspring. E.g. ES.
• Sexual reproduction (crossover): use a PAIR of fit chromosomes and take parts of each to form an offspring. E.g. sGA, DSMGA.

**Bayesian Optimization Algorithm**
• Pseudocode:

    Bayesian Optimization Algorithm (BOA)
    t ← 0;
    generate initial population P(0);
    while (not done) {
        SELECT a population of promising solutions S(t) from P(t);
        BUILD a Bayesian network (BN) B(t) from S(t);
        SAMPLE B(t) to generate offspring O(t);
        incorporate O(t) into P(t);  // REPLACEMENT
        t ← t + 1;
    }

**Bayesian Optimization Algorithm**
(Flow diagram: the BOA cycle from initialization until termination.)

**Learning Bayesian Network**
• A Bayesian network (BN) is a directed acyclic graph (DAG).
• An edge A → B in a Bayesian network implies that the occurrence of A has an effect on the probability of B's occurrence. A is a parent of B; B is conditionally dependent on A.
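The BOA loop in the pseudocode above can be sketched as a short, runnable Python program. For brevity this sketch uses the empty network (no edges, so every bit is modeled independently), which reduces BOA to a univariate EDA; a real BOA would additionally learn edge structure with a greedy search. The OneMax fitness and all parameter values are illustrative choices, not from the slides.

```python
import random

# Minimal sketch of the BOA loop on OneMax (fitness = number of 1-bits).
# The "Bayesian network" here is the empty network: each bit is modeled
# by its marginal probability alone.
N_BITS, POP, GENS = 20, 100, 60

def fitness(x):
    return sum(x)

def boa_onemax(seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
    for _ in range(GENS):
        # SELECT: truncation selection of the fitter half
        selected = sorted(pop, key=fitness, reverse=True)[:POP // 2]
        # BUILD: marginal probability of a 1 at each bit position
        p = [sum(x[i] for x in selected) / len(selected) for i in range(N_BITS)]
        # SAMPLE: generate offspring from the model
        offspring = [[int(rng.random() < p[i]) for i in range(N_BITS)]
                     for _ in range(POP // 2)]
        # REPLACEMENT: offspring replace the worse half
        pop = selected + offspring
    return max(pop, key=fitness)

best = boa_onemax()
print(fitness(best))
```

With these settings the marginals quickly converge toward 1 and the best individual approaches the all-ones string.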
• Two nodes are assumed to be conditionally independent if there is no edge between them.

**Learning Bayesian Network**
(Figure: example network with nodes Sprinkler, Rain, and Wet Grass.)

**Learning Bayesian Network**
• Learning a Bayesian network from data:
• Structure (B): to learn the structure of a BN, we need
  • a scoring metric (or a set of scoring metrics) on structures, and
  • a search procedure.
• Parameters (Θ, θ): given the structure of a BN, learning the parameters is straightforward: maximum likelihood (ML).
• Learning parameters is easy, but learning the best BN structure is NP-complete.

**Learning Bayesian Network**
• Scoring metrics: evaluations of a BN structure.
  • Bayesian metrics determine the likelihood of a structure given the observed data and some prior knowledge. E.g. the Bayesian Dirichlet (BD) metric.
  • Minimum description length (MDL) metrics evaluate the structure according to the number of bits required to store the model and the data compressed according to the model. E.g. the Bayesian Information Criterion (BIC).
• We'll come back to the scoring metrics later.

**Learning Bayesian Network**
• The search procedure for a good Bayesian network:
• It can be shown that finding the best Bayesian network is NP-complete. But the best BN is not required in BOA; a good BN is enough.
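To make the scoring idea concrete, here is a rough sketch of the score contribution of a single node under a BIC-style metric, in the conditional-entropy form used in the BOA literature (binary variables assumed; the function name and data layout are illustrative, not from the slides):

```python
import math
from collections import Counter

def bic_term(data, child, parents):
    """BIC contribution of one node: -N * H(child | parents), minus a
    complexity penalty of 2^|parents| * log2(N) / 2 for its table size."""
    N = len(data)
    # joint counts of (parent configuration, child value), and parent counts
    joint = Counter(tuple(row[p] for p in parents) + (row[child],) for row in data)
    margin = Counter(tuple(row[p] for p in parents) for row in data)
    # conditional entropy H(child | parents), in bits
    h = -sum(c / N * math.log2(c / margin[key[:-1]]) for key, c in joint.items())
    return -N * h - 2 ** len(parents) * math.log2(N) / 2

# Toy data: variable 1 exactly copies variable 0, so H(1 | 0) = 0 and
# only the complexity penalty remains.
data = [(0, 0), (1, 1)] * 4
print(bic_term(data, child=1, parents=(0,)))
print(bic_term(data, child=1, parents=()))
```

The greedy search accepts an edge addition when the improvement in such node terms outweighs the added penalty.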
• A greedy algorithm can be used to find a good BN:

    Greedy algorithm for network construction
    initialize the network B (an empty network, or the network of the last generation);
    done ← false;
    while (not done) {
        O ← all simple graph operations applicable to B;
        IF there exists an operation in O that improves score(B) THEN
            op ← the operation from O that improves score(B) the most;
            apply op to B;
        ELSE
            done ← true;
    }
    return B;

**Learning Bayesian Network**
• Simple graph operations on a Bayesian network:
  • edge addition
  • edge removal
  • edge reversal
(Figure: example network with nodes Wet Road, Rain, Car Crash, Radar, Speed.)

**Learning Bayesian Network**
• Learning parameters: maximum likelihood (ML).
(Figure: the same example network.)

**Sampling Bayesian Network**
• Generating offspring with a Bayesian network:
  • Given a Bayesian network with structure and parameters,
  • perform a topological sort on the Bayesian network, which is a directed acyclic graph (DAG), then
  • assign values to the new chromosome bit by bit in topologically sorted order, according to the parameters.
(Figure: the same example network.)

**Bayesian Optimization Algorithm**
(Flow diagram: the BOA cycle from initialization until termination.)

**Scoring Metrics Revisited**
• Minimum description length metrics evaluate the structure according to the number of bits required to store the model and the data compressed according to the model.
• Bayesian Information Criterion (BIC):

    BIC(B) = Σ_i ( -N·H(X_i | Π_i) - 2^|Π_i| · (log2 N)/2 )

  B: Bayesian structure; H(X_i | Π_i): conditional entropy of bit X_i given its parents Π_i; N: population size.

**Scoring Metrics Revisited**
• Bayesian metrics determine the likelihood of a structure given the observed data and some prior knowledge.
• Bayesian Dirichlet metric (BD):

    p(B | D, ξ) = p(B | ξ) · Π_i Π_j [ Γ(N'_ij) / Γ(N'_ij + N_ij) · Π_k Γ(N'_ijk + N_ijk) / Γ(N'_ijk) ]

  B: Bayesian structure; D: observed data; ξ: prior information; N_ijk: number of observed data that have value k on bit i with parent string j; N_ij = Σ_k N_ijk; N'_ijk: prior knowledge; Γ: the gamma function.

**Scoring Metrics Revisited**
• Bayesian Dirichlet metric (BD): in BOA, N'_ijk is set to 1 for all i, j, k.
• This reduced form of the BD metric is called the K2 metric.
• Physical meaning: all outcomes k of a given parental setup have the same probability at the beginning.
• The prior p(B | ξ) can be set either to a constant or to favor simpler structures.

**Scoring Metrics Revisited**
• Decomposability of scoring metrics:
  • In both metrics, the score of a structure only changes locally after performing a simple graph operation (by greedy search).
  • Only one particular term (one particular i) changes in the entire metric.
  • This largely simplifies the computation of the greedy search.

**Scoring Metrics Revisited**
• Problems exist in both scoring metrics:
  • In BIC, the model-complexity term confines the complexity of the Bayesian structure, resulting in oversimplified structures.
  • In BD, maximizing marginal probability leads to overfitting, resulting in overcomplicated structures.
• A combination of both can produce favorable results.

**hBOA: Hierarchical Bayesian Optimization Algorithm**

**Hierarchical BOA (hBOA)**
• The hierarchical version of BOA, used to solve nearly decomposable and hierarchical problems.
• Three important challenges must be considered when designing solvers for difficult hierarchical problems:
  • Decomposition → Bayesian network.
  • Chunking, i.e. representing partial solutions at each level compactly so that the algorithm can effectively process partial solutions of higher order → local structures.
  • Diversity maintenance → restricted tournament replacement (RTR).

**Hierarchical BOA (hBOA)**
• Local structure in hBOA: decision trees, versus a full conditional probability table.

**Hierarchical BOA (hBOA)**
• Benefits of building local structure:
  • Simplifies the model. In the case shown, 8 parameters have to be maintained for the full conditional probability table, but only 4 for the decision tree.
  • Generalizes the parental condition. In the case shown, with the full-table setting, an occurrence of ABCX = 1010 contributes nothing to predicting ABCX = 1110 in the future; with the local structure, 1010 DOES predict 1110.

**Hierarchical BOA (hBOA)**
• Scoring metrics: evaluations of a local structure B_i.
  • Bayesian metrics: in hBOA, the prior is set to favor simpler models.
  • Minimum description length metrics.

**Hierarchical BOA (hBOA)**
• Search procedure for local structure (decision tree):

    Greedy algorithm for local structure (decision tree) construction
    initialize the structure B_i (a one-node tree that represents all parental strings);
    Branch(B_i, Π_i);  // top-down
    return B_i;

    Branch(T, P)
    IF there exist elements in P THEN
        choose the π ∈ P that best splits the decision tree T;
        Left Child  ← Branch(T_{π=1}, P − π);
        Right Child ← Branch(T_{π=0}, P − π);
        // bottom-up
        IF the score given by T_{π=1} and T_{π=0} is worse than that of T THEN
            merge T_{π=1} and T_{π=0} back into T;
            Left Child ← Right Child ← NIL;
    return T;

**Hierarchical BOA (hBOA)**
• Search procedure for local structure (decision tree): demonstration.
(Figure: a decision tree successively split on A, C, and B.)

**Hierarchical BOA (hBOA)**
• Modified network construction for hBOA:

    Greedy algorithm for network construction with local structures
    initialize the network B (an empty network, or the network of the last generation);
    done ← false;
    while (not done) {
        O ← all simple graph operations applicable to B;
        optimize every structure in O with local structures;
        IF there exists an operation in O that improves score(B) THEN
            op ← the operation from O that improves score(B) the most;
            apply op to B;
        ELSE
            done ← true;
    }
    return B;

**Hierarchical BOA (hBOA)**
• Sampling a Bayesian network with local structure:
  • topological sort;
  • assign values according to the local structures, instead of full conditional probability tables.
(Figure: the same example network.)

**Some Thoughts about BOA/hBOA**
• BOA uses a causal Bayesian network to solve an acausal problem.
• Are arrows really needed?
• A "Markovian" Optimization Algorithm, MOA?
• Adopt the idea of the Bayesian Dirichlet metric.
(Figure: the example network with nodes Wet Road, Rain, Car Crash, Radar, Speed.)
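The sampling step described in the "Sampling Bayesian Network" slides (topological sort, then bit-by-bit assignment according to the parameters) can be sketched as follows; the example network, its probabilities, and all names are illustrative stand-ins, not from the slides:

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

def sample_bn(parents, cpt, rng):
    """Ancestral sampling: visit nodes in topological order and draw each
    bit from P(node = 1 | already-assigned parent values)."""
    # TopologicalSorter expects {node: predecessors}, so the parents map
    # can be passed in directly; parents are always assigned first.
    order = TopologicalSorter(parents).static_order()
    x = {}
    for node in order:
        cfg = tuple(x[p] for p in parents[node])
        x[node] = int(rng.random() < cpt[node][cfg])
    return x

# Illustrative three-node chain A -> B -> C.
parents = {"A": (), "B": ("A",), "C": ("B",)}
cpt = {
    "A": {(): 0.5},
    "B": {(0,): 0.1, (1,): 0.9},  # B tends to copy A
    "C": {(0,): 0.2, (1,): 0.8},  # C tends to copy B
}
print(sample_bn(parents, cpt, random.Random(1)))
```

In BOA each call produces one offspring chromosome; in hBOA the table lookup `cpt[node][cfg]` would be replaced by a walk down the node's decision tree.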