
Combining Genetics, Learning and Parenting

Combining Genetics, Learning and Parenting. Michael Berger. Based on: “When to Apply the Fifth Commandment: The Effects of Parenting on Genetic and Learning Agents”, Michael Berger and Jeffrey S. Rosenschein, submitted to AAMAS 2004.



Presentation Transcript


  1. Combining Genetics, Learning and Parenting • Michael Berger • Based on: “When to Apply the Fifth Commandment: The Effects of Parenting on Genetic and Learning Agents”, Michael Berger and Jeffrey S. Rosenschein, submitted to AAMAS 2004

  2. Abstract Problem • Hidden state • Metric defined over state space • Condition C1: When state changes, it is only to an “adjacent” state • Condition C2: State changes occur at a low, but positive rate

  3. The Environment • 2D Grid • Food patches • Probabilistic • Unlimited (reduces analytical complexity) • May move to adjacent squares (retains structure) • Cyclic (reduces analytical complexity) (Figure: grid with food patches; cells labelled 0.2, 0.5 and 0.7)
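A minimal sketch of movement on a cyclic grid, where stepping off one edge re-enters from the opposite edge. The function name `step`, the `(x, y)` position tuple and the direction-to-offset convention (NORTH decreases y) are assumptions made for illustration; the slides only state that the grid is cyclic and list the five actions on the next slide.

```python
# Offsets per action. The orientation of the axes is an assumption.
MOVES = {
    "NORTH": (0, -1),
    "EAST": (1, 0),
    "SOUTH": (0, 1),
    "WEST": (-1, 0),
    "HALT": (0, 0),
}


def step(position, action, width, height):
    """Return the new (x, y) position after one action on a cyclic grid."""
    x, y = position
    dx, dy = MOVES[action]
    # Python's % always returns a non-negative result for a positive modulus,
    # so moving west from x = 0 wraps around to x = width - 1.
    return ((x + dx) % width, (y + dy) % height)
```

The same wrap-around rule could be reused for a food patch that moves to an adjacent square.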

  4. Agent Definitions • Reward = Food Presence (0 or 1) • Perception = <Position, Food Presence> • Action ∈ {NORTH, EAST, SOUTH, WEST, HALT} • Memory = <<Per, Ac>, …, <Per, Ac>, Per> • Memory length = |Mem| = No. of elements in memory • No. of possible memories = $(2 \cdot W \cdot H)^{|Mem|} \cdot 5^{|Mem|-1}$, where W and H are the grid width and height • MAM - Memory-Action Mapper • Table • One entry for every possible memory • ASF - Action-Selection Filter
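As a worked check of the memory-count formula, the sketch below evaluates it directly: each of the |Mem| perceptions has 2 * width * height possible values (position times food presence), and each of the |Mem| - 1 remembered actions has 5. The function name `count_memories` is hypothetical.

```python
def count_memories(width, height, mem_len):
    """Number of distinct memories, and so of MAM table entries."""
    perceptions = 2 * width * height  # position x food presence (0 or 1)
    actions = 5                       # NORTH, EAST, SOUTH, WEST, HALT
    return perceptions ** mem_len * actions ** (mem_len - 1)
```

On a 20 x 20 grid, a memory length of 1 gives 800 entries, while a memory length of 2 already gives 3,200,000, which shows how quickly the table grows.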

  5. Genetic Algorithm (I) • Algorithm on a complete population, not on a single agent • Requires introduction of generations • Every generation consists of a new group of agents • Each agent is created at the beginning of a generation, and is terminated at its end • Agent’s life cycle: Birth --> Run (foraging) --> Possible matings --> Death

  6. Genetic Algorithm (II) • Each agent carries a gene sequence • Each gene has a key (memory) and a value (action) • A given memory determines the resultant action • Gene sequence remains constant during the life-time of an agent • Gene sequence is determined at the mating stage of an agent’s parents

  7. Genetic Algorithm (III) • Mating consists of two stages: • Selection stage - Determining mating rights. Should be performed according to two principles: • Survival of the fittest (as indicated in performance during the life-time) • Preservation of genetic variance • Offspring creation stage: • One or more parents create one or more offspring • Offspring inherit some combination of parents’ gene sequence • Each of the stages has many variants

  8. Genetic Algorithm Variant • Selection: • Will be discussed later. • Offspring creation: • Two parents mate and create two offspring • Gene sequences of parents are aligned against one another, and then two processes occur: • Random crossover • Random mutation • Resultant pair of gene sequences are inherited by the offspring (one by each offspring).

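  9. Genetic Inheritance (Figure: two aligned five-gene parent sequences pass through crossover and then mutation; each pair below is written left sequence / right sequence, and * marks a mutated value) • Parent1 / Parent2: K1,V1 / K1,U1; K2,V2 / K2,U2; K3,V3 / K3,U3; K4,V4 / K4,U4; K5,V5 / K5,U5 • After crossover (between genes 2 and 3, and between genes 4 and 5): K1,V1 / K1,U1; K2,V2 / K2,U2; K3,U3 / K3,V3; K4,U4 / K4,V4; K5,V5 / K5,U5 • After mutation (the two offspring): K1,V1 / K1,U1; K2,V2 / K2,U2*; K3,U3* / K3,V3; K4,U4 / K4,V4; K5,V5 / K5,U5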

  10. Genetic Agent • MAM: • Every entry is considered a gene • First column - Possible memory (key) • Second column - Action to take (value) • No changes after creation • Parameters: • Memory length • Crossover probability for each gene pair • Mutation probability for each gene
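The offspring-creation stage can be sketched as follows. This is an illustrative reading, not the authors' code: a gene sequence is modelled as a dict from memory (key) to action (value), a crossover at a gene pair is assumed to swap which parent each offspring copies from that point onward, and a mutation is assumed to replace the action with a uniformly random one. The names `mate`, `p_cross` and `p_mut` are hypothetical.

```python
import random

ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST", "HALT")


def mate(parent1, parent2, p_cross, p_mut, rng):
    """Create two offspring gene sequences from two aligned parents."""
    keys = sorted(parent1)  # align both sequences on the same key order
    child1, child2 = {}, {}
    swapped = False
    for key in keys:
        # Assumed crossover rule: each gene pair may start a crossover,
        # after which the offspring copy from the opposite parent.
        if rng.random() < p_cross:
            swapped = not swapped
        first, second = parent1[key], parent2[key]
        if swapped:
            first, second = second, first
        child1[key], child2[key] = first, second
    for child in (child1, child2):
        for key in keys:
            # Assumed mutation rule: replace the action by a random one.
            if rng.random() < p_mut:
                child[key] = rng.choice(ACTIONS)
    return child1, child2
```

With both probabilities at 0 the offspring are exact copies of the parents, which is a convenient sanity check before trying small positive values.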

  11. Learning Algorithm • Reinforcement Learning type algorithm: • After performing an action, agents receive a signal informing them how good their choice of action was (in this case, the reward) • Selected algorithm: Q-learning with Boltzmann exploration

  12. Basic Q-Learning (I) • Definitions: • $\gamma$ - Discount factor (non-negative, less than 1) • $r_j$ - Reward at round j • $R_n = \sum_{j \ge n} \gamma^{j-n} r_j$ - Rewards’ discounted sum at round n • Q-learning attempts to maximize the expected discounted sum of rewards of an agent, $E[R_n]$, as a function of any given memory at any round n
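A small sketch of the discounted sum of rewards for a finite reward sequence, assuming the sum starts at the current round with weight 1 and that each later reward is multiplied by one more factor of the discount. The function name `discounted_sum` is hypothetical.

```python
def discounted_sum(rewards, gamma):
    """Sum of rewards[k] * gamma**k, computed from the last reward backward."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, three consecutive rewards of 1 with a discount factor of 0.5 give 1 + 0.5 + 0.25 = 1.75.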

  13. Basic Q-Learning (II) • Q(s,a) - “Q-value”. The expected discounted sum of future rewards for an agent when its memory is s, it selects action a, and it follows an optimal policy thereafter. • Q(s,a) is updated every time an agent selects action a when at memory s. After action execution, the agent receives reward r and holds memory s’. Q(s,a) is updated as follows: $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a')\,]$, where $\alpha$ is the learning rate and $\gamma$ is the discount factor
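The standard tabular Q-learning update can be sketched as below. The table layout (a dict of per-memory action dicts initialised to 0) and the names `make_q_table` and `q_update` are assumptions for illustration; `alpha` is the learning rate and `gamma` the discount factor.

```python
from collections import defaultdict

ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST", "HALT")


def make_q_table():
    """Q-table whose rows are created on first use, with all Q-values 0."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-learning update for the transition (s, a, r, s_next)."""
    best_next = max(q[s_next].values())  # value of the best action at s_next
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (r + gamma * best_next)
    return q[s][a]
```

Starting from an all-zero table, a reward of 1 with a learning rate of 0.2 moves the updated entry to 0.2, and later updates propagate that value back to preceding memories through the `best_next` term.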

  14. Basic Q-Learning (III) • Q(s,a) values can be stored in different forms: • Neural network • Table (nicknamed a Q-table) • When saved as a Q-table, each row corresponds to a possible memory s, and each column to a possible action a • When an agent holds memory s, it should simply select the action a that maximizes Q(s,a), right? WRONG!

  15. Boltzmann Exploration (I) • Full exploitation of a Q-value might hide other, better Q-values • Exploration of Q-values needed, at least in early stages • Boltzmann exploration: the probability of selecting action $a_i$ at memory s is $P(a_i \mid s) = e^{Q(s,a_i)/t} / \sum_j e^{Q(s,a_j)/t}$

  16. Boltzmann Exploration (II) • t - An annealing temperature • At round n, t is given by the temperature annealing function • t decreases ==> exploration decreases, exploitation increases • For a given s, the probability of selecting the action with the best Q-value approaches 1 as n increases • Variant here uses a freezing temperature: • Freezing temperature - when t is below it, exploration is replaced by full exploitation
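A hedged sketch of Boltzmann action selection with an annealed temperature and a freezing threshold. The schedule 5 * 0.999^n and the threshold 0.2 appear among the constant values on slide 31; treating them as the annealing function and the freezing temperature is an assumption here, as are all function and parameter names.

```python
import math
import random


def boltzmann_probs(q_values, t):
    """Map {action: Q-value} to {action: selection probability} at temperature t."""
    top = max(q_values.values())
    # Subtracting the maximum leaves the probabilities unchanged
    # and avoids overflow in exp() at low temperatures.
    weights = {a: math.exp((q - top) / t) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def temperature(n, t0=5.0, decay=0.999):
    """Assumed annealing schedule: t0 * decay**n at round n."""
    return t0 * decay ** n


def select_action(q_values, n, rng, freezing=0.2):
    """Explore with Boltzmann probabilities until the temperature freezes."""
    t = temperature(n)
    if t < freezing:
        # Below the freezing temperature: full exploitation.
        return max(q_values, key=q_values.get)
    probs = boltzmann_probs(q_values, t)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

At high temperatures the probabilities are close to uniform; as t falls, probability mass concentrates on the action with the largest Q-value.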

  17. Learning Agent • MAM: • A Q-table (dynamic) • Parameters: • Memory length • Learning rate • Rewards’ discount factor • Temperature annealing function • Freezing temperature

  18. Parenting Algorithm • No classical “parenting” algorithm exists, so parenting needs to be simulated • Selected algorithm: Monte-Carlo (another Reinforcement Learning type algorithm)

  19. Monte-Carlo (I) • Some similarity to Q-learning: • A table (nicknamed an “MC-table”) stores values (“MC-values”) that describe how good it is to take action a given memory s • Table dictates a policy of action-selection • Major differences from Q-learning: • Table isn’t modified after every round, but only after episodes of rounds (in our case, a generation) • Q-values and MC-values have different meanings

  20. Monte-Carlo (II) • “Off-line” version of Monte-Carlo: • After completing an episode (generation) where one table has dictated the action-selection policy, a new, second table is constructed from scratch to evaluate how good any action a is for a given memory s • Second table will dictate policy in the next episode (generation) • Equivalent to considering the second table as being built during the current episode, as long as it isn’t used in the current episode

  21. Monte-Carlo (III) • MC(s,a) is defined as the average of all rewards received after memory s was encountered and action a was selected • What if (s,a) was encountered more than once? • “Every-visit” variant: • The average of all subsequent rewards is calculated for each occurrence of (s,a) • MC(s,a) is the average of all calculated averages

  22. Monte-Carlo (IV) • “Every-visit” variant more suitable than “first-visit” variant (where only the first encounter with (s,a) counts) • Environment can change a lot since the first encounter with (s,a) • Exploration variants not used here • For a given memory s, action a with the highest MC-value is selected • Full exploitation here because we have the experience of the previous episode of rounds
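The every-visit computation described on the Monte-Carlo slides can be sketched as follows. An episode is modelled as a list of `(memory, action, reward)` triples, and "subsequent rewards" is assumed to include the reward that directly follows the action; that reading, the default value 0 for unseen pairs, and the names `build_mc_table` and `greedy_action` are assumptions.

```python
from collections import defaultdict


def build_mc_table(episode):
    """Build an every-visit MC-table from one finished episode."""
    samples = defaultdict(list)
    total, count = 0.0, 0
    # Walk backward so the running total holds all rewards from this round on.
    for memory, action, reward in reversed(episode):
        total += reward
        count += 1
        samples[(memory, action)].append(total / count)
    # MC(s, a) is the average of the per-occurrence averages.
    return {key: sum(vals) / len(vals) for key, vals in samples.items()}


def greedy_action(mc_table, memory, actions):
    """Full exploitation: pick the action with the highest MC-value."""
    return max(actions, key=lambda a: mc_table.get((memory, a), 0.0))
```

Because the table is built only after the episode ends, it matches the "off-line" usage in which the new table first dictates policy in the next generation.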

  23. Parenting Agent • MAM: • An MC-table (doesn’t matter if dynamic or static) • Dictates action-selection for offspring only • ASF: • Selects between the actions suggested by both parents with equal chance • Parameters: • Memory length

  24. Complex Agent (I) • Contains a genetic agent, a learning agent and a parenting agent in a subsumption architecture • Mating selection (deferred from slide 8) occurs among complex agents: • At a generation’s end, each agent’s average reward serves as its score • Agents receive mating rights according to score “strata” (determined by the scores’ average and standard deviation)

  25. Complex Agent (II) • Mediates between the inner agents and the environment • Perceptions passed directly to inner agents • Actions suggested by all inner agents passed through an ASF, which selects one of them • Parameters: • ASF’s prob. to select genetic action • ASF’s prob. to select learning action • ASF’s prob. to select parenting action
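A minimal sketch of the complex agent's action-selection filter, assuming the three probabilities sum to 1 and that each inner agent has already suggested an action for the current round. The dict-based interface and the name `asf_select` are hypothetical.

```python
import random


def asf_select(suggestions, probabilities, rng):
    """Pick one inner agent's suggested action according to the ASF probabilities.

    suggestions:   {"genetic": action, "learning": action, "parenting": action}
    probabilities: the same keys mapped to PGen, PLrn and PPar
    """
    names = list(suggestions)
    weights = [probabilities[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return suggestions[chosen]
```

Setting one probability to 1 reduces the complex agent to a pure genetic, learning or parenting agent, which is how the "pure" configurations in the results can be expressed.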

  26. Complex Agent - Mating (Figure: two complex agents of the previous generation and one of the current generation above the environment; each complex agent contains genetic, learning and parenting inner agents, each with its own MAM and memory, together with ASFs)

  27. Complex Agent - Perception (Figure: two complex agents of the previous generation and one of the current generation above the environment; each complex agent contains genetic, learning and parenting inner agents, each with its own MAM and memory, together with ASFs)

  28. Complex Agent - Action (Figure: two complex agents of the previous generation and one of the current generation above the environment; each complex agent contains genetic, learning and parenting inner agents, each with its own MAM and memory, together with ASFs; the inputs to the complex agent’s ASF are labelled PGen, PLrn and PPar)

  29. Experiment (I) • Measures: • Eating-rate: average reward for a given agent (throughout its generation) • BER: Best Eating-Rate (in a generation) • Framework: • 20 agents per generation • 9500 generations • 30000 rounds per generation • Dependent variable: • Success measure (Lambda) - Average of the BERs in the last 1000 generations
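The success measure can be written out directly from its definition. In this sketch each generation is a list of the agents' eating-rates; the function name `success_measure` and the `window` parameter are hypothetical.

```python
def success_measure(generations, window=1000):
    """Average of the Best Eating-Rates over the last `window` generations."""
    # BER of a generation is the highest eating-rate among its agents.
    bers = [max(eating_rates) for eating_rates in generations[-window:]]
    return sum(bers) / len(bers)
```

With the framework above, `generations` would hold 9500 lists of 20 eating-rates each, and only the final 1000 lists would contribute to Lambda.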

  30. Experiment (II) • Environment: • Grid: 20 x 20 • A single food patch, 5 x 5 in size (Figure: the 5 x 5 food patch, with one cell labelled 0.8, eight cells labelled 0.4 and sixteen cells labelled 0.2)

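  31. Experiment (III) • Constant values: • Genetic agent: memory length = 1, crossover probability = 0.02, mutation probability = 0.005 • Learning agent: memory length = 1, learning rate = 0.2, rewards’ discount factor = 0.95, temperature annealing function = $5 \cdot 0.999^n$, freezing temperature = 0.2 • Parenting agent: memory length = 1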

  32. Experiment (IV) • Independent variables: • Complex agent parameters: ASF probabilities (111 combinations) • Environment parameter: “Movement Probability”, the probability that in a given round the food patch moves in a random direction ($0, 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}$) • One run for each combination of values

  33. Results: Static Environment • Best combination: • Genetic-Parenting hybrid (PLrn = 0) • PGen > PPar • Pure genetics doesn’t perform well • GA converges more slowly if not assisted by learning or parenting • Pure parenting performs poorly • For a given PPar, success improves as PLrn decreases (Graph for movement prob. 0)

  34. Results: Low Dynamic Rate • Best combination: • Genetic-Learning-Parenting hybrid • PLrn > PGen + PPar • PPar >= PGen • Pure parenting performs poorly (Graph for movement prob. $10^{-4}$)

  35. Results: High Dynamic Rate • Best combination: • Pure learning (PGen = 0, PPar = 0) • Pure parenting performs poorly • Parenting loses effectiveness: • Non-parenting agents have better success (Graph for movement prob. $10^{-2}$)

  36. Conclusions • Pure parenting doesn’t work • Agent algorithm A will be defined as an action-augmentor of agent algorithm B if: • A and B are always used for receiving perceptions • B is applied for executing an action in most steps • A is applied for executing an action in at least 50% of the other steps • In a static environment (C1 + ~C2), parenting helps when used as an action-augmentor for genetics • In slowly changing environments (C1 + C2), parenting helps when used as an action-augmentor for learning • In quickly changing environments (C1 only), parenting doesn’t work - pure learning is best
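The action-augmentor definition can be expressed as a predicate on ASF probabilities. This is one possible reading, under the assumptions that "most steps" means a selection probability above 0.5 and that "the other steps" are those in which B is not applied; the name `is_action_augmentor` is hypothetical.

```python
def is_action_augmentor(p_a, p_b):
    """True if A (selected with probability p_a) augments B (probability p_b)."""
    b_acts_in_most_steps = p_b > 0.5
    a_covers_half_of_rest = p_a >= 0.5 * (1.0 - p_b)
    return b_acts_in_most_steps and a_covers_half_of_rest
```

Under this reading, the best static-environment combination from the backup slides (PGen = 0.7, PPar = 0.3) has parenting augmenting genetics, and the best low-dynamic-rate combination (PLrn = 0.9, PPar = 0.07) has parenting augmenting learning.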

  37. Bibliography (I) • Genetic Algorithm: • R. Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, 1997. • H.G. Cobb and J.J. Grefenstette. Genetic algorithms for tracking changing environments. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 523-530, San Mateo, 1993. • Q-Learning: • T.W. Sandholm and R.H. Crites. Multiagent reinforcement learning in the iterated prisoner’s dilemma. Biosystems, 37: 147-166, 1996. • Monte-Carlo methods, Q-Learning, Reinforcement Learning: • R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 1998.

  38. Bibliography (II) • Genetic-Learning combinations: • G. E. Hinton and S. J. Nowlan. How learning can guide evolution. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 447-454. Addison-Wesley, 1996. • T.D. Johnston. Selective costs and benefits in the evolution of learning. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 315-358. Addison-Wesley, 1996. • M. Littman. Simulations combining evolution and learning. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 465-477. Addison-Wesley, 1996. • G. Mayley. Landscapes, learning costs and genetic assimilation. Evolutionary Computation, 4(3): 213-234, 1996.

  39. Bibliography (III) • Genetic-Learning combinations (cont.): • S. Nolfi, J.L. Elman and D. Parisi. Learning and evolution in neural networks. Adaptive Behavior, 3(1): 5-28, 1994. • S. Nolfi and D. Parisi. Learning to adapt to changing environments in evolving neural networks. Adaptive Behavior, 5(1): 75-98, 1997. • D. Parisi and S. Nolfi. The influence of learning on evolution. In Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 419-428. Addison-Wesley, 1996. • P.M. Todd and G.F. Miller. Exploring adaptive agency II: Simulating the evolution of associative learning. In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 306-315, San Mateo, 1991.

  40. Bibliography (IV) • Exploitation vs. Exploration: • D. Carmel and S. Markovitch. Exploration strategies for model-based learning in multiagent systems. Autonomous Agents and Multi-agent Systems, 2(2): 141-172, 1999. • Subsumption architecture: • R.A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1): 14-23, March 1986.

  41. Backup - Qualitative Data

  42. Qual. Data: Mov. Prob. 0 (Graph; labelled points: Pure Parenting, Pure Genetics, Pure Learning, and Best: (0.7, 0, 0.3))

  43. Qual. Data: Mov. Prob. $10^{-4}$ (Graph; labelled points: Pure Parenting, Pure Learning, and Best: (0.03, 0.9, 0.07))

  44. Qual. Data: Mov. Prob. $10^{-2}$ (Graph; labelled points: Pure Parenting, (0.09, 0.9, 0.01), and Best: Pure Learning)
