
Evolving Mutation Operators for EA-Based Neural Networks in Reinforcement Learning

This work explores the use of evolutionary algorithms (EAs) to evolve mutation operators for neural networks within reinforcement learning frameworks. Focusing on both online and offline learning, we show how adaptive mutation strategies can make the learning process more efficient, particularly in continuous environments with effectively infinite state spaces. The approach uses feed-forward and recurrent neural networks to approximate value functions, combining supervised and unsupervised methods for improved performance. This research aims to pave the way for AI systems capable of faster, more effective learning.





Presentation Transcript


  1. Mutation Operator Evolution for EA-Based Neural Networks By Ryan Meuth

  2. Reinforcement Learning • [Diagram: the agent-environment loop. The environment supplies a state and a reward to the agent; the agent maintains a state value estimate and selects an action through its policy, which acts back on the environment.]

  3. Reinforcement Learning • Good for On-Line learning where little is known about the environment • Easy to Implement in Discrete Environments • A value estimate can be stored for each state • Given infinite time, an optimal policy is guaranteed • Hard to Implement in Continuous Environments • Infinite States! The Value Function must be estimated • Neural Networks can be used for function approximation (see the sketch below)
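
A minimal sketch of the tabular case described on this slide, using a hypothetical 10-state chain environment and a TD(0) update; with discrete states the value estimate is literally one table entry per state, which is exactly what breaks down when the state space is continuous. Names and constants here are illustrative, not from the presentation.

    import random

    # Tabular TD(0): one value estimate stored per discrete state.
    ALPHA, GAMMA = 0.1, 0.9                 # learning rate, discount factor
    values = {s: 0.0 for s in range(10)}    # toy 10-state chain

    def td_update(state, reward, next_state):
        """Move V(state) toward the one-step bootstrapped target."""
        target = reward + GAMMA * values[next_state]
        values[state] += ALPHA * (target - values[state])

    # Toy episodes: reward 1 only for transitions into the final state.
    for _ in range(1000):
        s = random.randrange(9)
        td_update(s, 1.0 if s + 1 == 9 else 0.0, s + 1)

    print(values)   # values rise toward the rewarding end of the chain

In a continuous environment this dictionary would need an entry per visited real-valued state, so the table is replaced by a function approximator such as the networks on the following slides.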

  4. Neural Network Overview • Feed Forward Neural Network • Based on biological theories of neuron operation

  5. Feed-Forward Neural Network
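
The slide's figure is not reproduced in the transcript; below is a minimal, hypothetical sketch of what a feed-forward pass computes, assuming one tanh hidden layer and randomly initialized weights (layer sizes are illustrative).

    import numpy as np

    rng = np.random.default_rng(0)

    # One hidden layer; weights drawn at random for illustration.
    W1 = rng.normal(size=(4, 3))    # input (3)  -> hidden (4)
    b1 = np.zeros(4)
    W2 = rng.normal(size=(1, 4))    # hidden (4) -> output (1)
    b2 = np.zeros(1)

    def forward(x):
        """Propagate an input vector through the network, layer by layer."""
        h = np.tanh(W1 @ x + b1)    # hidden activations
        return W2 @ h + b2          # linear output, e.g. a value estimate

    print(forward(np.array([0.5, -0.2, 1.0])))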

  6. Recurrent Neural Network
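
Again the figure itself is absent from the transcript; this is a minimal sketch of an Elman-style recurrent cell, where the hidden state feeds back into itself so the network can model the time-series problems used later in the testbed. All sizes and scales are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    W_in  = rng.normal(size=(4, 1)) * 0.5   # input  -> hidden
    W_rec = rng.normal(size=(4, 4)) * 0.5   # hidden -> hidden (the recurrence)
    W_out = rng.normal(size=(1, 4)) * 0.5   # hidden -> output

    def predict_sequence(xs):
        """Step through a time series, carrying hidden state between steps."""
        h = np.zeros(4)
        outputs = []
        for x in xs:
            h = np.tanh(W_in @ np.array([x]) + W_rec @ h)
            outputs.append(float(W_out @ h))
        return outputs

    print(predict_sequence([0.0, 0.5, 1.0, 0.5, 0.0]))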

  7. Neural Network Overview • Traditionally used with Error Back-Propagation • BP uses Samples to Generalize to Problem • Few “Unsupervised” Learning Methods • Problems with No Samples: On-Line Learning • Conjugate Reinforcement Back Propagation
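
For contrast with the evolutionary approach, here is the kind of supervised update back-propagation performs, reduced to a single linear neuron (the delta rule). This is a generic illustration of "using samples to generalize", not the Conjugate Reinforcement Back-Propagation method named on the slide.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=2)                  # weight and bias of one neuron
    samples = [(np.array([x, 1.0]), 3.0 * x + 1.0)
               for x in np.linspace(-1, 1, 20)]

    for _ in range(200):
        for x, t in samples:
            y = w @ x                       # forward pass
            w -= 0.05 * (y - t) * x         # gradient step on squared error

    print(w)    # converges toward [3, 1]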

  8. EA-NN • Works as both a Supervised and an Unsupervised Learning Method • Uses the weight set as the genome of an individual • Fitness Function is Mean-Squared Error over the target function • Mutation Operator is a sample from a Gaussian Distribution • The standard Gaussian mutation operator may not be optimal (a sketch of the loop follows)
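
A minimal sketch of the EA-NN loop as described: the genome is the flattened weight vector of a tiny 1-3-1 network, fitness is MSE against a toy target function, and mutation adds Gaussian noise to every weight. Population size, mutation scale, and the target function are illustrative assumptions, not the presentation's settings.

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.linspace(-1, 1, 20)
    target = np.sin(np.pi * xs)             # toy target function

    def predict(w, x):
        """Tiny 1-3-1 network; w is the flattened weight genome."""
        W1, b1, W2, b2 = w[0:3], w[3:6], w[6:9], w[9]
        return W2 @ np.tanh(W1 * x + b1) + b2

    def mse(w):
        return float(np.mean([(predict(w, x) - t) ** 2
                              for x, t in zip(xs, target)]))

    pop = [rng.normal(size=10) for _ in range(10)]
    for gen in range(100):
        children = [p + rng.normal(scale=0.1, size=10) for p in pop]  # Gaussian mutation
        pop = sorted(pop + children, key=mse)[:10]                    # truncation survival

    print("best MSE:", mse(pop[0]))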

  9. Uh… Why? • Could improve EA-NN efficiency • Faster On-Line Learning • A Revamped Tool for Reinforcement Learning • Smarter Robots • Why Use an EA? • Knowledge-Independent

  10. Experimental Implementation • First Tier – Genetic Programming • Individual is a Parse Tree representing a Mutation Operator • Fitness is the Inverse of the Sum of MSEs from the EA Testbed • Second Tier – EA Testbed • 4 EAs, spanning 2 classes of problems • 2 Feed-Forward Non-Linear Approximations • 1 High-Order, 1 Low-Order • 2 Recurrent Time-Series Predictions • 1 Time-Delayed, 1 Not Time-Delayed • (A sketch of the two-tier evaluation follows)
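
A minimal sketch of that two-tier structure: the outer tier scores a candidate mutation operator by handing it to inner EAs and summing the errors they end up with. The operator is represented here as a plain Python function standing in for a GP parse tree, and the two toy testbed problems are placeholders, not the four EA-NN testbeds from the slide.

    import numpy as np

    rng = np.random.default_rng(0)

    def run_inner_ea(mutate, fitness, genome_len=5, gens=30):
        """Inner tier: a small EA whose per-weight mutation is supplied from outside."""
        pop = [rng.normal(size=genome_len) for _ in range(10)]
        for _ in range(gens):
            children = [np.array([mutate(w) for w in p]) for p in pop]
            pop = sorted(pop + children, key=fitness)[:10]
        return fitness(pop[0])          # best error found

    def operator_score(mutate):
        """Outer tier: total error across the testbed (lower is better)."""
        testbed = [lambda w: float(np.sum((w - 1.0) ** 2)),   # toy problem 1
                   lambda w: float(np.sum((w + 2.0) ** 2))]   # toy problem 2
        return sum(run_inner_ea(mutate, f) for f in testbed)

    gaussian = lambda w: w + rng.normal(scale=0.1)            # the baseline operator
    print("Gaussian operator score:", operator_score(gaussian))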

  11. GP Implementation • Functional Set: {+, -, *, /} • Terminal Set: • Weight to be Modified • Random Constant • Uniform Random Variable • Over-Selection: 80% of Parents from top 32% • Rank-Based Survival • Initialized by Grow Method (Max Depth of 8) • Fitness: 1000/(AvgMSE) - num_nodes • P(Recomb) = 0.5; P(Mutation) = 0.5 • Repair Function • 5 runs, 100 generations each • Steady State: Population of 1000 individuals, 20 children per generation (parse-tree sketch below)
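
A sketch of the parse-tree representation with the functional and terminal sets listed above, grow-initialized to a maximum depth of 8. Selection, recombination, and the slide's repair function are omitted; the protected division is an assumption about how invalid trees are avoided.

    import random

    random.seed(0)
    FUNCS = ['+', '-', '*', '/']
    TERMS = ['w', 'const', 'rand']      # weight, random constant, uniform variable

    def grow(depth=8):
        """Grow method: pick function or terminal nodes until max depth."""
        if depth == 0 or random.random() < 0.3:
            t = random.choice(TERMS)
            return ('const', random.uniform(-1, 1)) if t == 'const' else (t,)
        return (random.choice(FUNCS), grow(depth - 1), grow(depth - 1))

    def evaluate(node, w):
        """Apply a mutation-operator tree to a single weight w."""
        op = node[0]
        if op == 'w':     return w
        if op == 'const': return node[1]
        if op == 'rand':  return random.uniform(-1, 1)
        a, b = evaluate(node[1], w), evaluate(node[2], w)
        if op == '+': return a + b
        if op == '-': return a - b
        if op == '*': return a * b
        return a / b if abs(b) > 1e-9 else 1.0     # protected division (assumed)

    tree = grow()
    print(evaluate(tree, 0.5))      # the mutated value for weight 0.5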

  12. EA-NN Implementation • Recombination: Multi-Point Crossover • Mutation: Provided by GP • Fitness: MSE over test function (minimize) • P(Recomb) = 0.5; P(Mutation) = 0.5; • Non-Generational: Population of 10 individuals, 10 children per generation • 50 Runs of 50 Generations.
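
A sketch of the multi-point crossover named above, applied to two weight-vector genomes. The slide does not say how many cut points are used; two is assumed here for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def multi_point_crossover(a, b, points=2):
        """Swap alternating segments between two parents at random cut points."""
        cuts = sorted(rng.choice(np.arange(1, len(a)), size=points, replace=False))
        child1, child2 = a.copy(), b.copy()
        swap, prev = False, 0
        for cut in list(cuts) + [len(a)]:
            if swap:
                child1[prev:cut] = b[prev:cut]
                child2[prev:cut] = a[prev:cut]
            swap, prev = not swap, cut
        return child1, child2

    p1, p2 = np.zeros(8), np.ones(8)
    print(multi_point_crossover(p1, p2))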

  13. Results • This is where results would go. • Single Uniform Random Variable: fitness ~380 • Evolved Individuals Observed So Far: fitness ~600 • Improvement! Just have to wait and see…

  14. Conclusions • I don’t know anything yet.

  15. Questions? Thank You!
