This work explores the implementation of evolutionary algorithms (EA) to optimize mutation operators for neural networks within reinforcement learning frameworks. Focusing on both online and offline learning, we demonstrate how adaptable mutation strategies can enhance efficiency in learning processes, particularly in complex environments with infinite states. Our techniques leverage feed-forward and recurrent neural networks to approximate value functions, combining supervised and unsupervised methods for improved performance. This research aims to pave the way for smarter AI solutions capable of faster, more effective learning.
Mutation Operator Evolution for EA-Based Neural Networks
Ryan Meuth
Reinforcement Learning
[Figure: agent-environment loop showing state, action, reward, value estimate, and policy]
Reinforcement Learning
• Good for on-line learning where little is known about the environment
• Easy to implement in discrete environments
  • A value estimate can be stored for each state
  • Given infinite time, convergence to the optimal policy is guaranteed
• Hard to implement in continuous environments
  • Infinite states! The value function must be estimated
  • Neural networks can be used for function approximation (a tabular-versus-approximation sketch follows this slide)
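To make the discrete-versus-continuous distinction concrete, here is a minimal sketch (not from the slides) contrasting a per-state value table with a function approximator. The TD(0) update rule, the learning-rate and discount values, and the `network.forward()` interface are illustrative assumptions.

```python
from collections import defaultdict

# Discrete case: a value estimate can be stored for every state in a table.
values = defaultdict(float)      # V(s), defaulting to 0.0 for unseen states
alpha, gamma = 0.1, 0.95         # assumed learning rate and discount factor

def td0_update(state, reward, next_state):
    """One tabular TD(0) backup: move V(s) toward the bootstrapped target."""
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])

# Continuous case: infinitely many states, so the table is replaced by a
# parameterised approximator such as a neural network (hypothetical interface).
def approx_value(state_vector, network):
    return network.forward(state_vector)
```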
Neural Network Overview
• Feed-forward neural network
• Based on biological theories of neuron operation (a minimal forward-pass sketch follows)
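As a concrete reference for a feed-forward network used as a function approximator, here is a minimal single-hidden-layer forward pass. The layer sizes, tanh activation, and random initialization are illustrative assumptions, not the architecture from the slides.

```python
import numpy as np

def feed_forward(x, weights, biases):
    """Forward pass through one tanh hidden layer and a linear output layer."""
    hidden = np.tanh(weights[0] @ x + biases[0])
    return weights[1] @ hidden + biases[1]

# Example: 2 inputs -> 3 hidden units -> 1 output, randomly initialized.
rng = np.random.default_rng(0)
W = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
b = [np.zeros(3), np.zeros(1)]
print(feed_forward(np.array([0.5, -0.2]), W, b))
```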
Neural Network Overview
• Traditionally trained with error back-propagation
• BP uses labeled samples to generalize to the problem (a minimal gradient-step sketch follows this slide)
• Few "unsupervised" learning methods exist
• Problems with no samples: on-line learning
• Conjugate Reinforcement Back-Propagation
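For contrast with the EA approach on the next slide, this is a minimal sketch of the sample-driven, BP-style supervised update for a single linear output unit. It is a plain gradient step on squared error, not full back-propagation or anything specific to Conjugate Reinforcement Back-Propagation; the learning rate and shapes are assumptions.

```python
import numpy as np

def sgd_step(weights, x, target, lr=0.01):
    """One gradient step on squared error for a single linear unit.
    Requires a labeled sample (x, target) -- the supervised setting BP depends on."""
    prediction = weights @ x
    gradient = 2.0 * (prediction - target) * x
    return weights - lr * gradient

# Example update with an assumed 3-input sample.
w = np.zeros(3)
w = sgd_step(w, np.array([1.0, 0.5, -0.2]), target=1.0)
```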
EA-NN
• Works as both a supervised and an unsupervised learning method
• Uses the network's weight set as the genome of an individual
• Fitness function is the mean-squared error over the target function
• Mutation operator perturbs weights with samples from a Gaussian distribution (a sketch of this baseline follows this slide)
• The Gaussian operator may not be the best possible mutation operator
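The EA-NN baseline can be sketched as follows: the genome is a flat weight vector, fitness is the mean-squared error over samples of the target function, and the standard mutation adds Gaussian noise to every weight. The sigma value and the `forward` callable are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def mse_fitness(weights, samples, forward):
    """Fitness to minimize: mean-squared error of the weight set over the target function."""
    return float(np.mean([(forward(x, weights) - y) ** 2 for x, y in samples]))

def gaussian_mutation(weights, sigma=0.1):
    """Baseline operator: add a Gaussian sample to every weight (sigma is assumed)."""
    return weights + rng.normal(0.0, sigma, size=weights.shape)
```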
Uh… Why?
• Could improve EA-NN efficiency
• Faster on-line learning
• A revamped tool for reinforcement learning
• Smarter robots
Why use an EA?
• Knowledge-independent
Experimental Implementation
• First tier: genetic programming
  • Each individual is a parse tree representing a mutation operator
  • Fitness is the inverse of the sum of MSEs from the EA testbed (a sketch of this evaluation loop follows this slide)
• Second tier: EA testbed
  • 4 EAs spanning 2 classes of problems
  • 2 feed-forward non-linear approximations: 1 high-order, 1 low-order
  • 2 recurrent time-series predictions: 1 time-delayed, 1 not time-delayed
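A rough sketch of how the two tiers interact: the outer GP tier proposes a mutation operator, the inner EA testbed runs each problem with it, and the summed MSE is inverted to give the GP fitness. The function names and the exact fitness form here are illustrative assumptions (the GP slide below specifies 1000/AvgMSE minus a node-count penalty).

```python
def gp_individual_fitness(mutation_operator, testbed_problems, run_ea_nn):
    """Outer-tier evaluation sketch: run the EA-NN testbed with a candidate
    mutation operator and invert the summed MSE (higher is better)."""
    total_mse = sum(run_ea_nn(problem, mutation_operator) for problem in testbed_problems)
    return 1.0 / total_mse   # illustrative; see the GP slide for the exact fitness used
```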
GP Implementation
• Functional set: {+, -, *, /}
• Terminal set: weight to be modified, random constant, uniform random variable (an evaluation sketch follows this slide)
• Over-selection: 80% of parents drawn from the top 32%
• Rank-based survival
• Initialization by the grow method (max depth of 8)
• Fitness: 1000 / AvgMSE - num_nodes
• P(recombination) = 0.5; P(mutation) = 0.5
• Repair function
• 5 runs of 100 generations each
• Steady state: population of 1000 individuals, 20 children per generation
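To illustrate the parse-tree representation, here is a minimal evaluator over the functional set {+, -, *, /} and the three terminals (the weight being modified, a random constant, and a uniform random variable). The tuple encoding, the uniform range, and the protected division are assumptions for illustration, not details from the slides.

```python
import random

def evaluate(node, w):
    """Evaluate a mutation-operator tree for one weight value w."""
    if node == 'w':
        return w                               # terminal: the weight to be modified
    if node == 'u':
        return random.uniform(-1.0, 1.0)       # terminal: uniform random variable (assumed range)
    if isinstance(node, (int, float)):
        return node                            # terminal: random constant fixed at creation
    op, left, right = node                     # internal node from the functional set
    a, b = evaluate(left, w), evaluate(right, w)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    return a / b if abs(b) > 1e-9 else a       # protected division (illustrative repair choice)

# Example evolved operator: new_weight = w + 0.5 * u
tree = ('+', 'w', ('*', 0.5, 'u'))
print(evaluate(tree, 0.3))
```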
EA-NN Implementation
• Recombination: multi-point crossover (see the sketch after this slide)
• Mutation: operator provided by the GP tier
• Fitness: MSE over the test function (minimized)
• P(recombination) = 0.5; P(mutation) = 0.5
• Non-generational: population of 10 individuals, 10 children per generation
• 50 runs of 50 generations
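For the recombination step, a minimal multi-point crossover over two flat weight vectors might look like the sketch below; the default number of crossover points and the single-child return are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def multi_point_crossover(parent_a, parent_b, n_points=2):
    """Cut both weight vectors at the same random points and alternate segments."""
    cuts = sorted(rng.choice(np.arange(1, len(parent_a)), size=n_points, replace=False))
    child = parent_a.copy()
    take_b, prev = False, 0
    for cut in list(cuts) + [len(parent_a)]:
        if take_b:
            child[prev:cut] = parent_b[prev:cut]
        take_b, prev = not take_b, cut
    return child

# Example on two 8-weight genomes (all zeros vs. all ones).
a, b = np.zeros(8), np.ones(8)
print(multi_point_crossover(a, b))
```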
Results
• Full results are still pending
• Single uniform random variable: ~380
• Observed evolved individuals: ~600
• An improvement! Just have to wait and see…
Conclusions
• I don’t know anything yet.
Questions? Thank You!