
Learning to Coordinate Behaviors

This presentation summarizes the paper by Pattie Maes and Rodney A. Brooks, which introduces a behavior-based system that learns to coordinate its behaviors from positive and negative feedback. The learning algorithm is distributed: each behavior learns for itself when to become active, so as to maximize positive feedback and minimize negative feedback.

myersm

Presentation Transcript


  1. Learning to Coordinate Behaviors Pattie Maes & Rodney A. Brooks Presented by: Javier Martinez

  2. Introduction • Behavior-based system • Learning using positive and negative feedback • Behaviors decide when it is time to activate • Distributed algorithm • Test the concept on a robot

  3. Motivation • Behavior control is a weak point of early behavior-based systems • Behavior control has to be prewired • This approach doesn’t scale well

  4. New Ideas • Behavior control is learned through experience • Learning algorithm completely distributed • Each behavior learns when to become active • The solution maximizes positive feedback and minimizes negative feedback

  5. The Learning Task What is needed: • Vector of binary perceptual conditions • Set of behaviors • Positive feedback generator • Negative feedback generator

  6. The Learning Task The task: • Change the precondition list of each behavior to maximize its relevance and reliability

  7. The Learning Task Constraints: • Relevance: the behavior is correlated with positive feedback and not with negative feedback • Reliability: the behavior receives consistent feedback

  8. The Learning Task More constraints: • The algorithm should deal with noise • Perform in real time • Support readaptation

  9. The Learning Task Assumptions: • At least one combination of preconditions is bounded • Feedback is immediate • Only combinations of conditions can be learned

  10. Algorithm Measure: • Number of times positive/negative feedback did/didn’t happen when a behavior was/wasn’t active • Calculate the correlation between positive/negative feedback and the status (active or inactive) of the behavior

  11. Algorithm Measure: • Express relevance and reliability in terms of this correlation • Relevance controls whether a behavior should be active or not • Reliability decides whether the behavior should try to improve itself

  12. Algorithm Measure: • Improvement is done by monitoring one perceptual condition at a time • If requiring the condition increases reliability, the condition is added to the behavior’s list of preconditions • Keep cycling through the conditions until reliability reaches the threshold

  13. Genghis • Six-legged robot that walks forward • 12 behaviors, 6 conditions, 8742 nodes • 4 eight-bit microprocessors, 32 KB memory • The challenge is to learn how to coordinate the legs to produce a forward movement

  14. Results Convergence time: • Non-intelligent search during the monitoring stage: 10 min • Intelligent search: 1 min 45 s • A “tripod” gait, common among six-legged insects, emerged

  15. Conclusions • A learning algorithm was developed that allows a behavior-based robot to learn, from positive and negative feedback, when its behaviors should become active

  16. Comments • Impressive results • Global behavior (walking) emerges from coordinated behaviors • Simple idea, powerful consequences: the robot learned how to walk; it wasn’t taught

  17. Comments • Dead behaviors don’t revive, although they might be useful in other situations • How to deal with concurrent actions? (e.g. walking while following a target)
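Slides 10 and 11 describe the per-behavior statistics only in words. The sketch below is one possible reading, not the paper's exact formulas. For each behavior it keeps a 2×2 count table per feedback signal and computes the correlation between activity and feedback as the phi coefficient (the Pearson correlation of two binary variables). It assumes relevance is corr(positive) minus corr(negative), and reliability is how consistent each feedback signal is while the behavior is active, taking the weaker of the two. The class names, the thresholds and the `decay` factor (a guess at how readaptation could be supported) are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Contingency:
    """2x2 counts of one binary feedback signal vs. whether a behavior was active."""
    active_fb: float = 0.0     # behavior active, feedback occurred
    active_no_fb: float = 0.0  # behavior active, no feedback
    idle_fb: float = 0.0       # behavior inactive, feedback occurred
    idle_no_fb: float = 0.0    # behavior inactive, no feedback

    def update(self, active: bool, feedback: bool, decay: float = 1.0) -> None:
        # decay < 1 slowly forgets old evidence (assumed mechanism for readaptation).
        self.active_fb *= decay
        self.active_no_fb *= decay
        self.idle_fb *= decay
        self.idle_no_fb *= decay
        if active and feedback:
            self.active_fb += 1
        elif active:
            self.active_no_fb += 1
        elif feedback:
            self.idle_fb += 1
        else:
            self.idle_no_fb += 1

    def correlation(self) -> float:
        """Phi coefficient: Pearson correlation of two binary variables, in [-1, 1]."""
        j, k, l, m = self.active_fb, self.active_no_fb, self.idle_fb, self.idle_no_fb
        denom = math.sqrt((j + k) * (l + m) * (j + l) * (k + m))
        return 0.0 if denom == 0 else (j * m - k * l) / denom

    def consistency_when_active(self) -> float:
        """Share of the majority outcome while active (0.5..1); 0 if never active."""
        total = self.active_fb + self.active_no_fb
        if total == 0:
            return 0.0
        return max(self.active_fb, self.active_no_fb) / total


@dataclass
class BehaviorStats:
    positive: Contingency = field(default_factory=Contingency)
    negative: Contingency = field(default_factory=Contingency)

    def update(self, active: bool, positive: bool, negative: bool,
               decay: float = 1.0) -> None:
        self.positive.update(active, positive, decay)
        self.negative.update(active, negative, decay)

    def relevance(self) -> float:
        # High when activity goes with positive feedback and not with negative feedback.
        return self.positive.correlation() - self.negative.correlation()

    def reliability(self) -> float:
        # Both feedback signals must be consistent while the behavior is active.
        return min(self.positive.consistency_when_active(),
                   self.negative.consistency_when_active())

    def should_be_active(self, relevance_threshold: float = 0.0) -> bool:
        return self.relevance() > relevance_threshold

    def should_improve(self, reliability_threshold: float = 0.9) -> bool:
        return self.reliability() < reliability_threshold
```

For example, a behavior that always draws positive feedback when active, and whose negative feedback only occurs while it is idle, gets relevance 2.0 (the maximum) and reliability 1.0, so under these assumed thresholds it stays active and stops searching for new preconditions.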
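Slide 12 describes improving a behavior's reliability by monitoring one perceptual condition at a time and cycling through the conditions until a threshold is reached. A minimal offline sketch of that idea follows. It assumes the behavior's experiences while active are logged as (condition vector, positive feedback received) pairs, and that reliability is the share of the majority feedback outcome. The names `learn_preconditions`, `threshold` and `min_samples` are hypothetical. The robot in the slides learns online from both feedback signals. This sketch uses only positive feedback, a fixed batch of samples, and visits conditions in plain index order.

```python
from typing import List, Sequence, Tuple

# Hypothetical log entry: (perceptual condition vector, positive feedback received?)
Sample = Tuple[Tuple[bool, ...], bool]
Precondition = Tuple[int, bool]  # (condition index, required value)


def reliability(samples: Sequence[Sample]) -> float:
    """Share of samples with the majority feedback outcome (0.5..1); 0 if empty."""
    if not samples:
        return 0.0
    hits = sum(1 for _, feedback in samples if feedback)
    return max(hits, len(samples) - hits) / len(samples)


def satisfies(conditions: Tuple[bool, ...],
              preconditions: Sequence[Precondition]) -> bool:
    return all(conditions[i] == value for i, value in preconditions)


def learn_preconditions(samples: Sequence[Sample], n_conditions: int,
                        threshold: float = 0.95,
                        min_samples: int = 5) -> Tuple[List[Precondition], float]:
    """Cycle through the conditions, adding one whenever requiring it raises reliability."""
    preconditions: List[Precondition] = []
    current = reliability(samples)
    improved = True
    while current < threshold and improved:
        improved = False
        for i in range(n_conditions):
            if current >= threshold:
                break
            if i in {j for j, _ in preconditions}:
                continue  # already a precondition
            best = None  # (reliability, required value) of the best candidate
            for value in (True, False):
                candidate = preconditions + [(i, value)]
                subset = [s for s in samples if satisfies(s[0], candidate)]
                if len(subset) < min_samples:
                    continue  # too little evidence to judge this condition
                r = reliability(subset)
                if r > current and (best is None or r > best[0]):
                    best = (r, value)
            if best is not None:
                preconditions.append((i, best[1]))
                current = best[0]
                improved = True
    return preconditions, current
```

With two conditions and positive feedback arriving exactly when condition 0 holds, `learn_preconditions` returns `([(0, True)], 1.0)`: requiring condition 0 makes the feedback fully consistent on the first pass, so condition 1 is never added.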
