
Presentation Transcript


  1. Behavioural priors: Learning to search efficiently in action planning
     Aapo Hyvärinen
     Depts of Computer Science and Psychology
     University of Helsinki

  2. Abstract
  • Prior knowledge is important in perception
  • What kinds of objects/scenes are typical/frequent?
  • Formalized as prior probabilities in Bayesian inference
  • We propose the same is needed in action planning
  • What kinds of action sequences are typically good?
  • The number of possible action sequences is large: it is computationally more efficient to constrain the search to those which are typically good

  3. Basic framework: Planning
  • Thoroughly investigated in classic AI
  • Agent is in state A and wants to reach state B: what sequence of actions is needed?
  • Agent is assumed to have a world model
  • Exponential explosion in computation: for a actions and t time steps, a^t possibilities (see the sketch below)
  • Exhaustive search is impossible
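
To make the a^t blow-up concrete, here is a minimal Python sketch of exhaustive forward search in a toy grid world. The helper names (step, exhaustive_plan, is_goal) are hypothetical illustrations, not from the talk.

```python
from itertools import product

ACTIONS = ["up", "down", "left", "right"]  # a = 4 possible actions

def step(state, action):
    """Toy deterministic world model: move one cell on an integer grid."""
    x, y = state
    dx, dy = {"up": (0, 1), "down": (0, -1),
              "left": (-1, 0), "right": (1, 0)}[action]
    return (x + dx, y + dy)

def exhaustive_plan(start, is_goal, horizon):
    """Enumerate all a**horizon action sequences -- exponential in horizon."""
    for seq in product(ACTIONS, repeat=horizon):  # 4**horizon candidates
        state = start
        for action in seq:
            state = step(state, action)
        if is_goal(state):
            return list(seq)
    return None

# Even a 5-step horizon checks up to 4**5 = 1024 sequences;
# with t = 10 it is already 4**10 = 1,048,576.
print(exhaustive_plan((0, 0), lambda s: s == (3, 2), horizon=5))
```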

  4. Biological agents are different
  • A biological agent faces the same planning problems many times
  • Moving the same limbs
  • Navigating in the same environment
  • Manipulating similar objects, etc.
  • Good action sequences obey regularities, due to the physical structure of the world
  • "Good" means action sequences selected by careful, computationally intensive planning

  5. Learning regularities aids in planning
  • Initially, the agent considers the whole search space
  • It can learn from information on which action sequences were good / typically executed
  • "Typical" and "good" are strongly correlated, because only rather good sequences are executed
  • The search can then be constrained
  • Examples of regularities:
  • No point in moving a limb back and forth
  • Many sequences contain detours
  • Skills: learning more regularities in a particular task

  6. A prior model on good sequences
  • A probabilistic approach: build a model of the statistical structure of those sequences which were executed (i.e., led to the goal, or close to it)
  • Use a model which can generate candidate sequences in future planning
  • E.g., a Markov model (see the sketch below)
  • After sufficient experience, search using only action sequences generated by the prior model
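
As a concrete illustration of such a prior, this hypothetical sketch fits a first-order Markov model, P(next action | previous action), to the executed action sequences and then samples candidate sequences from it. The function names are my own, not the author's code.

```python
import random

ACTIONS = ["up", "down", "left", "right"]

def fit_markov_prior(executed_sequences, smoothing=1.0):
    """Estimate P(next action | previous action) from executed ("good") sequences."""
    counts = {a: {b: smoothing for b in ACTIONS} for a in ACTIONS}
    for seq in executed_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    # Normalize each row of transition counts into conditional probabilities.
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ACTIONS}
            for a in ACTIONS}

def sample_sequence(prior, length):
    """Generate one candidate action sequence from the learned prior."""
    seq = [random.choice(ACTIONS)]
    while len(seq) < length:
        weights = [prior[seq[-1]][a] for a in ACTIONS]
        seq.append(random.choices(ACTIONS, weights=weights)[0])
    return seq
```

If back-and-forth moves rarely occur in the executed sequences, transition probabilities such as P("down" | "up") become small, so the prior automatically encodes rules like "do not go back and forth".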

  7. Simulation 1
  • Grid world; actions: "up", "down", "left", "right"
  • Food randomly scattered
  • Initially, planning is random
  • A Markov prior is learned and used in a later test period (see the sketch below)
  • Result 1: the Markov model learns the "rules"
  • Do not go back and forth
  • Do not change direction too often
  • Result 2: performance improved
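
A hypothetical sketch of the prior-constrained planner, reusing step() and sample_sequence() from the sketches above: instead of enumerating all a^t sequences, it scores only a fixed number of candidates drawn from the prior.

```python
def plan_with_prior(start, score, prior, horizon, n_candidates=100):
    """Sample candidate sequences from the behavioural prior; keep the best."""
    best_seq, best_score = None, float("-inf")
    for _ in range(n_candidates):
        seq = sample_sequence(prior, horizon)
        state = start
        for action in seq:
            state = step(state, action)   # roll out with the world model
        if score(state) > best_score:     # e.g. score = closeness to food
            best_seq, best_score = seq, score(state)
    return best_seq
```

During the initial random period, the same loop can be run with uniformly random sequences; the learned prior then replaces the uniform proposal in the test period.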

  8. Simulation 2
  • Using the behavioural prior to improve model-free reinforcement learning
  • Same grid world, one goal
  • Value function incompletely learned by Q-learning
  • After Q-learning, plan to find the maximum of the value function (see the sketch below)
  • Similar results as in Simulation 1
  [Figure: value function previously learned]
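
A hypothetical sketch of how the prior-constrained planner might be combined with an incompletely learned Q-function; here Q is assumed to be a dict mapping (state, action) pairs to values. This is my reading of the slide, not the author's implementation.

```python
def plan_to_value_maximum(start, Q, prior, horizon, n_candidates=100):
    """After Q-learning, plan toward the maximum of the learned value function."""
    def value(state):
        # State value implied by the (incomplete) Q-table; unseen states score 0.
        return max(Q.get((state, a), 0.0) for a in ACTIONS)
    return plan_with_prior(start, value, prior, horizon, n_candidates)
```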

  9. Related and future work
  • A lot of work on chunking actions, "macro-actions": a special form of priors
  • Options (Sutton et al.): is a probabilistic interpretation also possible?
  • Case-based planning: different, because the next action considered also depends on the world state
  • Behavioural priors in:
  • model-free reinforcement learning?
  • motor control? (motor synergies)

  10. Conclusion
  • Perceptual priors are widely considered important for perception to work
  • We propose that action planning needs behavioural priors
  • Priors tell which action sequences are typically useful
  • This improves planning by constraining the search
  • More simulations are needed to verify the utility of the approach
