
Learning Agents


Presentation Transcript


  1. Learning Agents • Presented by: Huayan Gao (huayan.gao@uconn.edu), Thibaut Jahan (thj@ifrance.com), David Keil (dmkeil@att.net), Jian Lian (lianjian@yahoo.com) Students in CSE 333 Distributed Component Systems Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut

  2. Outline • Agents • Distributed computing agents • The JADE platform • Reinforcement learning • UML design of agents • The maze problem • Conclusion and future work

  3. Agents • Some general features characterizing agents: autonomy • goal-orientedness • collaboration • flexibility • ability to be self-starting • temporal continuity • character • adaptiveness • mobility • capacity to learn

  4. Classification of agents • Interface agents: use AI techniques to provide assistance to the user • Mobile agents: capable of moving around networks gathering information • Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment • Reactive agents: react to a stimulus or input that is governed by some state or event in the environment

  5. Distributed Computing Agents • Common learning goal (strong sense) • Separate goals but information sharing (weak sense)

  6. The JADE Platform • JADE (Java Agent DEvelopment Framework) - Java software framework - Middleware platform - Simplifies implementation and deployment of MAS • Services provided: - AMS (Agent Management System): registration, directory, and management - DF (Directory Facilitator): yellow-pages service - ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)
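An illustrative sketch (not part of the original slides) of how a JADE agent can register a service with the DF yellow-pages service; the MazeAgent class and the "maze"/"cat-mouse-maze" names are assumptions made for the example.

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Hypothetical maze agent that advertises itself through the DF (yellow pages).
    public class MazeAgent extends Agent {
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());                      // register under this agent's AID
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze");                         // illustrative service type
            sd.setName("cat-mouse-maze");               // illustrative service name
            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);          // DF entry other agents can search
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }

        protected void takeDown() {
            try {
                DFService.deregister(this);             // remove the DF entry on exit
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }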

  7. JADE Platforms for distributed agents

  8. Agents and Markov processes • Agent type by environment type:
       Environment type   Deterministic    Stochastic
       Accessible         Reflex           Solves MDPs
       Inaccessible       Policy-based     Solves non-Markov POMDPs*
     *Partially observable Markov decision problems

  9. Learning from the environment • The environment, especially a distributed one, may be complex and may change • Necessity to learn dynamically, without supervision • Reinforcement learning - used in adaptive systems - involves finding a policy • Q-learning, a special case of RL - computes Q-values into a Q-table - finds an optimal policy

  10. Policy search • Policy: a mapping from states to actions • A policy contrasts with a fixed, precomputed action sequence • Agents that precompute action sequences cannot respond to new sensory information • An agent that follows a policy incorporates sensory information about its current state into each action decision
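As a small illustration (ours, not from the slides), a policy can be stored as a plain state-to-action lookup; the generic S and A types are placeholders.

    import java.util.HashMap;
    import java.util.Map;

    // A policy as a mapping from states to actions (placeholder generic types).
    class TabularPolicy<S, A> {
        private final Map<S, A> table = new HashMap<>();

        void set(S state, A action) { table.put(state, action); }

        // Each step consults the current, percept-derived state rather than
        // replaying a precomputed action sequence.
        A act(S state) { return table.get(state); }
    }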

  11. Components of a learner • In learning, percepts may help improve the agent's future success in interaction • Components: - Learning element (improves the policy) - Performance element (executes the policy) - Critic (applies a fixed performance measure) - Problem generator (suggests experimental actions that will provide information to the learning element)
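One hedged way to express the four components as interfaces (the names are ours, not from the slides):

    // Illustrative decomposition of a learning agent into its components.
    interface PerformanceElement<S, A> {
        A choose(S state);                     // executes the current policy
    }
    interface LearningElement<S, A> {
        void observe(S state, A action, double reward, S next);  // improves the policy
    }
    interface Critic<S> {
        double evaluate(S state);              // applies a fixed performance measure
    }
    interface ProblemGenerator<S, A> {
        A suggestExploration(S state);         // suggests informative experimental actions
    }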

  12. A learning agent and its environment

  13. Temporal difference learning • Uses observed transitions and differences between utilities of successive states to adjust utility estimates • Update rule based on a transition from state i to state j: U(i) ← U(i) + α(R(i) + U(j) − U(i)), where - U is the estimated utility - R is the reward - α is the learning rate
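The update rule transcribes directly into code; the sketch below assumes integer-indexed states and a utility table held as an array.

    // Temporal-difference update after observing a transition i -> j with reward R(i).
    class TDLearner {
        // U[i]: estimated utility of state i; alpha: learning rate.
        static void update(double[] U, int i, int j, double reward, double alpha) {
            U[i] = U[i] + alpha * (reward + U[j] - U[i]);
        }
    }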

  14. Q-learning • Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards • The agent modifies the values in the table to refine its estimates • Using the temporal-difference learning approach, the update applied after the learner goes from state i to state j is: Q(a, i) ← Q(a, i) + α(R(i) + max_a′ Q(a′, j) − Q(a, i))
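A tabular sketch of the same update (integer-indexed states and actions, and the absence of a discount factor, follow the slide's formula; the data layout is an assumption):

    // Q[a][i]: expected aggregate future reward for taking action a in state i.
    class QLearner {
        static void update(double[][] Q, int a, int i, int j, double reward, double alpha) {
            double best = Double.NEGATIVE_INFINITY;
            for (int aPrime = 0; aPrime < Q.length; aPrime++) {
                best = Math.max(best, Q[aPrime][j]);    // max over a' of Q(a', j)
            }
            Q[a][i] = Q[a][i] + alpha * (reward + best - Q[a][i]);
        }
    }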

  15. Q-values • Definition: Q-values are the values Q(a, i) of expected utility associated with a given action a in a given state i • Utility of a state: U(i) = max_a Q(a, i) • Q-values permit decision making without a transition model • Q-values are directly learnable from reward percepts
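The two relationships translate directly, using the same Q-table layout assumed in the previous sketch:

    class QValues {
        // U(i) = max_a Q(a, i): the utility of a state is its best Q-value.
        static double utility(double[][] Q, int i) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < Q.length; a++) best = Math.max(best, Q[a][i]);
            return best;
        }

        // Decision making without a transition model: pick the highest-valued action.
        static int greedyAction(double[][] Q, int i) {
            int bestA = 0;
            for (int a = 1; a < Q.length; a++) if (Q[a][i] > Q[bestA][i]) bestA = a;
            return bestA;
        }
    }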

  16. UML design of agents • Standard UML does not provide a complete solution for depicting the design of multi-agent systems • Because agents are both actors and software, multi-agent systems do not follow typical UML design • Goals, complex strategies, knowledge, etc. are often left out

  17. Reactive use cases

  18. A maze problem • Simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze. • Original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment

  19. Cat and Mouse problem • Example of reinforcement learning • The rules of the Cat and Mouse game are: - Cat catches mouse - Mouse escapes cat - Mouse catches cheese - The game is over when the cat catches the mouse • Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse • Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploration of a multiagent version

  20. Cat-Mouse GUI

  21. Use cases in the Cat-Mouse problem

  22. Classes for the Cat-Mouse problem

  23. Sequence diagram

  24. Maze creation and registration

  25. Cat creation and registration

  26. JADE Cat agent looks up the maze through the AMS and DF services

  27. JADE Mouse agent creation and registration

  28. Mouse Agent joins game

  29. Game begins • The game begins and the Maze (master) and Mouse agents exchange information by ACL messages (see the sketch below)
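For illustration (not from the slides), such an exchange in JADE might look roughly as follows; the class name, receiver, and message content are placeholders.

    import jade.core.AID;
    import jade.core.Agent;
    import jade.lang.acl.ACLMessage;

    // Hypothetical sketch: the maze (master) agent informing a mouse agent of the game state.
    public class MazeMaster extends Agent {
        void announceState(AID mouse, String state) {
            ACLMessage msg = new ACLMessage(ACLMessage.INFORM);  // FIPA INFORM performative
            msg.addReceiver(mouse);                              // AID obtained from an AMS/DF lookup
            msg.setContent(state);                               // e.g. encoded grid positions
            send(msg);                                           // delivered through the ACC
        }
    }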

  30. Remote deployment of learning agents • Using JADE, we can deploy maze, mouse, and cat agents: jademaze maze1 ; jademouse mouse1 ; jadecat cat1 • jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. To create them from a remote PC, we use the following commands (sketched below): jademaze -host hostname mazename ; jadecat -host hostname catname ; jademouse -host hostname mousename
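The slides do not show the contents of these batch files; a plausible sketch is that they wrap the standard JADE boot command (the agent class names MazeAgent, CatAgent, and MouseAgent are assumptions):

    :: Local deployment: main container with the maze agent, then cat and mouse containers.
    java jade.Boot -gui maze1:MazeAgent
    java jade.Boot -container cat1:CatAgent
    java jade.Boot -container mouse1:MouseAgent

    :: Remote deployment: attach a container on this PC to a main container on another host.
    java jade.Boot -container -host hostname mouse1:MouseAgent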

  31. Cat-Mouse in JADE • JADE allows services to be hosted and discovered in a distributed, dynamic environment. • On top of those "basic" services, mouse and cat agents can discover the maze/mouse/cat services provided, and join or quit the maze server they find through the DF service.
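A sketch of that discovery step (the "maze" service type matches the registration sketch earlier; the class and method names are ours):

    import jade.core.AID;
    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Hypothetical mouse agent helper: find a maze agent advertised in the DF.
    public class MouseAgent extends Agent {
        protected AID findMaze() {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze");                      // search by service type
            template.addServices(sd);
            try {
                DFAgentDescription[] results = DFService.search(this, template);
                if (results.length > 0) {
                    return results[0].getName();     // AID of a maze agent to join
                }
            } catch (FIPAException e) {
                e.printStackTrace();
            }
            return null;
        }
    }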

  32. Innovation • A backbone for a core platform encouraging other agents to connect and join • Access to ontologies and service description to move towards interoperability at the service level • A baseline set of deployed agent services that can be used as building blocks by application developers to create innovative value added services • A practical test for a learning agent system complying with FIPA standards.

  33. Deployment Scenario • Infrastructure deployment - Enables developers' agents to interact with service agents developed by others - Tests applications in a realistic, distributed, open environment • Agent and service deployment - FIPA ACL messages to exchange information - Standard FIPA ACL-compatible content languages - FIPA-defined agent management services (directories, communication, and naming)

  34. Conclusions • Demonstration of a feasible research approach exploring the relationship between reinforcement learning and deployment of component-based distributed agents • Communication between agents • Issues with the space complexity of Q-learning: where n = grid size, m = number of mice, and c = number of cats, the Q-table has 64·n^(2(m+c+1)) entries; 1 mouse + 1 cat => about 481 Mb of memory storage for the Q-table
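A rough check of that growth (the 10×10 grid and 8 bytes per entry below are our assumptions, chosen only to show the order of magnitude; they are not taken from the slides):

    // Rough Q-table size estimate: 64 * n^(2(m+c+1)) entries.
    public class QTableSize {
        public static void main(String[] args) {
            int n = 10, m = 1, c = 1;                               // assumed grid size and agent counts
            double entries = 64 * Math.pow(n, 2.0 * (m + c + 1));   // 6.4e7 entries
            System.out.printf("entries = %.2e, approx %.0f MB%n",
                    entries, entries * 8 / 1e6);                    // ~512 MB at 8 bytes/entry
        }
    }

Adding one more cat or mouse multiplies the table by a further n^2 factor, which is why the multiagent case noted under Future work is problematic.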

  35. Future work • Learning in environments that change in response to the learning agent • Communication among learning agents; multiagent learning • Overcoming problems of table size under multiagent conditions • Security in message-passing

  36. Partial list of references • S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS'2001. • T. Mitchell. Machine learning. McGraw-Hill, 1997. • A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-Learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf. • S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995. • S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999. • R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998. • K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, 12/96.
