Machine Learning and ILP for Multi-Agent Systems

Presentation Transcript

  1. Machine Learning and ILP for Multi-Agent Systems • Daniel Kudenko & Dimitar Kazakov • Department of Computer Science, University of York, UK

  2. Why Learning Agents? • Agent designers cannot foresee all situations that the agent will encounter. • To display full autonomy, agents need to learn from and adapt to novel environments. • Learning is a crucial part of intelligence.

  3. A Brief History [Diagram with two tracks. Machine Learning: Disembodied ML, Single-Agent Learning, Multiple Single-Agent Learners, Social Multi-Agent Learners. Agents: Single-Agent System, Multiple Single-Agent System, Social Multi-Agent System.]

  4. Outline • Principles of Machine Learning (ML) • ML for Single Agents • ML for Multi-Agent Systems • Inductive Logic Programming for Agents

  5. What is Machine Learning? • Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97] • Example: T = “play tennis”, E = “playing matches”, P = “score”

  6. Types of Learning • Inductive Learning (Supervised Learning) • Reinforcement Learning • Discovery (Unsupervised Learning)

  7. Inductive Learning [An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge. [Michalski et al. 98]

  8. Inductive Learning • Input: examples of category C1, examples of category C2, …, examples of category Cn. • Inductive learning system. • Output: hypothesis (procedure to classify new examples).

  9. Inductive Learning Example • Ammo: low, Monster: near, Light: good, Category: shoot • Ammo: low, Monster: far, Light: medium, Category: ¬shoot • Ammo: high, Monster: far, Light: good, Category: shoot • Inductive learning system output: If (Ammo = high) and (Light ∈ {medium, good}) then shoot; …
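
The three training examples on this slide are enough to run a very small inductive learner. The sketch below uses Find-S, a textbook algorithm that keeps the most specific conjunction covering all positive examples. The slides do not name the learner that produced the rule shown, so Find-S is swapped in here purely for illustration, and the names `EXAMPLES`, `find_s`, and `classify` are hypothetical.

```python
# Training examples from the slide: (Ammo, Monster, Light) -> shoot?
EXAMPLES = [
    (("low", "near", "good"), True),
    (("low", "far", "medium"), False),
    (("high", "far", "good"), True),
]
ANY = "?"  # wildcard: the feature may take any value


def find_s(examples):
    """Return the most specific conjunction that covers every positive example."""
    hypothesis = None
    for features, label in examples:
        if not label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(features)  # start from the first positive example
        else:
            # Generalize each feature that disagrees with the new positive example.
            hypothesis = [h if h == f else ANY for h, f in zip(hypothesis, features)]
    return tuple(hypothesis) if hypothesis is not None else None


def classify(hypothesis, features):
    """True if the example satisfies every non-wildcard condition."""
    if hypothesis is None:
        return False
    return all(h == ANY or h == f for h, f in zip(hypothesis, features))


rule = find_s(EXAMPLES)
print(rule)
```

On this data the learned conjunction constrains only Light, which differs from the rule on the slide while still agreeing with all three examples. Which of several consistent hypotheses is returned depends on the hypothesis language and the learning bias.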

  10. Performance Measure • Classification accuracy on unseen test set. • Alternatively: measure that incorporates cost of false-positives and false-negatives (e.g. recall/precision).
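
A minimal sketch of the measures named on this slide, computed from true and predicted labels on a held-out test set. The function name `evaluate` and the sample labels are assumptions made for illustration only.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    # Precision and recall are undefined when their denominators are zero.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall


# Hypothetical test-set labels ("shoot" = True) and classifier outputs.
y_true = [True, True, False, False, True]
y_pred = [True, False, True, False, True]
accuracy, precision, recall = evaluate(y_true, y_pred)
```

Accuracy treats both error types alike. Precision falls with false positives and recall falls with false negatives, so reporting them separately exposes the cost trade-off the slide mentions.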

  11. Where’s the knowledge? • Example (or Object) language • Hypothesis (or Concept) language • Learning bias • Background knowledge

  12. Example Language • Feature-value vectors, logic programs. • Which features are used to represent examples (e.g., ammunition left)? • For agents: which features of the environment are fed to the agent (or the learning module)? • Constructive Induction: automatic feature selection, construction, and generation.

  13. Hypothesis Language • Decision trees, neural networks, logic programs, … • Further restrictions may be imposed, e.g., depth of decision trees, form of clauses. • Choice of hypothesis language influences choice of learning methods and vice versa.

  14. Learning bias • Preference relation between legal hypotheses. • Accuracy on training set. • Hypothesis with zero error on training data is not necessarily the best (noise!). • Occam’s razor: the simpler hypothesis is the better one.

  15. Inductive Learning • No “real” learning without language or learning bias. • IL is search through space of hypotheses guided by bias. • Quality of hypothesis depends on proper distribution of training examples.

  16. Inductive Learning for Agents • What is the target concept (i.e., categories)? • Example: do(a), ¬do(a) for specific action a. • Real-valued categories/actions can be discretized. • Where does the training data come from and what form does it take?

  17. Batch vs Incremental Learning • Batch Learning: collect a set of training examples and compute hypothesis. • Incremental Learning: update hypothesis with each new training example. • Incremental learning more suited for agents.

  18. Batch Learning for Agents • When should (re-)computation of hypothesis take place? • Example: after experienced accuracy of hypothesis drops below threshold. • Which training examples should be used? • Example: sequences of actions that led to success.

  19. Eager vs. Lazy learning • Eager learning: commit to hypothesis computed after training. • Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).
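
Lazy learning can be sketched as a 1-nearest-neighbour classifier that only stores examples and defers all work to classification time. The class below is a hypothetical illustration: it assumes categorical features compared with Hamming distance (the number of differing features), which the slides do not specify.

```python
def hamming(a, b):
    """Number of positions at which two feature tuples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class NearestNeighbour:
    """Lazy learner: training is just storing examples."""

    def __init__(self):
        self.memory = []  # list of (features, label) pairs

    def add(self, features, label):
        # Incremental by construction: each new example is simply appended.
        self.memory.append((features, label))

    def classify(self, features):
        if not self.memory:
            raise ValueError("no stored examples to compare against")
        # Pick the label of the closest stored example (first one wins ties).
        _, label = min(self.memory, key=lambda ex: hamming(ex[0], features))
        return label


learner = NearestNeighbour()
learner.add(("low", "near", "good"), True)     # (Ammo, Monster, Light) -> shoot
learner.add(("low", "far", "medium"), False)
learner.add(("high", "far", "good"), True)
```

Because no hypothesis is ever committed to, adding an example costs almost nothing, but every classification scans the whole database, which matters for agents acting under time constraints.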

  20. Active Learning • Learner decides which training data to receive (i.e. generates training examples and uses oracle to classify them). • Closed Loop ML: learner suggests hypothesis and verifies it experimentally. If hypothesis is rejected, the collected data gives rise to a new hypothesis.

  21. Black-Box vs. White-Box • Black-Box Learning: Interpretation of the learning result is unclear to a user. • White-Box Learning: Creates (symbolic) structures that are comprehensible.

  22. Reinforcement Learning • Agent learns from environmental feedback indicating the benefit of states. • No explicit teacher required. • Learning target: optimal policy (i.e., state-action mapping) • Optimality measure: e.g., cumulative discounted reward.

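  23. Q Learning Value of a state: discounted cumulative reward $V^{\pi}(s_t) = \sum_{i \ge 0} \gamma^{i}\, r(s_{t+i}, a_{t+i})$. $0 \le \gamma < 1$ is a discount factor ($\gamma = 0$ means that only immediate reward is considered). $r(s_{t+i}, a_{t+i})$ is the reward determined by performing actions specified by policy $\pi$. $Q(s,a) = r(s,a) + \gamma V^{*}(\delta(s,a))$, where $\delta(s,a)$ is the state that results from performing action $a$ in state $s$. Optimal policy: $\pi^{*}(s) = \arg\max_{a} Q(s,a)$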

  24. Q Learning Initialize all $Q(s,a)$ to 0. In some state $s$ choose some action $a$. Let $s'$ be the resulting state. Update Q: $Q(s,a) = r + \gamma \max_{a'} Q(s',a')$
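
The update rule on this slide can be sketched on a toy problem. Everything about the environment below is an assumption made for illustration: a hypothetical four-state corridor with deterministic moves, a reward of 1 for entering the goal state, and a discount factor of 0.9. A fixed sweep over all state-action pairs stands in for the slide's "choose some action" so that the run is reproducible.

```python
GAMMA = 0.9                      # discount factor (assumed value)
STATES = [0, 1, 2, 3]            # hypothetical corridor; state 3 is the goal
ACTIONS = ["left", "right"]
GOAL = 3


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    nxt = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


# Initialize all Q(s, a) to 0.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

# Visit every non-goal state-action pair repeatedly and apply the update
# Q(s, a) = r + gamma * max_a' Q(s', a').
for _ in range(20):
    for s in STATES:
        if s == GOAL:
            continue  # the goal is treated as absorbing; its Q values stay 0
        for a in ACTIONS:
            s_next, r = step(s, a)
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)

# Greedy policy: in each state pick the action with the highest Q value.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES if s != GOAL}
```

The assignment form of the update, with no learning rate, is only appropriate for deterministic rewards and transitions. In this corridor the Q values for moving right are discounted by one factor of 0.9 per step of distance from the goal.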

  25. Q Learning • Guaranteed convergence towards optimum (state-action pairs have to be visited infinitely often). • Exploration strategy can speed up convergence. • Basic Q Learning does not generalize: replace state-action table with function approximation (e.g. neural net) in order to handle unseen states.

  26. Pros and Cons of RL • Clearly suited to agents acting and exploring an environment. • Simple. • Engineering of suitable reward function may be tricky. • May take a long time to converge. • Learning result may not be transparent (depending on representation of Q function).

  27. Combination of IL and RL • Relational reinforcement learning [Dzeroski et al. 98]: leads to more general Q function representation that may still be applicable even if the goals or environment change. • Explanation-based learning and RL [Dietterich and Flann, 95]. • More ILP and RL: see later.

  28. Unsupervised Learning • Acquisition of “useful” or “interesting” patterns in input data. • Usefulness and interestingness are based on agent’s internal bias. • Agent does not receive any external feedback. • Discovered concepts are expected to improve agent performance on future tasks.

  29. Learning and Verification • Need to guarantee agent safety. • Pre-deployment verification for non-learning agents. • What to do with learning agents?

  30. Learning and Verification [Gordon ’00] • Verification after each self-modification step. • Problem: Time-consuming. • Solution 1: use property-preserving learning operators. • Solution 2: use learning operators which permit quick (partial) re-verification.

  31. Learning and Verification • What to do if verification fails? • Repair (multi-)agent plan. • Choose different learning operator.

  32. Learning in Multi-Agent Systems • Classification • Social Awareness • Communication • Role Learning • Distributed Learning

  33. Types of Multi-Agent Learning [Weiss & Dillenbourg 99] • Multiplied Learning: No interference in the learning process by other agents (except for exchange of training data or outputs). • Divided Learning: Division of learning task on functional level. • Interacting Learning: cooperation beyond the pure exchange of data.

  34. Social Awareness • Awareness of existence of other agents and (possibly) knowledge about their behavior. • Not necessary to achieve near optimal MAS behavior: rock sample collection [Steels 89]. • Can it degrade performance?

  35. Levels of Social Awareness [Vidal & Durfee 97] • 0-level agent: no knowledge about existence of other agents. • 1-level agent: recognizes that other agents exist; models other agents as 0-level. • 2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents. • k-level agent: models other agents as (k-1)-level.

  36. Social Awareness and Q Learning • 0-level agents already learn implicitly about other agents. • [Mundhe and Sen, 00]: study of two Q learning agents up to level 2. • Two 1-level agents display slowest and least effective learning (worse than two 0-level agents).

  37. Agent Models and Q Learning • $Q: S \times A^{n} \rightarrow \mathbb{R}$, where n is the number of agents. • If other agents’ actions are not observable, need assumption for actions of other agents. • Pessimistic assumption: given an agent’s action choice other agents will minimize reward. • Optimistic assumption: other agents will maximize reward.

  38. Agent Models and Q Learning • Pessimistic Assumption leads to overly cautious behavior. • Optimistic Assumption guarantees convergence towards optimum [Lauer & Riedmiller ‘00]. • If knowledge of other agent’s behavior available, Q value update can be based on probabilistic computation [Claus and Boutilier ‘98]. But: no guarantee of optimality.
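
The difference between the two assumptions can be sketched on a one-shot cooperative game in which each agent keeps Q values over its own actions only. This is a simplified illustration in the spirit of the optimistic update, not a reproduction of the algorithm in [Lauer & Riedmiller ‘00]: the reward matrix, the single-step setting, and the names `REWARD` and `learn` are all assumptions. Rewards are assumed deterministic and non-negative.

```python
# Hypothetical shared reward for a one-shot cooperative game:
# REWARD[a1][a2] is what both agents receive when agent 1 plays a1
# and agent 2 plays a2.
REWARD = [[5.0, 0.0],
          [0.0, 10.0]]
ACTIONS = [0, 1]


def learn(combine, init):
    """Each agent keeps a Q value per own action, folded over observed rewards.

    combine=max gives the optimistic view (others will maximize reward),
    combine=min gives the pessimistic view (others will minimize reward).
    """
    q1 = {a: init for a in ACTIONS}
    q2 = {a: init for a in ACTIONS}
    for a1 in ACTIONS:            # try every joint action once
        for a2 in ACTIONS:
            r = REWARD[a1][a2]
            q1[a1] = combine(q1[a1], r)  # agent 1 never sees a2
            q2[a2] = combine(q2[a2], r)  # agent 2 never sees a1
    return q1, q2


opt_q1, opt_q2 = learn(max, 0.0)
pes_q1, pes_q2 = learn(min, float("inf"))

# Greedy joint action under the optimistic Q values.
joint = (max(ACTIONS, key=opt_q1.get), max(ACTIONS, key=opt_q2.get))
```

Under the optimistic view each agent's table records the best reward its action has ever produced, so both agents independently settle on the jointly optimal pair. Under the pessimistic view every action looks equally bad in this game, which illustrates the overly cautious behavior noted on the slide. With several equally good joint actions, an extra coordination mechanism would be needed, which this sketch omits.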

  39. Q Learning and Communication [Tan 93] • Types of communication: sharing sensation; sharing or merging policies; sharing episodes. • Results: communication generally helps; extra sensory information may hurt.

  40. Role Learning • Often useful for agents to specialize in specific roles for joint tasks. • Pre-defined roles: reduce flexibility, often not easy to define optimal distribution, may be expensive. • How to learn roles? • [Prasad et al. 96]: learn optimal distribution of pre-defined roles.

  41. Q Learning of Roles • [Crites & Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior). • [Ono & Fukumoto 96]: Hunter-Prey domain, specialization achieved with greatest mass merging strategy.

  42. Q Learning of Roles [Balch 99] • Three types of reward function: local performance-based, local shaped, global. • Global reward supports specialization. • Local reward supports emergence of homogeneous behaviors. • Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging). • Heterogeneity measure: social entropy.
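
A heterogeneity measure of this kind can be sketched as the Shannon entropy of how a team's agents are distributed over behavioral groups. This is an assumption about the simplest form of such a measure and is not taken from the slides. The function name `social_entropy` and the role labels are hypothetical, and how agents are assigned to groups is left open.

```python
from collections import Counter
from math import log2


def social_entropy(group_labels):
    """Shannon entropy (in bits) of the distribution of agents over groups."""
    counts = Counter(group_labels)
    total = len(group_labels)
    # Sum of p * log2(1 / p) over groups, where p is the group's share of agents.
    return sum((c / total) * log2(total / c) for c in counts.values())


homogeneous = social_entropy(["forager"] * 4)
two_roles = social_entropy(["attacker", "attacker", "defender", "defender"])
```

A team in which all agents behave alike scores 0, and the score grows as agents spread evenly over more distinct behaviors.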

  43. Distributed Learning • Motivation: Agents learning a global hypothesis from local observations. • Application of MAS techniques to (inductive) learning. • Applications: Distributed Data Mining [Provost & Kolluri ‘99], Robotic Soccer.

  44. Distributed Data Mining • [Provost & Hennessy 96]: Individual learners see only subset of all training examples and compute a set of local rules based on these. • Local rules are evaluated by other learners based on their data. • Only rules with good evaluation are carried over to the global hypothesis.
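
The three steps on this slide can be sketched as follows. This is a heavily simplified illustration and not the algorithm of [Provost & Hennessy 96]: rules are single feature-value tests, "good evaluation" is taken to mean precision above a threshold on every other learner's data, and the function names, the threshold, and the data partitions are all hypothetical.

```python
def propose_rules(data):
    """Local learner: feature-value tests that cover a local positive
    example and no local negative example."""
    positives = [f for f, y in data if y]
    negatives = [f for f, y in data if not y]
    candidates = {(k, v) for f in positives for k, v in f.items()}
    return {(k, v) for k, v in candidates
            if not any(n.get(k) == v for n in negatives)}


def precision(rule, data):
    """Fraction of covered examples that are positive (None if none covered)."""
    k, v = rule
    covered = [y for f, y in data if f.get(k) == v]
    return sum(covered) / len(covered) if covered else None


def global_rules(partitions, threshold=0.9):
    """Keep a local rule only if every other learner evaluates it well."""
    accepted = set()
    for i, local in enumerate(partitions):
        for rule in propose_rules(local):
            scores = [precision(rule, other)
                      for j, other in enumerate(partitions) if j != i]
            scores = [s for s in scores if s is not None]
            if all(s >= threshold for s in scores):
                accepted.add(rule)
    return accepted


# Hypothetical partitions: each learner sees only its own examples.
part_a = [({"ammo": "high", "light": "good"}, True),
          ({"ammo": "low", "light": "medium"}, False)]
part_b = [({"ammo": "high", "light": "medium"}, False),
          ({"ammo": "low", "light": "good"}, True),
          ({"ammo": "low", "light": "medium"}, False)]
hypothesis = global_rules([part_a, part_b])
```

Here the first learner proposes a rule on `ammo` that looks perfect on its own subset, but the second learner's data contradicts it, so only the rule on `light` reaches the global hypothesis.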

  45. BREAK

  46. Machine Learning and ILP for MAS: Part II • Integration of ML and Agents • ILP and its potential for MAS • Agent Applications of ILP • Learning, Natural Selection and Language

  47. Machine Learning and ILP for MAS: Part II • Integration of ML and Agents • ILP and its potential for MAS • Agent Applications of ILP • Learning, Natural Selection and Language

  48. From Machine Learning to Learning Agents • Machine Learning, learning as the only goal: Classic Machine Learning, Active Learning, Closed Loop Machine Learning. • Learning as one of many goals: Learning Agent(s).

  49. Integrating Machine Learning into the Agent Architecture • Time constraints on learning • Synchronisation between agents’ actions • Learning and Recall

  50. Time Constraints on Learning • Machine Learning alone: predictive accuracy matters, time doesn’t (just a price to pay). • ML in agents, soft deadlines: resources must be shared with other activities (perception, planning, control). • ML in agents, hard deadlines: imposed by the environment. Make up your mind now! (or they’ll eat you)