
Student presentations (starting April 13th)


Presentation Transcript


  1. Student presentations (starting April 13th)

  2. Papers to select from
  • Coalitional Games in Open Anonymous Environments, by M. Yokoo, V. Conitzer, T. Sandholm, N. Ohta, and A. Iwasaki. In Proc. AAAI 2005.
  • A Polynomial-time Nash Equilibrium Algorithm for Repeated Games, by M. L. Littman and P. Stone. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
  • Communication Complexity as a Lower Bound for Learning in Games, by V. Conitzer and T. Sandholm. In Proc. ICML 2004.
  • Coordination in Multiagent Reinforcement Learning: A Bayesian Approach, by G. Chalkiadakis and C. Boutilier. In Proc. AAMAS 2003.
  • Distributed Implementations of Vickrey-Clarke-Groves Mechanisms, by D. C. Parkes and J. Shneidman. In Proc. AAMAS 2004.
  • If Multi-Agent Learning is the Answer, What is the Question?, by Y. Shoham, R. Powers and T. Grenager. In JAI 2006.
  • Envy-Free Auctions for Digital Goods, by A. V. Goldberg and J. D. Hartline. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
  • Distributed Perception Networks: An Architecture for Information Fusion Systems Based on Causal Probabilistic Models, by G. Pavlin, P. de Oude, M. Maris, and T. Hood. In Proc. Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, Heidelberg, 2006.
  • Improvement Continuous Valued Q-learning and its Application to Vision Guided Behavior Acquisition, by Y. Takahashi, M. Takeda, and M. Asada. In Proc. Fourth Int. Workshop on RoboCup, 2000.
  • Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs, by J. Kok and N. Vlassis. In RoboCup 2005 Symposium (best paper award).

  3. What to do?
  • Two people per paper
  • Presentation length: 30 min (+ 15 min discussion)
  • No paper may be presented twice

  4. Multiagent reinforcement learning

  5. Multiagent reinforcement learning
  • We assume that each state s ∈ S is fully observable to all agents.
  • Each s ∈ S defines a local strategic game Gs with corresponding payoffs.
  • We also assume a stochastic transition model p(s'|s, a), where a is the joint action of the agents.
  • The task is to compute an optimal joint policy π*(s) = (π1*(s), …, πn*(s)) that maximizes discounted future reward.
  • With cooperative agents, the challenge is to guarantee that the individual optimal policies πi*(s) are coordinated.

  6. Independent learning
  • One approach is to let each agent run Q-learning independently of the others.
  • In this case the other agents are treated as part of a dynamic environment and are not explicitly modeled.
  • The problem is that p(s'|s, ai) is in this case nonstationary (it changes over time), because the other agents are also learning.
  • Convergence of Q-learning can therefore no longer be guaranteed.
  • However, the method has been used in practice with reported success. (A minimal sketch follows below.)
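
A minimal sketch of an independent tabular Q-learner in Python; the class name, default parameters, and the ε-greedy action choice are illustrative assumptions, not taken from the slides:

```python
from collections import defaultdict
import random

class IndependentQLearner:
    """Each agent keeps its own table Q_i(s, a_i) over its OWN actions and
    treats the other agents as part of the (nonstationary) environment."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions                 # this agent's action set A_i
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # Q_i(s, a_i), defaults to 0

    def act(self, s):
        # epsilon-greedy exploration over the agent's own actions only
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # standard Q-learning backup; the target implicitly depends on the
        # other agents' changing policies, hence convergence is not guaranteed
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])
```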

  7. Joint action learning
  • Better results can be obtained if the agents attempt to model each other.
  • Each agent maintains an action value function Q(i)(s, a) for all states and joint actions (i denotes agent i).
  • In this case Q-learning becomes: Q(i)(s, a) := (1-α) Q(i)(s, a) + α [R + γ max_a' Q(i)(s', a')]
  • Issues to consider here:
    - Representation: how to represent Q(i)(s, a).
    - Optimization: how to compute max_a' Q(i)(s', a').
    - Exploration: how to choose exploration actions a.
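
As a hedged sketch, the backup above over joint actions might look as follows; the dictionary representation and the helper name are our assumptions:

```python
from itertools import product

def joint_q_update(q, s, joint_a, r, s_next, action_sets, alpha=0.1, gamma=0.95):
    """One backup Q(i)(s,a) := (1-alpha) Q(i)(s,a) + alpha [r + gamma max_a' Q(i)(s',a')].

    q           dict mapping (state, joint_action) -> value
    joint_a     tuple with one action per agent
    action_sets list of per-agent action sets, so product(*action_sets)
                enumerates all joint actions a'
    """
    best_next = max(q.get((s_next, a2), 0.0) for a2 in product(*action_sets))
    q[(s, joint_a)] = (1 - alpha) * q.get((s, joint_a), 0.0) + alpha * (r + gamma * best_next)
```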

  8. Representing Q(i)(s, a)
  • The simplest choice is to use a tabular representation: Q(i)(s, a) is a matrix with as many entries as there are pairs of states s ∈ S and joint actions a ∈ A1 × … × An.
  • Computing max_a' Q(s', a') then involves just a for-loop.
  • Alternatively, if many agents are involved, a coordination graph can be used. In this case we assume Q(s, a) = Σj Qj(s, aj), where aj is the joint action of a subset of the agents.
  • In this case max_a' Q(s', a') can be computed with variable elimination.
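
A sketch of the coordination-graph decomposition, with the maximization written as a brute-force loop for clarity; variable elimination would exploit the graph structure to avoid enumerating all joint actions. The data layout is an assumption:

```python
from itertools import product

def max_joint_action(local_qs, s, action_sets):
    """Maximize Q(s,a) = sum_j Q_j(s, a_j) by brute force.

    local_qs    list of (agent_indices, q_fn) pairs; q_fn(s, sub_a) scores
                the sub-action sub_a of the agents in agent_indices
    action_sets list of per-agent action sets
    """
    best_a, best_v = None, float("-inf")
    for joint_a in product(*action_sets):   # exponential in the number of agents
        v = sum(q_fn(s, tuple(joint_a[i] for i in idx))
                for idx, q_fn in local_qs)
        if v > best_v:
            best_a, best_v = joint_a, v
    return best_a, best_v
```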

  9. Exploration in multiagent RL
  • We assume for simplicity that all agents receive exactly the same reward.
  • Then each agent can select an exploratory joint action a according to a Boltzmann distribution over joint actions.
  • This requires that each agent samples the same joint action!
  • Each agent runs Q-learning over joint actions identically and in parallel.
  • In this case, the whole multiagent system is effectively treated as a `big' single agent.
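
One way to let all agents draw the same exploratory joint action is to derive the random draw from a shared seed; this synchronization trick and the function below are illustrative assumptions, not taken from the slides:

```python
import math
import random
from itertools import product

def boltzmann_joint_action(q, s, action_sets, temperature=1.0, shared_seed=0):
    """Sample a joint action a with probability proportional to exp(Q(s,a)/T).
    Every agent calls this with the same shared_seed and state s, so all
    agents reproduce the identical draw without communicating."""
    joint_actions = list(product(*action_sets))
    values = [q.get((s, a), 0.0) / temperature for a in joint_actions]
    m = max(values)                            # subtract max for numerical stability
    prefs = [math.exp(v - m) for v in values]
    total = sum(prefs)
    rng = random.Random(f"{shared_seed}|{s}")  # same stream on every agent
    r, acc = rng.random() * total, 0.0
    for a, p in zip(joint_actions, prefs):
        acc += p
        if r <= acc:
            return a
    return joint_actions[-1]
```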

  10. Bayesian Networks

  11. Example Bayesian network: a Directed Acyclic Graph (DAG) with nodes {A, B, C, D, E}; a causal network (figure omitted).

  12. Important relations
  • Range of a probability: 0 ≤ P(A) ≤ 1
  • Sum rule: P(A ∨ B) = P(A) + P(B) − P(A ∧ B); if A and B are mutually exclusive: P(A ∨ B) = P(A) + P(B)
  • Logical equivalence: P(A ∧ B) = P(B ∧ A)
  • Product rule: P(A ∧ B) = P(A | B) P(B)

  13. Conditional probability. Given event b, the probability of a equals x, or: P(a|b) = x. The joint probability of a ∧ b equals: P(a,b) = P(a|b) P(b)

  14. Bayes' rule
  1. P(a,b) = P(a|b) P(b)
  Similarly: 2. P(a,b) = P(b|a) P(a)
  Combining 1. and 2. results in Bayes' rule: P(a|b) P(b) = P(b|a) P(a)
  Or: P(a|b) = P(b|a) P(a) / P(b)

  15. Independence and joint probability
  Example 1: somebody is White and Male. Suppose P(White) = 0.5 and P(Male) = 0.4, and that White and Male are independent. Then it holds that P(White|Male) = P(White), and the joint probability is: P(White ∧ Male) = P(White|Male) · P(Male) = P(White) · P(Male) = 0.5 · 0.4 = 0.2
  Example 2: somebody is Tall and Male. Suppose P(Tall) = 0.5 and P(Male) = 0.4, and that Tall and Male are dependent, with conditional probability P(Tall | Male) = 0.8. The joint probability is now: P(Tall ∧ Male) = P(Tall|Male) · P(Male) = 0.8 · 0.4 = 0.32

  16. Example 1 continued: it holds that P(White ∧ Male) = P(Male ∧ White), so P(White | Male) · P(Male) = P(Male | White) · P(White), i.e. 0.5 · 0.4 = P(Male | White) · 0.5. So: P(Male | White) = 0.4
  Example 2 continued: it holds that P(Tall ∧ Male) = P(Male ∧ Tall), so P(Tall | Male) · P(Male) = P(Male | Tall) · P(Tall), i.e. 0.8 · 0.4 = P(Male | Tall) · 0.5. So: P(Male | Tall) = 0.32 / 0.5 = 0.64
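
A quick numeric check of both worked examples, in plain Python (variable names are ours):

```python
# Values from the two examples above.
p_white, p_male, p_tall = 0.5, 0.4, 0.5

# Example 1: White and Male independent
p_white_and_male = p_white * p_male              # 0.2
p_male_given_white = p_white_and_male / p_white  # 0.4

# Example 2: Tall depends on Male, with P(Tall|Male) = 0.8
p_tall_and_male = 0.8 * p_male                   # 0.32
p_male_given_tall = p_tall_and_male / p_tall     # 0.64

print(p_male_given_white, p_male_given_tall)     # 0.4 0.64
```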

  17. Symmetry: the “inverse fallacy”. Do not confuse the probability that someone is Tall given Male, P(Tall|Male), with the probability that someone is Male given Tall, P(Male|Tall).

  18. Definition of a Bayesian Network (BN)
  • A Bayesian network consists of: a set of variables and a set of directed edges between the variables
  • Every variable has a finite number of states
  • The variables form a directed acyclic graph (DAG)
  • Each variable A with parents B1, …, Bn has a conditional probability table P(A | B1, …, Bn)

  19. Variables and states. The nodes are variables with states; the states are assigned probability values. The collection of probability values for all states is called the probability distribution of that variable. If A is a variable with states {a1, a2, …, an}, then P(A) is the probability distribution over these states: P(A) = {P(a1), P(a2), …, P(an)}, with Σi P(ai) = 1

  20. Example. Variable A has states a1, a2; the conditional probability table P(B|A) gives, for each state of A, a distribution over the states b1, b2, b3 of B (table lost in this transcript). The sum of each row equals 1.

  21. Compute the joint probability from the conditional probabilities. Given the probability distribution of A, P(A) = (0.4, 0.6), and the table P(B|A), it holds that: P(bi, aj) = P(bi|aj) P(aj). This yields the joint table P(B,A) (table lost in this transcript).

  22. Calculate P(B) from P(B,A): P(B) = ΣA P(B,A), i.e. P(bi) = Σj P(bi, aj). And so: P(B) = (0.52, 0.18, 0.3). This process is called marginalisation.

  23. Calculate P(A|B): by the product rule, P(aj | bi) = P(bi, aj) / P(bi), using the joint table and the marginal P(B) computed above (see the sketch below).
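
The CPT numbers on slides 20-23 were lost in extraction. The sketch below therefore uses an assumed CPT, chosen only to be consistent with the stated marginal P(B) = (0.52, 0.18, 0.3); the original slide's table may have differed:

```python
# ASSUMED CPT (not from the slides), consistent with P(B) = (0.52, 0.18, 0.3).
p_a = [0.4, 0.6]                       # P(A) over states a1, a2
p_b_given_a = [[0.4, 0.3, 0.3],        # P(B|a1) over b1, b2, b3 (row sums to 1)
               [0.6, 0.1, 0.3]]        # P(B|a2)

# Joint: P(b_i, a_j) = P(b_i | a_j) P(a_j)
p_ba = [[p_b_given_a[j][i] * p_a[j] for j in range(2)] for i in range(3)]

# Marginalisation: P(b_i) = sum_j P(b_i, a_j)
p_b = [sum(row) for row in p_ba]
print(p_b)                             # ≈ [0.52, 0.18, 0.3]

# Conditioning the other way: P(a_j | b_i) = P(b_i, a_j) / P(b_i)
p_a_given_b = [[p_ba[i][j] / p_b[i] for j in range(2)] for i in range(3)]
print(p_a_given_b[0])                  # P(A | b1)
```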

  24. Evidence. There are two types of evidence:
  • Hard evidence (instantiation): it is known that a node X is for sure in a particular state. Example: a soccer match can be in three states {win, lose, draw}; after the game has ended, the state is known.
  • Soft evidence: for a node X there is an indication that increases the probability of a certain state. Example: if one team leads 3-0 after the first half of a match, the probability of that team's "win" state can be increased.

  25. Another example. S = stiff neck, M = meningitis. What is the probability that somebody has meningitis, given a stiff neck? A priori: P(S) = 0.05, P(M) = 0.0002, P(S|M) = 0.9 (in other words, if somebody has meningitis, there is a high probability of having a stiff neck). By Bayes' rule: P(M|S) = P(S|M) P(M) / P(S) = 0.9 · 0.0002 / 0.05 = 0.0036 (so, if someone has a stiff neck, there is only a small probability that he has meningitis).

  26. Visit to Asia: the classic “Asia” example network (figure omitted).

  27. Types of connections. In Bayesian networks, there are three types of connections:
  • Serial
  • Converging
  • Diverging

  28. Serial connections (A between the two end nodes, as in B → A → C). Evidence in node A influences both nodes B and C. If there is evidence in node A, evidence in node B has no influence on node C (example figure omitted).

  29. Converging connection

  30. Converging. If one of the parents is known, this does not influence the other parent node (“hoofdpijn”, headache). If one of the parents is known and the child node (“antwoord”, answer) is also known, then that parent does influence the other parent node. So, if no evidence is available about the child node, the parent nodes are independent (d-separated); otherwise they are not. The parent nodes are said to be conditionally dependent given the child node (example figure omitted).

  31. Diverging connection

  32. Diverging. The child node “geleerd” (learned) does not influence the child node “hoofdpijn” (headache) through the parent node “antwoord” (answer) if the parent is known (they are d-separated). If “antwoord” is not known, then node “geleerd” does influence node “hoofdpijn”. The parent node “antwoord” influences both child nodes “geleerd” and “hoofdpijn” (example figure omitted).

  33. D-separation. Two nodes B and C in a Bayesian network are d-separated if for all paths between B and C there exists an intermediate node A for which either:
  • the connection is serial or diverging and the state of A is known; or
  • the connection is converging and the state of A (and of A's descendants) is not known.

  34. Why d-separation is important. If we know that two variables are d-separated, they can be treated as independent (given the evidence available at that moment) and we do not have to compute or use conditional probabilities between them, which speeds up computation. It can also be used during modeling, by creating models that exploit such causal relations, for example for the sake of distributedness.

  35. Example: learning for an exam. Problem: a student has to learn for an exam. What is the probability that he gives the correct answer? Solution: we start with a probability of 0.5 that the student has learned the material, so P(L=true) = 0.5. The student can give a correct or a wrong answer A, so P(A=correct), or P(A) for short. This probability is conditionally dependent on whether the student has learned the material.

  36. The Bayesian network P(A | L), built in Netica (www.norsys.com). The conditional probabilities for A given Learned are:
  P(A=correct | L=true) = 0.9
  P(A=wrong | L=true) = 0.1
  P(A=correct | L=false) = 0.4
  P(A=wrong | L=false) = 0.6

  37. Computation of P(A=correct). The probability P(A=correct) is calculated as: P(A=correct) = P(A=correct|L=true) · P(L=true) + P(A=correct|L=false) · P(L=false) = 0.9 · 0.5 + 0.4 · 0.5 = 0.65

  38. Influence of evidence. Suppose that the student takes the exam and produces a correct answer. This is information that we can feed into the network as evidence. Given this evidence, what is the probability that the student has learned the material, i.e. what is P(L=true|A=correct), or P(L|A) for short?

  39. Evidence. With P(L) = 0.5, P(A) = 0.65 and P(A | L) = 0.9, Bayes' rule gives: P(L | A) = 0.9 · 0.5 / 0.65 ≈ 0.692
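
The two computations from slides 37 and 39, checked in a few lines of Python (variable names are ours):

```python
p_l = 0.5                                 # P(L=true): prior that the student learned
p_a_given_l = {True: 0.9, False: 0.4}     # P(A=correct | L)

# Slide 37: P(A=correct) = sum over L of P(A=correct|L) P(L)
p_a = p_a_given_l[True] * p_l + p_a_given_l[False] * (1 - p_l)
print(p_a)                                # ≈ 0.65

# Slide 39: P(L=true | A=correct) = P(A=correct|L=true) P(L=true) / P(A=correct)
p_l_given_a = p_a_given_l[True] * p_l / p_a
print(round(p_l_given_a, 3))              # 0.692
```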

  40. Example with more nodes. Suppose that we want to model the influence of a headache H on the result. Then we can make a (converging) network (figure omitted).

  41. Calculation of P(A=correct). The probability of a correct answer is the sum over the 4 parent combinations, each multiplied by the conditional probability of a correct answer: P(A=correct) = ΣL,H P(A=correct|L,H) · P(L) · P(H)
  • P(A=correct|L=true,H=true) · P(L=true) · P(H=true) = 0.3 · 0.5 · 0.2 = 0.03
  • P(A=correct|L=true,H=false) · P(L=true) · P(H=false) = 0.9 · 0.5 · 0.8 = 0.36
  • P(A=correct|L=false,H=true) · P(L=false) · P(H=true) = 0.05 · 0.5 · 0.2 = 0.005
  • P(A=correct|L=false,H=false) · P(L=false) · P(H=false) = 0.4 · 0.5 · 0.8 = 0.16
  • Total = 0.555
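
The same sum by enumeration, as a sketch in Python (the priors P(L=true) = 0.5 and P(H=true) = 0.2 are read off the terms above):

```python
from itertools import product

# Priors and CPT from slide 41: L (learned) and H (headache) are the two
# parents of A (answer) in the converging network.
p_l = {True: 0.5, False: 0.5}
p_h = {True: 0.2, False: 0.8}
p_correct_given = {(True, True): 0.3, (True, False): 0.9,
                   (False, True): 0.05, (False, False): 0.4}

# P(A=correct) as the sum over the four (L, H) combinations
p_correct = sum(p_correct_given[(l, h)] * p_l[l] * p_h[h]
                for l, h in product([True, False], repeat=2))
print(p_correct)    # ≈ 0.555
```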

  42. Explaining away. From the evidence that the car does not start, the network calculates that most likely the starter motor is broken (64.4% against 21.7% for the battery being broken). However, given the new evidence that the lights do not work either, the network now calculates that the probability that the battery is broken is highest. Comparing variables in this manner and drawing conclusions is called “explaining away”.
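
The car network's probability tables are not given on the slide, so as an illustration the sketch below demonstrates the same phenomenon (evidence on one parent shifting belief about the other parent, given evidence on the child) on the exam network of slide 41:

```python
from itertools import product

p_l = {True: 0.5, False: 0.5}             # P(L): learned
p_h = {True: 0.2, False: 0.8}             # P(H): headache
p_c = {(True, True): 0.3, (True, False): 0.9,
       (False, True): 0.05, (False, False): 0.4}   # P(A=correct | L, H)

def posterior_learned(h_evidence=None):
    """P(L=true | A=correct [, H=h_evidence]) by enumeration."""
    hs = [True, False] if h_evidence is None else [h_evidence]
    num = sum(p_c[(True, h)] * p_l[True] * p_h[h] for h in hs)
    den = sum(p_c[(l, h)] * p_l[l] * p_h[h]
              for l, h in product([True, False], hs))
    return num / den

print(round(posterior_learned(), 3))        # 0.703: correct answer observed
print(round(posterior_learned(True), 3))    # 0.857: headache also observed
```

Observing the headache raises the posterior that the student learned the material: given the correct answer, the headache makes "learned" more necessary as an explanation.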

  43. BNs are popular because…
  • They can model events using semantics.
  • Modeling is done with a graphical representation. This makes interaction between the domain expert (who makes the model) and the engineer (who makes computation possible) very fruitful. Furthermore, expert knowledge is easily understandable in this form.
  • They can calculate the uncertainties of events taking place using probabilities.
  • The use of evidence is a very powerful method for processing, modeling and learning.
  • The best-known popular application is the “Help Wizard” in Microsoft Office.
  • BNs have historically been used in (medical) diagnostic systems.
  • Lately, they are used in real-time reasoning systems (processing speed allows this now).

  44. Bayesian networks and Multi-agent Systems

  45. The Rhino Cooperative Framework
  • Created by Michael van Wie, University of Rochester, NY
  • Used for robot soccer (RoboCup), to observe the other robots on the field
  • Displayed actions and observations are the only form of communication

  46. Assumptions in Rhino
  • Each agent may hold only one intention at any given time
  • All agents have the same reasoning ability
  • When choosing actions, agents refer to the same “recipes” (descriptions of how to carry out plans)

  47. Observations in Rhino
  • An observation (of an action) is made when an action is recognized by an agent in an observed teammate. A likelihood is returned by the database storing all possible actions.
  • Action recognition is therefore the process of generating observations.

  48. First, define what an action is. An action is a tuple <a, t, F, W>, where:
  • a is the action itself
  • t is a time interval during which the action a must run
  • F is a list of preconditions (the agent's goals) that must be true before the action is executed
  • W is a list of effects that will be true when the action is completed
  (A minimal data-structure sketch follows below.)
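
Rendered as a data structure, the tuple might look like this; the field names and the example are illustrative, not taken from the Rhino source:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """Minimal rendering of the Rhino action tuple <a, t, F, W>."""
    name: str            # a: the action itself
    interval: tuple      # t: (start, end) interval during which the action must run
    preconditions: list  # F: goals that must be true before execution
    effects: list        # W: effects that will be true when the action completes

# Hypothetical usage in a soccer domain:
kick = Action("kick", (0.0, 2.0), ["has_ball"], ["ball_moving"])
```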
