A Decision-Theoretic Approach to Designing Proactive Communication in Multi-Agent Teamwork
Thomas R. Ioerger, Yu Zhang, Richard Volz, John Yen (PSU-IST)
Dept. of Computer Science, Texas A&M University
Motivation
• Agents share a large amount of knowledge about the teamwork.
• Hard-coded interactions among participants.
• High-frequency message exchange.
• Communication risk.
[Figure labels: Team, Multi-Agent, Agent]
Challenging Issues in Designing Communication Protocols
• Each agent has incomplete information, from which uncertainties arise.
• Each agent has different problem-solving capabilities.
• Data are decentralized, and the system lacks global control.
• Excessive or unrestricted communication leads to a lack of scalability.
Our Approach and Its Contributions
Proactive Communication
• OBPC: Reduction of communication load through OBservations.
• DIP: Dynamic estimation of the probability distribution of Information Production and need.
• DTPC: Decision-Theoretic determination of communication strategies.
Background
• CAST (Collaborative Agents for Simulating Teamwork)
• MALLET (Multi-Agent Logic-based Language for Encoding Teamwork)

(team-plan killwumpus(?w)
  (process
    (seq
      (agent-bind ?ca (constraint (play-role ?ca scout)))
      (DO ?ca (findwumpus ?w)))
      (agent-bind ?fi (constraint ((play-role ?fi fighter) (closest-to-wumpus ?fi ?w))))
      (DO ?fi (movetowumpus ?w))
      (DO ?fi (shootwumpus ?w))))))

(ioper shootwumpus (?w)
  (pre-cond (wumpus ?w) (location ?w ?x ?y) (dead ?w false))
  (effect (dead ?w true)))
Overview
[Figure: architecture diagram. Labels: Proactive Communication (OBPC, DIP, DTPC); CAST; Team Structure & Teamwork Procedure; agent knowledge bases (KB); Optimal Communication Strategy.]
Agent Execution Cycle
[Figure: execution cycle diagram. Labels: Observe; Sense; Predict information need and production; Act; Effect; Decide Strategy; Communicate Information.]
Syntax of Observability
<observability> ::= (CanSee <viewing>)* (BelieveCanSee <believer> <viewing>)*
<viewing> ::= <observer> <observable> <cond>
<believer> ::= <agent>
<observer> ::= <agent>
<observable> ::= <property> | <action>
<cond> ::= (<property>)*
<property> ::= (<property-name> <object> <args>)
<action> ::= (DO <doer> (<operator-name> <args>))
<object> ::= <agent> | <non-agent>
<doer> ::= <agent>
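The grammar above is a small s-expression language, so a rule can be read with a generic parser. The following is a minimal Python sketch, not part of CAST or MALLET: the function names `tokenize`, `parse`, and `read_rule` and the dictionary layout are assumptions made for illustration only.

```python
def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one s-expression from a token list, consuming tokens in place."""
    token = tokens.pop(0)
    if token != "(":
        return token  # an atom such as "ca" or "?x"
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return items


def read_rule(text):
    """Map a parsed rule onto the grammar's roles (hypothetical layout)."""
    expr = parse(tokenize(text))
    head = expr[0]
    if head == "CanSee":
        # (CanSee <observer> <observable> <cond>...)
        return {"kind": head, "believer": None, "observer": expr[1],
                "observable": expr[2], "cond": expr[3:]}
    if head == "BelieveCanSee":
        # (BelieveCanSee <believer> <observer> <observable> <cond>...)
        return {"kind": head, "believer": expr[1], "observer": expr[2],
                "observable": expr[3], "cond": expr[4:]}
    raise ValueError("not an observability rule: " + head)
```

For example, `read_rule("(CanSee ca (location ?o ?x ?y) (location ca ?xc ?yc))")` yields a dictionary whose observer is `ca` and whose observable is the `location` property.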
Example Observability Rules

(CanSee ca (location ?o ?x ?y)
  (location ca ?xc ?yc) (location ?o ?x ?y)
  (inradius ?x ?y ?xc ?yc rca))
// The carrier can see the location property of an object.

(CanSee ca (DO ?fi (shootwumpus ?w))
  (play-role fighter ?fi)
  (location ca ?xc ?yc) (location ?fi ?x ?y)
  (adjacent ?xc ?yc ?x ?y))
// The carrier can see the shootwumpus action of a fighter.

(BelieveCanSee ca fi (location ?o ?x ?y)
  (location fi ?xi ?yi) (location ?o ?x ?y)
  (inradius ?x ?y ?xi ?yi rfi))
// The carrier believes the fighter is able to see the location property of an object.

(BelieveCanSee ca fi (DO ?f (shootwumpus ?w))
  (play-role fighter ?f) (≠ ?f fi)
  (location ca ?xc ?yc) (location fi ?xi ?yi) (location ?f ?x ?y)
  (inradius ?xi ?yi ?xc ?yc rca) (inradius ?x ?y ?xc ?yc rca)
  (adjacent ?x ?y ?xi ?yi))
// The carrier believes the fighter is able to see the shootwumpus action of another fighter.
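To make the conditions in the two CanSee rules concrete, here is a minimal Python sketch that evaluates them on grid coordinates. The slides do not define `inradius` or `adjacent`; the Chebyshev-distance and shared-edge readings below are assumptions, as are the function names.

```python
def inradius(x, y, xc, yc, r):
    """Assumed meaning: (x, y) is within r cells of (xc, yc), Chebyshev distance."""
    return max(abs(x - xc), abs(y - yc)) <= r


def adjacent(x1, y1, x2, y2):
    """Assumed meaning: the two cells share an edge."""
    return abs(x1 - x2) + abs(y1 - y2) == 1


def carrier_can_see_location(carrier_pos, object_pos, rca):
    """First rule: the carrier sees an object's location if it is within radius rca."""
    xc, yc = carrier_pos
    x, y = object_pos
    return inradius(x, y, xc, yc, rca)


def carrier_can_see_shot(carrier_pos, fighter_pos):
    """Second rule: the carrier sees a fighter shoot if the fighter is adjacent."""
    xc, yc = carrier_pos
    x, y = fighter_pos
    return adjacent(xc, yc, x, y)
```

Under these assumed semantics, a carrier at (5, 5) with radius 2 sees an object at (7, 6) but not one at (8, 5).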
Proactive Communication Based on Observation
ProactiveTell
• A provider reasons about what information it will have.
• A provider reasons about whether to deliver a piece of information once it has that information.
ActiveAsk
• A needer reasons about what information it will need.
• A needer reasons about whether to ask for a piece of information when it needs that information.
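One plausible reading of how observability reduces message traffic is sketched below in Python. This is an illustration, not the authors' algorithm: the slides only say that providers and needers reason about delivery and requests, so the specific conditions and the function names are assumptions.

```python
def should_proactive_tell(has_info, needer_needs_info, believes_needer_can_see):
    """Provider side (assumed rule): tell only when the needer needs the
    information and is not believed to be able to observe it directly."""
    return has_info and needer_needs_info and not believes_needer_can_see


def should_active_ask(needs_info, can_see_info, believes_provider_can_see):
    """Needer side (assumed rule): ask only when the information cannot be
    observed directly and some provider is believed to observe it."""
    return needs_info and not can_see_info and believes_provider_can_see
```

With these rules, a provider that believes the needer can already see a wumpus's location stays silent, which is the kind of saving the observation-based approach aims for.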
Evaluation
Multi-Agent Wumpus World
• 20 wumpuses, 8 pits, and 20 piles of gold per world.
• 1 carrier and 3 fighters compose a team.
• The team goal is to kill wumpuses and get the gold without being killed.
• 5 randomly generated worlds with 20×20 cells.
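A world with these counts could be generated along the following lines. This Python sketch is an assumption about the setup, not the authors' generator: it places at most one object per cell and uses fixed seeds, neither of which is stated in the slides.

```python
import random


def generate_world(size=20, wumpuses=20, pits=8, gold=20, seed=None):
    """Place objects on distinct cells of a size x size grid (assumed layout)."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    picked = rng.sample(cells, wumpuses + pits + gold)  # no cell is reused
    return {
        "wumpus": picked[:wumpuses],
        "pit": picked[wumpuses:wumpuses + pits],
        "gold": picked[wumpuses + pits:],
    }


# Five worlds, one per (hypothetical) seed, matching the evaluation setup.
worlds = [generate_world(seed=s) for s in range(5)]
```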
Decision-Theoretic Proactive Communication
• Strategies
• Utility Function
• Cost Function
• Value Function
• Decision-Making
Decision-Making on Situation PA
Situation PA: Provider produces a new piece of information.
[Figure: decision tree with nodes 0, 1, 2 and end nodes e. Provider moves (a-b): ProactiveTell, Silence. Needer moves (b-a): Accept, Wait, Silence, ActiveAsk.]
Legend: a: provider; b: needer; e: end.
Decision-Making on Situation PB
Situation PB: Provider receives a request for a piece of information.
[Figure: decision tree from node 0 to end nodes e. Provider moves (a-b): Reply, WaitUntilNext.]
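Decision-Making on Situation NA
Situation NA: Needer needs a piece of information.
[Figure: decision tree with numbered nodes, transfer nodes t, and end nodes e. Needer moves (b-a): ActiveAsk, Wait, Silence. Provider moves (a-b): Reply, WaitUntilNext, Silence, ProactiveTell.]
Legend: t: transfer.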
Decision-Making on Situation NB
Situation NB: Needer receives a piece of information.
[Figure: decision tree with labels 0, e, t. Needer move (b-a): Accept.]
Utility Function
Parameters in the utility function:
• $I$: information about which communication occurs
• $t$: time of decision-making
• $t_1$: time at which $I$ is needed
• $t_2$: time at which the value of $I$ that is used is produced
• $SU$: situation at $t$
• $S$: strategy available at $SU$
• $M$: the set of messages involved in obtaining $I$
• $E$: environment state at $t$

$$U(I, t, t_1, t_2, SU, S, M, E) = V(I, t, t_1, t_2, SU, S) - C(M)$$
Value Function
$$V(I, t, t_1, t_2, SU, S) = T(I, t, t_1, t_2, SU, S) + R(I, t, t_1, t_2, SU, S)$$
where $T$ is the timeliness term and $R$ is the relevance term.
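The utility and value definitions compose directly: value is timeliness plus relevance, and utility is value minus message cost. The Python sketch below works through that arithmetic; the function names and the numeric inputs are hypothetical, and the timeliness, relevance, and cost terms are passed in as already-computed numbers.

```python
def value(timeliness, relevance):
    """V = T + R, as in the value function."""
    return timeliness + relevance


def utility(timeliness, relevance, message_cost):
    """U = V - C(M), as in the utility function."""
    return value(timeliness, relevance) - message_cost


# Hypothetical numbers: telling costs a message but delivers the value on time.
tell = utility(timeliness=0.5, relevance=0.25, message_cost=0.25)
silent = utility(timeliness=0.0, relevance=0.25, message_cost=0.0)
```

With these made-up inputs, telling scores 0.5 and silence scores 0.25, so the message is worth its cost.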
Timeliness Function
Timeliness: whether agents use a value that can be produced in time when they need $I$.
$$d(I, t, t_1, t_2, SU, S) = \max(0,\; t_2 - t_1)$$
$$T(I, t, t_1, t_2, SU, S) = f_t\big(d(I, t, t_1, t_2, SU, S)\big)$$
where $f_t$ is decreasing: $f_t(x) < f_t(y)$ if $y < x$.
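The slides only require the timeliness function to decrease as the delay grows. As a minimal Python sketch, an exponential decay is used below; that choice, the `decay` parameter, and the function names are assumptions for illustration.

```python
import math


def delay(t1, t2):
    """d = max(0, t2 - t1): how long after the need the value is produced."""
    return max(0, t2 - t1)


def timeliness(t1, t2, decay=0.5):
    """T = f_t(d), with f_t assumed to be exp(-decay * d), strictly decreasing."""
    return math.exp(-decay * delay(t1, t2))
```

A value produced before it is needed has zero delay and full timeliness; each extra step of delay lowers the score.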
Relevance Function
Relevance: unprocessed, most recent, important.
$$P(I, t, t_1, t_2, SU, S) = \Pr\big(I,\, t,\, t_1,\, t_2,\ \text{no other value for } I \text{ was produced in } \mathrm{Int}[t_1, t_2] \;\big|\; S,\, SU\big)$$
$$R(I, t, t_1, t_2, SU, S) = f_{rI}\big(P(I, t, t_1, t_2, SU, S)\big)$$
where $f_{rI}$ is increasing: $f_{rI}(x) < f_{rI}(y)$ if $x < y$.
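Relevance rises with the probability that no newer value for I was produced in the interval. The Python sketch below estimates that probability under a Poisson production model and scales it linearly; both modelling choices, the `rate` and `weight` parameters, and the function names are assumptions, not taken from the slides.

```python
import math


def prob_no_new_value(rate, t1, t2):
    """Assumed Poisson model: chance of zero productions in the interval."""
    return math.exp(-rate * abs(t2 - t1))


def relevance(p, weight=1.0):
    """R = f_rI(P), with f_rI assumed linear and increasing (weight > 0)."""
    return weight * p
```

The `weight` parameter stands in for the per-information importance suggested by the subscript on the relevance function.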
Cost Function
$$C(M_i) = \begin{cases} 0 & \text{if } M_i = \emptyset \\ k_1 + k_2 \times \mathrm{len}(M_i) & \text{otherwise} \end{cases}$$
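The cost function is a fixed overhead plus a per-length charge. The Python sketch below reads the zero-cost case as an empty (absent) message and sums per-message costs to price a set of messages; both readings, the default constants, and the function names are assumptions.

```python
def message_cost(message, k1=1.0, k2=0.1):
    """C(M_i): zero for an empty message, else k1 + k2 * len(M_i)."""
    if not message:
        return 0.0
    return k1 + k2 * len(message)


def total_cost(messages, k1=1.0, k2=0.1):
    """Assumed aggregation: the cost of a set of messages is the sum."""
    return sum(message_cost(m, k1, k2) for m in messages)
```

Here `k1` models the fixed price of sending anything at all, and `k2` the price per unit of message length.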
Expected Utility E(U) =
Strategies
Situation PA: provider produces I. Options: ProactiveTell or Silence.
[Figure: timeline along t, divided at the current time into known and unknown. Labels: last sent; last not sent; last need aware of; current time; unfulfilled need; next production.]
Strategies
Situation PB: provider receives a request for I. Options: Reply or WaitUntilNext.
[Figure: timeline along t, divided at the current time into known and unknown. Labels: last production; current time; next production.]
Strategies
Situation NA: needer needs I. Options: ActiveAsk, Wait, or Silence.
[Figure: timeline along t, divided at the current time into known and unknown. Labels: last I received; most recent production; current time; next production.]
Strategies
Situation NB: needer receives I. Option: Accept.
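Given the options listed for each situation, a decision-theoretic agent can pick the one with the highest expected utility. The Python sketch below uses the standard probability-weighted sum as the expected utility; that formula, the outcome-model layout, and the function names are assumptions, since the slides' own expression is not reproduced here.

```python
# Options per situation, as named on the strategy slides.
STRATEGIES = {
    "PA": ["ProactiveTell", "Silence"],
    "PB": ["Reply", "WaitUntilNext"],
    "NA": ["ActiveAsk", "Wait", "Silence"],
    "NB": ["Accept"],
}


def expected_utility(outcomes):
    """Probability-weighted sum over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


def choose_strategy(situation, outcome_model):
    """Return the option with the highest expected utility.

    outcome_model maps each option name to its list of
    (probability, utility) outcomes (a hypothetical layout).
    """
    return max(STRATEGIES[situation],
               key=lambda s: expected_utility(outcome_model[s]))
```

For instance, if telling yields utility 0.6 for certain while silence yields 1.0 with probability 0.3 and 0.0 otherwise, the provider in situation PA chooses ProactiveTell.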
Summary
• Advantages of the approach: it allows agents to make intelligent choices of communication policy based on:
  • frequencies: of needs, of sensing, of information change
  • costs: of messages, plus penalties for delays in action or for acting with incorrect information
Criteria for Applicable Domains
• There are information needs among the team.
• Agents can communicate.
• There is uncertainty in the environment.
• The teamwork process has stochastic properties.
• Agents have incomplete or disjoint knowledge about the world.
• The team acts under critical time constraints, so proactive assistance becomes important.