Learning Mediation Strategies in Heterogeneous Multiagent Systems: Application to Adaptive Services. Romaric CHARTON, MAIA Team - UMASS Workshop, Wednesday, 23rd June 2004.
Presentation Overview
• Research and application fields
• Heterogeneous Multiagent Systems (h-MAS)
• Typical example of interaction
• Markov Decision Process based mediation
• Experiments
• Work in progress
Research fields
Domain: heterogeneous Multiagent Systems (h-MAS)
• Learning behaviours of agents that interact with human beings
• Organization of agents of different natures
Approach:
• Inspiration from the Agent-Group-Role model (Gutknecht and Ferber 1998)
• Deal with real applications: dynamic environments, uncertainty, incomplete knowledge
• Use of stochastic models (MDP) + Reinforcement Learning
Application domain: interactive services
Interaction with humans in real applications
• Provided on computers and networks
• Use of various communication media (telephone, e-mail, web, etc.)
• Examples: online ordering, information search, share management, etc. (focus on information search services)
From classical interactive services
Most of the time controlled by handwritten finite state machines (static scripts)
• Complexity (particular cases and errors)
• Need for implicit / expert knowledge (for instance, the user model)
To adaptive services
• Ease the design and control of interactions
• Robustness of the solution (particular cases, unforeseen cases, etc.)
• Adapt the interaction to the user's behaviour, characteristics and preferences
Heterogeneous Multiagent Systems (h-MAS)
Common features
• Bounded rationality agents (Russell and Norvig 1995)
• Ability to communicate and to manage knowledge and resources
Partition of the agent set according to
• Their nature (human, software, etc.)
• Their subjective "confidence" (knowledge of and influence on the others: goal delegation, ...)
Problems
• How to bridge the language gap?
• How to match needs to capabilities?
• What if agents cannot be modified?
• What if some agents are human beings (Grislin-LeSturgeon and Peninou 1998)?
Our solution: add a Mediator Agent that manages the interaction
An Information Search problem: Flight booking
• Customer (occasional, novice)
  • Goal: book a flight from Paris to Moscow
  • Does not know how to formulate a request
  • Too many / raw results...
• Information Source (not owned, has a cost)
• Mediator: interacts with the customer; sends queries to the information source and receives results
Objective: enhance the service quality relative to classical search
Role of the Mediator Agent
Its goals
• Build the query that best matches the user's goal
• Provide relevant results to the user
• Maximize its utility (user satisfaction - source costs)
At any time, it can
• Ask the user about the query,
• Send the query to the information source, or
• Propose a limited number of results to the user
In return, it perceives the other agents' answers (values, results, selections, rejections, etc.)
It has to manage uncertainty and incomplete knowledge:
• From users (misunderstandings, partial knowledge of their needs)
• From the environment (noise and imperfect sensors)
MDP based Interaction Control
Proposition: control an interaction sequence as a Markov Decision Process (MDP) and find mediation strategies (MDP policies)
Need to define <S, A, T, R>:
• S: state space
• A: mediator actions
• T: transition function
• R: reward function
Diagram: the Mediator sends actions A to its environment (User and Source), whose interaction sequence is the MDP to control; the environment, governed by T, returns states S and rewards R.
Problem: T and R depend on the user and source agents!
Solution: learn the mediation strategy online by reinforcement
Choice: Q-Learning (Watkins 1989)
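The Q-Learning choice can be illustrated with a minimal tabular sketch. The function names (`q_update`, `epsilon_greedy`), the learning rate `alpha`, the discount `gamma`, the exploration rate `epsilon`, and the toy state and action labels are assumptions made for illustration; the slides do not give the mediator's actual parameters or exploration scheme.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical action labels mirroring the three mediator capabilities.
ACTIONS = ["ask_user", "send_query", "propose_results"]

q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q, "s0", "ask_user", 1.0, "s1", ACTIONS)
greedy_action = epsilon_greedy(q, "s0", ACTIONS, epsilon=0.0)
```

Because the update only needs observed transitions and rewards, it does not require a model of T or R, which is why it suits a setting where both depend on unknown user and source agents.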
State Space of Interaction Sequences
S = S_U × S_R
Interaction with the user
• S_U: set of partial user queries
• Current partial query (attribute values): s_U = { (ea_1, val_1), ..., (ea_m, val_m) }
• Attribute state ea:
  • '?': val is unknown
  • 'A': val is assigned
  • 'F': val cannot be specified
Interaction with the source
• S_R: power set of all source objects
• Known objects matching the current query: s_R = {flight_1, ..., flight_r} or {unknown}
Complexity problem!
|S| = (2^n + 1) (2 + i)^m
• n: total number of source objects
• m: number of attributes
• i: average number of values per attribute
Idea: use a state abstraction for the MDP
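To get a feel for the complexity problem, the size formula can be evaluated directly. This is a small sketch; the function name `exact_state_count` and the sample values of n, m and i are hypothetical and are not taken from the experiments.

```python
def exact_state_count(n, m, i):
    """Size of the full state space: |S| = (2^n + 1) * (2 + i)^m.

    n: total number of source objects (2^n subsets, plus the 'unknown' case)
    m: number of attributes
    i: average number of values per attribute (plus the '?' and 'F' states)
    """
    return (2 ** n + 1) * (2 + i) ** m

# Hypothetical toy source: 10 flights, 3 attributes, 5 values per attribute.
small = exact_state_count(n=10, m=3, i=5)

# Doubling the number of source objects multiplies the count by roughly 2^10.
larger = exact_state_count(n=20, m=3, i=5)
```

The count grows exponentially in n and m, so even a modest catalogue makes a tabular representation of the full state space impractical.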
Abstract State Space (used for the MDP)
S = S_U × S_R
Interaction with the user
• S_U: set of user query formulation states
• Current partial query formulation state: s_U = { ea_1, ..., ea_m }
• Attribute state ea:
  • '?': val is unknown
  • 'A': val is assigned
  • 'F': val cannot be specified
Interaction with the source
• S_R = {?, 0, +, *}: quantity classes
• Response quantity for the current query: s_R = qr(|s_R|)
  • qr = ?: results unknown
  • qr = 0: |s_R| = 0
  • qr = +: 0 < |s_R| <= nr_max
  • qr = *: |s_R| > nr_max
|S| = 4 · 3^m, with m the number of attributes
A more tractable state space!
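The abstraction can be sketched as a function that maps a concrete interaction state to an abstract one. The names below (`quantity_class`, `abstract_state`, `CANNOT_SPECIFY`), the dictionary encoding of the partial query, and the choice to treat exactly nr_max results as class '+' are assumptions made for illustration.

```python
CANNOT_SPECIFY = object()  # hypothetical marker: the user cannot give this value

def quantity_class(result_count, nr_max):
    """Map the number of known results to one of the classes ?, 0, +, *."""
    if result_count is None:      # query not sent yet: results unknown
        return "?"
    if result_count == 0:
        return "0"
    return "+" if result_count <= nr_max else "*"

def attribute_state(value):
    """Map an attribute value to its formulation state ?, A or F."""
    if value is None:
        return "?"
    if value is CANNOT_SPECIFY:
        return "F"
    return "A"

def abstract_state(partial_query, result_count, nr_max):
    """Abstract state (s_U, s_R): attribute states plus a quantity class."""
    s_u = tuple(attribute_state(v) for v in partial_query.values())
    return s_u, quantity_class(result_count, nr_max)

def abstract_state_count(m):
    """|S| = 4 * 3^m: four quantity classes, three states per attribute."""
    return 4 * 3 ** m

query = {"departure": "Paris", "arrival": None, "flight_class": CANNOT_SPECIFY}
state = abstract_state(query, result_count=12, nr_max=5)
```

The abstract state forgets which values were assigned and which objects were returned, so its size depends only on the number of attributes m and no longer on n or i.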
Actions and Rewards
Actions of the mediator
• Ask the user a question about an attribute (valuation, proposition, confirmation)
• Send the current query to the information source
• Ask the user to select a response
Rewards obtained through interaction
• With the user
  • + R_selection: the user selects a proposition
  • - R_timeout: interaction too long (user disconnection / time limit)
• With the information source
  • + R_noresp: no results for a fully specified query
  • - R_overnum: too many results (response quantity s_R = *)
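A possible encoding of this action set and reward scheme is sketched below. Everything numeric here is hypothetical: the slides name the rewards but give no magnitudes, and the timeout is assumed to be a penalty. The sketch also assumes that each of the three question types can be asked about every attribute, which the slides do not state.

```python
QUESTION_TYPES = ("valuation", "proposition", "confirmation")

def mediator_actions(attributes):
    """Enumerate mediator actions: questions per attribute, then the
    query action and the selection request (assumed encoding)."""
    actions = [("ask", attr, qtype)
               for attr in attributes
               for qtype in QUESTION_TYPES]
    actions.append(("send_query",))
    actions.append(("ask_selection",))
    return actions

# Hypothetical magnitudes; only the event names come from the slides.
REWARDS = {
    "selection": +1.0,   # user selects a proposition
    "timeout":   -1.0,   # interaction too long (assumed to be a penalty)
    "noresp":    +0.5,   # no results for a fully specified query
    "overnum":   -0.5,   # too many results (quantity class '*')
}

def episode_return(events):
    """Undiscounted sum of rewards over the events of one mediation."""
    return sum(REWARDS.get(event, 0.0) for event in events)

actions = mediator_actions(["departure", "arrival", "flight_class"])
total = episode_return(["overnum", "selection"])
```

Under these assumptions, a mediation that first triggers an over-numerous response and then ends with a user selection still yields a positive return, but a smaller one than a mediation that reaches the selection directly.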
Experimentation: Flight booking
Training of the mediator on tasks with
• 3 attributes (cities of departure / arrival and flight class)
• 4 attributes (+ the time of day for take-off)
• 5 attributes (+ the airline)
[Table: complexity growth as a function of the number of attributes]
Learning results: Flight booking
[Figures: successful mediations; average interaction length]
• 3 / 4 attributes: 99% success, minimal mediation length reached
• 5 attributes: more time required to converge, and longer mediations
Conclusion
Mediation strategies in h-MAS
• Reinforcement learning of mediation strategies is possible
• Answers users' needs (the majority, but also particular ones, through profiles)
Software model
• Towards "user oriented" design (utility based on user satisfaction)
• Implementation of a Mediator prototype
Limits
• Limited richness of the learning due to the simulated answer generator
• The user is at most partially observable
• Performance degrades for more complex tasks
Current Work
Deal with partial observation
• Challenge: get rid of the ad hoc state space abstraction
• Key question: "What must be kept in / from the interaction history?"
• Study of memory based approaches:
  • HQ-Learning (Wiering and Schmidhuber 1997)
  • U-Trees (McCallum 1995)
  • ...
Deal with structured tasks
• Challenge: reduced state space complexity, better guidance ... and service composition?
• Main idea: exploit or discover the task structure (sub-tasks, dependencies, etc.)
• Hierarchical models are promising:
  • MAX-Q (Dietterich 2000) / HEX-Q (Hengst 2002)
  • HAM (Parr 1998) / PHAM (Andre and Russell 2000)
  • H-MDP and H-POMDP
  • ...
