Modeling the Process of Collaboration and Negotiation with Incomplete Information
Katia Sycara, Praveen Paruchuri, Nilanjan Chakraborty
Collaborators: Roie Zivan, Laurie Weingart, Geoff Gordon, Miro Dudik
[Figure: project overview. Research products: Validated Theories, Models, Modeling Tools, Briefing Materials, Scenarios, Training Simulations. Tasks and sites: Identify Cultural Factors (CUNY, Georgetown, CMU); Surveys & Interviews (CUNY, CMU, U Mich, Georgetown); Data Analysis (CUNY, Georgetown, U Pitt, CMU); Cross-Cultural Interactions (U Pitt, CMU); Theory Formation; Computational Models (CMU, USC); Virtual Humans (USC); Implementation (CMU). Arrows labeled "validation" connect the tasks. Legend: Common task, Subgroup task.]
MURI 14 Program Review-- September 10, 2009
Problem
• Computational model of reasoning in Cooperation and Negotiation (C&N)
• Capture the rich process of C&N
  • Not just outcome
  • Not just offer-counteroffer but additional communications
• Account for cultural, social factors
• Rewards of other agents not known
• Uncertain and dynamic environment
Contributions
• Created an initial model from real human data. The model:
  • Is applicable in a uniform way to both collaboration and negotiation
  • Derives sequences of actions for an agent from real transcripts, as opposed to state-of-the-art work where action selection is constructed heuristically
  • Adapts its beliefs during the course of the interaction
  • Learns elements of the negotiation (e.g. other party type) as the interaction proceeds
  • Produces optimal activity sequences considering also the other agents
  • Has only incomplete information about others
POMDP: Partially Observable Markov Decision Process
• Agent has initial beliefs
• Agent takes an action
• Gets an observation
• Interprets the observation
• Updates beliefs
• Decides on an action
• Repeats
Agent takes optimal action considering world/other agents
Elements: {States, Actions, Transitions, Rewards, Observations}
[Figure: the Agent sends an Action to the World (other agents) and receives an Observation back.]
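The "updates beliefs" step in the loop above is commonly implemented as a Bayes filter over the hidden state. The following is a minimal sketch of that standard update, not the authors' implementation: the function name `update_belief`, the dictionary layout of `T` and `O`, and the two-type example with its probabilities are all assumptions made for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Standard POMDP belief update (Bayes filter).

    belief: dict mapping state -> probability
    T[(s, a)]: dict mapping next_state -> P(next_state | s, a)
    O[(s_next, a)]: dict mapping observation -> P(observation | s_next, a)
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of landing in s_next after the action.
        predicted = sum(T[(s, action)].get(s_next, 0.0) * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next.
        likelihood = O[(s_next, action)].get(observation, 0.0)
        unnormalized[s_next] = likelihood * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: the hidden state is the other party's type, which does
# not change, so the transition model is the identity. All numbers are made up.
T = {
    ("coop", "offer"): {"coop": 1.0},
    ("noncoop", "offer"): {"noncoop": 1.0},
}
O = {
    ("coop", "offer"): {"concede": 0.7, "hold": 0.3},
    ("noncoop", "offer"): {"concede": 0.2, "hold": 0.8},
}
prior = {"coop": 0.5, "noncoop": 0.5}
posterior = update_belief(prior, "offer", "concede", T, O)
print(posterior)
```

With these made-up numbers, seeing a concession shifts the belief toward the cooperative type, since a concession is more likely from a cooperative counterpart under the assumed observation model.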
Why POMDP-based modeling?
• Decentralized algorithm
• Incorporated in an agent that interacts with others
• Can represent communication (arguments, offers, preferences, etc.)
• Many conversational turns
• Learns, e.g., the model of the other player
• Adaptive best response
• Computationally efficient for realistic interactions
• Extendable to more than two agents
Natural way to represent cultural and social factors in C and N
Output of POMDP
• The output is a policy matrix
• Policy: Optimal action to take, given current state (observations and other’s model)
• At run-time, agent consults the matrix and takes appropriate action
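Consulting a policy at run time amounts to a table lookup. The sketch below assumes a hypothetical state made of a discretized estimate of the other party's type plus the two current offers; the table entries, the names `POLICY`, `discretize_type`, and `choose_action`, and the chosen actions are invented for illustration and are not taken from the authors' policy.

```python
# Hypothetical policy table: (type_estimate, seller_offer, buyer_offer) -> action.
# type_estimate uses 0 for cooperative and 1 for non-cooperative.
# A full policy matrix would hold one entry for every state; only three
# made-up entries are listed here.
POLICY = {
    (0.0, 10, 0): "Concede 2",
    (0.5, 10, 0): "Concede 1",
    (1.0, 10, 0): "Concede 0",
}

TYPE_LEVELS = (0.0, 0.5, 1.0)


def discretize_type(p_noncooperative):
    """Snap a probability to the nearest level used as a key in the table."""
    return min(TYPE_LEVELS, key=lambda level: abs(level - p_noncooperative))


def choose_action(p_noncooperative, seller_offer, buyer_offer):
    """Look up the action stored for the current (discretized) state."""
    key = (discretize_type(p_noncooperative), seller_offer, buyer_offer)
    return POLICY[key]


print(choose_action(0.9, 10, 0))
```

The expensive part, computing the table, happens offline; the agent only performs the lookup during the interaction.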
Simplified Example
• Two agents negotiating
  • Seller S (POMDP Agent)
  • Buyer B (Other player)
• Single item negotiation
• Initially buyer at 0 price and seller at max = 10
Example: State Space
• State composed of 2 parts:
  • Seller type, Buyer type
  • Negotiation status: current offers
• Agent types: cooperative or non-cooperative
• Negotiation modeled from Seller’s perspective
• Initially high uncertainty of Buyer type
• Seller’s belief about Buyer, and state of negotiation are dynamic
Example: POMDP State
• Agent Type: cooperative vs non-cooperative
  • 0 cooperative, 1 non-cooperative
  • Discretized to {0, 0.5, 1}
• Price discretized to the set {0, 1, ..., 9, 10}
• Sample state:
• State space = Number of Buyer types * Negotiation states = 363
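The count of 363 is consistent with 3 Buyer type levels and 11 × 11 = 121 negotiation states. The sketch below enumerates the states under the assumption, not spelled out on the slide, that a negotiation state is the pair (seller offer, buyer offer) with each offer drawn from the 11 discretized prices; the names `BUYER_TYPES`, `PRICES`, and `STATE_INDEX` are illustrative.

```python
from itertools import product

BUYER_TYPES = (0.0, 0.5, 1.0)  # 0 cooperative, 1 non-cooperative
PRICES = range(11)             # discretized prices 0, 1, ..., 10

# Assumption: a negotiation state is (seller_offer, buyer_offer).
states = list(product(BUYER_TYPES, PRICES, PRICES))

# Map each state to a row index, e.g. for addressing a policy or transition matrix.
STATE_INDEX = {state: i for i, state in enumerate(states)}

print(len(states))  # 3 * 11 * 11 = 363
```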
Example: Action & Transition
• Action set: {Concede 2, Concede 1, Concede 0, Accept, Reject}
• Transition: Probability of ending in some state if agent takes a particular action in current state
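One simple way to hold such a transition function is a table keyed by (state, action) whose rows are probability distributions over next states. The sketch below is illustrative only: the states are assumed to be (seller offer, buyer offer) pairs, and every probability is made up rather than read from the slides.

```python
import random

# Hypothetical rows for a seller at $6 facing a buyer at $4.
# Keys are (state, action); values map next states to probabilities.
TRANSITIONS = {
    ((6, 4), "Concede 1"): {(5, 5): 0.6, (5, 4): 0.4},
    ((6, 4), "Concede 0"): {(6, 5): 0.3, (6, 4): 0.7},
}


def check_rows(transitions, tol=1e-9):
    """Raise if any row is not a probability distribution."""
    for key, row in transitions.items():
        if abs(sum(row.values()) - 1.0) > tol:
            raise ValueError(f"row {key} does not sum to 1")


def sample_next_state(state, action, rng):
    """Draw a next state according to the row for (state, action)."""
    row = TRANSITIONS[(state, action)]
    next_states = list(row)
    weights = [row[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


check_rows(TRANSITIONS)
rng = random.Random(0)  # seeded so repeated runs draw the same sample
print(sample_next_state((6, 4), "Concede 1", rng))
```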
[Figure: example transition diagram over the offer states ($4, $6), ($5, $5), and ($6, $4). Edges are labeled with actions (Concede 0, Concede 1, Concede 2, Agree) and their transition probabilities.]
Building Initial Simplified POMDP
• Human negotiation transcripts
  • 2 players (Grocer and Florist) with 4 issues
  • Mapped dialogues to 14 base codes (actions)
  • Other player’s type known for each transcript
  • Used for training and validation of the model
• Transition: Frequency of reaching some state, given a code
• Observation: Frequency of observing a code given some negotiation state
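The two frequency definitions above can be computed by counting over coded transcripts. The sketch below assumes each transcript has already been converted into (state, code, next_state) triples; that preprocessing, the state labels ("open", "offer_made", "agreed"), and the toy data are hypothetical, and the codes are used only as opaque labels.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate_transitions(episodes):
    """Frequency of reaching next_state, given the current state and a code."""
    counts = defaultdict(Counter)
    for episode in episodes:
        for state, code, next_state in episode:
            counts[(state, code)][next_state] += 1
    return {key: normalize(c) for key, c in counts.items()}


def estimate_observations(episodes):
    """Frequency of observing a code, given the negotiation state."""
    counts = defaultdict(Counter)
    for episode in episodes:
        for state, code, _next_state in episode:
            counts[state][code] += 1
    return {state: normalize(c) for state, c in counts.items()}


# Toy data: two made-up coded transcripts.
episodes = [
    [("open", "OS", "offer_made"), ("offer_made", "SBF", "offer_made"),
     ("offer_made", "RPO", "agreed")],
    [("open", "Q", "open"), ("open", "OS", "offer_made"),
     ("offer_made", "SBF", "agreed")],
]
T = estimate_transitions(episodes)
O = estimate_observations(episodes)
print(T[("offer_made", "SBF")])
print(O["open"])
```

With only a handful of transcripts, many (state, code) pairs will never be seen, so a practical estimator would likely need some form of smoothing; that step is left out of this sketch.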
POMDP construction
[Figure: a Model Generator starts from an (empty) model and learns from the Grocer-Florist transcript, given as <Player, Action code> pairs. Reasoning over the generated model yields a prescription of optimal actions given the state of the interaction.]
Codes used
Courtesy of Laurie Weingart
Sample Grocer-Florist Transcript
Speaker  Code  Unit
Florist  PC    So let’s start with temperature
Grocer   RPS   Okay
Florist  OS    So I would suggest a temperature of 64 degrees
Grocer   RPS   Okay
Florist  Q     How does that work for you?
Grocer   IP    Well personally for the grocery I think it is better to have a higher temperature
Grocer   SBF   Just because I want the customers to feel comfortable
Grocer   SBF   And if it is too cold that might turn the customers away a little bit
Florist  RPS   Okay
Grocer   SBF   And also if it is warm, people are more apt to buy cold drinks to keep themselves comfortable and cool
Florist  RPS   That's true.
Grocer   OS    I think 66 would be good.
Grocer   SBF   That way it is not too cold and it is not too hot as well.
Grocer   SBF   And it's good for the customers.
Florist  RPO   Okay, yeah
Assumed Florist is Cooperative
Grocer POMDP generated
[Figure: path through the generated Grocer POMDP. Labels, in slide order: Discuss preferences and support their positions; Florist 64F; Agrees without committing; Florist; Florist; Proposes 66F; Doesn’t commit; Agrees to 66F; Grocer substantiates his offer. Reward: 60 points for both Grocer and Florist.]
Negotiation Game
[Figure: interaction loop between the Human (Florist) and the Agent (Grocer). The Florist Action goes to the agent, which answers with a Grocer Action chosen by the optimal POMDP policy.]
• Sequential
• Process oriented
• Blends computational and social science results
Initial results: Classification of Florist
• 10 transcripts for training: 4 cooperative, 6 non-cooperative
• 5 for testing: average of correctly classified
[Figure: uncertainty of belief. X axis: number of communications. Y axis: uncertainty of the Grocer's belief about the Florist.]
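The slide does not say how the uncertainty of belief is measured. One common choice is the Shannon entropy of the belief distribution, and the sketch below uses it purely as an assumed stand-in; the function name `belief_entropy` and the example beliefs are illustrative.

```python
import math


def belief_entropy(belief):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return sum(-p * math.log2(p) for p in belief.values() if p > 0)


# Maximal uncertainty over two types, then a belief that has mostly settled.
undecided = {"cooperative": 0.5, "non-cooperative": 0.5}
leaning = {"cooperative": 0.9, "non-cooperative": 0.1}
print(belief_entropy(undecided))
print(belief_entropy(leaning))
```

Under this measure, a curve that falls as the number of communications grows would indicate that the Grocer's belief about the Florist's type is becoming more concentrated.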
Modeling Cultural Factors
• How do we model cultural factors for C and N in a POMDP?
• How do we validate the model?
• Is the model general enough to exhibit plausible culturally-specific human behavior?
Culture and POMDP
• Initial beliefs about others’ social value orientation and behavior usually reflect one's own cultural beliefs about the interaction
• Culture influences frequency of particular actions and communications
• Interpretation of each observation refines the agent’s model of others
  • Interpretation is influenced by culture
• Model can capture cultural misinterpretations and their consequences in terms of strategy and outcomes
• Agents from different cultures can have different rewards for the same actions
Other’s type
• Includes factors such as:
  • Social Value Orientation
    • Pro-social/cooperative, individualistic, competitive, altruistic
  • Trust, reputation, etc.
  • Cultural factors
    • Individualist vs Collectivist
    • Egalitarian vs Hierarchical
    • Direct vs Indirect communication
[Figure: Cognitive Schema of A mapped to the POMDP. A’s schema: A’s culture, A’s history with B, context, A’s real intent, A’s behavior, A’s interpretation of B’s intent, reward. B’s schema: B’s culture, B’s history with A, context, B’s real intent, B’s behavior, B’s interpretation of A’s intent, reward. POMDP elements: State Space, Initial Beliefs, Actions, Transition, Observations, Reward.]
Capturing initial state of model
[Figure: diagram relating A’s schema and B’s schema (culture, history with the other party, context, real intent, behavior, interpretation of the other’s intent, reward) to the POMDP elements (State Space, Initial Beliefs, Actions, Transition, Observations, Reward), annotated with "Survey experiments" and "Observer Experiments".]
Capturing model dynamics
[Figure: diagram relating A’s schema and B’s schema (culture, history with the other party, context, real intent, behavior, interpretation of the other’s intent, reward) to the POMDP elements (State Space, Initial Beliefs, Actions, Transition, Observations, Reward), annotated with "Intercultural transcripts".]
Plans for Next Year
• Initial beliefs from Observer Experiment and from surveys (US, Turkey, Egypt, Qatar)
• Collect intra-cultural negotiation transcripts
  • US, Turkey, Egypt
• Build POMDPs from intra-cultural negotiation transcripts
  • US, Turkey, Egypt
• Build POMDPs from inter-cultural negotiation transcripts
  • US-Hong Kong, US-German, US-Israeli (have) (courtesy of Wendi Adair and Jeanne Brett)
  • US-Turkish, US-Egyptian, US-Qatari (collect)
Plans for Next Year
• Validate the predictive behavior of the models
  • Using the transcripts for training and testing
• Use the models in negotiation with humans
• Use the models in what-if scenarios
• Use the models to generate hypotheses to test with human subjects
• Initial models for collaboration scenarios using POMDP
Thank You
Any questions?