Enhanced Social Learning via Trust and Reputation Mechanisms in Multi-agent Systems

PhD Completion Seminar • Golriz Rezaei • Supervisors: Dr. Michael Kirley, Dr. Shanika Karunasekera • Dept. of Computer Science and Software Engineering, The University of Melbourne, Australia • 20 April 2011

Outline • Overview: Motivation; Enhanced Social Learning; Research Goals / Questions / Contributions / Publications • Background: Trust and Reputation in Multi-agent Systems; Trust and Reputation in Evolutionary Game Theory; Evolutionary Games on Graphs • The Research Work: First Model; Second Model; Third Model • Concluding Discussion • Acknowledgment and Questions

Motivation • Multi-agent Systems (MAS)? • Interacting autonomous agents • Different geographical locations • Varying cognitive / processing abilities • Limited information / partial knowledge • Perform tasks → Receive utility • Difficult tasks → Beyond individual agent capacity • Maximise utility → Interact (collaboration / resource sharing) • Problem? • Appropriate partners → Successful performance → Maximise utility • Open dynamic MAS → Uncertainty + Partial knowledge • Establishing strategic connections is difficult!

Enhanced Social Learning • Social Learning (biological background)? • Learning through observation of / interaction with others • Knowledge transmission without genetic material • Acquire knowledge from others without incurring the cost / time • Major mechanism → Imitation (perceive and reproduce behaviour) • Why good? • Keep track of beneficial interaction partners • Save time / energy / cost • Improve long-term performance (individual / system) • Problem? → Error-prone / outdated / inappropriate information

Enhanced Social Learning cont. • Solution? → Be selective • When → High individual trial-and-error cost; intermediate environment change rate • How → Mixed with personal innovation • From whom → Agents are heterogeneous • Appropriate role models → Important for performance • Partner selection

Enhanced Social Learning cont. • Top-down • Plan at design time • Ability of the designer → predict optimal connections in advance • Fixed structure of relations (random / particular topology) • Autonomy condition + Environmental condition → not realistic • Automatic learning → Relations built and sustained adaptively at run time • Trust & Reputation → Formal definition? (Survey in Ch2) • Evaluate before interaction → Partner selection / Decision making • Relations evolve → Partner’s reliability / trustworthiness • Evolutionary game theory • Concrete MAS application

Coevolutionary Endogenous Social Networks • Dynamic relation formation • Social ties • Agents’ strategies • Topology • Behaviour

Proposed framework • Enhanced Social Learning • Trust & Reputation: Life-experiences; Endogenous Evolving Social Networks • Social Learning • Evaluation? 1) Social Dilemma → Evolutionary Games 2) Advice-seeking in Distributed Service Provision Applications

Research goals and questions • Central hypothesis: “Does incorporating concepts of trust and reputation within a social learning framework help to enhance the agents’ interactions in a MAS? And consequently does it help to improve their long term performance?” • (Life-experiences / Aging) + (Coevolutionary endogenous social networks) → Trust / Reputation? → Effective social learning approaches? • Encourage cooperation in social dilemmas? • Broader perspective of general MAS applications (Advice-Seeking for Resource Discovery in Distributed Service Provision) • Impacts of agents’ heterogeneity (behaviour / attributes / preferences) • Structural characteristics of the underlying evolved relationship networks? • Interaction patterns → System's behaviour?

Publications • First Model: Life Experiences in Spatial 2-player Prisoners’ Dilemma Game • G. Rezaei and M. Kirley (2008). Heterogeneous payoffs and social diversity in the spatial prisoner's dilemma game. In X. Li, M. Kirley, and M. Zhang, editors, Proceedings of 7th International Conference on Simulated Evolution and Learning (SEAL), volume 5361 of Lecture Notes in Computer Science, pages 585-594, Springer. • G. Rezaei and M. Kirley (2009). The effects of time varying rewards on the evolution of cooperation. Evolutionary Intelligence, 2(4):207-218.

Publications cont. • Second Model: N-player Prisoners' Dilemma Game on an Evolving Social Network • G. Rezaei, M. Kirley and J. Pfau (2009). Evolving cooperation in the N-player prisoner's dilemma: A social network model. In K. B. Korb, M. Randall, and T. Hendtlass, editors, Artificial Life: Borrowing from Biology (ACAL), volume 5865 of Lecture Notes in Computer Science, pages 32-42, Springer Verlag, Berlin. • An extended version is under preparation (2011). • Third Model: Distributed Advice-Seeking on an Evolving Social Network • G. Rezaei, J. Pfau and M. Kirley (2010). Distributed Advice-Seeking on an Evolving Social Network. In 2010 IEEE/WIC/ACM International Conference on Intelligent Agent Technology.

Background: Trust and Reputation in MAS • Trust [Gambetta 1988]: the subjective probability by which an individual A expects that another individual B performs a given action on which A's welfare depends. • Reputation: information about an agent’s behavioural history. • [Ismail et al. 2007] • Challenging • Confusing • Inconsistent • Typology → Survey in Ch2

Background cont. Typology • Suitable mechanisms: 1) Variety of sources of information 2) Individual / distributed evaluation 3) Robust against possible lying / fraud

Background cont. Evolutionary Games • Game Theory (GT)? • Evolutionary GT? • Simple games + rich dynamics, appropriate mathematical tools → Study complex strategic interactive scenarios • Social Dilemmas? “Cooperation” → “Tragedy of the commons” [Hardin 1968] • Autonomous individuals: act cooperatively → contribute to the social welfare; behave selfishly (not investing anything) → enjoy the free benefits shared among all the members (free-riding) • Theory → individuals behave selfishly • Nature → cooperation exists • Still an open-ended question! • Abstract framework → many real-life scenarios • Biology, Economics, Sociology (IEEE Trans, Statistical Physics, Nature, CEC, GECCO …) • Distributed systems (P2P) (DAI) • Crucial for performance of MAS (AAMAS) • Mechanisms?

Background cont. Prisoners’ Dilemma • Why? • The most difficult setting for cooperation • Robust and fundamental method of modelling • Simplicity of statement and design for MAS • (2-PD) • 2 players / agents • 2 choices (C or D) • Payoff depends on joint actions • Actual values matter less than their order • Order change → game change • (D,D) → Nash Equilibrium • Conditions: i) T > R > P > S ii) 2R >= (T + S)
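
The two conditions on this slide can be checked mechanically. The following is a minimal sketch, assuming the "Big" payoff values (T=5, R=3, P=1, S=0) used later in the talk as defaults; the function names are hypothetical and not taken from the thesis.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the two 2-PD conditions stated on the slide."""
    ordering_ok = T > R > P > S          # condition (i)
    mutual_coop_ok = 2 * R >= T + S      # condition (ii), as written on the slide
    return ordering_ok and mutual_coop_ok


def payoff(my_move, other_move, T=5, R=3, P=1, S=0):
    """Payoff to the focal player; moves are 'C' (cooperate) or 'D' (defect)."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(my_move, other_move)]


def best_response(other_move, **payoffs):
    """Move that maximises the focal player's payoff against a fixed opponent move."""
    return max("CD", key=lambda move: payoff(move, other_move, **payoffs))
```

With these defaults, defection is the best response to either move, which is why (D,D) is the Nash equilibrium even though (C,C) pays both players more.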

Trust and Reputation in Evolutionary Games • 5 fundamental mechanisms → Evolution of “Cooperation” [Nowak 2006] • Kin selection vs. Group selection • Direct Reciprocity: iterated encounters; return of altruistic act / punishment; “You scratch my back, I’ll scratch yours!” • Indirect Reciprocity: unlikely repeated interactions; return from third parties; image / reputation score; “You scratch his back, I'll scratch yours!” • Network Reciprocity: social / spatial constraints → non-uniform / local neighbourhood interactions; clustering effect (community structure) → enhances cooperation • Compare → Trust & Reputation

Background cont. Basics of Networks • Network graph G(N, E) • N: finite set of nodes (vertices) • E: finite set of edges (links) • G is represented by an |N|×|N| adjacency matrix • aij = 1 if there is an edge between nodes i and j • aij = 0 otherwise • Figures: a graph with 8 vertices and 10 edges; a network of computers
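
As a small illustration of the adjacency-matrix representation, the sketch below builds the matrix for an undirected graph from an edge list. The 4-node example graph is made up for illustration; it is not the 8-vertex graph from the slide figure.

```python
def adjacency_matrix(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        a[i][j] = 1  # edge between node i and node j
        a[j][i] = 1  # undirected, so the matrix is symmetric
    return a


if __name__ == "__main__":
    # Hypothetical example: a triangle 0-1-2 with node 3 attached to node 2.
    for row in adjacency_matrix(4, [(0, 1), (1, 2), (2, 0), (2, 3)]):
        print(row)
```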

Background cont. Topological properties • Degree ki of a node i: number of links attached to it • Path length L: average separation between any two nodes • Clustering coefficient Ci of a node i: probability that two nearest neighbours of the node are also nearest neighbours of each other
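
The three properties can be computed directly from a neighbour-set representation. This is a minimal sketch, assuming an undirected, connected graph stored as a dict mapping each node to the set of its neighbours; the function names are illustrative.

```python
from collections import deque
from itertools import combinations


def degree(adj, i):
    """k_i: number of neighbours of node i."""
    return len(adj[i])


def clustering(adj, i):
    """C_i: fraction of pairs of neighbours of i that are themselves linked."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * linked / (k * (k - 1))


def average_path_length(adj):
    """L: mean shortest-path distance over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

For example, for a triangle 0-1-2 with an extra node 3 attached to node 2, node 0 has clustering 1.0, node 2 has clustering 1/3, and the average path length is 4/3.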

Background cont. Types of Networks • Random → links placed with uniform probability p • Mathematical objects → for comparison only (not a good model of real social networks) • Regular → all nodes have the same degree (2-D square grid (lattice), 1-D circular) • Not a good model of real networks • Small-World → transition between regular lattice and random graph (0 ≤ p ≤ 1) • One end of each link → rewired with small probability p • Highly clustered + Short path length • Scale-Free • Grow → preferential attachment • Power-law degree distribution • Most nodes have very few links; a small number of nodes are highly connected
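
The small-world construction (start from a regular ring and rewire one end of each link with probability p) can be sketched as follows. This is a simplified illustration in the spirit of the Watts-Strogatz procedure, not the code used in the thesis; `ring_lattice` and `rewire` are hypothetical names.

```python
import random


def ring_lattice(n, k):
    """1-D circular lattice: each node is linked to its k nearest neighbours on each side."""
    return {(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)}


def rewire(n, edges, p, rng):
    """With probability p, move the far end of each link to a random new node."""
    current = set(edges)
    for i, j in sorted(edges):
        if rng.random() < p:
            # Avoid self-loops and duplicate links in either orientation.
            candidates = [m for m in range(n)
                          if m != i and (i, m) not in current and (m, i) not in current]
            if candidates:
                current.remove((i, j))
                current.add((i, rng.choice(candidates)))
    return current


if __name__ == "__main__":
    rng = random.Random(42)
    lattice = ring_lattice(20, 2)
    small_world = rewire(20, lattice, 0.1, rng)
    print(len(lattice), len(small_world))  # the number of links is preserved
```

With p = 0 the lattice is returned unchanged (regular network), and as p approaches 1 the result approaches a random graph with the same number of links.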

Background cont. Evolutionary Games on Graphs • Local neighbourhood interaction • Population structure → system dynamics • Clusters of cooperators → Enhance cooperation • Developmental stages: scaffolding interaction → different types of network topology; parameters (magnitude of rewards / punishments, population size, initial condition, update rules); mathematical analysis difficult → computational simulations • Diagram: socio-biological (uniform interactions); non-uniform interactions on static networks (2-D grids); non-uniform interactions on dynamic networks (realistic social networks)

First Model: Life Experiences in Spatial 2-PD Game • Only decision making • No partner selection • Cooperative behaviour • Framework: Enhanced Social Learning; Trust & Reputation (Life-experiences & Age, Fixed Network (grid)); Social Learning • Local neighbourhood interaction → Moore neighbourhood • Accumulate received payoffs → Fitness • End of each round → Imitate the most successful neighbour (MSN) • Clusters of cooperators → outweigh losses against defectors
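
One round of the baseline spatial game (play all Moore neighbours, accumulate payoffs as fitness, then imitate the most successful neighbour) might look like the sketch below. Several details are assumptions made for illustration: the grid wraps around at the edges, updates are synchronous, an agent keeps its own strategy unless a neighbour is strictly fitter, and the fixed "Big" payoff values are used.

```python
# Payoff to the row player for (own move, neighbour's move): R, S, T, P.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def moore_neighbours(r, c, rows, cols):
    """The 8 surrounding cells; wrap-around edges are an assumption."""
    return [((r + dr) % rows, (c + dc) % cols)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def play_round(grid):
    """Each agent plays the 2-PD with every Moore neighbour; returns the fitness grid."""
    rows, cols = len(grid), len(grid[0])
    fitness = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for nr, nc in moore_neighbours(r, c, rows, cols):
                fitness[r][c] += PAYOFF[(grid[r][c], grid[nr][nc])]
    return fitness


def imitate_msn(grid, fitness):
    """Each agent copies the strategy of the fittest cell among itself and its neighbours."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            candidates = [(r, c)] + moore_neighbours(r, c, rows, cols)
            br, bc = max(candidates, key=lambda rc: fitness[rc[0]][rc[1]])
            new_grid[r][c] = grid[br][bc]
    return new_grid
```

On a 3×3 wrapped grid of cooperators with a single defector in the middle, the defector earns 8·T = 40 against 21 for each cooperator, so every agent switches to defection after one imitation step. Larger grids are needed before clusters of cooperators can protect themselves.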

First Model cont. The challenge • Typically → “Universal fixed payoff matrix” • Hypothesis → Introducing “social diversity” alters the trajectory of the population • Contributions: • Adaptive rewards → (Individual agent strategies + Life-experiences), given a limited agent life span • MSN (Highest accumulated normalized utility + Older) → Role model trustworthiness! • Each agent: • Age: αi(t+1) = αi(t) + 1 • Life-span λi drawn randomly from a uniform distribution [min, max] • (αi(t) = λi → the agent dies and is replaced by a new random agent) • Personal version of the payoff matrix → updated at each time step based on experience level (update rule)
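
The ageing and replacement rule, together with an age-aware choice of role model, can be sketched as below. This is an illustration only: the `Agent` fields, the use of `>=` instead of strict equality for the death test, and the tie-break "equal fitness, prefer the older agent" are assumptions, and the fitness value is taken to be already normalized.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    strategy: str        # 'C' or 'D'
    age: int             # alpha_i(t)
    life_span: int       # lambda_i
    fitness: float = 0.0


def new_agent(rng, min_life, max_life):
    """A fresh random agent with life-span drawn uniformly from [min_life, max_life]."""
    return Agent(rng.choice("CD"), 0, rng.randint(min_life, max_life))


def age_step(agent, rng, min_life, max_life):
    """alpha_i(t+1) = alpha_i(t) + 1; an agent that reaches its life-span is replaced."""
    agent.age += 1
    if agent.age >= agent.life_span:   # '>=' rather than '==' is a defensive assumption
        return new_agent(rng, min_life, max_life)
    return agent


def pick_role_model(neighbours):
    """Most successful neighbour: highest fitness, older agent preferred on ties."""
    return max(neighbours, key=lambda a: (a.fitness, a.age))
```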

First Model cont. Adaptive rewards • Update rule → (equation not preserved in this transcript) • Terms of the update rule: • the payoff values for agent i at time t • the default payoff matrix values T, R, P, S • the magnitude of the rescaled values • αi(t), the age of agent i at time t • λi, the expected lifetime of agent i • K, a limiting factor that characterises the uncertainty related to the environment

First Model cont. Scenarios • Standard PD (S) → Universal fixed payoffs + Age • Homogeneous model (HOM) → Universal fixed payoffs + Age • Heterogeneous model (Het 1, Het 2, Het 3) → Individual adaptive payoffs + Age (3 versions: update 4 elements / update 1 element / update 1 element capped) • What is the equilibrium state? • Coevolution • Altruistic behaviour + Non-stationary dynamic rewards

First Model cont. Experimental setup • 2-D grid (32×32) → Implemented in NetLogo 4.0 [Wilensky 2002] • Population initialization → (20% C – 80% D) / (50% C – 50% D) • Payoff (small: T=1, R=1, P=0, S=0) / (Big: T=5, R=3, P=1, S=0) • Life-span distributions (λi) → [0,50] / [0,100] / [50,100] • Environmental constraint K → [0.1 : 0.025 : 0.2] • Each trial → 10000 iterations; all configurations → repeated 30 times • Statistical results are reported

First Model cont. Sensitivity to the base payoff values • Payoff: (small: T=1, R=1, P=0, S=0) / (Big: T=5, R=3, P=1, S=0) • Figure panels: Standard (S), Homogeneous (HOM)

First Model cont. Heterogeneous vs. Homogeneous • Payoff: (Big: T=5, R=3, P=1, S=0) • Population initialization: (20% C – 80% D) / (50% C – 50% D)

First Model cont. Snapshots • Payoff: (Big: T=5, R=3, P=1, S=0) • Population initialization: (20% C – 80% D) • Figure panels: (HOM), (Het 1) varying-size clusters of cooperators (black), (Het 2), (Het 3) • Other extra results for different parameters: K, life-span, replacement …

Second Model: N-PD on an Evolving Social Network • Decision making • Partner selection • Coevolution (Interaction network + Individuals’ strategy) • Framework: Enhanced Social Learning; Trust & Reputation (Endogenous Evolving Social Networks); Social Learning • 2-PD → N-PD • Cooperative behaviour in larger groups → More difficult! (N > 2) • Real-world social communities • Fixed underlying network → Relaxed • Relations evolve over time • Link weights → Trust & Reputation

Second Model cont. N-player Prisoners’ Dilemma • Natural extension of 2-PD • Utility function → [Boyd and Richerson 1988] • Conditions: • defection is preferred for individuals • contribution to social welfare is beneficial for the group • Conventional EG → (D, D, …, all D)
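
The utility formula itself did not survive in this transcript. The sketch below therefore assumes a commonly used form of the N-player payoff in the style of Boyd and Richerson: every player receives an equal share b·i/N of the benefit produced by the i cooperators in the group, and each cooperator additionally pays a cost c. The defaults b = 5 and c = 3 are the values listed in the experimental setup; the function names are hypothetical.

```python
def npd_payoff(cooperates, n_cooperators, n_players, b=5.0, c=3.0):
    """Assumed N-PD utility: shared benefit b*i/N, minus cost c for cooperators."""
    share = b * n_cooperators / n_players
    return share - c if cooperates else share


def is_social_dilemma(n_players, b=5.0, c=3.0):
    """Both slide conditions under the assumed payoff.

    b / N < c : one's own contribution returns less than it costs (defection preferred).
    c < b     : a group of cooperators does better than a group of defectors.
    """
    return b / n_players < c < b
```

Under this assumed form, a lone defector in an otherwise cooperative group of four earns 3.75 while each member of a fully cooperative group earns 2.0, yet a group of all defectors earns 0 each.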

Second Model cont. Evolving Relations • Agents play cooperatively → form social links (reinforced) • One agent defects → breaks his links with the opponents • Slow positive / fast negative
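
The "slow positive / fast negative" rule can be sketched as follows. The step size, the cap at 1.0, and the choice to reinforce every pair of cooperators within a game are assumptions for illustration; the slides only state that cooperation reinforces links and that a defector loses his links with the opponents.

```python
from itertools import combinations


def adapt_links(weights, moves, step=0.1):
    """Update link weights after one game.

    weights: dict mapping frozenset({a, b}) -> link weight in (0, 1]
    moves:   dict mapping agent id -> 'C' or 'D' for the players of this game
    """
    for a, b in combinations(sorted(moves), 2):
        pair = frozenset((a, b))
        if moves[a] == "C" and moves[b] == "C":
            # Slow positive: trust grows by a small step per cooperative encounter.
            weights[pair] = min(1.0, weights.get(pair, 0.0) + step)
        else:
            # Fast negative: a single defection removes the link entirely.
            weights.pop(pair, None)
    return weights
```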

Second Model cont. Contribution - Hypothesis • Incorporating “social network” into N-player PD → Network evolves by cooperative behaviour • Introducing “cognitive” agents → Decision making based on some function of the opponents • Encourage high levels of cooperation • Persist for longer • Analyse the state of the underlying network

Second Model cont. Schematic Algorithm
Algorithm: Social network based N-PD model
Require: Population of agents P, iterations imax, players N ≥ 2
for i = 0 to imax do
    G = ∅
    while g = NextGame(P, G, N) do      (partner selection)
        G = G ∪ {g}
        PlayGame(g)                     (decision making)
        AdaptLinks(g)
    end while
    a, b = RandomSample(P)
    CompareUtilityAndSelect(a, b)
end for
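
A runnable reading of this loop, under simplifying assumptions, is sketched below. Here NextGame is plain random matching, PlayGame uses pure strategies with an assumed shared-benefit payoff (b = 5, c = 3), AdaptLinks reinforces cooperator pairs and drops links touched by defection, and CompareUtilityAndSelect lets the agent with lower accumulated utility copy the other's strategy. All function bodies are illustrative stand-ins, not the thesis implementation.

```python
import random
from itertools import combinations

B, C = 5.0, 3.0  # benefit and cost; assumed payoff form, values from the setup slide


def play_game(pop, game):
    """Pure-strategy N-PD: shared benefit for all players, cost for cooperators."""
    n_coop = sum(pop[a]["strategy"] == "C" for a in game)
    share = B * n_coop / len(game)
    for a in game:
        pop[a]["utility"] += share - C if pop[a]["strategy"] == "C" else share


def adapt_links(links, pop, game, step=0.1):
    """Reinforce links between cooperators; remove links involving a defector."""
    for a, b in combinations(game, 2):
        pair = frozenset((a, b))
        if pop[a]["strategy"] == "C" and pop[b]["strategy"] == "C":
            links[pair] = min(1.0, links.get(pair, 0.0) + step)
        else:
            links.pop(pair, None)


def compare_utility_and_select(pop, a, b):
    """The agent with lower accumulated utility imitates the other (assumed rule)."""
    if pop[a]["utility"] < pop[b]["utility"]:
        pop[a]["strategy"] = pop[b]["strategy"]
    elif pop[b]["utility"] < pop[a]["utility"]:
        pop[b]["strategy"] = pop[a]["strategy"]


def run(pop, n_players, iterations, rng):
    """pop: dict agent id -> {'strategy': 'C' or 'D', 'utility': float}."""
    links = {}
    for _ in range(iterations):
        remaining = list(pop)
        rng.shuffle(remaining)
        while len(remaining) >= n_players:            # NextGame: random matching
            game = [remaining.pop() for _ in range(n_players)]
            play_game(pop, game)
            adapt_links(links, pop, game)
        a, b = rng.sample(list(pop), 2)
        compare_utility_and_select(pop, a, b)
    return links
```

Swapping the random matching in `run` for a contact-based group formation is what distinguishes the social-network scenarios from the random-matching ones.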

Second Model cont. Game Formation (Partner selection) • First agent → Randomly from remaining population • (N-1) partners → Two scenarios: 1) Randomly from remaining population 2) From the first agent's remaining social contacts, probabilistically
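
The social-network scenario of game formation might be sketched as below. It assumes that, for each open slot, the first agent picks one of its remaining contacts with probability epsilon (weighted by link weight) and otherwise picks a random remaining agent; this reading of the game formation probability, and all names, are assumptions rather than the documented procedure.

```python
import random


def form_game(remaining, contacts, n_players, epsilon, rng):
    """Return a list of n_players agent ids, or None if too few agents remain.

    remaining: agent ids not yet assigned to a game in this iteration
    contacts:  dict agent id -> dict of contact id -> link weight
    """
    if len(remaining) < n_players:
        return None
    pool = list(remaining)
    first = rng.choice(pool)            # first agent: random from remaining population
    pool.remove(first)
    group = [first]
    while len(group) < n_players:
        friends = [a for a in pool if contacts.get(first, {}).get(a, 0.0) > 0.0]
        if friends and rng.random() < epsilon:
            weights = [contacts[first][a] for a in friends]
            pick = rng.choices(friends, weights=weights, k=1)[0]
        else:
            pick = rng.choice(pool)     # fall back to random matching
        pool.remove(pick)
        group.append(pick)
    return group
```

Setting `contacts` to an empty dict reproduces the random-matching scenario.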

Second Model cont. Game Execution (Decision making) • Two scenarios (cognitive abilities) • Pure strategy (always cooperate / defect) • Mixed strategy (play probabilistically) • Discriminators → decision is a function of the average link weight (parameters: gradient, generosity) • Agents receive the corresponding payoff based on outcomes (Boyd and Richerson function)

Second Model cont. Snapshots • Figure: |P| = 25, N = 3; node types: Defector, Cooperator, Discriminator • Agents self-organize social ties based on their self-interest • Strategy update → cultural evolution

Second Model cont. Scenarios • Step 1: Partner selection (Random matching) + Decision making (Pure strategy) • Step 2: Partner selection (Social Network game formation) + Decision making (Pure strategy) • Step 3: Partner selection (Random matching) + Decision making (Pure strategy + Discriminators) • Step 4: Partner selection (Social Network game formation) + Decision making (Pure strategy + Discriminators)

Second Model cont. Experimental Setup • Population size = 1000 • Group sizes = (2, 4, 5, 10, 15, 20) • ε = 0.9 (game formation probability) • b = 5 and c = 3 (payoff values: benefit & cost) • Pure strategy scenario (50% pure C – 50% pure D) • Mixed strategy scenario (33.3% each) • α = 1.5 and β = 0.1 (decision function) • Results averaged over 20 independent trials of up to 40000 iterations • What is the equilibrium state and network topology?

Second Model cont. Group size vs. Strategy • Figure panels: Step 1, Step 2, Step 3, Step 4

Second Model cont. Emergent Social Networks • Figure: Clustering Coefficient; panels: Step 2, Step 3, Step 4

Second Model cont. Final Degree Distribution • Figure panels: Step 4 (N=2), Step 4 (N=5) • Cooperation higher → degree distribution higher • Size & shape → depend on N

Third Model: Distributed Advice-Seeking on an Evolving Social Network • Decision making • Partner selection • Coevolution (Interaction network + System’s behaviour) • Framework: Enhanced Social Learning; Trust & Reputation (Life-experiences, Endogenous Evolving Social Networks); Social Learning • Games → Advice-Seeking in Distributed Service Provision • Relations evolve over time (Link weights → Trust & Reputation)

Third Model cont. Distributed Infrastructure Technology • Characteristics • Unknown large environment • Varieties of selection options • Users are heterogeneous • Exact characteristics not available until accessed, if they are made explicit at all • e.g. specialized protein search engines, Netflix • Approaches • Individual trial & error • Central registration directory (Brokers, Web Service [Facciorusso et al. 2003]) • Advice seeking → Direct exchange of “selection advice” is beneficial! • e.g. Learning [Nunes and Oliveira 2003], Distributed Recommender Systems • Question?

Third Model cont. Advice-Seeking • Question: Heterogeneous individual requirements → Whom to ask? • Challenge: Identifying other suitable users is difficult! - Large number of them - Preferences not publicly available - Not in a position to make their own preferences explicit • Social Networks! • Social contacts serve as valuable resources • Manage them → improve long-term payoff gains

Third Model cont. Abstract Framework • Agent-based simulation (resources + agents) • Agents select resources repeatedly • Subjective utility (agent-resource match) • Goal = Maximize long-term utility with limited selections • Challenge = Identify appropriate resources • Evolving Social Network: - Connect with similar-minded agents → autonomously, based on local information only - Receive advice → improve resource selection - Learn their own subjective utility → judge advice accuracy → decide to retain / drop the contact - Form new connections → seek referrals
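
The retain / drop decision for a contact can be illustrated with a small trust update. Everything in this sketch is an assumption made for illustration: utilities are taken to lie in [0, 1], advice accuracy is measured as one minus the absolute gap between advised and experienced utility, trust is an exponential moving average of accuracy, and a contact is dropped once trust falls below a threshold.

```python
def update_contact(trust, contact, advised_utility, experienced_utility,
                   rate=0.2, drop_below=0.3):
    """Update trust in a contact after acting on its advice.

    trust: dict contact id -> trust value in [0, 1]
    Returns the new trust value, or None if the contact was dropped.
    """
    accuracy = 1.0 - min(1.0, abs(advised_utility - experienced_utility))
    old = trust.get(contact, 0.5)                 # neutral prior for unknown contacts
    new = (1.0 - rate) * old + rate * accuracy    # exponential moving average
    if new < drop_below:
        trust.pop(contact, None)                  # drop the unreliable contact
        return None
    trust[contact] = new
    return new
```

An agent that drops a contact could then ask its remaining contacts for referrals to form a new connection, as the last bullet of the slide suggests.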

Third Model cont. What do we study? • This capability → Does it affect the match? • Connection network and advice exchange → How do they co-evolve? • Agents’ interactions → Do social relationships change? • The evolving social network → Does utility gain improve?
