A Recall-Based Cluster Formation Game in Peer-to-Peer Systems

Georgia Koloniari and Evaggelia Pitoura
Department of Computer Science, University of Ioannina, Greece
http://dmod.cs.uoi.gr

Peer-to-Peer Systems
• A class of systems in which autonomous nodes of equal roles (peers) share their resources and exchange their data
• Each peer connects with a subset of other peers, thus forming logical overlay networks on top of the physical network (e.g., the Internet)
• Queries are routed through the overlay network to discover the peers that store relevant results

Clustered Overlays
Peers form groups based on their content or interests (query workload), so that peers with similar content and/or interests are nearby, in the same group (cluster), in the overlay network

Motivation for Clustering
• Peers find and exchange data relevant to their interests within their cluster with less effort
• Once the relevant clusters for a query are identified, the peers in them maintain relevant content that can be exploited to evaluate and refine the query
Previous work on content-based clustering:
• SONS [Stanf. Univ. '02]: clusters formed based on predefined classification hierarchies
• SETS [SIGIR '03]: peers partitioned in clusters corresponding to fixed, globally known topic segments
• Cohen et al [INFOCOM '03]: based on a learning approach that generalizes and learns the semantic categories of the data
• Triantafillou et al [CIDR '03]: fixed set of clusters formed based on predefined semantic categories; focus on fair load distribution and reducing response times
• Garbacki et al [ICDCS '07]: superpeer-based architecture in which peers with common interests are organized based on their caches
• Doulkeridis et al [JSAC '07]: clustering applied first on the documents of each peer, and then on the feature vectors describing the derived clusters

Our Contributions
• We provide a novel model for cluster formation using a game-theoretic approach
• We address both cluster formation and cluster evolution/maintenance, and cope with peer dynamics
• We exploit both content and queries and aim at maximizing the overall query recall of the system
• We propose an uncoordinated protocol, based on local peer decisions, for playing the game with performance comparable to a corresponding coordinated protocol

Game Theory for P2P
Game-theoretic approaches have been applied to the overlay network creation problem and to link selection in P2P systems
Fabricant et al [PODC'03]:
• Internet-like network modeled as a game
• peers establish links to reduce the shortest distance to any other peer and pay for those links
Moscibroda et al [PODC'06]:
• prove that allowing peers complete freedom performs worse than collaboration
• even static P2P systems may never reach convergence
Laoutaris et al [INFOCOM'07]:
• strict bounds enforced on out-degree
• directed links
• peers express preferences for their neighbors
Our contribution: we consider dynamic clustered overlays, focus on queries, and aim at increasing their recall

Overview
• Cluster formation as a strategic game
• Utility functions for selfish and altruistic behavior
• Global system performance criteria
• Stability and optimality
• Case studies
• Playing the game
• Relocation policies
• Cluster formation and cluster evolution/maintenance
• Uncoordinated cluster reformulation protocol
• Protocol variations: trigger-based and event-based
• Auxiliary mechanisms for controlling the overhead
• Experimental evaluation
• Conclusions and future work

The Game
Cluster formation modeled as a strategic game
• Each peer pi is modeled as a player
• A player selects a strategy si, which consists of the set of clusters the peer will join (si ⊆ C)
• The goal of the game is for each peer to select the strategy that minimizes its utility function
• The utility function is defined based on query recall

Recall-based Clustering
• Selfish peers: join the cluster that will provide the most answers to the peer's local query workload (maximizes its own recall)
• Altruistic peers: join the cluster to which they can offer the most (maximizes the recall of the cluster members)
Utility function:
• Individual peer cost for selfish peers
• Individual peer contribution for altruistic peers

Individual Peer Cost
The individual peer cost has two terms: a membership cost and a recall cost
• Recall cost: the cost of evaluating the peer's queries against the other clusters, defined in terms of:
• the recall of q if evaluated solely on a peer p'
• the number of appearances of q in the local workload Q(pi) of pi
• the number of queries in the local workload Q(pi) of pi
• the peers that are members of the clusters in pi's strategy
This cost measures the loss in recall from evaluating queries at clusters that are not in pi's strategy

Individual Peer Cost (cont'd)
• Membership cost: the cost of joining a cluster, measured by:
• a function θ that describes the communication cost entailed in belonging to a cluster and depends on the size of the cluster |ci| and the topology within the cluster
• a parameter a that quantifies the cost of each of the connections of a peer to the other cluster members
This cost prevents the system from forming just one cluster, which would otherwise minimize the cost function for all peers
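The two terms described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' formula: the names `recall`, `peer_cost`, `results`, and `workload` are hypothetical, the membership cost is taken to be a times θ of the cluster size summed over the clusters in the strategy, θ defaults to the identity, and recall on a single peer is taken to be that peer's share of all results for the query.

```python
from collections import Counter


def recall(results, q, peer):
    """Recall of q if evaluated solely on `peer` (assumed: its share of all results)."""
    total = sum(per_peer.get(q, 0) for per_peer in results.values())
    return results[peer].get(q, 0) / total if total else 0.0


def peer_cost(pi, strategy, clusters, workload, results, a=1.0, theta=lambda n: n):
    """Membership cost plus recall cost of peer `pi` for a given strategy.

    strategy : set of cluster ids that pi joins
    clusters : dict cluster id -> set of member peers
    workload : dict peer -> list of queries (repetitions allowed)
    results  : dict peer -> dict query -> number of matching results
    """
    # Membership cost (assumed form): a * theta(cluster size) per joined cluster.
    membership = a * sum(theta(len(clusters[c] | {pi})) for c in strategy)
    # Peers reachable inside the clusters of pi's strategy.
    reachable = {pi}.union(*(clusters[c] for c in strategy))
    counts = Counter(workload[pi])
    n_queries = len(workload[pi])
    recall_cost = 0.0
    for q, cnt in counts.items():
        # Recall lost because these peers are outside pi's clusters.
        missed = sum(recall(results, q, p) for p in results if p not in reachable)
        recall_cost += (cnt / n_queries) * missed
    return membership + recall_cost


# Hypothetical three-peer system: p2 holds most of the results for q.
results = {"p1": {"q": 1}, "p2": {"q": 3}, "p3": {"q": 0}}
workload = {"p1": ["q", "q"], "p2": [], "p3": []}
clusters = {"c1": {"p1"}, "c2": {"p2", "p3"}}
stay_alone = peer_cost("p1", {"c1"}, clusters, workload, results, a=0.1)
join_c2 = peer_cost("p1", {"c2"}, clusters, workload, results, a=0.1)
```

With a small a, joining c2 is cheaper for p1 because the recall it gains outweighs the larger membership cost; with a large a the opposite holds, which is the trade-off the membership cost is meant to introduce.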

Individual Peer Contribution
• The contribution to the queries of the other peers in the cluster
• The cost the other peers in a cluster pay when pi joins
Hybrid Cost Function
• d ∈ [0,1]: degree of selfishness

Global Cost
• Social Cost: the sum of the peers' individual costs
• Workload-based Cost: the average cost of attaining results for the global query workload, defined in terms of the frequency of q in the global query workload and the frequency of q in the local query workload of pi
While the social cost treats all peers as equals, the workload-based cost considers more demanding peers as more important
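The difference between the two global measures can be illustrated with a small sketch. It assumes, as a simplification that is not stated on the slide, that the workload-based cost weights each peer's individual cost by that peer's share of the global query workload; the function names and the numbers are hypothetical.

```python
def social_cost(peer_costs):
    """Sum of the individual peer costs: every peer counts equally."""
    return sum(peer_costs.values())


def workload_cost(peer_costs, workload_sizes):
    """Assumed form: individual costs weighted by each peer's share of all queries."""
    total = sum(workload_sizes.values())
    return sum(peer_costs[p] * workload_sizes[p] / total for p in peer_costs)


# Hypothetical individual costs for two peers.
costs = {"p1": 0.2, "p2": 0.6}

# Equal portions of the workload: workload cost = social cost / number of peers.
equal = workload_cost(costs, {"p1": 10, "p2": 10})

# p1 issues three times as many queries, so its (lower) cost dominates.
skewed = workload_cost(costs, {"p1": 30, "p2": 10})
```

Under this assumption, equal workload portions make the workload-based cost a constant multiple (1/|P|) of the social cost, while a skewed workload pulls the measure toward the cost of the more demanding peer.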

Social vs Workload Cost
If each peer pi ∈ P gets an equal portion of the global query workload, then the social cost and the workload cost are proportional

Global Contribution
• Social Contribution: the sum of the peers' individual contributions
• Workload-based Contribution
Social contribution favors queries that are popular locally with specific users, while workload contribution favors queries that are popular overall
If the local query distribution at each peer follows the global query distribution, then the two measures are proportional

Cost vs Contribution
• If we ignore the membership cost, then the workload cost and the workload contribution are complementary
• For a uniform query workload among the peers, the social cost and the social contribution are also complementary

Stability
Nash equilibrium: no peer has an incentive to change its strategy
[Payoff table: peers p1 and p2, each choosing between clusters c1 and c2]
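A payoff table like the one on this slide can be checked for pure Nash equilibria mechanically. The sketch below assumes two peers, two clusters, and made-up cost values (the slide's own numbers are not available); `pure_nash_equilibria` is a hypothetical helper name.

```python
from itertools import product


def pure_nash_equilibria(costs, strategies):
    """Return strategy pairs where neither peer can lower its cost by deviating alone.

    costs[(s1, s2)] = (cost of p1, cost of p2) when p1 plays s1 and p2 plays s2.
    """
    equilibria = []
    for s1, s2 in product(strategies, repeat=2):
        c1, c2 = costs[(s1, s2)]
        # p1 keeps s2 fixed and tries every alternative; likewise for p2.
        p1_stable = all(costs[(alt, s2)][0] >= c1 for alt in strategies)
        p2_stable = all(costs[(s1, alt)][1] >= c2 for alt in strategies)
        if p1_stable and p2_stable:
            equilibria.append((s1, s2))
    return equilibria


# Hypothetical costs: both peers are better off in the same cluster.
costs = {
    ("c1", "c1"): (0.2, 0.2),
    ("c1", "c2"): (0.9, 0.9),
    ("c2", "c1"): (0.9, 0.9),
    ("c2", "c2"): (0.2, 0.2),
}
stable = pure_nash_equilibria(costs, ["c1", "c2"])
```

In this example the two configurations in which the peers share a cluster are stable, and the two split configurations are not, because either peer would gain by joining the other.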

Stability Properties
Lemma: In any stable state, there are no clusters ci, cj such that ci ⊆ cj, i ≠ j
Corollary: When a peer forms a cluster by itself, it cannot belong to any other cluster

Optimality
Stability does not always ensure a satisfactory social cost
Price of Anarchy: the ratio of the social cost of the worst Nash equilibrium to the social optimum
• The social optimum is obtained by minimizing the social cost measure over all possible configurations
• To bound the social optimum:
• consider each peer separately
• select the configuration that yields the minimum individual cost for that peer
• aggregate over all peers
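The bounding procedure above can be illustrated on a small table of costs. The sketch assumes the individual cost of every peer is already known for every configuration; the configuration labels and all numbers are made up for illustration, and the function names are hypothetical.

```python
def social_optimum(cost_table):
    """Minimum, over configurations, of the summed individual costs."""
    return min(sum(per_peer.values()) for per_peer in cost_table.values())


def optimum_lower_bound(cost_table):
    """Per peer, take its best configuration; then aggregate over all peers."""
    peers = next(iter(cost_table.values())).keys()
    return sum(min(per_peer[p] for per_peer in cost_table.values()) for p in peers)


def price_of_anarchy(worst_equilibrium_cost, optimum):
    """Ratio of the worst Nash equilibrium's social cost to the social optimum."""
    return worst_equilibrium_cost / optimum


# Hypothetical individual costs of three peers in three configurations.
cost_table = {
    "A": {"p1": 0.3, "p2": 0.3, "p3": 0.6},
    "B": {"p1": 0.9, "p2": 0.9, "p3": 0.1},
    "C": {"p1": 0.2, "p2": 0.4, "p3": 0.3},
}
optimum = social_optimum(cost_table)        # best single configuration
bound = optimum_lower_bound(cost_table)     # each peer picks its own best one
```

Because each peer is allowed its own best configuration, the aggregate can never exceed the true optimum, so it is a lower bound. Dividing the worst equilibrium's cost by this bound therefore gives an upper bound on the price of anarchy.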

Case Studies
Linear θ: θ(n) = φn
Configurations:
• A: all peers in a single cluster
• B: each peer in a cluster by itself
• C: k clusters
Case I: No underlying clustering
Case II: Symmetric clusters, characterized by a condition on the peers in each cluster c

Playing the Game
Consider an instance of the system (Scur) in which each peer pi belongs to a single cluster (si = {cl}), cl ∈ Ccur (the set of non-empty clusters)
When it is its turn to play, pi considers all possible configurations Sj that differ from Scur only in si
pi has two options:
• Move to a different existing cluster cv
• If |cl| ≠ 1, form a cluster on its own
We distinguish different policies according to peer behavior: selfish, altruistic, and hybrid

Selfish Policy
A peer pi chooses to move to the cluster that maximizes its recall

Altruistic Policy
Peers move to the cluster to whose other members they offer the most recall

Clustered Overlay Formation
Given an initial configuration:
• Each peer forming a cluster on its own
• All peers in a single cluster
Our game model addresses cluster formation

Clustered Overlay Evolution
Given a clustered overlay configuration:
• dynamic peers change their content or query workload
Our game model addresses cluster evolution/maintenance

Cluster Reformulation Protocol
The protocol is uncoordinated and based on local decisions made by each peer independently
Each peer:
• evaluates its cost or contribution for all clusters in the system
• selects the best strategy
• computes the gain for the best strategy (cluster)
• if gain > 0, pi moves to the best cluster
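The per-peer steps listed above can be sketched as follows for selfish peers. This is an illustrative sketch, not the authors' implementation: `toy_cost` is an assumed stand-in for the individual peer cost (a times the cluster size plus the recall held by peers outside the cluster), the `RECALL` values are made up, and forming a cluster on one's own is modeled as moving to an empty cluster id.

```python
RECALL = {  # hypothetical recall each peer would obtain from each other peer
    "p1": {"p2": 0.8, "p3": 0.0},
    "p2": {"p1": 0.8, "p3": 0.0},
    "p3": {"p1": 0.0, "p2": 0.0},
}


def toy_cost(peer, assignment, a=0.1):
    """Assumed selfish cost: a * cluster size + recall held outside the cluster."""
    mates = {p for p, c in assignment.items() if c == assignment[peer]}
    missed = sum(r for p, r in RECALL[peer].items() if p not in mates)
    return a * len(mates) + missed


def play_turn(peer, assignment, cluster_ids, cost):
    """Evaluate every cluster, pick the best one, and move only if gain > 0."""
    current = cost(peer, assignment)
    best_cluster, best_gain = assignment[peer], 0.0
    for c in cluster_ids:
        if c == assignment[peer]:
            continue
        gain = current - cost(peer, {**assignment, peer: c})
        if gain > best_gain:
            best_cluster, best_gain = c, gain
    if best_gain > 0:
        assignment[peer] = best_cluster
    return best_gain


def run(assignment, cluster_ids, cost, max_rounds=100):
    """Let peers take turns until a full round passes with no relocation."""
    for _ in range(max_rounds):
        moved = [play_turn(p, assignment, cluster_ids, cost) > 0 for p in list(assignment)]
        if not any(moved):
            break
    return assignment


# Start with each peer in a cluster on its own.
final = run({"p1": "c1", "p2": "c2", "p3": "c3"}, ["c1", "c2", "c3"], toy_cost)
```

In this toy run p1 and p2, which answer each other's queries, end up in the same cluster, while p3 stays alone because joining them would only add membership cost.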

When does a peer determine it is its turn to play?
• Event-based: after it becomes aware of a relevant event
• evaluation of queries in their local workload, for selfish peers
• providing results to a query, for altruistic peers
• Trigger-based: when it registers a change in its gain
• requires continuously monitoring the gain
• Batch-based: after a number (batch) of relevant events

Controlling Parameters
Relocations induce communication and processing overhead
The gain with respect to the cost or contribution does not always justify the overhead entailed
Mechanisms are required to control excessive overhead:
• Stopping Condition: a peer moves only if its gain is greater than a predefined threshold ε
• Playing Probability: a peer does not play at each of its turns, but with a probability Pr
• Quota: each peer is assigned n moves for a time period Tq
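The three mechanisms can be combined in a single gate that a peer consults before relocating. The class below is a hedged sketch: `MoveController` and its parameter names are hypothetical, time is an abstract counter, and the order in which the checks are applied is an assumption rather than something the slide specifies.

```python
import random


class MoveController:
    """Gate relocations with a stopping condition, a playing probability, and a quota."""

    def __init__(self, epsilon, play_probability, quota, period, rng=None):
        self.epsilon = epsilon                    # threshold ε on the gain
        self.play_probability = play_probability  # probability Pr of playing a turn
        self.quota = quota                        # n moves allowed per period
        self.period = period                      # length Tq of a quota period
        self.rng = rng or random.Random()
        self.period_start = 0
        self.moves_left = quota

    def allows(self, gain, now):
        """Return True if the peer may relocate at time `now` for the given gain."""
        # Refill the quota when a new period begins.
        if now - self.period_start >= self.period:
            self.period_start = now
            self.moves_left = self.quota
        # Playing probability: skip this turn with probability 1 - Pr.
        if self.rng.random() >= self.play_probability:
            return False
        # Stopping condition: the gain must exceed ε.
        if gain <= self.epsilon:
            return False
        # Quota: no moves left in the current period.
        if self.moves_left == 0:
            return False
        self.moves_left -= 1
        return True


controller = MoveController(epsilon=0.05, play_probability=1.0, quota=2, period=10)
decisions = [
    controller.allows(0.04, now=0),   # gain below ε
    controller.allows(0.20, now=0),
    controller.allows(0.20, now=1),
    controller.allows(0.20, now=2),   # quota exhausted for this period
    controller.allows(0.20, now=10),  # new period, quota refilled
]
```

Setting `play_probability` to 1.0 in the usage example makes the outcome deterministic; with a lower value the same calls would be skipped at random.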

Uncoordinated vs Coordinated
• Coordinated protocol:
• cluster representatives gather and exchange the relocation requests from their clusters
• requests are sorted according to gain and granted in that order
[Plots: Uncoordinated; Coordinated]
• The uncoordinated protocol performs as well as the coordinated one without the additional coordination overhead
• The trigger-based protocol is the most expensive variation and the batch-based the most efficient one
• The trigger-based variation adjusts faster to changes
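The ordering step of the coordinated protocol can be sketched in a few lines. The request format (peer, target cluster, gain) and the function name are assumptions made for illustration; the sketch only shows the merge-and-sort step and does not model how representatives communicate or whether gains are re-evaluated after each granted move.

```python
def grant_order(requests_by_cluster):
    """Merge the requests gathered by each representative and sort by gain, highest first."""
    merged = [req for reqs in requests_by_cluster.values() for req in reqs]
    return sorted(merged, key=lambda req: req[2], reverse=True)


# Hypothetical relocation requests: (peer, target cluster, gain).
requests = {
    "c1": [("p1", "c2", 0.3)],
    "c2": [("p4", "c1", 0.7), ("p5", "c3", 0.1)],
}
order = grant_order(requests)
```

The uncoordinated protocol skips this step entirely: each peer acts on its own gain as soon as it plays, which is where the saved coordination overhead comes from.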

Controlling Parameters
• The stopping condition is the main factor determining the value of the achieved social cost
• The playing probability and the quota reduce the number of moves but increase the number of turns

Cluster Formation
Setting:
• Starting from different initial configurations (all peers in a single cluster, each peer in a cluster on its own, k random clusters, etc.)
• Using symmetric, asymmetric, and uniform peers
• With selfish, altruistic, and mixed peer populations
Results:
• The reformulation protocol identifies the underlying clusters if they exist
• It does not require an a priori definition of the target number of clusters
• Its social cost in some cases (i.e., for symmetric and uniform peers) reaches a value close to the social optimum

Cluster Adaptation
[Plots: Content Updates; Workload Updates]
• For different update scenarios, our protocol copes with changes efficiently
• Selfish peers react more efficiently to workload changes, while altruistic peers react more efficiently to content changes
• Reclustering reduces the social cost by 10% but requires 250 turns, whereas the reformulation protocol requires only 10

Clustering vs Caching
• Consider a cache scheme:
• Peers that provided results to previous queries are cached
• Future queries are forwarded to them first
• Transitive variant: peers that receive a query forward it to the peers in their own cache
• Peers in the cache are sorted based on recall
• The cache is updated after each query
• Symmetric peers favor clustering
• For asymmetric peers, clustering with an efficient cluster topology achieves results similar to caching
• Clustering adapts to changes faster (it replaces all links at once)
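The cache scheme used for comparison can be sketched as a small recall-ordered cache. The class name, the fixed capacity, and the rule that a peer's stored recall is simply overwritten by its most recent answer are all assumptions for illustration; the slide does not specify how recall is aggregated across queries or how entries are evicted.

```python
class PeerCache:
    """Cache of peers that answered earlier queries, ordered by observed recall."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recall = {}  # peer -> recall observed on the latest query it answered

    def update(self, answers):
        """Update after a query; `answers` maps peer -> recall it contributed."""
        for peer, r in answers.items():
            if r > 0:
                self.recall[peer] = r
        # Keep only the `capacity` peers with the highest recall.
        ranked = sorted(self.recall.items(), key=lambda kv: kv[1], reverse=True)
        self.recall = dict(ranked[: self.capacity])

    def forwarding_order(self):
        """Peers to which the next query is forwarded first, best recall first."""
        return sorted(self.recall, key=self.recall.get, reverse=True)


cache = PeerCache(capacity=2)
cache.update({"p2": 0.5, "p3": 0.2})
first_order = cache.forwarding_order()
cache.update({"p4": 0.9})           # p4 displaces the weakest entry
second_order = cache.forwarding_order()
```

Each update changes at most a few cache entries, whereas a relocation in the clustered overlay replaces all of a peer's links at once, which is the adaptation difference the slide points out.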

Summary
• We modeled cluster formation as a strategic game
• We defined utility functions based on query recall and cluster membership cost
• We considered both selfish and altruistic peers
• We derived theoretical results regarding the stability and optimality of our game

Summary (cont'd)
• We proposed an uncoordinated protocol for playing our game
• We presented two variations: an event-based and a trigger-based protocol
• We combined the protocol with a set of parameters for controlling the overhead
• We presented an experimental evaluation of the protocol, which showed that:
• there is no need for coordination
• the protocol discovers the underlying clusters (no need to predefine their number)
• it copes efficiently with dynamic updates

Future Work
• Study further (theoretically and experimentally) the problem of cluster formation and evolution with multiple cluster memberships
• Apply or adjust the cluster formation game for friend discovery in social networks
• Use different criteria besides recall, such as diversity, for determining quality in clustered overlays
• Compare our clustering goal (maximizing recall) to traditional goals in clustering applications (maximizing intra-cluster similarity and minimizing inter-cluster similarity)

Thank you

Input Parameters
