Can Peer-to-Peer File-sharing be of Help for Research Communities? Julita Vassileva, Computer Science Department (MADMUC Lab), University of Saskatchewan
Outline • Motivation • Problems: user participation, trust • Motivating user participation • User modelling • Reward with better QofS • Social awareness (visualization) • Ensuring trust • Conclusions
Motivation • Need a search engine for locally stored papers • Web: links disappear, protected sites • Hard disks too large • Why P2P? • Harvest the resources of a community of users • Advantages of a distributed approach vs centralized
What is a P2P System? GNUTELLA
COMTELLA • A P2P (Gnutella based) system for file sharing and service • users share academic papers, code snippets • Non-centralized digital library for a research group / class • Can be downloaded from: http://bistrica.usask.ca/madmuc/news.htm
Vassileva J. (2002) Supporting Peer-to-Peer User Communities, in R. Meersman, Z. Tari et al. (Eds.) "On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE", Coordinated International Conferences Proceedings, Irvine, Springer LNCS 2519, 230-247.
Vassileva J. (2002) Motivating Participation in Peer to Peer Communities, in P. Petta, R. Tolksdorf, F. Zambonelli (Eds.) Engineering Societies in the Agents World III, Proceedings of the 3rd International Workshop ESAW'02, Madrid, Springer LNAI 2577, 141-155.
Bretzke H., Vassileva J. (2003) Motivating Cooperation in Peer to Peer Networks, in P. Brusilovsky, A. Corbett, F. De Rosis (Eds.) Proceedings of the 9th International Conference on User Modelling, UM03, Johnstown, PA, Springer LNCS, 218-227.
• Lingling Sun: graduate student • Christopher Cox: NSERC Summer 2002 project • Helen Bretzke: CRA-W and NSERC Summer 2002 project • Yamini Upadrashta: graduate student
Problems • User Participation • “critical mass” needed • most users are free-riders • why do people contribute? • satisfies a need (is useful) • doesn’t cost (effort, money, inconvenience) • there is some incentive (money, glory, power) • serves a greater cause (e.g. cancer research, SETI@home, etc.) • Trust • sure that contributing won’t cause harm • able to identify trustworthy peers
First condition: system must be useful • Allow searching own files • Any file stored on disk can be found with Comtella • Shared files can be stored anywhere on disk • Integration with other tools • With a browser (e.g. IE, Netscape, Mozilla) • allows viewing files directly from Comtella • prompts the user to share papers when a PDF file is opened • With a word processor (e.g. MS Word) • generating lists of references automatically • Additional functionality • Adding annotations and ratings to papers
Levels of participation • Bring new files • Provide disk space / processor time • Dispatch requests • Stay on-line • Use and quit
How to motivate participation? • User types: socially motivated, altruistic, materialistic, utilitarian • Why do people offer their time and resources? Different people have different motivations: • Some are altruists • Some would help their friends and hope to make new friends through helping • Some seek glory • Some would expect better service • Some seek high marks • or money…
Incentives Micro-payments for each transaction? Shirky says it won’t ever work (e.g. Mojo-nation): Flat rates work better (e.g. Internet, cable) How to map virtual currency into real money?
How to motivate participation? • User types: socially motivated, altruistic, materialistic, utilitarian • Why do people offer their time and resources? Different people have different motivations: • Some are altruists (for the cause) • Some would help their friends and hope to make new friends through helping • Some seek glory • Some would expect better service • Some seek high marks • or money…
Know your user! Modelling • User Type: Altruist? Socialist? Utilitarian? • User Interests: What does she search / need? • User Relationships and Community: Who shares interest with the user? Potential “friends” and “foes”.
Modelling user interests • Define a taxonomy of subject categories (e.g. ACM subject index) • Keep track of the categories of queries (i.e. user interests) • Keep track of resources offered by the user in each interest category • Update the user's level of interest in each sub-category using reinforcement learning • Cluster users in interest-based groups
Computing user interests • Reinforcement learning: the user’s strength of interest $S_a$ in an area $a$ is calculated based on how frequently and how recently the user has searched in this area: $S_a(e_t, t) = i \cdot S_a(e_{t-1}, t-1) + (1 - i) \cdot e_t$, where $e_t \in [0, 1]$ is calculated as $e_t = 1/d$ and $d = 1 + \text{level\_distance}$, with level_distance measured between the level of the sub-area of the query and the level of the area $a$ in the ontology hierarchy. Currently, the ontology hierarchy has only 2 levels, so $e_t = 0.5$.
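A minimal sketch of the interest update on this slide, written in Python for illustration. The function names and the default value 0.5 for the weighting factor i are assumptions; the slide does not state which value of i Comtella uses.

```python
def evidence(level_distance: int) -> float:
    """e_t = 1 / d, where d = 1 + level_distance (as defined on the slide)."""
    if level_distance < 0:
        raise ValueError("level_distance must be non-negative")
    return 1.0 / (1 + level_distance)


def update_interest(previous: float, e_t: float, i: float = 0.5) -> float:
    """S_a(e_t, t) = i * S_a(e_{t-1}, t-1) + (1 - i) * e_t.

    `i` weights the old value against the new evidence; 0.5 is a
    hypothetical default, not a value taken from the slides.
    """
    if not 0.0 <= i <= 1.0:
        raise ValueError("i must lie in [0, 1]")
    return i * previous + (1.0 - i) * e_t


# Two consecutive queries in a sub-area one level below area a.
s = 0.0
for _ in range(2):
    s = update_interest(s, evidence(1))
print(s)  # 0.375
```

With i = 0.5 and constant evidence 0.5, the strength rises 0.25, 0.375, ... towards 0.5, so both frequent and recent searches keep the value high.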
Modelling user relationships • Monitor whose files the user chooses, the quality of the files (does the user keep the files), and who downloads files offered by the user • Represent each user relationship, for each area of interest, by: • Strength: how successfully service was given (reinforcement learning used, similar to user interests) • Balance: reciprocity of services used / given • Adapt the P2P (Gnutella) topology: form a neighborhood for search using the best relationships (“friends”) in the area of search
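One possible way to form a search neighborhood from the strongest relationships in an area, sketched in Python. The data layout (a dict mapping peer to per-area strength), the function name, and the cut-off k are all hypothetical; the slides do not describe how Comtella stores relationships or how many friends it selects.

```python
def neighborhood(relationships: dict, area: str, k: int = 3) -> list:
    """Return up to k peers with the strongest relationship in `area`.

    relationships: {peer: {area: strength}} (assumed layout).
    Ties are broken alphabetically by peer name.
    """
    candidates = [
        (strengths[area], peer)
        for peer, strengths in relationships.items()
        if area in strengths
    ]
    # Sort by descending strength, then by ascending peer name.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [peer for _, peer in candidates[:k]]


relationships = {
    "ann": {"ai": 0.9},
    "bob": {"ai": 0.4, "db": 0.8},
    "cy": {"db": 0.2},
    "dee": {"ai": 0.7},
}
print(neighborhood(relationships, "ai", k=2))  # ['ann', 'dee']
```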
Computing the balance of a relationship • $B_{XY} = (N_{XY} - N_{YX}) / (N_{XY} + N_{YX})$, with $B_{XY} \in [-1, 1]$ • $N_{XY}$: number of times X took from Y • $N_{YX}$: number of times Y took from X
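The balance formula can be sketched in Python as follows. Returning 0 when the two peers have never exchanged files is an assumption, since the slide leaves that case undefined.

```python
def balance(n_xy: int, n_yx: int) -> float:
    """B_XY = (N_XY - N_YX) / (N_XY + N_YX).

    n_xy: number of times X took from Y.
    n_yx: number of times Y took from X.
    """
    if n_xy < 0 or n_yx < 0:
        raise ValueError("counts must be non-negative")
    total = n_xy + n_yx
    if total == 0:
        return 0.0  # assumption: no interaction yet means a neutral balance
    return (n_xy - n_yx) / total


print(balance(3, 1))  # 0.5
print(balance(0, 4))  # -1.0
```

By construction B_XY = -B_YX, so a value near +1 means X has mostly taken from Y and a value near -1 means Y has mostly taken from X.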
Modelling user type • Monitor user’s actions regarding file sharing, relative time spent on-line, acts of interrupting service, total balance of user’s giving / taking • Update a number in [-1, 1] representing user’s cooperativeness • Motivational actions in the interface triggered by passing certain thresholds
Computing user type • The measure of user cooperativeness at time $t$: $C(w_t, t) = i \cdot C(w_{t-1}, t-1) + (1 - i) \cdot w_t$, where $w_t \in [-1, 0) \cup (0, 1]$ represents the weight of evidence: $w_t < 0$ is a selfish act, while $w_t > 0$ is an altruistic act. • $\text{overallBalance} = \frac{1}{n} \sum_{Y} B_{XY}$ • $\text{userType} = (C(w_t, t) + \text{overallBalance}) / 2$ • If userType is in $[-1, -0.5)$ the user is selfish, if it is in $[-0.5, 0.5]$ the user is reciprocal, and if it is in $(0.5, 1]$ the user is altruistic.
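A sketch of the user-type computation in Python. It assumes the reciprocal band is the closed interval [-0.5, 0.5] (the remainder between the selfish and altruistic ranges), uses a hypothetical default of 0.5 for i, and takes the sign convention of overallBalance from the slide as is.

```python
def update_cooperativeness(previous: float, w: float, i: float = 0.5) -> float:
    """C(w_t, t) = i * C(w_{t-1}, t-1) + (1 - i) * w_t.

    w < 0 is a selfish act, w > 0 an altruistic act; w = 0 is excluded.
    The default i = 0.5 is an assumption for illustration.
    """
    if w == 0 or not -1.0 <= w <= 1.0:
        raise ValueError("w must lie in [-1, 0) or (0, 1]")
    return i * previous + (1.0 - i) * w


def overall_balance(balances: list) -> float:
    """Mean of B_XY over all n peers Y (0.0 if there are none: assumption)."""
    return sum(balances) / len(balances) if balances else 0.0


def user_type(cooperativeness: float, overall: float) -> str:
    """Classify userType = (C + overallBalance) / 2 into three bands."""
    score = (cooperativeness + overall) / 2.0
    if score < -0.5:
        return "selfish"
    if score <= 0.5:
        return "reciprocal"
    return "altruistic"


print(user_type(0.8, 0.6))    # altruistic
print(user_type(-0.9, -0.5))  # selfish
print(user_type(0.2, -0.2))   # reciprocal
```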
Rewarding relationships • People who share a lot of useful files and behave cooperatively will have more friends • Friends are treated differently • Transfers not interrupted • Queries processed with priority • Queries are propagated farther • Queries sent to friends in the area • Higher chance of having relevant files • Faster responses • Better quality of files • People with more friends get better Quality of Service!
Evaluation results - simulation Comparing the round trip time obtained for queries without a friends’ list with the round trip time for queries with friends’ list
Evaluation results – user experiment 8 users over a week
Summary of results • The simulation results show that peers obtain results faster when searching for files in categories for which they have friends • The user evaluation is still underway: does the QoS reward motivate participation?
Social awareness • In isolation, selfishness is logical • To gain perspective, users need feedback about their social environment • In cities, the sidewalks provide the right kinds and numbers of interactions from which neighborhoods emerge.
An astronomical metaphor • A matter of scale • Provides visual feedback • Resolves scale • Attractive & interesting
Views of the community • Connectivity (currently reachable peers) • Ranking of peers by contribution • number of shared files • balance of relationships • Papers shared by each peer • Interests of each peer
Architecture • Introducing a non-vital server, or many servers • Each server: • collects info from peers • generates community views
Personalized views • Who are my friends in this area? • How strong is my relationship with them? • How much have they contributed? • Do I owe them or do they owe me? • Which files do they share? • What have they been searching for / downloading recently?
Trust • We already model the strength of relationships between users • Based on counting the number of downloads / uploads • We can incorporate an explicit measure of the quality of a resource • Idea: let users: • Rate their resources (quality of paper) • Add annotations (summaries) of papers
Immediate benefit • Learning effect: compiling reviews of articles • Visualization of document ranking in a given category of interest: “top 10 list” • The professor / boss will know who has read and annotated a paper and who has not, which could have a motivational effect on participation.
Reputation • Global reputation of peers can be computed • Ranking of peers based on • how many highly rated papers they share • how many times they have introduced a new paper in the system that has become highly rated • how the users’ ratings correlate with those of their peers and with high-rank peers • Emergence of “Power peers”: • What extra rights will they have (reward)? • Could have a motivational effect, as in Slashdot.com
Community views • Connectivity (currently reachable peers) • What are these peers interested in / sharing • Ranking of peers by contribution • Shared interest clusters • Personalized views (who are my friends?) • Ranking of resources (papers) • Reputation of peers
Updating trust in peers • Relationships: subjective trust in the source of the paper (the other peer) • Trust depends on the evaluation criteria of the peer • Compare own rating of the paper with the rating given by the source: if the ratings are sufficiently close, increase trust in the source, else decrease trust • Trust depends on the category of interest • Combined trust measures for peers? • Peers share their trust measures (gossip)
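The rating-comparison rule can be sketched in Python as below. The class name, the closeness tolerance, the +1 / -1 evidence values, and the reuse of the same weighted-average update as for user interests are all assumptions made for illustration; the slide only says that trust goes up when ratings are sufficiently close and down otherwise, per category of interest.

```python
class TrustTable:
    """Per-peer, per-category trust in [-1, 1], starting at 0 (neutral)."""

    def __init__(self, tolerance: float = 1.0, i: float = 0.5):
        self.tolerance = tolerance  # max rating gap still counted as "close"
        self.i = i                  # weight of the previous trust value
        self._trust = {}            # (peer, category) -> trust

    def get(self, peer: str, category: str) -> float:
        return self._trust.get((peer, category), 0.0)

    def update(self, peer: str, category: str,
               own_rating: float, source_rating: float) -> float:
        """Raise trust if the two ratings agree, lower it otherwise."""
        close = abs(own_rating - source_rating) <= self.tolerance
        evidence = 1.0 if close else -1.0
        new_value = self.i * self.get(peer, category) + (1.0 - self.i) * evidence
        self._trust[(peer, category)] = new_value
        return new_value


table = TrustTable(tolerance=1.0, i=0.5)
print(table.update("peer_b", "ai", own_rating=4, source_rating=5))  # 0.5
print(table.update("peer_b", "ai", own_rating=1, source_rating=5))  # -0.25
```

Keeping a separate entry per (peer, category) pair reflects the point that a peer trusted in one subject area need not be trusted in another.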
Trust and reputation • Yao Wang, Ph.D. student • Wang Y., Vassileva J. (to appear) Bayesian Network-Based Trust Model, Proc. of IEEE/WIC International Conference on Web Intelligence (WI 2003), October 13-17, 2003, Halifax, Canada (best paper award nominee).
Applying a Bayesian network trust model to COMTELLA • [Figure: Bayesian network with nodes T, File quality, Paper category (subject area), Reliability (download), Paper rating]
Future work • Incorporating a trust & reputation mechanism into Comtella: • to protect from malicious file-sharers • to ensure that users share papers with appropriate peers and benefit most from their articles and comments
“take-home” messages • Motivating user participation is crucial • Building in mechanisms for trust and reputation • Encouraging contribution • building relationships • Rewards by better quality of service • reputation / visibility • Techniques: • Modeling user interests, relationships, user type • Creating community awareness through visualization • Will allow users to find reputable sources • May protect community from malicious or irresponsible peers