Peer-To-Peer Data Management Hector Garcia-Molina ICDE Conference, February 28, 2002
What is P2P? pastry, jxta, can, fiorana, napster, freenet, united devices, open cola, aim, ocean store, netmeeting, farsite, gnutella, icq, ebay, morpheus, limewire, seti@home, bearshare, uddi, grove, jabber, popular power, kazaa, folding@home, tapestry, mojo nation, process tree, chord
Napster: central index [diagram labels: join, query, answer, get file]
Gnutella [diagram label: query]
Morpheus [diagram label: super peer]
Seti@Home [diagram labels: satellite dish, raw data chunk, analyzed data, central site]
Lockss [diagram labels: library A, library B, library C, library D, library E; documents D1, D2, D3]
PeerCast [diagram labels: before, after; each panel shows a Stanford source]
What is a P2P System? • Multiple sites (at edge) • Distributed resources • Sites are autonomous (different owners) • Sites are both clients and servers • Sites have equal functionality [diagram label: P2P Purity]
P2P is a BAD IDEA!! • Distribution is expensive! • Specialized functionality is good!
Example: Distributed Data Management • Distribution is expensive • If you must distribute: • build centralized directory, index • use backups for reliability • for replicated data, use primary copy • use backups for reliability
Computational Efficiency is NOT Main Goal • Main driving force in a P2P system: • exploiting existing (often free) resources • sharing costs among many • legal protection • autonomy • anonymity
Should We Do P2P Research? • Should we help people break the law? • Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??
Should We Do P2P Research? • YES: P2P not exclusively for breaking law • Remember the VCR • YES: P2P can liberate us from culture “plantation owners” (Lessig)
Is “Free Culture” Feasible? • Example: Legal texts • Can we afford it? [diagram labels: today, economic activity, rules of the game]
P2P Challenges • Easier to list NON-Research-Topics: • Color schemes for P2P Nodes • Impact of P2P on Moroccan 15th Century Literature
P2P Challenges • Search • Resource Management • Security & Privacy
Search Taxonomy [figure: entries freenet, can, partial, replicated SP, gnutella, morpheus, napster, routing; labels on one side: lookup, content queries, search; other axis "scope of index": single site, regional, global]
Index Implementation Taxonomy [figure: entries routing, replicated SP, freenet, gnutella, morpheus, partial, napster, can; one axis "index location correlated with content location": yes, no; other axis "nature of index": P2P, centralized, distributed]
Content Addressable Network (CAN) [diagram labels: Nodes, Data; steps 1, 2]
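The slide only names CAN, so the following is a minimal sketch of the general idea as it is usually described: a key is hashed to a point in a coordinate space, and the node whose zone contains that point stores the data. The zone layout, the node names, and the functions `key_to_point` and `owner` are assumptions made for illustration; a real CAN also routes the request hop by hop through neighboring zones, which this sketch skips.

```python
import hashlib

def key_to_point(key):
    """Hash a key to a point (x, y) in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 53 bits per coordinate so the float division is exact and stays below 1.0.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return x, y

# Hypothetical zones: node name -> (x_min, y_min, x_max, y_max), covering the unit square.
ZONES = {
    "n1": (0.0, 0.0, 0.5, 1.0),
    "n2": (0.5, 0.0, 1.0, 0.5),
    "n3": (0.5, 0.5, 1.0, 1.0),
}

def owner(point, zones):
    """Return the node whose zone contains the point."""
    x, y = point
    for node, (x0, y0, x1, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("zones do not cover the point")
```

A lookup for a key then amounts to `owner(key_to_point("song.mp3"), ZONES)`; the same hash is used when the data item is inserted, so inserts and lookups meet at the same node.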
Can We Improve Flooding? [figure: same taxonomy as on the Index Implementation Taxonomy slide]
Directed BFS in Gnutella • Heuristics for Selecting Direction • >RES: Returned most results • <TIME: Shortest satisfaction time • <HOPS: Min hops for results • >MSG: Sent us most messages (all types) • <QLEN: Shortest queue • <LAT: Shortest latency • >DEG: Highest degree [diagram labels: query, ?]
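The heuristics above can be read as ranking rules over per-neighbor statistics: instead of flooding a query to every neighbor, a node forwards it only to the best-ranked few. The sketch below is one possible reading, not the authors' implementation; the `NeighborStats` fields, the `HEURISTICS` table, and the parameter `k` are hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    name: str
    results: int              # results returned for past queries (>RES)
    satisfaction_time: float  # mean time to satisfy past queries (<TIME)
    avg_hops: float           # mean hops travelled by past results (<HOPS)
    messages: int             # messages of all types received from it (>MSG)
    queue_len: int            # current message queue length (<QLEN)
    latency: float            # measured latency to the neighbor (<LAT)
    degree: int               # number of neighbors it has (>DEG)

# Heuristic name -> (statistic to rank by, True if larger is better).
HEURISTICS = {
    ">RES": (lambda n: n.results, True),
    "<TIME": (lambda n: n.satisfaction_time, False),
    "<HOPS": (lambda n: n.avg_hops, False),
    ">MSG": (lambda n: n.messages, True),
    "<QLEN": (lambda n: n.queue_len, False),
    "<LAT": (lambda n: n.latency, False),
    ">DEG": (lambda n: n.degree, True),
}

def select_neighbors(neighbors, heuristic, k=1):
    """Return the names of the k neighbors ranked best by the given heuristic."""
    key, larger_is_better = HEURISTICS[heuristic]
    ranked = sorted(neighbors, key=key, reverse=larger_is_better)
    return [n.name for n in ranked[:k]]
```

Keeping each heuristic as a (key, direction) pair makes it easy to compare them on the same statistics, which is what an evaluation of the heuristics would need.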
How Does One Evaluate? • Live Gnutella? • Use real Gnutella as “laboratory”
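Routing Index [figure: nodes A, B, C, D and a query Q(DB); each node holds a table with columns DB and AI and one row per node. Rows as extracted, in order: A 0 20; C 25 50; B 65 20; B 50 0; A 0 20; B 90 50; C 25 50; D 15 0; D 15 0; B 75 70]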
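A routing index of this kind can be sketched as a per-neighbor table of document counts by topic, used to decide where to forward a query first. The sketch below is an illustration under assumptions: the three rows are borrowed from the figure, but which node's table they belong to is a guess, and ranking purely by the count for the query topic is a simplification chosen here. The name `rank_neighbors` is hypothetical.

```python
def rank_neighbors(routing_index, topic):
    """Order neighbors by how many documents on `topic` are reachable through each."""
    return sorted(
        routing_index,
        key=lambda neighbor: routing_index[neighbor].get(topic, 0),
        reverse=True,
    )

# Assumed routing index at one node: neighbor -> document counts per topic.
routing_index = {
    "B": {"DB": 65, "AI": 20},
    "C": {"DB": 25, "AI": 50},
    "D": {"DB": 15, "AI": 0},
}
```

With this table, a query on DB would be sent to B first, then C, then D, while a query on AI would try C first.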
Types of Routing Indexes • Compound • Hop Count • Exponential Decay • Strategies for Cycles • Ignore (for Hop-Count, exponential) • Avoid Update Cycles • Detect Update Cycles and Recover
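The hop-count and exponential-decay variants can be illustrated with a small scoring function. This is a sketch under an assumption not stated on the slide: documents that are further away are discounted by an assumed fanout raised to the number of extra hops, so nearby results count for more. The function name `decayed_goodness` and the parameter `fanout` are hypothetical.

```python
def decayed_goodness(counts_by_hop, fanout):
    """Score a neighbor from its per-hop document counts.

    counts_by_hop[0] is the number of matching documents one hop away,
    counts_by_hop[1] the number two hops away, and so on. Each extra hop
    divides the contribution by `fanout` (an assumed discount factor).
    """
    return sum(count / fanout ** extra_hops
               for extra_hops, count in enumerate(counts_by_hop))
```

For example, with a fanout of 3, a neighbor with 10 documents at one hop and 30 at two hops scores `decayed_goodness([10, 30], 3)`, which is 10 + 30/3 = 20.0, the same as a neighbor with 20 documents directly attached.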
Resource Management • Resource: • storage (lockss) • CPU processing (seti@home) • bandwidth (PeerCast) • Issues: • fairness • load balancing
Example: Data Trading [diagram: site 1, site 2, site 3 holding documents A1, A2, B1, B2, C1, C2; two arrows labeled "trade"]
Example: Data Trading [diagram: site 1, site 2, site 3 holding documents A1, A2, B1, B2, C1, C2; three arrows labeled "trade"]
Data Trading • Order of trades impacts reliability • Issues: • Swaps vs. Deeds • Fixed price vs. bids • Preference to • sites with a lot of space? • reliable sites? • “desperate” sites?
Effect of Bid Policies • bid more (ask more in return) when I have less free space • bid more (ask more in return) when I have more free space
Effect of One Maverick Site • always bids high
Security & Privacy • Issues: • Anonymity • Reputation • Accountability • Information Preservation • Information Quality • Trust • Denial of service attacks
Information Preservation • Example Policy: make 3 copies of documents • What can go wrong? [diagram labels: A1, make copies]
What Can Go Wrong? • “Bad” sites make copies • “Bad” site alters copy • “Bad” site publishes fake • “Bad” site makes many copies of other docs • ... [diagram labels: A1, A1, make copies, A’1]
Conclusion • P2P systems popular today • P2P systems vulnerable and inefficient • Many challenges ahead • Search • Resource Management • Security and Privacy
For Additional Information • Google: “Stanford Peers” • http://www-db.stanford.edu/peers/