This presentation covers a solution for keeping replicated data current in Distributed Hash Tables (DHTs), so that peers in a P2P system retrieve fresh copies of files. It introduces a Key-based Timestamp Service (KTS) that issues a timestamp for each insert or update, so that only the latest version is retrieved and only the latest update succeeds. The study evaluates performance through experiments on a 64-node cluster, analyzing response times and communication costs in dynamic peer environments. The approach addresses concurrent updates and peers that leave and rejoin.
Data Currency in Replicated DHTs Reza Akbarinia, Esther Pacitti and Patrick Valduriez University of Nantes and INRIA, France ACM SIGMOD 2007 Presenter: Jerry Wu
Motivation • P2P data sharing systems • Enable a large number of users to share a massive number of files • [Figure: query, reply, send request, download] • Message forwarding on these systems • Flooding: KaZaA, Gnutella • DHT: CAN, Chord, Pastry, etc.
Distributed Hash Table (DHT) • Use hash functions to locate files • h(meta data) = k (for identification) • g(k) = k1 (for routing) • [Figure: user U and peers A to F; the metadata of FreeLoop.mp3 maps to g(k) = k1, held by peer A]
Data Replication • What if node A fails? • Store several copies on different peers • [Figure: user U and peers A to F; FreeLoop.mp3 is replicated under three hash functions: g(h(FreeLoop.mp3)) = k1 (peer A), g2(h(FreeLoop.mp3)) = k2 (peer D), g3(h(FreeLoop.mp3)) = k3 (peer E)]
Basic Operations • putH(meta key k, File D) • Insert a file into the DHT • getH(meta key k) • Retrieve the file from the DHT • H : { g | g is used as a hash function } • |H| : The replication level of the system • Each file will be stored at |H| peers
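The put/get operations with |H| replication hash functions can be sketched as follows. This is a toy in-memory model, not the authors' implementation: the class name ToyDHT, the salted SHA-1 stand-in for the functions g, g2, g3, and the method names are all assumptions made for illustration.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT; each peer is a dict from key to data."""

    def __init__(self, num_peers, replication_level):
        self.peers = [dict() for _ in range(num_peers)]
        self.replication_level = replication_level  # plays the role of |H|

    def _peer_for(self, i, k):
        # Hypothetical i-th replication hash function: SHA-1 salted with i.
        # In this toy, two functions may map a key to the same peer.
        digest = hashlib.sha1(f"{i}:{k}".encode()).hexdigest()
        return int(digest, 16) % len(self.peers)

    def put(self, i, k, data):
        """Store data under key k at the peer chosen by hash function i."""
        self.peers[self._peer_for(i, k)][k] = data

    def get(self, i, k):
        """Return the copy held by the peer chosen by hash function i, or None."""
        return self.peers[self._peer_for(i, k)].get(k)

    def put_all(self, k, data):
        """Write one replica per replication hash function."""
        for i in range(self.replication_level):
            self.put(i, k, data)


dht = ToyDHT(num_peers=16, replication_level=3)
dht.put_all("FreeLoop.mp3", b"file bytes")
copies = [dht.get(i, "FreeLoop.mp3") for i in range(3)]
```

Any single replica can answer a get, which is what keeps the file available if one holder fails.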
Additional Problems • If the owner can modify the data … • The nature of P2P systems • Peers can join and leave dynamically • What if an update happens while some peers have departed and rejoin later? • Concurrent updates?
Solution • What if every insert/update transaction carries a timestamp? • The currency of the file is judged by its timestamp • FileX = File + timestamp • Put (k, FileX) instead of (k, File) into the DHT • Then we know the freshness of the file • Only the latest update can succeed
How Can We Get A Timestamp? • KTS (Key-based Timestamp Service) • Issues a timestamp for each transaction • gen_ts(key k) • Generate a new timestamp w.r.t. key k • last_ts(key k) • Return the last timestamp issued for key k
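The two KTS operations can be sketched as a per-key counter. This is a minimal single-process sketch: the class name ToyKTS and the use of integer counters starting at 1 are assumptions, and the distribution of counters across peers is left out.

```python
class ToyKTS:
    """Per-key monotonic counters standing in for the timestamp service."""

    def __init__(self):
        self._counters = {}  # key -> last timestamp issued for that key

    def gen_ts(self, k):
        """Issue a new timestamp for key k, larger than any issued before."""
        self._counters[k] = self._counters.get(k, 0) + 1
        return self._counters[k]

    def last_ts(self, k):
        """Return the last timestamp issued for key k (0 if none yet)."""
        return self._counters.get(k, 0)


kts = ToyKTS()
t1 = kts.gen_ts("P.avi")
t2 = kts.gen_ts("P.avi")
```

Timestamps for different keys are independent, which is why the service is called key-based.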
The New DHT Functions • Based on the KTS service • Insert(key k, FileX D, Hash function set Hr) • Insert or update a file with identity key k into the DHT • Retrieve(k, Hr) • Retrieve the latest copy of the file with identity key k
Insert A File • User U inserts P.avi: h(P.avi) = k • KTS Timestamp Service: gen_ts(k) = tA • putg(k, (tA, P.avi)) to g(k) = k1 (peer A) • putg2(k, (tA, P.avi)) to g2(k) = k2 (peer C)
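Retrieve A File • User U gets P.avi: h(P.avi) = k • KTS Timestamp Service: last_ts(k) = tA • getg(k) at g(k) = k1 (peer A) and getg2(k) at g2(k) = k2 (peer C) • The replicas return (t0, P.avi) and (tA, P.avi); the copy whose timestamp equals last_ts(k) is the current one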
Update A File • tsx : timestamp of the incoming copy, ts0 : timestamp of the stored copy • If (tsx > ts0) then • Update File D: putg(k, (tsx, File D))
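The insert, retrieve, and update rules can be sketched together as follows. This is a toy model under stated assumptions: each replica holder is a plain dict, the timestamp service is a dict of counters, unreachable holders are simply left out of the list passed in, and the fallback to the newest copy found when no current copy is reachable is an assumption rather than something the slides state.

```python
def insert(replicas, counters, k, data):
    """Stamp data with a fresh timestamp and write it to each reachable replica."""
    counters[k] = counters.get(k, 0) + 1  # plays the role of gen_ts(k)
    ts = counters[k]
    for store in replicas:
        old = store.get(k)
        # Update rule: overwrite only if the new timestamp is larger.
        if old is None or ts > old[0]:
            store[k] = (ts, data)
    return ts


def retrieve(replicas, counters, k):
    """Return the current copy if a reachable replica holds it."""
    latest = counters.get(k, 0)  # plays the role of last_ts(k)
    best = None
    for store in replicas:
        pair = store.get(k)
        if pair is None:
            continue
        if pair[0] == latest:
            return pair[1]  # current copy found, stop probing
        if best is None or pair[0] > best[0]:
            best = pair
    # Assumed fallback: the newest stale copy, or None if nothing was found.
    return best[1] if best else None


holders = [{}, {}, {}]
counters = {}
insert(holders, counters, "k", "v1")      # all three holders reachable
insert(holders[:2], counters, "k", "v2")  # third holder has departed
current = retrieve([holders[2], holders[0]], counters, "k")
```

The third holder still stores the stale pair (1, "v1") after it rejoins, but the retrieval skips it because its timestamp does not match the last one issued.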
Retrieval Cost Analysis • C = Ckts + N * Cret • Ckts = Cret = O(log n), n = # of peers • N : Number of retries to get the latest copy • Let X be the random variable of N • pt : The probability of finding a fresh copy • Prob(X = i) = pt * (1 - pt)^(i-1) • |Hr| = number of replicas of the system
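The distribution above is geometric, so the expected number of retrievals can be computed directly. The sketch below assumes the sum is truncated at |Hr| probes and counts only outcomes where a fresh copy is found; the slide gives the distribution but not this sum, and the function name is hypothetical.

```python
def expected_retrievals(p_t, num_replicas):
    """Sum of i * Prob(X = i) for i = 1 .. num_replicas (truncated geometric)."""
    return sum(
        i * p_t * (1 - p_t) ** (i - 1)
        for i in range(1, num_replicas + 1)
    )


p_t = 0.3
truncated = expected_retrievals(p_t, 13)
upper_bound = 1 / p_t  # mean of the untruncated geometric distribution
```

Because the untruncated geometric mean is 1/pt, the expected number of retrievals stays below 1/pt regardless of |Hr|, so for a fixed pt the total cost C remains O(log n).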
Retrieval Cost Analysis • Then, how can we get a timestamp? • Key-based Timestamp Service (KTS)
The KTS Service • Use the same DHT but with a different hash function hts • [Figure: a timestamp request for key k is routed through the DHT with hts, in four numbered steps, to the responsible peer p: Req(k, hts) = p]
The KTS Service • How can node p generate timestamps w.r.t. key k? • Direct initialization • Receive the counters from a leaving peer • The DHT distributes the load of the leaving peer to its neighbors • Indirect initialization • Takes place if the leaving peer fails • Send a file request w.r.t. key k to obtain the latest timestamp
The KTS Service • Indirect initialization • The probability of failure pf • pf = (1 - pt)^|H| • If pt = 30%, |H| = 13, then pf < 1% • After initialization, increment the timestamp on every timestamp request
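The failure probability of indirect initialization is easy to check numerically. The sketch below evaluates pf = (1 - pt)^|H| and, as an added convenience, solves for the smallest |H| that meets a target; the function names are hypothetical.

```python
import math


def indirect_init_failure(p_t, num_replicas):
    """Probability that none of the num_replicas copies is fresh."""
    return (1 - p_t) ** num_replicas


def min_replicas(p_t, target):
    """Smallest replication level whose failure probability is at most target."""
    if not (0 < p_t < 1 and 0 < target < 1):
        raise ValueError("p_t and target must lie strictly between 0 and 1")
    return math.ceil(math.log(target) / math.log(1 - p_t))


p_f = indirect_init_failure(0.3, 13)
needed = min_replicas(0.3, 0.01)
```

With pt = 30%, twelve replicas still leave pf above 1%, and the thirteenth brings it just below, which matches the slide's example.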
Experiments And Simulations • Environments • 64-node cluster • 10,000 nodes on the SimJava platform • Metrics • Response time : Time to return a current replica in response to a query • Communication cost : # of messages sent to answer a query
The Competitor - BRICKS • Uses a function to map key k to multiple keys (k1, k2, k3, k4, …) • Each replica has a version number • Concurrent update problems • Must retrieve all replicas to find the newest one
Conclusion • Pros • Using the DHT itself to provide the timestamp service is smart! • Handles the concurrent update problem • Easy to apply to existing DHTs • Cons • The KTS service can incur additional communication overhead