Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea
What is PAST? • Archival storage and content distribution utility • Not a general-purpose file system • Stores multiple replicas of files • Caches additional copies of popular files in the local file system
How it works • Built over a self-organizing, Internet-based overlay network • Based on the Pastry routing scheme • Offers persistent storage services for replicated read-only files • Owners can insert/reclaim files • Clients just perform lookups
PAST Nodes • The collection of PAST nodes forms an overlay network • Minimally, a PAST node is an access point • Optionally, it contributes to storage and participates in routing
PAST operations • fileId = Insert(name, owner-credentials, k, file); • file = Lookup(fileId); • Reclaim(fileId, owner-credentials);
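The three operations above map naturally onto a small client interface. Below is a minimal sketch in Python; the class and method names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the PAST client API; names and types are
# illustrative, not the actual implementation from the paper.
from dataclasses import dataclass

@dataclass
class OwnerCredentials:
    public_key: bytes   # used in fileId computation and certificate checks
    private_key: bytes  # held on the owner's smartcard in PAST

class PastClient:
    def insert(self, name: str, creds: OwnerCredentials, k: int,
               data: bytes) -> bytes:
        """Store k replicas of data; returns the fileId."""
        raise NotImplementedError

    def lookup(self, file_id: bytes) -> bytes:
        """Retrieve one replica, ideally from a nearby node."""
        raise NotImplementedError

    def reclaim(self, file_id: bytes, creds: OwnerCredentials) -> None:
        """Reclaim the file's storage (weak consistency)."""
        raise NotImplementedError
```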
Insertion • fileId computed as the secure hash of the file's name, the owner's public key, and a random salt • Stores the file on the k nodes whose nodeIds are numerically closest to the 128 most significant bits (msb) of the fileId • Remember from Pastry: each node has a 128-bit nodeId (circular namespace)
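A minimal sketch of the fileId computation, assuming SHA-1 as the 160-bit secure hash; the helper name and salt length are assumptions.

```python
import hashlib
import os

def compute_file_id(name: str, owner_public_key: bytes,
                    salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Hash the file name, owner's public key, and a random salt into a
    160-bit fileId; the salt is returned so it can accompany the file."""
    if salt is None:
        salt = os.urandom(8)  # salt length is an assumption
    h = hashlib.sha1()
    h.update(name.encode())
    h.update(owner_public_key)
    h.update(salt)
    return h.digest(), salt

# The replicas go to the k nodes whose 128-bit nodeIds are numerically
# closest to the 128 msb of the fileId:
# routing_key = file_id[:16]
```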
Insert contd • The required storage is debited against the owner's storage quota • A file certificate is returned • Signed with the owner's private key • Contains: fileId, hash of content, replication factor, and other fields • The file & certificate are routed via Pastry • Each of the k replica-storing nodes attaches a store receipt • An ack is sent back after all k nodes have accepted the file
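The shapes of the certificate and receipt might look roughly like the sketch below; fields beyond those named on the slide are assumptions.

```python
# Illustrative data shapes; only fileId, content hash, and replication
# factor are named in the slide, the remaining fields are assumptions.
from dataclasses import dataclass

@dataclass
class FileCertificate:
    file_id: bytes       # 160-bit fileId
    content_hash: bytes  # secure hash of the file content
    k: int               # replication factor
    salt: bytes          # assumed field: salt used to derive the fileId
    signature: bytes     # produced with the owner's private key

@dataclass
class StoreReceipt:
    file_id: bytes
    node_id: bytes       # the replica-storing node
    signature: bytes     # signed by the storing node
```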
Lookup & Reclaim • Lookup: Pastry locates a "near" node that has a copy and retrieves it • Reclaim: weak consistency • After a reclaim, a lookup is no longer guaranteed to retrieve the file • But it does not guarantee that the file is no longer available
Security • Each PAST node and each user of the system holds a smartcard • A private/public key pair is associated with each card • Smartcards generate and verify certificates and maintain storage quotas
More on Security • Smartcards ensure the integrity of nodeId and fileId assignments • Store receipts prevent malicious nodes from creating fewer than k copies • File certificates allow storage nodes and clients to verify the integrity and authenticity of stored content, and to enforce storage quotas
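As a sketch of the integrity/authenticity check a storage node could run before accepting a replica (the signature check stands in for the smartcard; all names are assumptions):

```python
import hashlib

def verify_before_store(cert, data: bytes, verify_signature) -> bool:
    """Reject a replica unless the certificate's signature verifies
    (authenticity) and the content matches the certified hash (integrity).
    verify_signature is a stand-in for the smartcard's check."""
    if not verify_signature(cert):
        return False
    return hashlib.sha1(data).digest() == cert.content_hash
```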
Storage Management • Based on local coordination among nodes with nearby nodeIds • Responsibilities: • Balance the free storage among nodes • Maintain the invariant that the replicas of each file are stored on the k nodes closest to its fileId
Causes for storage imbalance & solutions • The number of files assigned to each node may vary • The size of the inserted files may vary • The storage capacity of PAST nodes differs • Solutions • Replica diversion • File diversion
Replica diversion • Recall: each node maintains a leaf set • The l nodes with nodeIds numerically closest to the given node • If a node A cannot accommodate a copy locally, it considers replica diversion • A chooses a node B in its leaf set and asks it to store the replica • Then, A enters a pointer to B's copy in its table and issues a store receipt
Policies for accepting a replica • If (file size / remaining free storage) > t • Reject • t is a fixed threshold • t has different values for primary replicas (nodes among the k numerically closest) and diverted replicas (nodes in the same leaf set, but not among the k closest) • t(primary) > t(diverted)
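A minimal sketch of this acceptance test; the default threshold values are placeholders, the slide only requires t(primary) > t(diverted):

```python
def accepts_replica(file_size: int, free_space: int, is_primary: bool,
                    t_pri: float = 0.1, t_div: float = 0.05) -> bool:
    """Reject a replica if it would consume too large a fraction of the
    node's remaining free storage; diverted replicas face the stricter
    threshold. The default values here are placeholders."""
    if free_space <= 0:
        return False
    t = t_pri if is_primary else t_div
    return file_size / free_space <= t
```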
File diversion • When one of the k nodes declines to store a replica, replica diversion is tried first • If the node chosen for the diverted replica also declines, the entire file is diverted • A negative ack is sent; the client generates a new fileId and starts again • After three rejections, the user is notified
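Client-side, the retry loop might look like the following sketch; try_insert is a hypothetical call, and compute_file_id is the helper sketched earlier:

```python
MAX_ATTEMPTS = 3  # after three rejected fileIds, give up and notify the user

def insert_with_diversion(client, name: str, creds, k: int, data: bytes):
    """Each attempt draws a fresh salt, which yields a new fileId and
    therefore a different set of k candidate storage nodes."""
    for _ in range(MAX_ATTEMPTS):
        file_id, salt = compute_file_id(name, creds.public_key)
        if client.try_insert(file_id, k, data):  # hypothetical call
            return file_id
    raise RuntimeError("insert failed after three fileIds; notify the user")
```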
Maintaining replicas • Pastry uses keep-alive messages and adjusts the leaf set after failures • The same adjustment takes place at join • What happens to the copies stored by a failed node? • What about the copies stored by a node that leaves or enters a new leaf set?
Maintaining replicas contd • To maintain the invariant (k copies), the replicas have to be re-created in the previous cases • Big overhead • Proposed solution for join: lazy re-creation • First insert a pointer to the node that holds them, then migrate them gradually
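A sketch of the lazy re-creation idea on join; the method names are illustrative:

```python
def on_join(new_node, files_now_owned):
    """When a joining node becomes one of the k closest for some files,
    it first installs cheap pointers to the current replica holders,
    then migrates the actual data in the background."""
    for file_id, holder in files_now_owned:
        new_node.store_pointer(file_id, holder)       # immediate, cheap
        new_node.schedule_migration(file_id, holder)  # gradual copy
```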
Caching • The k replicas are maintained in PAST for availability • The fetch distance is measured in overlay network hops (which does not necessarily reflect real network distance) • Caching is used to improve performance
Caching contd • PAST nodes use the "unused" portion of their advertised disk space to cache files • When storing a new primary or diverted replica, a node evicts one or more cached copies • How it works: a file that is routed through a node by Pastry (insert or lookup) is inserted into the local cache if its size is less than c • c is a fraction of the current cache size
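A sketch of this cache-insertion test; the node attributes and the value of c are assumptions, and the eviction policy is abstracted behind evict_one:

```python
def maybe_cache(node, file_id: bytes, data: bytes, c_fraction: float = 0.05):
    """Cache a file routed through this node (on insert or lookup) only
    if its size is below a fraction c of the node's current cache size;
    evict cached copies as needed to make room. All attribute names and
    the value of c_fraction are assumptions."""
    if len(data) >= c_fraction * node.current_cache_size:
        return  # too large relative to the cache; skip
    while node.cache_used + len(data) > node.current_cache_size:
        node.evict_one()  # eviction policy abstracted away here
    node.cache[file_id] = data
    node.cache_used += len(data)
```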
Conclusions • Along with Tapestry, Chord (CFS), and CAN, PAST represents a peer-to-peer routing and location scheme for storage • The ideas are almost the same in all of them • Questions raised at SOSP about them: • Is there any real application for them? • Who will trust these infrastructures to store their files?