Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea
What is PAST? • Archival storage and content distribution utility • Not a general-purpose file system • Stores multiple replicas of files • Caches additional copies of popular files in the local file system
How it works • Built over a self-organizing, Internet-based overlay network • Based on the Pastry routing scheme • Offers persistent storage services for replicated read-only files • Owners can insert/reclaim files • Clients just perform lookups
PAST Nodes • The collection of PAST nodes forms an overlay network • Minimally, a PAST node is an access point • Optionally, it contributes to storage and participates in routing
PAST operations • fileId = Insert(name, owner-credentials, k, file); • file = Lookup(fileId); • Reclaim(fileId, owner-credentials);
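The three operations above map naturally onto a small client interface. Below is a minimal sketch in Python; the class and method names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the PAST client API; names and types are
# illustrative, not the actual implementation from the paper.
from dataclasses import dataclass

@dataclass
class OwnerCredentials:
    public_key: bytes   # used in fileId computation and certificate checks
    private_key: bytes  # held on the owner's smartcard in PAST

class PastClient:
    def insert(self, name: str, creds: OwnerCredentials, k: int,
               data: bytes) -> bytes:
        """Store k replicas of data; returns the fileId."""
        raise NotImplementedError

    def lookup(self, file_id: bytes) -> bytes:
        """Retrieve one replica, ideally from a nearby node."""
        raise NotImplementedError

    def reclaim(self, file_id: bytes, creds: OwnerCredentials) -> None:
        """Reclaim the file's storage (weak consistency)."""
        raise NotImplementedError
```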
Insertion • fileId computed as the secure hash of the file's name, the owner's public key, and a random salt • Stores the file on the k nodes whose nodeIds are numerically closest to the 128 most significant bits (msb) of the fileId • Remember from Pastry: each node has a 128-bit nodeId (circular namespace)
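A minimal sketch of the fileId computation, assuming SHA-1 as the 160-bit secure hash; the helper name and salt length are assumptions.

```python
import hashlib
import os

def compute_file_id(name: str, owner_public_key: bytes,
                    salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Hash the file name, owner's public key, and a random salt into a
    160-bit fileId; the salt is returned so it can accompany the file."""
    if salt is None:
        salt = os.urandom(8)  # salt length is an assumption
    h = hashlib.sha1()
    h.update(name.encode())
    h.update(owner_public_key)
    h.update(salt)
    return h.digest(), salt

# The replicas go to the k nodes whose 128-bit nodeIds are numerically
# closest to the 128 msb of the fileId:
# routing_key = file_id[:16]
```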
Insert contd • The required storage is debited against the owner's storage quota • A file certificate is returned • Signed with the owner's private key • Contains: fileId, hash of content, replication factor, and other fields • The file & certificate are routed via Pastry • Each of the k replica-storing nodes attaches a store receipt • An ack is sent back after all k nodes have accepted the file
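The shapes of the certificate and receipt might look roughly like the sketch below; fields beyond those named on the slide are assumptions.

```python
# Illustrative data shapes; only fileId, content hash, and replication
# factor are named in the slide, the remaining fields are assumptions.
from dataclasses import dataclass

@dataclass
class FileCertificate:
    file_id: bytes       # 160-bit fileId
    content_hash: bytes  # secure hash of the file content
    k: int               # replication factor
    salt: bytes          # assumed field: salt used to derive the fileId
    signature: bytes     # produced with the owner's private key

@dataclass
class StoreReceipt:
    file_id: bytes
    node_id: bytes       # the replica-storing node
    signature: bytes     # signed by the storing node
```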
Lookup & Reclaim • Lookup: Pastry locates a "near" node that has a copy and retrieves it • Reclaim: weak consistency • After a reclaim, a lookup is no longer guaranteed to retrieve the file • But it does not guarantee that the file is no longer available
Security • Each PAST node and each user of the system holds a smartcard • A private/public key pair is associated with each card • Smartcards generate and verify certificates and maintain storage quotas
More on Security • Smartcards ensure the integrity of nodeId and fileId assignments • Store receipts prevent malicious nodes from creating fewer than k copies • File certificates allow storage nodes and clients to verify the integrity and authenticity of stored content, and to enforce storage quotas
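As a sketch of the integrity/authenticity check a storage node could run before accepting a replica (the signature check stands in for the smartcard; all names are assumptions):

```python
import hashlib

def verify_before_store(cert, data: bytes, verify_signature) -> bool:
    """Reject a replica unless the certificate's signature verifies
    (authenticity) and the content matches the certified hash (integrity).
    verify_signature is a stand-in for the smartcard's check."""
    if not verify_signature(cert):
        return False
    return hashlib.sha1(data).digest() == cert.content_hash
```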
Storage Management • Based on local coordination among nodes with nearby nodeIds • Responsibilities: • Balance the free storage among nodes • Maintain the invariant that the replicas of each file are stored on the k nodes closest to its fileId
Causes for storage imbalance & solutions • The number of files assigned to each node may vary • The size of the inserted files may vary • The storage capacity of PAST nodes differs • Solutions • Replica diversion • File diversion
Replica diversion • Recall: each node maintains a leaf set • The l nodes with nodeIds numerically closest to the given node • If a node A cannot accommodate a copy locally, it considers replica diversion • A chooses a node B in its leaf set and asks it to store the replica • Then, A enters a pointer to B's copy in its table and issues a store receipt
Policies for accepting a replica • If (file size / remaining free storage) > t • Reject • t is a fixed threshold • t has different values for primary replicas (nodes among the k numerically closest) and diverted replicas (nodes in the same leaf set, but not among the k closest) • t(primary) > t(diverted)
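A minimal sketch of this acceptance test; the default threshold values are placeholders, the slide only requires t(primary) > t(diverted):

```python
def accepts_replica(file_size: int, free_space: int, is_primary: bool,
                    t_pri: float = 0.1, t_div: float = 0.05) -> bool:
    """Reject a replica if it would consume too large a fraction of the
    node's remaining free storage; diverted replicas face the stricter
    threshold. The default values here are placeholders."""
    if free_space <= 0:
        return False
    t = t_pri if is_primary else t_div
    return file_size / free_space <= t
```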
File diversion • When one of the k nodes declines to store a replica, replica diversion is tried first • If the node chosen for the diverted replica also declines, the entire file is diverted • A negative ack is sent; the client generates a new fileId and starts again • After three rejections, the user is notified
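Client-side, the retry loop might look like the following sketch; try_insert is a hypothetical call, and compute_file_id is the helper sketched earlier:

```python
MAX_ATTEMPTS = 3  # after three rejected fileIds, give up and notify the user

def insert_with_diversion(client, name: str, creds, k: int, data: bytes):
    """Each attempt draws a fresh salt, which yields a new fileId and
    therefore a different set of k candidate storage nodes."""
    for _ in range(MAX_ATTEMPTS):
        file_id, salt = compute_file_id(name, creds.public_key)
        if client.try_insert(file_id, k, data):  # hypothetical call
            return file_id
    raise RuntimeError("insert failed after three fileIds; notify the user")
```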
Maintaining replicas • Pastry uses keep-alive messages and adjusts the leaf set after failures • The same adjustment takes place at join • What happens to the copies stored by a failed node? • What about the copies stored by a node that leaves or enters a new leaf set?
Maintaining replicas contd • To maintain the invariant (k copies), the replicas have to be re-created in the previous cases • Big overhead • Proposed solution for join: lazy re-creation • First insert a pointer to the node that holds them, then migrate them gradually
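A sketch of the lazy re-creation idea on join; the method names are illustrative:

```python
def on_join(new_node, files_now_owned):
    """When a joining node becomes one of the k closest for some files,
    it first installs cheap pointers to the current replica holders,
    then migrates the actual data in the background."""
    for file_id, holder in files_now_owned:
        new_node.store_pointer(file_id, holder)       # immediate, cheap
        new_node.schedule_migration(file_id, holder)  # gradual copy
```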
Caching • The k replicas are maintained in PAST for availability • The fetch distance is measured in overlay network hops (which does not necessarily reflect real network distance) • Caching is used to improve performance
Caching contd • PAST nodes use the "unused" portion of their advertised disk space to cache files • When storing a new primary or diverted replica, a node evicts one or more cached copies • How it works: a file that is routed through a node by Pastry (insert or lookup) is inserted into the local cache if its size is less than c • c is a fraction of the current cache size
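A sketch of this cache-insertion test; the node attributes and the value of c are assumptions, and the eviction policy is abstracted behind evict_one:

```python
def maybe_cache(node, file_id: bytes, data: bytes, c_fraction: float = 0.05):
    """Cache a file routed through this node (on insert or lookup) only
    if its size is below a fraction c of the node's current cache size;
    evict cached copies as needed to make room. All attribute names and
    the value of c_fraction are assumptions."""
    if len(data) >= c_fraction * node.current_cache_size:
        return  # too large relative to the cache; skip
    while node.cache_used + len(data) > node.current_cache_size:
        node.evict_one()  # eviction policy abstracted away here
    node.cache[file_id] = data
    node.cache_used += len(data)
```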
Conclusions • Along with Tapestry, Chord (CFS), and CAN, PAST represents a peer-to-peer routing and location scheme for storage • The ideas are almost the same in all of them • Questions raised at SOSP about them: • Is there any real application for them? • Who will trust these infrastructures to store their files?