This presentation by Barbara Liskov and Rodrigo Rodrigues (MIT CSAIL, April 2005) explores the need for secure and reliable Internet services that protect stored data against attacks. It highlights replication protocols that keep a service running despite failures, distinguishes between failstop and Byzantine failures, and presents the Membership Service (MS), which reconfigures the system automatically and gives servers consistent membership information. It also outlines a system architecture that provides high availability and dynamic membership management while reducing opportunities for operator error.
Providing Secure Storage on the Internet Barbara Liskov & Rodrigo Rodrigues MIT CSAIL April 2005
Internet Services • Store critical state • Are attractive targets for attacks • Must continue to function correctly in spite of attacks and failures
Replication Protocols • Allow continued service in spite of failures • Failstop failures • Byzantine failures • Byzantine failures really happen! • Malicious attacks
Internet Services 2 • Very large scale • Amount of state • Number of users • Implies lots of servers • Must be dynamic • System membership changes over time
BFT-LS • Provide support for Internet services • Highly available and reliable • Very large scale • Changing membership • Automatic reconfiguration • Avoid operator errors • Extending replication protocols
Outline • Application structure • MS specification • MS implementation • Application methodology • Performance and analysis
System Model • Many servers, clients • Service state is partitioned among servers • Each “item” has a replica group • Example applications: file systems, databases • [Figure: clients (C) and servers (S) connected by an unreliable network]
[Figure: client (C) accesses the current replica group]
[Figure: client (C) accesses the new replica group]
[Figure: client (C) contacts the wrong replica group]
The Membership Service (MS) • Reconfigures automatically to reduce operator errors • Provides accurate membership information that nodes can agree on • Ensures clients are up-to-date • Works at large scale
System runs in Epochs • Periods of time, e.g., 6 hours • Membership is static during an epoch • During epoch e, MS computes membership for epoch e+1 • Epoch duration is a system parameter • No more than f failures in any replica group while it is useful
Server IDs • Ids chosen by MS • Consistent hashing • Very large circular id space
Membership Operations • Insert and delete node • Admission control • Trusted authority produces a certificate • Insert certificate includes • IP address, public key, random number, and epoch range • MS assigns the node id (h(ip, k, n))
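The id assignment on the Server IDs and Membership Operations slides can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes SHA-1 as the hash h, a 160-bit circular id space, and a hypothetical `InsertCertificate` type holding the certificate fields listed on the slide. Verifying the trusted authority's signature on the certificate is omitted.

```python
import hashlib
from dataclasses import dataclass

ID_BITS = 160  # assumption: SHA-1-sized circular id space


@dataclass(frozen=True)
class InsertCertificate:
    """Hypothetical container for the fields of an insert certificate."""
    ip: str
    public_key: bytes
    nonce: int        # the random number chosen by the trusted authority
    first_epoch: int  # epoch range in which the certificate is valid
    last_epoch: int

    def valid_in(self, epoch: int) -> bool:
        return self.first_epoch <= epoch <= self.last_epoch


def node_id(cert: InsertCertificate) -> int:
    """Sketch of h(ip, k, n): hash the certificate fields into the id space."""
    h = hashlib.sha1()
    for part in (cert.ip.encode(), cert.public_key, cert.nonce.to_bytes(16, "big")):
        # Length-prefix each field so distinct field values cannot encode identically.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return int.from_bytes(h.digest(), "big")  # in [0, 2**ID_BITS)
```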
Monitoring • MS monitors the servers • Sends probes (containing nonces) • Some responses must be signed • Delayed response to failures • Timing of probes, number of missed probes, are system parameters • BF nodes (code attestation)
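The probing logic might look like the sketch below, under stated assumptions: each probe carries a fresh random nonce so an old reply cannot be replayed, a server is suspected only after `MAX_MISSED` consecutive unanswered probes (a hypothetical value for the "number of missed probes" parameter), and signing of responses is left out. `ProbeMonitor` is an illustrative name, not part of the described system.

```python
import secrets
from dataclasses import dataclass, field

MAX_MISSED = 3  # hypothetical value of the "number of missed probes" parameter


@dataclass
class ProbeMonitor:
    outstanding: dict = field(default_factory=dict)  # server -> nonce of pending probe
    missed: dict = field(default_factory=dict)       # server -> consecutive misses

    def send_probe(self, server: str) -> bytes:
        # A probe still pending when the next one is due counts as missed.
        if server in self.outstanding:
            self.missed[server] = self.missed.get(server, 0) + 1
        nonce = secrets.token_bytes(16)
        self.outstanding[server] = nonce
        return nonce

    def on_response(self, server: str, nonce: bytes) -> None:
        # Only a reply echoing the current nonce counts, so stale replies are ignored.
        if self.outstanding.get(server) == nonce:
            del self.outstanding[server]
            self.missed[server] = 0

    def suspected(self) -> list:
        # Delayed reaction: suspect only after MAX_MISSED consecutive misses.
        return sorted(s for s, m in self.missed.items() if m >= MAX_MISSED)
```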
Ending Epochs • Stop epoch after fixed time • Compute the next configuration: epoch number, adds, and deletes • Sign it • MS has a well-known public key • Propagated to all nodes • Over a tree plus gossip
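The end-of-epoch step can be illustrated with a small sketch. The `ConfigDelta` type, the JSON encoding, and SHA-256 are assumptions; the point is that every MS replica must derive byte-identical configuration data before it is signed, so the encoding is canonical (sorted ids, sorted keys, fixed separators). The signature itself is not shown.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigDelta:
    """Hypothetical next-configuration record: epoch number, adds, and deletes."""
    epoch: int
    adds: frozenset
    deletes: frozenset

    def canonical_bytes(self) -> bytes:
        # Sorted ids, sorted keys, fixed separators: every replica encodes the same bytes.
        return json.dumps(
            {"epoch": self.epoch, "adds": sorted(self.adds), "deletes": sorted(self.deletes)},
            sort_keys=True,
            separators=(",", ":"),
        ).encode()

    def digest(self) -> str:
        # The value that would be signed with the MS key (signing not shown).
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


def end_epoch(epoch: int, pending_adds, pending_deletes) -> ConfigDelta:
    """Turn the adds/deletes accepted during `epoch` into the record for epoch + 1."""
    return ConfigDelta(epoch + 1, frozenset(pending_adds), frozenset(pending_deletes))


def apply_delta(members: frozenset, delta: ConfigDelta) -> frozenset:
    """A node holding the old membership applies the delta to get the new one."""
    return (members - delta.deletes) | delta.adds
```

Shipping only the adds and deletes, rather than the full membership, is also what keeps configuration distribution cheap at large scale.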
Guaranteeing Freshness • Client sends a challenge <nonce> to the MS • MS replies with <nonce, epoch #>σMS, signed by the MS • The response gives the client a time period T during which it may execute requests • T is calculated using the client's clock
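The client side of the freshness check might look like the following sketch. It assumes the MS signature on <nonce, epoch #> has already been verified, and it starts measuring T when the challenge is sent rather than when the reply arrives. That choice is an assumption, but it means network delay can only shorten the client's window, never extend it. `FreshnessLease` and its methods are hypothetical names.

```python
import secrets
import time


class FreshnessLease:
    """Hypothetical client-side tracker for the MS challenge/response."""

    def __init__(self, period_t: float, clock=time.monotonic):
        self.period_t = period_t   # T, in seconds of the client's own clock
        self.clock = clock
        self.pending = None        # (nonce, local send time) of the open challenge
        self.expires_at = None
        self.epoch = None

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.pending = (nonce, self.clock())
        return nonce

    def on_response(self, nonce: bytes, epoch: int) -> bool:
        """Accept a reply <nonce, epoch #> whose MS signature was already checked."""
        if self.pending is None or self.pending[0] != nonce:
            return False  # not an answer to our open challenge
        sent_at = self.pending[1]
        self.pending = None
        # Count T from the send time, so network delay only shortens the window.
        self.expires_at = sent_at + self.period_t
        self.epoch = epoch
        return True

    def may_execute(self) -> bool:
        return self.expires_at is not None and self.clock() < self.expires_at
```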
Implementing the MS • At a single dedicated node • Single point of failure • At a group of 3f+1 • Running BFT • No more than f failures in system lifetime • At the servers themselves • Reconfiguring the MS
System Architecture • All nodes run application • 3F+1 run the MS
Implementation Issues • Nodes run BFT • State machine replication (e.g., add, delete) • Decision making • Choosing MS membership • Signing
Decision Making • Each replica probes independently • Removing a node requires agreement • One replica proposes • 2F+1 must agree • Then can run the delete operation • Ending an epoch is similar
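The agreement rule for removing a node can be expressed as a small sketch (the `DeleteProposal` class is hypothetical). With at most f faulty MS replicas, any 2f+1 endorsements include at least f+1 from correct replicas, so the faulty replicas alone can never push a removal through.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteProposal:
    target: str              # node the proposing replica wants removed
    ms_replicas: frozenset   # the current 3f+1 MS replicas
    f: int
    endorsements: set = field(default_factory=set)

    def endorse(self, replica: str, locally_suspects: bool) -> None:
        # A replica endorses only if its own independent probing also suspects the target.
        if replica in self.ms_replicas and locally_suspects:
            self.endorsements.add(replica)  # a set, so duplicate endorsements count once

    def ready(self) -> bool:
        # 2f+1 endorsements: at least f+1 of them come from correct replicas.
        return len(self.endorsements) >= 2 * self.f + 1
```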
Moving the MS • Needed to handle MS node failures • To reduce attack opportunity • Move must be unpredictable • Secure multi-party coin toss • Next replicas are h(c,1), …, h(c,3F+1)
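Selecting the next MS replicas from the coin-toss result c can be sketched as below. The slide gives the points h(c,1), …, h(c,3F+1); this sketch assumes SHA-1 for h and maps each point to its successor node on the ring. As an added assumption, it walks clockwise past nodes already chosen so that 3F+1 distinct replicas result. The secure multi-party coin toss itself is not modeled.

```python
import bisect
import hashlib


def h(c: bytes, i: int) -> int:
    """Assumed instantiation of h(c, i): SHA-1 over the coin value and index."""
    return int.from_bytes(hashlib.sha1(c + i.to_bytes(4, "big")).digest(), "big")


def next_ms_replicas(c: bytes, node_ids, f: int) -> list:
    ring = sorted(node_ids)
    if len(ring) < 3 * f + 1:
        raise ValueError("need at least 3f+1 nodes")
    chosen = []
    for i in range(1, 3 * f + 2):  # points h(c,1) ... h(c,3f+1)
        idx = bisect.bisect_left(ring, h(c, i)) % len(ring)  # successor, wrapping
        # Assumption: if that node is already chosen, keep walking clockwise.
        while ring[idx] in chosen:
            idx = (idx + 1) % len(ring)
        chosen.append(ring[idx])
    return chosen
```

Because c is unknown until the coin toss completes, an attacker cannot tell in advance which nodes will host the next MS.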
Signing • Configuration must be signed • There is a well-known public key • Proactive secret sharing • MS replicas have shares of private key • F+1 shares needed to sign • Keys are re-shared when MS moves
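To make the F+1 threshold concrete, the following sketch uses plain Shamir secret sharing over a prime field, plus a simple share refresh that adds shares of a random polynomial whose constant term is zero. This is a simplification, not the APSS protocol the talk relies on. Here a single dealer creates and refreshes the shares, the holders stay the same, and the secret is reconstructed explicitly. In a threshold-signature setup, the replicas would instead sign with their shares without ever reassembling the private key.

```python
import secrets

P = 2 ** 127 - 1  # a Mersenne prime, used as the field modulus (assumption)


def _eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, mod P
        acc = (acc * x + c) % P
    return acc


def split(secret: int, f: int) -> dict:
    """Share `secret` among 3f+1 holders; any f+1 shares reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(f)]  # degree-f polynomial
    return {x: _eval(coeffs, x) for x in range(1, 3 * f + 2)}


def reconstruct(shares: dict) -> int:
    """Lagrange interpolation at x = 0 over GF(P)."""
    total = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def refresh(shares: dict, f: int) -> dict:
    """New shares of the same secret: add a random degree-f polynomial with q(0) = 0."""
    zero_coeffs = [0] + [secrets.randbelow(P) for _ in range(f)]
    return {x: (y + _eval(zero_coeffs, x)) % P for x, y in shares.items()}
```

After a refresh, old shares are no longer consistent with new ones, which is why discarding the old shares limits what an attacker gains by compromising replicas over time.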
Changing Epochs: Summary of Steps • Run the endEpoch operation on state machine • Select new MS replicas • Share refreshment • Sign new configuration • Discard old shares
Example Service • Any replicated service • Dynamic Byzantine Quorums (dBQS) • Read/Write interface to objects • Two kinds of objects • Mutable public-key objects • Immutable content-hash objects
dBQS Object Placement • Consistent hashing • 3f+1 successors of object id are responsible for the object
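Replica-group lookup can be sketched as a successor query on a sorted ring. The sketch assumes integer ids and takes the successor of an id to be the first node id greater than or equal to it, wrapping around the circle. `responsible_replicas` is a hypothetical helper name.

```python
import bisect


def responsible_replicas(object_id: int, node_ids, f: int) -> list:
    """The 3f+1 successors of object_id on the circular id space."""
    ring = sorted(node_ids)
    n = 3 * f + 1
    if len(ring) < n:
        raise ValueError("need at least 3f+1 nodes")
    start = bisect.bisect_left(ring, object_id)  # first node id >= object_id
    return [ring[(start + k) % len(ring)] for k in range(n)]
```

For example, with nodes at 10, 20, …, 60 and f = 1, object 35 is stored at 40, 50, 60, and 10.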
Byzantine Quorum Operations • Public-key objects contain • State, signature, version number • Quorum is 2f+1 replicas • Write: • Phase 1: client reads to learn the highest v# • Phase 2: client writes with a higher v# • Read: • Phase 1: client gets the value with the highest v# • Phase 2: write-back if some replicas have a smaller v#
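The two-phase operations can be simulated in a few lines. This sketch is illustrative only: replicas are in-memory objects, signatures are omitted (in the protocol, the signature is what stops a faulty replica from inventing a higher version), and the caller names which 2f+1 replicas respond, so the overlap between quorums is visible. For simplicity, phase 2 reuses the phase-1 quorum.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: bytes = b""

    def write(self, version: int, value: bytes) -> None:
        # A replica only ever moves forward to a newer version.
        if version > self.version:
            self.version, self.value = version, value


def _quorum(replicas, responders, f):
    ids = set(responders)
    if len(ids) < 2 * f + 1:
        raise ValueError("a quorum needs 2f+1 distinct replicas")
    return [replicas[i] for i in sorted(ids)]


def bq_write(replicas, responders, f, value):
    q = _quorum(replicas, responders, f)
    highest = max(r.version for r in q)  # phase 1: learn the highest v#
    for r in q:                          # phase 2: write with a higher v#
        r.write(highest + 1, value)
    return highest + 1


def bq_read(replicas, responders, f):
    q = _quorum(replicas, responders, f)
    newest = max(q, key=lambda r: r.version)  # phase 1: value with highest v#
    version, value = newest.version, newest.value
    for r in q:                               # phase 2: write-back to stale replicas
        if r.version < version:
            r.write(version, value)
    return value
```

Any two quorums of 2f+1 out of 3f+1 replicas share at least f+1 replicas, so at least one correct replica in a read quorum has seen the latest completed write.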
dBQS Algorithms – Dynamic Case • Tag all messages with epoch numbers • Servers reject requests for wrong epoch • Clients execute phases entirely in an epoch • Must be holding a valid challenge response • Servers upgrade to new configuration • If needed, perform state transfer from old group • A methodology
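A minimal sketch of the epoch rules, using hypothetical `Server`, `WrongEpoch`, and `run_operation` names: a server rejects any request whose epoch tag differs from its own, and the client restarts the whole operation from phase 1 in the newer epoch, so no operation straddles two epochs. Obtaining a fresh challenge response, server upgrades, and state transfer are not modeled.

```python
class WrongEpoch(Exception):
    def __init__(self, server_epoch: int):
        super().__init__(f"server is in epoch {server_epoch}")
        self.server_epoch = server_epoch


class Server:
    def __init__(self, epoch: int):
        self.epoch = epoch
        self.state = {}

    def handle(self, msg_epoch: int, key, value=None):
        # Servers reject any request tagged with an epoch other than their own.
        if msg_epoch != self.epoch:
            raise WrongEpoch(self.epoch)
        if value is not None:
            self.state[key] = value
        return self.state.get(key)


def run_operation(servers, client_epoch, phases, max_attempts=3):
    """phases: callables (server, epoch) -> result. All phases run in one epoch."""
    epoch = client_epoch
    for _ in range(max_attempts):
        try:
            results = []
            for phase in phases:
                results = [phase(s, epoch) for s in servers]
            return epoch, results
        except WrongEpoch as err:
            # Assumption: the client fetches the newer configuration (and a fresh
            # challenge response) before retrying; a server that is behind will upgrade.
            epoch = max(epoch, err.server_epoch)
    raise RuntimeError("operation did not complete within a single epoch")
```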
Evaluation • Implemented MS, two example services • Ran set of experiments on PlanetLab, RON, local area
MS Scalability • Probes – use sub-committees • Leases – use aggregation • Configuration distribution • Use diffs and distribution trees
Time to reconfigure • Time to reconfigure is small • Variability stems from PlanetLab nodes • Only F = 1 was used, due to a limitation of the APSS protocol
Failure-free Computation • Depends on no more than F failures while group is useful • How likely is this?
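One way to approach "How likely is this?" is a binomial estimate. Assume each node of a 3f+1 group fails independently with probability p during the window in which the group is useful (an assumption, since real failures can be correlated). The group then exceeds its failure budget with probability P = sum over k from f+1 to 3f+1 of C(3f+1, k) p^k (1 - p)^(3f+1-k). The sketch computes that, plus a rough system-wide figure for G groups that treats the groups as independent, which overlapping replica groups are not.

```python
from math import comb


def p_group_exceeds_f(f: int, p: float) -> float:
    """P(more than f of the 3f+1 group members fail), failures independent with prob p."""
    n = 3 * f + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(f + 1, n + 1))


def p_any_group_exceeds(f: int, p: float, groups: int) -> float:
    """Rough system-wide figure that (unrealistically) treats groups as independent."""
    return 1 - (1 - p_group_exceeds_f(f, p)) ** groups
```

For small p the estimate is dominated by the k = f+1 term, roughly C(3f+1, f+1) p^(f+1), so shortening the useful window (smaller p) or raising f reduces it quickly.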
Conclusion • Providing support for Internet services • Scalable membership service • Reconfiguring the MS • Dynamic replication algorithms • dBQS – a methodology • Future research • Proactive secret sharing • Scalable applications