Providing Secure Storage on the Internet: Strategies for Reliability and Availability

Providing Secure Storage on the Internet Barbara Liskov & Rodrigo Rodrigues MIT CSAIL April 2005

Internet Services • Store critical state • Are attractive targets for attacks • Must continue to function correctly in spite of attacks and failures

Replication Protocols • Allow continued service in spite of failures • Failstop failures • Byzantine failures • Byzantine failures really happen! • Malicious attacks

Internet Services 2 • Very large scale • Amount of state • Number of users • Implies lots of servers • Must be dynamic • System membership changes over time

BFT-LS • Provide support for Internet services • Highly available and reliable • Very large scale • Changing membership • Automatic reconfiguration • Avoid operator errors • Extending replication protocols

Outline • Application structure • MS specification • MS implementation • Application methodology • Performance and analysis

System Model • Many servers, clients • Service state is partitioned among servers • Each “item” has a replica group • Example applications: file systems, databases [Figure: clients (C) and servers (S) connected by an unreliable network]

Client accesses current replica group [Figure: client C and servers s]

Client accesses new replica group [Figure: client C and servers s]

Client contacts wrong replica group [Figure: client C and servers s]

The Membership Service (MS) • Reconfigures automatically to reduce operator errors • Provides accurate membership information that nodes can agree on • Ensures clients are up-to-date • Works at large scale

System runs in Epochs • Periods of time, e.g., 6 hours • Membership is static during an epoch • During epoch e, MS computes membership for epoch e+1 • Epoch duration is a system parameter • No more than f failures in any replica group while it is useful

Server IDs • Ids chosen by MS • Consistent hashing • Very large circular id space

Membership Operations • Insert and delete node • Admission control • Trusted authority produces a certificate • Insert certificate includes • ip address, public key, random number, and epoch range • MS assigns the node id ( h(ip,k,n) )
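A minimal sketch of the id assignment h(ip,k,n) described above. The hash function (SHA-1), the 160-bit id space, and the length-prefixed encoding of the inputs are assumptions made for illustration; the slides only state that the MS derives the id from the ip address, public key, and random number.

```python
import hashlib

ID_BITS = 160  # assumed width of the circular id space (SHA-1 output)


def node_id(ip: str, public_key: bytes, nonce: int) -> int:
    """Derive a node id as h(ip, k, n).

    Each field is length-prefixed so that different field splits
    cannot produce the same hash input (an illustrative choice).
    """
    h = hashlib.sha1()
    for part in (ip.encode(), public_key, str(nonce).encode()):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return int.from_bytes(h.digest(), "big")
```

Because the id depends on a random number in a certificate issued by the trusted authority, a node cannot pick its own position in the id space.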

Monitoring • MS monitors the servers • Sends probes (containing nonces) • Some responses must be signed • Delayed response to failures • Timing of probes and number of missed probes are system parameters • BF nodes (code attestation)

Ending Epochs • Stop epoch after fixed time • Compute the next configuration: epoch number, adds, and deletes • Sign it • MS has a well-known public key • Propagated to all nodes • Over a tree plus gossip

Guaranteeing Freshness • Client sends a challenge to MS • Response gives client a time period T during which it may execute requests • T is calculated using client clock [Figure: client C sends <nonce> to the MS; the MS replies with <nonce, epoch #>σMS]
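The challenge-response exchange can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: an HMAC with a shared demo key stands in for the MS public-key signature σMS, `LEASE_SECONDS` is a hypothetical value for T, and the period is assumed to start when the client sends the nonce, measured on the client's own clock.

```python
import hashlib
import hmac
import os
import time

MS_KEY = b"demo-key"   # stand-in for the MS signing key (assumption)
LEASE_SECONDS = 30.0   # hypothetical length of the period T


def ms_respond(nonce: bytes, epoch: int):
    """MS side: return <nonce, epoch #> authenticated with the stand-in key."""
    tag = hmac.new(MS_KEY, nonce + epoch.to_bytes(8, "big"), hashlib.sha256)
    return epoch, tag.digest()


class Client:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.epoch = None
        self.valid_until = float("-inf")

    def refresh(self, respond) -> None:
        """Send a fresh nonce and start T from the local send time."""
        nonce = os.urandom(16)
        sent_at = self.clock()  # read before sending, on the client clock
        epoch, tag = respond(nonce)
        expected = hmac.new(
            MS_KEY, nonce + epoch.to_bytes(8, "big"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("response does not match the challenge")
        self.epoch = epoch
        self.valid_until = sent_at + LEASE_SECONDS

    def may_execute(self) -> bool:
        """True while the client still holds a valid challenge response."""
        return self.clock() < self.valid_until
```

Starting T at the send time rather than the receive time means network delay can only shorten the period the client believes it has, never lengthen it.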

Implementing the MS • At a single dedicated node • Single point of failure • At a group of 3f+1 • Running BFT • No more than f failures in system lifetime • At the servers themselves • Reconfiguring the MS

System Architecture • All nodes run application • 3F+1 run the MS

Implementation Issues • Nodes run BFT • State machine replication (e.g., add, delete) • Decision making • Choosing MS membership • Signing

Decision Making • Each replica probes independently • Removing a node requires agreement • One replica proposes • 2F+1 must agree • Then can run the delete operation • Ending an epoch is similar
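The agreement rule for removing a node can be sketched as a simple tally. The class and method names are hypothetical, and the sketch leaves out message authentication and the BFT operation that actually performs the delete.

```python
class RemovalProposal:
    """Tracks which MS replicas agree that `target` should be removed."""

    def __init__(self, target, ms_replicas, f: int):
        self.target = target
        self.ms_replicas = set(ms_replicas)
        self.f = f
        self.endorsers = set()

    def endorse(self, replica) -> None:
        """Record agreement from one MS replica; repeats count once."""
        if replica not in self.ms_replicas:
            raise ValueError("endorsement from a non-MS node")
        self.endorsers.add(replica)

    def ready(self) -> bool:
        """True once 2F+1 distinct replicas agree, so delete may run."""
        return len(self.endorsers) >= 2 * self.f + 1
```

With 3F+1 replicas and at most F faulty ones, 2F+1 agreeing replicas include at least F+1 correct ones, so faulty replicas alone cannot force a removal.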

Moving the MS • Needed to handle MS node failures • To reduce attack opportunity • Move must be unpredictable • Secure multi-party coin toss • Next replicas are h(c,1), …, h(c,3F+1)
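A sketch of how the next MS replicas might be derived from the coin-toss value c. The slides give only the points h(c,1), …, h(c,3F+1); mapping each point to its successor node on the id ring, the use of SHA-1, and skipping to the next unused node on a collision are assumptions of this sketch.

```python
import bisect
import hashlib


def h(c: bytes, i: int) -> int:
    """Hash the coin-toss value together with an index (encoding assumed)."""
    return int.from_bytes(hashlib.sha1(c + i.to_bytes(4, "big")).digest(), "big")


def next_ms_replicas(c: bytes, node_ids, f: int):
    """Pick 3F+1 distinct nodes: the ring successors of h(c,1)..h(c,3F+1)."""
    ring = sorted(set(node_ids))
    need = 3 * f + 1
    if len(ring) < need:
        raise ValueError("not enough nodes to host the MS")
    chosen = []
    for i in range(1, need + 1):
        idx = bisect.bisect_left(ring, h(c, i)) % len(ring)
        while ring[idx] in chosen:  # collision: move on to the next node
            idx = (idx + 1) % len(ring)
        chosen.append(ring[idx])
    return chosen
```

Since c is the output of a secure multi-party coin toss, no single replica can predict or steer which nodes host the MS next.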

Signing • Configuration must be signed • There is a well-known public key • Proactive secret sharing • MS replicas have shares of private key • F+1 shares needed to sign • Keys are re-shared when MS moves
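The F+1 threshold can be illustrated with plain Shamir secret sharing. This is a deliberate simplification: the sketch reconstructs the secret in one place, which a real threshold signature scheme avoids, and it does not show proactive refreshment of shares. The prime field and function names are assumptions.

```python
import secrets

P = 2 ** 127 - 1  # a Mersenne prime used as the field modulus (assumption)


def split(secret: int, f: int, n: int):
    """Split `secret` into n shares; any f+1 of them recover it."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(f)]

    def poly(x: int) -> int:
        acc = 0
        for coef in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + coef) % P
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def combine(shares) -> int:
    """Lagrange interpolation at x = 0 over the given shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
```

The polynomial has degree F, so F shares reveal nothing about the key while any F+1 determine it. Re-sharing when the MS moves corresponds to issuing shares of a fresh polynomial with the same constant term.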

Changing Epochs: Summary of Steps • Run the endEpoch operation on state machine • Select new MS replicas • Share refreshment • Sign new configuration • Discard old shares

Example Service • Any replicated service • Dynamic Byzantine Quorums (dBQS) • Read/Write interface to objects • Two kinds of objects • Mutable public-key objects • Immutable content-hash objects

dBQS Object Placement • Consistent hashing • 3f+1 successors of object id are responsible for the object
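Successor-based placement on the id ring can be sketched in a few lines. The function name is hypothetical, and treating a node whose id equals the object id as the first successor is an assumption.

```python
import bisect


def replica_group(object_id: int, node_ids, f: int):
    """Return the 3f+1 nodes that follow object_id on the circular id space."""
    ring = sorted(set(node_ids))
    size = 3 * f + 1
    if len(ring) < size:
        raise ValueError("not enough nodes for a replica group")
    start = bisect.bisect_left(ring, object_id)
    return [ring[(start + i) % len(ring)] for i in range(size)]
```

For example, with node ids 2, 5, 9, 14, 16, 20 and f = 1, an object with id 10 is stored at nodes 14, 16, 20, and 2, wrapping around the end of the ring.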

Byzantine Quorum Operations • Public-key objects contain • State, signature, version number • Quorum is 2f+1 replicas • Write: • Phase 1: client reads to learn highest v# • Phase 2: client writes to higher v# • Read: • Phase 1: client gets value with highest v# • Phase 2: write-back if some replicas have a smaller v#
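The two-phase read and write can be sketched with in-memory replicas. This is an illustration only: signatures are omitted, version numbers are plain integers, and the caller passes in the replicas that responded as the quorum.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def read(self):
        return self.version, self.value

    def write(self, version: int, value) -> None:
        if version > self.version:  # never move back to an older version
            self.version, self.value = version, value


def check_quorum(quorum, f: int) -> None:
    if len(quorum) < 2 * f + 1:
        raise ValueError("need at least 2f+1 replicas")


def quorum_write(quorum, f: int, value) -> int:
    check_quorum(quorum, f)
    highest = max(r.read()[0] for r in quorum)  # phase 1: learn highest v#
    for r in quorum:                            # phase 2: write a higher v#
        r.write(highest + 1, value)
    return highest + 1


def quorum_read(quorum, f: int):
    check_quorum(quorum, f)
    replies = [r.read() for r in quorum]        # phase 1: collect values
    version, value = max(replies, key=lambda reply: reply[0])
    for r, (v, _) in zip(quorum, replies):      # phase 2: write back
        if v < version:
            r.write(version, value)
    return version, value
```

Any two quorums of 2f+1 out of 3f+1 replicas overlap in at least f+1 replicas, so at least one correct replica in a read quorum has seen the latest completed write.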

dBQS Algorithms – Dynamic Case • Tag all messages with epoch numbers • Servers reject requests for wrong epoch • Clients execute phases entirely in an epoch • Must be holding a valid challenge response • Servers upgrade to new configuration • If needed, perform state transfer from old group • A methodology
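Epoch tagging on the server side can be sketched as below. The class, the exception, and the merge-by-overwrite state transfer are hypothetical simplifications; a real transfer would compare version numbers and verify signatures.

```python
class WrongEpoch(Exception):
    def __init__(self, server_epoch: int):
        super().__init__(f"server is in epoch {server_epoch}")
        self.server_epoch = server_epoch


class Server:
    def __init__(self, epoch: int):
        self.epoch = epoch
        self.store = {}

    def handle(self, request_epoch: int, key, value=None):
        """Reject any request tagged with an epoch other than the current one."""
        if request_epoch != self.epoch:
            raise WrongEpoch(self.epoch)
        if value is not None:
            self.store[key] = value
        return self.store.get(key)

    def upgrade(self, new_epoch: int, old_group_state=None) -> None:
        """Move to a newer configuration, pulling state from the old group."""
        if new_epoch <= self.epoch:
            return
        if old_group_state:
            self.store.update(old_group_state)
        self.epoch = new_epoch
```

A client that receives a rejection learns it is out of date and must obtain the newer configuration before retrying the whole phase.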

Evaluation • Implemented MS, two example services • Ran set of experiments on PlanetLab, RON, local area

MS Scalability • Probes – use sub-committees • Leases – use aggregation • Configuration distribution • Use diffs and distribution trees

Fetch Throughput

Time to reconfigure • Time to reconfigure is small • Variability stems from PlanetLab nodes • Only used F = 1, limitation of APSS protocol

dBQS Performance

Failure-free Computation • Depends on no more than F failures while group is useful • How likely is this?

Probability of Choosing a Bad Group

Probability that the System Fails
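A back-of-the-envelope way to approach the question "how likely is this?" is a binomial model. This is not the authors' analysis: it assumes each node is faulty independently with probability p and that groups fail independently of one another, and the function names are hypothetical.

```python
from math import comb


def p_bad_group(f: int, p: float) -> float:
    """Probability that a group of 3f+1 nodes contains more than f faulty ones."""
    n = 3 * f + 1
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(f + 1, n + 1)
    )


def p_system_fails(f: int, p: float, groups: int) -> float:
    """Probability that at least one of `groups` independent groups is bad."""
    return 1 - (1 - p_bad_group(f, p)) ** groups
```

Under this model, increasing f lowers the per-group failure probability for small p, while increasing the number of groups raises the chance that some group is bad.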

Conclusion • Providing support for Internet services • Scalable membership service • Reconfiguring the MS • Dynamic replication algorithms • dBQS – a methodology • Future research • Proactive secret sharing • Scalable applications
