Enhancing Accountability in Distributed Systems: Practical Implementations and Challenges

Accountable distributed systems and the accountable cloud
Peter Druschel
Joint work with Andreas Haeberlen (1), Petr Kuznetsov (2), Rodrigo Rodrigues
(1) University of Pennsylvania, (2) TU Berlin/Deutsche Telekom Labs
Building and Programming the Cloud, Mysore, Jan 2010

Outline
• Why accountability?
• A definition
• A practical implementation: PeerReview
• Accountability in the Cloud
• Technical Challenges
• Conclusion

What is the problem?
• Multiple administrative domains (federated, p2p)
• Multiple stakeholders (hosting, Web)
  • different actors, somewhat different interests
  • lack of global visibility, control
• Complex faults
  • software faults, misconfiguration, negligence, disgruntled employees, outside attacks, manipulation
• Lack of transparency

Learning from the 'offline' world
• Relies heavily on accountability to deal with faults, misbehavior
• Example: Banking
• Record can be used to (manually)
  • detect problems
  • identify the responsible party
  • convince others that a problem does (not) exist

What does accountability mean in distributed systems?
• Tamper-evident record of each node's actions
• (Automated) audit for fault detection, localization
• Evidence to convince a third party that a fault has (not) occurred
• Accountability provides
  • transparency
  • trust
  • incentives to avoid faults
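A tamper-evident record is commonly built as a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general idea only; it is not taken from the talk, and the names `TamperEvidentLog`, `entry_hash`, `verify`, and the `GENESIS` value are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash, seq, action):
    # Each hash covers the previous hash, so changing one entry
    # invalidates the stored hashes of that entry and all later ones.
    payload = json.dumps([prev_hash, seq, action]).encode()
    return hashlib.sha256(payload).hexdigest()


class TamperEvidentLog:
    """Append-only log of a node's actions (illustrative only)."""

    def __init__(self):
        self.entries = []  # list of (seq, action, hash)

    def append(self, action):
        prev = self.entries[-1][2] if self.entries else GENESIS
        seq = len(self.entries)
        h = entry_hash(prev, seq, action)
        self.entries.append((seq, action, h))
        return h


def verify(entries):
    """Return the index of the first inconsistent entry, or None."""
    prev = GENESIS
    for i, (seq, action, h) in enumerate(entries):
        if seq != i or entry_hash(prev, seq, action) != h:
            return i
        prev = h
    return None


log = TamperEvidentLog()
for a in ["recv deposit 100", "send ack", "recv withdraw 30"]:
    log.append(a)

intact = verify(log.entries)

# Rewrite entry 1 but keep its old hash: the chain no longer checks out.
tampered = list(log.entries)
tampered[1] = (1, "send nack", tampered[1][2])
first_bad = verify(tampered)
```

A chain like this only exposes tampering if the auditor already holds a hash that the node committed to earlier; a node that recomputes every hash from the altered entry onward would otherwise pass `verify`.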

Ideal accountability
• Fault := Node deviates from expected behavior
• Our goal is to automatically
  • detect faults
  • identify the faulty nodes
  • convince others that a node is (or is not) faulty
• Can we build a system that provides the following guarantee?
  Whenever a node is faulty, the system generates a proof of misbehavior against that node

Can we detect all faults?
• Problem: Faults that affect only a node's internal state
  • Would require online trusted probes at each node
• Focus on observable faults:
  • Faults that affect a correct node
• Can detect observable faults without requiring trusted components

Can we always get a proof?
[Figure: A says "I sent X!", B says "I never received X!", a third node C cannot tell who is right]
• Problem: He-said-she-said
• Three possible causes:
  • A never sent X
  • B refuses to acknowledge X
  • X was delayed by the network
• Cannot get proof of misbehavior!
• Generalize to verifiable evidence:
  • a proof of misbehavior, or
  • a challenge that a faulty node cannot answer
• What if the challenged node does not respond?
  • Does not prove a fault, but node is suspected until it responds
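The two kinds of verifiable evidence can be pictured as inputs to a per-node status held by an observer. The following is a minimal sketch under assumptions: the class `NodeView`, its methods, and the three status names are illustrative and not quoted from the slides, which only say that a silent node is "suspected until it responds".

```python
from enum import Enum


class Status(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"
    EXPOSED = "exposed"


class NodeView:
    """An observer's opinion of one peer, driven by verifiable evidence."""

    def __init__(self):
        self.open_challenges = set()  # challenges not yet answered
        self.proof = None             # proof of misbehavior, if any

    def add_challenge(self, challenge_id):
        self.open_challenges.add(challenge_id)

    def add_response(self, challenge_id):
        # A valid response clears the challenge (validation omitted here).
        self.open_challenges.discard(challenge_id)

    def add_proof(self, proof):
        # A proof of misbehavior is permanent: no response can undo it.
        self.proof = proof

    @property
    def status(self):
        if self.proof is not None:
            return Status.EXPOSED
        if self.open_challenges:
            return Status.SUSPECTED
        return Status.TRUSTED


# He-said-she-said: C cannot tell whether A or B is at fault,
# so C challenges B to acknowledge X.
view_of_b = NodeView()
view_of_b.add_challenge("ack X")
while_silent = view_of_b.status

view_of_b.add_response("ack X")
after_ack = view_of_b.status

view_of_b.add_proof("two conflicting signed statements")
after_proof = view_of_b.status
```

A delayed message therefore never produces a false accusation: suspicion is lifted as soon as the challenged node answers.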

Practical accountability
• Requirement for an accountable distributed system:
  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
• This is useful
  • Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
• It can be implemented in practice

PeerReview
Adds accountability to a given system
• Implemented as a library
• Provides tamper-evident record
• Detects faults via state-machine replay
Assumptions:
• Nodes can be modeled as deterministic state machines
• There is a trusted reference implementation of the state machines
• Correct nodes can eventually communicate
• Nodes can sign messages
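Fault detection via state-machine replay can be sketched as feeding the logged inputs to a trusted reference implementation and comparing its outputs with the logged ones. This is a toy illustration, not PeerReview's actual API: `ReferenceCounter`, `audit`, and the log layout of `(input, output)` pairs are all assumptions.

```python
class ReferenceCounter:
    """Hypothetical reference implementation: a deterministic accumulator."""

    def __init__(self):
        self.total = 0

    def step(self, msg):
        op, amount = msg
        if op == "add":
            self.total += amount
        elif op == "sub":
            self.total -= amount
        return ("total", self.total)


def audit(log, make_reference):
    """Replay logged inputs; return the first divergence, or None."""
    machine = make_reference()
    for index, (inp, logged_out) in enumerate(log):
        expected = machine.step(inp)
        if expected != logged_out:
            # The log entry plus the reference output form the evidence.
            return {"index": index, "input": inp,
                    "logged": logged_out, "expected": expected}
    return None


honest_log = [(("add", 5), ("total", 5)), (("sub", 2), ("total", 3))]
faulty_log = [(("add", 5), ("total", 5)), (("sub", 2), ("total", 5))]

honest_result = audit(honest_log, ReferenceCounter)
faulty_result = audit(faulty_log, ReferenceCounter)
```

The comparison only works because the state machine is deterministic, which is why that assumption appears in the list above.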

PeerReview is widely applicable
• App #1: NFS server in the Linux kernel
  • Many small, latency-sensitive requests
  • Tampering with files
  • Lost updates
• App #2: Overlay multicast
  • Transfers large volume of data
  • Freeloading
  • Tampering with content
• App #3: P2P email
  • Complex, large, decentralized
  • Denial of service
  • Attacks on DHT routing
• Details in [Haeberlen et al., SOSP'07]
• NetReview [Haeberlen et al., NSDI'08]
• Metadata corruption
• Incorrect access control
• Censorship

How much does PeerReview cost?
• Log storage
  • 10 – 100 GByte per month, depending on application
• Message signatures
  • Message latency (e.g. 1.5 ms RTT with RSA-1024)
  • CPU overhead (embarrassingly parallel)
• Log/authenticator transfer, replay overhead
  • Depends on # witnesses
  • Can be deferred to exploit bursty/diurnal load patterns

Split administration in the Cloud
[Figure: Alice, Alice's customers, Bob]
• What if there is a problem?
  • Bug in Alice's software
  • Subtle differences between Alice's and Bob's environments
  • Bug in Bob's software
  • Insufficient resource allocation
  • Hacker attack
  • ...

Split administration: Alice's perspective
• If something is wrong, how will I know?
• How can I tell if it's my software or the cloud?
• If it's the cloud, how can I convince Bob?

Split administration: Bob's perspective
• If something is wrong, how will I know?
• How can I tell if it's the cloud or Alice's software?
• If it's Alice's software, how can I convince Alice?

An idealized solution
• What if we had an oracle that Alice and Bob could ask about problems?
  • Completeness: If the cloud is faulty, the oracle will say so
  • Accuracy: If the cloud is not faulty, the oracle will say so
  • Verifiability: The oracle produces evidence that would convince a third party

The accountable cloud
• Idea: Make cloud accountable
  • Cloud records its actions in a tamper-evident log
  • Alice can audit the log and check for faults
  • Use log to construct evidence that a fault does (not) exist
• Should work even if one party was compromised!
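One way a log can yield evidence is for the cloud to hand out signed commitments to the current top of its hash-chained log, which Alice keeps and later compares with the log she audits. The sketch below assumes this design; it is not taken from the slides. It uses an HMAC with a shared key purely as a stand-in for a real digital signature (an HMAC would not convince a third party), and `chain`, `make_authenticator`, and `check_log_against` are hypothetical names.

```python
import hashlib
import hmac


def chain(entries):
    """Return the running hash after each entry of a hash-chained log."""
    h = b"\x00" * 32
    tops = []
    for e in entries:
        h = hashlib.sha256(h + e.encode()).digest()
        tops.append(h)
    return tops


def _tag(key, seq, top_hash):
    # Stand-in for a signature over (seq, top_hash); assumption, see lead-in.
    return hmac.new(key, seq.to_bytes(8, "big") + top_hash,
                    hashlib.sha256).digest()


def make_authenticator(key, seq, top_hash):
    return (seq, top_hash, _tag(key, seq, top_hash))


def check_log_against(key, authenticator, entries):
    """Compare a log produced for audit with an earlier commitment."""
    seq, top_hash, tag = authenticator
    if not hmac.compare_digest(tag, _tag(key, seq, top_hash)):
        return "invalid authenticator"
    tops = chain(entries)
    if seq >= len(tops) or tops[seq] != top_hash:
        return "log does not match commitment"
    return "consistent"


BOB_KEY = b"demo-key"  # hypothetical key, for illustration only
original = ["start vm", "serve req 1", "serve req 2"]

# Bob commits to his log after entry 1; Alice stores the authenticator.
auth = make_authenticator(BOB_KEY, 1, chain(original)[1])

rewritten = ["start vm", "serve req 1 (altered)", "serve req 2"]
result_original = check_log_against(BOB_KEY, auth, original)
result_rewritten = check_log_against(BOB_KEY, auth, rewritten)
```

With real signatures, the authenticator together with the mismatching log would be something Alice could show to a third party.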

Discussion
• Is this too pessimistic? Cloud isn't malicious!
  • Hacker attacks, software bugs, operator error, malicious client, …
  • Difficult to come up with a more restrictive fault model
  • Without provable properties, evidence has little value
• Why would a provider want to deploy this?
  • Attractive to prospective customers (peace of mind)
  • Helps in handling customer complaints, resolving disputes

Is the technology ready?
• Cloud accountability should
  • Have provable guarantees
  • Work for most cloud applications
  • Require no changes to application code
  • Cover a wide spectrum of properties
  • Have reasonable overhead
• Can existing techniques deliver this?
  • CATS, Repeat&Compare, AIP, PeerReview, NetReview, AudIt, ...
• More work is needed!

Work in progress: AVM
• Goal: Provide accountability for arbitrary binary executables
• Idea: Accountable virtual machine (AVM)
  • Cloud records enough data to enable deterministic replay
  • Alice can replay log against a reference implementation
  • Can audit any part of the hosted execution

Challenges
• Complete state-machine replay expensive
  • limit to spot checks, investigation of suspected faults
  • multi-core replay is hard
  • replay log against an abstract model?
• Checking performance properties
• Checking information flow
• Lots of research opportunities
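Limiting replay to spot checks could look like the following sketch, which assumes the log is split into segments that each start from a recorded state snapshot, so an auditor can replay a sample of segments instead of the whole execution. The transition function `step`, the segment layout, and all helper names are hypothetical.

```python
import random


def step(state, inp):
    """Hypothetical deterministic transition: a running sum."""
    new_state = state + inp
    return new_state, new_state  # (next state, output)


def record(inputs, snapshot_every):
    """Build segments of the form (start_state, [(inp, out), ...], end_state)."""
    segments, state = [], 0
    for i in range(0, len(inputs), snapshot_every):
        start, events = state, []
        for inp in inputs[i:i + snapshot_every]:
            state, out = step(state, inp)
            events.append((inp, out))
        segments.append((start, events, state))
    return segments


def check_segment(segment):
    """Replay one segment from its snapshot and compare outputs and end state."""
    start, events, end = segment
    state = start
    for inp, logged_out in events:
        state, out = step(state, inp)
        if out != logged_out:
            return False
    return state == end


def spot_check(segments, k, rng):
    """Replay k randomly chosen segments; return indices that fail."""
    picked = rng.sample(range(len(segments)), k)
    return [i for i in sorted(picked) if not check_segment(segments[i])]


segments = record(list(range(1, 13)), snapshot_every=4)

# Corrupt one logged output in the middle segment.
start, events, end = segments[1]
bad_events = events[:2] + [(events[2][0], events[2][1] + 1)] + events[3:]
corrupted = segments[:1] + [(start, bad_events, end)] + segments[2:]

clean_failures = spot_check(segments, 3, random.Random(0))
found_failures = spot_check(corrupted, 3, random.Random(0))
```

Sampling fewer segments lowers the audit cost but only detects a fault with some probability; a fuller check would also confirm that each segment's end state equals the next segment's start state.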

Summary
• Accountability is a useful capability in distributed systems
  • tamper-evident record
  • fault detection and localization
  • evidence
• Proposal: the accountable cloud
  • Can verify correct operation, produce evidence
  • Provable guarantees → solid foundation for both players
• Challenges remain
Questions?
