Security and Privacy in Cloud Computing. Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011. Lecture 10 09/15/2011. Securing Cloud Computations. Goal : Learn about techniques for verifying computations outsourced to a cloud Review Assignment #5
Security and Privacy in Cloud Computing
Ragib Hasan, University of Alabama at Birmingham
CS 491/691/791 Fall 2011, Lecture 10, 09/15/2011
Securing Cloud Computations
• Goal: Learn about techniques for verifying computations outsourced to a cloud
• Review Assignment #5
• Du et al., RunTest: Assuring Integrity of Dataflow Processing in Cloud Computing Infrastructures, AsiaCCS 2010
Fall 2011 Lecture 10 | UAB | Ragib Hasan
Outsourcing Computations
• Goal?
  • Outsource a computation by sending the following to a cloud:
    • A computation (e.g., a sequence of operations)
    • Input data
  • Get back the final result data set
Outsourcing Computations: Examples
• Sending a large-scale image processing job to a cloud
• Analyzing a large-scale data set
Outsourcing Computations: Model
• Dataflow computing is the dominant model
• It declares how things connect (unlike imperative programming, which focuses on how things happen)
• Data objects flow from one node to another
• Each node applies a specific function to its input data to produce output data
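As a minimal sketch of the dataflow model (not taken from the lecture; the node names and the `run_dataflow` helper are hypothetical), a job can be declared as a chain of connected nodes, each applying one function to the data that flows through it:

```python
from typing import Callable, Iterable, List, Tuple

# A node is a (name, function) pair. The pipeline only declares how nodes
# connect; it says nothing about how each function is implemented.
Node = Tuple[str, Callable[[list], list]]


def run_dataflow(nodes: List[Node], data: Iterable) -> list:
    """Push data through the nodes in order; each output feeds the next node."""
    current = list(data)
    for _name, fn in nodes:
        current = fn(current)
    return current


# Hypothetical three-node pipeline: parse -> keep_even -> square.
pipeline: List[Node] = [
    ("parse", lambda xs: [int(x) for x in xs]),
    ("keep_even", lambda xs: [x for x in xs if x % 2 == 0]),
    ("square", lambda xs: [x * x for x in xs]),
]

result = run_dataflow(pipeline, ["1", "2", "3", "4"])
print(result)
```

In a cloud setting each node would typically run on a different machine, which is why the user cannot directly observe whether every function was applied honestly.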
Verifying Dataflow Computations in a Cloud
• Scenario: A user sends her data processing job to the cloud. Clouds provide dataflow operations as a service (e.g., MapReduce, Hadoop).
• Problem: Users have no way of evaluating the correctness of the results
Threat Model
• Assets:
  • Confidentiality of
    • Input data
    • Output data
    • Intermediate data
    • Functions
  • Integrity of computations
Threat Model
• Attacker:
  • The cloud provider, or an intruder who controls part of the cloud
  • The attacker can (selectively) modify the code running on the inputs, produce invalid outputs, etc.
MapReduce
• The most popular dataflow computing system
• Invented by Google, and at one time widely used for indexing web pages and computing PageRank
• Allows large-scale, reliable computation
MapReduce Overview
[Figure: the master assigns map tasks and reduce tasks. In the map phase, mappers M1 … Mn read input blocks B1 … Bn from the DFS and write their intermediate results, partitioned as P1 … Pr, to local disk. In the reduce phase, reducers R1 … Rr fetch their partitions by remote read and write Output 1 … Output r to the DFS.]
MapReduce: The Map Step
[Figure: each map call turns input key-value pairs into intermediate key-value pairs.]
MapReduce: The Reduce Step
[Figure: intermediate key-value pairs are grouped by key into key-value groups; each reduce call turns one group into output key-value pairs.]
Word Count using MapReduce
    map(key, value):
        // key: document name; value: text of document
        for each word w in value:
            emit(w, 1)
    reduce(key, values):
        // key: a word; values: an iterator over counts
        result = 0
        for each count v in values:
            result += v
        emit(key, result)
MapReduce – WordCount Application
• Input (read from DFS):
  • M1: "Hello World, Bye World!"
  • M2: "Welcome to ACSAC, Goodbye to ACSAC."
  • M3: "Hello MapReduce, Goodbye to MapReduce."
• Map phase (intermediate result, partitioned by reducer):
  • M1: for R1: (Hello, 1) (Bye, 1); for R2: (World, 1) (World, 1)
  • M2: for R1: (Welcome, 1) (to, 1) (to, 1); for R2: (ACSAC, 1) (Goodbye, 1) (ACSAC, 1)
  • M3: for R1: (Hello, 1) (to, 1); for R2: (MapReduce, 1) (Goodbye, 1) (MapReduce, 1)
• Reduce phase (written to DFS):
  • R1: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3)
  • R2: (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2)
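The word-count job can be simulated on a single machine to check the numbers in this example. The sketch below is an illustration only, not a real MapReduce runtime: `map_fn`, `reduce_fn`, `word_count`, and the document names `d1` to `d3` are hypothetical, and the grouping step stands in for the shuffle that the framework would perform between mappers and reducers.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in the document."""
    for word in re.findall(r"[A-Za-z]+", text):
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(docs: Dict[str, str]) -> Dict[str, int]:
    # Group intermediate pairs by key (the "shuffle" between the two phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, text in docs.items():
        for word, one in map_fn(name, text):
            groups[word].append(one)
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


docs = {
    "d1": "Hello World, Bye World!",
    "d2": "Welcome to ACSAC, Goodbye to ACSAC.",
    "d3": "Hello MapReduce, Goodbye to MapReduce.",
}
counts = word_count(docs)
print(counts)
```

The resulting counts match the reducer outputs listed in the example, e.g. `to` appears three times across the second and third documents.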
Verification in Clouds
• Problem: Given just the inputs to each node, how can we verify the computation done in a cloud?
• Possible approaches?
  • Re-computation
  • Sampling
  • Replication
  • Auditing
  • Attestation
  • Trusted computing
Re-computation
• Key idea:
  • Re-do the computation
• Advantages:
  • Guaranteed detection: any mistake in the result will be caught
• Disadvantages:
  • Worst-case cost (a check takes as much time and computation as the original job)
Sampling
• Key idea:
  • Feed known values into the inputs and check for the known outcomes in the corresponding outputs
• Advantages:
  • Efficient
• Disadvantages:
  • A clever attacker can figure out the test inputs and behave honestly for that cycle
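A minimal sketch of the sampling idea, under stated assumptions: the untrusted cloud is modeled as a plain function over a batch, and `verify_by_sampling` and the three example workers are hypothetical names introduced here. Probes with known answers are mixed into the real inputs and checked in the returned outputs.

```python
import random
from typing import Callable, Dict, List


def verify_by_sampling(
    worker: Callable[[List[int]], List[int]],
    real_inputs: List[int],
    probes: Dict[int, int],
    seed: int = 0,
) -> bool:
    """Mix probe inputs (with known answers) into the batch and check them."""
    batch = list(real_inputs) + list(probes)
    random.Random(seed).shuffle(batch)  # hide where the probes sit in the batch
    outputs = worker(batch)             # assumed to be aligned with the batch
    return all(y == probes[x] for x, y in zip(batch, outputs) if x in probes)


# The job is "square every input"; 3 and 7 are the probes.
probes = {3: 9, 7: 49}


def honest(xs: List[int]) -> List[int]:
    return [x * x for x in xs]


def lazy(xs: List[int]) -> List[int]:
    return [0 for _ in xs]  # skips the work entirely


def clever(xs: List[int]) -> List[int]:
    # Has learned the probe values and is honest only for those inputs.
    return [x * x if x in (3, 7) else 0 for x in xs]


print(verify_by_sampling(honest, [1, 2, 5], probes))
print(verify_by_sampling(lazy, [1, 2, 5], probes))
print(verify_by_sampling(clever, [1, 2, 5], probes))
```

The `clever` worker passes the check while returning wrong results for the real inputs, which is exactly the weakness listed under the disadvantages.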
Replication
• Key idea:
  • Replicate the same computation on multiple sets of nodes
  • Use majority voting to verify correctness
• Advantages:
  • No added latency (the replicas run in parallel, so the check finishes in about the same time as the original computation)
• Disadvantages:
  • Costly, since multiple copies of the same computation need to be run
  • Can be defeated by a clever adversary (e.g., one who controls a majority of the replicas)
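Majority voting over replicas can be sketched as follows. This is an illustration under assumptions, not the scheme of any particular system: replicas are modeled as local functions, results must be hashable, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple

Replica = Callable[[Tuple[int, ...]], Hashable]


def majority_vote(replicas: Sequence[Replica], task_input: Tuple[int, ...]) -> Optional[Hashable]:
    """Run the task on every replica and return the strict-majority result.

    Returns None when no result is reported by more than half of the replicas.
    """
    # In a real cloud the replicas would run in parallel on different nodes.
    results = [replica(task_input) for replica in replicas]
    winner, votes = Counter(results).most_common(1)[0]
    return winner if votes > len(results) // 2 else None


def honest(xs: Tuple[int, ...]) -> int:
    return sum(xs)


def faulty(xs: Tuple[int, ...]) -> int:
    return sum(xs) + 1  # colluding nodes agree on the same wrong answer


def broken(xs: Tuple[int, ...]) -> int:
    return 0


print(majority_vote([honest, honest, faulty], (1, 2, 3)))
print(majority_vote([honest, faulty, faulty], (1, 2, 3)))
print(majority_vote([honest, faulty, broken], (1, 2, 3)))
```

The second call shows how the scheme fails: once colluding nodes form the majority, the vote selects their wrong answer.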
Auditing
• Key idea:
  • Have each node sign its inputs, what it has done, and its outputs
  • Later, an auditor can check for correct computation
• Advantages:
  • Provides non-repudiation
  • Allows forensic investigation
• Disadvantages:
  • Adds to computation time due to the crypto
  • Expensive audits
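The logging and audit steps can be sketched with the Python standard library. Note the substitution: an HMAC with a key shared between node and auditor stands in for a digital signature here, so this sketch does not provide non-repudiation; a real deployment would use an asymmetric signature scheme. The helper names (`digest`, `run_and_log`, `audit`) and the record layout are assumptions made for illustration.

```python
import hashlib
import hmac
import json
from typing import Callable, List, Tuple


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of the object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def _tag(key: bytes, record: dict) -> str:
    msg = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def run_and_log(key: bytes, op_name: str, fn: Callable, data: List[int]) -> Tuple[list, dict, str]:
    """Node side: run the operation and authenticate (input, op, output)."""
    output = fn(data)
    record = {"op": op_name, "input": digest(data), "output": digest(output)}
    return output, record, _tag(key, record)


def audit(key: bytes, record: dict, tag: str, reference_fn: Callable, data: List[int]) -> bool:
    """Auditor side: check the tag, then re-run the operation and compare."""
    if not hmac.compare_digest(tag, _tag(key, record)):
        return False  # the record was altered after it was logged
    if record["input"] != digest(data):
        return False  # the node worked on different input
    return record["output"] == digest(reference_fn(data))


KEY = b"demo-key"  # hypothetical key, for illustration only


def double(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def cheat(xs: List[int]) -> List[int]:
    return [0 for _ in xs]


_, good_record, good_tag = run_and_log(KEY, "double", double, [1, 2, 3])
_, bad_record, bad_tag = run_and_log(KEY, "double", cheat, [1, 2, 3])
print(audit(KEY, good_record, good_tag, double, [1, 2, 3]))
print(audit(KEY, bad_record, bad_tag, double, [1, 2, 3]))
```

The auditor has to re-run the operation to compare output digests, which reflects why audits are listed as expensive.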
Attestation
• Key idea:
  • Verify the code, or the execution path, of a computation
• Advantages:
  • Can ensure that the correct code was run on the data
• Disadvantages:
  • Expensive to compute
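A heavily simplified sketch of code attestation: the code is "measured" by hashing it, and it runs only if the measurement matches a known-good digest. This is an assumption-laden toy; in practice the measurement would be taken and signed by a trusted component on the cloud node rather than computed by the same program that runs the code. The names `measure`, `attest_and_run`, and the `mapper` function inside the code strings are hypothetical.

```python
import hashlib
import hmac


def measure(code: str) -> str:
    """Measurement of a piece of code: the SHA-256 digest of its source text."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def attest_and_run(code: str, expected_digest: str, arg: int) -> int:
    """Run the code's `mapper` function only if its measurement matches."""
    if not hmac.compare_digest(measure(code), expected_digest):
        raise ValueError("measurement mismatch: code was modified")
    namespace: dict = {}
    exec(code, namespace)  # acceptable here only because the code was just verified
    return namespace["mapper"](arg)


TRUSTED_CODE = "def mapper(x):\n    return x * x\n"
TAMPERED_CODE = "def mapper(x):\n    return 0\n"
EXPECTED_DIGEST = measure(TRUSTED_CODE)  # published by the user ahead of time

print(attest_and_run(TRUSTED_CODE, EXPECTED_DIGEST, 4))
```

Passing `TAMPERED_CODE` with the same expected digest raises `ValueError`, so modified code is rejected before it touches the data.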
Trusted Computing
• Key idea:
  • Ensure that the cloud nodes are using a trustworthy configuration and software
• Advantages
• Disadvantages
Summary
• Verifying computations is difficult
• Provably secure approaches are often very computation-intensive, and therefore not practical