What is new in the cloud

1. What is new in the cloud? Donald Kossmann ETH Zurich http://systems.ethz.ch

2. Acknowledgments

3. Questions?

4. Agenda Why? How? What?

5. Simple Truths "Power of data" the more data the merrier (GB -> TB -> PB) data comes from everywhere in all shapes value of data often discovered later data has no owner within an organization (no silos!) Services turn data into $ the more services the merrier (10s -> 1000s -> Ms) need to adapt quickly Examples: Google, FB, Amadeus, Walmart, BMW, ... Platforms: Oracle, MS, SAP, Google, ..., 28msec ?

7. Promises of cloud computing? Cost "pay as you go" for HW and SW no upfront cost / investment: CapEx vs. OpEx scale down if service becomes less popular utilization: statistical allocation of resources out-source and commoditize computing HW automatically gets cheaper and faster economy of scale for admin: patches, backups, etc. failures: cost of preventing and having failures Time to market avoid unnecessary steps HW provisioning, purchasing, test

8. What to optimize?

9. Misconceptions Variable Cost -> Unpredictable Cost pay-as-you-go and predictability can be combined IT department needs to rethink "budget models" Performance is more fundamental than $ at that scale, prices must be honest how relevant are your perf. numbers of 1992 today? technology follows business; business follows technology Time is money ("secs" ~ "$" in my graphs) often true; often enough not true: Put computing where the energy is (ocean, desert, ...) Writing inner track of disk consumes 2x energy

10. Problem: Vendor Lock-In Hardware no standard APIs for IaaS expensive to move TBs of data between clouds this was actually a solved problem before the cloud Platform PaaS makes it neither better nor worse (situation is very bad as is) Apps and Devices iTunes, Google Docs, Amazon Kindle, iPhone Apps, ... they own your data; you don't own their (paid for) data

11. Agenda Why? How? What?

12. Teach your DBMS to swim

13. Research Perspective ...

14. Scope of this talk Workloads: Focus on OLTP OLAP under heavy debate by others streaming not addressed yet (~ OLTP) testing, archiving, etc. is boring Types of clouds: Any type both private, public, hybrid only difference: private clouds have planned downtime cloud on the chip swarms: ad-hoc private clouds IaaS vs. PaaS vs. SaaS: Focus on PaaS

15. Game Changers OLTP: "Key-value Store" vs. "DBMS" [No-SQL] virtually infinite scale-out fault-tolerance (OLAP: "Hadoop" vs. "DBMS") Virtualization transparent use of resources (computers + humans) hide heterogeneity of resources 100Ks machines are a reality problems that need 100Ks machines are a reality

16. Reference Architecture

17. Open Questions How to map stack to IaaS? How to implement store layer? What consistency model? What programming model? Whether and how to cache?

18. Variant I: Partition Workload by "Request"

19. Partition Workload by "Request" Principle partition data by "tenant" route request to DB of that tenant Advantages reuse existing database stack (RDBMS) Disadvantages multi-tenant problem [Salesforce], [Jacobs] optimization, migration, load balancing, fix cost need DB federator for inter-tenant requests expensive HW and SW for high availability
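The routing principle of Variant I can be sketched as follows. This is a minimal illustration, not the talk's implementation: the class `TenantRouter`, its method names, and the tenant names are hypothetical, and a plain in-memory dict stands in for each tenant's RDBMS.

```python
from typing import Dict, List, Tuple


class TenantRouter:
    """Routes each request to the database that owns the tenant's data."""

    def __init__(self) -> None:
        # Assumption: one in-memory dict per tenant stands in for one RDBMS per tenant.
        self._databases: Dict[str, Dict[str, str]] = {}

    def add_tenant(self, tenant: str) -> None:
        self._databases[tenant] = {}

    def put(self, tenant: str, key: str, value: str) -> None:
        # A single-tenant request touches exactly one database.
        self._databases[tenant][key] = value

    def get(self, tenant: str, key: str) -> str:
        return self._databases[tenant][key]

    def federated_get(self, key: str) -> List[Tuple[str, str]]:
        # An inter-tenant request has to visit every tenant database,
        # which is the job the slide assigns to a "DB federator".
        return [(t, db[key]) for t, db in sorted(self._databases.items()) if key in db]


router = TenantRouter()
router.add_tenant("acme")
router.add_tenant("globex")
router.put("acme", "plan", "gold")
router.put("globex", "plan", "silver")
```

Single-tenant requests stay cheap because they hit one database, while `federated_get` has to fan out across all of them, which mirrors the disadvantage listed on the slide.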

20. Variant II: Partition Workload by "Load"

21. Partition Workload by "Load" Principle fine-grained data partitioning by page or object any server can handle any request implement DBMS as a library (not server) Advantages avoids disadvantages of Variant I Disadvantages new synchronization problem (CAP theorem) whole new breed of systems caching not effective (see later)
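A minimal sketch of the Variant II principle, under stated assumptions: `SharedStore` and `Worker` are hypothetical names, the store is a single in-process dict rather than a distributed key-value store, and round-robin stands in for real load-based dispatch. Concurrency is deliberately left out, so the synchronization problem named on the slide does not show up here.

```python
import itertools
from typing import Dict, Optional


class SharedStore:
    """Stand-in for a key-value store that every server can reach."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def get(self, page_id: str) -> Optional[str]:
        return self._pages.get(page_id)

    def put(self, page_id: str, value: str) -> None:
        self._pages[page_id] = value


class Worker:
    """Stateless server: the DBMS logic is a library call against the shared store."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def handle(self, op: str, page_id: str, value: Optional[str] = None) -> Optional[str]:
        if op == "put":
            self.store.put(page_id, value)
            return value
        return self.store.get(page_id)


store = SharedStore()
workers = [Worker(f"w{i}", store) for i in range(3)]
dispatch = itertools.cycle(workers)  # any server may take any request

next(dispatch).handle("put", "page-7", "hello")   # served by w0
result = next(dispatch).handle("get", "page-7")   # served by w1, sees the same data
```

Because no worker keeps state, the write handled by one worker is visible to the read handled by another, and workers can be added or removed without moving data.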

22. Experiments [Loesing et al. 2010] TPC-W Benchmark throughput: WIPS latency: fixed depending on request type cost: cost / WIPS, total cost, predictability Players Amazon RDS, SimpleDB 28msec [Brantner et al. 2008] Google AppEngine Microsoft Azure
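The cost / WIPS metric can be illustrated with a small calculation. This is a sketch under assumptions: it reads "m$" as milli-dollars and takes cost as the total bill for some fixed period divided by the sustained throughput in that period. The function name and the numbers are invented for illustration and are not results from the experiments.

```python
def cost_per_wips_millidollars(total_cost_dollars: float, wips: float) -> float:
    """Cost per unit of throughput, in milli-dollars (assumed meaning of 'm$')."""
    if wips <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * total_cost_dollars / wips


# Hypothetical figures: a 50 $ bill while sustaining 500 WIPS.
example = cost_per_wips_millidollars(50.0, 500.0)
```

With these made-up inputs the metric comes out at 100 m$ per WIPS. Comparing the value at different load levels shows whether a service's cost scales with use.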

23. Scale-up Experiments

24. Cost / WIPS (m$)

25. Open Questions How to map traditional DB stack to IaaS? How to implement the storage layer? What is the right consistency model? What is the right programming model? Whether and how to make use of caching?

26. Store Variants Traditional (e.g., Amazon EBS) local disks with physically exclusive access put/get interface; no synchronization only works for V1 Key-value stores (e.g., Amazon S3) DHTs with concurrent access put/get interface; no synchronization works for V1 and V2; makes more sense for V2 ClockScan [Unterbrunner et al. 2009] massively shared scans in a distributed system push down predicates + simple aggr; write monotonicity works well for both variants

27. ClockScan Key ideas each core continuously scans one partition in MM while scanning, it executes queries/updates on the fly queries and updates are indexed; tuples probed just as in the stream processing world but queries are short-lived updates are processed before reads Properties very high query and update throughput (1000s / sec) predictable and guaranteed response times good enough, but not optimal write monotonicity at store level (more than disk)
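The key ideas above can be sketched as a single scan pass. This is a simplified illustration, not the published algorithm: it handles equality predicates only, runs on one thread, and the function name `scan_cycle` and the record layout are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def scan_cycle(
    partition: List[Record],
    updates: List[Tuple[str, Any, str, Any]],   # (match_attr, match_value, set_attr, new_value)
    queries: List[Tuple[str, str, Any]],        # (query_id, attr, value)
) -> Dict[str, List[Record]]:
    """One pass over an in-memory partition that serves a whole batch of operations."""
    # Index the pending operations by (attribute, value); tuples probe these indexes.
    update_index = defaultdict(list)
    for attr, value, set_attr, new_value in updates:
        update_index[(attr, value)].append((set_attr, new_value))
    query_index = defaultdict(list)
    for query_id, attr, value in queries:
        query_index[(attr, value)].append(query_id)

    results: Dict[str, List[Record]] = {query_id: [] for query_id, _, _ in queries}
    for record in partition:
        # Updates are processed before reads, so queries see the updated tuple.
        for key in list(record.items()):
            for set_attr, new_value in update_index.get(key, ()):
                record[set_attr] = new_value
        for key in list(record.items()):
            for query_id in query_index.get(key, ()):
                results[query_id].append(dict(record))
    return results


partition = [
    {"id": 1, "city": "ZRH", "status": "open"},
    {"id": 2, "city": "GVA", "status": "open"},
]
updates = [("id", 1, "status", "closed")]
queries = [("q1", "status", "open"), ("q2", "city", "ZRH")]
results = scan_cycle(partition, updates, queries)
```

Each tuple is touched once per pass regardless of how many queries are pending, which is why response time is bounded by the scan time rather than by the workload mix.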


29. CAP Theorem Three properties of distributed systems Consistency (ACID transactions w. serializability) Availability (nobody is ever blocked) resilience to network Partitioning Result it is trivial to achieve 2 out of 3 it is impossible to have all three Two schools Databases: sacrifice availability Distributed systems: sacrifice consistency

30. Why sacrifice Consistency? It is a simple solution nobody understands what sacrificing "P" means sacrificing "A" is unacceptable in the Web possible to push the problem to app developer "C" not needed in many applications Banks do not implement ACID (classic example wrong) Airline reservation only transacts reads (Huh?) MySQL et al. ship by default in lower isolation level Data is noisy and inconsistent anyway making it, say, 1% worse does not matter

31. What have people done? Client-side Consistency Models [Tanenbaum],[PNUTS08] New DB transaction models Escrow, Reservation Pattern [O'Neil 86], [Gawlick 09] SAGAs and compensation; e.g., in BPEL [G.-Molina,Salem] SAP, Amadeus et al. [Buck-Emden], [Kemper et al. 98] Limit the size of transacted data E.g., Microsoft Azure Levels of Consistency, Consistency-Cost Tradeoffs read/write monotonicity + "A" + "P" [Brantner08] economic models for consistency [Amadeus], [Kraska09] Educate Application Developers [Helland 2009]
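The escrow / reservation idea listed above can be sketched as follows. This is a simplified single-process illustration of the general pattern, not the method from the cited papers; the class `EscrowCounter` and the seat-booking numbers are hypothetical.

```python
from typing import Dict


class EscrowCounter:
    """Tracks a quantity (e.g. seats) where transactions reserve before they commit."""

    def __init__(self, available: int) -> None:
        self.available = available          # quantity not promised to anyone
        self.reserved: Dict[str, int] = {}  # txn_id -> quantity held in escrow

    def reserve(self, txn_id: str, qty: int) -> bool:
        # Grant the reservation only if it can certainly be honored later.
        if qty > self.available:
            return False
        self.available -= qty
        self.reserved[txn_id] = self.reserved.get(txn_id, 0) + qty
        return True

    def commit(self, txn_id: str) -> None:
        # The escrowed quantity is consumed for good.
        self.reserved.pop(txn_id, None)

    def abort(self, txn_id: str) -> None:
        # The escrowed quantity goes back to the pool.
        self.available += self.reserved.pop(txn_id, 0)


seats = EscrowCounter(10)
first = seats.reserve("t1", 6)    # granted
second = seats.reserve("t2", 6)   # refused: only 4 left unreserved
seats.abort("t1")                 # t1 gives its 6 seats back
third = seats.reserve("t2", 6)    # now granted
seats.commit("t2")
```

Transactions never hold a lock on the counter for their whole duration. They only need the short reserve step to succeed, which is what makes the pattern attractive when strict serializability is too expensive.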

32. Does it matter? How far do traditional (monolithic) DBMSes go? unlimited scalability for all practical matters high availability for all practical matters monolithic DBMSes still hold records in all regards That is why we focus on the $ tradeoffs it is not a principle / religious matter it is a $ optimization problem


34. Programming Model Properties of a programming lang. for the cloud support DB-style + OO-style + CEP-style avoid keeping state at servers for V2 architecture Many languages will work in the cloud SQL, XQuery, Ruby, ...; we have shown it for XQuery J2EE will not work Open (research) questions do OLAP on the OLTP data: My guess is yes! rewrite your apps: My guess is yes!

35. Caching Many Variants Possible this is just one V1 caching mandatory V2 caching prohibitive TPC-W Experiments marginal improvements for Google AppEngine No low hanging fruit

36. Agenda Why? How? What?

37. What is Sausalito? Application Server + Web Server + Database keeps any kind of data runs services Fully cloud-enabled full elasticity (cost and throughput) full fault-tolerance runs on cheap hardware (private and public clouds) Fully Web Standard compliant Web Services, REST XML, JSON, CSV, ... XML Schema, XQuery, XPath

38. Sausalito in the Cloud (V2)

39. Sausalito in the Cloud (offline)

40. Bets Made How to map traditional DB stack to IaaS? implemented both architectures (V1 + V2) V1 only in a single server variant for low end How to implement the storage layer? EBS for V1; KVS for V2 What is the right consistency model? ACID for V1; configurable for V2 What is the right data + programming model? XML & XQuery Whether and how to make use of caching? No! (Only for code / precompiled query plans)

41. Demo Getting started guide http://sausalito.28msec.com Example applications http://www.28msec.com/community

42. Cloud: Fans and Skeptics Fans VCs: low CapEx, Gartner hype USA Government: lack of alternative Departments: time-to-market, by-pass IT dept. USA Researchers: next big thing IT start-ups: levels the field Skeptics EU Government: next big USA thing EU Researchers: burnt by Grid Computing IT department: lock-in, become irrelevant Big enterprise IT vendors: low margins, forced to adapt

43. XML & XQuery: Fans and Skeptics Fans Large enterprises: reduces cost, helps abandon silos EU Research: scientific challenge in PL, type theory, ... Government: lack of alternatives, standards, complete Skeptics VCs: do not understand the market Web 2.0: hard and boring, expensive USA Database Research: religion Need intersection of fans for the bets made ?

44. Conclusion Researchers study tradeoffs Key-value stores are game changers Measuring $ is a game changer MMDBs (ClockScan) could be a game changer Entrepreneurs make bets Pay per use is a game changer XML & XQuery could be game changers Personal experience: You cannot do both! You cannot play and observe at the same time [Heisenberg]
