
Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms


Presentation Transcript


  1. Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms
  Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi
  Distributed Systems Lab, University of California, Santa Barbara

  2. Cloud Application Platforms
  • Serve thousands of applications (tenants)
    • AppEngine, Azure, Force.com
  • Tenants are (typically):
    • Small
    • SLA sensitive
    • Erratic load patterns
    • Subject to flash crowds, i.e. the fark, digg, slashdot, reddit effect (for now)
  • Support for multitenancy is critical
  • Our focus: DBMSs serving these platforms

  3. Multitenancy… What the tenant wants… What the service provider wants…

  4. Cloud Infrastructure is Elastic
  • Static provisioning for peak load is inelastic
  [Charts: capacity vs. demand over time for traditional infrastructures (unused resources) and for deployment in the cloud. Slide credits: Berkeley RAD Lab]

  5. Elasticity in a Multitenant DB
  [Diagram: load balancer in front of the application/web/caching tier and the database tier]

  6. Live Database Migration
  • Migrate a tenant's database in a live system
  • A critical operation to support elasticity
  • Different from:
    • Migration between software versions
    • Migration in case of schema evolution

  7. VM Migration for DB Elasticity
  • VM migration [Clark et al., NSDI 2005]
  • One tenant per VM
    • Pros: allows fine-grained load balancing
    • Cons: performance overhead; poor consolidation ratio [Curino et al., CIDR 2011]
  • Multiple tenants in a VM
    • Pros: good performance
    • Cons: must migrate all tenants together → coarse-grained load balancing

  8. Problem Formulation
  • Multiple tenants share the same database process
    • Shared process multitenancy
    • Example systems: SQL Azure, ElasTraS, Relational Cloud, and many more
  • Migrate individual tenants
    • VM migration cannot be used for fine-grained migration
  • Target architecture: shared nothing
    • Shared storage architectures: see our VLDB 2011 paper

  9. Shared nothing architecture

  10. Why is Live Migration Hard?
  • How to ensure no downtime?
    • Need to migrate the persistent database image (tens of MBs to GBs)
  • How to guarantee correctness during failures?
    • Nodes can fail during migration
    • How to ensure transaction atomicity and durability?
    • How to recover migration state after a failure?
    • Nodes recover after a failure
  • How to guarantee serializability?
    • Transaction correctness equivalent to normal operation
  • How to minimize migration cost? …

  11. Migration Cost Metrics
  • Downtime: time the tenant is unavailable
  • Service interruption: number of operations failing / transactions aborting
  • Migration overhead / performance impact: during normal operation, during migration, and after migration
  • Additional data transferred: data transferred in addition to the DB's persistent image

  12. How did we do it?
  • Migration executed in phases
    • Starts with the transfer of minimal information to the destination (the "wireframe")
  • Source and destination concurrently execute transactions in one migration phase
  • Database pages used as the granule of migration
    • Pages "pulled" by the destination on demand
  • Minimal transaction synchronization
    • A page is uniquely owned by either the source or the destination (see the ownership sketch below)
    • Leverages page-level locking
  • Logging and handshaking protocols to tolerate failures
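  A minimal sketch of the page-ownership rule that the phases above rely on. The class and method names are hypothetical and this is not the H2-based prototype; it only illustrates the invariant that every page is owned by exactly one node at any time, and that ownership changes only when a page moves to the destination. Later sketches reuse these types.

    // Illustrative only; names are hypothetical, not from the Zephyr prototype.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    enum MigrationMode { NORMAL, INIT, DUAL, FINISH }

    enum Owner { SOURCE, DESTINATION }

    final class PageTable {
        private final Map<Integer, Owner> ownerOf = new ConcurrentHashMap<>();

        void markOwnedBySource(int pageId)      { ownerOf.put(pageId, Owner.SOURCE); }
        void markOwnedByDestination(int pageId) { ownerOf.put(pageId, Owner.DESTINATION); }

        // A page is uniquely owned: either the source or the destination, never both.
        boolean ownedBy(int pageId, Owner node) { return ownerOf.get(pageId) == node; }
    }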

  13. Simplifying Assumptions
  • For this talk:
    • Small tenants, i.e. not sharded across nodes
    • No replication
    • No structural changes to indices
  • Extensions in the paper relax these assumptions

  14. Design Overview
  [Diagram: the source owns pages P1…Pn and runs active transactions TS1,…,TSk; the destination holds nothing yet. Legend: page owned by node / page not owned by node]

  15. Init Mode
  • Freeze the index wireframe and migrate it to the destination
  [Diagram: the source still owns all pages P1…Pn and continues running active transactions TS1,…,TSk; the destination receives the wireframe with all pages un-owned]

  16. What is an index wireframe?
  [Diagram: the index structure at the source and its copy at the destination; a hedged extraction sketch follows below]
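  A hedged sketch of how a wireframe could be extracted from a B+-tree index: the routing structure is copied, the data pages are not. The node layout and names below are illustrative assumptions, not the prototype's actual data structures.

    // Illustrative B+-tree "wireframe" extraction: copy the routing structure,
    // leave the leaf/data pages behind. Node layout and names are assumptions.
    import java.util.ArrayList;
    import java.util.List;

    class IndexNode {
        boolean isLeaf;
        List<Object> keys = new ArrayList<>();
        List<IndexNode> children = new ArrayList<>();   // empty for leaf nodes
    }

    final class WireframeExtractor {
        // Returns a structural copy of the index: internal nodes and keys are
        // copied; leaves become empty stubs whose pages the destination pulls later.
        static IndexNode extract(IndexNode node) {
            IndexNode copy = new IndexNode();
            copy.isLeaf = node.isLeaf;
            if (node.isLeaf) {
                return copy;                      // no tuples are copied in init mode
            }
            copy.keys.addAll(node.keys);          // separator keys still route lookups correctly
            for (IndexNode child : node.children) {
                copy.children.add(extract(child));
            }
            return copy;
        }
    }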

  17. Dual Mode
  • Requests for un-owned pages can block (see the pull sketch below)
    • Example: P3, accessed by TDi, is pulled from the source
  • Old, still-active transactions TSk+1,…,TSl run at the source; new transactions TD1,…,TDm run at the destination
  • Index wireframes remain frozen
  [Diagram: pages P1…Pn split between the two nodes. Legend: page owned by node / page not owned by node]
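  A sketch of the destination's behavior in dual mode, reusing the PageTable and Owner types from the earlier sketch; SourceClient and the buffer-pool helpers are hypothetical. An access to a page the destination does not yet own blocks while the page is pulled from the source, after which ownership transfers, exactly once.

    // Illustrative only; SourceClient and the local buffer-pool helpers are hypothetical.
    final class DestinationPageStore {
        private final PageTable pageTable;   // ownership map from the earlier sketch
        private final SourceClient source;   // hypothetical RPC stub to the source node

        DestinationPageStore(PageTable pageTable, SourceClient source) {
            this.pageTable = pageTable;
            this.source = source;
        }

        synchronized byte[] getPage(int pageId) throws java.io.IOException {
            if (!pageTable.ownedBy(pageId, Owner.DESTINATION)) {
                byte[] page = source.pullPage(pageId);      // the requesting transaction blocks here
                storeLocally(pageId, page);
                pageTable.markOwnedByDestination(pageId);   // the source gives up ownership
            }
            return loadLocally(pageId);
        }

        private void storeLocally(int pageId, byte[] page) { /* write into the local buffer pool */ }
        private byte[] loadLocally(int pageId) { return new byte[0]; /* read from the local buffer pool */ }
    }

    interface SourceClient {
        byte[] pullPage(int pageId) throws java.io.IOException;
    }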

  18. Finish Mode
  • Pages can still be pulled by the destination, if needed
  • Remaining pages P1, P2, … are pushed from the source (see the push sketch below)
  • Source transactions have completed; new transactions TDm+1,…,TDn run at the destination
  [Diagram: ownership of pages P1…Pn shifting to the destination. Legend: page owned by node / page not owned by node]
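  A matching sketch for finish mode at the source, again reusing the earlier PageTable and Owner types with a hypothetical DestinationClient stub: every page the source still owns is pushed to the destination, and each page changes hands exactly once.

    // Illustrative only; DestinationClient and the local page read are hypothetical.
    final class SourceFinishPhase {
        private final PageTable pageTable;
        private final DestinationClient destination;   // hypothetical RPC stub to the destination

        SourceFinishPhase(PageTable pageTable, DestinationClient destination) {
            this.pageTable = pageTable;
            this.destination = destination;
        }

        void pushRemainingPages(Iterable<Integer> allPageIds) throws java.io.IOException {
            for (int pageId : allPageIds) {
                if (pageTable.ownedBy(pageId, Owner.SOURCE)) {
                    destination.pushPage(pageId, readLocalPage(pageId));
                    pageTable.markOwnedByDestination(pageId);   // each page moves exactly once
                }
            }
        }

        private byte[] readLocalPage(int pageId) { return new byte[0]; /* read from the local store */ }
    }

    interface DestinationClient {
        void pushPage(int pageId, byte[] page) throws java.io.IOException;
    }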

  19. Normal Operation
  • Index wireframe un-frozen
  [Diagram: the destination owns all pages P1…Pn and executes transactions TDn+1,…,TDp]

  20. Artifacts of this Design
  • Once migrated, pages are never pulled back by the source
    • Transactions at the source accessing migrated pages are aborted (see the sketch below)
  • No structural changes to indices during migration
    • Transactions (at both nodes) that would make structural changes to indices abort
  • Destination "pulls" pages on demand
    • Transactions at the destination experience higher latency compared to normal operation
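  A short sketch of the first artifact above, with hypothetical names: since the source never pulls a page back, a source transaction that touches an already-migrated page is simply aborted.

    // Illustrative only; the guard and exception types are hypothetical.
    final class SourceAccessGuard {
        private final PageTable pageTable;   // ownership map from the earlier sketch

        SourceAccessGuard(PageTable pageTable) { this.pageTable = pageTable; }

        void checkAccess(int pageId, long txnId) {
            if (!pageTable.ownedBy(pageId, Owner.SOURCE)) {
                // The page already lives at the destination; abort instead of pulling it back.
                throw new TransactionAbortedException(
                        "txn " + txnId + ": page " + pageId + " has already been migrated");
            }
        }
    }

    class TransactionAbortedException extends RuntimeException {
        TransactionAbortedException(String message) { super(message); }
    }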

  21. Serializability (proofs in the paper)
  • Only concern is dual mode
    • In init and finish modes, only one node is executing transactions
  • Local predicate locking of internal index structures and exclusive page-level locking between nodes → no phantoms
  • Strict 2PL → transactions are locally serializable
  • Pages transferred only once
    • No Tdest → Tsource conflict dependency
  • Guaranteed serializability (a compact restatement follows below)
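  A compact restatement of the slide's argument, not the paper's formal proof: because each page moves from source to destination exactly once, every cross-node conflict is oriented from a source transaction to a destination transaction, so no cycle can span the two nodes.

    % Informal sketch, not the paper's proof. T_S runs at the source, T_D at the
    % destination, and both conflict on some page P during dual mode.
    \[
      P \text{ is transferred exactly once}
      \;\Longrightarrow\;
      \forall\, T_S, T_D \text{ conflicting on } P:\; T_S \prec T_D
      \;\Longrightarrow\;
      \text{no edge } T_D \rightarrow T_S
    \]
    \[
      \text{local strict 2PL} \Rightarrow \text{no cycle within a node}
      \quad\wedge\quad
      \text{no } T_D \rightarrow T_S \text{ edge} \Rightarrow \text{no cycle across nodes}
      \;\Longrightarrow\; \text{serializable}
    \]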

  22. Recovery (proofs in the paper)
  • Transaction recovery
    • For every database page, transactions at the source are ordered before transactions at the destination
    • After a failure, conflicting transactions are replayed in the same order
  • Migration recovery
    • Atomic transitions between migration modes
    • Logging and handshake protocols (sketched below)
    • Every page has exactly one owner
    • Bookkeeping at the index level
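  A sketch of how the "logging and handshake" bullet could look, with hypothetical interfaces and the MigrationMode enum from the earlier sketch: each mode transition is force-logged before it takes effect and acknowledged by the peer, so a crashed node can recover into the same migration mode as the other node.

    // Illustrative only; WriteAheadLog and PeerClient are hypothetical interfaces.
    final class MigrationController {
        private final WriteAheadLog log;    // durable log local to this node
        private final PeerClient peer;      // RPC stub to the other node
        private volatile MigrationMode mode = MigrationMode.NORMAL;

        MigrationController(WriteAheadLog log, PeerClient peer) {
            this.log = log;
            this.peer = peer;
        }

        void transitionTo(MigrationMode next) throws java.io.IOException {
            log.forceWrite("PREPARE " + next);   // survives a local crash mid-transition
            peer.acknowledgeTransition(next);    // handshake: the peer logs the same transition and acks
            log.forceWrite("COMMIT " + next);    // both nodes now agree on the mode
            mode = next;
        }

        MigrationMode currentMode() { return mode; }
    }

    interface WriteAheadLog { void forceWrite(String record) throws java.io.IOException; }
    interface PeerClient   { void acknowledgeTransition(MigrationMode next) throws java.io.IOException; }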

  23. Correctness (proofs in the paper)
  • In the presence of arbitrary repeated failures, Zephyr ensures:
    • Updates made to database pages are consistent
    • A failure does not leave a page without an owner
    • Both source and destination are in the same migration mode
  • Guaranteed termination and starvation freedom

  24. Extensions (details in the paper)
  • Replicated tenants
  • Sharded tenants
  • Allowing structural changes to the indices
    • Using shared lock managers in dual mode

  25. Implementation
  • Prototyped using H2, an open source OLTP database
    • Supports a standard SQL/JDBC API
    • Serializable isolation level
    • Tree indices
    • Relational data model
  • Modified the database engine
    • Added support for freezing indices
    • Page migration status maintained using the index
    • Details in the paper…
  • Tungsten SQL Router migrates JDBC connections during migration (a minimal client sketch follows below)
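  Because the prototype keeps the standard SQL/JDBC interface and a SQL router moves connections between nodes, the tenant's client code is unchanged by migration. A minimal client sketch against H2 over TCP; the router host, credentials, and table schema below are illustrative assumptions only.

    // Minimal JDBC client; the router host, credentials, and schema are assumptions.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class TenantClient {
        public static void main(String[] args) throws Exception {
            // The client talks to the SQL router, which re-points the connection to
            // whichever node (source or destination) currently serves the tenant.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:h2:tcp://router-host:9092/tenantDb", "sa", "");
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name FROM customers WHERE id = ?")) {
                stmt.setInt(1, 42);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }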

  26. Experimental Setup
  • Two database nodes, each with a DB instance running
  • Synthetic benchmark as the load generator
    • Modified YCSB to add transactions: small read/write transactions
  • Compared against stop-and-copy (S&C)

  27. Experimental Methodology
  • Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts (a reconstruction of this mix follows below)
  • Workload: 60 sessions, 100 transactions per session
  • Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
  • Default DB size: 100k rows (~250 MB)
  [Diagram: a system controller holding metadata issues the migrate command to the database nodes]
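  A hedged reconstruction of the workload described above, not the actual modified-YCSB code; the table name and schema are assumptions. Each transaction issues 10 operations drawn 80/15/5 from reads, updates, and inserts over a 100k-row table.

    // Reconstruction of the stated mix; schema and table name are assumptions.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.Random;

    final class WorkloadSession {
        private static final int OPS_PER_TXN = 10;
        private static final int NUM_ROWS = 100_000;
        private final Random rng = new Random();

        void runTransaction(Connection conn) throws Exception {
            conn.setAutoCommit(false);
            for (int i = 0; i < OPS_PER_TXN; i++) {
                double p = rng.nextDouble();
                int key = rng.nextInt(NUM_ROWS);
                if (p < 0.80) {                  // 80% reads
                    execute(conn, "SELECT payload FROM usertable WHERE id = " + key);
                } else if (p < 0.95) {           // 15% updates
                    execute(conn, "UPDATE usertable SET payload = 'v" + rng.nextInt(1000) + "' WHERE id = " + key);
                } else {                         // 5% inserts
                    execute(conn, "INSERT INTO usertable (payload) VALUES ('v" + rng.nextInt(1000) + "')");
                }
            }
            conn.commit();
        }

        private void execute(Connection conn, String sql) throws Exception {
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.execute();
            }
        }
    }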

  28. Results Overview
  • Downtime (tenant unavailability)
    • S&C: 3–8 seconds (the time needed to migrate, during which the tenant is unavailable for updates)
    • Zephyr: no downtime; either the source or the destination is always available
  • Service interruption (failed operations)
    • S&C: on the order of hundreds to thousands of operations; all transactions with updates are aborted
    • Zephyr: on the order of tens to hundreds of operations; orders of magnitude less interruption

  29. Results Overview (continued)
  • Average increase in transaction latency (compared to the 6,000-transaction workload without migration)
    • S&C: 10–15% (cold cache at the destination)
    • Zephyr: 10–20% (pages fetched on demand)
  • Data transfer
    • S&C: the persistent database image
    • Zephyr: 2–3% additional data transfer (messaging overhead)
  • Total time taken to migrate
    • S&C: 3–8 seconds; unavailable for any writes
    • Zephyr: 10–18 seconds; no unavailability

  30. Failed Operations
  [Chart: Zephyr incurs orders of magnitude fewer failed operations than S&C]

  31. Contributions
  • Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
  • The first end-to-end solution with safety, correctness, and liveness guarantees
  • Prototype implementation on a relational OLTP database
  • Low cost on a variety of workloads

  32. Back-up

  33. More details
  [Diagram: transactions executing at the source; source and destination index structures shown]

  34. Freeze indexes
  [Diagram: transactions at the source with the indexes frozen; source and destination shown]

  35. Duplicate indexes with sentinels
  [Diagram: the frozen index structure duplicated at the destination with sentinels; source and destination shown]

  36. Dual Mode
  [Diagram: transactions executing at both the source and the destination]

  37. Finish Mode
  [Diagram: transactions executing at the destination; source and destination shown]

  38. Finish Mode (continued)
  [Diagram: transactions executing at the destination; source and destination shown]

  39. Guarantees
  • Either the source or the destination is serving the tenant
    • No downtime
  • Serializable transaction execution
    • Unique page ownership
    • Local multi-granularity locking
  • Safety in the presence of failures
    • Transactions are atomic and durable
    • Migration state is recovered from the log
    • Ensures consistency of the database state

  40. Migration Cost Analysis
  • Wireframe copy
    • Typically orders of magnitude smaller than the data
  • Operational overhead during migration
    • Extra data (in addition to database pages) transferred
    • Transactions aborted during migration
  (An illustrative cost decomposition follows below.)
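  One illustrative way to read the list above as a single expression; this decomposition is a paraphrase of the slide, not a formula from the paper.

    % Illustrative decomposition of migration cost; not taken from the paper.
    \[
      \mathrm{Cost_{migration}} \;\approx\;
          \underbrace{C_{\mathrm{wireframe}}}_{\text{index structure copy}}
        + \underbrace{C_{\mathrm{overhead}}}_{\text{per-transaction slowdown}}
        + \underbrace{C_{\mathrm{extra}}}_{\text{data beyond the DB pages}}
        + \underbrace{C_{\mathrm{aborts}}}_{\text{transactions aborted during migration}}
    \]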

  41. Effect of Inserts on Zephyr
  [Chart: effect of inserts on Zephyr; failures are due to attempted modifications of the index structure]

  42. Average Transaction Latency
  [Chart: average transaction latency; only committed transactions are reported; both migration types suffer loss of cache, and Zephyr additionally incurs remote page fetches]
