Learn about a high availability virtual machine system that protects VMs from host failures, ensuring prompt recovery, memory efficiency, and resource optimization.
Memory-efficient Virtual Machine High Availability
Karen Kai-Yuan Hou, Prof. Kang G. Shin (University of Michigan)
Mustafa Uysal (VMware)
Arif Merchant (HP Labs)
Sharad Singhal (HP Labs)
Protect VM from Host Failures
• Set up a backup by replicating the primary VM
• The backup takes over execution promptly if the primary fails
• High memory cost. E.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup.
[Figure: a primary VM (App 1, App 2) on the primary host and a backup VM on the backup host, each on its own hypervisor; the primary host suffers a physical host failure]
Use Shared Storage
• “Maintain” the backup VM in storage instead of RAM
• Improve resource and energy efficiency. Recover anywhere.
[Figure: primary VMs (App 1, App 2) and other primary (active) VMs run on Host 1, Host 2, ..., Host n, each with its own hypervisor; the backup VM is kept in shared storage]
Protection: Tracking Primary VM State
• Take checkpoints of the primary VM
• Incremental, periodic, copy-on-write checkpoints
[Figure: the memory space of the primary VM (App 1, App 2) is checkpointed into the VM fail-over image]
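The checkpointing idea on this slide can be illustrated with a toy model. This is a minimal sketch, not the prototype's implementation: `PrimaryVM`, its methods, the 4 KiB `PAGE_SIZE`, and the dictionary standing in for the fail-over image are all hypothetical names chosen for illustration. It shows why a checkpoint is incremental (only pages dirtied since the last checkpoint are written) and copy-on-write (the guest keeps running while the checkpoint commits, and a page is copied only if the guest overwrites it before it has been flushed).

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PrimaryVM:
    """Toy model of guest memory with dirty-page tracking (hypothetical)."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE)] * num_pages
        self.dirty = set()       # pages written since the last checkpoint began
        self.protected = set()   # pages belonging to the in-flight checkpoint
        self.cow_copies = {}     # page number -> contents as of checkpoint time

    def write(self, page_no, data):
        # Copy-on-write: preserve the checkpoint-time contents the first time
        # the guest overwrites a page that has not been committed yet.
        if page_no in self.protected and page_no not in self.cow_copies:
            self.cow_copies[page_no] = self.pages[page_no]
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def begin_checkpoint(self):
        # Short pause: only the dirty set is captured, no page is copied here.
        self.protected, self.dirty = self.dirty, set()
        self.cow_copies = {}

    def commit_checkpoint(self, image):
        # Runs while the guest executes; writes only the pages dirtied in the
        # previous interval and returns how many were written.
        for page_no in self.protected:
            image[page_no] = self.cow_copies.get(page_no, self.pages[page_no])
        written = len(self.protected)
        self.protected = set()
        self.cow_copies = {}
        return written
```

In a periodic setup, `begin_checkpoint` followed by `commit_checkpoint` would be invoked once per interval; a write that lands between the two calls is kept for the next checkpoint rather than leaking into the current one.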
Fail-over: Bringing Up Backup VM
• Slim VM Restore
  • Load only necessary information and switch on the backup VM quickly
  • Fetch pages on demand as the backup VM executes
[Figure: the restored backup VM (App 1, App 2) fills its memory space from the VM fail-over image]
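Demand fetching during fail-over can be sketched in the same spirit. This is an illustrative model under stated assumptions, not the prototype's restore path: `RestoredVM`, the `fetches` counter, the dictionary used as the fail-over image, and the rule that never-checkpointed pages read as zeros are all assumptions made for the example. The point is that the backup VM starts with no guest pages resident and pays the fetch cost only for pages it actually touches.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class RestoredVM:
    """Toy backup VM that starts with no guest pages resident (hypothetical)."""

    def __init__(self, image, num_pages):
        self.image = image          # fail-over image: page number -> bytes
        self.num_pages = num_pages
        self.resident = {}          # pages fetched so far
        self.fetches = 0            # number of demand fetches performed

    def read(self, page_no):
        if page_no not in self.resident:
            # First touch acts like a page fault: fetch from the image.
            # Pages absent from the image are assumed to be zero-filled.
            self.resident[page_no] = self.image.get(page_no, bytes(PAGE_SIZE))
            self.fetches += 1
        return self.resident[page_no]
```

Constructing `RestoredVM` copies no page data, which mirrors the "switch on quickly" step; repeated reads of the same page hit the resident copy and trigger no further fetch.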
Improving I/O Efficiency with SSDs
• Small, random I/Os are more efficient on SSDs
• Primary side: updating the VM image continuously (small, random writes)
• Restore side: fetching from the VM image on demand (small, random reads)
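To see why both sides generate small, random I/O, consider a fail-over image stored as a flat file in which page `n` lives at byte offset `n * PAGE_SIZE`. The layout, the helper names `write_page` and `read_page`, and the 4 KiB page size are assumptions for illustration only; the slides do not describe the on-disk format. Each checkpointed or fetched page then becomes one page-sized access at an offset determined by which page the guest happened to touch.

```python
import tempfile

PAGE_SIZE = 4096  # assumed page size in bytes


def write_page(f, page_no, data):
    """Primary side: overwrite one page of the image in place."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    f.seek(page_no * PAGE_SIZE)
    f.write(data)


def read_page(f, page_no):
    """Restore side: fetch one page of the image on demand."""
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


def demo():
    # A temporary file stands in for the image on shared storage.
    with tempfile.TemporaryFile() as f:
        f.truncate(16 * PAGE_SIZE)      # 16-page image, initially zeros
        for page_no in (9, 2, 14):      # dirty pages arrive in no particular order
            write_page(f, page_no, bytes([page_no]) * PAGE_SIZE)
        f.flush()
        # Demand fetches also arrive in guest access order, not file order.
        return read_page(f, 14), read_page(f, 2), read_page(f, 0)
```

Calling `demo()` returns the contents of pages 14, 2, and 0 in that order; page 0 was never written, so it reads back as zeros.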
Preliminary Evaluation
• Prototype built on Xen 3.3.2
• Questions
  • How much overhead does continuous checkpointing introduce on the primary VM?
  • How does the shared storage support continuous updating of the fail-over image?
  • How quickly can our system bring up a backup VM?
  • How does the backup VM perform when it executes by fetching pages on demand?
Checkpointing Overheads
• Kernel Compilation
• RUBiS
CoW and SSD Enhancements
• CoW reduces VM pause time for taking checkpoints
• Checkpoints commit faster on an SSD
Fail-over Time and Demand Fetching
• Time required to bring up a backup VM
• Overheads of fetching VM pages on demand
Interesting Observations: Page Fetching Behavior
• How a VM uses (demand fetches) its pages while compiling a kernel:
Interesting Observations: Page Fetching Behavior
• What actually happens on disk (recorded by blktrace):
Conclusions