This article explores a fast rejuvenation technique for server consolidation using virtual machines. It focuses on the software aging of a virtual machine monitor (VMM) and proposes a method to proactively prevent performance degradation. The technique involves on-memory suspension and resumption of VMs, avoiding the need for OS reboots and reducing downtime. Experiments show significant improvements in downtime and performance degradation compared to traditional methods.
A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines. Kenichi Kourai and Shigeru Chiba, Tokyo Institute of Technology
Server consolidation with VMs • Server consolidation is widely carried out • Multiple server machines are integrated on one physical machine • Recently, this is done using virtual machines (VMs) • VMs run on a virtual machine monitor (VMM) • The VMM multiplexes resources among the VMs [Figure: multiple VMs on a VMM on top of the hardware]
Software aging of VMMs • Software aging of a VMM is critical • Software aging is... • The phenomenon that software state degrades with time • E.g. exhaustion of system resources • Software aging of a VMM affects all VMs on it • E.g. performance degradation [Figure: multiple VMs on an aging VMM]
Software rejuvenation of VMMs • Preventive maintenance • Performed before software aging of a VMM affects its VMs • Occasionally stops a VMM, cleans its internal state, and restarts it • Typical example: rebooting a VMM • Cleans the internal state automatically and completely • The easiest way
Drawbacks (1/2): Increasing service downtime • The VMM reboot needs: • Rebooting all OSes running on the VMs • This time tends to be long • Larger number of VMs • Longer startup time of services • A hardware reset • The BIOS power-on self test is time-consuming [Figure: timeline of OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]
Drawbacks (2/2): Performance degradation • The file cache is lost by the OS reboot • OSes cannot restore performance until the file cache is re-filled • They strongly rely on the file cache to speed up file accesses • Re-filling tends to take a long time • The file cache size is increasing • Large amount of memory for a VM • Free memory is used as the file cache [Figure: a process reading through the OS file cache instead of the disk]
Warm-VM reboot • Fast rejuvenation technique • Efficiently reboots only a VMM • The VMM reboot causes no OS reboot • Basic idea • Suspend all VMs before the VMM reboot • Resume them after the reboot • Challenge • How does a VMM efficiently deal with the large memory images of VMs?
On-memory suspend of VMs • Freezes the memory images of VMs on the main memory • That memory area is just reserved • The time does not depend on the memory size • Saving them to a slow disk is inefficient • ACPI S3 state for VMs • Suspend To RAM • Traditional suspend is the ACPI S4 state [Figure: the VM image is frozen in main memory, not written to disk]
On-memory resume of VMs • Unfreezes the memory images preserved on the main memory • They are reused directly as the memory of VMs • No need to read them from a slow disk • The file cache of OSes is also restored • No performance degradation [Figure: the VM image is unfrozen from main memory, not read from disk]
Quick reload of VMMs • Directly boots a new VMM without a hardware reset • The memory images of VMs are preserved through the VMM reboot • Software can keep track of them • A hardware reset does not guarantee this • A VMM is rebooted quickly • No overhead due to a hardware reset [Figure: the old VMM preloads the new VMM while VM images stay in main memory]
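The three steps above (on-memory suspend, quick reload, on-memory resume) can be sketched as a toy model. This is only an illustration of the idea, not the authors' implementation: `ToyVMM`, `quick_reload`, `warm_vm_reboot`, and the `physical_memory` dictionary are hypothetical names, and real frame bookkeeping in a VMM is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVMM:
    """Hypothetical toy VMM: tracks which physical frames belong to which VM."""
    generation: int
    running: dict = field(default_factory=dict)  # VM name -> list of frame numbers
    frozen: dict = field(default_factory=dict)   # VM name -> list of frame numbers

    def on_memory_suspend(self, name):
        # Freeze: only bookkeeping moves; no page contents are copied or saved.
        self.frozen[name] = self.running.pop(name)

    def on_memory_resume(self, name):
        # Unfreeze: the preserved frames become the VM's memory again.
        self.running[name] = self.frozen.pop(name)


def quick_reload(old_vmm):
    """Start a fresh VMM without a hardware reset, handing over the frozen-frame table."""
    return ToyVMM(generation=old_vmm.generation + 1, frozen=dict(old_vmm.frozen))


def warm_vm_reboot(vmm):
    """Suspend all VMs, reload the VMM, then resume all VMs."""
    for name in list(vmm.running):
        vmm.on_memory_suspend(name)
    new_vmm = quick_reload(vmm)
    for name in list(new_vmm.frozen):
        new_vmm.on_memory_resume(name)
    return new_vmm


# Physical memory is modelled as a dict that outlives any single VMM instance.
physical_memory = {0: "vm1 kernel", 1: "vm1 file cache", 2: "vm2 kernel"}
vmm = ToyVMM(generation=1, running={"vm1": [0, 1], "vm2": [2]})
new_vmm = warm_vm_reboot(vmm)
print(new_vmm.generation, new_vmm.running)
```

In this model the suspend step touches only the frame table, which is why its cost does not grow with the size of the memory image, and the file cache contents in `physical_memory` are still in place after the reboot.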
Comparison with other methods • Cold-VM reboot • Needs the OS reboot • Saved-VM reboot • A naive implementation of the warm-VM reboot • VMs are saved into a disk
Model for availability • Must consider the software rejuvenation of both a VMM and OSes • Warm-VM reboot • The OS rejuvenation is independent • Cold-VM reboot • The OS rejuvenation is affected by the VMM rejuvenation • The number of OS rejuvenations increases [Figure: timelines of OS rejuvenation and VMM rejuvenation for each method]
RootHammer • We have implemented the warm-VM reboot in Xen 3.0.0 • On-memory suspend/resume • Based on Xen's suspend/resume • Manages the mapping from the VM memory to the physical memory • Quick reload • Based on the kexec mechanism in Linux • Kexec for a VMM is included in the latest Xen • It is not for reusing the memory images [Figure: mapping from VM memory to physical memory]
Experiments • Examine whether the warm-VM reboot reduces downtime and performance degradation • Comparison • Cold-VM reboot with the OS reboot • Saved-VM reboot using Xen's suspend/resume [Figure: server running Linux VMs on the VMM (2 dual-core Opterons, 12 GB SDRAM, 15,000 rpm SCSI disk) connected over gigabit Ethernet to a Linux client]
Performance of on-memory suspend/resume • Suspend/resume of one VM with 11 GB of memory • Ours: 1 sec • Xen's: 280 sec • Depends on the memory size • Suspend/resume of 11 VMs • Ours: 4 sec • OS reboot: 58 sec • Depends on the number of VMs
Effect of quick reload • The time of rebooting a VMM with no VMs • Warm-VM reboot • 11 sec • The time of quick reload is negligible • Cold-VM reboot • 59 sec • The time due to a hardware reset is 48 sec
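As a quick consistency check on these numbers, the snippet below subtracts the hardware reset from the cold-VM reboot time. It assumes the reboot time is simply the sum of its phases, which the slide does not state explicitly; the variable names are illustrative.

```python
# Measured times from the slide, in seconds (VMM reboot with no VMs).
cold_vmm_reboot_s = 59
hardware_reset_s = 48
warm_vmm_reboot_s = 11

# Assumption: reboot time is additive, so removing the reset leaves the rest.
cold_without_reset_s = cold_vmm_reboot_s - hardware_reset_s
quick_reload_overhead_s = warm_vmm_reboot_s - cold_without_reset_s

print(cold_without_reset_s, quick_reload_overhead_s)
```

Under that assumption, the cold-VM reboot without its hardware reset takes the same time as the warm-VM reboot, which is consistent with the time of quick reload being negligible.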
Downtime of services • Warm-VM reboot • Always the same • 42 sec • Saved-VM reboot • Depends on # of VMs • 429 sec (11 VMs) • Cold-VM reboot • Affected by the service type • 157 sec (sshd) • 241 sec (JBoss)
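The relative downtime reduction of the warm-VM reboot follows directly from the figures above. The helper name `reduction_percent` is made up for this sketch.

```python
WARM_DOWNTIME_S = 42  # warm-VM reboot, independent of the number of VMs

# Downtime of the other methods, in seconds, as reported on the slide.
baselines_s = {
    "saved-VM reboot (11 VMs)": 429,
    "cold-VM reboot (sshd)": 157,
    "cold-VM reboot (JBoss)": 241,
}


def reduction_percent(baseline_s, improved_s=WARM_DOWNTIME_S):
    """Percentage by which downtime shrinks when moving from baseline to improved."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


for name, seconds in baselines_s.items():
    print(f"{name}: {reduction_percent(seconds):.1f}% less downtime")
```

Against the cold-VM reboot with JBoss, the reduction rounds to 83%.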
Availability of JBoss • The warm-VM reboot achieves four 9s • Assumptions • OS rejuvenation every week • 34 sec • VMM rejuvenation every 4 weeks • 0.5 week after the last OS rejuvenation [Figure: timeline of weekly OS rejuvenation with the VMM rejuvenation 0.5 week after an OS rejuvenation]
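A back-of-the-envelope version of the warm-VM case can be written as follows. It is a simplified reading of the assumptions above, not the authors' availability model: it treats 34 sec as the downtime of one OS rejuvenation, takes 42 sec as the downtime of one warm-VM reboot, and counts only these planned outages over one 4-week cycle. The function name and parameters are illustrative.

```python
WEEK_S = 7 * 24 * 3600  # seconds in one week


def warm_vm_availability(os_rejuv_s=34, vmm_rejuv_s=42,
                         os_period_weeks=1, vmm_period_weeks=4):
    """Availability over one VMM rejuvenation cycle, counting planned downtime only."""
    cycle_s = vmm_period_weeks * WEEK_S
    # With the warm-VM reboot, OS rejuvenation runs on its own schedule.
    os_events = vmm_period_weeks // os_period_weeks
    downtime_s = os_events * os_rejuv_s + vmm_rejuv_s
    return 1.0 - downtime_s / cycle_s


availability = warm_vm_availability()
print(f"{availability:.6f}")
```

With these inputs the planned downtime is 178 sec in 4 weeks, which lands between 99.99% and 99.999%, i.e. four 9s.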
Performance degradation • The throughput of the Apache web server • before and after the VMM reboot • Warm-VM reboot • No degradation • Cold-VM reboot • Degraded by 69%
Software rejuvenation in a cluster environment • Clustering achieves zero downtime • Multiple hosts can provide the same service • Let us consider the total throughput of all hosts in a cluster • Warm-VM reboot • (m-1)p for 42 sec • Cold-VM reboot • (m-1)p for 241 sec • (m-0.69)p for a while after the reboot • m: number of hosts, p: throughput of one host [Figure: total throughput over time t, dropping from mp to (m-1)p during the reboot]
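The throughput levels on this slide can be written as a piecewise function of the time since one host starts its VMM reboot. This is a sketch under stated assumptions: the durations 42 sec and 241 sec are the downtimes measured earlier, and `degraded_for_s` is a hypothetical parameter, because the slide only says the degradation lasts "for a while".

```python
def total_throughput(t, m, p, method, degraded_for_s=0.0):
    """Total cluster throughput t seconds after one of m hosts starts its VMM reboot.

    p is the throughput of one host; degraded_for_s is an assumed duration of the
    post-reboot degradation for the cold-VM reboot (not given in the source).
    """
    if method == "warm":
        return (m - 1) * p if t < 42 else m * p
    if method == "cold":
        if t < 241:
            return (m - 1) * p           # the rebooting host serves nothing
        if t < 241 + degraded_for_s:
            return (m - 0.69) * p        # the rebooted host runs degraded by 69%
        return m * p
    raise ValueError(f"unknown method: {method}")


# Example: 4 hosts, 100 requests/sec each, 10 minutes of assumed degradation.
for t in (10, 100, 300, 1000):
    print(t, total_throughput(t, 4, 100.0, "warm"),
          total_throughput(t, 4, 100.0, "cold", degraded_for_s=600))
```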
Comparison with VM migration in a cluster environment • VM migration achieves nearly zero downtime • VMs are moved to another host • Xen's live migration, VMware's VMotion • Total throughput • Normal run • (m-1)p • One host is reserved for migration • Live migration • (m-1.12)p [Figure: total throughput over time t, with levels mp and (m-1)p and durations of 42 sec and 17 min]
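To compare the two approaches over a longer period, one can average the total throughput across a rejuvenation interval. The sketch below reads the figure labels as 42 sec at (m-1)p for the warm-VM reboot and 17 min at (m-1.12)p for live migration; that reading, the function names, and the `interval_s` parameter are assumptions rather than statements from the slides.

```python
def average_throughput_warm(m, p, interval_s, downtime_s=42):
    """Average throughput when one host does a warm-VM reboot once per interval."""
    normal_s = interval_s - downtime_s
    return (m * p * normal_s + (m - 1) * p * downtime_s) / interval_s


def average_throughput_migration(m, p, interval_s, migration_s=17 * 60):
    """Average throughput when one host is kept in reserve and VMs are live-migrated."""
    normal_s = interval_s - migration_s
    return ((m - 1) * p * normal_s + (m - 1.12) * p * migration_s) / interval_s


FOUR_WEEKS_S = 4 * 7 * 24 * 3600  # assumed rejuvenation interval
warm = average_throughput_warm(4, 100.0, FOUR_WEEKS_S)
migration = average_throughput_migration(4, 100.0, FOUR_WEEKS_S)
print(warm > migration)
```

Under these assumptions the reserved host dominates the difference: migration gives up one host's throughput all the time, while the warm-VM reboot gives it up only during the reboot.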
Related work • Microreboot [Candea et al. '04] • Reboots only some of the subcomponents • The warm-VM reboot enables rebooting only a parent component (the VMM for VMs) • Checkpointing/restart [Randell '75] • Saves/restores OS processes • Similar to suspend/resume of VMs • Optimizations of suspend/resume • Incremental suspend, compression of memory images
Conclusion • We proposed the warm-VM reboot • On-memory suspend/resume • Freezes/unfreezes the memory images of VMs • Quick reload • Preserves the memory images through the VMM reboot • It achieved fast rejuvenation • Downtime reduced by 83% at maximum • No performance degradation