This paper presents a novel approach, LUCOS, for live updating operating systems using virtualization techniques. It addresses the inherent challenges of operating systems, including security vulnerabilities and design flaws, by facilitating the application of patches without interrupting system availability. LUCOS eliminates the need for safe points in updates, supporting rolling back of patches and overcoming issues such as deadlocks and system crashes. Our experiments validate its effectiveness, achieving less than 1ms update time with minimal performance overhead.
Live Updating Operating Systems Using Virtualization • Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang (Fudan University) • Pen-Chung Yew (University of Minnesota at Twin-Cities)
Motivation • Operating systems are far from perfect: • Security holes, design flaws, bugs, new features … • Result: continuous patches and upgrades are required • Difficulties in applying patches and upgrades: • Disruptive: loss of availability • Irreversible: risk of system crash • A live update feature is highly desirable and, very often, critical.
What Do Contemporary Operating Systems (COS) Miss? • Requirements to live update an OS: • Define an updatable unit • Difficult, because a COS is monolithic • Apply the patch at a safe point • Some hot spots have no safe point • Root file system, network modules • Consistency • Difficult for an OS to update itself
What is LUCOS? • “Any problem in computer science can be solved with another level of indirection.” • David Wheeler, quoted in Butler Lampson’s 1992 ACM Turing Award lecture. • Live Updating Contemporary Operating Systems using virtualization • Use a Virtual Machine Monitor (VMM) to patch operating systems (e.g. Linux) • Avoid the need for a safe point by allowing the old and new versions of data structures to coexist. • The VMM maintains coherence between the two versions and tracks when the live update can finish.
What is LUCOS? • A practical live updating system • Applies a broad range of real-life Linux patches on the fly • Requires no safe points and retains OS transparency. • Supports patches that recover from a tainted state (e.g. a deadlock) • Allows rolling back committed patches • Requires minimal update time (< 1 ms) and incurs negligible performance overhead (less than 1%)
Some Existing Efforts • Dynamic Software Update • Focus on live update to application software • LUCOS: live update to operating systems • K42 (Baumann et al., Usenix ‘05) • A new operating system to support live update • Tightly bound to object-oriented design techniques • A safe point is desirable • LUCOS: transparently supports existing OS (including non-object-oriented), requires no safe point
Two Types of Live Updates • Updates to code only: • Only code is modified. • Updates to code with data changes: • The changed data may be global, single-instance, or multiple-instance.
Termination of a Live Update • When all threads leave original functions • Stack inspection (Altekar, Usenix Security’05): • Maintain a list of threads executing in original functions • Remove threads that leave original functions • Terminate live update when the list is empty
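The termination rule on this slide can be sketched in user-space C. This is only an illustration under stated assumptions, not the LUCOS implementation: the `update_tracker` type, the fixed `MAX_TRACKED` capacity, and all function names are hypothetical, and the initial thread list is passed in as an argument rather than being obtained by inspecting kernel stacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 64  /* hypothetical capacity, for illustration only */

/* Threads that were still executing an original (unpatched) function
 * when the update was applied. */
struct update_tracker {
    int    tids[MAX_TRACKED];
    size_t count;
};

/* Seed the tracker with the threads found inside original functions.
 * Assumption: a real system would build this set by stack inspection. */
static void tracker_init(struct update_tracker *t, const int *tids, size_t n)
{
    assert(n <= MAX_TRACKED);
    t->count = n;
    for (size_t i = 0; i < n; i++)
        t->tids[i] = tids[i];
}

/* Record that a thread has left the original functions.
 * Unknown thread IDs are ignored. */
static void tracker_thread_left(struct update_tracker *t, int tid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->tids[i] == tid) {
            t->tids[i] = t->tids[t->count - 1];  /* swap-remove */
            t->count--;
            return;
        }
    }
}

/* The live update can terminate once the list is empty. */
static bool tracker_update_finished(const struct update_tracker *t)
{
    return t->count == 0;
}
```

A caller would seed the tracker once when the patch is applied, call `tracker_thread_left` each time a tracked thread returns out of the original code, and finish the update when `tracker_update_finished` returns true.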
Patches for Recovering Tainted State • Observation: • Some bugs can leave the system in a tainted state: • Deadlock situation • Simply patching the code cannot repair a state that is already tainted
Code 1: a buggy function with a potential for deadlocks.
    spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
    void foo(void) {
        ...;
        spin_lock(&demo_lock);
        ...;
        if (condition) { return; }   /* bug: returns with demo_lock held */
        ...;
        spin_unlock(&demo_lock);
    }
Code 2: a patch function to fix the deadlock problem.
    spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
    void foo_patch(void) {
        ...;
        spin_lock(&demo_lock);
        ...;
        if (condition) {
            spin_unlock(&demo_lock);
            return;
        }
        ...;
        spin_unlock(&demo_lock);
    }
Code 3: a callback function to recover from a deadlocked situation.
    void state_transfer(void) {
        if (spin_is_locked(&demo_lock))
            spin_unlock(&demo_lock);
    }
Patches for Recovering Tainted State • Solutions: • Allow callbacks in live update • Three types of callbacks in LUCOS: • function callbacks • thread callbacks • data callbacks • Example: use thread callbacks to resolve the deadlock situation
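Why the patch alone is not enough, and what the callback adds, can be shown with a single-threaded user-space model of Code 1 to Code 3. This is a sketch under assumptions: `struct demo_spinlock`, `demo_try_lock`, and `demo_unlock` are hypothetical stand-ins for the kernel spinlock API, the `condition` test is passed as a parameter, and a failed `demo_try_lock` stands in for a thread that would spin forever in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for a kernel spinlock; a flag is enough because
 * this sketch is single-threaded. */
struct demo_spinlock { bool locked; };

static struct demo_spinlock demo_lock = { false };

/* Returns false instead of spinning forever when the lock is held. */
static bool demo_try_lock(struct demo_spinlock *l)
{
    if (l->locked)
        return false;
    l->locked = true;
    return true;
}

static void demo_unlock(struct demo_spinlock *l)
{
    assert(l->locked);  /* unlocking a free lock would be a bug */
    l->locked = false;
}

/* Original function (Code 1): the early return leaks the lock. */
static bool foo(bool condition)
{
    if (!demo_try_lock(&demo_lock))
        return false;           /* would deadlock in the kernel */
    if (condition)
        return true;            /* BUG: demo_lock is still held */
    demo_unlock(&demo_lock);
    return true;
}

/* Patched function (Code 2): releases the lock on every path. */
static bool foo_patch(bool condition)
{
    if (!demo_try_lock(&demo_lock))
        return false;           /* still stuck if the state is tainted */
    if (condition) {
        demo_unlock(&demo_lock);
        return true;
    }
    demo_unlock(&demo_lock);
    return true;
}

/* Callback (Code 3): repairs the state left behind by the buggy code. */
static void state_transfer(void)
{
    if (demo_lock.locked)
        demo_unlock(&demo_lock);
}
```

If `foo(true)` has already run, `foo_patch` cannot acquire the lock even though its own code is correct; only after `state_transfer` releases the leaked lock does the patched function make progress.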
Patch Rollback • A special type of patch: • Uses the original code and data to patch the committed ones • Transfers state held in the new data back to the original data • Resource overhead: • The original code and data must be kept in memory
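The idea that a rollback is itself a patch, made possible by keeping the original in memory, can be sketched with a function pointer. This is an assumption-laden illustration: the slides do not say how LUCOS redirects calls, so the `patch_slot` type, the `handler_v1`/`handler_v2` functions, and the pointer-swap mechanism are all hypothetical, and data state transfer is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int handler_v1(int x) { return x + 1; }  /* original code */
static int handler_v2(int x) { return x + 2; }  /* patched code  */

/* One updatable unit: the active entry point plus the version it replaced.
 * Keeping `saved` around is the memory cost of supporting rollback. */
struct patch_slot {
    handler_fn active;
    handler_fn saved;   /* NULL until a patch has been applied */
};

/* Apply a patch: remember the current code, then switch to the new code. */
static void patch_apply(struct patch_slot *s, handler_fn replacement)
{
    assert(replacement != NULL);
    s->saved = s->active;
    s->active = replacement;
}

/* Roll back: apply the saved version as a patch over the committed one. */
static void patch_rollback(struct patch_slot *s)
{
    assert(s->saved != NULL);   /* nothing to roll back to otherwise */
    patch_apply(s, s->saved);
}
```

Because `patch_rollback` reuses `patch_apply`, the committed version ends up in `saved` after a rollback, so in this sketch a second rollback would reinstate the patch.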
Experimental Setup • Implemented on Linux 2.6.10 running on Xen 2.0.5. • System: • Fedora Core 2 distribution • 3.0 GHz Pentium IV with 1 GB RAM • Intel Pro 100/1000 Ethernet NIC on a 100 Mb/s LAN • A single 250 GB 7200 RPM SATA disk.
Workloads • SPEC INT 2000: • Measures the performance of CPU-intensive workloads • Linux build time: • Measures the overall time to build a Linux 2.6.10 kernel with gcc-3.3.3. • Open Source Database Benchmark suite (OSDB): • Information Retrieval (IR) • Online Transaction Processing (OLTP)
Experience with Real-Life Patches • Five typical patches selected from Linux upgrades: • upgrade of Linux kernel from 2.6.10 to 2.6.11 • upgrade of backend block device drivers in Xen-Linux
Time to Apply and Roll Back Live Updates • Note: OSDB-IR/OLTP are running in the background when the patches are applied and rolled back.
Conclusions • Existing operating systems can be live updated • No safe point is required • Patches should recover tainted state • Rollback of a live update is supported • Time overhead to apply a live update is minimal • Performance overhead is negligible
Future Work • Avoid the performance overhead of virtualization • Integrate it with our self-virtualization system • Virtualize operating systems on demand
Questions? • Our contact information: • Parallel Processing Institute, Fudan University, China • Phone: +86-21-51355363 • Fax: +86-21-65646571
Patch File Format in LUCOS • Follows the format of Linux kernel modules, and adds • New declarations of data structures • *Callback functions • *Patch startup and patch cleanup functions • *State transfer
Fine-grained memory protection • Facilitating ECC memory (Qin et al., HPCA’05) • cache line granularity • Mondrian memory protection (Witchel et al., ASPLOS-X) • word level memory protection
Self-virtualization: architecture • The OS can switch between the three modes quickly, on the fly • Applications are completely unaware of the mode switch • Hosting mode is used to host other OSes. • Migrating mode prepares the OS to self-migrate to another machine.