Difference Engine: Harnessing Memory Redundancy in Virtual Machines Diwaker Gupta et al., OSDI 08 presenter: Dongwoo Kang rediori@dankook.ac.kr
Contents • Introduction • DE(Difference Engine) • Sharing • Patching • Compress • Implementation • Evaluation • Conclusion
Virtualization • Virtualization has improved and spread over the past decade • Benefits of virtualization • High CPU utilization • Individual server machines often run at only 5-10% CPU utilization • Servers are over-provisioned for peak levels of demand • Isolation between services source : http://www.bcgsystems.com/blog/bid/38584/Server-Virtualization-Enabling-Technology-for-the-Masses
CPU • Many-core CPUs support virtualization • Can run many VMs source : http://gigglehd.com/zbxe/4164557 source : http://software.intel.com/en-us/articles/planning-for-parallel-optimization/
Memory for Virtualization • A host with 1GB of RAM • Multiplexing 10 VMs on that host, however, would allocate each just 100MB • Increasing memory capacity • Needs extra slots on the motherboard • Needs higher-capacity modules • Consumes significant power • High-density memory is expensive • CPUs are well suited to multiplexing, but main memory is not source : http://www.legitreviews.com/article/1882/4/
Techniques to decrease memory usage • VMware ESX Server • Content-based page sharing • Reduces memory by 10-40% on homogeneous VMs • Disco • TPS (transparent page sharing) • But requires modifying the guest VM YeoungUng Park PPT in 2011 source : http://www.rpgminiatures.com/acatalog/4238_Chimera_-_War_Drums_-_Rare.html
Overview of Difference Engine • Identical pages -> sharing • Similar pages -> patching • Remaining pages -> compression source : http://www.khulsey.com/tm_ferrari_f1_engine.html
1. DE – Sharing • Scan memory to find identical pages • Whole-page sharing • How? • Content-based page sharing • Uses the SuperFastHash function • Then performs a byte-by-byte comparison to confirm a match • Write protection • Marks the shared page as read-only • Writing to the shared page causes a page fault, trapped by the VMM • The VMM creates a private copy and updates the virtual memory mapping
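The sharing scan above can be sketched in Python. This is a minimal stand-in, not the Xen implementation: `hashlib.sha1` replaces SuperFastHash, pages are plain byte strings, and the page-id keys are hypothetical. As in Difference Engine, a hash hit is confirmed with a byte-by-byte comparison before sharing.

```python
import hashlib

PAGE_SIZE = 4096

def share_identical_pages(pages):
    """Map each page to a canonical copy when contents are identical.

    pages: dict of page_id -> bytes (each PAGE_SIZE long).
    Returns dict page_id -> page_id of the shared (read-only) copy.
    """
    by_hash = {}    # hash digest -> page_ids already seen with that digest
    canonical = {}  # page_id -> canonical page_id
    for pid, data in pages.items():
        digest = hashlib.sha1(data).digest()  # stand-in for SuperFastHash
        for other in by_hash.get(digest, []):
            if pages[other] == data:          # byte-by-byte confirmation
                canonical[pid] = other        # share; real VMM write-protects
                break
        else:
            by_hash.setdefault(digest, []).append(pid)
            canonical[pid] = pid              # first copy stays as-is
    return canonical
```

A real VMM would additionally mark the shared frame read-only so a later write faults and triggers copy-on-write.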
2. DE-Patch • Patching • Sub-page sharing • Reduces the memory required to store similar pages • Creates a patch • Uses Xdelta, a binary diff program • patch + reference page = target page • Doesn't create a patch unless it is smaller than half a page
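A sketch of the patching idea, with a naive byte-wise delta standing in for Xdelta (the real tool produces much more compact patches). The half-page cutoff mirrors the slide's rule; the 3-bytes-per-entry cost model is an assumption for illustration.

```python
PAGE_SIZE = 4096

def make_patch(reference, target):
    """Naive delta: list of (offset, new_byte) where the pages differ.
    Returns None when the patch would not beat half a page, mirroring
    DE's rule of discarding patches larger than PAGE_SIZE / 2."""
    assert len(reference) == len(target) == PAGE_SIZE
    diffs = [(i, target[i]) for i in range(PAGE_SIZE) if reference[i] != target[i]]
    # assume ~3 bytes per serialized entry (2-byte offset + 1-byte value)
    if 3 * len(diffs) >= PAGE_SIZE // 2:
        return None
    return diffs

def apply_patch(reference, patch):
    """patch + reference page = target page."""
    page = bytearray(reference)
    for offset, value in patch:
        page[offset] = value
    return bytes(page)
```

Pages that differ almost everywhere yield no patch and fall through to compression.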
2. DE-Patch • Effectiveness of sub-page sharing • Snapshot-based test • Each VM has 512MB • 77% savings [Chart: RUBiS; kernel compile; lmbench, Vim 7.0 compile; MIXED-1]
3. DE-Compress • Compresses pages that are not similar to anything • LZO and WKdm algorithms • How are pages selected for compression? • Clock • NRU (Not Recently Used) algorithm
4. DE-Paging • A good candidate page for swapping out • is one that will likely not be accessed in the near future • This is the same criterion as for compressed/patched pages • So compressed and patched pages are candidates for swapping out
Implementation • Implemented on top of Xen 3.0.4 • Roughly 14,500 lines of code • An additional 20,000 lines from ports of existing algorithms • Xdelta, LZO, WKdm
1. DE-Sharing • SuperFastHash • The hash table size is limited by Xen's heap • The hash table can hold entries for only 1/5 of physical memory • 1.76MB hash table • So, 5 passes are needed to scan all of memory source : http://www.azillionmonkeys.com/qed/hash.html
2. DE-Patch • Identifying candidate reference pages • Uses hashes of 64-byte blocks of each page (at randomly chosen, fixed portions) • HashSimilarityDetector(k,s) • k is the number of index groups • s is the number of hashes concatenated per key • HashSimilarityDetector(1,1): Index: AB • HashSimilarityDetector(1,2): Index: ABCD • HashSimilarityDetector(2,1): Index: AB, Index: CD • Higher values of s capture local similarity, while higher values of k incorporate global similarity
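The similarity index above can be sketched as follows. This is a hedged stand-in: `hashlib.sha1` (truncated) replaces the paper's hash, and `positions` is an assumed precomputed list of k*s block offsets (DE picks them pseudorandomly but uses the same offsets for every page).

```python
import hashlib

PAGE_SIZE = 4096
BLOCK = 64  # hash 64-byte blocks, as in the slide

def similarity_keys(page, k, s, positions):
    """HashSimilarityDetector(k, s) sketch: hash k*s fixed 64-byte blocks
    of the page and concatenate them s at a time into k index keys.
    Pages that collide on any key become patch candidates."""
    assert len(positions) == k * s
    hashes = [hashlib.sha1(page[p:p + BLOCK]).digest()[:4] for p in positions]
    return [b''.join(hashes[i * s:(i + 1) * s]) for i in range(k)]
```

With higher s a page must match on several co-located blocks at once (local similarity); with higher k it gets several independent chances to match (global similarity).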
2. DE-Patch • HashSimilarityDetector(k,s), c • c is the number of candidates [Chart: Apache; RUBiS; Sysbench; kernel compile; IOZone, dbench; lmbench, Vim 7.0 compile; MIXED-1; MIXED-2]
Clock • Not-Recently-Used policy • Uses R/M bits • Checks whether a page has been referenced/modified • C1 - Recently modified [M,R=1,1]: ignore • C2 - Modified, but not recently referenced [M,R=1,0]: sharing and reference pages for patching, but cannot be patched or compressed themselves • C3 - Not recently accessed [M,R=0,0]: shared or patched • C4 - Not accessed for an extended period: compression or swapping
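The C1-C4 classification can be sketched directly from the bits. The `idle_sweeps` counter and the `extended` threshold are assumptions standing in for "not accessed for an extended period" (several idle clock sweeps); the [M,R]=[0,1] combination is not enumerated on the slide and is conservatively skipped here.

```python
def classify(modified, referenced, idle_sweeps, extended=3):
    """Return (state, eligible actions) for a page given its M/R bits and
    how many clock sweeps it has stayed idle.
    `extended` (sweeps before C3 becomes C4) is an assumed tunable."""
    if modified and referenced:
        return "C1", []                            # recently modified: ignore
    if modified:
        # modified, but not recently referenced: may serve as a shared page
        # or a patch reference, but cannot itself be patched or compressed
        return "C2", ["share", "patch-reference"]
    if referenced:
        # [M,R]=[0,1] is not on the slide; treat like C1 and skip
        return "C1", []
    if idle_sweeps >= extended:
        return "C4", ["compress", "swap"]          # long idle: reclaim hard
    return "C3", ["share", "patch"]                # not recently accessed
```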
Clock Performance • Lifetime of patched and compressed pages
Compress • Operates similarly to patching • patch + reference page = target page • uncompress(compressed page) = target page • Interaction between compression and patching • Once a page is compressed, it can no longer be used as a reference for a later patched page • So the order is: Sharing -> Patching -> Compress
Paging • Disk paging is done by swapd in Dom0
memory_monitor() {
  if ( mem > HIGH_WATERMARK ) {
    while ( mem >= LOW_WATERMARK ) {
      swapout( page )
    }
  }
}
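The swapd watermark check can be sketched as runnable Python. The `swap_out` callback and the MB units are assumptions for illustration; the point is the hysteresis: once usage crosses the high watermark, pages are evicted until usage drops below the low one.

```python
def memory_monitor(used_mb, high_wm, low_wm, swap_out):
    """Watermark loop from the swapd pseudocode.
    used_mb: current memory usage; swap_out: assumed callback that evicts
    one page and returns the MB it freed. Returns the final usage."""
    if used_mb > high_wm:                 # only act above the high watermark
        while used_mb >= low_wm:          # evict until below the low one
            used_mb -= swap_out()
    return used_mb
```

The gap between the two watermarks prevents the monitor from oscillating around a single threshold.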
System • Dell PowerEdge 1950 • 2 processors • 2.3GHz Intel Xeon processors • VMware ESX Server 3.0.1 • But only a single CPU is used
Evaluation 1 • Homogeneous VMs • 4 VMs, each running dbench • 512MB of memory • OS: Debian 3.1
Evaluation 2 • Heterogeneous VMs • MIXED-1, MIXED-2
Evaluation 3 • Overhead of DE • MIXED-1 environment • The baseline is Xen without DE • DE's overhead is 7% • The overhead of VMware ESX Server is 5%
Evaluation 4 • Benefits from memory saved • Run more VMs • Increase throughput
Conclusion • Main memory is a bottleneck in virtualization • Presented Difference Engine • Whole-page sharing • Page patching (sub-page sharing) • Page compression • 1.6x to 2.5x more memory savings than VMware ESX Server • Saved memory can be used to run more VMs => improved throughput