1 / 57

Efficient Virtualization on Embedded Power Architecture Platforms

Efficient Virtualization on Embedded Power Architecture Platforms. Aashish Mittal, Dushyant Bansal, Sorav Bansal Indian Institute of Technology Delhi. Varun Sethi Freescale Semiconductor. Embedded Virtualization : Motivation. Resource partitioning (control-plane / data-plane)

boone
Télécharger la présentation

Efficient Virtualization on Embedded Power Architecture Platforms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficient Virtualization on Embedded Power Architecture Platforms Aashish Mittal, Dushyant Bansal, Sorav Bansal Indian Institute of Technology Delhi Varun Sethi Freescale Semiconductor

  2. Embedded Virtualization : Motivation • Resource partitioning (control-plane / data-plane) • High availability (active/standby configuration) • In-service upgrade • Sandboxing (isolate untrusted software) • App compatibility, consolidation, …

  3. Our Contribution 2-6x faster performance improvement Efficient Software virtualization on Embedded Power architecture

  4. Embedded Power Architecture

  5. Embedded Power Architecture • Satisfies Popek-and-Goldberg virtualization requirements • 4 byte fixed length word aligned instructions • Software managed small TLB • 16 variable sized (4KB - 4GB) and 512 fixed size (4KB) entries • S-byte alignment constraint for a page of size S • Orthogonal rwxpage permission bits for user/kernel • No segmentation support • Branching • Direct branch restricted to 26 bit offset • Indirect branches through registers only

  6. Trap-and-emulate

  7. Trap-and-Emulate

  8. Performance overhead of Trap-and-Emulate

  9. Performance overhead of Trap-and-Emulate 4-16x slower than Bare-metal

  10. Binary Translation

  11. Full Binary Translation x86 based solution : Full Binary Translation (Full BT) [VMware] Translation Cache Kernel Code Guest Virtual Address Space

  12. In-place Binary Translation Our solution on Embedded Power : In-place Binary Translation (In-place BT) Kernel Code Privileged instruction Translated instruction Guest Virtual Address Space

  13. Full BT vs. In-place BT : Architectural Implications

  14. Advantages of In-place Binary Translation

  15. In-Place Binary Translation

  16. In-Place Binary Translation Kernel Code Privileged instruction Guest Virtual Address Space

  17. In-Place Binary Translation Move from machine state register to r0 mfmsr r0 Kernel Code Guest Virtual Address Space

  18. In-Place Binary Translation mfmsr r0 Kernel Code Hypervisor Guest Virtual Address Space

  19. In-Place Binary Translation mfmsr r0 Kernel Code Hypervisor Guest Virtual Address Space

  20. In-Place Binary Translation addr Shared Space Kernel Code Kernel Code Load msrfrom ‘shared space’ into r0 mfmsr r0 lwz r0, addr Translated instruction Privileged instruction Guest Virtual Address Space

  21. In-Place Binary Translation addr Shared Space Kernel Code Kernel Code Mark as execute-only Patch-site mfmsr r0 lwz r0, addr Translated instruction Privileged instruction Guest Virtual Address Space

  22. Translation for complex instruction More than one instruction to emulate mtmsr r0 Kernel Code Move r0 to machine state register Guest Virtual Address Space

  23. Translation for complex instruction Translation Cache Translation Cache Kernel Code Emulation Code Branch back mtmsr r0 branch

  24. Translation for complex instruction Translation Cache Translation Cache 26 Kernel Code Emulation Code Branch back 26 mtmsr r0 branch

  25. Translation Cache - Constraint Translation Cache Kernel Code ±32MB Guest Virtual Address Space

  26. Placement Constraints Shared Space • Unused by Guest • Placement constraints on translation cache Translation Cache Kernel Code ±32MB Guest Virtual Address Space

  27. Placement Constraints Shared Space • Unused by Guest • Placement constraints on translation cache easy Translation Cache Kernel Code ±32MB Guest Virtual Address Space

  28. Placement Constraints Shared Space • Unused by Guest • Placement constraints on translation cache easy Translation Cache Kernel Code ±32MB harder Guest Virtual Address Space

  29. Stealing space for Translation Cache Kernel Data Translation Cache 32MB Kernel Code Guest Virtual Address Space • Steal space within 32MB • Choose some data section which lies within 32MB of kernel

  30. Stealing space for Translation Cache Kernel Data Execute-only Translation Cache 32MB Kernel Code • Mark space as execute-only • All R/W accesses to this region trap into hypervisor Guest Virtual Address Space • Steal space within 32MB • Choose some data section which lies within 32MB of kernel

  31. Read/Write Tracing

  32. Read/Write tracing cost Translation cache In-place patch Kernel Code Guest Virtual Address Space Translated instruction Privileged instruction

  33. Read/Write tracing cost Translation cache In-place patch Kernel Code Read/Write access by Guest Guest Virtual Address Space Translated instruction Traced region access Privileged instruction

  34. Read/Write tracing cost Translation cache In-place patch Kernel Code R/W traced region Read/Write access by Guest Guest Virtual Address Space Translated instruction Traced region access Privileged instruction

  35. Read/Write tracing cost Translation cache In-place patch Kernel Code R/W traced region Read/Write access by Guest Guest Virtual Address Space Translated instruction Traced region access Privileged instruction

  36. R/W tracing – False sharing Page Faults

  37. R/W tracing – Tradeoff Page Faults

  38. R/W tracing – Tradeoff Page Faults

  39. R/W tracing – Adaptive page resizing Page Faults

  40. Adaptive Page Resizing : Optimizing Tradeoff • Fragmentation of large page. Based on workload type: • Burst - page is broken such that all patch-sites on that page belong to the “shortest” page • Scan - page is broken into two halves and the half with larger number of tracing page faults is untraced • Remove patch and re-instate Read/Write privileges • High number of tracing page faults • Opportunistic merging • Periodically merge neighboring pages to reduce TLB pressure

  41. Adaptive Page Resizing Performance 1.5-2.4x faster than trap-and-emulate

  42. Overhead Translation cache Kernel Code Read/Write access by Guest Guest Virtual Address Space Translated instruction Traced region access Privileged instruction

  43. Overhead Translation cache Translation cache Kernel Code Read/Write access by Guest to mirrored data Read/Write access by Guest Guest Virtual Address Space Translated instruction Traced region access Privileged instruction

  44. Adaptive-Data Mirroring Instructions accessing data on pages protected by R/W tracing Mirror the accessed data on shared space In-place translation

  45. Adaptive Data Mirroring Performance 2.2-3.1x faster than trap-and-emulate

  46. Comparison with Bare Metal 1.9-4.9x slower than bare metal

  47. More results in paper • Comparisons on lmbench and unixbench • Comparing adaptive page resizingwith statically configured page resizing • Three-way tradeoff between • Privileged Instruction Exits • Read/Write Tracing Page Faults • TLB Miss Page Faults

  48. Correlation between Architecture& Hypervisor Design • RISC vs CISC architecture • In-place vs Full Binary Translation • Orthogonal RWX bits vs RW/NX bits (x86) • Software managed TLBs (with page size flexibility) vs Segmentation

  49. Conclusions Improved virtualization performance of unmodified guests on embedded Power Architecture(2-6x) Insight into implications of subtle architectural design decisions on VMM design

  50. Questions ?

More Related