Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads

Jack Lange, Assistant Professor, University of Pittsburgh

Summary • Commodity and HPC systems have been converging • Commodity off-the-shelf components • Linux-based HPC systems • Cloud computing • Problem: Real HPC applications need HPC environments • Tightly coupled, massively parallel, and synchronized • Current services must provide dedicated HPC systems • Can we co-host HPC applications on commodity systems? • Dual Stack Approach • Provision the underlying software stack along with the application • Commodity stack should handle commodity applications • HPC stack can provide HPC environment

User Space Partitioning • Current systems do support this, but… • Interference still exists inside the system software • Inherent feature of commodity systems [Figure: two-socket node, each socket with its own memory and cores (cores numbered 1-8), divided into a Commodity Partition and an HPC Partition]
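The user-space partitioning described above can be sketched with the Linux CPU-affinity calls that Python exposes. This is a minimal illustration, not the setup used in the talk: `split_cores` and `pin_current_process` are hypothetical helper names, and the choice to give the highest-numbered cores to the HPC partition is an assumption made only for the example.

```python
import os


def split_cores(cores, hpc_count):
    """Split core IDs into disjoint (hpc, commodity) sets.

    Assumption for illustration: the highest-numbered cores go to HPC.
    """
    ordered = sorted(cores)
    if not 0 < hpc_count < len(ordered):
        raise ValueError("both partitions need at least one core")
    return set(ordered[-hpc_count:]), set(ordered[:-hpc_count])


def pin_current_process(cores):
    """Restrict the calling process to `cores`.

    os.sched_setaffinity is only available on Linux; return False elsewhere.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return True
```

Pinning of this kind only controls where user code runs. Kernel threads, interrupts, and shared kernel state still cross the partition boundary, which is the interference the slide points to.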

HPC vs. Commodity Systems • Commodity systems have a fundamentally different focus than HPC systems • Amdahl’s vs. Gustafson’s laws • Commodity: Optimized for the common case • HPC: The common case is not good enough • At large (tightly coupled) scales, percentiles lose meaning • Collective operations must wait for the slowest node • 1% of nodes can make 99% suffer • HPC systems must optimize outliers (worst case)
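The two laws and the "slowest node" argument can be made concrete with a few lines of arithmetic. The sketch below uses the standard textbook forms of Amdahl's and Gustafson's laws. The straggler function assumes that each node is slowed independently with the same probability per step, which is a simplification introduced here and not a claim from the talk.

```python
def amdahl_speedup(serial_fraction, n):
    """Fixed problem size: S = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction, n):
    """Problem size scaled with n: S = n - s * (n - 1)."""
    return n - serial_fraction * (n - 1)


def prob_step_delayed(p_node_slow, n):
    """Probability that at least one of n nodes is slow in a given step.

    Assumes independent, identically distributed slowdowns (illustrative only).
    A collective operation is delayed whenever any single participant is slow.
    """
    return 1.0 - (1.0 - p_node_slow) ** n
```

Under these assumptions, a node that is slow in only 1% of steps delays roughly 63% of collectives at 100 nodes and nearly every collective at 1000 nodes, which is why a good 99th percentile on one node says little about a tightly coupled run.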

High Performance Computing (HPC) • Large scale simulations to solve Big Problems

Dual Stack Approach • Partition • Segment the underlying hardware resources • Assign them exclusively to specific workloads • Isolate • Prevent interference from other workloads • Hardware: partitions must be coarse grained • Software: eliminate shared state • Implementation • Independent system software running on isolated resources

HPC in the cloud • Clouds are starting to look like supercomputers… • Are we seeing a convergence? • Not yet • Noise issues • Poor isolation • Resource contention • Lack of control over topology • Very bad for tightly coupled parallel apps • Require specialized environments that solve these problems • Approaching convergence • Vision: Dynamically partition cloud resources into HPC and commodity zones • This talk: partitioning compute nodes with performance isolation

Commodity VMMs • Virtualization is considered an “enterprise” technology • Designed for commodity environments • Fundamentally different, but not wrong! • Example: KVM architecture issues • Userspace handlers • Fairly complex memory management • Locking and periodic optimizations • Presence of system noise

Palacios VMM • OS-independent embeddable virtual machine monitor • Established compatibility with Linux, Kitten, and Minix • Specifically targets HPC applications and environments • Consistent performance with very low variance • Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers • 0-3% overhead at large scales (thousands of nodes) • VEE 2011, IPDPS 2010, ROSS 2011 • Open source and freely available • http://www.v3vee.org/palacios

Palacios/Linux • Palacios/Linux provides lightweight and high performance virtualized environments • Internally manages dedicated resources • Memory and CPU scheduling • Does not bother with “enterprise features” • Page sharing/merging, swapping, overcommitting resources • Palacios enables scalable HPC performance on commodity platforms

VMM Comparison • Primary difference: Consistency • Requirement for tightly coupled performance at large scale • Example: KVM nested paging architecture • Maintains free page caches to optimize performance • Requires cache management • Shares page tables to optimize memory usage • Requires synchronization

Dual Stack Architecture • Partitioning at the OS level • Enable cloud to host both commodity and HPC apps • Each zone optimized for the target applications [Figure: stack diagram with components labeled HPC Application, Commodity Application(s), HPC Linux, Commodity Linux, KVM, Palacios VMM, Palacios Resource Managers, Linux Kernel, Linux Module Interface, Hardware]

Evaluation • Goal: Measure VM isolation properties • Partitioned a single node into HPC and commodity zones • Commodity Zone: Parallel kernel compilation • HPC Zone: Set of standard HPC benchmarks • System: • Dual 6-core AMD Opteron with NUMA topology • Linux guest environments (HPC and commodity) • Important: Local node only • Does not promise good performance at scale • But poor single-node performance will be magnified at large scales

Results • Commodity VMMs degrade with contention • Palacios delivers consistent performance • MiniFE: Unstructured implicit finite element solver (Mantevo Project, https://software.sandia.gov/mantevo/index.html)

Discussion • A dual stack approach can provide HPC environments on commodity clouds • HPC and commodity workloads can dynamically share resources • HPC requirements can be met without fully dedicated resources • Networking is still an open issue • Need mechanisms for isolation and partitioning • Need high performance networking architectures • 1Gbit is not good enough • 10Gbit is good, Infiniband is better • Need control over placement and topologies

Multi-stack Operating Systems • Future Exascale Systems are moving towards in situ organization • Applications traditionally have utilized their own platforms • Visualization, storage, analysis, etc • Everything must now collapse onto a single platform

What this means for the OS • At Petascale we could optimize each environment separately • Each had its own OS and hardware • At Exascale workloads will be co-located • Can a single OS handle all workloads effectively? • Probably not • Each has different resource requirements and behaviors • Exascale will need to support multiple OS environments on the same hardware

Current Supercomputer OS architectures [Figure: operating systems used on TOP500 supercomputers; labeled series: UNIX, Linux] Source: http://en.wikipedia.org/wiki/File:Operating_systems_used_on_top_500_supercomputers.svg

Will Linux continue to dominate? • An open question at this point • Exascale systems may mark a radical departure from traditional architectures • Kitten: Open-source Lightweight Kernel from Sandia • Minimal compute node OS • Provides mostly Linux-compatible user environment • Supports unmodified compiler toolchains and ELF executables • But doesn’t include the enterprise Linux features • Simple memory management, application managed I/O, etc

Dual Stack Architecture • Provide LWK environment on a commodity system • Each zone optimized for the target applications [Figure: stack diagram with components labeled HPC Application, Commodity Application(s), Kitten, Linux, Palacios VMM, Palacios Resource Managers, Linux Kernel, Linux Module Interface, Hardware]

Stream • Palacios/Kitten provides higher memory throughput than Linux • 400 MB/s (4.74%) • Palacios/Kitten provides more consistent memory performance than Linux • 340 MB/s lower standard deviation

Selfish Detour [Figure: Selfish Detour noise profiles for Linux and virtualized Kitten] • Palacios/Kitten can reduce noise from Linux • Eliminates periodic timer interrupts
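For readers unfamiliar with the benchmark, the idea behind a Selfish Detour style measurement is to spin in a tight loop reading a clock and to record every iteration that takes much longer than usual, since those gaps are time taken away from the application. The sketch below is a simplified illustration of that idea and not the benchmark itself: the function name, the default threshold, and the use of `time.perf_counter_ns` are assumptions, and the Python interpreter adds noise of its own that a cycle-counter loop in C would not.

```python
import time


def record_detours(duration_s=0.05, threshold_ns=100_000):
    """Spin reading the clock and report gaps of at least threshold_ns.

    Returns a list of (offset_ns_from_start, gap_ns) tuples. Each entry marks a
    stretch where the loop was not running, e.g. because of an interrupt or
    because another task was scheduled on the core.
    """
    detours = []
    start = prev = time.perf_counter_ns()
    end = start + int(duration_s * 1e9)
    while prev < end:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap >= threshold_ns:
            detours.append((prev - start, gap))
        prev = now
    return detours
```

On a stock Linux host, periodic timer interrupts typically show up as detours at regular offsets. Their absence is what the slide attributes to Palacios/Kitten.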

Better than native? • Results are preliminary • Followed best practices for configuring Linux • Didn’t try to optimize VMM performance • Virtualization can improve system performance when the system is running a commodity OS • Reference: B. Kocoloski, J. Lange, Better than Native: Using Virtualization to Improve Compute Node Performance, ROSS 2012

Beyond Virtualization • Virtualization imposes overhead • Power: requires transistors • Performance: small, but present • Interference: Still some dependencies on host OS • Might not be available on exascale hardware • Can we provide native partitioning? • We think so • Linux provides the ability to dynamically remove resources (CPUs, memory, etc) • These can be taken over by a second OS

Dual Stack Architecture • Provide LWK environment on a commodity system • Each zone optimized for the target applications [Figure: stack diagram with components labeled Commodity Application(s), HPC Application, Kitten, Palacios VMM, Linux, Hardware]

Approach • OS partition created via offlined resources • CPUs, memory, PCI devices • Secondary OS “booted” on offline resources • Issues: • OS initialization • Boot process • Resource discovery • Coordination and communication • Security and safety
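Offlining resources, as listed above, can be sketched with the standard Linux sysfs hotplug files: writing `0` to a CPU's `online` file and `offline` to a memory block's `state` file. The helper names and the `root` parameter below are assumptions added for illustration (the parameter lets the code be exercised against a scratch directory). The sketch stops at offlining and does not show how a secondary OS would be booted on the freed resources.

```python
from pathlib import Path

# Standard sysfs location for CPU and memory hotplug controls on Linux.
SYSFS_ROOT = Path("/sys/devices/system")


def cpu_online_path(cpu, root=SYSFS_ROOT):
    """Path of the hotplug control file for one logical CPU."""
    return Path(root) / "cpu" / f"cpu{cpu}" / "online"


def memory_state_path(block, root=SYSFS_ROOT):
    """Path of the state file for one hotpluggable memory block."""
    return Path(root) / "memory" / f"memory{block}" / "state"


def offline_cpu(cpu, root=SYSFS_ROOT):
    """Ask the kernel to take a CPU offline (needs root on a real system)."""
    cpu_online_path(cpu, root).write_text("0")


def offline_memory_block(block, root=SYSFS_ROOT):
    """Ask the kernel to offline a memory block; this can fail if pages are pinned."""
    memory_state_path(block, root).write_text("offline")
```

On a real machine these writes require root privileges, and the kernel may refuse them: the boot CPU often cannot be offlined, and a memory block holding unmovable pages will not go offline.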

Dual Stack Memory • Maybe we don’t need to provide an entirely separate OS • Instead selectively manage some resources • Dual stack memory • Provide a separate memory management layer to Linux • Features • Selectively manage heap per application • Provide applications with direct control over memory layout • Transparently back memory using large pages • Without overhead added by Linux
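The benefit of backing a heap with large pages can be illustrated with simple arithmetic: fewer pages mean fewer page-table entries and more memory covered by each TLB entry. The sketch below assumes the x86-64 page sizes of 4 KiB and 2 MiB. The function names are hypothetical, and any TLB entry count passed to `tlb_reach` is an input chosen by the caller, not a figure from the talk.

```python
SMALL_PAGE = 4 * 1024          # 4 KiB base page (x86-64)
LARGE_PAGE = 2 * 1024 * 1024   # 2 MiB large page (x86-64)


def pages_needed(nbytes, page_size):
    """Number of pages required to back nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def tlb_reach(entries, page_size):
    """Bytes of memory addressable without a TLB miss, given an entry count."""
    return entries * page_size
```

For a 1 GiB heap this gives 262144 small pages versus 512 large pages, a 512x reduction in the number of mappings to manage.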

Dual Stack Architecture • Provide LWK memory manager on a commodity OS [Figure: stack diagram with components labeled HPC Application, Commodity Application(s), Memory Management, Linux, Hardware]

Conclusion • Commodity systems are not designed to support HPC workloads • Different requirements and behaviors than commodity applications • A multi-stack approach can provide HPC environments in commodity systems • HPC requirements can be met without separate physical systems • HPC and commodity workloads can dynamically share resources • Isolated system software environments

Thank you Jack Lange Assistant Professor University of Pittsburgh • jacklange@cs.pitt.edu • http://www.cs.pitt.edu/~jacklange
