FAST-OS: Petascale Single System Image
Jeffrey S. Vetter (PI)
Computer Science and Mathematics
Computing & Computational Sciences Directorate
Team Members: Nikhil Bhatia, Collin McCurdy, Hong Ong, Phil Roth, WeiKuan Yu
Petascale Single System Image
• What?
  • Make a collection of standard hardware look like one big machine, in as many ways as feasible
• Why?
  • Provide a single solution to address all forms of clustering
  • Simultaneously address availability, scalability, manageability, and usability
• Components
  • Virtual clusters using contemporary hypervisor technology
  • Performance and scalability of a parallel shared root file system
  • Paging behavior of applications, i.e., reducing TLB misses
  • Reducing operating system "noise" at scale
Virtual clusters using contemporary hypervisor technology
• Goal: a virtual cluster of domUs spanning multiple dom0s
  • Enable preemptive migration of OS images for reliability, maintenance, and load balancing
  • Provide virtual login node, virtual service node, and virtual compute node images (x86_64, FC5-based)
  • Initial experimentation on two x86_64 RHEL4 hosts
• Collect per-node/per-domain high-level information about host and guest domains
• Use a copy-on-write scheme at the NFS server
  • Unionfs-based root file system trees, one per virtual cluster node
• Investigate OProfile for active profiling
  • Performance data should help drive migration decisions
• Investigate xentop and the libvirt API for scalable management of domUs (see the sketch below)
  • Develop a management console for manual and algorithmic control
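The management console depends on per-domain data of the kind libvirt exposes. The Python sketch below shows one way such monitoring could look; the connection URI ("xen:///") and the exact fields reported are illustrative assumptions, not the project's console code.

```python
# Hypothetical sketch: poll guest domains on a Xen dom0 via libvirt and
# collect the high-level state/CPU/memory data a management console might use.
# The URI and reported fields are illustrative assumptions.
import libvirt

def poll_domains(uri="xen:///"):
    conn = libvirt.openReadOnly(uri)      # read-only access is enough for monitoring
    if conn is None:
        raise RuntimeError("could not connect to hypervisor at %s" % uri)
    stats = []
    for dom in conn.listAllDomains():     # running and defined domUs
        state, max_mem, mem, ncpu, cpu_time = dom.info()
        stats.append({
            "name": dom.name(),
            "state": state,               # e.g., libvirt.VIR_DOMAIN_RUNNING
            "memory_kb": mem,
            "max_memory_kb": max_mem,
            "vcpus": ncpu,
            "cpu_time_ns": cpu_time,
        })
    conn.close()
    return stats

if __name__ == "__main__":
    for s in poll_domains():
        print(s)
```

Data like this, sampled periodically across dom0s, is the raw input a migration policy (manual or algorithmic) would act on.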
High-speed migration of virtual clusters with InfiniBand (IB)
Goal: Reduce the impact of virtualization on high-performance computing application performance
• High bandwidth, low latency
• Effective migration of OS images
Phase 1: Xen virtualization with IB
• IBM and OSU have a prototype version functioning over IA32; it supports only IB verbs and IPoIB
• Need to port to x86_64
• Need support for other IB components
Phase 2: Xen migration over IB (prototype under development); see the sketch below
• Migration of IB-capable Xen domains over TCP
  • Snapshot and close IB state before migration
  • Reopen and reconnect after migration
• Address issues in handling user-space IB information
  • Protection keys
  • IB contexts, queue pair numbers, and LIDs
• Need to coordinate static domains as well
Phase 3: Live migration of IB-capable Xen domains over IB
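A minimal sketch of the Phase 2 flow (quiesce IB state, migrate over TCP, reconnect), expressed with libvirt's generic migration call. The quiesce_ib/reconnect_ib hooks and the destination URI are placeholders for the IB-specific steps described above; they are assumptions, not real APIs or the project's implementation.

```python
# Hypothetical sketch of the Phase 2 migration flow: quiesce InfiniBand state,
# migrate the domain over TCP with libvirt, then re-establish IB connections.
# quiesce_ib()/reconnect_ib() are placeholders for the IB-specific steps
# (saving protection keys, queue pair numbers, LIDs); they are not real APIs.
import libvirt

def migrate_domain(name, src_uri="xen:///", dst_uri="xen+ssh://destination-host/"):
    src = libvirt.open(src_uri)
    dst = libvirt.open(dst_uri)
    dom = src.lookupByName(name)

    quiesce_ib(dom)                        # placeholder: snapshot and close IB state

    flags = libvirt.VIR_MIGRATE_LIVE       # live migration; the transport here is TCP
    new_dom = dom.migrate(dst, flags, None, None, 0)

    reconnect_ib(new_dom)                  # placeholder: reopen and reconnect IB state

    dst.close()
    src.close()
    return new_dom

def quiesce_ib(dom):
    pass  # would tear down user-space IB contexts before the move

def reconnect_ib(dom):
    pass  # would rebuild queue pairs and re-resolve LIDs after the move
```

Phase 3 replaces the TCP transport with IB itself and removes the stop-and-restart of IB state in favor of live migration.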
Paging behavior of DOE applications
Goal: Understand the paging behavior of applications on large-scale systems
Trace-based memory subsystem simulator (sketch below)
• Performs dynamic tracing using Pin (Intel's binary instrumentation tool)
• For each data memory reference, the simulator
  • Executes code that looks up the data in caches/TLB/page table entries
  • Estimates the latency of misses
  • Produces an estimate of cycles spent in the memory subsystem
• Advantages of simulation
  • Allows evaluation of nonexistent configurations:
    • What if paging were turned off?
    • What if the Opteron used the Xeon's smaller-L1-cache-for-larger-TLB tradeoff?
Potential weakness: inaccuracy
• Sequential vs. parallel memory accesses
• Intel (simulator) vs. Portland Group (XT3) compiler
• Different runtime libraries (MPI)
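The per-reference simulation step can be illustrated without Pin itself. The Python sketch below models only the TLB part of that lookup; the entry count, page sizes, and miss latency are illustrative assumptions rather than measured Opteron parameters, and the real tool performs this work inside Pin-instrumented native code.

```python
# Hypothetical sketch of the per-reference step: look each data address up in
# a TLB model and charge an estimated miss latency.  Sizes and cycle costs are
# illustrative assumptions, not measured Opteron parameters.
from collections import OrderedDict

class TLB:
    """Fully associative, LRU-replaced TLB model."""
    def __init__(self, entries=64, page_size=4096, miss_cycles=30):
        self.entries = entries
        self.page_size = page_size
        self.miss_cycles = miss_cycles
        self.table = OrderedDict()        # page number -> True, kept in LRU order

    def access(self, addr):
        """Return the cycles charged to this reference (0 on a hit)."""
        page = addr // self.page_size
        if page in self.table:
            self.table.move_to_end(page)  # refresh LRU position
            return 0
        if len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict least recently used page
        self.table[page] = True
        return self.miss_cycles

def simulate(trace, page_size=4096):
    """trace: iterable of data addresses, e.g. replayed from a Pin-generated log."""
    tlb = TLB(page_size=page_size)
    return sum(tlb.access(a) for a in trace)

# Running the same trace with page_size=4096 and page_size=2 * 1024 * 1024
# mimics the small-page vs. large-page variants compared in this study.
```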
Applications and methodology
Applications
• High-Performance Computing and Communications benchmarks
  • Sequential: STREAM, DGEMM, FFT, RandomAccess
  • Parallel: MPI_RandomAccess, HPL, FFT, PTRANS
• Climate: POP, CAM, HYCOM
• Fusion: GYRO, GTC
• Molecular dynamics: AMBER, LAMMPS
Methodology
• 16p version of each benchmark
• Unmeasured: initialization code and cache warm-up; measured: three time steps
• Eight variants, with results normalized to HW-large (see the sketch below):
  • Hardware (HW): large and small pages
  • Simulated Opteron: large, small, and no pages
  • Simulated Opteron/Xeon hybrid: large, small, and no pages
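For illustration, the sketch below shows what normalizing the variants to the HW-large baseline amounts to; the function is an assumption about the bookkeeping, not the project's analysis script.

```python
# Hypothetical sketch: normalize per-variant cycle counts to the HW-large
# baseline before comparison.  Variant names follow the list above.
def normalize_to_hw_large(results):
    """results: dict mapping variant name -> measured or simulated cycles."""
    baseline = results["HW-large"]
    return {variant: cycles / baseline for variant, cycles in results.items()}

# Usage: normalize_to_hw_large({"HW-large": ..., "HW-small": ..., "Sim-Opteron-large": ...})
```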
Example results: HYCOM
Initial simulation results match measurements and illustrate the differences among page configurations.
Parallel root file system
Goals of study
• Use a parallel file system to implement a shared root environment
• Evaluate the performance of the parallel root file system
• Understand root I/O access patterns and potential scaling limits
Current status
• RootFS implemented using NFSv4, PVFS2, and Lustre 1.4.6
• RootFS distributed using a ramdisk implementation
  • Modified the mkinitrd program locally (PVFS2 and Lustre)
  • Modified init scripts to mount the root at boot time (PVFS2 and Lustre)
  • Configured compute nodes to run a small Linux kernel
• Released two Lustre 1.4.6.4 patches for kernel 2.6.15
• File system performance measured using IOR, b_eff_io, and NPB I/O (see the sketch below)
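For illustration only, the sketch below times a large sequential write against a mounted file system, which is the flavor of measurement IOR performs; it is not IOR, b_eff_io, or NPB I/O, and the mount point, block size, and block count are assumptions.

```python
# Hypothetical sketch: time a large sequential write to a mounted parallel
# file system to get a rough bandwidth figure.  Not IOR/b_eff_io/NPB I/O;
# the path and transfer sizes are illustrative assumptions.
import os
import time

def sequential_write_bandwidth(path, block_size=4 * 1024 * 1024, blocks=256):
    """Write `blocks` buffers of `block_size` bytes to `path` and return MiB/s."""
    buf = os.urandom(block_size)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())              # ensure the data reaches the file system
    elapsed = time.time() - start
    total_mib = block_size * blocks / (1024 * 1024)
    return total_mib / elapsed

if __name__ == "__main__":
    # e.g., a file under the PVFS2- or Lustre-backed root (path is an assumption)
    print("%.1f MiB/s" % sequential_write_bandwidth("/mnt/rootfs/testfile"))
```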
Contacts
Jeffrey S. Vetter, Principal Investigator
Computer Science and Mathematics
Computing & Computational Sciences Directorate
(865) 356-1649, vetter@ornl.gov
Hong Ong, (865) 241-1115, hongong@ornl.gov
Nikhil Bhatia, (865) 241-1535, bhatia@ornl.gov
Collin McCurdy, (865) 241-6433, cmccurdy@ornl.gov
Phil Roth, (865) 241-1543, rothpc@ornl.gov
WeiKuan Yu, (865) 574-7990, wyu@ornl.gov
Ong_SSI_0611