Profile-Driven Selective Program Loading

Tugrul Ince (tugrul@cs.umd.edu), Jeff Hollingsworth
Department of Computer Science, University of Maryland, College Park, MD 20742

Motivation
• Programs are getting larger!
  • Many frameworks and libraries
• Many supercomputers lack demand-paging
  • Example: Cray XT and BlueGene series
  • Available memory is scarce
• Observation: Most programs do not use every available function!
  • Frameworks and libraries are too general
  • Code that handles errors or special cases
• Why not remove functions that are not used in the common case?

Aim
• Reduce memory footprint by selectively loading parts of shared libraries

Target Platforms and Applications
• Unix/Linux systems that support ELF
  • Modifies ELF program headers
• Applications with many libraries
  • Most current reasonable applications
• Parallel programs running on multiple nodes
  • MPI etc.
• Platforms without demand-paging
  • Cray XT and BlueGene series

Architecture Overview
• Application is profiled.
• It is rewritten with
  • Modified Shared Libraries
  • A Signal Handler
• Application is executed as usual.

Profiler
• Need a list of never-called functions in each shared library
• Profile the application several times
  • May not be perfect
• DynInst-based profiler
  • Write small program (~ 70 LOC)
  • Rewrite shared libraries
  • Profile as many times as necessary
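The list of never-called functions can be derived by subtracting every function observed in any profiling run from each library's full function list. The following is a minimal sketch of that bookkeeping only, not the DynInst-based profiler itself; the data layout, the library name, and the function names are hypothetical.

```python
def never_called(all_functions, profile_runs):
    """Return, per library, the functions not seen in any profiling run.

    all_functions: dict mapping library name -> set of function names
    profile_runs:  list of dicts, each mapping library name -> set of
                   function names called during that run
    """
    unused = {}
    for lib, functions in all_functions.items():
        called = set()
        for run in profile_runs:
            called |= run.get(lib, set())
        unused[lib] = functions - called
    return unused


# Hypothetical data: two profiling runs of the same application.
libs = {"libexample.so": {"init", "solve", "report_error", "debug_dump"}}
runs = [
    {"libexample.so": {"init", "solve"}},
    {"libexample.so": {"init", "solve", "report_error"}},
]
result = never_called(libs, runs)
print(result)
```

Each additional run can only shrink the never-called set, which is why profiling several times makes the list safer to act on.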

Rewriting
• Do not load unused functions
• Modify ELF program headers
• Example: libpetsc.so .text

Program Headers (before rewriting):
  Type       Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD       0x000000 0x00000000 0x00000000 0x124584 0x124584 R E 0x1000
  LOAD       0x124584 0x00125584 0x00125584 0x013f8  0x0a434  RW  0x1000
  DYNAMIC    0x12459c 0x0012559c 0x0012559c 0x00130  0x00130  RW  0x4
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x4

Program Headers (after rewriting):
  Type       Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD       0x000000 0x00000000 0x00000000 0x090000 0x090000 R E 0x1000
  LOAD       0x112000 0x00112000 0x00112000 0x012584 0x012584 R E 0x1000
  LOAD       0x124584 0x00125584 0x00125584 0x013f8  0x0a434  RW  0x1000
  DYNAMIC    0x12459c 0x0012559c 0x0012559c 0x00130  0x00130  RW  0x4
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x4

• First Loadable Segment:
  • .text, .init, .fini, .plt
• Second Loadable Segment:
  • .dynamic, .got, .got.plt, .data, .bss
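The effect of splitting the executable LOAD entry can be checked with a little arithmetic on the program headers. The sketch below uses the libpetsc.so values from the example and assumes that the single 0x124584-byte entry is the original text segment and the two smaller entries are its rewritten form. The `Segment` tuple and the helper names are illustrative, not part of any tool.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "offset vaddr filesz memsz flags")

# Executable LOAD entry before rewriting (values from the libpetsc.so example).
original_text = [Segment(0x000000, 0x00000000, 0x124584, 0x124584, "R E")]

# Executable LOAD entries after rewriting: two entries with a gap between them.
rewritten_text = [
    Segment(0x000000, 0x00000000, 0x090000, 0x090000, "R E"),
    Segment(0x112000, 0x00112000, 0x012584, 0x012584, "R E"),
]


def loaded_bytes(segments):
    """Total bytes the loader maps for the given LOAD entries."""
    return sum(seg.memsz for seg in segments)


def holes(segments):
    """Yield (start, end) virtual-address gaps between consecutive segments."""
    ordered = sorted(segments, key=lambda seg: seg.vaddr)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap_start = prev.vaddr + prev.memsz
        if nxt.vaddr > gap_start:
            yield (gap_start, nxt.vaddr)


before = loaded_bytes(original_text)
after = loaded_bytes(rewritten_text)
saved = before - after
print("holes:", [(hex(a), hex(b)) for a, b in holes(rewritten_text)])
print(f"text bytes not loaded: {saved:#x} of {before:#x}")
```

The numbers describe only the headers shown in this example; they say nothing about the savings measured in the experiments.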


Rewriting
• Rewriter based on DynInst
• Profile data is used to create lists of Used and Unused functions
• Access / Modify symbols
• Defragment functions to maximize space savings
  • Requires moving functions inside shared libraries

Function Defragmentation
[Figure: function layout in a shared library; legend: Used, Unused]
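Defragmentation can be pictured as assigning new addresses so that used functions are packed together and unused functions start on their own page. The slides do not describe the actual placement policy, so the sketch below assumes a simple used-first packing and ignores function alignment requirements. The function names and sizes are hypothetical.

```python
PAGE_SIZE = 0x1000  # matches the 0x1000 alignment in the program headers


def defragment(functions, used_names, base=0):
    """Assign new addresses so used functions come first, unused ones last.

    functions:  list of (name, size) in original layout order
    used_names: set of names seen during profiling
    Returns (layout, boundary): layout maps name -> new address, and
    boundary is the first page-aligned address at or after the end of the
    used functions; pages from boundary onward hold only unused code.
    """
    layout = {}
    addr = base
    for name, size in functions:
        if name in used_names:
            layout[name] = addr
            addr += size
    # Round up to a page boundary so the unused region starts on its own page.
    boundary = (addr + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
    addr = boundary
    for name, size in functions:
        if name not in used_names:
            layout[name] = addr
            addr += size
    return layout, boundary


# Hypothetical library with interleaved used and unused functions.
funcs = [("a", 0x800), ("b", 0x1800), ("c", 0x400), ("d", 0x2000)]
layout, boundary = defragment(funcs, used_names={"a", "c"})
print({name: hex(addr) for name, addr in layout.items()}, hex(boundary))
```

Everything at or above `boundary` could then be left out of the LOAD entries, since no used function lives there.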

Challenges: Relative Calls
• Common way of calling functions in PIC.
• If either callee or caller is moved, their relative positioning changes.
• Offsets in such relative call instructions need to be updated
[Figure: a call to foo with displacement d; after a move, the call uses displacement d']
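The displacement update can be written down directly. On x86, a near relative call is the opcode 0xE8 followed by a signed 32-bit displacement measured from the end of the 5-byte instruction. The sketch below assumes that encoding and a hypothetical `moved` map from old to new function addresses; it is an illustration of the arithmetic, not the rewriter's code.

```python
import struct

CALL_LEN = 5  # x86 near call: opcode 0xE8 plus a 32-bit displacement


def new_displacement(old_call_addr, old_disp, new_call_addr, moved):
    """Recompute a call's rel32 after the caller and/or callee moved.

    moved maps old function addresses to new ones (hypothetical input).
    """
    old_target = old_call_addr + CALL_LEN + old_disp
    new_target = moved.get(old_target, old_target)
    return new_target - (new_call_addr + CALL_LEN)


def patch_call(code, offset, disp):
    """Overwrite the rel32 operand of the call instruction at code[offset]."""
    assert code[offset] == 0xE8, "not a near relative call"
    struct.pack_into("<i", code, offset + 1, disp)


# A call at 0x1000 targets foo at 0x5000; foo is then moved to 0x2000.
old_disp = 0x5000 - (0x1000 + CALL_LEN)
code = bytearray(b"\xE8" + struct.pack("<i", old_disp))
disp = new_displacement(0x1000, old_disp, 0x1000, moved={0x5000: 0x2000})
patch_call(code, 0, disp)
print(hex(old_disp), "->", hex(disp))
```

The same function covers a moved caller: pass its new address as `new_call_addr` and the displacement is recomputed against the new position.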

Challenges: Symbols
• Runtime linker uses symbols to resolve cross-library calls.
  • Uses procedure linkage tables (plt)
• If a function is moved, its associated symbol has to be updated.
[Figure: call foo@plt resolves through the symbol foo; the symbol value changes from 0xdeadbeef to 0xbeefdead when foo is moved]
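Updating a symbol amounts to rewriting the `st_value` field of its symbol table entry. The sketch below assumes a little-endian 64-bit ELF, where an `Elf64_Sym` entry is 24 bytes, and operates on a raw in-memory copy of the table. The helper name and the sample entry are hypothetical, and the addresses are the placeholder values from the figure.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def move_symbol(symtab, index, new_value):
    """Rewrite st_value of the index-th Elf64_Sym entry in a raw table."""
    off = index * ELF64_SYM.size
    name, info, other, shndx, _old, size = ELF64_SYM.unpack_from(symtab, off)
    ELF64_SYM.pack_into(symtab, off, name, info, other, shndx, new_value, size)


# One-entry table: a global function symbol (st_info 0x12) at 0xdeadbeef.
symtab = bytearray(ELF64_SYM.pack(1, 0x12, 0, 11, 0xDEADBEEF, 0x40))
move_symbol(symtab, 0, 0xBEEFDEAD)
st_value = ELF64_SYM.unpack_from(symtab, 0)[4]
print(hex(st_value))
```

All other fields are written back unchanged, so the runtime linker still finds the same name, type, and size and only resolves to the new address.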

Challenges: Jump Tables
• Used to represent n-way branches at machine level
• Targets are read from jump table
  • Entries are offsets of targets from the GOT address
• Becomes invalid if the function referenced in a jump table is moved
• DynInst reads jump tables to generate CFGs
• We update entries so that they can be used to point to new location of targets
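Because each entry stores a target as an offset from the GOT address, an entry can be fixed by decoding the old target, looking up where it went, and re-encoding the offset. The sketch below assumes signed 32-bit little-endian entries and a hypothetical `moved` map from old to new target addresses. The entry width and the sample addresses are assumptions, not details given in the slides.

```python
import struct


def rewrite_jump_table(table, got_addr, moved):
    """Update a table of signed 32-bit (target - GOT) offsets in place."""
    for i in range(len(table) // 4):
        (entry,) = struct.unpack_from("<i", table, i * 4)
        old_target = got_addr + entry
        new_target = moved.get(old_target, old_target)
        struct.pack_into("<i", table, i * 4, new_target - got_addr)


# Hypothetical 3-way branch whose targets move down by 0x3000 bytes.
got_addr = 0x125700
old_targets = [0x4010, 0x4040, 0x4080]
table = bytearray(b"".join(struct.pack("<i", t - got_addr) for t in old_targets))
rewrite_jump_table(table, got_addr, {t: t - 0x3000 for t in old_targets})
new_targets = [got_addr + e for (e,) in struct.iter_unpack("<i", table)]
print([hex(t) for t in new_targets])
```

Targets missing from `moved` are left pointing where they were, so tables for functions that did not move are unaffected.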

Unexpectedly Called Function
• Execution is not always predictable
  • Unexpected function calls
• Rewrite original executable with a Signal Handler
  • Load the function upon an unexpected call
• Signal Handler picks up page faults (SIGSEGV)
  • Loads requested page on-demand
  • Execution resumes
• User-level: No OS modifications
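The core decision inside such a handler is which page to bring in for a given faulting address. The sketch below models only that address arithmetic, not the handler itself, which would be native code installed for SIGSEGV. It assumes the file offset of the text equals its address relative to the library base, as in the program header example where Offset and VirtAddr match. The load base and the hole list are hypothetical inputs.

```python
PAGE_SIZE = 0x1000


def page_for_fault(fault_addr, lib_base, holes):
    """Map a faulting address to the page that should be loaded.

    holes: list of (start, end) ranges, relative to lib_base, that were
           left unloaded by the rewriter.
    Returns (page_vaddr, file_offset), or None if the fault is not inside
    a hole and should be treated as a genuine segmentation fault.
    """
    rel = fault_addr - lib_base
    for start, end in holes:
        if start <= rel < end:
            page_rel = rel // PAGE_SIZE * PAGE_SIZE
            return lib_base + page_rel, page_rel
    return None


base = 0x7F0000000000  # hypothetical load address of the library
unloaded = [(0x90000, 0x112000)]
hit = page_for_fault(base + 0x9A123, base, unloaded)
miss = page_for_fault(base + 0x1000, base, unloaded)
print(hit, miss)
```

Returning `None` for addresses outside the holes keeps ordinary crashes distinguishable from calls into code that was deliberately not loaded.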

Experiments
• Tested on
  • PETSc ex5 in snes package
  • PETSc ex2 in ksp package
  • GS2
• Compiled with debug flag and no optimization
• Used Open MPI
• Tested on 64-node cluster at UMD
  • Dual-core x86 processors
  • Unmodified Linux kernel
• Space savings of about 82% on average

PETSc – snes (ex5)

PETSc – ksp (ex2)

GS2

Running Times
• GS2 takes 5 seconds less on average
  • (36m 38s vs. 36m 33s)
• Overhead on PETSc examples
  • ex2 runs for 2.7 secs, ex5 runs for 1.05 secs.

Running Times
• Results suggest no overhead for reasonably-long running programs
• Initial cost for signal handler registration
• Better instruction cache and TLB performance

Summary
• Our tool reduces memory footprint of shared libraries
  • Rewrite shared libraries with holes
  • Defragment functions to maximize space savings
  • On-demand page loading if a not-yet-loaded function is called
• About 82% memory space savings for shared libraries
• Might improve instruction cache and TLB performance
