Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay

Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin
Ohio State University
ESEC/FSE 07

Outline
• Motivation
• Challenges for checkpointing/replaying Java software
• Summary of our approach
• Contributions
• Static analyses
• Multiple execution regions
• Experimental evaluation
• Conclusions

Motivation
• Checkpointing/replaying has been used for a variety of purposes at system level
  • Originally designed to support fault tolerance
  • Debugging of OS and of parallel and distributed software
• Checkpointing can benefit a number of software engineering tasks
  • Reduce the cost of manual debugging and testing
  • Support for automated techniques for debugging and testing, e.g., dynamic slicing and delta debugging
• Inspired by both system-level checkpointing [Pan-PDD88, Dunlap-OSDI02, King-USENIX05] and “saving-and-restoring” software engineering techniques [Saff-ASE05, Orso-WODA05, Orso-WODA06, Elbaum-FSE06]

Challenges
• Ease of use and deployment
  • Application-level checkpointing: no JVM/runtime support, just code analysis and instrumentation
  • Challenge: no direct access to the call stack; no control over thread scheduling or external resources (files, etc.)
• Reduce the size of the recorded state
  • Dumping the entire heap may be prohibitively expensive, especially for large programs
  • Challenge: static analyses to prune redundant state
• Static and dynamic overhead
  • Static analysis cost is amortized over multiple runs
  • Approach is intended for long-running applications

Summary of Our Approach
• Tool input: program + checkpoint definition
  • Performs static analyses and code instrumentation
• Tool output: two program versions
• First, an augmented checkpointing version is executed once to record (parts of) the run-time program state
  • At the checkpoint: heap objects, static fields, locals
  • At certain points along the call chain leading to the checkpoint
• Next, a pruned replaying version is executed multiple times
  • Restore variables saved at the checkpoint
  • Restore variables saved at points along the call chain
• How do we resume execution from the checkpoint?
  • Step 1: control flow quickly reaches the checkpoint
  • Step 2: recover state at the checkpoint
  • Step 3: incrementally recover state after call sites along the call chain leading to the checkpoint
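The record-once, replay-many idea can be sketched with standard Java serialization. This is a minimal illustration only: the class `StateLog` and its methods are hypothetical names, and the slides do not say which mechanism the tool's runtime actually uses to write and read objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper (not the tool's runtime): captures the values saved at
// a checkpoint and returns them in the same order during replay.
final class StateLog {
    /** Serializes the values together with every object reachable from them. */
    static byte[] save(Object... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the saved values; can be called once per replay run. */
    static Object[] read(byte[] snapshot) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Object[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because all values go through a single `writeObject` call, two saved variables that point to the same heap object still point to one shared object after `read`, which matters for programs that compare references after the checkpoint.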

Definitions
• Crosscut call chain (CC-chain)
  • A programmer-specified call chain that leads to the method that contains the checkpoint
  • E.g., main(44) -> run(28)
• Decision points
  • A call site on the CC-chain (e.g., m.run), because polymorphism makes the callee depend on the receiver's run-time type
  • A predicate on which a decision point or the checkpoint is control-dependent
• At a decision point, the checkpointing version records the control-flow outcome
• The replaying version uses this information to force the control flow to reach the checkpoint
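A decision-point log can be pictured as a queue of outcomes written in execution order and consumed in the same order. The sketch below assumes that model; `DecisionLog` and its method names are hypothetical and are not taken from the tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical decision-point log: the checkpointing version appends
// outcomes, the replaying version consumes them in the same order.
final class DecisionLog {
    private final Deque<Object> outcomes = new ArrayDeque<>();

    /** Checkpointing version: record the outcome of a predicate. */
    boolean save(boolean outcome) {
        outcomes.addLast(outcome);
        return outcome;
    }

    /** Checkpointing version: record the run-time type of a receiver. */
    void saveType(Object receiver) {
        outcomes.addLast(receiver.getClass().getName());
    }

    /** Replaying version: next recorded predicate outcome. */
    boolean readBoolean() {
        return (Boolean) outcomes.removeFirst();
    }

    /** Replaying version: next recorded receiver type name. */
    String readType() {
        return (String) outcomes.removeFirst();
    }
}
```

During replay, the value returned by `readBoolean()` replaces the original predicate, so the branch leading to the checkpoint is taken without re-evaluating the condition.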

Replaying, Step 1: Recover the Call Stack
• Predicate decision point: recover the boolean value
• Call site decision point o.m(a1, …, an)
  • Recover the run-time type of the receiver object; the receiver is instantiated during replaying using sun.misc.Unsafe
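Instantiating a receiver without running its constructor can be done with `sun.misc.Unsafe.allocateInstance`. The sketch below shows one way to do so; `ReceiverFactory` and `Receiver` are hypothetical names, and `sun.misc.Unsafe` is an unsupported internal API that may produce compiler warnings or be restricted on some JDKs.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical receiver class; the counter shows whether a constructor ran.
class Receiver {
    static int constructed = 0;
    Receiver() { constructed++; }
    String run() { return "run"; }
}

// Hypothetical helper: creates an object of the recorded run-time type
// without executing any constructor code.
final class ReceiverFactory {
    static <T> T allocate(Class<T> type) {
        try {
            // The singleton is held in a private static field of Unsafe.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);
            return type.cast(unsafe.allocateInstance(type));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The fields of the allocated object hold default values (null, 0, false), which is acceptable here because the object only serves as the receiver that dispatches the call toward the checkpoint; real state is restored later.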

Checkpointing Version
(=> marks a statement inserted by the instrumentation; DP marks a decision point)

void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
    => save(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
        => save(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}

static void main(String[] args) {
    Main m = new Main();
    boolean b = args.length != 0;
    => save(b);
    if (b) { // DP
        => save(type_of(m));
        m.run(args); // DP
    }
}

Replaying Version
(=> marks a statement inserted by the instrumentation; DP marks a decision point)

void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
    => read(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
        => read(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}

static void main(String[] args) {
    Main m = new Main();
    boolean b = args.length != 0;
    => read(b);
    if (b) { // DP
        => read(type_of(m));
        => unsafe.allocate(m);
        => args = null;
        m.run(args); // DP
    }
}

Step 2: Recover at the Checkpoint
• Our static analysis selects locals for recording (in the checkpointing version) and recovering (in the replaying version) when both conditions hold
  • They are written before the checkpoint
  • They are read after the checkpoint
• Record primitive-typed values or entire object graphs on the heap (all reachable objects)
• Static fields are selected based on the same idea

void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    if (Options.v().whole_jimple()) {
        getPack("cg").apply();
        // --- checkpoint ---
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    for (Iterator i = body_packs.iterator(); i.hasNext();) { … }
    …
}

Selected local: body_packs
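The selection rule for locals amounts to intersecting two sets. The sketch below assumes those sets have already been computed by some data-flow analysis and represents locals by name; `LocalSelection` is a hypothetical class, and the real analysis operates on Jimple rather than on strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the rule "written before AND read after".
final class LocalSelection {
    /**
     * @param writtenBefore locals assigned on some path before the checkpoint
     * @param readAfter     locals used on some path after the checkpoint
     * @return locals that must be recorded and later recovered
     */
    static Set<String> select(Set<String> writtenBefore, Set<String> readAfter) {
        Set<String> selected = new LinkedHashSet<>(writtenBefore);
        selected.retainAll(readAfter); // keep only locals in both sets
        return selected;
    }
}
```

For the `run` method above, `wp_packs` and `body_packs` are both written before the checkpoint, but only `body_packs` is read afterwards (by the loop), so it is the only local of the two that is selected.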

Selection of Static Fields
• A whole-program Mod/Use analysis
  • A static field is “written” if its value is changed, or any heap object reachable from it is mutated
  • A static field is “read” if its value is directly read
• Analysis algorithm
  • Context-sensitive and flow-insensitive; uses the points-to solution and the call graph from Spark [Lhotak CC-03]
  • Bottom-up traversal of the SCC-DAG of the call graph
  • For each method m, a set Cm is maintained to contain all objects from which a mutated object can be reached
  • Propagate backwards the objects in Cm that escape a callee method to its callers
  • Select a static field fld if PointsToSet(fld) ∩ Cm ≠ ∅
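The final selection test, PointsToSet(fld) ∩ Cm ≠ ∅, can be sketched as a disjointness check. This sketch assumes the points-to sets and Cm are already available and models abstract objects as string labels; `StaticFieldSelection` and the labels are hypothetical, and the bottom-up propagation that builds Cm is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the last step of static-field selection.
final class StaticFieldSelection {
    /**
     * @param pointsTo for each static field, the abstract objects it may point to
     * @param cm       abstract objects from which a mutated object can be reached
     * @return fields whose points-to set intersects cm, in map iteration order
     */
    static List<String> select(Map<String, Set<String>> pointsTo, Set<String> cm) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : pointsTo.entrySet()) {
            // Non-empty intersection means the field counts as "written".
            if (!Collections.disjoint(entry.getValue(), cm)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

A field that points only to objects outside Cm is skipped, which is how the analysis avoids recording static state that the code before the checkpoint never modifies.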

Step 3: Recover after the Checkpoint
• Replaying only at decision points and the checkpoint is not enough to guarantee correct execution after the checkpoint
• Additionally record/recover local variables that will be read after each call site in the CC-chain

void main() {
    Set hs = new HashSet();
    B b = new B(hs);
    //-- reco/rest (type_of(b))
    b.m();
    //-- extra reco/rest (hs)
    if (hs == b.s) { … }
}

class B {
    Set s;
    void m() {
        B r0 = this;
        r0.s = new HashSet();
        //-- checkpoint
        //-- reco/rest (r0)
        r0.s.add("");
    }
}

Without the extra record/restore after b.m(), hs is uninitialized during replay.

Additional Issues
• A checkpoint can have multiple run-time instances
• If a method in the CC-chain has callers that are not in the chain, it has to be replicated
• Multi-threaded programs are currently not supported
• Our technique does not guarantee the correctness of the execution when the post-checkpoint part of the program
  • Depends on external resources, such as files and databases
  • Depends on unique-per-execution values, such as the clock
  • Is modified with new cross-checkpoint dependencies
• Multiple execution regions
  • Designated by a starting point and an ending point
  • Specified by two CC-chains

Study 1: Static Analysis

Program       #R   #IP
compress       1     6
socksproxy     3    11
socksecho      3    14
raytrace       3    10
soot-2.2.3    10    35
muffin         3    20
sablecc        4    11
jess           3     8
violet         4     9
javacup        4     9
jtar-1.21      2     4
db             2     5
jflex          2     8
jb-6.1         3     5
jlex-1.2.6     3     8

Static Analysis: Locals Reduction

Static Analysis: Static Fields Reduction

Static Analysis: Removed/Inserted Statements

Static Analysis Cost
• Phase 1: Soot infrastructure cost
  • Between 1.64 ms and 30.6 ms per thousand Jimple statements
  • On average, 11.1 ms per thousand statements
• Phase 2: Our analysis cost
  • Between 1.67 ms and 26.6 ms per thousand Jimple statements
  • On average, 9.4 ms per thousand statements
• This should be amortized across multiple runs of the replaying version

Study 2: Run-Time Performance (compress)
• Original program: compressing and decompressing 5 big tar files several times
• Evaluated for five checkpoint definitions
  • One checkpoint, close to the beginning of the program
  • Two regions of compression and decompression
  • A region containing the process of compression
  • A region containing the process of decompression
  • One checkpoint, close to the end of the program

compress Performance • Normalized running time • Normalized size of captured program state

Study 2: Run-Time Performance (soot)
• Input: soot-2.2.3 itself containing 2227333 methods
• Enabled phases: cg.spark, wjtp, wjop.ji, wjap.uft, jtp, jop.cp
• Evaluated for six checkpoint definitions
  • Before whole-program packs
  • After cg
  • After wjtp
  • After wjop
  • After wjap
  • After body packs

soot Performance • Normalized running time • Normalized size of captured program state

Study 2: Run-Time Performance (jflex-1.4.1)
• Input: a .flex grammar file corresponding to a DFA containing 21769 states
• Evaluated for four checkpoint definitions
  • After NFA is generated
  • After DFA is generated
  • After minimization
  • After emission

jflex Performance • Normalized running time • Normalized size of captured program state

Summary of Evaluation
• Static analysis successfully reduces the size of program state recorded and recovered
• It is more meaningful to checkpoint/replay long-running programs
• Checkpoints are better taken after a phase of long computation with (relatively) small output state
  • √ compress: small program state, short running time
  • √ soot: large program state, but very long computation time
  • X jflex: large program state, short running time

Conclusions
• A static-analysis-based checkpointing/replaying technique
• An implementation and an evaluation showing that our technique can be an interesting candidate for testing, debugging, and dynamic slicing of long-running programs
• Future work
  • Language-level checkpointing/replaying of multi-threaded programs
  • More precise static analyses could be employed to reduce the size of program state to be captured
  • The run-time support for object reading and writing could be improved

Questions?
