Precise Memory Leak Detection for Java Software Using Container Profiling
Guoqing Xu, Atanas Rountev
Program analysis and software tools group, Ohio State University
Supported by NSF under CAREER grant CCF-0546040
Memory Leaks
• C: malloc without free
• C++: new without delete
• Java: garbage-collected language
  • Unreachable objects are identified and freed
  • How about reachable objects that are not used again?
• A Java memory leak can cause serious problems
  • Performance degradation due to GC cost
  • Crash with an OutOfMemoryError
• For long-running enterprise applications with large memory footprint: even small leaks are bad
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Finding the Causes of Memory Leaks (1/2)
• Compile-time analysis usually does not work
• Run-time analysis is preferable, but tricky
  • Millions of heap objects at any moment of time
  • The statement that finally exhausts the memory has nothing to do with the source of heap growth
• Continuous run-time heap analysis looking for suspicious behavior (symptom)
  • E.g., a possible symptom is the growing number of objects: “the number of java.util.HashMap$Entry objects keeps growing”
• Finding the leak cause, given this symptom
Finding the Causes of Memory Leaks (2/2)
• Issue 1: what is a leak symptom?
  • Growing number of instances of a type
    • LeakBot [ECOOP’05]; Cork [POPL’07]
  • Staleness (time since last use) of an object
    • Sleigh [ASPLOS’06]
• Issue 2: what is the leak cause?
  • Starting with the suspicious objects, traverse the run-time object graph backwards to find the cause
• This is all great, but …
What is a Leak Symptom?
• A single factor is not enough as the leak symptom
  • Growing number of instances may be due to perfectly legitimate useful data
  • Staleness does not necessarily mean a leak
    • E.g., a JFrame window object is never used after creation, but it is not a leak
• Other factors: e.g., volume of memory consumed by an object and all objects reachable from it
  • E.g., a big container that is not used for a while may be more important than a never-used string
What is the Leak Cause?
• From the suspicious objects, traverse the object graph backwards and examine the reference edges
  • Very large and complex object graph on the heap
  • The programmer is buried under a mountain of data
• How to decide if a reference edge is unnecessary?
  • Why does this edge exist at all?
• Where exactly in the program code
  • was this reference edge created?
  • should the edge have been destroyed?
  • should the programmer look to find and fix the bug?
Outline
• Motivation
• New container-based approach
  • Key idea
  • Generic leak analysis
  • Specific leak analysis for Java
• Experimental evaluation
  • Real-world memory leaks
  • Run-time overhead
A New Perspective
• Observation: containers are often the leak causes
  • Elements are not properly removed from containers
  • Many real-world JDK memory leak bugs are caused by misuse of containers
• Let’s reverse the traditional diagnosis process
  • Start by suspecting that all containers are leaking, and use symptoms to rule out those less likely to leak
  • Avoid the effort to search for a cause starting from arbitrary suspicious objects
Our Proposal
• Container-centric
  • Track container operations
  • Assign a confidence value to each container based on its symptoms
  • Rank and report containers based on this confidence value
• We only consider bugs caused by containers at the first and second levels of the tree
Run-time Leak Confidence Analysis
• Generic “analysis template” that can be implemented in different ways
  • Later we show a specific implementation for Java
• Considers the combined effect of multiple factors
  • Memory taken up by an individual container
  • Overall memory consumption
  • Staleness of a container
• Container abstraction: ADT with three operations
  • ADD($\sigma$, o) adds object o to container $\sigma$
  • GET($\sigma$) retrieves an object from container $\sigma$
  • REMOVE($\sigma$, o) removes object o from container $\sigma$
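The three-operation abstraction can be pictured as a small event log. This is only a sketch: the names `Effect`, `ContainerEvent`, and `EventLog` are hypothetical and are not taken from the tool described in these slides.

```java
import java.util.ArrayList;
import java.util.List;

/** The three abstract container operations from the ADT. */
enum Effect { ADD, GET, REMOVE }

/** One recorded operation on a container (hypothetical record layout). */
final class ContainerEvent {
    final int containerId;  // identity hash of the container
    final int elementId;    // identity hash of the element o
    final Effect effect;
    final long time;        // caller-supplied timestamp

    ContainerEvent(int containerId, int elementId, Effect effect, long time) {
        this.containerId = containerId;
        this.elementId = elementId;
        this.effect = effect;
        this.time = time;
    }
}

/** Append-only log of container operations (hypothetical helper). */
final class EventLog {
    private final List<ContainerEvent> events = new ArrayList<>();

    /** Records one operation. Identity hashes are not guaranteed unique; fine for a sketch. */
    void record(Object container, Object element, Effect effect, long time) {
        events.add(new ContainerEvent(System.identityHashCode(container),
                System.identityHashCode(element), effect, time));
    }

    /** Number of recorded events with the given effect. */
    long count(Effect effect) {
        long n = 0;
        for (ContainerEvent e : events) {
            if (e.effect == effect) {
                n++;
            }
        }
        return n;
    }
}
```

Any concrete container method (for example a map's put, get, or remove) would be mapped to one of the three effects before being logged.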
Leaking Region
• A time region [s, e] in which symptoms occur
  • Garbage collection (GC) occurs at s, at e, and several times in between
  • The live-memory consumption at these GC events (mostly) keeps increasing from one event to the next
• Choice of e
  • Offline, post-mortem analysis: the time at which the program ends or an OutOfMemoryError is thrown
  • Online, while the program is running: any time when a user wants to generate a report
• Choice of s: examine the history of GC events before e and the live-memory usage at them
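One way to pick s from the GC history is to walk backwards from e for as long as live memory keeps (mostly) growing. The sketch below assumes the live-memory samples are available as an array and uses a hypothetical `tolerance` parameter to allow small dips; the slides do not say how the actual tool makes this choice.

```java
final class LeakingRegion {
    /**
     * Returns the index of the GC event chosen as s.
     * liveBytesAtGc holds live memory after each GC event, oldest first;
     * the last entry is taken as e. tolerance is the fraction by which an
     * earlier sample may exceed the next one and still count as "growing"
     * (an assumption of this sketch).
     */
    static int chooseStart(long[] liveBytesAtGc, double tolerance) {
        if (liveBytesAtGc.length == 0) {
            throw new IllegalArgumentException("need at least one GC event");
        }
        int s = liveBytesAtGc.length - 1;
        // Extend the region backwards while memory was no higher before.
        while (s > 0 && liveBytesAtGc[s - 1] <= liveBytesAtGc[s] * (1.0 + tolerance)) {
            s--;
        }
        return s;
    }
}
```

With a tolerance of 0 the region stops at the first drop in live memory; a small positive tolerance lets the region span minor fluctuations.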
Memory Usage of a Container
• At some GC event in the leaking region
  • Find the total memory consumed by all objects reachable from the container
  • Relative value: divide by the total live memory at this GC event; get a number in [0, 1]
• Memory usage graph:
  • X axis: time relative to e
  • Y axis: relative memory
• Memory contribution $MC(\sigma)$: area under the curve, in [0, 1]
Staleness of a Container
• Time since last use [ASPLOS’06]
• A new definition in terms of $\sigma$ and o
  • $SC(o) = (\tau_2 - \tau_1)/(\tau_2 - \tau_0)$
  • $SC(\sigma)$ is the average $SC(o)$ for objects o in $\sigma$
  • A number in [0, 1]
• Large value of SC means that many elements are sitting in the container without being used
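The staleness ratio can be computed directly once the three times are known. The slide's figure defining them did not survive, so this sketch assumes that the first time is when o was added, the second is its last use (the add time if it was never retrieved), and the third is when it was removed (or the end of the leaking region). The class and method names are hypothetical.

```java
final class Staleness {
    /**
     * Staleness of one element: (remove - lastUse) / (remove - add).
     * The meaning of the three times is an assumption of this sketch.
     */
    static double ofElement(long addTime, long lastUseTime, long removeTime) {
        if (addTime > lastUseTime || lastUseTime > removeTime) {
            throw new IllegalArgumentException("expected add <= lastUse <= remove");
        }
        if (removeTime == addTime) {
            return 0.0; // element spent no time in the container
        }
        return (double) (removeTime - lastUseTime) / (double) (removeTime - addTime);
    }

    /** Average element staleness; each row is {addTime, lastUseTime, removeTime}. */
    static double ofContainer(long[][] elements) {
        if (elements.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (long[] t : elements) {
            sum += ofElement(t[0], t[1], t[2]);
        }
        return sum / elements.length;
    }
}
```

An element that is added and never used again gets staleness 1, while an element used right up to its removal gets 0.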
Putting it All Together
• Leaking confidence $LC = SC \times MC^{1-SC} \in [0, 1]$
  • If SC or MC increases, LC also increases
• Properties
  • $MC = 0$ and $SC \in [0, 1) \Rightarrow LC = 0$
  • $SC = 0$ and $MC \in [0, 1] \Rightarrow LC = 0$
  • $SC = 1$ and $MC \in [0, 1] \Rightarrow LC = 1$
  • $MC = 1$ and $SC \in [0, 1] \Rightarrow LC = SC$
• Analysis output:
  • Containers ranked by their LC value
  • ADD/GET call sites ranked by the average staleness of their elements
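The confidence formula is a one-liner in code. The sketch below uses a hypothetical class name and checks it against the listed boundary properties.

```java
final class LeakConfidence {
    /** LC = SC * MC^(1 - SC), with both inputs expected in [0, 1]. */
    static double compute(double sc, double mc) {
        if (sc < 0.0 || sc > 1.0 || mc < 0.0 || mc > 1.0) {
            throw new IllegalArgumentException("SC and MC must lie in [0, 1]");
        }
        // Math.pow(x, 0.0) is 1.0 for every x, so SC = 1 always yields LC = 1.
        return sc * Math.pow(mc, 1.0 - sc);
    }
}
```

As a cross-check, SC = 0.480 and MC = 0.855 give roughly 0.44, and SC = 0.172 and MC = 0.814 give roughly 0.145, which agrees with the LC values in the report shown later in the talk for bug #6209673.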
Outline
• Motivation
• New container-based approach
  • Key idea
  • Generic leak analysis
  • Specific leak analysis for Java
• Experimental evaluation
  • Real-world memory leaks
  • Run-time overhead
Modeling and Tracking of Containers

    class HashMap {
        Object put(Object key, Object value) {…}
        Object get(Object key) {…}
        Object remove(Object key) {…}
    }

    class Java_util_HashMap {
        static void put_after(int csID, Map map, Object key, Object value, Object result) {
            if (result == null) {
                …
                Recorder.v().record(csID, map, key, …, Recorder.EFFECT_ADD);
            }
        }
    }

    Object result = m.put(a,b);
    Java_util_HashMap.put_after(1234, m, a, b, result);
Tracking of Memory Usage
• Approximation of MC
  • An object graph traversal thread is launched periodically to calculate the total amount of memory consumed by objects reachable from the container object
  • Precision and overhead tradeoff is defined by the interval between two runs of the thread
  • Our experience shows that once every 50 GC events is a good compromise
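The core of such a traversal is a graph walk that sums the sizes of everything reachable from the container. The sketch below runs over a hypothetical explicit graph (`HeapNode` with a known size and outgoing references); how the actual tool reads object sizes and references from the running JVM is not described on the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a heap object: a size plus outgoing references. */
final class HeapNode {
    final long sizeBytes;
    final List<HeapNode> refs = new ArrayList<>();

    HeapNode(long sizeBytes) {
        this.sizeBytes = sizeBytes;
    }
}

final class ReachableMemory {
    /** Total size of all nodes reachable from root, counting each node once. */
    static long reachableBytes(HeapNode root) {
        Set<HeapNode> visited =
                Collections.newSetFromMap(new IdentityHashMap<HeapNode, Boolean>());
        Deque<HeapNode> work = new ArrayDeque<>();
        visited.add(root);
        work.push(root);
        long total = 0;
        while (!work.isEmpty()) {
            HeapNode n = work.pop();
            total += n.sizeBytes;
            for (HeapNode r : n.refs) {
                if (visited.add(r)) { // true only the first time r is seen
                    work.push(r);
                }
            }
        }
        return total;
    }

    /** Relative memory usage in [0, 1] given the total live memory. */
    static double relative(HeapNode container, long totalLiveBytes) {
        return (double) reachableBytes(container) / (double) totalLiveBytes;
    }
}
```

The identity-based visited set keeps shared objects and cycles from being counted twice.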
Data Analysis
• Decide what should be the leaking region
• Compute the approximation of MC
  • $MC \approx \sum_i MT_i \times (T_{i+1} - T_i)$
• Compute SC
  • Scan the Recorder data and remove data entries outside the leaking region
  • For each element, find its REMOVE event and its last GET event
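The MC approximation is a rectangle sum over the sampled memory-usage curve. The sketch below assumes that the sample times have already been normalized to [0, 1] relative to the leaking region and that each memory value is the container's relative memory at that sample; the class and parameter names are hypothetical.

```java
final class MemoryContribution {
    /**
     * Rectangle-rule area under the relative-memory curve:
     * the sum over i of relMemory[i] * (relTime[i + 1] - relTime[i]).
     */
    static double approximate(double[] relTime, double[] relMemory) {
        if (relTime.length != relMemory.length) {
            throw new IllegalArgumentException("one memory value per sample time");
        }
        double area = 0.0;
        for (int i = 0; i + 1 < relTime.length; i++) {
            area += relMemory[i] * (relTime[i + 1] - relTime[i]);
        }
        return area;
    }
}
```

For samples at times 0, 0.5, and 1.0 with relative memory 0.2, 0.4, and 0.9, the result is 0.2 × 0.5 + 0.4 × 0.5 = 0.3; the last sample only closes the final interval.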
Evaluation on Real-World Memory Leaks
• Java AWT/Swing bugs
  • Sun JDK bug #6209673 – existed in Java 5, fixed in 6
  • Sun JDK bug #6559589 – still open in Java 6
• SPECjbb bug
• The generated reports are precise
  • Top-ranked containers are the actual causes of the bugs
  • Confidence values for bug-inducing containers and correctly-used containers differ significantly
Sun JDK Bug #6209673
• The bug manifests when switching between two Swing applications
• According to a developer’s report, it is very hard to track down
• We instrumented the entire java.awt and javax.swing packages, and the test case that triggered the bug
Sun JDK Bug #6209673

    Container:29781703 type: java.util.HashMap (LC: 0.443, SC: 0.480, MC: 0.855)
    ---cs: javax.swing.RepaintManager:591
    Container:2263554 type: class java.util.LinkedList (LC: 0.145, SC:0.172, MC: 0.814)
    ---cs: java.awt.DefaultKeyboardFocusManager:738
    Container:399262 type: class javax.swing.JPanel (LC: 0.038, SC:0.044, MC: 0.860)
    ---cs: javax.swing.JComponent:796
Sun JDK Bug #6209673
• Line 591 of javax.swing.RepaintManager
  • A GET operation: image = (VolatileImage) volatileMap.get(config);
  • The container that is misused is the volatileMap
  • This information is sufficient for a developer to locate the bug
• Where is the actual bug?
  • VolatileImage objects are cached in the map
  • Upon a display mode switch, the old configuration objects get invalidated and will not be used again
  • But the images are still maintained in the map
• The causes of #6559589 and the SPECjbb bug are similar
Overhead
• Compile-time analysis
• Dynamic overhead
  • Sampling rate: 1/15GC, 1/50GC
  • Initial heap size: default, 512M
Overhead for Different Sampling Rates
• Y-axis: (NewTime - OldTime)/OldTime
• 1/15GC: 121.2%
• 1/50GC: 87.5%
Overhead for Different Initial Heap Sizes
• Default heap: 177.2%
• 512M heap: 87.5%
Summary
• Proposed a container-centric approach
  • Tracking all modeled containers
  • Computing a leak confidence for each container
    • Memory contribution and staleness contribution
  • Can be used for both online and offline diagnosis
• Memory leak detection for Java
  • Code transformation + run-time profiling
• Future work
  • Lower overhead (e.g., selective profiling; JVM internals)
  • Evaluate other confidence models
  • Larger experimental study