
Semantics-Aware Performance Optimization

Semantics-Aware Performance Optimization. Harry Xu CS Departmental Seminar 01/13/2012. Who Am I. Recently got my Ph.D. (in 08/11) Interested in (static and dynamic) program analysis Theoretical foundations Applications Recent interest--- software bloat analysis.

Presentation Transcript


  1. Semantics-Aware Performance Optimization • Harry Xu • CS Departmental Seminar • 01/13/2012

  2. Who Am I • Recently got my Ph.D. (in 08/11) • Interested in (static and dynamic) program analysis • Theoretical foundations • Applications • Recent interest: software bloat analysis • http://www.ics.uci.edu/~guoqingx • Looking for motivated Ph.D. students

  3. Is Today’s Software Fast Enough? • Pervasive use of large-scale, enterprise-level applications • Layers of libraries and frameworks • Object-orientation encourages excess • No free lunch anymore from hardware advances • The size of software grows faster than the hardware capabilities (a.k.a. Myhrvold’s Law)

  4. As a Result • Heaps are getting bigger • Grown from 500 MB to 2-3 GB or more in the past few years • But not necessarily supporting more users or functions • Surprisingly common (all are from real apps): • Supporting thousands of users (millions are expected) • Saving 500 KB of session state per user (2 KB is expected) • Requiring 2 MB for a text index per simple document • Creating 100K temporary objects per web hit • Big impact on applications in a lot of different domains

  5. Let’s Do Optimizations • Dynamic optimizations have really helped • JIT compilers can significantly lower the level of inefficiencies • Combine traditional dataflow analyses with inlining • Optimizations as part of the run time • Situations where traditional compilers may not work well • Dynamic languages (Dr. Michael Franz’s interest) • Semantic inefficiencies (my interest :)

  6. Semantic Inefficiencies • Performance problems caused by developers’ inappropriate choices and mistakes • In many cases, they come from positive software engineering practices • Make everything as general as possible • Use objects for all simple tasks • Negative effects multiply when we pile up abstractions • Semantics-agnostic optimizations cannot optimize them away • Human insight is required

  7. Key Insight • Bringing semantic information into the optimizer is more important than developing sophisticated analyses that are still semantics-agnostic • Develop semantics-aware optimizations

  8. Outline • Motivation and Introduction • LeakChaser: a semantics-aware memory leak detector [PLDI 2011] • CoCo: a sound and adaptive system for online replacement of data structures [Ongoing]

  9. Java Memory Leaks • Objects are reachable, but not used • E.g., cached in a big HashMap, but never removed • Existing memory leak detection techniques • Unaware of program semantics: track arbitrary objects • No focus: profiling the whole execution buries the real causes in a sea of likely problems • Developer insight is necessary in leak detection: a three-tier approach to exploit such insight • [Figure: Tier L, Tier M, and Tier H, each showing the split of diagnosis work between manual effort and LeakChaser]

  10. Exploiting Developer Insight • Let programmers write specifications • Lifetime invariants often exist in large-scale apps • Lifetimes for certain objects are strongly correlated • Would like to have a new assertion framework • Screen s = new Screen(…); • Configuration c = new Configuration(); • /* s and c should always be created together and eventually die together */

  11. Tier L: Low-Level Liveness Assertions • An assertion framework to specify lifetime relationships • assertDiesBefore(c, s) // s: Screen, c: Configuration • assertDiesBeforeAlloc(s, s) • Can be used to make assertions about arbitrary objects that have high-level semantic relationships
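
The lifetime invariant behind assertDiesBefore(c, s) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it reads the assertion as "violated once s is dead while c is still alive, or c dies in a later GC cycle than s", and it models garbage collection with explicit calls. LeakChaser itself works inside the Jikes RVM collector; the class `LifetimeLog` and its methods are hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation, not LeakChaser's API: records the GC cycle in
// which each named object was found unreachable.
final class LifetimeLog {
    private final Map<String, Integer> deathGc = new HashMap<>();
    private int gcCount = 0;

    // Simulate one GC cycle that finds the given objects unreachable.
    void gc(String... unreachable) {
        gcCount++;
        for (String name : unreachable) {
            deathGc.putIfAbsent(name, gcCount);
        }
    }

    // Assumed reading of assertDiesBefore(a, b): violated once b is dead
    // while a is still alive, or a died in a later GC cycle than b.
    boolean violated(String a, String b) {
        Integer bDeath = deathGc.get(b);
        if (bDeath == null) {
            return false; // b is still alive, nothing to report yet
        }
        Integer aDeath = deathGc.get(a);
        return aDeath == null || aDeath > bDeath;
    }
}

final class LifetimeDemo {
    public static void main(String[] args) {
        LifetimeLog log = new LifetimeLog();
        log.gc("screen"); // the Screen dies, the Configuration lingers
        System.out.println(log.violated("config", "screen"));
    }
}
```

Objects that die in the same simulated cycle count as dying together, which matches the "created together and eventually die together" comment on the previous slide.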

  12. High-Level Events: Transactions • Frequently-executed code regions • Likely to contain memory leaks • Inspired by EJB transactions • Allow programmers to specify transactions • ResultSet runQuery(String query){ • Connection c = getConnection(…); • Statement s = c.createStmt(); • ResultSet r = s.executeQuery(query); • return r; • } • [Figure: repeated runQuery invocations shown against the heap]

  13. Transaction Specification • ResultSet runQuery(String query){ • transaction { • Connection c = getConnection(…); • Statement s = c.createStmt(); • ResultSet r = s.executeQuery(query); • } • return r; • } • Transaction • A user-specified spatial boundary • A transaction identifier object that is correlated with the liveness of this region: temporal boundary (query)

  14. Tier M: Checking Transaction Properties • ResultSet runQuery(String query){ • transaction (query) { • Connection c = getConnection(…); • share { • Statement s = c.createStmt(); • globalMap.cache(s); • } • ResultSet r = s.executeQuery(query); • } • return r; • } • Semantics: for each object o created in this transaction, do assertDiesBefore(o, query) • Share region: for each object o created in this transaction, if o is not created in share, do assertDiesBefore(o, query) • Programmers do not need to understand implementation details to use low-level assertions
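
The share-region rule above can be sketched as a filter over allocations: only objects created outside a share region receive an implicit assertDiesBefore(o, query). The class `TransactionScope` and its methods are hypothetical and exist only for illustration; in LeakChaser the `transaction` and `share` constructs are handled by the modified VM, not by library calls like these.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one transaction with an optional share region.
final class TransactionScope {
    private final List<String> asserted = new ArrayList<>();
    private boolean inShare = false;

    void beginShare() { inShare = true; }
    void endShare()   { inShare = false; }

    // Record an allocation. Objects created inside a share region are
    // exempt; all others are expected to die before the transaction
    // identifier (the query object on the slide).
    void allocate(String obj) {
        if (!inShare) {
            asserted.add(obj);
        }
    }

    // Objects that would receive assertDiesBefore(o, query).
    List<String> assertedObjects() {
        return new ArrayList<>(asserted);
    }
}

final class TransactionScopeDemo {
    public static void main(String[] args) {
        TransactionScope tx = new TransactionScope();
        tx.allocate("connection");
        tx.beginShare();
        tx.allocate("statement"); // cached globally, so allowed to escape
        tx.endShare();
        tx.allocate("resultSet");
        System.out.println(tx.assertedObjects());
    }
}
```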

  15. Tier H: Inferring Transaction Properties • ResultSet runQuery(String query){ • transaction (query, INFER){ • Connection c = getConnection(…); • Statement s = c.createStmt(); • globalMap.cache(s); • ResultSet r = s.executeQuery(query); • } • return r; • } • Minimum requirement for user involvement • Specify a transaction • Tell LeakChaser to run in the inference mode • Semantics: for each o created in this transaction, if assertDiesBefore(o, query) = false, startTrackStaleness(o); if (o.staleness >= S), reportLeak() • Programmers let LeakChaser do most of the work
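
The inference rule on this slide (track staleness once an object outlives its transaction, report a leak when staleness reaches S) can be sketched as follows. The sketch assumes staleness is counted in GC cycles since the last use, which the slide does not state; `StalenessTracker` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical staleness tracker; the names are illustrative only.
final class StalenessTracker {
    private final int threshold; // S on the slide
    private final Map<String, Integer> staleness = new HashMap<>();

    StalenessTracker(int threshold) {
        this.threshold = threshold;
    }

    // Called when an object outlives its transaction, i.e. the implicit
    // assertDiesBefore(o, query) evaluated to false.
    void startTracking(String obj) {
        staleness.putIfAbsent(obj, 0);
    }

    // Any use of a tracked object resets its staleness to zero.
    void use(String obj) {
        staleness.computeIfPresent(obj, (key, value) -> 0);
    }

    // One simulated GC cycle: every tracked object becomes one step
    // staler. Returns the objects whose staleness reached the threshold.
    List<String> gc() {
        List<String> leaks = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : staleness.entrySet()) {
            entry.setValue(entry.getValue() + 1);
            if (entry.getValue() >= threshold) {
                leaks.add(entry.getKey());
            }
        }
        return leaks;
    }
}
```

An object that escapes the transaction but keeps being used never reaches the threshold, so it is not reported; this is what lets the inference mode tolerate legitimately shared objects without a share annotation.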

  16. Three-Tier Approach • Tier H, Tier M, and Tier L • Decreasing levels of abstraction • More knowledge required for diagnosis • Increased precision • LeakChaser: an iterative diagnosis process • Start Tier H with little knowledge • Gradually explore leaky behaviors to locate the root cause

  17. Case Studies • Six case studies on real-world applications • Eclipse diff (bug #115789) • SPECjbb2000: LeakChaser found a memory problem never reported before • Eclipse editor (bug #139465): quickly concluded that this was not a bug • Eclipse WTP (bug #155898): found the cause of this bug, which was reported three years ago and is still open • MySQL leak • Mckoi leak: found for the first time that a leaking thread is the root cause • Ability to diagnose problems in a large system at its client

  18. Implementation • Jikes RVM • Works for both baseline and optimizing compilers • Works for all non-generational tracing GCs • Overhead on FastAdaptiveImmix (average) • Infrastructure: 10% on time, less than 10% on space • Transactions: 2.3X slowdown for 1,088,377 transactions • LeakChaser is available for download • http://jikesrvm.org/Research+Archive

  19. Summary • Developer insight is given in the form of specifications • Any other ways to express developer insight?

  20. Outline • Motivation and Introduction • LeakChaser: a semantics-aware memory leak detector [PLDI 2011] • CoCo: a sound and adaptive system for online replacement of data structures [Ongoing]

  21. Container Inefficiencies • Inappropriate choice of container is an important source of bloat • Examples • Use HashSet to store very few elements → ArraySet or SingletonSet • Call get(i) many times on a LinkedList → ArrayList • …

  22. Optimizing Containers • Container semantics required • Different design and implementation rationales • Chameleon: an offline approach [PLDI 2009] • Profile container usage • Report problematic container choices • Make recommendations • Make it online? Remove the burden from developers completely • Appears to be an impossible task • Soundness: how to provide a consistency guarantee • Performance: how to reduce switch overhead

  23. Nothing Is Impossible • The CoCo approach • Users specify replacement rules, e.g., LinkedList → ArrayList if #get(i) > X • CoCo switches implementations at run time • CoCo is an application-level approach that performs optimizations via pure Java code • Manually modified container code • Automatically generated glue code
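
A replacement rule of the form "LinkedList → ArrayList if #get(i) > X" could be encoded as a small value object. The slides do not show CoCo's actual rule syntax, so the class `ReplacementRule`, its fields, and the threshold value below are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.function.IntPredicate;

// Hypothetical encoding of one rule: "from -> to if condition(#get)".
final class ReplacementRule {
    private final Class<?> from;
    private final Class<?> to;
    private final IntPredicate condition;

    ReplacementRule(Class<?> from, Class<?> to, IntPredicate condition) {
        this.from = from;
        this.to = to;
        this.condition = condition;
    }

    // The rule fires only when the active container has the source type
    // and the profiled number of get(i) calls satisfies the condition.
    boolean fires(Object active, int getCount) {
        return from.isInstance(active) && condition.test(getCount);
    }

    Class<?> target() {
        return to;
    }
}

final class ReplacementRuleDemo {
    public static void main(String[] args) {
        int x = 100; // illustrative threshold, not a value from the talk
        ReplacementRule rule =
            new ReplacementRule(LinkedList.class, ArrayList.class, n -> n > x);
        System.out.println(rule.fires(new LinkedList<String>(), 101));
        System.out.println(rule.target().getSimpleName());
    }
}
```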

  24. CoCo System Overview • Manually modify container classes to make them CoCo-optimizable • Use CoCo static compiler to generate glue code and compile it with the optimizable container classes • Run the program with our modified Jikes RVM

  25. The CoCo Methodology • CoCo works only for same-interface optimizations • LinkedList can be replaced only with another List (one that implements java.util.List) • For each allocation of container type c, create a group of other containers {c1, c2, c3} • [Figure: for LinkedList l = new LinkedList(), a container combo holds the active LinkedList and the inactive ArrayList, HashArrayList, and XYZList]

  26. Soundness • All operations are performed only on the active container • When an object is added into the active container, its abstraction is added into inactive containers • [Figure: in the combo, add(o) and get() go to the active LinkedList; addAbstraction(α) goes to the inactive ArrayList, HashArrayList, and XYZList]

  27. Soundness • Once a switch rule evaluates to true, an inactive container becomes active • If an abstraction is located by a retrieval, it is concretized to provide a safety guarantee • [Figure: container combo with LinkedList, ArrayList, HashArrayList, and XYZList; get() locates α, which is concretized into o; switch rule: LinkedList → ArrayList if #get(i) > X]

  28. Optimizable Container Classes • Manually modified: • class LinkedList implements List { • ListCombo combo = … ; • void add$CoCo(Object o) { //actual stuff } • Object get$CoCo(int index) { //actual stuff } • void add(Object o) { combo.add(o); } • Object get(int index) { return combo.get(index); } • } • class ArrayList implements List { • ListCombo combo = … ; • void add$CoCo(Object o) { //actual stuff } • Object get$CoCo(int index) { //actual stuff } • void add(Object o) { combo.add(o); } • Object get(int index) { return combo.get(index); } • } • Automatically generated: • class ListCombo implements List { • List active = …; • List[] inactiveList = …; • void add(Object o) { • active.add$CoCo(o); • for each l in inactiveList { l.addAbstract(α); } • } • Object get(int index) { • Object o = active.get$CoCo(index); • if (o instanceof Abstraction) { o = active.concretize(o); } • return o; • } • }

  29. Optimizable Container Classes • LinkedList l = new LinkedList(); • l.add(o); • [Figure annotations: create combo, create inactive lists, call, forward, dispatch]

  30. Perform Online Container Switch • class ListCombo implements List { • void add(Object o) { • profileAndReplace(ADD); • active.add$CoCo(o); • … • } • Object get(int index) { • profileAndReplace(GET); • Object o = active.get$CoCo(index); • … • } • void profileAndReplace(int oprType) { • switch(oprType) { //profiling • case ADD: ADD_OPR++; break; • case GET: GET_OPR++; break; • } • //rules • if (GET_OPR > X && active instanceof LinkedList) • swap(active, inactive[i]); // inactive[i] is ArrayList • } • } • Change field active to the appropriate container • The client still interfaces with the original container
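
The profile-then-swap idea can be tried out with standard java.util classes. This is a deliberately simplified sketch, not CoCo's mechanism: it copies all elements eagerly at switch time instead of maintaining abstractions in inactive containers, and it wraps the list rather than modifying the container classes. `SwitchingList` and `activeType` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Simplified stand-in for a combo: the client keeps one handle while the
// backing implementation changes underneath it.
final class SwitchingList<E> {
    private final int threshold; // X in the rule "#get(i) > X"
    private List<E> active = new LinkedList<>();
    private int getCount = 0;

    SwitchingList(int threshold) {
        this.threshold = threshold;
    }

    void add(E element) {
        active.add(element);
    }

    E get(int index) {
        getCount++;          // profiling
        profileAndReplace(); // rule evaluation
        return active.get(index);
    }

    // Rule: LinkedList -> ArrayList once get(i) was called more than X times.
    private void profileAndReplace() {
        if (getCount > threshold && active instanceof LinkedList) {
            active = new ArrayList<>(active); // eager copy, unlike CoCo
        }
    }

    String activeType() {
        return active.getClass().getSimpleName();
    }
}

final class SwitchingListDemo {
    public static void main(String[] args) {
        SwitchingList<String> list = new SwitchingList<>(2);
        list.add("a");
        list.add("b");
        for (int i = 0; i < 3; i++) {
            list.get(i % 2);
        }
        System.out.println(list.activeType());
    }
}
```

The eager copy makes the switch itself cost time linear in the container size, which is the overhead that the abstraction and concretization scheme on the surrounding slides is meant to avoid.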

  31. Abstraction • An abstraction is a placeholder for a set of concrete elements • Container specific • Its granularity influences performance • For List, an abstraction contains • Host container ID • Indices of a range of elements it represents
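
A List abstraction as described here (a host container plus a range of indices) might look like the sketch below. It assumes a direct reference to the host in place of the host container ID mentioned on the slide, and a half-open index range; `RangeAbstraction` and `concretize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical placeholder that stands for host[from, to) without
// copying the elements until they are actually needed.
final class RangeAbstraction<E> {
    private final List<E> host; // assumption: reference instead of an ID
    private final int from;     // inclusive
    private final int to;       // exclusive

    RangeAbstraction(List<E> host, int from, int to) {
        if (from < 0 || to > host.size() || from > to) {
            throw new IndexOutOfBoundsException("invalid range");
        }
        this.host = host;
        this.from = from;
        this.to = to;
    }

    // Number of concrete elements this placeholder represents.
    int size() {
        return to - from;
    }

    // Concretization: bring back all elements the placeholder represents.
    List<E> concretize() {
        return new ArrayList<>(host.subList(from, to));
    }
}

final class RangeAbstractionDemo {
    public static void main(String[] args) {
        List<String> host = Arrays.asList("a", "b", "c", "d");
        RangeAbstraction<String> alpha = new RangeAbstraction<>(host, 1, 3);
        System.out.println(alpha.concretize());
    }
}
```

A wider range means fewer placeholders to maintain on each add but more elements copied on each concretization, which is one way the granularity can influence performance.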

  32. Concretization • Bring back all elements represented • Containers to be optimized • Should have significant algorithmic advantages in certain execution scenarios • Modified 4 containers from the Java collections framework and implemented 3 from scratch • List: ArrayList, LinkedList, and HashArrayList • Map: HashMap and ArrayMap • Set: HashSet and ArraySet • The same set of switch rules as used in Chameleon

  33. Implementation • Modify Jikes RVM to provide run-time support • Replace each new java.util.X (…) with new coco.util.X (…) in both baseline and optimizing compilers • Optimizations • Dropping combo • Sampling • Lazy creation of inactive containers • Aggressively inline CoCo-related methods

  34. Evaluation • Micro-benchmarks • 75X faster for one benchmark after switching from ArrayList to HashArrayList • DaCapo • 14 large-scale, real-world programs • 1/50 sampling rate: profileAndReplace is invoked once per 50 calls to add/get • 8.4% speedup on average
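
The 1/50 sampling rate can be pictured as a simple call counter that lets profiling run on every 50th container operation. This is a sketch of the general idea only; how CoCo implements sampling is not shown in the slides, and `SampledProfiler` is a hypothetical name.

```java
// Hypothetical counter-based sampler: profiling runs once per 'period'
// container operations.
final class SampledProfiler {
    private final int period;
    private long calls = 0;
    private long profiled = 0;

    SampledProfiler(int period) {
        if (period <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        this.period = period;
    }

    // Call on every add/get. Returns true on every period-th call, which
    // is when profileAndReplace would be invoked.
    boolean onOperation() {
        calls++;
        if (calls % period != 0) {
            return false;
        }
        profiled++;
        return true;
    }

    long profiledCount() {
        return profiled;
    }
}

final class SampledProfilerDemo {
    public static void main(String[] args) {
        SampledProfiler profiler = new SampledProfiler(50);
        for (int i = 0; i < 120; i++) {
            profiler.onOperation();
        }
        System.out.println(profiler.profiledCount());
    }
}
```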

  35. Conclusions • The first semantics-aware bloat removal technique • Developer insight is encoded by • Replacement rules • Abstraction and concretization functions • Impact on future dynamic optimization research • Develop more semantics-aware optimization techniques • Any other ways to provide compilers with developer insight?

  36. Thank You • Q/A
