Context-sensitive points-to analysis: is it worth it?

Article by Ondřej Lhoták & Laurie Hendren, McGill University • Presentation by Roza Pogalnikova

Abstract • Evaluate the precision of context-sensitive variations of subset-based points-to analysis • Compare different context-sensitivity approaches: • call-site strings • object sensitivity • the algorithm of Zhu and Calman and of Whaley and Lam (ZCWL)

Subset-based PTA • Find the allocation sites that reach each variable: • S: a = new A() // allocation statement • for a variable x anywhere in the program: can x point to an object allocated at S?

Context Sensitivity • Call site: the context is the program statement of the method invocation • Object sensitivity: the context is the receiver object of the method invocation • ZCWL: k-CFA, where k is the call depth of the call graph after merging SCCs; it runs a context-insensitive algorithm on a cloned, context-sensitive call graph

Parameters • Variations included: • apply context sensitivity only to pointer variables • also apply it to the heap abstraction • different lengths of context strings

Measurements • Measured to guide implementation: • number of contexts • number of distinct contexts • number of distinct points-to sets • Measured to evaluate precision: • size of the call graph (methods/edges) • devirtualizable call sites • casts statically provable to be safe

Results • Object sensitivity is the most precise and the most likely to scale • A context-sensitive heap abstraction improves precision further • ZCWL loses precision because calls within call-graph cycles (SCCs) are treated context-insensitively

What • Compare three kinds of context-sensitive points-to analysis: • call sites as context abstraction • object-sensitive analysis • ZCWL algorithm

How • Implemented with the JEDD system: • a language extension of Java • abstracts the manipulation of Binary Decision Diagrams (BDDs) • Soot framework components written in JEDD: • points-to analysis • call graph construction • side-effect analysis in BDDs • virtual call resolution

BDDs • Figure: binary decision tree and truth table for the function $f(x_1, x_2, x_3) = \bar{x}_1 \bar{x}_2 \bar{x}_3 + x_1 x_2 + x_2 x_3$, and the BDD for the function f * credit: http://en.wikipedia.org/wiki/Binary_decision_diagram
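The reduction shown in the figure can be sketched with a minimal reduced ordered BDD (ROBDD) builder. This is an illustrative sketch, not the BDD library JEDD uses; the class and method names are hypothetical. It expands f by Shannon expansion in the order x1, x2, x3, skips a node when both of its branches are equal, and shares identical subgraphs through a unique table. Because it starts from the full truth table, it is only practical for a handful of variables.

```python
from itertools import product

def f(x1: bool, x2: bool, x3: bool) -> bool:
    # f(x1, x2, x3) = ¬x1·¬x2·¬x3 + x1·x2 + x2·x3
    return (not x1 and not x2 and not x3) or (x1 and x2) or (x2 and x3)

class ROBDD:
    """Minimal reduced ordered BDD (hypothetical helper). Terminals are ids 0 and 1."""

    def __init__(self, num_vars: int):
        self.num_vars = num_vars
        self.unique = {}   # (var, low, high) -> node id: isomorphic subgraphs are shared
        self.nodes = {}    # node id -> (var, low, high), internal nodes only
        self.next_id = 2

    def mk(self, var, low, high):
        if low == high:    # both branches agree: the test on `var` is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, func, var=0, assignment=()):
        # Shannon expansion in variable order x1 < x2 < x3 (indices 0, 1, 2)
        if var == self.num_vars:
            return 1 if func(*assignment) else 0
        low = self.build(func, var + 1, assignment + (False,))
        high = self.build(func, var + 1, assignment + (True,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        # Follow the low/high edge selected by each variable until a terminal
        while node not in (0, 1):
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node == 1

bdd = ROBDD(3)
root = bdd.build(f)
# The BDD agrees with the truth table on all 8 assignments
assert all(bdd.evaluate(root, a) == f(*a) for a in product([False, True], repeat=3))
print("internal BDD nodes:", len(bdd.nodes))  # 5, versus 7 in the full decision tree
```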

PTA using BDDs • Points-to: (a, A), (b, B), (c, C), (a, B), (b, A), (c, A), (c, B) • Program: A: a = new O(); B: b = new O(); C: c = new O(); a = b; b = a; c = b
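The points-to relation above can be reproduced by propagating subset constraints to a fixed point: each allocation seeds a set, and each copy `dst = src` requires pt(dst) ⊇ pt(src). The sketch below uses explicit Python sets rather than BDDs, and the function name `solve` is hypothetical.

```python
# Statements of the example program
allocations = [("a", "A"), ("b", "B"), ("c", "C")]   # A: a = new O(), B: ..., C: ...
copies = [("a", "b"), ("b", "a"), ("c", "b")]        # a = b; b = a; c = b

def solve(allocations, copies):
    """Subset-based (Andersen-style) propagation until nothing changes."""
    pt = {}
    for var, site in allocations:
        pt.setdefault(var, set()).add(site)
    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            # dst = src  implies  pt(dst) ⊇ pt(src)
            before = len(pt.setdefault(dst, set()))
            pt[dst] |= pt.get(src, set())
            if len(pt[dst]) != before:
                changed = True
    return pt

pt = solve(allocations, copies)
relation = sorted((v, o) for v, objs in pt.items() for o in objs)
print(relation)
```

The cycle `a = b; b = a` forces pt(a) and pt(b) to become equal, and `c = b` adds both of their objects to pt(c).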

PTA using BDDs • Binary representation: • a & A as 00 • b & B as 01 • c & C as 10 • Points-to representation (variable bits, then object bits): (a, A) as 0000, (a, B) as 0001, (b, A) as 0100, (b, B) as 0101, (c, A) as 1000, (c, B) as 1001, (c, C) as 1010
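The encoding can be sketched by concatenating the two-bit variable code with the two-bit object code. The relation then becomes a Boolean function over four bits that is true exactly on the encoded pairs; a BDD stores this function instead of a list of pairs. The names `encode` and `in_relation` are hypothetical.

```python
# Two-bit codes from the slide: a & A -> 00, b & B -> 01, c & C -> 10
VAR_CODE = {"a": 0b00, "b": 0b01, "c": 0b10}
OBJ_CODE = {"A": 0b00, "B": 0b01, "C": 0b10}

def encode(var: str, obj: str) -> str:
    # High two bits identify the variable, low two bits the allocation site
    return format((VAR_CODE[var] << 2) | OBJ_CODE[obj], "04b")

points_to = [("a", "A"), ("a", "B"), ("b", "A"), ("b", "B"),
             ("c", "A"), ("c", "B"), ("c", "C")]
codes = {pair: encode(*pair) for pair in points_to}
for pair, code in codes.items():
    print(pair, "as", code)

# Characteristic function of the relation over the 4 bits
members = set(codes.values())

def in_relation(bits: str) -> bool:
    return bits in members

true_points = sum(in_relation(format(n, "04b")) for n in range(16))
print(true_points, "of 16 assignments are in the relation")
```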

PTA using BDDs • Compact way to represent points-to relations: * credit: [2] Points-to Analysis using BDDs

Determine • How many contexts are generated? • How does the number of contexts relate to the precision of the analysis? • How likely is a scalable implementation to be feasible?

Background • O: abstraction of pointer targets (objects) • P: abstraction of pointers • I: abstraction of method invocations • p may point to o: $O(o) \in pt(P(p))$

Background • $O_{as}$: abstracts an object by the program statement where it was allocated • $P_{var}$: abstracts a pointer by its local variable • $[O(o), f]$: field f of object o • $P_{fs}(o.f)$: pointer stored in field f of object o

Background • Compare two families of invocation abstractions: • call site $I_{cs}(i)$ (program statement of the method call) • receiver object $I_{ro}(i) = O(o)$ (object on which the method was invoked)

Background • String of contexts given a base abstraction $I_{base}$: $I_{string}(i) = [I_{base}(i_1), I_{base}(i_2), I_{base}(i_3), \ldots]$ • $i_j$ is the j-th topmost invocation on the stack during i ($i_1 = i$) • Two approaches make the string finite: • limit the length of the context string to k • ZCWL: exclude cycle edges from the call graph
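The k-limited context string can be sketched with the call-site abstraction as $I_{base}$. The example below takes the run-time stack (topmost invocation first) and keeps the first k base abstractions; two stacks that differ only below depth k collapse into the same context, which is where k-limiting loses precision. The `Invocation` type, method names, and call-site labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass(frozen=True)
class Invocation:
    method: str     # invoked method (hypothetical names)
    call_site: str  # program statement containing the call (hypothetical labels)

def i_cs(inv: Invocation) -> str:
    # Call-site base abstraction I_cs(i)
    return inv.call_site

def context_string(stack: Sequence[Invocation],
                   base: Callable[[Invocation], str], k: int) -> Tuple[str, ...]:
    # I_string(i) = [I_base(i_1), I_base(i_2), ...], cut off after k elements.
    # stack[0] is i_1 = i; stack[j - 1] is the j-th topmost invocation.
    return tuple(base(inv) for inv in stack[:k])

# Two run-time stacks for invocations of m3, topmost first
stack = [Invocation("m3", "S3"), Invocation("m2", "S2"), Invocation("m1", "S1")]
other = [Invocation("m3", "S3"), Invocation("m2", "S2"), Invocation("m1", "S0")]

for k in (1, 2, 3):
    same = context_string(stack, i_cs, k) == context_string(other, i_cs, k)
    print(k, context_string(stack, i_cs, k), "merged" if same else "distinct")
```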

Background • Another choice: which pointers/objects to model context-sensitively? • Given a context-insensitive pointer abstraction $P_{ci}$ and a context abstraction I, a run-time pointer p is modeled: • context-sensitively by $P(p) = [I(i_p), P_{ci}(p)]$ ($i_p$: the method invocation in which p occurs) • context-insensitively by $P(p) = P_{ci}(p)$
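A small sketch shows why $P(p) = [I(i_p), P_{ci}(p)]$ helps. In the hypothetical program in the docstring, an identity method is called from two call sites. With a context-insensitive pointer abstraction its parameter p is a single abstract pointer, so both results get both objects; keying p by call site keeps them apart. The program, the context label "main", and the function name are assumptions for illustration, and the two straight-line passes replace a general fixed-point solver only because this example has no loops or recursion.

```python
def identity_example(context_sensitive: bool):
    """Hypothetical program:
         a = new A(); b = new B();
         S1: x = id(a);   S2: y = id(b);
         Object id(Object p) { return p; }
       Contexts are call sites; main's locals use the context "main"."""
    def P(ctx, var):
        # P(p) = [I(i_p), P_ci(p)] when context-sensitive, else P_ci(p)
        return (ctx, var) if context_sensitive else var

    pt = {P("main", "a"): {"A"}, P("main", "b"): {"B"}}
    calls = [("S1", "a", "x"), ("S2", "b", "y")]
    # Parameter passing: pt(p) ⊇ pt(argument), in the context of each call
    for site, arg, _ in calls:
        pt.setdefault(P(site, "p"), set()).update(pt[P("main", arg)])
    # Return: pt(result) ⊇ pt(p) of id under the same context
    for site, _, result in calls:
        pt.setdefault(P("main", result), set()).update(pt[P(site, "p")])
    return pt[P("main", "x")], pt[P("main", "y")]

print("insensitive:", [sorted(s) for s in identity_example(False)])  # [['A', 'B'], ['A', 'B']]
print("sensitive:  ", [sorted(s) for s in identity_example(True)])   # [['A'], ['B']]
```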

Background • Given the allocation-site abstraction $O_{as}$ and a context abstraction I, an object o is modeled: • context-sensitively by $O(o) = [I(i_o), O_{as}(o)]$ ($i_o$: the method invocation in which o was allocated) • context-insensitively by $O(o) = O_{as}(o)$
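The same idea applied to objects can be sketched with a hypothetical factory method called from two call sites. Pointers are context-sensitive in both runs; only the object abstraction changes. With $O(o) = O_{as}(o)$ both results point to the single abstract object H and may alias; with $O(o) = [I(i_o), O_{as}(o)]$ each call site produces its own abstract object. The program and function names are assumptions for illustration.

```python
def factory_example(heap_context_sensitive: bool):
    """Hypothetical program:
         S1: x = make();   S2: y = make();
         O make() { H: r = new O(); return r; }
       Pointers are keyed by call site in both modes; only objects differ."""
    pt = {}
    for site, result in [("S1", "x"), ("S2", "y")]:
        # O(o) = [I(i_o), O_as(o)] when heap-context-sensitive, else O_as(o)
        obj = (site, "H") if heap_context_sensitive else "H"
        pt[(site, "r")] = {obj}                      # r = new O() in context `site`
        pt[("main", result)] = set(pt[(site, "r")])  # return r
    return pt

def may_alias(pt, p, q) -> bool:
    # Two pointers may alias if their points-to sets intersect
    return bool(pt[p] & pt[q])

for mode in (False, True):
    pt = factory_example(mode)
    print("heap context" if mode else "no heap context",
          "-> x, y may alias:", may_alias(pt, ("main", "x"), ("main", "y")))
```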

Benchmarks • The study was performed on: • SpecJVM 98 benchmark suite • DaCapo benchmark suite (ver. beta050224) • Ashes benchmark suite • Polyglot extensible Java front-end • SUN standard library 1.3.1_01

Benchmarks

Contexts Number • Context-sensitive analysis is often considered intractable: • each call site propagates its context to the called method • the number of context strings grows exponentially with the length of call chains
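The exponential growth can be sketched on a hypothetical call graph in which each method in a chain calls the next one from two different call sites. Enumerating the unlimited call-site strings gives 2^n contexts for the last method and 2^(n+1) - 1 method-context pairs in total. The graph shape and helper names are assumptions chosen to make the blow-up visible.

```python
from collections import defaultdict

def chain_graph(n: int):
    # Hypothetical call graph: m0 -> m1 -> ... -> mn, each link made from two call sites
    return {f"m{j}": [(f"m{j}.s{c}", f"m{j + 1}") for c in (1, 2)] for j in range(n)}

def call_strings(graph, entry):
    """Enumerate the unlimited call-site strings (newest call site first) per method.
    Assumes an acyclic call graph; recursion would make the set infinite."""
    strings = defaultdict(set)

    def visit(method, string):
        strings[method].add(string)
        for site, callee in graph.get(method, []):
            visit(callee, (site,) + string)

    visit(entry, ())
    return strings

for n in (2, 4, 8):
    strings = call_strings(chain_graph(n), "m0")
    total = sum(len(s) for s in strings.values())   # method-context pairs
    print(f"n={n}: {len(strings[f'm{n}'])} contexts for m{n}, {total} method-context pairs")
```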

Contexts Number • Questions to clarify: • how many of these contexts improve analysis results? • why can BDDs represent so many contexts, and is there hope of representing them with traditional techniques?

Total contexts number • Count method-context pairs • Empty spots: the analysis did not complete within the available memory • The BDD library could allocate up to 41 million BDD nodes (~820 MB)

Total contexts number

Total contexts number • An explicit representation of contexts would not scale well • The number of contexts grows slowly in the object-sensitive analysis (contexts are the receiver objects, i.e. what this points to in method invocations) • ZCWL: • k is the maximum call depth in the call graph after merging SCCs • large variations, because k differs for each benchmark
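How ZCWL's k depends on the call graph can be sketched by collapsing each SCC into one node (Tarjan's algorithm) and taking the longest path from the entry method in the resulting acyclic graph. The sketch assumes depth is counted in call edges from the entry method; the paper's exact convention may differ, and the call graph and function names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

def strongly_connected_components(graph):
    """Tarjan's algorithm; returns a dict mapping each node to an SCC id."""
    index, low, comp = {}, {}, {}
    stack, on_stack, counter = [], set(), [0, 0]  # [next index, next SCC id]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC: pop its members
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = counter[1]
                if w == v:
                    break
            counter[1] += 1

    for v in set(graph) | {w for ws in graph.values() for w in ws}:
        if v not in index:
            visit(v)
    return comp

def zcwl_k(graph, entry):
    """Longest call chain (in edges) from `entry` after collapsing each SCC."""
    comp = strongly_connected_components(graph)
    dag = defaultdict(set)
    for v, ws in graph.items():
        for w in ws:
            if comp[v] != comp[w]:       # drop edges inside an SCC (cycle edges)
                dag[comp[v]].add(comp[w])

    @lru_cache(maxsize=None)
    def depth(c):
        return max((1 + depth(d) for d in dag[c]), default=0)

    return depth(comp[entry])

# Hypothetical call graph: a and b call each other recursively
graph = {"main": ["a", "c"], "a": ["b"], "b": ["a", "c"], "c": []}
print("k =", zcwl_k(graph, "main"))   # longest chain: main -> {a, b} -> c
```

A benchmark whose condensed call graph is deep gets a large k, which is one way the per-benchmark variation arises.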

Equivalent contexts • Method-context pairs $(m_1, c_1)$ and $(m_2, c_2)$ are equivalent if: • $m_1 = m_2$ • for every local pointer p in the method, $pt(P(p))$ is the same under $c_1$ and $c_2$ • Equivalence classes reflect the precision improvement due to context sensitivity
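Counting equivalence classes can be sketched by grouping method-context pairs on a key made of the method and the points-to sets of all its local pointers. The analysis results below are hypothetical data, not measurements from the study.

```python
from collections import defaultdict

# Hypothetical analysis result: (method, context) -> {local pointer: points-to set}
results = {
    ("m", "c1"): {"p": {"A"}, "q": {"B"}},
    ("m", "c2"): {"p": {"A"}, "q": {"B"}},
    ("m", "c3"): {"p": {"A", "C"}, "q": {"B"}},
    ("n", "c1"): {"r": {"A"}},
}

def equivalence_classes(results):
    classes = defaultdict(list)
    for (method, ctx), pts in results.items():
        # Key: the method plus the points-to set of every local pointer in it
        key = (method, frozenset((var, frozenset(objs)) for var, objs in pts.items()))
        classes[key].append((method, ctx))
    return list(classes.values())

classes = equivalence_classes(results)
print(len(results), "method-context pairs,", len(classes), "equivalence classes")
```

Here contexts c1 and c2 of m add no precision over each other, so they fall into one class.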

Equivalent contexts

Equivalent contexts • BDDs automatically share identical points-to relations, which makes them effective here • Object-sensitive analysis is more precise than call-site analysis • Context string length does not have a great impact • Surprisingly, ZCWL is less precise, because of context-insensitive treatment within SCCs

Distinct points-to sets • Measures analysis cost • Approximates the space requirements of a “traditional” representation, such as shared bit-vectors • Similar results for all context-sensitive variations • The number of distinct points-to sets increases with a context-sensitive heap abstraction
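A shared bit-vector representation can be sketched with hash-consing: each points-to set becomes an integer bit-vector, and identical vectors are stored once, so storage grows with the number of distinct points-to sets rather than with the number of context-sensitive pointers. The class name and the pointer data are hypothetical.

```python
class SharedBitVectors:
    """Hash-consing table (hypothetical helper): identical points-to sets share one vector."""

    def __init__(self, objects):
        self.bit = {o: i for i, o in enumerate(objects)}  # object -> bit position
        self.table = {}                                   # bit-vector -> shared id

    def intern(self, objs):
        vec = 0
        for o in objs:
            vec |= 1 << self.bit[o]
        # Reuse the existing entry if this exact set was seen before
        return self.table.setdefault(vec, len(self.table))

store = SharedBitVectors(["A", "B", "C"])
pointers = {("c1", "p"): {"A"}, ("c2", "p"): {"A"},
            ("c1", "q"): {"A", "B"}, ("c2", "q"): {"B", "A"}, ("c3", "q"): {"C"}}
ids = {ptr: store.intern(objs) for ptr, objs in pointers.items()}
print(len(pointers), "pointers,", len(store.table), "distinct points-to sets")
```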

Distinct points-to sets

Call Graph • Compare context-insensitive projections of the context-sensitive call graphs: • each node is a method (not a method-context pair) • reachable methods are preserved • ZCWL is excluded (its projection is the same as the input context-insensitive graph)
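The projection can be sketched by dropping contexts from both endpoints of every context-sensitive edge and deduplicating. The edges, method names, and context labels below are hypothetical.

```python
# Hypothetical context-sensitive call graph edges:
# ((caller, caller context), call site, (callee, callee context))
cs_edges = {
    (("main", "()"), "s1", ("A.run", "[o1]")),
    (("main", "()"), "s1", ("A.run", "[o2]")),
    (("main", "()"), "s2", ("B.run", "[o3]")),
    (("A.run", "[o1]"), "s3", ("log", "[o1]")),
}

def project(cs_edges):
    """Drop contexts: nodes become methods, edges become (caller, call site, callee)."""
    edges = {(caller[0], site, callee[0]) for caller, site, callee in cs_edges}
    reachable = {m for caller, _, callee in edges for m in (caller, callee)}
    return edges, reachable

edges, reachable = project(cs_edges)
print(len(edges), "call edges,", len(reachable), "reachable methods")
```

The two edges from s1 to A.run in different contexts collapse into one edge of the projected graph.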

Reachable methods

Reachable methods • Context sensitivity proves more methods unreachable (e.g. bloat) • Context sensitivity for heap objects: • adds precision in the object-sensitive analysis (sablecc-j) • has no impact in the call-site analysis

Call edges

Call edges • Compare the size of the call graph in call edges • Results are the same, except for a large difference in sablecc-j (caused by a specific code pattern)

Virtual call resolution • Number of virtual call sites with more than one possible target implementation • Object-sensitive analysis has a clear advantage over call-site analysis • Context-sensitive heap objects add precision (sablecc-j)
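Virtual call resolution from points-to results can be sketched as follows: map each object in the receiver's points-to set to its dynamic class, look up the implementation of the called method, and count call sites with more than one target. The class hierarchy, allocation sites, and call sites are hypothetical, and each dispatch table is assumed to already include inherited methods.

```python
# Hypothetical dispatch tables: class -> {method name: implementing class}
# (each table is assumed to already include inherited methods)
dispatch = {
    "Shape":  {"area": "Shape"},
    "Circle": {"area": "Circle"},
    "Square": {"area": "Square"},
}
alloc_type = {"H1": "Circle", "H2": "Square", "H3": "Circle"}  # allocation site -> class

def targets(receiver_pts, method):
    """Implementations a virtual call may dispatch to, given pt(receiver)."""
    return {dispatch[alloc_type[site]][method] for site in receiver_pts}

call_sites = {"s1": {"H1", "H3"}, "s2": {"H1", "H2"}}   # call site -> pt(receiver)
polymorphic = 0
for site, pts in call_sites.items():
    t = targets(pts, "area")
    polymorphic += len(t) > 1
    print(site, sorted(t), "devirtualizable" if len(t) == 1 else "polymorphic")
print(polymorphic, "call site(s) with more than one target")
```

A more precise points-to set for the receiver directly shrinks the target set, which is why context sensitivity shows up in this metric.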

Virtual call resolution

Cast safety • A cast cannot fail if the pointer can point only to objects of the “right” type (a subtype of the type in the cast) • Count casts that cannot be proven safe • Object sensitivity, especially with context-sensitive heap objects, is the best (polyglot, javac)
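The cast check can be sketched directly from that rule: a cast is provably safe when every object in the pointer's points-to set has a dynamic type that is a subtype of the cast's target type. The hierarchy, allocation sites, and casts are hypothetical, and the sketch assumes single inheritance with no interfaces.

```python
# Hypothetical single-inheritance hierarchy (interfaces ignored for simplicity)
superclass = {"Object": None, "Shape": "Object", "Circle": "Shape", "Square": "Shape"}
alloc_type = {"H1": "Circle", "H2": "Square"}   # allocation site -> class

def is_subtype(t, target):
    # Walk up the superclass chain from t
    while t is not None:
        if t == target:
            return True
        t = superclass[t]
    return False

def cast_provably_safe(pts, target):
    return all(is_subtype(alloc_type[site], target) for site in pts)

# (points-to set of the cast operand, cast target type)
casts = [({"H1"}, "Circle"), ({"H1", "H2"}, "Shape"), ({"H1", "H2"}, "Circle")]
unprovable = [c for c in casts if not cast_provably_safe(*c)]
print(len(unprovable), "non-provable cast(s)")
```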

Cast safety

Conclusions • Evaluated effects: • generated contexts • distinct points-to sets • precision of call graph construction • virtual call resolution • cast safety analysis • Context-sensitive variations: • object-sensitive analysis • call sites as context abstraction • ZCWL algorithm

Conclusions • Context-sensitivity improvements: • small: call graph precision • medium: virtual call resolution • major: cast safety analysis • Object-sensitive analysis was the best: • analysis precision • potential scalability

Conclusions • Improvements from object-sensitive variations: • small: longer context strings • significant: context-sensitive heap objects • object-sensitive analysis is implementable with other existing techniques

Conclusions • ZCWL algorithm: • disappointing results • caused by the context-insensitive treatment of calls within SCCs of the initial call graph • a large proportion of call edges lie within SCCs
