350 likes | 671 Vues
Rupesh Nasre. Advisor: Prof R Govindarajan. Apr 05, 2008. Pointer Analysis. Outline. Motivation and Introduction. Related Work. Preliminary Results. Research Directions. Pointer analysis is the mechanism of statically finding out possible run-time values of a pointer.
E N D
Rupesh Nasre. Advisor: Prof R Govindarajan. Apr 05, 2008. Pointer Analysis.
Outline. • Motivation and Introduction. • Related Work. • Preliminary Results. • Research Directions.
Pointer analysis is the mechanism of statically finding out possible run-time values of a pointer. What is Pointer Analysis?
Pointer analysis is the mechanism of statically finding out possible run-time values of a pointer and relation of various pointers with each other. What is Pointer Analysis?
Relation between pointers. • p = arr + ii; q = arr + jj; if (p == q) { fun(); } • q = p; ... if (p == q) { fun(); }
Variants of Pointer Analysis. • Alias analysis. do p and q point to the same memory location? • Points-to analysis. does p point to memory location x?
Why Pointer Analysis? • for parallelization: fun(p); fun(q); • for common subexpression elimination: x = p + 2; y = q + 2; • for dead code elimination. if (p == q) { fun(); } • for other optimizations.
Introduction. • Flow sensitivity. • Context sensitivity. • Field sensitivity. • Unification based. • Inclusion based.
Flow sensitivity. p = &x; p = &y; label: ... flow-sensitive: {(p, &y)}. flow-insensitive: {(p, &x), (p, &y)}.
Context sensitivity. caller1() { caller2() { fun(int *ptr) { fun(p); fun(q); r = ptr; } } } context-insensitive: {(r, p), (r, q)}. context sensitive: {(r, p)} along call-path caller1, {(r, q)} along call-path caller2.
Field sensitivity. x.f = p; or p = x.f; field-sensitive: {(x.f, p)}. field-insensitive: {(x, p)}.
Unification based. one(&s1); one(struct s*p) { two(struct s*q) { one(&s2); p->a = 3; q->b = 4; two(&s3); two(p); } } unification-based: {(p, &s1), (p, &s2), (p, &s3), (q, &s1), (q, &s2), (q, &s3)}.
Inclusion based. one(&s1); one(struct s*p) { two(struct s*q) { one(&s2); p->a = 3; q->b = 4; two(&s3); two(p); } } inclusion-based: {(p, &s1), (p, &s2), (q, &s1), (q, &s2), (q, &s3)}
Like all other important problems in Computer Science... • Alias analysis without memory allocation, intra-procedural, flow-sensitive, supporting arbitrary levels of indirection, is NP-hard. • For two levels of indirection, it is still NP-hard. • Even flow-insensitive analysis is NP-hard (for arbitrary levels of indirection). • With dynamic memory allocation, allowing structs, it becomes undecidable. • Even for scalars (no structs), it remains undecidable. G Ramalingam, The undecidability of aliasing, TOPLAS 1994. Venkatesan Chakaravarthy, New results on the computability and complexity of points-to analysis, POPL 2003.
But the good news is... • For single pointer dereference, even a flow-sensitive analysis with only scalars and well-defined types is in P, if dynamic memory allocation is not allowed. • For arbitrary number of dereferences, if the analysis is flow-insensitive, it is in P. G Ramalingam, The undecidability of aliasing, TOPLAS 1994. Venkatesan Chakaravarthy, New results on the computability and complexity of points-to analysis, POPL 2003.
Open Problems. • When dynamic memory allocation is not allowed, but arbitrary number of levels of dereferencing is allowed, the problem is NP-hard. Is it in NP? • Is the above problem for bounded number of dereferences in P? • When dynamic memory is allowed, is the problem decidable?
Related Work. • Choi et al, POPL 1993. • flow sensitive. • solution set for each program point. • alias sets for each CFG node. • uses worklists for efficiency. • precise but inefficient. J D Choi,M Burke, P Carini, Efficient flow-sensitive interprocedural computation of pointer induced aliases and side effects, POPL 1993.
Related Work. • Andersen, PhD Thesis, 1994. • flow insensitive. • context insensitive. • inclusion based. • each variable represented using separate node. • precision used as upper bound. Lars Ole Andersen, Program Analysis and Specialization for the C Programming Language, PhD thesis, 1994.
Related Work. • Burke et al, LCPC 1995. • flow insensitive. • alias solution for each procedure. • worklist used for efficiency. • can filter alias information based on scoping. • nearly as precise as Andersen's. M Burke, P Carini, J D Choi, M Hind, Flow-insensitive interprocedural alias analysis in the presence of function pointers, LCPC 1995.
Related Work. • Reps et al, POPL 1995. • problem formulated using graph reachability. • poly-time algorithm for interprocedural finite distributive subset-based problems. • graph reachability used for aliasing. Thomas Reps, Susan Horwitz, Mooly Sagiv, Precise Interprocedural Dataflow Analysis via Graph Reachability, POPL 1995.
Related Work. • Steensgaard, POPL 1996. • flow insensitive. • context insensitive. • field insensitive. • unification based. • linear space and almost linear time algorithm. • imprecise but sets lower bound on time complexity. Bjarne Steensgaard, Points-to Analysis in Almost Linear Time, POPL 1996.
Related Work. • Ghiya et al, PLDI 1996. • flow sensitive. • context sensitive. • field insensitive. • makes use of direction, interference and shape. • classifies as tree, dag or cyclic graph. Rakesh Ghiya, Laurie Hendren, Is it a Tree, a DAG, or a Cyclic Graph? A Shape Analysis For Heap Directed Pointers in C, PLDI 1996.
Related Work. • Cheng et al, PLDI 2000. • uses access paths. • flow insensitive. • field sensitive. • cost effective context sensitivity. • works well for large number of indirect function calls. Ben-Chung Cheng, Wen-Mei Hwu, Modular Interprocedural Pointer Analysis using Access Paths: Design, Implementation, and Evaluation, PLDI 2000.
Related Work. • Whaley et al, PLDI 2004. • context sensitive. • field sensitive. • partially flow sensitive. • inclusion based. • scalable (10 min, 400 MB, 8000 methods). • ordered BDDs. John Whaley, Monica Lam, Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams, PLDI 2004.
Related Work. • Lattner et al, PLDI 2007. • context sensitive. • flow insensitive. • field sensitive. • unification based. • scalable. • efficient (3 sec for 200K lines). • low storage requirement (30MB). Chris Lattner, Andrew Lenharth, Vikram Adve, Making Context Sensitive Points-to Analysis with Heap Cloning Practical For The Real World, PLDI 2007.
Our Experiments. • framework = LLVM. • algorithm = Andersen. • benchmark = SPEC 2000.
Research Directions. • Pointer arithmetic. void f(struct list *p, struct list *q) { struct list *tmp; tmp = p->next; p->next = q->next; q->next = q->next->next; p->next->next = tmp; }
Research Directions. • Profiling. • at specific program points like function entry, exit. • for hot functions. • for fat pointers.
Research Directions. • Complex data structures. • a recursive data structure is merged into a single node. • some programs have a single global data structure to operate on, like symbol table, dictionary. • how to characterize complexity of a data structure?
Rupesh Nasre. Advisor: Prof R Govindarajan. Apr 05, 2008. Pointer Analysis.
188.ammp Description. Benchmark Program General Category: Computational Chemistry. Modeling large systems of molecules usually associated with Biology. Benchmark Description: The benchmark runs molecular dynamics (i.e. solves the ODE defined by Newton's equations for the motions of the atoms in the system) on a protein-inhibitor complex which is embedded in water (see Harrison 1993 for descriptions of the algorithm and stability analysis on it). The energy is approximated by a classical potential or "force field". The protein is HIV protease complexed with the inhibitor indinavir. There are 9582 atoms in the water and protein making this representative of a typical large simulation. This benchmark is derived from published work on understanding drug resistance in HIV (Weber and Harrison 1999). Input Description: The problem tracks how the atoms move from an initial coorinates and initial velocities.
Conferences. POPL: Principles of Programming Languages. PLDI: Programming Language Design and Implementation. MSP: Memory Systems Performance. LCPC: Languages and Compilers for Parallel Computing.
Related Work. • Raman et al, MSP 2005. • uses executable instructions. • run time (dynamic). • collects RDS profile. • no type information. • interesting properties of data structures are found out.