Rupesh Nasre. Aug 24, 2007. Pointer Analysis Survey.
Outline. • The problem. • Background. • Representative papers. • Discussion: trends, similarities, differences. • Directions for research.
The problem. Statically find groups of program variables such that all variables in a group may point to the same memory block during program execution.
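As a small illustration of the problem statement, the sketch below assumes the analysis has already produced a points-to set for each pointer and derives the groups from it. The names `alias_groups` and `may_alias` and the sample data are hypothetical, chosen only for this example.

```python
def alias_groups(points_to):
    """Invert points-to sets: memory block -> pointers that may point to it."""
    groups = {}
    for ptr, targets in points_to.items():
        for block in targets:
            groups.setdefault(block, set()).add(ptr)
    return groups


def may_alias(points_to, a, b):
    """Two pointers may alias if their points-to sets share a block."""
    return bool(points_to.get(a, set()) & points_to.get(b, set()))


# Hypothetical analysis result for three pointers.
sample = {"p": {"x", "y"}, "q": {"y"}, "r": {"z"}}
print(alias_groups(sample))
print(may_alias(sample, "p", "q"), may_alias(sample, "p", "r"))
```

Note that "may alias" is not transitive, so the groups are formed per memory block rather than as a partition of the pointers.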
Background (1 of 7). • Static analysis. • done on static representation of a program. • does not require program execution. • is conservative by definition. • Dynamic analysis. • done on traces of program executions. • does not cover all possible behaviors. • precise for a run of the program.
Background (2 of 7). • Clients. • program transformations that depend on pointer analysis. • for instance, queries related to pointers and compiler optimizations. • typically, the more time the pointer analysis spends, the less time clients need to resolve their queries.
Background (3 of 7). • Precision. • a measure of how correctly the pointer analysis yields the required information. • for pointer analysis, the required information is whether two pointers are aliases or non-aliases. • dynamic analysis is precise with respect to the observed execution.
Background (4 of 7). • Efficiency. • amount of time taken by an algorithm. • Scalability. • asymptotic time complexity of an algorithm. • An algorithm can be efficient, but not scalable.
Background (5 of 7). • Flow-sensitivity. • algorithm considers control flow in the program. • Context-sensitivity. • algorithm considers calling context of a function. • Field-sensitivity. • algorithm separates individual fields of an aggregate, from each other and from the aggregate itself.
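To make flow-sensitivity concrete, here is a minimal sketch for straight-line code with only two statement forms, `p = &x` ("addr") and `p = q` ("copy"). The statement encoding and the function names are assumptions made for this illustration, not the formulation of any surveyed paper.

```python
def flow_insensitive(stmts):
    """Ignore statement order: iterate to a fixpoint, only ever adding targets."""
    pts = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            src = {rhs} if kind == "addr" else set(pts.get(rhs, set()))
            before = len(pts.setdefault(lhs, set()))
            pts[lhs] |= src
            changed |= len(pts[lhs]) != before
    return pts


def flow_sensitive(stmts):
    """Follow statement order: an assignment replaces the old targets."""
    pts = {}
    for kind, lhs, rhs in stmts:
        src = {rhs} if kind == "addr" else set(pts.get(rhs, set()))
        pts[lhs] = src  # strong update, valid here because code is straight-line
    return pts


# p = &a; q = p; p = &b
prog = [("addr", "p", "a"), ("copy", "q", "p"), ("addr", "p", "b")]
print(flow_insensitive(prog))  # p and q both map to {a, b}
print(flow_sensitive(prog))    # at the end: p maps to {b}, q maps to {a}
```

The flow-sensitive version reports the state after the last statement only; a full analysis would keep one such state per program point and merge states at control-flow joins.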
Background (6 of 7). • Unification-based. • algorithm merges equivalence classes of variables in an assignment. • less storage requirement. • fast. • low precision.
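The merging step can be sketched with a union-find structure, in the spirit of Steensgaard's analysis. This is a simplified illustration: it handles only `p = &x` and `p = q` (loads and stores are omitted), and the class name `Unifier`, its methods, and the `$n` placeholder locations are assumptions of this sketch.

```python
class Unifier:
    """Union-find over variables and abstract locations."""

    def __init__(self):
        self.parent = {}   # union-find parent links
        self.pointee = {}  # class representative -> node its members point to
        self.fresh = 0     # counter for placeholder locations

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[b] = a
        pb = self.pointee.pop(b, None)
        if pb is None:
            return
        if a in self.pointee:
            # Merging two classes also merges what they point to.
            self.union(self.pointee[a], pb)
        else:
            self.pointee[a] = pb

    def target(self, x):
        """Class that x points to, created on demand."""
        r = self.find(x)
        if r not in self.pointee:
            self.fresh += 1
            self.pointee[r] = self.find(f"${self.fresh}")
        return self.find(self.pointee[r])

    def addr_of(self, p, x):  # p = &x
        self.union(self.target(p), x)

    def copy(self, p, q):     # p = q
        self.union(self.target(p), self.target(q))

    def points_to(self, p):
        rep = self.target(p)
        return {v for v in list(self.parent)
                if not v.startswith("$") and self.find(v) == rep}


u = Unifier()
u.addr_of("p", "x")
u.addr_of("q", "y")
u.copy("p", "q")
print(sorted(u.points_to("p")))  # ['x', 'y']
print(sorted(u.points_to("q")))  # ['x', 'y']
```

The second result shows the precision loss: `q` was only ever assigned `&y`, but the assignment `p = q` merged the targets of both pointers into one class.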
Background (7 of 7). • Inclusion based (or subset based or constraint based). • algorithm processes assignments directionally and each symbol is represented by a single node. • more storage requirement. • slower. • high precision.
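For comparison, a directional (subset) treatment of assignments can be sketched as a naive fixpoint over four statement forms, in the style of Andersen's analysis. The tuple encoding `(kind, lhs, rhs)` and the function name `andersen` are assumptions of this sketch; practical implementations use a constraint graph and a worklist rather than repeated full passes.

```python
from collections import defaultdict


def andersen(stmts):
    """Flow-insensitive inclusion-based points-to sets via naive iteration."""
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":     # lhs = &rhs
                new = {(lhs, rhs)}
            elif kind == "copy":   # lhs = rhs   => pts(lhs) includes pts(rhs)
                new = {(lhs, t) for t in pts[rhs]}
            elif kind == "load":   # lhs = *rhs
                new = {(lhs, t) for v in pts[rhs] for t in pts[v]}
            elif kind == "store":  # *lhs = rhs
                new = {(v, t) for v in pts[lhs] for t in pts[rhs]}
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for var, tgt in new:
                if tgt not in pts[var]:
                    pts[var].add(tgt)
                    changed = True
    return pts


result = andersen([("addr", "p", "x"), ("addr", "q", "y"), ("copy", "p", "q")])
print(sorted(result["p"]))  # ['x', 'y']
print(sorted(result["q"]))  # ['y']
```

Because `p = q` only adds the targets of `q` to `p`, and not the other way round, `q` keeps the smaller set `{y}`.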
Representative papers (1 of 4). • Choi et al, Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects, POPL 1993. • Andersen, PhD Thesis, 1994. • Burke et al, Flow-insensitive interprocedural alias analysis in the presence of pointers, LCPC 1995. • Reps et al, Precise interprocedural dataflow analysis via graph reachability, POPL 1995.
Representative papers (2 of 4). • Steensgaard, Points-to analysis in almost linear time, POPL 1996. • Ghiya et al, Is it a tree, DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C, PLDI 1996. • Hind et al, Which pointer analysis should I use?, ISSTA 2000.
Representative papers (3 of 4). • Cheng et al, Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation, PLDI 2000. • Liang et al, Evaluating the precision of static reference analysis using profiling, ISSTA 2002. • Whaley et al, Cloning-based context-sensitive pointer alias analysis using binary decision diagrams, PLDI 2004.
Representative papers (4 of 4). • Raman et al, Recursive data structure profiling, MSP 2005. • Lattner et al, Making context sensitive points-to analysis with heap-cloning practical for the real world, PLDI 2007.
Discussion: similarities, differences. • Flow-sensitive: Choi93, Ghiya96, Reps95, Whaley04. • Context-sensitive: Andersen94, Cheng00, Ghiya96, Lattner07, Whaley04. • Field-sensitive: Cheng00, Lattner07, Whaley04. • Unification-based: Steensgaard96, Lattner07. • Inclusion-based: Andersen94, Cheng00, Whaley04.
Discussion: trends (1 of 2). • Recursion is handled using strongly-connected components. • A recursive data structure is represented using a single representative node. • Stack pointers are often treated in a different manner than heap pointers. • For better precision, inclusion-based analyses are preferred. For better efficiency, unification-based analyses are preferred.
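The first trend can be illustrated with Tarjan's algorithm applied to a call graph: mutually recursive functions fall into one strongly-connected component, which an analysis can then treat as a single unit. The call graph below and the function name are hypothetical, and the recursive formulation is only suitable for small graphs.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps a node to the list of its successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


# Hypothetical call graph: f and g are mutually recursive.
calls = {"main": ["f"], "f": ["g"], "g": ["f", "h"], "h": []}
print(strongly_connected_components(calls))
```

Tarjan's algorithm emits components with callees before their callers, which is the order a bottom-up pass over the call graph would need.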
Discussion: trends (2 of 2). • Flow-sensitivity does not improve precision significantly: pointers are typically not reassigned, and when they are, the new target is usually another part of the same data structure, which is represented as a whole by a single node. • Graph algorithms typically involve three phases: intraprocedural, bottom-up, and top-down. • A single level of context-sensitivity proves sufficiently precise and efficient.
Discussion. • Most of the papers differ in the techniques used to solve the pointer analysis problem. • The representation of alias information differs widely across techniques. • matrices: Ghiya96. • graphs: Das00, Lattner07, Raman05, Reps95, Steensgaard96. • access paths: Cheng00. • ordered binary decision diagrams: Whaley04.
Directions for research (1 of 4). • Complex data structures. • most algorithms do not handle them well. • occur when large hash tables, dictionaries, or symbol tables form the main data structure of a program. • need to characterize the complexity of a data structure. • an adaptive algorithm could then adjust to that complexity.
Directions for research (2 of 4). • Out-of-order execution for multithreaded programs. • some research has been done for multithreaded programs. • none of the papers discusses the effect of out-of-order execution of instructions on aliases in multithreaded programs. • instructions may be reordered by the compiler or the hardware.
Directions for research (3 of 4). • Combination of techniques. • no single existing technique is best in all respects. • hybrid approaches are necessary. • one way is to combine static pointer analysis with dynamic profile information. • another way is to use an adaptive algorithm that internally selects among the different existing sub-algorithms.
Directions for research (4 of 4). • Representation of alias information. • historically, a different representation of alias information has often led to a new algorithm. • finding novel ways to represent aliases is an interesting area to explore.