This presentation describes a shape analysis that provides basic information about the program heap to support client applications such as IDE tools, optimization, and error detection, favoring scalable models over overall expressive power. It tracks basic set relations between reference sets at low computational cost, using a storage shape graph whose nodes represent sets of objects and whose edges represent sets of pointers. Edges are annotated with edge equivalence and node coverage properties, which support several useful inferences.
Shape Analysis With Reference Sets Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org
Motivation • We want to provide basic information about the program heap to support a range of client applications • IDE tools (query, refactoring, etc.) • Optimization • Error Detection • Focus on scalable, manageable models/tools, even at the cost of overall expressivity/analytic power
Demo • Fix sharing info extraction • Add disjoint/overlaps for set information • Point out that more than just variable relations is desirable, since variables are transient
Goal • Track basic set relations • Membership, Overlapping, Non-Overlapping • Subset, Set Equality • Ensure small computational cost • High precision is not required but must handle common cases accurately • Iterative subset construction/mutation • Set style library operations • Union (AddAll) • Intersection • IsSubset • Contains
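As a concrete illustration of the relations listed above, here is a small Python sketch of the kind of client code involved: a subset is built iteratively and then related to the original collection. The `Item` class and the variable names are hypothetical and do not come from the talk; Python's built-in `set` stands in for a set-style library.

```python
class Item:
    """Hypothetical heap object; instances are hashed and compared by identity."""
    def __init__(self, weight):
        self.weight = weight

all_items = {Item(w) for w in (1, 5, 9)}

# Iterative subset construction: every element added to `heavy` comes from `all_items`.
heavy = set()
for it in all_items:
    if it.weight > 3:
        heavy.add(it)

light = all_items - heavy
merged = heavy | light            # Union (AddAll)
common = heavy & all_items        # Intersection

print(heavy <= all_items)         # IsSubset
print(heavy.isdisjoint(light))    # Non-Overlapping
print(merged == all_items)        # Set Equality
```

The relations checked at the end (subset, disjointness, equality) are the ones a static analysis would aim to establish without running the program.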
Approach Overview • Start with an existing model that decomposes the heap into related regions • Reduces the complexity of the set formulas that are needed • Storage shape graph works well • Nodes represent sets of objects (or data structures), edges represent sets of pointers • Fine-grained partitioning is possible • Disjointness properties are natural (and mostly free) • Annotate edges with additional properties to track reference set relations
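One possible encoding of such an annotated storage shape graph is sketched below. This is an illustrative data structure only, not the representation used in the actual analysis; the names `Node`, `Edge`, `ShapeGraph`, `edge_equiv`, and `covers` are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """Abstract node: stands for a set of concrete objects (a heap region)."""
    name: str

@dataclass(frozen=True)
class Edge:
    """Abstract edge: stands for a set of concrete pointers into `target`."""
    source: object   # a variable name (str) or a source Node
    label: str       # field or offset label, e.g. "[]" for array elements
    target: Node

@dataclass
class ShapeGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    # Hypothetical encoding of the reference set annotations:
    edge_equiv: set = field(default_factory=set)  # frozenset({e1, e2}) pairs
    covers: set = field(default_factory=set)      # (frozenset of edges, node) pairs

    def add_edge(self, source, label, target):
        edge = Edge(source, label, target)
        if isinstance(source, Node):
            self.nodes.add(source)
        self.nodes.add(target)
        self.edges.add(edge)
        return edge

    def in_edges(self, node):
        return {e for e in self.edges if e.target == node}

# An array V whose elements live in one region, plus a variable x into that region.
g = ShapeGraph()
array, elems = Node("array"), Node("elems")
e_v = g.add_edge("V", "", array)
e_elem = g.add_edge(array, "[]", elems)
e_x = g.add_edge("x", "", elems)
g.covers.add((frozenset({e_elem}), elems))  # the array's pointers reach every element
```

Because distinct nodes stand for disjoint sets of objects, disjointness between regions is implied by the graph structure and needs no extra annotation.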
Logical Structure Identification • Key issue for the shape graph approach is how to group concrete objects into abstract nodes • Too many nodes are confusing and computationally expensive • Too few nodes lead to imprecision (as a single node must represent multiple logical structures) • Often done via allocation site or types • Solution: nodes are sets of similar objects • Recursive type information (recursive vs. non-recursive types) • Objects stored in the same collection, array or structure
Target Set Definition • Given a set of heap references R, the corresponding target set is: • {Object o | ∃ r ∈ R that points to o} • Two sets of heap references can be related by ⊆ on their target sets • As the heap is partitioned into regions of objects, we also define a notion of coverage • A reference set covers a region if every object in the region is in the corresponding target set
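The two definitions above can be sketched directly on a toy concrete heap. In this sketch the heap is a dictionary from reference names to object names, with `None` standing for a null reference; the helper names `target_set` and `covers` and the sample heap are assumptions for illustration.

```python
def target_set(refs, points_to):
    """{o | some r in refs points to o}; null references contribute nothing."""
    return {points_to[r] for r in refs if points_to.get(r) is not None}

def covers(refs, region, points_to):
    """A reference set covers a region if every object in it is in the target set."""
    return region <= target_set(refs, points_to)

# Toy heap: an array V with three elements, plus variables x (into V) and y (null).
points_to = {"V[0]": "o1", "V[1]": "o2", "V[2]": "o3", "x": "o1", "y": None}
region = {"o1", "o2", "o3"}
array_refs = {"V[0]", "V[1]", "V[2]"}

print(target_set({"x", "y"}, points_to))                          # {'o1'}
print(target_set({"x"}, points_to) <= target_set(array_refs, points_to))  # True
print(covers(array_refs, region, points_to))                      # True
print(covers({"x"}, region, points_to))                           # False
```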
Considerations in Abstraction • Several possible choices for representing these relations • Theory of sets over all objects/references • Full binary relations on power sets of edges • Reduced set of relations • For efficiency we use a reduced set of relations • Equality of the reference sets abstracted by pairs of edges (E × E) • Relation from sets of edges to nodes that are covered by the abstracted references (℘(E) × N)
Abstract Edge Equivalence • Track target set equality of the pointers abstracted by pairs of edges
Abstract Node Coverage • Track whether all objects in the region represented by a node are contained in the target sets of the given edges
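To make the meaning of the two abstract properties concrete, the sketch below checks them against one concrete heap. Each concrete pointer is written as a pair (abstracting edge, pointed-to object), and `node_of` maps each object to its abstract node. The function names and the sample heap are hypothetical; a real analysis would track these properties abstractly rather than by enumerating a heap.

```python
def edge_targets(edge, pointers):
    """Target set of the concrete pointers abstracted by `edge`."""
    return {obj for (e, obj) in pointers if e == edge}

def edges_equivalent(e1, e2, pointers):
    """Edge equivalence: both edges abstract pointers with the same target set."""
    return edge_targets(e1, pointers) == edge_targets(e2, pointers)

def edges_cover(edges, node, pointers, node_of):
    """Node coverage: every object in the node's region is targeted by some edge."""
    region = {o for o, n in node_of.items() if n == node}
    reached = set().union(*(edge_targets(e, pointers) for e in edges))
    return region <= reached

# Collections A and B hold the same two objects; variable x refers to one of them.
pointers = [("eA", "o1"), ("eA", "o2"), ("eB", "o1"), ("eB", "o2"), ("ex", "o1")]
node_of = {"o1": "n", "o2": "n"}

print(edges_equivalent("eA", "eB", pointers))        # True
print(edges_equivalent("eA", "ex", pointers))        # False
print(edges_cover({"eA"}, "n", pointers, node_of))   # True
print(edges_cover({"ex"}, "n", pointers, node_of))   # False
```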
Useful Inferences • There are a number of useful inferences that can be made from these two properties • If e, eʹ are edge equivalent and e has an empty concretization, then eʹ must have an empty concretization as well • If an edge e covers node n, then any other in-edge of n represents a target set that is a subset (⊆) of the target set for edge e
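Both inferences can be written as small rules over the abstract facts. The sketch below is one possible formulation, with edges named by strings; `propagate_empty` and `subset_facts` are hypothetical helper names. Propagating emptiness through chains of equivalent edges is sound because equality of target sets is transitive.

```python
def propagate_empty(empty, equiv_pairs):
    """Close the set of known-empty edges under edge equivalence."""
    empty = set(empty)
    changed = True
    while changed:
        changed = False
        for a, b in equiv_pairs:
            for src, dst in ((a, b), (b, a)):
                if src in empty and dst not in empty:
                    empty.add(dst)
                    changed = True
    return empty

def subset_facts(covering_edge, node, in_edges):
    """If covering_edge covers node, each other in-edge's target set is a subset of its target set.

    Returns pairs (e, covering_edge) meaning Target(e) ⊆ Target(covering_edge).
    """
    return {(e, covering_edge) for e in in_edges[node] if e != covering_edge}

equiv_pairs = [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]
print(sorted(propagate_empty({"e1"}, equiv_pairs)))   # ['e1', 'e2', 'e3']

in_edges = {"n": {"eV", "ex", "ey"}}
print(sorted(subset_facts("eV", "n", in_edges)))      # [('ex', 'eV'), ('ey', 'eV')]
```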
Subsumes Aliasing • Note that the proposed reference set relations subsume classic must-alias • In the concrete model variables x == y (x, y non-null) iff Target(x) = Target(y) • In the abstract model the variables x, y must-alias iff the corresponding edges ex and ey are edge equivalent
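The correspondence between aliasing and target set equality can be checked on a small example. In this sketch `env` is a toy variable environment, object identity plays the role of the concrete heap address, and `edge_of` and `edge_equiv` are a hypothetical encoding of the abstract state.

```python
def target(var, env):
    """Target set of one variable: empty for null, else the single object it refers to."""
    obj = env[var]
    return set() if obj is None else {id(obj)}

def concrete_alias(x, y, env):
    """Concrete model: x == y as reference equality."""
    return env[x] is env[y]

def abstract_must_alias(x, y, edge_of, edge_equiv):
    """Abstract model: the variables' edges are the same or recorded as edge equivalent."""
    ex, ey = edge_of[x], edge_of[y]
    return ex == ey or frozenset((ex, ey)) in edge_equiv

a, b = object(), object()
env = {"x": a, "y": a, "z": b}
edge_of = {"x": "ex", "y": "ey", "z": "ez"}
edge_equiv = {frozenset(("ex", "ey"))}

for p, q in (("x", "y"), ("x", "z"), ("y", "z")):
    same_targets = target(p, env) == target(q, env)
    print(p, q, concrete_alias(p, q, env), same_targets,
          abstract_must_alias(p, q, edge_of, edge_equiv))
```

For each pair of non-null variables the first two printed booleans agree, which is the concrete statement on the slide; the third shows the abstract query answered from edge equivalence alone.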
Loop Invariant With Exit Test ... for(int i = 0; i < V.Length; ++i) V[i].f = 0;
Result ... for(int i = 0; i < V.Length; ++i) V[i].f = 0;
Summary • Tracking reference set information is computationally inexpensive • Results are precise enough to model many interesting/important relations • In fact surprisingly so • Why? Most conditions end up being simple • Is this a general property? Are most programs made of simple relations/concepts which are composed into complex concepts? (We hope so) • Could we use rich set decision procedures? E.g. all conditions are simple ⇒ most proofs are easy/fast with the right decomposition
Future Work • Build strong foundation for other tools to utilize • Transform core concepts from prototype to robust tools • Finish implementation of static analysis for CLI bytecode + core libraries (also runtime support) • Export results to Visual Studio for inspection, spec. generation, or other tools • Apply results in optimization, refactoring, and error detection applications