A pointer-analysis framework for handling structures and casting in C programs, with a discussion of the trade-offs among precision, efficiency, and portability. The talk identifies the problems that casting introduces into pointer analysis and proposes solutions.

Pointer Analysis for Programs with Structures and Casting
Suan Hsi Yong, Susan Horwitz, Thomas Reps
University of Wisconsin-Madison

Pointer Analysis
• Finds the locations to which a pointer may point
• Needed for static analyses, e.g. constant propagation, slicing
• Precision of pointer analysis affects precision of subsequent analyses
• smaller points-to set ⇒ more precise
• factors: flow-sensitivity, context-sensitivity, treatment of aggregate objects, ...

Our Approach
• Develop a pointer-analysis framework for distinguishing fields of structures
• the task is complicated by the ability to type cast in C
• examine the tradeoffs between precision and portability
• the ideas apply to both flow-sensitive and flow-insensitive analysis

Rules Without Structures
statement    rule                                                   effect
x = &y;      x = &y; ∈ Prog ⇒ points-to(x, y)                       x → y
w = x;       w = x; ∈ Prog, points-to(x, y) ⇒ points-to(w, y)       w → y

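The two rules above can be sketched in C as updates to a points-to relation. This is only an illustration under stated assumptions: variables are small integer ids, and the names PointsTo, rule_address_of, and rule_copy are made up here, not taken from the talk.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_VARS = 8 };

/* pts[x][y] is true when points-to(x, y) holds. */
typedef struct {
    bool pts[MAX_VARS][MAX_VARS];
} PointsTo;

/* Rule for "x = &y;": record points-to(x, y). */
void rule_address_of(PointsTo *g, int x, int y) {
    assert(x >= 0 && x < MAX_VARS && y >= 0 && y < MAX_VARS);
    g->pts[x][y] = true;
}

/* Rule for "w = x;": every target of x becomes a target of w.
   Returns true if a new fact was added, so a caller can repeat
   the rules until nothing changes. */
bool rule_copy(PointsTo *g, int w, int x) {
    assert(w >= 0 && w < MAX_VARS && x >= 0 && x < MAX_VARS);
    bool changed = false;
    for (int y = 0; y < MAX_VARS; y++) {
        if (g->pts[x][y] && !g->pts[w][y]) {
            g->pts[w][y] = true;
            changed = true;
        }
    }
    return changed;
}
```

A flow-insensitive analysis would apply such rules to every statement repeatedly until rule_copy reports no change for any of them.
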
“Collapse Always” Approach
Every structure is treated as a single variable, so the program
    struct { int * s1; int * s2; } s;
    int i, j;
    int * p;
    s.s1 = &i;
    s.s2 = &j;
    p = s.s1;
is analyzed as if it were
    s = &i;    ⇒ points-to(s, i)
    s = &j;    ⇒ points-to(s, j)
    p = s;     ⇒ points-to(p, i), points-to(p, j)

Handling Structures
    x = &s.a; ∈ Prog ⇒ points-to(x, s.a)
    x = &((*p).a); ∈ Prog, points-to(p, s) ⇒ points-to(x, s.a)

Handling Structures
    s = *p; ∈ Prog, points-to(p, b), points-to(b.x, a) ⇒ points-to(s.x, a)
[Figure: p points to b; s and b both have fields x1, x2, x3; a target a of a field of b becomes a target of the same field of s.]

What Happens With Casting?
    s = *p; ∈ Prog, points-to(p, b), points-to(b.y, a) ⇒ points-to(s.?, a)
[Figure: s has fields x1, x2, x3; b has fields y1, y2, y3, y4; it is unclear which field of s should point to a.]

One Approach: Use Field Offsets
    s = *p; ∈ Prog, points-to(p, b), points-to(b.n, a) ⇒ points-to(s.n, a)
[Figure: b has fields at offsets 0, 4, 8, 12; s has fields at offsets 0, 4, 8; n ranges over these offsets.]
But:
• Offsets are compiler-specific
• May not be available

Contributions
• Identify problems specific to structures and casting in pointer analysis
• Introduce a pointer-analysis framework that handles structures and casting with different levels of precision, efficiency, and portability
• Present experimental results showing that i) distinguishing fields of structures is important, and ii) there is very little penalty for portability
Approaches, from least precise to most precise: collapse always, collapse on cast, common initial sequence, offsets (not portable)

Layout of structs in ANSI C
1) The first field of a structure is at offset 0, i.e. the address of the first field of a structure is the same as the address of the structure
2) The common initial sequence of fields with compatible types in two structures is guaranteed to line up
    struct S { int s1; char s2; float s3; int s4; };
    struct T { int t1; char t2; int * t3; int t4; };

Problems Introduced by Casting
1. “Aliasing problem” with the first field(s) of structures
    struct S { struct T { int * t1; } t; } s;
    void * p;
The following assignments are equivalent:
    p = &s;
    p = &s.t;
    p = &s.t.t1;

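The equivalence of the three assignments can be checked directly in C. The sketch below reuses the slide's declarations (with struct T declared separately rather than nested); the helper name first_field_aliases is made up for illustration.

```c
#include <assert.h>

struct T { int *t1; };
struct S { struct T t; } s;

/* Returns 1 when &s, &s.t and &s.t.t1 all denote the same address,
   which is what makes the three assignments to p equivalent. */
int first_field_aliases(void) {
    void *p1 = &s;
    void *p2 = &s.t;
    void *p3 = &s.t.t1;
    return p1 == p2 && p2 == p3;
}
```
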
Problems Introduced by Casting
1. “Aliasing problem” with the first field(s) of structures (continued)
With s and p declared as before, which fact should the assignment produce?
    p = &s; ⇒ points-to(p, s)? points-to(p, s.t)? points-to(p, s.t.t1)?

Solution
Normalize each variable to its “innermost first field”:
    struct S { struct T { int * t1; } t; } s;
    void * p;
    p = &s; ⇒ points-to(p, normalize(s)), i.e. points-to(p, s.t.t1)

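A minimal sketch of normalizing to the innermost first field, assuming objects are represented as nodes that link to their first field. The Obj type and its field names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A node in a hypothetical object tree: a variable or a field.
   first_field is NULL for objects that are not structures. */
typedef struct Obj {
    const char *name;
    const struct Obj *first_field;
} Obj;

/* Follow first-field links until reaching a non-structure object. */
const Obj *normalize(const Obj *o) {
    assert(o != NULL);
    while (o->first_field != NULL) {
        o = o->first_field;
    }
    return o;
}
```

With nodes for s, s.t, and s.t.t1 linked in that order, all three normalize to the node for s.t.t1, so p = &s, p = &s.t, and p = &s.t.t1 yield the same points-to fact.
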
Normalize
In general, normalize can be any function that maps variables to some representative object
• e.g. normalize(s.a) = s (where s is the outermost object containing s.a) ⇒ “Collapse Always” approach
• e.g. normalize(s.a) = ‹s, offsetof(s,a)› ⇒ “Offsets” approach

Normalize: Rule Change
The rule
    x = &s.a; ∈ Prog ⇒ points-to(x, s.a)
becomes
    x = &s.a; ∈ Prog ⇒ points-to(normalize(x), normalize(s.a))
points-to(x, y) ⇒ ∀ a, b such that normalize(a) = x and normalize(b) = y, a points to b.

Problems Introduced by Casting
2. If p points to an object whose type is not the type p is declared to point to, which field is accessed in the dereference (*p).a?
    struct S { int s1; char s2; float s3; } s;
    struct T { int t1; char t2; int * t3; } *p;
    void * q;
    p = (struct T *) &s;
    q = &((*p).t2);
The rule without casting gives
    q = &((*p).t2); ∈ Prog, points-to(p, s) ⇒ points-to(q, s.t2) ?
but s has no field t2.

Solution
Introduce a function lookup to find the corresponding field:
lookup(type, field, target) = the set of fields in target that may correspond to field in type.
[Figure: p is declared as type*; field f of type is matched against the fields f1, f2, ..., fi of target.]

Rule for q = &((*p).a) with lookup
The rule
    q = &((*p).t2); ∈ Prog, points-to(p, s) ⇒ points-to(q, s.t2)
is replaced by
    q = &((*p).t2); ∈ Prog, points-to(p, s), s.f ∈ lookup(struct T, t2, s) ⇒ points-to(q, s.f)

Problems Introduced by Casting
3. What happens when a block of memory of one type is copied into a block of memory of a different type?
    struct S { int *y1; char *y2; float *y3; } s;
    struct T { int *x1; char *x2; int *x3; } t;
    void * p = &t;
    s = *p;
The rule without casting
    s = *p; ∈ Prog, points-to(p, t), points-to(t.x, a) ⇒ points-to(s.x, a)
does not say which field of s corresponds to field x of t.

Solution
Introduce a function resolve to match corresponding fields in two structures:
resolve(obj1, obj2, type) = the set of pairs ⟨obj1.f, obj2.f’⟩ where f is a field in obj1 and f’ is the corresponding field in obj2
[Figure: example pairs ⟨obj1.f1, obj2.f1’⟩, ⟨obj1.fn, obj2.fm’⟩, ⟨obj1.fn, obj2.fn’⟩.]

Rule for s = *p with resolve
The rule
    s = *p; ∈ Prog, points-to(p, t), points-to(t.x, a) ⇒ points-to(s.x, a)
is replaced by
    s = *p; ∈ Prog, points-to(p, t), ⟨s.f, t.f’⟩ ∈ resolve(s, t, τ_s), points-to(t.f’, a) ⇒ points-to(s.f, a)
Here τ_s denotes the declared type of s.

The First Three Rules with normalize, lookup, and resolve
    x = &s.a; ∈ Prog ⇒ points-to(normalize(x), normalize(s.a))
    q = &((*p).a); ∈ Prog, points-to(normalize(p), s), s.a’ ∈ lookup(τ_*p, a, s) ⇒ points-to(normalize(q), s.a’)
    s = *p; ∈ Prog, points-to(normalize(p), t.b), ⟨s.a, t.a’⟩ ∈ resolve(normalize(s), t.b, τ_s), points-to(t.a’, u.c) ⇒ points-to(s.a, u.c)
Here τ_*p and τ_s denote the declared types of *p and s.

1. Collapse Always: portable, least precise
normalize(s.a) = s (where s is an “outermost object”)
lookup(τ, a, s) = { s }
resolve(s, t, τ) = { ⟨s, t⟩ }

2. Collapse On Cast: portable
normalize(s.a) = innermost first field of s.a
lookup(τ, a, s) = if τ_s = τ then { normalize(s.a) } else { normalize(s.c) | c is a field of s }
[Figure: for q = &((*p).a); with p declared as τ*, q points to field a of s when s has type τ; otherwise q points to every field c1, c2, c3 of s.]

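The lookup definition for collapse on cast can be sketched as follows. This is an illustration only, under two simplifying assumptions: types are compared by name, and the fields of s are not themselves structures, so normalize(s.c) is just s.c. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* lookup for "collapse on cast" on structures with non-structure fields.
   tau:       name of the type p is declared to point to
   field:     index of the accessed field within tau
   type_of_s: name of the actual type of the target s
   nfields_s: number of fields of s
   out:       receives the indices of the fields of s that may be accessed;
              must have room for nfields_s entries (at least 1)
   Returns the number of indices written. */
size_t lookup_collapse_on_cast(const char *tau, size_t field,
                               const char *type_of_s, size_t nfields_s,
                               size_t *out) {
    assert(out != NULL);
    if (strcmp(tau, type_of_s) == 0) {
        out[0] = field;             /* types agree: exactly that field */
        return 1;
    }
    for (size_t i = 0; i < nfields_s; i++) {
        out[i] = i;                 /* types differ: any field of s */
    }
    return nfields_s;
}
```

When the declared and actual types agree, fields stay distinct; once a cast is involved, the access may touch any field of the target.
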
2. Collapse On Cast: resolve
resolve(s, t, τ) = { ⟨a, a’⟩ | d is a field of τ, a ∈ lookup(τ, d, s), a’ ∈ lookup(τ, d, t) }
[Figure: s has fields y1, y2, y3; τ has fields d1, d2, d3; t has fields x1, x2, x3.]

3. Common Initial Sequence: most precise portable approach
    τ:  x1:int   x2:char   x3:int    x4:int
    t:  y1:int   y2:char   y3:int*   y4:int
The common initial sequence of τ and t consists of their first two fields.
    q = &((*p).x2);    lookup(τ, x2, t) = { y2 }
    q = &((*p).x3);    lookup(τ, x3, t) = { y3, y4 }

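The two lookup results above can be reproduced with a short sketch. It assumes a structure type is given as a list of field type names and approximates "compatible types" by string equality; StructType, common_init_len, and lookup_cis are hypothetical names, and the handling of fields past the common initial sequence is inferred from the slide's example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t nfields;
    const char *const *types;   /* types[i] is the declared type of field i */
} StructType;

/* Length of the longest prefix of fields whose types match pairwise. */
size_t common_init_len(const StructType *a, const StructType *b) {
    size_t n = 0;
    while (n < a->nfields && n < b->nfields &&
           strcmp(a->types[n], b->types[n]) == 0) {
        n++;
    }
    return n;
}

/* lookup(tau, field, target): writes the indices of the target fields that
   may correspond to field `field` of tau and returns how many were written.
   out must have room for target->nfields entries (at least 1). */
size_t lookup_cis(const StructType *tau, size_t field,
                  const StructType *target, size_t *out) {
    assert(out != NULL);
    size_t cis = common_init_len(tau, target);
    if (field < cis) {
        out[0] = field;         /* inside the common initial sequence */
        return 1;
    }
    size_t n = 0;
    for (size_t i = cis; i < target->nfields; i++) {
        out[n++] = i;           /* past it: any remaining target field */
    }
    return n;
}
```

With τ = (int, char, int, int) and t = (int, char, int *, int), field index 1 (x2) maps to index 1 (y2), and field index 2 (x3) maps to indices 2 and 3 (y3, y4).
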
4. “Offsets”: non-portable, most precise
normalize(s.a) = ‹s, offsetof(s,a)› (where s is an “outermost object”)
lookup(τ, a, ‹s, 0›) = { ‹s, offsetof(τ, a)› }
[Figure: for q = &((*p).a); with offsetof(τ, a) = 4, the fields c1, c2, c3 of s are at ‹s,0›, ‹s,4›, ‹s,8›, and q points to ‹s,4›.]

4. “Offsets”: non-portable, most precise
resolve(‹s, 0›, ‹t, 0›, τ) = { ⟨‹s, k›, ‹t, k›⟩ | k is an integer in [0..sizeof(τ)-1] }
[Figure: locations of s are paired with the locations of t at the same offsets.]

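A rough sketch of lookup and resolve in the offsets approach, assuming a location is an outermost object plus a byte offset. The Loc type, the integer object ids, and both function names are hypothetical. The slide defines lookup only for a base at offset 0; adding the base offset here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* A location: an outermost object (a hypothetical integer id)
   plus a byte offset into it. */
typedef struct { int obj; size_t offset; } Loc;

/* lookup: field_offset plays the role of offsetof(tau, a). */
Loc lookup_offsets(Loc base, size_t field_offset) {
    Loc r = { base.obj, base.offset + field_offset };
    return r;
}

/* resolve: pair the locations of dst and src at each offset k in
   [0, size), where size plays the role of sizeof(tau).
   Writes at most cap pairs and returns the number written. */
size_t resolve_offsets(Loc dst, Loc src, size_t size,
                       Loc pairs[][2], size_t cap) {
    assert(pairs != NULL);
    size_t n = 0;
    for (size_t k = 0; k < size && n < cap; k++, n++) {
        pairs[n][0] = lookup_offsets(dst, k);
        pairs[n][1] = lookup_offsets(src, k);
    }
    return n;
}
```

Because the offsets passed in come from offsetof and sizeof, the results depend on the compiler and target, which is why this approach is not portable.
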
Experiments
1. Implemented the pointer-analysis framework in C++ using SUIF
2. Implemented the four algorithms on top of this framework
3. Ran the four algorithms on 20 C programs (600 to 30,000 lines), and measured
• size of points-to sets (precision)
• time (efficiency)

Results: Points-to Set Sizes
[Figure: Average size of points-to set per dereference, for collapse always, collapse on cast, common initial sequence, and offsets. Programs shown: bc, twig, 130.li, agrep, football, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, ispell-4.0.ispell.]

Results: Analysis Time
[Figure: Analysis times, normalized to “offsets” times, for collapse always, collapse on cast, common initial sequence, and offsets. Programs shown: ft, ks, bc, twig, 130.li, agrep, yacr2, 099.go, triangle, football, anagram, ansitape, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, 129.compress, ispell-4.0.ispell. Off-scale bars are labeled 34.9, 9.6, and 114.3.]

Results: Number of Points-to Edges
[Figure: Number of points-to edges, normalized to “offsets”, for collapse always, collapse on cast, common initial sequence, and offsets. Programs shown: bc, twig, 130.li, agrep, football, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, ispell-4.0.ispell. Off-scale bars are labeled 22.9, 6.8, and 11.0.]

Conclusions
• Precise points-to information requires distinguishing fields of structures
• Portability does not cost much in terms of time or precision