Efficient Field-Sensitive Pointer Analysis for C
This presentation describes a field-sensitive pointer analysis for C. It considers flow-insensitive analysis only, which trades precision for efficiency. The analysis generates set-constraints from the program and solves them on a constraint graph. To handle aggregate types it uses a separate node per field, and it models taking the address of a field by mapping variables to integers and using integer addition. The aim is to determine pointer targets without running the program.
Presentation Transcript
Efficient Field-Sensitive Pointer Analysis for C
David J. Pearce, Paul H.J. Kelly and Chris Hankin
Imperial College, London, UK
d.pearce@doc.ic.ac.uk
www.doc.ic.ac.uk/~djp1/
What is Pointer Analysis?
• Determine pointer targets without running program
• What is flow-insensitive pointer analysis?
• One solution for all statements, so precision lost
• This is a trade-off for efficiency over precision
• This work considers flow-insensitive pointer analysis only

    int a,b,*p,*q = NULL;
    p = &a;
    if(…) q = p;      // p → {a,b}, q → {a,NULL}
    p = &b;
Pointer analysis via set-constraints
• Generate set-constraints from program and solve them
• Use constraint graph for efficient solving

    int a,b,c,*p,*q,*r;
    p = &a;            // p ⊇ { a }
    r = &b;            // r ⊇ { b }
    q = &c;            // q ⊇ { c }
    if(...) q = p;     // q ⊇ p
    else q = r;        // q ⊇ r

    (program)          (constraints)

Constraint graph: nodes p, q and r, with edges p → q and r → q. Initially p = {a}, r = {b} and q = {c}; after propagation q = {a,b,c}.
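The propagation on this slide can be sketched in a few lines of C. This is a minimal illustration rather than the authors' solver: the `Graph` type, the function names, the variable numbering and the use of 32-bit bitsets are all assumptions made for the example, and the naive fixed-point loop stands in for whatever worklist strategy a real implementation would use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numbering of the slide's variables. */
enum { A, B, C, P, Q, R, NVARS };

typedef struct {
    uint32_t sol[NVARS];   /* sol[v]: bitset of variables v may point to   */
    uint32_t succ[NVARS];  /* succ[v] bit w set: edge v -> w, i.e. w ⊇ v   */
} Graph;

/* Base constraint  v ⊇ { target }. */
void add_base(Graph *g, int v, int target) {
    assert(v >= 0 && v < NVARS && target >= 0 && target < NVARS);
    g->sol[v] |= 1u << target;
}

/* Simple constraint  dst ⊇ src, stored as the edge src -> dst. */
void add_edge(Graph *g, int src, int dst) {
    assert(src >= 0 && src < NVARS && dst >= 0 && dst < NVARS);
    g->succ[src] |= 1u << dst;
}

/* Propagate solutions along edges until nothing changes. */
void solve(Graph *g) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int v = 0; v < NVARS; v++)
            for (int w = 0; w < NVARS; w++)
                if ((g->succ[v] >> w) & 1u) {
                    uint32_t merged = g->sol[w] | g->sol[v];
                    if (merged != g->sol[w]) {
                        g->sol[w] = merged;
                        changed = 1;
                    }
                }
    }
}

/* Build and solve the constraints of the slide's program. */
Graph slide_example(void) {
    Graph g = {{0}, {0}};
    add_base(&g, P, A);   /* p = &a;  p ⊇ { a } */
    add_base(&g, R, B);   /* r = &b;  r ⊇ { b } */
    add_base(&g, Q, C);   /* q = &c;  q ⊇ { c } */
    add_edge(&g, P, Q);   /* q = p;   q ⊇ p     */
    add_edge(&g, R, Q);   /* q = r;   q ⊇ r     */
    solve(&g);
    return g;
}
```

Calling `slide_example()` leaves the bits for a, b and c set in `sol[Q]`, matching the final state of the constraint graph, while `sol[P]` and `sol[R]` stay unchanged.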
Field-Sensitivity
• How to deal with aggregate types?
• Standard approach treats them as single variables

    typedef struct { int *f1; int *f2; } t1;
    int a,b,*p,*q,*r; t1 x;
    p = &a;        // p ⊇ { a }
    q = &b;        // q ⊇ { b }
    x.f1 = p;      // x ⊇ p
    x.f2 = q;      // x ⊇ q
    r = x.f1;      // r ⊇ x

Constraint graph: edges p → x, q → x and x → r. Initially p = {a}, q = {b}, and x and r are empty; after solving x = {a,b} and r = {a,b}, so r imprecisely includes b.
Field-Sensitivity – A simple solution
• Use a separate node per field for each aggregate
• Node “x” split in two

    typedef struct { int *f1; int *f2; } t1;
    int a,b,*p,*q,*r; t1 x;
    p = &a;        // p ⊇ { a }
    q = &b;        // q ⊇ { b }
    x.f1 = p;      // xf1 ⊇ p
    x.f2 = q;      // xf2 ⊇ q
    r = x.f1;      // r ⊇ xf1

Constraint graph: edges p → xf1, q → xf2 and xf1 → r. Initially p = {a}, q = {b}, and xf1, xf2 and r are empty; after solving xf1 = {a}, xf2 = {b} and r = {a}.
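The difference between the two treatments of `x` can be checked with a small sketch. The function `points_to_r`, the `Edge` type and the node numbering are hypothetical names introduced for illustration; the field-insensitive case is modelled here simply by letting both fields share one node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbering; XF1 doubles as the single node "x"
   when the analysis is field-insensitive. */
enum { A, B, P, Q, R, XF1, XF2, NVARS };

/* One simple constraint  dst ⊇ src. */
typedef struct { int dst, src; } Edge;

/* Solve the slide's program and return the points-to bitset of r. */
uint32_t points_to_r(int field_sensitive) {
    assert(field_sensitive == 0 || field_sensitive == 1);
    Edge edges[3] = {
        { XF1, P },    /* x.f1 = p; */
        { XF2, Q },    /* x.f2 = q; */
        { R,   XF1 },  /* r = x.f1; */
    };
    if (!field_sensitive)
        edges[1].dst = XF1;          /* both fields collapse into one node */

    uint32_t sol[NVARS] = {0};
    sol[P] = 1u << A;                /* p = &a; */
    sol[Q] = 1u << B;                /* q = &b; */

    int changed = 1;
    while (changed) {                /* iterate to a fixed point */
        changed = 0;
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++) {
            uint32_t merged = sol[edges[i].dst] | sol[edges[i].src];
            if (merged != sol[edges[i].dst]) {
                sol[edges[i].dst] = merged;
                changed = 1;
            }
        }
    }
    return sol[R];
}
```

With `field_sensitive` set to 0 the result contains both a and b; with 1 it contains only a, which is the precision gain the slide illustrates.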
Problem – can take address of field in C

    typedef struct { int *f1; int *f2; } t1;
    int **p; t1 x,*s;
    s = &x;            // s ⊇ { x }
    p = &(s->f2);      // p ⊇ (*s) || f2
                       // p ⊇ { x } || f2
                       // p ⊇ { xf2 }

• System thus far has no mechanism for this
• First idea – use string concatenation operator ||
• Works well for this example

Constraint graph: nodes xf1 and xf2 (solutions shown as {..}).
Problem – compatible types

    typedef struct { int *f1; int *f2; } t1;
    typedef struct { int *f3; int *f4; } t2;
    int **p; t1 *s; t2 x;
    s = (t1*) &x;      // s ⊇ { x }
    p = &(s->f2);      // p ⊇ (*s) || f2
                       // p ⊇ { x } || f2
                       // p ⊇ { xf2 }

• First idea – use string concatenation operator ||
• Casting identical types except for field names
• Derivation same as before, but node xf2 no longer exists!

Constraint graph: nodes xf3 and xf4 (solutions shown as {..}).
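A short sketch shows why name concatenation breaks down once casts are involved. It assumes, purely for illustration, that field nodes are identified by strings such as "xf2"; the function `concat_lookup` is a hypothetical helper, not part of the presented system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the index of the node named base||field,
   or -1 if the constraint graph has no such node. */
int concat_lookup(const char *const nodes[], int n,
                  const char *base, const char *field) {
    char name[64];
    assert(strlen(base) + strlen(field) < sizeof name);
    snprintf(name, sizeof name, "%s%s", base, field);
    for (int i = 0; i < n; i++)
        if (strcmp(nodes[i], name) == 0)
            return i;
    return -1;
}
```

When `x` has type `t1`, its nodes are "xf1" and "xf2" and looking up base "x" with field "f2" succeeds. When `x` has type `t2`, its nodes are "xf3" and "xf4", so the same lookup through the `t1*` pointer returns -1 even though the access is valid C.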
Field-Sensitivity – Our Solution

    typedef struct { int *f1; int *f2; } t1;
    typedef struct { int *f3; int *f4; } t2;
    int **p; t1 *s; t2 x;
    s = (t1*) &x;      // s ⊇ { xf3 }
                       // s ⊇ { 2 }
    p = &(s->f2);      // p ⊇ s + 1
                       // p ⊇ { 2 } + 1
                       // p ⊇ { 3 }

• Our solution – map variables to integers
• Solution sets become integer sets
• Use integer addition to model taking address of field
• Address of aggregate modelled by address of its first field

Constraint graph: nodes p, s, xf3 and xf4, numbered 0 to 3; the derivation uses xf3 = 2 and xf4 = 3.
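The integer mapping can be sketched with bitsets, where adding a field offset to every member of a solution set becomes a left shift. The numbering of `p` and `s` as 0 and 1 is an assumption (the slide only fixes xf3 = 2 and xf4 = 3), and `apply_offset` and `points_to_p` are hypothetical names. The sketch does not check that an offset stays inside the aggregate, which a real implementation would presumably need to do.

```c
#include <assert.h>
#include <stdint.h>

/* xf3 = 2 and xf4 = 3 follow the slide; P = 0 and S = 1 are assumed. */
enum { P = 0, S = 1, XF3 = 2, XF4 = 3, NVARS = 4 };

/* Offset constraint  dst ⊇ src + k: every target index in sol[src]
   is increased by k, which for a bitset is a left shift by k. */
void apply_offset(uint32_t sol[], int dst, int src, unsigned k) {
    assert(k < 32);
    sol[dst] |= sol[src] << k;
}

/* The slide's program:  s = (t1*) &x;  p = &(s->f2);  */
uint32_t points_to_p(void) {
    uint32_t sol[NVARS] = {0};
    sol[S] |= 1u << XF3;          /* s ⊇ { xf3 } = { 2 }: address of x is
                                     the address of its first field      */
    apply_offset(sol, P, S, 1);   /* p ⊇ s + 1: f2 is field index 1 of t1 */
    return sol[P];
}
```

The result has only bit 3 set, that is p ⊇ { xf4 }. Field names never enter the computation, so the cast from `t2*` to `t1*` causes no trouble.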
Conclusion
• Field-sensitive Pointer Analysis
• Presented new technique for C language
• Elegantly copes with language features
  • Taking address of field
  • Compatible types and casting
• Technique also handles function pointers without modification
• Experimental evaluation over 7 common C programs
• Considerable improvements in precision obtained
• But, much higher solving times
• And, relative gains appear to diminish with larger benchmarks
Constraint Graphs (continued)
• What about statements involving a pointer dereference?
• Cannot be represented in the constraint graph
• Instead, add edges as solution of q becomes known
• Thus, computation similar to dynamic transitive closure

    int a,*r,*s,**p,**q;
    p = &r;        // p ⊇ { r }
    s = &a;        // s ⊇ { a }
    q = p;         // q ⊇ p
    *q = s;        // *q ⊇ s
                   // r ⊇ s

    (program)      (constraints)

Constraint graph: edge p → q. Initially p = {r}, s = {a}, and q and r are empty. Propagation gives q = {r}; the constraint *q ⊇ s then adds the edge s → r, which gives r = {a}.
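Adding edges as the solution of `q` becomes known can be sketched as follows. As before, the `Graph` layout, the `store` array for constraints of the form `*v ⊇ w`, and the naive fixed-point loop are assumptions chosen to keep the example short; they are not taken from the presentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbering of the slide's variables. */
enum { A, P, Q, R, S, NVARS };

typedef struct {
    uint32_t sol[NVARS];    /* points-to bitsets                            */
    uint32_t succ[NVARS];   /* succ[v] bit w set: edge v -> w, i.e. w ⊇ v   */
    uint32_t store[NVARS];  /* store[v] bit w set: complex constraint *v ⊇ w */
} Graph;

void solve(Graph *g) {
    assert(g != NULL);
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int v = 0; v < NVARS; v++) {
            /* *v ⊇ w: for every known target t of v, add the edge w -> t.
               Since sol[v] is a bitset of targets, this is a single OR. */
            for (int w = 0; w < NVARS; w++)
                if ((g->store[v] >> w) & 1u) {
                    uint32_t with_new = g->succ[w] | g->sol[v];
                    if (with_new != g->succ[w]) {
                        g->succ[w] = with_new;
                        changed = 1;
                    }
                }
            /* Propagate sol[v] along the edges v -> w. */
            for (int w = 0; w < NVARS; w++)
                if ((g->succ[v] >> w) & 1u) {
                    uint32_t merged = g->sol[w] | g->sol[v];
                    if (merged != g->sol[w]) {
                        g->sol[w] = merged;
                        changed = 1;
                    }
                }
        }
    }
}

/* Build and solve the constraints of the slide's program. */
Graph slide_example(void) {
    Graph g = {{0}, {0}, {0}};
    g.sol[P]   |= 1u << R;   /* p = &r;   p ⊇ { r } */
    g.sol[S]   |= 1u << A;   /* s = &a;   s ⊇ { a } */
    g.succ[P]  |= 1u << Q;   /* q = p;    q ⊇ p     */
    g.store[Q] |= 1u << S;   /* *q = s;   *q ⊇ s    */
    solve(&g);
    return g;
}
```

After solving, `sol[Q]` contains r, the edge s → r has been added to `succ[S]`, and `sol[R]` contains a, which matches the final state on the slide.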