1 / 40

Client-Driven Pointer Analysis

T H E U N I V E R S I T Y O F T E X A S. A T A U S T I N. Client-Driven Pointer Analysis. Samuel Z. Guyer Calvin Lin June 2003. Output. Errors. CIFI. Context & Flow Insensitive. CIFS. Flow Sensitive. CSFS. Context & Flow Sensitive. Using Pointer Analysis.

edan
Télécharger la présentation

Client-Driven Pointer Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. T H E U N I V E R S I T Y O F T E X A S A T A U S T I N Client-Driven Pointer Analysis Samuel Z. Guyer Calvin Lin June 2003

  2. Output Errors CIFI Context & Flow Insensitive CIFS Flow Sensitive CSFS Context & Flow Sensitive Using Pointer Analysis • Pointer analysis: not a stand-alone analysis Supports other client analyses • Current practice: • Client analysis – we’ll focus on error detection • Pointer analysis algorithm – choose precision Pointer Analyzer Client Analysis Error Detector Memory Model

  3. CIFS CSFS CIFI Fast analysis; 85 possible errors 25X slower; 85 possible errors Out of memory; No results Motivation • Real-life scenario: Check for security vulnerabilities in BlackHole mail filter • Manually inspect reported errors • One thing in common: a string processing routine • Clone procedure = ad hoc context sensitivity • Using CIFI, all 85 false positives go away Can we automate this process? Pointer Analyzer Error Detector Memory Model

  4. Our solution • Problems • Cost-benefit tradeoff – severe for pointer analysis • Precision choices are too coarse • Choice is made a priori by the compiler writer • Solution: Mixed precision analysis • Apply higher precision where it’s needed • Use cheap analysis elsewhere Key: Let the needs of client drive precision Customized precision policy created during analysis

  5. Custom Policy CIFI Error Reports Information Loss Dependence Graph Monitor Adaptor Client-Driven Pointer Analysis • Algorithm: • Start with fast cheap analysis: FI and CI • Monitor: how imprecision causes information loss • Adapt: Reanalyze with a customized precision policy Pointer Analyzer Client Analysis Memory Model

  6. Overview • Motivation • Our algorithm Automatically discover what the client needs • Experiments Real programs and challenging error detection problems • Related work and conclusions

  7. main stdin ? ? ! Lattice ^ execl socket error no error execl maybe read ^ False Positives Remote access vulnerability • Example: sock = socket(AF_INET, SOCK_STREAM, 0); read(sock, buffer, 100); execl(buffer); !

  8. Custom Policy CIFI Error Reports Information Loss Dependence Graph Monitor Adaptor Client-Driven Pointer Analysis Pointer Analyzer Client Analysis Memory Model

  9. Analysis framework • Iterative dataflow analysis • Pointer analysis: flow values are points-to sets • Client analysis: flow values form typestate lattice • Fine-grained precision policies • Context sensitivity: per procedure • CS: Clone or inline procedure invocation • CI: Merge values from all call sites • Flow sensitivity: per memory location • FS: Build factored use-def chains • FI: Merge all assignments into a single flow value

  10. Custom Policy CIFI Error Reports Information Loss Dependence Graph Monitor Adaptor Client-Driven Pointer Analysis Pointer Analyzer Client Analysis Memory Model

  11. Monitor • Runs alongside main analysis • Monitors information loss • Detects polluting assignments Merge two accurate flow values  ambiguous value • Tracks complicit assignments Passing an ambiguous value from one variable to another • Records in a dependence graph For both pointer and client analyses

  12. Context insensitive Flow insensitive x = Complicit foo( ) foo( ) x = y; x = x foo(param ) x y x FS: x param CS: foo x Dependence graph (I) • Polluting assignment Add a node for the variable – annotate with a diagnosis • Complicit assignment Add an edge from left side back to right side

  13. Pointer dereference x =(*ptr) x Polluted target Polluted pointer ptr ptr ptr y y x x Dependence graph (II) • Indirect assignments or

  14. Precision policy CS:foo CS: foo CS: bar FS: x FS: ptr ? FS:ptr FS:x CS:bar Adaptor • After analysis... • Start at the “maybe error” variables • Find all reachable nodes – collect the diagnoses Often a small subset of all imprecision DependenceGraph

  15. main ? ? stdin execl socket execl read read read In action... • Monitor analysis • Polluting assignments • Diagnose and apply “fix” • In this case: one procedure context-sensitive • Reanalyze !

  16. Programs • 18 real C programs • Unmodified source – all the issues of production code • Many are system tools – run in privileged mode • Representative examples:

  17. Methodology • 5 typestate error checkers: • Represent non-trivial program properties • Stress the pointer analyzer • Compare client-driven with fixed-precision • Goals: • First, reduce number of errors reported Conservative analysis – fewer is better • Second, reduce analysis time

  18. CS-FS CI-FS CS-FI CI-FI Client-Driven ? ? ? ? ? ? ? ? ? ? ? ? 1000X 0 100X 29 26 15 7 85 0 Normalized analysis time 31 5 0 0 41 1 0 28 89 0 0 7 0 5 Increasing number of CFG nodes 1 41 6 4 7 15 26 0 0 1 18 88 0 0 6 0 26 0 0 7 15 4 18 nn fcron make named SQLite muh (I) pfinger pureftp apache stunnel privoxy muh (II) cfengine wu-ftp (I) wu-ftp (II) blackhole ssh client ssh server 0 7 0 0 29 0 6 0 85 28 2 0 31 4 5 93 0 41 Results Remote access vulnerability 10X

  19. Why it works • Notice: • Different clients have different precision requirements • Amount of extra precision is small

  20. Related work • Pointer analysis and typestate error checking • Iterative flow analysis [Plevyak & Chien ‘94] • Demand-driven pointer analysis [Heintze & Tardieu ’01] • Combined pointer analysis [Zhang, Ryder, Landi ’98] • Effects of pointer analysis precision [Hind ’01 & others] • More precision is more costly • Does it help? Is it worth the cost?

  21. Conclusions • Client-driven pointer analysis • Precision should match the client and program Not all pointers are equal • Need fine-grained precision policies Key: knowing where to add more and what kind • Roadmap for scalability Use more expensive analysis on small parts of progams

  22. Thank You

  23. Time

  24. Precision policies

  25. Thank You

  26. Error detection problems • Remote access vulnerabillity: • File access: • Format string vulnerability (FSV): • Remote FSV: • FTP behavior: Data from an Internet socket should not specify a program to execute Files must be open when accessed Format string may not contain untrusted data Check if FSV is remotely exploitable Can this program be tricked into reading and transmitting arbitrary files

  27. Annotations (I) • Dependence and pointer information • Describe pointer structures • Indicate which objects are accessed and modified procedure fopen(pathname, mode) { on_entry { pathname --> path_string mode --> mode_string } access { path_string, mode_string } on_exit { return --> new file_stream } }

  28. Annotations (II) • Library-specific properties Dataflow lattices property State : { Open, Closed} initially Open property Kind : { File, Socket { Local, Remote } } ^ ^ Remote Local Closed Open File Socket ^ ^

  29. Annotations (III) • Library routine effects Dataflow transfer functions procedure socket(domain, type, protocol) { analyze Kind { if (domain == AF_UNIX) IOHandle <- Local if (domain == AF_INET) IOHandle <- Remote } analyze State { IOHandle <- Open } on_exit { return --> new IOHandle } }

  30. Annotations (IV) • Reports and transformations procedure execl(path, args) { on_entry { path --> path_string } reportif (Kind : path_string could-be Remote) “Error at “ ++ $callsite ++ “: remote access”; } procedure slow_routine(first, second) { when (condition) replace-with%{ quick_check($first); fast_routine($first, $second); }% }

  31. Flow values Types Transfer functions Inference rules Type Theory • Equivalent to dataflow analysis (heresy?) • Different in practice • Dataflow: flow-sensitive problems, iterative analysis • Types: flow-insensitive problems, constraint solver • Commonality • No magic bullet: same cost for the same precision • Extracting the store model is a primary concern Remember Phil Wadler’s talk?

  32. Is it correct? Three separate questions: • Are Sam Guyer’s experiments correct? • Yes, to the best of our knowledge • Checked PLAPACK results • Checked detected errors against known errors • Is our compiler implemented correctly? • Flip answer: who’s is? • Better answer: testing suites • How do we validate a set of annotations?

  33. Annotation correctness Not addressed in my dissertation, but... • Theoretical approach • Does the library implement the domain? • Formally verify annotations against implementation • Practical approach • Annotation debugger: interactive • Automated assistance in early stages of development • Middle approach • Basic consistency checks

  34. Optimistic False positives allowed It can even be unsound Tend to be “may” analyses Correctness is absolute “Black and white” Certify programs bug-free Cost tolerant Explore costly analysis Pessimistic Must preserve semantics Soundness mandatory Tend to be “must” analyses Performance is relative Spectrum of results No guarantees Cost sensitive Compile-time is a factor Error Checking vs Optimization

  35. Complexity • Pointer analysis • Address taken: linear • Steensgaard: almost linear (log log n factor) • Anderson: polynomial (cubic) • Shape analysis: double exponential • Dataflow analysis • Intraprocedural: polynomial (height of lattice) • Context-sensitivity: exponential (call graph) • Rarely see worst-case

  36. Find the error – part 3 • State-of-the-art compiler struct __sue_23 * var_72; struct __sue_25 * new_f = (struct __sue_25 *) malloc(sizeof (struct __sue_25)); _IO_no_init(& new_f->fp.file, 1, 0, ((void *) 0), ((void *) 0)); (& new_f->fp)->vtable = & _IO_file_jumps; _IO_file_init(& new_f->fp); if (_IO_file_fopen((struct __sue_23 *) new_f, filename, mode, is32) != ((void *) 0)) {   var_72 = & new_f->fp.file;   if ((var_72->_flags2 & 1) && (var_72->_flags & 8)) {     if (var_72->_mode <= 0) ((struct __sue_23 *) var_72)->vtable = & _IO_file_jumps_maybe_mmap;     else ((struct __sue_23 *) var_72)->vtable = & _IO_wfile_jumps_maybe_mmap;     var_72->_wide_data->_wide_vtable = & _IO_wfile_jumps_maybe_mmap;   } } if (var_72->_flags & 8192U) _IO_un_link((struct __sue_23 *) var_72); if (var_72->_flags & 8192U) status = _IO_file_close_it(var_72);   else status = var_72->_flags & 32U ? - 1 : 0; ((* (struct _IO_jump_t * *) ((void *) (& ((struct __sue_23 *) (var_72))->vtable) +                              (var_72)->_vtable_offset))->__finish)(var_72, 0); if (var_72->_mode <= 0)   if (((var_72)->_IO_save_base != ((void *) 0))) _IO_free_backup_area(var_72); if (var_72 != ((struct __sue_23 *) (& _IO_2_1_stdin_)) && var_72 != ((struct __sue_23 *) (& _IO_2_1_stdout_)) &&     var_72 != ((struct __sue_23 *) (& _IO_2_1_stderr_))) { var_72->_flags = 0;   free(var_72); } bytes_read = _IO_sgetn(var_72, (char *) var_81, bytes_requested);

  37. main Challenge 2: Scope • Call graph: • Objects flow throughout program • No scoping constraints • Objects referenced through pointers We need whole-program analysis ! sock = (AF_INET, SOCK_STREAM, 0); (sock, buffer, 100); (ref); socket read execl

  38. Application Broadway Source code Error reports Analyzer Library-specific messages Library Annotations Header files Source code Optimizer Application+Library Integrated source code The Broadway Compiler • Broadway – source-to-source C compiler Domain-independent compiler mechanisms • Annotations – lightweight specification language Domain-specific analyses and transformations Many libraries, one compiler

  39. Security vulnerabilities • How does remote hacking work? • Most are not direct attacks (e.g., cracking passwords) • Idea: trick a program into unintended behavior • Automated vulnerability detection: • How do we define “intended”? • Difficult to formalize and check application logic Libraries control all critical system services • Communication, file access, process control • Analyze routines to approximate vulnerability

  40. Conditions if(cond) x = x = = f( , ) End backup slides

More Related