
Static Path-Aware Analysis of Program Invariants

Murali Krishna Ramanathan, Department of Computer Science, Purdue University (joint work with Suresh Jagannathan and Ananth Grama)


Presentation Transcript


  1. Static Path-Aware Analysis of Program Invariants Murali Krishna Ramanathan Department of Computer Science Purdue University (joint work with Suresh Jagannathan and Ananth Grama)

  2. Motivation (figure labels: Undocumented Program, Expert Programmer, New Programmer asking "How do I use this?", BUGS, Tester)

  3. Context • What is a program invariant? • Property that must hold across all program executions • What is a failure? • Program run does not satisfy an expected invariant • System crashes • Logical bugs • Performance bugs • What is a specification? • Documentation of intended program invariants • e.g., lock must be followed by unlock • Unavailable or imprecise

  4. Issues • Deriving specifications • Where do we start? • Absence of formal documentation • Legacy code • Identifying the source of failures • How do we search? • Exponential number of execution paths to explore • Representing common information among paths

  5. Specification Inference • Challenges • What to look for? • Both relevant and irrelevant information present in the program source • How to be robust in the presence of bugs? • Assumptions • Programs are mostly well tested but can have bugs • Transparent – no programmer annotations

  6. Kinds of specifications • Control-flow preconditions • A call to fopen must always precede a call to fgets • Data-flow preconditions • The result of a call to socket must always be checked for error before a call to bind • Control-flow postconditions • A call to fopen is either followed by a call to fclose or error • Control-flow divergence preconditions • A call to read can be preceded either by a call to open or socket • …

  7. Preconditions
     fp = fopen(…);
     if (fp != NULL)
         fgets(buf, SIZE, fp);
     • Predicate • Captures properties associated with variables and procedure calls
     • Preconditions for a procedure • Composed of predicates that must always hold before every call to the procedure
     • Predicates in the example: fp := fopen(…), fp != NULL, fopen <- fgets

  8. Types of predicates
     fp = fopen(…);
     if (fp != NULL)
         fgets(buf, SIZE, fp);
     • Data-flow • Capture data-flow properties associated with variables • fp is assigned the return value of fopen; fp is not null
     • Control-flow • Define precedence properties among procedures • fgets is preceded by fopen

  9. Control-flow preconditions (ICSE 07)
     RI_FKey_check(PG_FUNCTION_ARGS)
     {
         /* "Check that RI trigger function was called in expected context" */
         ri_CheckTrigger(...);
         /* "Get the relation descriptors of the FK and PK tables…" */
         pk_rel = heap_open(...);
         /* "Convert the MATCH TYPE string into a switchable int" */
         match_type = ri_DetermineMatchType(...);
         /* "Build up a new hashtable key for a prepared SPI Plan of a constraint trigger of MATCH FULL …" */
         ri_BuildQueryKeyFull(...);
     }

  10. Control-flow preconditions
     RI_FKey_check(PG_FUNCTION_ARGS)
     {
         ri_CheckTrigger(...);
         pk_rel = heap_open(...);
         if (TRIGGER_FIRED_BY_UPDATE(...))
             ...
         else
             ...
         if (!HeapTupleSatisfies(...))
             ...
         match_type = ri_DetermineMatchType(...);
         if (match_type == RI_MATCH_TYPE_PARTIAL)
             ereport(...);
         ri_BuildQueryKeyFull(...);
     }

  11. Control-flow preconditions
     RI_FKey_check(PG_FUNCTION_ARGS)
     {
         ri_CheckTrigger(...);
         pk_rel = heap_open(...);
         if (tgnargs == 4)
         {
             ri_BuildQueryKeyFull(...);
         }
     }
     • ri_BuildQueryKeyFull is not preceded by ri_DetermineMatchType • Leads to a potential crash

  12. Static Specification Mining • To generate preconditions for a procedure • Generate predicates at each call-site of the procedure • Ideally common predicates across all the call-sites form the preconditions for the procedure • How to find common predicates? • Use mining techniques • Construct patterns built from alignments or permutations of predicate sets • Approximation: Patterns appearing in programs denote preconditions

  13. Approach • Analyze the control-flow graph • Build the precedence relation (a <- b): • A binary relation between procedures a and b • A call to b is always preceded by a call to a • Necessitates an inter-procedural analysis • Relations can cross procedure boundaries • Convergence requires a fixpoint calculation • Procedure signatures • Frequent subsequence mining • Mine the chains formed by precedence relations

  14. Path Exploration (figure: control-flow graph over nodes p, q, r) • Path-Sensitive Exploration: q <- p, q <- r <- p • Path-Insensitive Exploration: q, r <- p • Path-Aware Exploration: q <- p

  15. Precedes relation (figure: two control-flow graphs over nodes q, r, t, p and exit, each annotated with q <- p)

  16. Inter-procedural Analysis
     h() {
         if (cond)
             lwrap();
         else
             lwrap();
         …
         uwrap();
     }
     lwrap() { init(); }
     uwrap() { access(); }

  17. Procedure Signatures (figure: control-flow graph of procedure s from entry to ret, over nodes p, q, r, t, u) • s <- t • s <- q <- p <- t • Procedure signature for s: q <- p

  18. Mining sequences • Sequence mining: • Input: set of sequences (I) • Output: sequences that occur ‘frequently’ as subsequences in I • Use the Apriori-all algorithm [Agrawal and Srikant, Mining Sequential Patterns, ICDE ’95]

  19. Motivation for sequence mining • Control paths: • a, b, c, e • g, a, d, c, e • a, c, e • a, c, d, e, f • e, f, d, a (faulty path: no call to a or c before e) • Invariant: a <- c <- e • Intersection of these paths • e is preceded by nothing • Use mining to overcome brittleness of path intersection

  20. Sequence Mining - Example • Input sequences (min frequency: 4/5): • a, b, c, e • g, a, d, c, e • a, c, e • a, c, d, e, f • e, f, d, a • Maximal frequent subsequence: a, c, e

  21. Data-flow preconditions (PLDI 07) • Challenges • Data-flow predicates may be aliased • No anchors for data-flow predicates • Code fragments shown on the slide:
     if (x > 0) f(x);  if (y > 0) f(y);
     x = g(…);  h(x);  if (x > 0) f(x);

  22. Motivating Example
     main(…)
     {
         for (ai = options.listen_addrs; …) {
             listen_sock = socket(ai->ai_family, …);
             if (listen_sock < 0) error();
             if (num_listen_socks >= 16) error();
             if ((ret = getnameinfo(…))) …
             if (setsockopt(listen_sock, …) == -1) error();
             if (bind(listen_sock, ai->ai_addr, …) < 0) …
         }
     }
     • In a call to bind, the first parameter is always assigned the return value of a call to socket and is checked for error

  23. Generate Predicates (same code as slide 22) • Predicates generated per variable:
     listen_sock: return(socket), (>=, 0), (param_1, setsockopt), (param_1, bind)
     num_listen_socks: (<, 16)
     ret: return(getnameinfo)

  24. Another call-site
     ssh_control_listener(void)
     {
         if ((control_fd = socket(PF_UNIX, …)) < 0) error();
         old_umask = umask(0177);
         if (bind(control_fd, (struct sockaddr *)&addr, …)) …
     }
     • Predicates generated per variable:
     control_fd: return(socket), (>=, 0), (param_1, bind)
     old_umask: return(umask)

  25. Structural Similarity Problem
     Call-site 1: listen_sock: return(socket), (>=, 0), (param_1, setsockopt), (param_1, bind); num_listen_socks: (<, 16); ret: return(getnameinfo)
     Call-site 2: control_fd: return(socket), (>=, 0), (param_1, bind); old_umask: return(umask)
     • How to group the attribute sets that need to be mined together? • Find maximal matching of attribute sets • NP-hard • Use approximations based on program structures

  26. Approximations • Type • attribute sets divided based on type of variable • Parameter • Supplied as arguments to the same parameter for any given procedure • Result • Variables that are assigned the return values of the same function • …

  27. Example revisited
     Call-site 1: listen_sock: return(socket), (>=, 0), (param_1, setsockopt), (param_1, bind); num_listen_socks: (<, 16); ret: return(getnameinfo)
     Call-site 2: control_fd: return(socket), (>=, 0), (param_1, bind); old_umask: return(umask)
     • Variable names are not comparable • Use positional information • Different number of attributes • Interspersed with irrelevant operations

  28. Is intersection robust? • Same limitations as with control-flow preconditions • Adopt frequent itemset mining • Order of events is less critical • Aggregate collection of data-flow facts at call-sites
     sockfd: return(socket), (param_1, bind) [(>=, 0) missing!]
     listen_sock: return(socket), (param_1, setsockopt), (>=, 0), (param_1, bind)
     control_fd: return(socket), (>=, 0), (param_1, bind)
     Precondition: return(socket), (>=, 0), (param_1, bind)

  29. Locality
     main() {
         fp = init_file(…);
         fgets(buf, SIZE, fp);
     }
     init_file(…) {
         fp = fopen(…);
         if (fp != NULL) return fp;
         exit(-1);
     }

     main() {
         fp = fopen(…);
         if (fp != NULL) read_file(fp);
     }
     read_file(FILE *fp) {
         …
         fgets(buf, SIZE, fp);
         …
     }
     • Interprocedural analysis to capture preconditions that cross procedure boundaries

  30. Example (figure: graph over procedures q, r, s, t annotated with predicate sets p1 and p2; legend: intraprocedural edge, interprocedural edge)

  31. Experiments • Applied on open source C programs • Input to the implementation: control flow graphs • Control flow nodes varied from 16K to 958K • Roughly 2M LoC • Procedure count varied from 298 to 8568 • Precondition predicates varied from 189 to 5963 • Analysis time varied from 26s to 20m

  32. Experimental Goals • Path awareness improves precision • Useful for bug detection • Generates salient documentation

  33. Effectiveness of path awareness • Fewer protocols generated using our approach • Reduction not at the expense of increase in false negatives • Reduces false positives

  34. Bug Detection: OpenSSH • Procedure prime_test in openssh-4.4p1 • Testing is difficult as it performs Miller-Rabin primality testing • Program crashes due to the absence of an error check • e.g., BN_mod_word(p, …): if p is null, the program crashes • Fixed in openssh-4.5p1 • Error check not always necessary • e.g., BN_is_prime(…, ctx, …): ctx can be either null or pre-allocated

  35. Bug detection • Case Study: Linux • Hardware Bug • Difficult to detect using traditional testing techniques • Platform dependent error • Transparently identified using our approach • Performance Bug • Cache lookup operation was absent • Not easily specified as a bug for testing • Deviation delays data write flushes • Difficult to identify using traditional testing techniques

  36. Change in Confidence • Increase in confidence reduces the number of predicates

  37. Related Work • Static techniques • Inferring Specifications from Within, Kremenek et al, OSDI 06 • Bugs as deviant behavior, Engler et al, SOSP 01 • … • Dynamic techniques • Strauss, Ammons et al, POPL 02 • Daikon, Ernst et al, TSE 01 • … • Our approach • Path-aware analysis • Generates preconditions • Predicates of arbitrary size • Annotation free

  38. Future Work • Richer specifications • Post-conditions, divergence structures, … • More sophisticated mining techniques • Graph mining, … • Validating generated specifications • Integration with theorem prover • Specifications and concurrency • Atomicity violations

  39. Other work • Dynamic analysis • Detecting cause of assertion failures (under review) • Static path profiles (under review) • Impact analysis – ASE 06 • Memory aliasing – FASE 06 • Test case prioritization – SAC 08 • Distributed Systems • Randomized leader election (Distributed Computing 07) • Eliminating duplicates in P2P systems (TPDS 07) • Search in P2P systems (P2P 05) • Efficient tag detection in RFID systems (SECON 05)

  40. Why not mine post-conditions? fp = fopen(…); if(fp == NULL) exit(-1); fclose(…); • Precedence protocol: • A call to fclose is always preceded by a call to fopen • Successor protocol: • A call to fopen is always succeeded by a call to fclose

  41. Why is parameter tracing insufficient?
     uldap_connection_find(…) {   // code fragment from httpd
         if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock)) {
             …
             compare_client_certs(st->client_certs, l->client_certs)
             …
         }
     }
     • In a call to compare_client_certs, the return value of a call to apr_thread_mutex_trylock must be APR_SUCCESS. • The predicate for compare_client_certs includes • "return value of apr_thread_mutex_trylock(…) is APR_SUCCESS"

  42. Predicate size distribution • Majority of predicates less than 3
