
Mining Specifications: (lots of) code → specifications of correctness


Presentation Transcript


  1. Mining Specifications: (lots of) code → specifications of correctness
  Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)

  2. Motivation: why specifications?
  [Diagram: many programs + Specifications → verifier → Bugs!]
  Verification tools:
  • find bugs early
  • make guarantees
  • scale with programs
  • need specifications

  3. Language-usage specifications
  Easy to write, big payoff:
  • array accesses
  • memory allocation
  • type safety
  • ...

  4. Library-usage specifications
  Harder to write, smaller payoff:
  • cut-and-paste (X11)
  • network server (socket API)
  • device drivers (kernel API)
  • ...

  5. Program specifications
  Hardest to write, smallest payoff:
  • symbol table well-formed
  • IR well-formed
  • ...

  6. Solution: specification mining
  Specification mining gleans specifications from artifacts of program development:
  • from programs (static)?
  • from executions of test cases (dynamic)?
  • from other artifacts?

  7. Mining from traces
  ...
  socket(domain = 2, type = 1, proto = 0, return = 7)
  accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
  write(so = 8, buf = 0x100, len = 23, return = 23)
  read(so = 8, buf = 0x100, len = 12, return = 12)
  close(so = 8, return = 0)
  close(so = 7, return = 0)
  ...
  Advantages:
  • no infeasible paths
  • pointer/alias analysis is easy
  • few bugs, as the program passes its tests
  • common behavior is correct behavior
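
A minimal sketch of how a parsed trace event might be represented, assuming a flat attribute/value encoding; the struct and field names (trace_attr, trace_event) are illustrative, not the tracer's actual format.

/* Sketch: a parsed trace event as a call name plus attribute/value pairs. */
#include <stdio.h>

#define MAX_ATTRS 8

struct trace_attr {
    const char *name;   /* e.g., "so", "return" */
    long        value;  /* the runtime value, e.g., a descriptor */
};

struct trace_event {
    const char       *call;    /* e.g., "socket", "accept" */
    int               nattrs;
    struct trace_attr attrs[MAX_ATTRS];
};

int main(void) {
    /* accept(so = 7, return = 8) from the example trace */
    struct trace_event ev = {
        .call = "accept",
        .nattrs = 2,
        .attrs = { { "so", 7 }, { "return", 8 } }
    };
    for (int i = 0; i < ev.nattrs; i++)
        printf("%s.%s = %ld\n", ev.call, ev.attrs[i].name, ev.attrs[i].value);
    return 0;
}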

  8. Output: a specification
  [Automaton: start → socket(return = X) → accept(so = X, return = Y) → read(so = Y), write(so = Y) → close(so = Y) → close(so = X) → end]
  The specification says what programs should do:
  • temporal dependences (accept follows socket)
  • data dependences (accept's input is socket's output)

  9. How we mine specifications
  [Pipeline diagram: Traces → extract scenarios → Scenarios (dependence graphs) → standardize → Strings → PFSA learner → PFSA → postprocess → Specification]
  Traces look like "... socket(domain = 2, type = 1, proto = 0, return = 7) ..."; each standardized scenario becomes a short string of letters for the PFSA learner.

  10. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
  • Experimental results
  • Related work

  11. An impossible problem
  Find a Turing machine that generates C, given T.
  [Venn diagram: T (training traces) ⊆ C (all correct traces) ⊆ I (all traces)]
  Unsolvable:
  • no restrictions on C
  • no connection between C and T
  • simple variants are also undecidable [Gold67]

  12. A simpler problem
  Find a PFSA that generates an approximation of P, a probability distribution.
  [Plot: probability on the vertical axis, from 0 to 1; correct behavior carries high probability, noise low]

  13. A simpler problem
  Find a PFSA that generates an approximation of P, a probability distribution over all scenarios.
  [Plot: probability (0 to 1) over all scenarios; correct scenarios carry high probability, noise low]

  14. A simpler problem
  Find a PFSA that generates an approximation of P, a probability distribution over all scenarios.
  Tractable, plus:
  • scenarios are small
  • noise is handled
  • finite-state
  • weights are useful for postprocessing

  15. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
    • Verifying traces
  • Experimental results
  • Related work

  16. Dependence annotation
  [Pipeline: Traces → dependence annotator → Annotated traces]
  socket(domain = 2, type = 1, proto = 0, return = 7)
  accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
  write(so = 8, buf = 0x100, len = 23, return = 23)
  close(so = 8, return = 0)
  close(so = 7, return = 0)

  17. Dependence annotation
  [The same trace, with definer and user attributes identified]
  Definers: socket.return, accept.return, close.so
  Users: accept.so, read.so, write.so, close.so

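The annotator's job can be pictured as linking each use to the most recent earlier definer of the same runtime value. A toy sketch under that assumption; the names are illustrative, and close.so is treated only as a user for brevity even though the slide also lists it as a definer.

/* Sketch: use-def annotation by backward scan over a flattened trace. */
#include <stdio.h>

struct event { const char *call; const char *attr; long value; int is_def; };

static struct event trace[] = {
    { "socket", "return", 7, 1 },  /* definer of value 7 */
    { "accept", "so",     7, 0 },  /* user of value 7    */
    { "accept", "return", 8, 1 },  /* definer of value 8 */
    { "write",  "so",     8, 0 },
    { "close",  "so",     8, 0 },
    { "close",  "so",     7, 0 },
};

int main(void) {
    int n = (int)(sizeof trace / sizeof trace[0]);
    for (int i = 0; i < n; i++) {
        if (trace[i].is_def) continue;
        /* scan backwards for the most recent definer of this value */
        for (int j = i - 1; j >= 0; j--) {
            if (trace[j].is_def && trace[j].value == trace[i].value) {
                printf("%s.%s uses %s.%s (value %ld)\n",
                       trace[i].call, trace[i].attr,
                       trace[j].call, trace[j].attr, trace[i].value);
                break;
            }
        }
    }
    return 0;
}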

  19. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
  • Experimental results
  • Related work

  20. Extracting scenarios
  [Pipeline: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
  socket(domain = 2, type = 1, proto = 0, return = 7)
  accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
  write(so = 8, buf = 0x100, len = 23, return = 23)
  close(so = 8, return = 0)
  close(so = 7, return = 0)


  23. Simplifying scenarios
  socket(domain = 2, type = 1, proto = 0, return = 7) [seed]
  accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
  write(so = 8, buf = 0x100, len = 23, return = 23)
  close(so = 8, return = 0)
  close(so = 7, return = 0)

  24. Simplifying scenarios
  socket(return = 7) [seed]
  accept(so = 7, return = 8)
  write(so = 8)
  close(so = 8)
  close(so = 7)
  Drops attributes not used in dependences.
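
Extraction can be pictured as a reachability computation: starting from the seed, pull in every event connected to the scenario through a shared value. A simplified sketch under that assumption; the real extractor follows the annotated dependences and bounds scenario size.

/* Sketch: grow a scenario from a seed by chasing shared values. */
#include <stdio.h>

struct event { const char *text; long defines; long uses; }; /* 0 = none */

static struct event trace[] = {
    { "socket(return = 7)",         7, 0 },   /* the seed */
    { "accept(so = 7, return = 8)", 8, 7 },
    { "write(so = 8)",              0, 8 },
    { "close(so = 8)",              0, 8 },
    { "close(so = 7)",              0, 7 },
};

int main(void) {
    int n = (int)(sizeof trace / sizeof trace[0]);
    int in_scenario[8] = { 1 };   /* event 0, the seed, starts it off */
    /* iterate to a fixed point: an event joins the scenario if it
       uses a value defined by an event already in the scenario */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (in_scenario[i]) continue;
            for (int j = 0; j < n; j++)
                if (in_scenario[j] && trace[j].defines != 0 &&
                    trace[j].defines == trace[i].uses) {
                    in_scenario[i] = 1;
                    changed = 1;
                }
        }
    }
    for (int i = 0; i < n; i++)
        if (in_scenario[i]) printf("%s\n", trace[i].text);
    return 0;
}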

  25. Standardizing scenarios
  [Pipeline: Simplified scenarios → standardization → Abstract scenarios, grouping equivalent scenarios]
  Two transformations:
  • naming: foo(val = 7) → foo(val = X)
  • reordering: foo(); bar(); → bar(); foo();
  Finds the least standardized scenario, in lexicographic order.

  26. Standardizing scenarios (use-def and def-def dependences)
  socket(return = 7) [seed]
  accept(so = 7, return = 8)
  write(so = 8)
  read(so = 8)
  close(so = 8)
  close(so = 7)

  27. Standardizing scenarios: after reordering
  socket(return = 7) [seed]
  accept(so = 7, return = 8)
  read(so = 8)
  write(so = 8)
  close(so = 8)
  close(so = 7)

  28. Standardizing scenarios: after reordering and naming
  socket(return = X) [seed]
  accept(so = X, return = Y)
  read(so = Y)
  write(so = Y)
  close(so = Y)
  close(so = X)

  29. Standardizing scenarios
  socket(return = X) [seed]   → A
  accept(so = X, return = Y)  → B
  read(so = Y)                → D
  write(so = Y)               → E
  close(so = Y)               → F
  close(so = X)               → G
  Each interaction is a letter to the PFSA learner.
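
The naming transformation can be sketched as renaming concrete values to symbolic letters in order of first appearance, so that equivalent scenarios map to the same string regardless of which descriptors a particular run happened to use.

/* Sketch: canonical renaming of runtime values (7 -> X, 8 -> Y, ...). */
#include <stdio.h>

int main(void) {
    long values[] = { 7, 7, 8, 8, 8, 7 };  /* values in scenario order */
    int  n = (int)(sizeof values / sizeof values[0]);
    long seen[16];
    int  nseen = 0;
    for (int i = 0; i < n; i++) {
        int k;
        for (k = 0; k < nseen; k++)
            if (seen[k] == values[i]) break;
        if (k == nseen) seen[nseen++] = values[i];  /* first occurrence */
        printf("%ld -> %c\n", values[i], 'X' + k);
    }
    return 0;
}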

  30. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
  • Experimental results
  • Related work

  31. PFSA learning
  [Pipeline: Abstract scenarios → automaton learner → Specification]
  Algorithm due to Raman et al.:
  • build a weighted retrieval tree
  • merge similar states

  32. PFSA learning
  Algorithm due to Raman et al.:
  • build a weighted retrieval tree
  • merge similar states
  [Weighted retrieval tree over the scenario strings A–G: edges carry counts; the common path has weights 99–100, a rare noise path weight 1]

  33. PFSA learning
  Algorithm due to Raman et al.:
  • build a weighted retrieval tree
  • merge similar states
  [The tree after merging a pair of similar states; the common path still carries weight 99, the noise path weight 1]

  34. PFSA learning
  Algorithm due to Raman et al.:
  • build a weighted retrieval tree
  • merge similar states
  [The fully merged PFSA: the main path carries weights 99–100, with a single weight-1 noise transition remaining]
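
A sketch of the learner's first phase: the weighted retrieval tree (trie) over the scenario strings, where each edge counts how many scenarios crossed it. The merge phase is omitted, and the inputs (99 copies of the common string ABDEFG plus one invented noisy string ABCG) are illustrative, not real data.

/* Sketch: weighted retrieval tree over scenario strings. */
#include <stdio.h>

#define MAX_NODES 64

static int child[MAX_NODES][26];   /* child[s][c]: next state, 0 = none */
static int weight[MAX_NODES][26];  /* edge traversal counts */
static int nnodes = 1;             /* state 0 is the root */

static void add_string(const char *s) {
    int cur = 0;
    for (; *s; s++) {
        int c = *s - 'A';
        if (child[cur][c] == 0)
            child[cur][c] = nnodes++;  /* fresh state for an unseen suffix */
        weight[cur][c]++;
        cur = child[cur][c];
    }
}

int main(void) {
    for (int i = 0; i < 99; i++) add_string("ABDEFG");  /* common behavior */
    add_string("ABCG");                                 /* one noisy run  */
    for (int s = 0; s < nnodes; s++)
        for (int c = 0; c < 26; c++)
            if (child[s][c])
                printf("state %d --%c/%d--> state %d\n",
                       s, 'A' + c, weight[s][c], child[s][c]);
    return 0;
}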

  35. Postprocessing: coring
  • remove infrequent transitions
  • convert the PFSA to an NFA
  [The learned PFSA, with its weight-1 noise transition about to be dropped]

  36. Postprocessing: coring
  • remove infrequent transitions
  • convert the PFSA to an NFA
  [The cored automaton: an NFA over the letters A–G with the infrequent transition removed]
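
Coring can be sketched as dropping every transition that carries less than a cutoff fraction of the traffic leaving its source state; the 5% cutoff and the edge weights below are illustrative. A real corer would also delete states made unreachable by the dropped edges (state 3 here, which takes its outgoing edge with it).

/* Sketch: remove infrequent PFSA transitions, keep the hot core. */
#include <stdio.h>

struct edge { int from, to; char sym; int weight; };

int main(void) {
    struct edge edges[] = {
        { 0, 1, 'A', 100 }, { 1, 2, 'B', 100 },
        { 2, 3, 'C',   1 },                      /* noise */
        { 2, 4, 'D',  99 }, { 4, 5, 'E',  99 },
        { 5, 6, 'F',  99 }, { 6, 7, 'G',  99 },
        { 3, 7, 'G',   1 },                      /* noise */
    };
    int n = (int)(sizeof edges / sizeof edges[0]);
    int out[8] = { 0 };          /* total weight leaving each state */
    for (int i = 0; i < n; i++) out[edges[i].from] += edges[i].weight;
    double cutoff = 0.05;        /* drop edges under 5% of the traffic */
    for (int i = 0; i < n; i++) {
        if ((double)edges[i].weight / out[edges[i].from] >= cutoff)
            printf("keep %d --%c--> %d\n", edges[i].from, edges[i].sym, edges[i].to);
        else
            printf("drop %d --%c--> %d (infrequent)\n", edges[i].from, edges[i].sym, edges[i].to);
    }
    return 0;
}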

  37. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
  • Experimental results
  • Related work

  38. Where to find bugs?
  • in programs (static verification)?
  • or in traces (dynamic verification)?

  39. How we verify specifications
  [Pipeline diagram: Traces → extract scenarios → Scenarios (dependence graphs) → standardize → Strings → check automaton membership]

  40. Verifying traces
  Specification:
  socket(return = X) [seed]
  accept(so = X, return = Y)
  read(so = Y)
  write(so = Y)
  close(so = Y)
  close(so = X)
  Trace 1: ... socket(return = 7); accept(so = 7, return = 8); write(so = 8); read(so = 8); close(so = 8); close(so = 7) ... → OK (both sockets closed)
  Trace 2: ... socket(return = 7); accept(so = 7, return = 8); write(so = 8); read(so = 8); close(so = 8) ... → Bug! (socket 7 not closed)
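
Dynamic verification is then ordinary automaton simulation: run each standardized scenario string through the mined automaton and flag the strings it rejects. A sketch that hard-codes the straight-line core A B D E F G from slide 29 as a DFA; the mined automata are NFAs in general.

/* Sketch: check scenario strings against the mined core. */
#include <stdio.h>

/* one straight-line core; -1 means reject */
static int next_state(int state, char sym) {
    static const char expect[] = "ABDEFG";
    if (state >= 0 && state < 6 && expect[state] == sym) return state + 1;
    return -1;
}

static void check(const char *scenario) {
    int state = 0;
    for (const char *p = scenario; *p && state >= 0; p++)
        state = next_state(state, *p);
    if (state == 6) printf("%-8s OK\n", scenario);
    else            printf("%-8s Bug! (rejected by the specification)\n", scenario);
}

int main(void) {
    check("ABDEFG");  /* both sockets closed */
    check("ABDEF");   /* missing G: socket X never closed */
    return 0;
}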

  41. Experimental results
  Attempted to mine and verify two published X11 rules.
  Challenge: small, buggy training sets (16 programs).

  42. Learning by trial and error
  Start with a rule learned from one trusted trace. Then:
  1. Randomly select an unused trace.
  2. If the trace obeys the rule, go back to step 1.
  3. Otherwise, ask the expert: is the trace buggy?
     • yes → report a bug
     • no (the rule was too specific) → add the trace to the training set and learn a new rule
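
The same loop as straight-line control flow; trace_obeys_rule and expert_says_buggy are hypothetical stubs standing in for the mined-rule checker and the human expert.

/* Sketch: the trial-and-error training loop. */
#include <stdio.h>
#include <stdbool.h>

static bool trace_obeys_rule(int trace)  { return trace % 2 == 0; }  /* stub */
static bool expert_says_buggy(int trace) { return trace == 3; }      /* stub */
static void relearn_with(int trace)      { printf("add trace %d; learn a new rule\n", trace); }

int main(void) {
    for (int trace = 0; trace < 5; trace++) {  /* the unused traces */
        if (trace_obeys_rule(trace))
            continue;                          /* nothing to learn */
        if (expert_says_buggy(trace))
            printf("trace %d: report bug\n", trace);
        else
            relearn_with(trace);               /* rule was too specific */
    }
    return 0;
}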

  43. Results
  A timestamp-passing rule:
  • 4 traces did not need inspection
  • learned the rule! (compact: 7 states)
  • bugs in 2 out of 16 programs (ups, e93)
  • the English specification was incomplete (3 traces)
  • expert and corer agreed on 81% of the hot core
  SetOwner(x) must be followed by GetSelection(x):
  • failed to learn the rule (very small training set), but
  • bugs in 2 out of 5 programs (xemacs, ups)

  44. Outline of the talk
  • The specification mining problem
  • Our specification mining system
    • Annotating traces with dependences
    • Extracting and standardizing scenarios
    • Probabilistic learning and postprocessing
  • Experimental results
  • Related work

  45. Related work
  Arithmetic pre/post conditions:
  • Daikon [Ernst et al.], Houdini [Flanagan and Leino]
  • properties orthogonal to ours
  • eventually, we may need to include and learn some arithmetic relationships
  Temporal relationships over calls:
  • intrusion detection: [Ghosh et al.], [Wagner and Dean]
  • software processes: [Cook and Wolf]
  • error checking: [Engler et al., SOSP 2001]
    • lexical and syntactic pattern matching
    • user must write templates (e.g., <a> always follows <b>)
  • design patterns: [Reiss and Renieris]

  46. Conclusion
  • Introduced specification mining, a new approach for learning correctness specifications
  • Refined the problem into one of probabilistic learning from traces
  • Developed and demonstrated a practical specification miner

  47. End of talk

  48. How we mine specifications
  [Pipeline: Program + Test inputs → tracer → Instrumented program → run → Traces → dependence annotator → Annotated traces]
  ...
  socket(domain = 2, type = 1, proto = 0, return:T0 = 7) [SETUP socket:T0 7]
  accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8) [USE socket:T0 8]
  close(so:T0 = 8, return = 0)
  close(so:T0 = 7, return = 0)
  ...

  49. How we mine specifications
  Program:
  int s = socket(AF_INET, SOCK_STREAM, 0);   [DO SETUP]
  while (cond1) {
      int ns = accept(s, &addr, &len);
      while (cond2) {
          [USE NS]
          if (cond3) return;
      }
      close(ns);
  }
  close(s);

  50. How we mine specifications
  [Pipeline: Program → tracer → Instrumented program]
  int s = socket(AF_INET, SOCK_STREAM, 0);   [DO SETUP]
  while (cond1) {
      int ns = accept(s, &addr, &len);
      while (cond2) {
          [USE NS]
          if (cond3) return;
      }
      close(ns);
  }
  close(s);
