Mining Specifications
(lots of) code -> specifications of correctness
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)
Motivation: why specifications?
Verification tools
• find bugs early
• make guarantees
• scale with programs
• need specifications
[Figure: programs and specifications feed a verifier, which reports bugs]
Language-usage specifications
Easy to write, big payoff
• array accesses
• memory allocation
• type safety
• ...
[Figure: programs feed a verifier, which reports bugs]
Library-usage specifications
Harder to write, smaller payoff
• cut-and-paste (X11)
• network server (socket API)
• device drivers (kernel API)
• ...
[Figure: programs feed a verifier, which reports bugs]
Program specifications
Hardest to write, smallest payoff
• symbol table well-formed
• IR well-formed
• ...
[Figure: a program feeds a verifier, which reports bugs]
Solution: specification mining
Specification mining gleans specifications from artifacts of program development:
• From programs (static)?
• From executions of test cases (dynamic)?
• From other artifacts?
Mining from traces
    ...
    socket(domain = 2, type = 1, proto = 0, return = 7)
    accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
    write(so = 8, buf = 0x100, len = 23, return = 23)
    read(so = 8, buf = 0x100, len = 12, return = 12)
    close(so = 8, return = 0)
    close(so = 7, return = 0)
    ...
Advantages:
• No infeasible paths
• Pointer/alias analysis is easy
• Few bugs, as program passes its tests
• Common behavior is correct behavior
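To make the trace format above concrete, here is a minimal Python sketch that parses such lines into (call name, attributes) pairs. The function name `parse_event` and the regular expression are assumptions made for illustration; the slides do not say how the authors' tracer or tools represent events.

```python
import re

# Hypothetical parser for the trace format shown on the slide:
#   name(attr = value, attr = value, ...)
CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")

def parse_event(line):
    """Parse one trace line into (call name, {attribute: integer value})."""
    match = CALL_RE.match(line)
    if match is None:
        raise ValueError(f"not a trace event: {line!r}")
    name, body = match.groups()
    attrs = {}
    for field in body.split(","):
        if not field.strip():
            continue  # tolerate calls with no attributes
        key, _, value = field.partition("=")
        attrs[key.strip()] = int(value.strip(), 0)  # base 0 accepts 7 and 0x40
    return name, attrs

TRACE = """\
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
read(so = 8, buf = 0x100, len = 12, return = 12)
close(so = 8, return = 0)
close(so = 7, return = 0)
"""

events = [parse_event(line) for line in TRACE.splitlines()]
```

Each event ends up as a plain tuple, which is enough for the later steps to compare attribute values across calls.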
Output: a specification
[Figure: automaton from start to end with transitions socket(return = X), accept(so = X, return = Y), read(so = Y), write(so = Y), close(so = Y), close(so = X)]
Specification says what programs should do:
• Temporal dependences (accept follows socket)
• Data dependences (accept input is socket output)
How we mine specifications
[Figure: pipeline. Traces -> extract scenarios -> Scenarios (dep. graphs) -> standardize -> Strings -> PFSA learner -> PFSA -> postprocess -> Specification]
Outline of the talk
• The specification mining problem
• Our specification mining system
• Annotating traces with dependences
• Extracting and standardizing scenarios
• Probabilistic learning and postprocessing
• Experimental results
• Related work
An impossible problem
Find a Turing machine that generates C, given T.
[Figure: sets I (all traces), C (all correct traces), and T (training traces)]
Unsolvable:
• No restrictions on C
• No connection between C and T
• Simple variants are also undecidable [Gold67]
A simpler problem
Find a PFSA (probabilistic finite-state automaton) that generates an approximation of P, a probability distribution over all scenarios.
Tractable, plus:
• Scenarios are small
• Noise handled
• Finite-state
• Weights useful for postprocessing
[Figure: plot of P, with probability from 0 to 1 over all scenarios; regions labeled "Correct scenarios" and "Noise"]
Outline of the talk
• The specification mining problem
• Our specification mining system
• Annotating traces with dependences
• Extracting and standardizing scenarios
• Probabilistic learning and postprocessing
• Verifying traces
• Experimental results
• Related work
Dependence annotation
[Figure: Traces -> dependence annotator -> Annotated traces]
    socket(domain = 2, type = 1, proto = 0, return = 7)
    accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
    write(so = 8, buf = 0x100, len = 23, return = 23)
    close(so = 8, return = 0)
    close(so = 7, return = 0)
Definers:
• socket.return
• accept.return
• close.so
Users:
• accept.so
• read.so
• write.so
• close.so
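The definer and user lists above suggest a simple way to compute dependences: remember which event last defined each value and link later uses back to it. The following Python sketch does that. It is an illustration under assumptions: the function `annotate`, the edge tuples, and the rule that a use is resolved before the same event's own definition are not taken from the slides, and the sketch ignores any type information the real annotator may attach to attributes.

```python
# Definer and user attributes, copied from the slide.
DEFINERS = {("socket", "return"), ("accept", "return"), ("close", "so")}
USERS = {("accept", "so"), ("read", "so"), ("write", "so"), ("close", "so")}

def annotate(events):
    """Return dependence edges as (kind, earlier index, later index)."""
    last_def = {}  # value -> index of the latest event that defined it
    edges = []
    for i, (name, attrs) in enumerate(events):
        # Resolve uses first, so an event never depends on itself.
        for attr, value in attrs.items():
            if (name, attr) in USERS and value in last_def:
                edges.append(("use-def", last_def[value], i))
        # Then record definitions, linking each to the previous definition.
        for attr, value in attrs.items():
            if (name, attr) in DEFINERS:
                if value in last_def:
                    edges.append(("def-def", last_def[value], i))
                last_def[value] = i
    return edges

# The slide's trace, reduced to the attributes that carry socket descriptors.
events = [
    ("socket", {"return": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("write", {"so": 8}),
    ("close", {"so": 8}),
    ("close", {"so": 7}),
]
edges = annotate(events)
```

Because close.so appears in both lists, each close yields a use-def edge and a def-def edge back to the call that created the descriptor.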
Extracting scenarios
[Figure: Seeds + Annotated traces -> scenario extractor -> Abstract scenarios]
    socket(domain = 2, type = 1, proto = 0, return = 7)
    accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
    write(so = 8, buf = 0x100, len = 23, return = 23)
    close(so = 8, return = 0)
    close(so = 7, return = 0)
Simplifying scenarios
[Figure: Seeds + Annotated traces -> scenario extractor -> Abstract scenarios]
Before:
    socket(domain = 2, type = 1, proto = 0, return = 7) [seed]
    accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
    write(so = 8, buf = 0x100, len = 23, return = 23)
    close(so = 8, return = 0)
    close(so = 7, return = 0)
After:
    socket(return = 7) [seed]
    accept(so = 7, return = 8)
    write(so = 8)
    close(so = 8)
    close(so = 7)
Drops attributes not used in dependences.
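Extraction and simplification can be sketched together in Python: start at a seed event, keep later events that share a tracked value with the scenario so far, and drop every attribute that is neither a definer nor a user. The table `DEP_ATTRS` and the function `extract_scenario` are hypothetical names, and following shared values is a simplification; the slides do not specify how the real extractor bounds a scenario.

```python
# Attributes that take part in dependences (definers and users on the slides).
DEP_ATTRS = {
    "socket": {"return"},
    "accept": {"so", "return"},
    "read": {"so"},
    "write": {"so"},
    "close": {"so"},
}

def extract_scenario(events, seed_index):
    """Collect the seed and the later events connected to it by shared values."""
    tracked = set()   # values defined or used by the scenario so far
    scenario = []
    for name, attrs in events[seed_index:]:
        kept = {a: v for a, v in attrs.items() if a in DEP_ATTRS.get(name, ())}
        if not scenario or tracked & set(kept.values()):
            scenario.append((name, kept))
            tracked |= set(kept.values())
    return scenario

# The slide's trace, interleaved with an unrelated socket (descriptor 9).
events = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 9}),
    ("accept", {"so": 7, "addr": 0x40, "addr_len": 0x50, "return": 8}),
    ("write", {"so": 8, "buf": 0x100, "len": 23, "return": 23}),
    ("close", {"so": 9, "return": 0}),
    ("close", {"so": 8, "return": 0}),
    ("close", {"so": 7, "return": 0}),
]
scenario = extract_scenario(events, seed_index=0)
```

On this input the result matches the simplified scenario on the slide, and the events for descriptor 9 are left out.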
Standardizing scenarios
[Figure: Simplified scenarios -> Standardization -> Abstract scenarios; equivalent scenarios map to the same abstract scenario]
Two transformations:
• Naming: foo(val = 7) -> foo(val = X)
• Reordering: foo(); bar(); -> bar(); foo();
Finds the least standardized scenario, in lexicographic order.
Standardizing scenarios (use-def and def-def dependences)
Original:
    socket(return = 7) [seed]
    accept(so = 7, return = 8)
    write(so = 8)
    read(so = 8)
    close(so = 8)
    close(so = 7)
After reordering:
    socket(return = 7) [seed]
    accept(so = 7, return = 8)
    read(so = 8)
    write(so = 8)
    close(so = 8)
    close(so = 7)
After reordering and naming:
    socket(return = X) [seed]
    accept(so = X, return = Y)
    read(so = Y)
    write(so = Y)
    close(so = Y)
    close(so = X)
Standardizing scenarios
    socket(return = X) [seed]
    accept(so = X, return = Y)
    read(so = Y)
    write(so = Y)
    close(so = Y)
    close(so = X)
Resulting string: A B D E F G
Each interaction is a letter to the PFSA learner.
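The two standardizing transformations and the final mapping to letters can be sketched in Python as follows. This is a simplification under assumptions: `reorder` only swaps adjacent independent events into name order, which is not guaranteed to find the lexicographically least equivalent scenario, and the helper names and the choice of letters are hypothetical.

```python
from string import ascii_uppercase

# Assumed definer attributes, copied from the dependence annotation slides.
DEFINERS = {("socket", "return"), ("accept", "return"), ("close", "so")}

def defined(event):
    """Values that an event defines."""
    name, attrs = event
    return {v for a, v in attrs.items() if (name, a) in DEFINERS}

def independent(a, b):
    """True when neither event defines a value that the other mentions."""
    return not (defined(a) & set(b[1].values()) or defined(b) & set(a[1].values()))

def reorder(scenario):
    """Swap adjacent independent events until call names are in order."""
    events = list(scenario)
    changed = True
    while changed:
        changed = False
        for i in range(len(events) - 1):
            a, b = events[i], events[i + 1]
            if b[0] < a[0] and independent(a, b):
                events[i], events[i + 1] = b, a
                changed = True
    return events

def rename(scenario):
    """Replace concrete values by X, Y, ... in order of first appearance."""
    names = {}
    out = []
    for name, attrs in scenario:
        renamed = {}
        for attr, value in attrs.items():
            if value not in names:
                names[value] = "XYZUVW"[len(names)]  # enough for small scenarios
            renamed[attr] = names[value]
        out.append((name, renamed))
    return out

def to_string(scenario, alphabet):
    """Map each distinct standardized event to a letter; alphabet is updated."""
    letters = []
    for name, attrs in scenario:
        key = (name, tuple(sorted(attrs.items())))
        if key not in alphabet:
            alphabet[key] = ascii_uppercase[len(alphabet)]
        letters.append(alphabet[key])
    return "".join(letters)

scenario = [
    ("socket", {"return": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("write", {"so": 8}),
    ("read", {"so": 8}),
    ("close", {"so": 8}),
    ("close", {"so": 7}),
]
standard = rename(reorder(scenario))
word = to_string(standard, {})
```

Sharing one `alphabet` dictionary across all scenarios keeps the letters consistent, so identical interactions in different scenarios reach the learner as the same symbol.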
PFSA learning
[Figure: Abstract scenarios -> automaton learner -> Specification]
Algorithm due to Raman et al.:
• Build a weighted retrieval tree
• Merge similar states
[Figure: a weighted retrieval tree over the letters A to G, with edge counts of 100, 99, and 1, shown before and after successive merges of similar states]
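The first step, building a weighted retrieval tree, can be sketched in Python as a prefix tree whose edges count how many training strings pass through them. The training strings below are made up for illustration (the slide's exact tree cannot be recovered), the function names are hypothetical, and the state-merging step of the learner is not shown.

```python
from collections import Counter

def build_prefix_tree(strings):
    """Return {state: Counter({letter: count})}; a state is named by its prefix."""
    tree = {"": Counter()}
    for s in strings:
        for i, letter in enumerate(s):
            tree.setdefault(s[:i], Counter())[letter] += 1
        tree.setdefault(s, Counter())["$"] += 1  # "$" marks the end of a string
    return tree

def transition_probabilities(tree):
    """Normalize the outgoing counts of every state into probabilities."""
    return {
        state: {letter: count / sum(out.values()) for letter, count in out.items()}
        for state, out in tree.items()
        if out
    }

# Hypothetical training set: 99 common scenarios and 1 rare one.
strings = ["ABCEG"] * 99 + ["ABDG"]
tree = build_prefix_tree(strings)
probs = transition_probabilities(tree)
```

The counts are what make the automaton probabilistic: after the prefix AB, the letter C has weight 99 and D has weight 1, which later postprocessing can use to separate common behavior from noise.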
Postprocessing: coring
[Figure: Abstract scenarios -> automaton learner -> Specification]
• Remove infrequent transitions
• Convert PFSA to NFA
[Figure: the weighted automaton over the letters A to G with edge counts 100, 99, and 1, and the unweighted automaton that remains after coring]
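A minimal Python sketch of coring, assuming a fixed count threshold: transitions seen fewer than `min_count` times are dropped and the weights are forgotten, leaving an NFA. The threshold rule, the function `core`, and the example automaton are assumptions; the slides do not say how the actual corer decides which transitions are infrequent.

```python
def core(transitions, min_count):
    """Drop rare transitions and forget the weights.

    transitions maps (source, letter, target) to an observed count.
    The result maps (source, letter) to a set of target states (an NFA).
    """
    nfa = {}
    for (src, letter, dst), count in transitions.items():
        if count >= min_count:
            nfa.setdefault((src, letter), set()).add(dst)
    return nfa

# Hypothetical weighted automaton using the kind of counts shown on the slide.
weighted = {
    ("s0", "A", "s1"): 100,
    ("s1", "B", "s2"): 100,
    ("s2", "C", "s3"): 99,
    ("s2", "D", "s4"): 1,
    ("s3", "E", "s5"): 99,
    ("s4", "E", "s5"): 1,
}
nfa = core(weighted, min_count=5)
```

With this threshold the single scenario that went through state s4 disappears, and only the common path remains in the specification.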
Where to find bugs?
• in programs (static verification)?
• or in traces (dynamic verification)?
How we verify specifications
[Figure: pipeline. Traces -> extract scenarios -> Scenarios (dep. graphs) -> standardize -> Strings -> check automaton membership]
Verifying traces
[Figure: specification automaton with transitions socket(return = X) [seed], accept(so = X, return = Y), read(so = Y), write(so = Y), close(so = Y), close(so = X)]
Trace 1: OK (both sockets closed)
    ...
    socket(return = 7)
    accept(so = 7, return = 8)
    write(so = 8)
    read(so = 8)
    close(so = 8)
    close(so = 7)
    ...
Trace 2: Bug! (socket 7 not closed)
    ...
    socket(return = 7)
    accept(so = 7, return = 8)
    write(so = 8)
    read(so = 8)
    close(so = 8)
    ...
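Verifying a standardized scenario amounts to checking membership in the specification automaton. The Python sketch below simulates an NFA on the two traces above. The state names and the exact shape of `SPEC` are assumptions reconstructed from the slide's figure, not the authors' mined specification.

```python
def accepts(nfa, start, finals, word):
    """Simulate an NFA given as {(state, symbol): set of next states}."""
    current = {start}
    for symbol in word:
        current = {dst for state in current for dst in nfa.get((state, symbol), ())}
        if not current:
            return False  # no transition: the scenario leaves the specification
    return bool(current & finals)

# Assumed socket specification; the state names are hypothetical.
SPEC = {
    ("start", "socket(return=X)"): {"listening"},
    ("listening", "accept(so=X,return=Y)"): {"connected"},
    ("connected", "read(so=Y)"): {"connected"},
    ("connected", "write(so=Y)"): {"connected"},
    ("connected", "close(so=Y)"): {"listening"},
    ("listening", "close(so=X)"): {"end"},
}

ok_scenario = [
    "socket(return=X)", "accept(so=X,return=Y)", "write(so=Y)",
    "read(so=Y)", "close(so=Y)", "close(so=X)",
]
buggy_scenario = ok_scenario[:-1]  # the listening socket is never closed

ok_result = accepts(SPEC, "start", {"end"}, ok_scenario)
bug_result = accepts(SPEC, "start", {"end"}, buggy_scenario)
```

The second scenario is rejected because it stops in a non-final state, which corresponds to the "socket 7 not closed" bug on the slide.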
Experimental results
Attempted to mine and verify two published X11 rules
Challenge: small, buggy training sets (16 programs)
Learning by trial and error
Start with a rule learned from one trusted trace. Then repeat:
• Randomly select an unused trace
• Trace obeys rule?
  • yes: select the next trace
  • no: ask the expert whether the trace is buggy
• Expert: is trace buggy?
  • yes: report bug
  • no (rule too specific): add trace to training set; learn a new rule
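The loop above can be sketched in Python with the learner, the rule check, and the expert passed in as functions. All three callbacks, the function name, and the toy rule (a set of accepted strings) are assumptions made so the loop can run; they stand in for the PFSA learner, automaton membership, and a human expert.

```python
import random

def mine_by_trial_and_error(trusted, unused, learn, obeys, is_buggy, rng=random):
    """Grow the training set only with traces the expert judges correct."""
    training = [trusted]
    rule = learn(training)
    bugs = []
    pool = list(unused)
    rng.shuffle(pool)  # traces are visited in random order
    for trace in pool:
        if obeys(rule, trace):
            continue  # nothing to inspect
        if is_buggy(trace):
            bugs.append(trace)  # report bug
        else:
            training.append(trace)  # rule was too specific
            rule = learn(training)
    return rule, bugs

# Toy stand-ins: a rule is the set of strings seen in training, and the
# hypothetical expert calls a trace buggy when it does not end with "F".
rule, bugs = mine_by_trial_and_error(
    trusted="ABEF",
    unused=["ABEF", "ABCEF", "ABE"],
    learn=lambda training: set(training),
    obeys=lambda rule, trace: trace in rule,
    is_buggy=lambda trace: not trace.endswith("F"),
)
```

Only traces that violate the current rule reach the expert, which is why some traces never need inspection.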
Results
• A timestamp-passing rule
  • 4 traces did not need inspection
  • learned the rule! (compact: 7 states)
  • bugs in 2 out of 16 programs (ups, e93)
  • English specification was incomplete (3 traces)
  • expert and corer agreed on 81% of the hot core
• SetOwner(x) must be followed by GetSelection(x)
  • failed to learn the rule (very small learning set), but
  • bugs in 2 out of 5 programs (xemacs, ups)
Related work
Arithmetic pre/post conditions
• Daikon [Ernst et al], Houdini [Flanagan and Leino]
• properties orthogonal to ours
• eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
• intrusion detection: [Ghosh et al], [Wagner and Dean]
• software processes: [Cook and Wolf]
• error checking: [Engler et al SOSP 2001]
  • lexical and syntactic pattern matching
  • user must write templates (e.g., <a> always follows <b>)
• design patterns: [Reiss and Renieris]
Conclusion
• Introduced specification mining, a new approach for learning correctness specifications
• Refined the problem into a problem of probabilistic learning from traces
• Developed and demonstrated a practical specification miner
How we mine specifications
[Figure: Program -> tracer -> Instrumented program; Instrumented program + Test inputs -> run -> Traces -> dependence annotator -> Annotated traces]
    ...
    socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
    [SETUP socket:T0 7]
    accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
    [USE socket:T0 8]
    close(so:T0 = 8, return = 0)
    close(so:T0 = 7, return = 0)
    ...
How we mine specifications
[Figure: Program -> tracer -> Instrumented program]
    int s = socket(AF_INET, SOCK_STREAM, 0);
    [DO SETUP]
    while (cond1) {
        int ns = accept(s, &addr, &len);
        while (cond2) {
            [USE NS]
            if (cond3) return;
        }
        close(ns);
    }
    close(s);