Automatically Proving the Correctness of Compiler Optimizations
Sorin Lerner, Todd Millstein, Craig Chambers
University of Washington
Goal: correct compilers • The compiler is usually part of the trusted computing base. • “But I use gcc, and it works great!”
gcc-bugs mailing list: Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results: • c/9525: incorrect code generation on SSE2 intrinsics • target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset • optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX) • optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated • optimization/6891: G++ generates incorrect code when -Os is used • optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code • target/9732: PPC32: Wrong code with -O2 -fPIC • c/8224: Incorrect joining of signed and unsigned division • … And this is only for February 2003! On a mature compiler!
Testing [Figure: the compiler translates the source into the compiled prog; the compiled prog is run on an input, and its output is DIFFed against the expected output.] • To get benefits, must: • run over many inputs • compile many test cases • No correctness guarantees: • neither for the compiled prog • nor for the compiler
Verify each compilation [Figure: a “semantic DIFF” between the source and the compiled prog produced by the compiler.] • Translation validation • [Pnueli et al 98, Necula 00] • Credible compilation • [Rinard 99] • Compiler can still have bugs. • Compile time increases. • “Semantic Diff” is hard.
Proving the whole compiler correct [Figure: a correctness checker is applied to the compiler itself, which translates the source into the compiled prog.]
Proving the whole compiler correct • Option 1: Prove compiler correct by hand. • Proofs are long… • And hard. • Compilers are proven correct as written on paper. What about the implementation? [Figure: hand-written proofs act as the correctness checker, with a questionable link to the compiler implementation.]
Our Approach • Our approach: prove compiler correct automatically. [Figure: the correctness checker is an automatic theorem prover applied to the compiler.]
This seems really hard! [Figure: the complexity of proving a compiler correct exceeds the complexity that an automatic theorem prover can handle.]
Making the problem easier [Figure: the task of proving the compiler correct is shrunk, step by step, toward what an automatic theorem prover can handle.] • Only prove optimizer correct. • Trust front-end and code-generator. • Write optimizations in Cobalt, a domain-specific language. • Separate correctness from profitability. • Factor out the hard and common parts of the proof, and prove them once by hand.
Results • Cobalt language • realistic C-like IL • implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses • Correctness checker for Cobalt opts • using the Simplify theorem prover • Execution engine for Cobalt opts • in the Whirlwind compiler
Caveats • May not be able to express your opt in Cobalt: • no interprocedural optimizations for now. • optimizations that build complicated data structures may be difficult to express. • A sound Cobalt optimization may be rejected by the correctness checker. • Trusted computing base (TCB) includes: • front-end and code-generator, execution engine, correctness checker, proofs done by hand once
Outline • Overview • Forward optimizations (see paper for backwards) • Example: constant propagation • Strategy for proving forward optimizations sound • Profitability heuristics • Pure analyses
Constant Prop (straight-line code) [Figure: statement y := 5, followed by statements that don’t define y, then statement x := y, which is REPLACEd by x := 5.]
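Adding arbitrary control flow [Figure: a CFG in which y := 5 appears on several paths leading to statement x := y; each is followed by statements that don’t define y, and x := y is REPLACEd by x := 5.] • if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5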
Constant prop in English • if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5
Constant prop in Cobalt (English version → Cobalt version)
• if statement y := 5 → stmt(Y := C)
• is followed by → followed by
• statements that don’t define y → ¬mayDef(Y)
• until → until
• statement x := y → X := Y
• then transform statement to x := 5 → ⇒ X := C
Together: stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C. The guards stmt(Y := C) and ¬mayDef(Y) are boolean expressions evaluated at nodes in the CFG.
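One way to read the Cobalt rule is as a forward “must” dataflow problem: the fact Y == C is established by stmt(Y := C), survives any statement satisfying ¬mayDef(Y), and must hold along every incoming CFG path when X := Y is reached. The Python sketch below illustrates that reading under stated assumptions: the three-kind statement IL (`Stmt` with kinds "const", "copy", "assign"), the predecessor-list CFG, and the `may_def` approximation (direct assignment only, since the toy IL has no pointers or calls) are all hypothetical, and this is not the Cobalt execution engine in Whirlwind.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Stmt:
    kind: str          # "const": lhs := <int>; "copy": lhs := <var>; "assign": lhs := <opaque expr>
    lhs: str
    rhs: object = None


Fact = Tuple[str, int]  # (Y, C) means "Y == C"


def may_def(s: Stmt, var: str) -> bool:
    # Toy IL (assumption): no pointers or calls, so only a direct assignment can define var.
    return s.lhs == var


def transfer(s: Stmt, facts: FrozenSet[Fact]) -> FrozenSet[Fact]:
    # A statement satisfying not-mayDef(Y) keeps Y == C; stmt(Y := C) establishes it.
    out = {(y, c) for (y, c) in facts if not may_def(s, y)}
    if s.kind == "const":
        out.add((s.lhs, s.rhs))
    return frozenset(out)


def const_prop(nodes: Dict[int, Stmt], preds: Dict[int, List[int]], entry: int) -> Dict[int, Stmt]:
    # facts_in[n]: facts holding on entry to n along every path; None = not yet reached.
    facts_in: Dict[int, Optional[FrozenSet[Fact]]] = {n: None for n in nodes}
    facts_in[entry] = frozenset()
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            outs = [transfer(nodes[p], facts_in[p]) for p in preds[n] if facts_in[p] is not None]
            if not outs:
                continue
            new = frozenset.intersection(*outs)  # meet: keep only facts true on all paths
            if new != facts_in[n]:
                facts_in[n] = new
                changed = True
    result = dict(nodes)
    for n, s in nodes.items():
        facts = facts_in[n] or frozenset()
        if s.kind == "copy":
            for y, c in facts:
                if y == s.rhs:
                    result[n] = Stmt("const", s.lhs, c)  # X := Y  ==>  X := C
    return result


# Diamond CFG: y := 5; then either a := ... or b := ...; then x := y.
nodes = {0: Stmt("const", "y", 5), 1: Stmt("assign", "a"), 2: Stmt("assign", "b"), 3: Stmt("copy", "x", "y")}
preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
optimized = const_prop(nodes, preds, entry=0)
print(optimized[3])  # Stmt(kind='const', lhs='x', rhs=5)

# If one branch redefines y, the fact no longer holds on every path and x := y is kept.
blocked = const_prop({**nodes, 2: Stmt("assign", "y")}, preds, entry=0)
print(blocked[3])    # Stmt(kind='copy', lhs='x', rhs='y')
```

The intersection at join points is what keeps the rule sound under arbitrary control flow: if any path into x := y lacks y := 5, or redefines y, the fact is dropped and the statement is left alone.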
Proving correctness automatically [Figure: the witnessing region runs from each y := 5 to the statement x := y, which is transformed to x := 5.] • Witnessing region • Invariant: y == 5
Constant prop revisited: stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C • Ask a theorem prover to show: • A statement satisfying stmt(Y := C) establishes Y == C • A statement satisfying ¬mayDef(Y) maintains Y == C • The statements X := Y and X := C have the same semantics in a program state satisfying Y == C
Generalize to any forward optimization: ψ1 followed by ψ2 until s ⇒ s’ with witness P • Ask a theorem prover to show: • A statement satisfying ψ1 establishes P • A statement satisfying ψ2 maintains P • The statements s and s’ have the same semantics in a program state satisfying P • We showed by hand once that these conditions imply correctness.
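The three obligations can be made concrete with a small experiment. The sketch below is an assumption-laden stand-in for the theorem prover: instead of proving the obligations symbolically with Simplify, it checks them exhaustively over a tiny finite state space (variables x, y, z ranging over -1..6) and over a hand-picked sample of statements for ψ2. The helper names (`assign`, `check_forward_opt`, `witness`) are hypothetical. It instantiates the general scheme with constant propagation (ψ1 = y := 5, P = y == 5, s = x := y, s’ = x := 5) and then shows a deliberately buggy ψ2 being caught.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List

State = Dict[str, int]
StmtFn = Callable[[State], State]

VARS = ("x", "y", "z")
VALUES = range(-1, 7)  # assumption: a small finite domain stands in for all integers


def assign(var: str, expr: Callable[[State], int]) -> StmtFn:
    """Hypothetical helper: the statement `var := expr` as a state transformer."""
    def run(st: State) -> State:
        new = dict(st)
        new[var] = expr(st)
        return new
    return run


def all_states() -> Iterable[State]:
    for vals in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, vals))


def check_forward_opt(psi1: List[StmtFn], psi2: List[StmtFn], s: StmtFn, s_prime: StmtFn,
                      P: Callable[[State], bool]) -> Dict[str, bool]:
    states = list(all_states())
    return {
        # 1. every sampled statement satisfying psi1 establishes P
        "establishes": all(P(t(st)) for t in psi1 for st in states),
        # 2. every sampled statement satisfying psi2 maintains P
        "maintains": all(P(t(st)) for t in psi2 for st in states if P(st)),
        # 3. s and s' produce the same state whenever P holds
        "same_semantics": all(s(st) == s_prime(st) for st in states if P(st)),
    }


def witness(st: State) -> bool:
    return st["y"] == 5  # P: y == 5


# Constant propagation instance: Y = y, C = 5.
psi1 = [assign("y", lambda st: 5)]                     # stmt(y := 5)
psi2 = [assign("x", lambda st: st["y"] + 1),           # sampled statements that do not define y
        assign("z", lambda st: st["x"])]
s, s_prime = assign("x", lambda st: st["y"]), assign("x", lambda st: 5)

result = check_forward_opt(psi1, psi2, s, s_prime, witness)
print(result)  # {'establishes': True, 'maintains': True, 'same_semantics': True}

# A buggy pattern whose psi2 admits y := y + 1 fails obligation 2.
buggy = check_forward_opt(psi1, psi2 + [assign("y", lambda st: st["y"] + 1)], s, s_prime, witness)
print(buggy["maintains"])  # False
```

A finite check like this can only find counterexamples; unlike the prover-based checker, a passing run says nothing about states or statements outside the sample.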
Outline • Overview • Forward optimizations (see paper for backwards) • Profitability heuristics • Pure analyses
Profitability heuristics • Optimization correct ⇒ safe to perform any subset of the matching transformations. • So far, all transformations were also profitable. • In some cases, many transformations are legal, but only a few are profitable.
The two pieces of an optimization: ψ1 followed by ψ2 until s ⇒ s’ with witness P filtered through choose • Transformation pattern (everything before “filtered through”): • defines which transformations are legal. • Profitability heuristic (choose): • describes which of the legal transformations to actually perform. • does not affect soundness. • can be written in a language of the user’s choice. • This way of factoring an optimization is crucial to our ability to prove optimizations sound automatically.
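Because soundness comes from the transformation pattern alone, any subset of the legal rewrites can be applied. The Python sketch below illustrates this on straight-line code with a simplified constant-propagation pattern (only stmt(Y := C) establishes Y == C). `legal_rewrites`, `optimize`, and the `choose` callbacks are hypothetical names; intersecting the heuristic's answer with the legal set in `optimize` is one way, assumed here, to guarantee that a heuristic can only narrow, never widen, the set of transformations.

```python
from itertools import chain, combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Straight-line toy program: each statement is (lhs, rhs), rhs an int constant or a variable name.
Stmt = Tuple[str, object]
Program = List[Stmt]


def run(prog: Program) -> Dict[str, int]:
    env: Dict[str, int] = {}
    for lhs, rhs in prog:
        env[lhs] = rhs if isinstance(rhs, int) else env[rhs]
    return env


def legal_rewrites(prog: Program) -> Dict[int, Stmt]:
    """Transformation pattern: X := Y  ==>  X := C wherever Y == C is known to hold."""
    known: Dict[str, int] = {}
    legal: Dict[int, Stmt] = {}
    for i, (lhs, rhs) in enumerate(prog):
        if isinstance(rhs, str) and rhs in known:
            legal[i] = (lhs, known[rhs])
        known.pop(lhs, None)          # any definition of lhs kills lhs == C
        if isinstance(rhs, int):
            known[lhs] = rhs          # stmt(lhs := C) establishes lhs == C
    return legal


def optimize(prog: Program, choose: Callable[[FrozenSet[int]], FrozenSet[int]]) -> Program:
    legal = legal_rewrites(prog)
    picked = choose(frozenset(legal)) & frozenset(legal)  # the heuristic can only narrow the legal set
    return [legal[i] if i in picked else stmt for i, stmt in enumerate(prog)]


def all_subsets(xs: Iterable[int]) -> Iterable[Tuple[int, ...]]:
    items = sorted(xs)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


prog: Program = [("y", 5), ("a", "y"), ("b", 3), ("c", "y"), ("y", 7), ("d", "y")]
legal = legal_rewrites(prog)
print(legal)  # {1: ('a', 5), 3: ('c', 5), 5: ('d', 7)}

# Soundness does not depend on choose: every subset of legal rewrites preserves behavior.
every_subset_sound = all(
    run(optimize(prog, lambda _legal, keep=frozenset(sub): keep)) == run(prog)
    for sub in all_subsets(legal)
)
print(every_subset_sound)  # True

# A hypothetical heuristic that only rewrites the first legal site.
first_only = optimize(prog, lambda cands: frozenset({min(cands)}) if cands else frozenset())
```

Here `choose` is an arbitrary Python function, which mirrors the slide's point that the heuristic can be written in any language without entering the proof.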
Profitability heuristic example: PRE • PRE as code duplication followed by CSE
a := ...; b := ...;
if (...) { a := ...; x := a + b; } else { ... }
x := a + b;
• Code duplication: insert x := a + b; into the else branch, so the final x := a + b; becomes fully redundant
• CSE: rewrite the final x := a + b; to x := x;
• Self-assignment removal: delete x := x;
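The PRE sequence can be sanity-checked by executing each version. The Python sketch below hand-translates the slide's pseudocode into functions; the elided right-hand sides (`...`) are modeled as hypothetical integer inputs `a0`, `b0`, `a1`, the else branch's `...` is treated as empty, and `x` starts at 0, all by assumption. Each function returns x together with how many times a + b was evaluated, so the run shows both that the steps preserve x and that the final version computes a + b once on each path.

```python
from itertools import product
from typing import Tuple

# Each version returns (final x, number of times a + b was evaluated on that run).


def original(cond: bool, a0: int, b0: int, a1: int) -> Tuple[int, int]:
    a, b, x, evals = a0, b0, 0, 0
    if cond:
        a = a1
        x = a + b
        evals += 1
    x = a + b
    evals += 1
    return x, evals


def after_duplication(cond: bool, a0: int, b0: int, a1: int) -> Tuple[int, int]:
    a, b, x, evals = a0, b0, 0, 0
    if cond:
        a = a1
        x = a + b
        evals += 1
    else:
        x = a + b      # duplicated copy
        evals += 1
    x = a + b          # now fully redundant: x == a + b on both paths
    evals += 1
    return x, evals


def after_cse(cond: bool, a0: int, b0: int, a1: int) -> Tuple[int, int]:
    a, b, x, evals = a0, b0, 0, 0
    if cond:
        a = a1
        x = a + b
        evals += 1
    else:
        x = a + b
        evals += 1
    x = x              # CSE: reuse the value already held in x (a self-assignment)
    return x, evals


def after_self_assignment_removal(cond: bool, a0: int, b0: int, a1: int) -> Tuple[int, int]:
    a, b, x, evals = a0, b0, 0, 0
    if cond:
        a = a1
        x = a + b
        evals += 1
    else:
        x = a + b
        evals += 1
    return x, evals


inputs = list(product([False, True], range(-2, 3), range(-2, 3), range(-2, 3)))
versions = [original, after_duplication, after_cse, after_self_assignment_removal]
all_equivalent = all(len({v(*args)[0] for v in versions}) == 1 for args in inputs)
evals_per_path = {v.__name__: sorted({v(*args)[1] for args in inputs}) for v in versions}
print(all_equivalent)   # True
print(evals_per_path)   # {'original': [1, 2], 'after_duplication': [2], 'after_cse': [1], 'after_self_assignment_removal': [1]}
```

After duplication alone, the else path evaluates a + b twice, so duplication only pays off together with the CSE that follows; deciding where such a copy is worthwhile is the profitability heuristic's job.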
Profitability heuristic example: PRE [Figure: the legal placements of x := a + b in the code below, and the one profitable placement.]
a := ...; b := ...;
if (...) { a := ...; x := a + b; } else { ... }
x := a + b;
Constant prop revisited (again) • stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C
mayDef in Cobalt • stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C • Very conservative! • Can we do better? • mayPntTo is a pure analysis. • It computes dataflow info, but performs no transformations.
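A sketch of how a pure analysis can sharpen mayDef, under assumptions: the Python code below uses a toy IL with direct assignments, stores through a pointer (*P := e), and calls. The conservative version treats every store and call as a possible definition of Y; the refined version consults a mayPntTo oracle, taken here to be ¬addrNotTaken(Y) as on the mayPntTo slide. The addrNotTaken result is simply assumed to hold at the store. All names (`Stmt`, `may_def_conservative`, `may_def_refined`, `ADDR_NOT_TAKEN`, `may_pnt_to`) are hypothetical and are not Cobalt's own definitions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Stmt:
    kind: str                   # "assign" (lhs := e), "store" (*ptr := e), or "call"
    lhs: Optional[str] = None   # variable written by "assign"
    ptr: Optional[str] = None   # pointer dereferenced by "store"


def may_def_conservative(s: Stmt, y: str) -> bool:
    # Any store through a pointer, or any call, might write y.
    return (s.kind == "assign" and s.lhs == y) or s.kind in ("store", "call")


def may_def_refined(s: Stmt, y: str, may_pnt_to: Callable[[str, str], bool]) -> bool:
    if s.kind == "assign":
        return s.lhs == y
    if s.kind == "store":
        return may_pnt_to(s.ptr, y)  # *P := e can write y only if P may point to y
    return True                      # calls stay conservative in this sketch


# Assumed result of a pure analysis: variables whose address is never taken.
ADDR_NOT_TAKEN: FrozenSet[str] = frozenset({"y"})


def may_pnt_to(x: str, y: str) -> bool:
    # mayPntTo(X, Y) = not addrNotTaken(Y); X itself is not consulted.
    return y not in ADDR_NOT_TAKEN


store = Stmt("store", ptr="p")
print(may_def_conservative(store, "y"))         # True: y == 5 is lost at *p := ...
print(may_def_refined(store, "y", may_pnt_to))  # False: y == 5 survives *p := ...
```

Since mayDef only appears negated as a guard, making it return False more often lets more constants propagate, and soundness now rests on mayPntTo being a correct over-approximation.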
mayPntTo in Cobalt • stmt(decl X) followed by ¬stmt(... := &X) defines addrNotTaken(X) with witness “no location in the store points to X” • mayPntTo(X,Y) ≜ ¬addrNotTaken(Y)
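As a pure analysis, addrNotTaken can be computed with the same forward “must” dataflow as constant propagation, except that it only produces facts and rewrites nothing. The Python sketch below reads the slide as: stmt(decl X) establishes addrNotTaken(X), any statement taking &X kills it, and the fact must hold on every incoming path. It assumes a hypothetical tuple encoding of statements (("decl", X), ("addr_of", P, X) for P := &X, and ("other",)) and a predecessor-list CFG; `addr_not_taken_at` and `may_pnt_to` are illustrative names, not Cobalt's API.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: ("decl", X), ("addr_of", P, X) for P := &X, or ("other",).
Stmt = Tuple[str, ...]


def transfer(s: Stmt, facts: FrozenSet[str]) -> FrozenSet[str]:
    out = set(facts)
    if s[0] == "decl":
        out.add(s[1])        # stmt(decl X) establishes addrNotTaken(X)
    elif s[0] == "addr_of":
        out.discard(s[2])    # stmt(... := &X) kills addrNotTaken(X)
    return frozenset(out)


def addr_not_taken_at(nodes: Dict[int, Stmt], preds: Dict[int, List[int]],
                      entry: int) -> Dict[int, FrozenSet[str]]:
    """Variables X with addrNotTaken(X) on entry to each reachable node (must hold on all paths)."""
    facts_in: Dict[int, Optional[FrozenSet[str]]] = {n: None for n in nodes}
    facts_in[entry] = frozenset()
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            outs = [transfer(nodes[p], facts_in[p]) for p in preds[n] if facts_in[p] is not None]
            if outs:
                new = frozenset.intersection(*outs)
                if new != facts_in[n]:
                    facts_in[n] = new
                    changed = True
    return {n: f for n, f in facts_in.items() if f is not None}


def may_pnt_to(facts: FrozenSet[str], x: str, y: str) -> bool:
    # mayPntTo(X, Y) = not addrNotTaken(Y); X itself is not consulted.
    return y not in facts


# decl y; decl z; then either p := &z or something else; then a join.
nodes = {0: ("decl", "y"), 1: ("decl", "z"), 2: ("addr_of", "p", "z"), 3: ("other",), 4: ("other",)}
preds = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
facts = addr_not_taken_at(nodes, preds, entry=0)
print(sorted(facts[4]))                # ['y']: z's address is taken on one incoming path
print(may_pnt_to(facts[4], "p", "y"))  # False
print(may_pnt_to(facts[4], "p", "z"))  # True
```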
Future work • Improving expressiveness • interprocedural optimizations • one-to-many and many-to-many transformations • Inferring the witness • Generate a specialized compiler binary from the Cobalt sources.
Summary and Conclusion • Optimizations written in a domain-specific language can be proven correct automatically. • Our correctness checker found several subtle bugs in Cobalt optimizations. • A good step towards proving compilers correct automatically.