Certifying Compilation for Standard ML in a Type Analysis Framework

Presentation Transcript

  1. Certifying Compilation for Standard ML in a Type Analysis Framework • Leaf Petersen • Carnegie Mellon University

  2. Motivation

  3. Types • Types capture facts about programs. • Fact: This procedure expects a 32 bit integer. • Fact: This address points to executable code. • Fact: This data structure was produced here. • Programmers use types: • To keep their facts straight. • Capture and preserve invariants. • To check their facts. • Typechecker verifies truth. • Manage complexity.

  4. Types and Compilers • Compilers use types. • Predict size of data. • Eliminate unnecessary dynamic checks. • Most compilers forget types early. • Diagram: P1:T1 → P2 → .... → Pn → P.o (only the first program carries a type).

  5. Type Preserving Compilation • Transform types with program. • Optimize code based on types. • Verify that invariants still hold. • Emit types on object code. • Diagram: P1:T1 → P2:T2 → .... → Pn:Tn → P.o : To.

  6. TILT • Type preserving compiler • Standard ML. • Sparc, Alpha, (now) x86 backends • Perry Cheng, Chris Stone, Leaf Petersen, Dave Swasey, and others. • Intermediate languages are typed • Type based optimizations. • Internal correctness checks. • Generates typed x86 object code (this thesis).

  7. Why TILT? • Want to compile SML efficiently. • Separate compilation is a must. • Traditional optimizations. • Loop optimizations, CSE, constant folding, and many more. • New challenges for optimization. • Polymorphism, GC, 1st class functions, modules, etc.

  8. Example: Unknown Types • Module interfaces (and polymorphism) introduce unknown types: • Clients compiled against interface • Cannot know what t is (may be instantiated multiple times) • Cannot predict size of value (if sizes vary). • Cannot predict traceability of value (pointer or non-pointer).

  9. Old Solutions • C, C++, Java: No unknown types. • Objects: “partially known” types. • Traditional ML/Lisp compilers: Uniform data representation. • All values are same size (e.g. 32 bits). • Large values (e.g. 64 bit floats) must be boxed. • Traceability dealt with via tagging (e.g. 31 bit ints).
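The tagging scheme in the last bullet can be sketched as follows. This is a minimal illustration, not TILT's or any particular compiler's encoding: it assumes a 32-bit word in which a low bit of 1 marks an integer and a low bit of 0 marks a word-aligned pointer, which is one common convention. The function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit uniform word, as in the slide's example


def tag_int(n: int) -> int:
    """Encode a signed 31-bit integer as a 32-bit word whose low bit is 1."""
    if not -(1 << 30) <= n < (1 << 30):
        raise OverflowError("value does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def is_int(word: int) -> bool:
    """The collector can skip any word whose low bit is set."""
    return word & 1 == 1


def is_pointer(word: int) -> bool:
    """Aligned pointers have a low bit of 0 and must be traced."""
    return word & 1 == 0


def untag_int(word: int) -> int:
    """Recover the signed 31-bit integer from a tagged word."""
    if not is_int(word):
        raise ValueError("word is not a tagged integer")
    value = word >> 1          # drop the tag bit; 31 payload bits remain
    if value >= 1 << 30:       # the top payload bit is the sign bit
        value -= 1 << 31
    return value
```

The cost is visible in the range check: one bit of every integer is spent on the tag, which is why the slide speaks of 31 bit ints.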

  10. TILT Solution • Types tell size and traceability of data. • Unknown types are instantiated with known types at runtime. • Most compilers discard types before generating code. • TILT: Keep types at runtime and use them to dynamically determine layout and traceability.
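As a rough illustration of how runtime types can determine layout and traceability, the sketch below computes field offsets and a list of pointer fields for a record from runtime type representations. The `TypeRep` class, the three sample types, their sizes, and `record_layout` are all assumptions made for this example; they are not TILT's actual representation, and alignment is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRep:
    """Hypothetical runtime type representation."""
    name: str
    size_bytes: int
    traceable: bool  # True if the garbage collector must follow the value


INT = TypeRep("int", 4, False)
FLOAT = TypeRep("float", 8, False)
BOXED = TypeRep("boxed", 4, True)   # a pointer to a heap object


def record_layout(field_types):
    """Return (field offsets, offsets of pointer fields, total size)."""
    offsets, pointer_offsets, offset = [], [], 0
    for t in field_types:
        offsets.append(offset)
        if t.traceable:
            pointer_offsets.append(offset)
        offset += t.size_bytes
    return offsets, pointer_offsets, offset
```

A polymorphic function that receives `TypeRep` values for its unknown types can call `record_layout` at runtime, so no per-word tag is needed to tell the collector which fields are pointers.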

  11. Type analysis • type Optarray[t] = Typecase[t] of Boxed(Float) => Array64[Float] | _ => Array32[t] • Note: • Optarray[Int] == Array32[Int] • Optarray[a], where a is unknown, is resolved dynamically • Constructor for type Optarray? • optarray[t] : int x t -> Optarray[t]

  12. Type analysis • optarray[t](len : int,init : t) : Optarray[t] = typecase [t] of Boxed (Float) => new_array64[Float](len, unbox(init)) | _ => new_array32[t](len,init) • For statically known types, reduces at compile time • optarray[Int](10,0) = new_array32[Int](10,0) • For unknown types, reduces at runtime
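The `optarray` constructor above can be mimicked in Python as a dispatch on a runtime type argument. This is only a sketch of the idea: the classes `Int`, `Float`, `Boxed`, and `BoxedFloat` are hypothetical stand-ins for type representations and heap cells, a packed `array("d")` plays the role of `Array64[Float]`, and an ordinary list plays the role of `Array32[t]`.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    """Runtime representation of the type Int."""


@dataclass(frozen=True)
class Float:
    """Runtime representation of the type Float."""


@dataclass(frozen=True)
class Boxed:
    """Runtime representation of the type Boxed(inner)."""
    inner: object


@dataclass
class BoxedFloat:
    """A heap cell holding one 64-bit float."""
    value: float


def optarray(t, length, init):
    """typecase on t: unboxed float array for Boxed(Float), generic otherwise."""
    if t == Boxed(Float()):
        # Unbox the initial value once and store raw 64-bit floats.
        return array("d", [init.value] * length)
    # Every other type uses the uniform one-word-per-element representation.
    return [init] * length
```

When the type argument is a literal at the call site, as in `optarray(Int(), 10, 0)`, a compiler can pick the branch ahead of time; when it is a variable, the test happens at runtime, which is the distinction the last two bullets draw.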

  13. Type-passing Optimizations • Type analysis: • Enables global representation optimizations in the presence of unknown types. • TILT uses types at runtime for: • Better data-layouts. • Unboxed arrays of 64 bit floats • 32 bit ints • Optimized sum representations • Flatten aggregate arguments into registers. • Mostly tag-free garbage collection.

  14. There’s more • Types can help with generating efficient code. • But that is not the end of the story....

  15. Mobile Code • Code has become mobile. • May know very little about producer. • Examples: • Web applets. • Grid computing. • Binary installations/upgrades. • Application downloads. • High risk from malicious/wrong code.

  16. The Certification Problem • Source language safety is checkable. • Typechecker checks the programmer's facts. • Raw object code is not checkable. • Safety relies on trust in: • Safety of source language. • Correctness/identity of producer/compiler. • Integrity of the object code.

  17. Java Approach • Java bytecode • High-level language (almost Java) • Can be typechecked • Interpreted • slow, somewhat complicated • JIT compiled • somewhat faster, quite complicated • Large trusted computing base

  18. Certified Code • Typed object code • Types certify safety • Code consumer • Does no compiling • Checks that certificate applies (easy) • Small trusted computing base • Several instances exist: • TAL: Typed Assembly Language • PCC: Proof Carrying Code • Many extensions and variations

  19. Certifying Compilers • Programs in safe languages • Types provide needed annotations • Compiler can emit code with certificate of type/memory safety • Certifying compilers exist for: • Safe subsets of C (TAL & PCC) • Java (PCC) • Now for Standard ML

  20. Types in Compilation • Types can be used to generate efficient code. • Types can be used to generate certified code. • Want to combine the two paradigms.

  21. My Thesis • Certifying compilation of type analyzing code is feasible for a full modern language such as Standard ML.

  22. Two compilers • Theoretical compiler • Formal translation • Prove important properties • Guide the implementation • Real compiler • Follows the structure of the theoretical compiler • Targets a real certified code system.

  23. Theory

  24. Theoretical compiler • Three languages: • Singleton free MIL • LIL • Idealized TAL (ITAL) • Formal translations: • MIL to LIL • Closure conversion of LIL code • LIL to ITAL

  25. Languages • Singleton free MIL • Lambda calculus • Syntactic restriction to named form • Type analysis through primitives • LIL • Much more fine-grained than MIL • type and type analysis representation • closure representation • ITAL • Machine language • Idealized TAL • Simplified TAL with LX primitives for type analysis

  26. Translations • MIL to LIL • Very different type structure • Moderately different term structure • See my dissertation. • Closure conversion • Very standard • LIL to ITAL • Type structure is almost identical • Term structure is very different • Explicit control flow • Binding replaced with state modification

  27. LIL typing • $\Psi; \Delta; \Gamma \vdash e : \tau$ • $\Psi$ – LIL heap context • $\Delta$ – LIL type context • $\Gamma$ – LIL term context • $e$ – LIL expression (named form) • $\tau$ – LIL type for $e$

  28. ITAL typing • $\Psi; \Delta; M \vdash I$ ok • $\Psi$ – ITAL heap context • $\Delta$ – ITAL type context • $M$ – ITAL register file type • $I$ – ITAL instruction sequence
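To give a feel for what a judgment of this shape checks, here is a toy checker that threads a register file type through an instruction sequence. It is a sketch under strong simplifying assumptions: the heap and type contexts are omitted, the only type is the string `"int"` plus opaque others, and the three instructions (`movi`, `mov`, `add`) are a hypothetical instruction set rather than ITAL's.

```python
def check_sequence(M, instrs):
    """Return the register file type after instrs, or raise TypeError.

    M maps register names to type names. Each instruction is a triple
    (opcode, destination register, source operand).
    """
    M = dict(M)  # work on a copy so the caller's register file is untouched
    for op, rd, src in instrs:
        if op == "movi":
            # Load an integer literal: rd gets type int regardless of its old type.
            if not isinstance(src, int):
                raise TypeError("movi needs an integer literal")
            M[rd] = "int"
        elif op == "mov":
            # Register copy: rd takes the type of the source register.
            if src not in M:
                raise TypeError(f"register {src} has no type")
            M[rd] = M[src]
        elif op == "add":
            # Two-operand add (rd := rd + src): both registers must hold ints.
            if M.get(rd) != "int" or M.get(src) != "int":
                raise TypeError(f"add {rd},{src} needs int operands")
        else:
            raise TypeError(f"unknown instruction {op}")
    return M
```

For example, `check_sequence({"r1": "int", "r2": "int"}, [("add", "r1", "r2")])` succeeds, while the same sequence fails if `r2` is typed as a code pointer.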

  30. Register files • A register file type $M$ maps registers to ITAL types • e.g. $M(r) = \tau$ • Notation: $M\{r{:}\tau\}$ means $M$ with the type of $r$ set to $\tau$. • Designated stack pointer register sp • $M(sp) = \sigma$ • $\sigma$ describes the stack slots

  31. LIL to ITAL Translations • $|\Psi|$ - heap context translation • $|\Delta|$ - type context translation • $|\tau|$ - type translation • Exp $e$ maps to instruction seq $I$ • But what is the translation of a term context?

  32. Register files • LIL variables occupy ITAL registers (or stack slots) • Hence, the translation of a LIL context is an ITAL register file. • Problem: what register file? • Variables are related to registers via register allocation.

  33. Register allocation • Previous work builds register allocation into the translation. • Complex and tedious • Unclear how to incorporate real RA (e.g. graph coloring) • Consequently, toy register allocators are used in formal presentations • Better idea: translate with respect to an abstract register allocator.

  34. Allocator • Definition: An allocator $A$ is an object such that: • For every variable $x$: • $A(x) = r$ or $A(x) = sp(i)$ • $\mathrm{frmsz}(A)$ is a natural number • For every LIL typing context $\Gamma$ and stack type $\sigma$, $|\Gamma|_A = M$ for some register file type $M$
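The allocator interface in this definition can be pictured as a small data structure. The sketch below is an assumption-laden illustration: `Reg`, `StackSlot`, `Allocator`, and `translate_context` are hypothetical names, the stack type is an opaque value, and the rule that a later binding overwrites an earlier one sharing the same location is a simplification chosen for this example, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    """A machine register, e.g. Reg("r1")."""
    name: str


@dataclass(frozen=True)
class StackSlot:
    """Slot i of the current frame, written sp(i) on the slide."""
    index: int


@dataclass
class Allocator:
    assignment: dict   # variable name -> Reg or StackSlot
    frame_size: int    # frmsz(A)

    def __call__(self, x):
        """A(x): the location assigned to variable x."""
        return self.assignment[x]

    def translate_context(self, gamma, sigma):
        """Build a register file type from an ordered list of (variable, type).

        The entry for "sp" pairs the types of this frame's slots with sigma,
        the type of the stack below the frame.
        """
        regs = {}
        frame = [None] * self.frame_size
        for x, ty in gamma:
            loc = self(x)
            if isinstance(loc, Reg):
                regs[loc.name] = ty      # assumed: later bindings overwrite
            else:
                frame[loc.index] = ty
        regs["sp"] = (tuple(frame), sigma)
        return regs
```

Keeping the allocator behind this interface means the translation only ever asks where a variable lives; how that answer was computed (graph coloring, linear scan, or anything else) stays outside the formal development.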

  35. Translation judgment • $\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \rightsquigarrow I$ • $\Psi$ – LIL heap context • $\Delta$ – LIL type context • $\Gamma$ – LIL term context • $A$ – Allocator • $\sigma$ – describes stack below frame • $I$ – ITAL instruction sequence • For this talk, I’m ignoring exceptions, other stuff.

  36. Translation judgment • $\Psi; \Delta; \Gamma; A[z \to r_1, x \to r_1, y \to r_2], \sigma \vdash z = x + y : \mathrm{int} \rightsquigarrow \mathtt{add}\ r_1, r_2$
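The example on this slide can be reproduced with a small translation function that consults an allocator. This is a sketch only: the allocator is reduced to a dictionary from variables to register names, stack slots are not handled, and `add rd, rs` is assumed to be an x86-style two-operand instruction meaning rd := rd + rs. The function name `translate_add` is hypothetical.

```python
def translate_add(A, z, x, y):
    """Translate the binding z = x + y into instructions, given allocator A.

    A maps each variable to a register name. Instructions are tuples
    (opcode, destination register, source register).
    """
    rz, rx, ry = A[z], A[x], A[y]
    if rz == rx:
        # z reuses x's register, so the add can be done in place.
        return [("add", rz, ry)]
    if rz == ry:
        # z reuses y's register; addition commutes, so add x into it.
        return [("add", rz, rx)]
    # Otherwise copy x into z's register first, then add y.
    return [("mov", rz, rx), ("add", rz, ry)]
```

With the allocator from the slide, `translate_add({"z": "r1", "x": "r1", "y": "r2"}, "z", "x", "y")` yields the single instruction `add r1, r2`. A different allocator yields different code from the same source term, which is the sense in which the translation is parameterized by register allocation.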

  37. Question • $\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \rightsquigarrow I$ • Why should $I$ be well-typed? • Is the equational theory rich enough? • Easy to rely on equations that don’t hold • Want to show soundness: • Each translation maps well-typed terms to well-typed terms. • Doesn’t hold for all allocators: only the good ones.

  38. Good allocator for $\Gamma$ • Definition: Let $M = |\Gamma|_A$. We say that $A$ is a good allocator for $\Gamma$ if: • $M(sp) = \sigma_f \circ \sigma$ such that $\mathrm{frmsz}(A) = f$ • $|\epsilon|_A$ is the empty machine state. • If $\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$ then • $A$ is a good allocator for $\Gamma_1$ and $\Gamma_2$ • If $A(x) = r$ then $|\Gamma|_A = |\Gamma_1, \Gamma_2|_A\{r{:}|\tau|\}$ • If $A(x) = sp(i)$ then something similar.

  39. Good allocator for $e$ • Definition: An allocator $A$ is a good allocator for an expression $e$ if: • For all derivations of $\Psi; \Delta; \Gamma \vdash e : \tau$, $A$ is a good allocator for $\Gamma$. • $A$ is a good allocator for all sub-expressions of $e$.

  40. Soundness • Theorem: If $A$ is a good allocator for $e$ and $\Psi; \Delta; \Gamma \vdash e : \tau$ and $\sigma$ is a well-formed stack type and $\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \rightsquigarrow I$ then $|\Psi|; |\Delta|; M \vdash I$ ok where $M = |\Gamma|_A$

  41. Benefits of this approach • Theory close to implementation • Register allocation is a parameter • Separates out the mechanism • Concise specification of interface between code gen and RA • Translation isn’t bogged down with algorithmic details of RA

  42. Downside: completeness • Depends on register allocator • Full completeness doesn’t hold • Possible to show parametric completeness? • Not clear what this means • Worthwhile tradeoff • Formal presentation very close to implementation • In practice: • Soundness is hard (implementation had bugs). • Completeness is just a matter of covering all cases. • Likely that this can be solved (future work)

  43. Summary (Theory) • Formal translations: • MIL to LIL • Closure conversion of LIL code • LIL to ITAL • Proof of soundness for each • New approach to dealing with typed RA • Provides a guide for......

  44. Practice

  45. Real Compiler • Implemented a certifying back end for TILT. • Targets TAL for x86. • Type representation and analysis made explicit • Not the GC interface (yet). • Data layout issues made explicit. • Boxing/unboxing. • Closure representations. • Heap data layout.

  46. SML Source → Elaborate → HIL (Typed) → Phase split → MIL (Typed) → Optimize → MIL (Typed) → Code Gen → RTL (Untyped) • Elaborate: • Typecheck • Phase split: • Eliminate modules • Some data rep • Optimize: • Shrinking inlining • Speculative inlining • CSE/Dead code elim • Constant folding • Uncurrying • Monomorphization • Flattening • Eta reduction • Closure conversion • Hoisting • Others • Code Gen: • Code generation • Type representation • Untyped output! • Subsequent compilation is mostly standard.

  47. SML Source → Elaborate → HIL (Typed) → Phase split → MIL (Typed) → Optimize → MIL (Typed) → Code Gen → RTL (Untyped)

  48. New TILT IL • LIL: Low-level internal language • Based on LX (Crary & Weirich) • Data representation explicit • Still lambda calculus-ish • Call/return (not CPS) • All heap allocation explicit • Type analysis implemented at the term level • Neat Trick • See the dissertation

  49. Front end → MIL (Typed) → Type rep → LIL (Typed) → Optimize → LIL (Typed) → Closure Conv → LIL (Typed) → Code Gen → TAL (Typed) • Type rep: • Singleton elim • Dynamic type reps • Data rep structure • Unified allocation • Optimize: • CSE/Dead code elim • Constant folding • Eta reduction • Switch reduction • Others • Closure Conv: • Types and terms • Recursive code • Some opts • Code Gen: • Direct to TALx86 • Reg alloc/cogen • Small peephole opts

  50. Compilation • fib.sml → TILT → fib/asm.tal → TALx86 → fib/obj.o, fib/obj.to