
Every Bit Counts

Every Bit Counts: A semantic approach to the binary representation of data and programs. Andrew Kennedy & Dimitrios Vytiniotis, Microsoft Research Cambridge, {akenn,dimitris}@microsoft.com. ICFP 2010, Baltimore, MD.


Presentation Transcript


  1. Every Bit Counts: A semantic approach to the binary representation of data and programs. Andrew Kennedy & Dimitrios Vytiniotis, Microsoft Research Cambridge, {akenn,dimitris}@microsoft.com. ICFP 2010, Baltimore, MD.

  2. Before the fun starts: “is encoding and decoding relevant?”
     Sure:
     • How to design easy-to-verify, tamper-proof bytecode formats?
       • Semi-formal work for Java [Franz et al.]
     • How to incorporate semantic and statistical info for more compact encodings and compression?
       • Java bytecode and .NET CLI are quite bulky formats; there is also work on JavaScript compression schemes etc.
       • Lots of work in the XML world, oracle-based PCC checking [Necula and Rahul], term compression [J. Cheney], …
     • How to make it easy to prove the correctness of a codec?
       • Lots of work in the generic programming realm, also PADS [Fisher et al.] …
     • … and offer all that in a nice DSL?
       • Easier to use and verify than pickler combinators [Kennedy]

  3. Let’s play a “guess who” game
     I have some PL researcher in mind. Can you guess who?
     • Do they do research in functional programming? Yes
     • Do they care a lot about minimal invariance? No
     • Do they work on polytypic programming? Yes
     • Are they taller than 1.90m? Yes
     • Are they on the ICFP committee? No

  4. Guess which program
     I have some program* in mind. Can you guess which?
     Code: 0100110
     Aha! You thought of λx:Int.λy:Int.x
     * A closed program in STLC with Int base type.

  5. The idea
     Represent a codec by a strategy for playing a question-and-answer guessing game.
     • Encode: ask questions of the data and record the answers as a bitstream
     • Decode: interpret the bitstream as answers to the same Q&A strategy

  6. Example play, set-theoretically
     • Is it a function application? No.
     • Is its argument an Int? Yes.
     • Is its body a variable? No.
     [Diagram: the set of all well-typed programs is split by successive binary partitions. All well-typed programs: function application expressions vs. lambda expressions, … Lambda expressions: Int-argument lambdas vs. non-Int-argument lambdas. Int-argument lambdas: those with a variable body vs. those with a non-variable body. Legend: set of possible data values; binary partition of set; singleton set.]

  7. From sets to types
     • Possible set of data values: a type
     • Binary partition of set: type isomorphism between the type and a sum of two types
     • Singleton set: type isomorphism between the type and the unit type
     • Strategy: possibly-infinite binary decision tree whose
       • nodes contain the partition isomorphisms
       • leaves contain the singleton isomorphisms
     Or, in code: (code not preserved in this transcript)
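As a minimal sketch of how the set-to-type correspondence on this slide might be written in Haskell: the names `ISO`, `Iso`, `to`, `from`, `Game`, `Single`, `Split`, `unitGame`, `boolIso`, `boolGame` and `isLeaf` are assumptions chosen for illustration and are not taken from the slide.

```haskell
{-# LANGUAGE GADTs #-}

-- An isomorphism between t and s, given as two maps that are assumed
-- to be mutually inverse (nothing here checks that they are).
data ISO t s = Iso { to :: t -> s, from :: s -> t }

-- A game (question-and-answer strategy) for values of type t.
data Game t where
  -- Leaf: t is a singleton, witnessed by an isomorphism with unit.
  Single :: ISO t () -> Game t
  -- Node: t is partitioned into t1 and t2, each with its own sub-game.
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- The trivial game for the unit type: nothing is left to ask.
unitGame :: Game ()
unitGame = Single (Iso id id)

-- One question for Bool: "are you False?" (left) or not (right).
boolIso :: ISO Bool (Either () ())
boolIso = Iso ask bld
  where
    ask False = Left ()
    ask True  = Right ()
    bld (Left ())  = False
    bld (Right ()) = True

-- A complete game for Bool: one node with two singleton leaves.
boolGame :: Game Bool
boolGame = Split boolIso unitGame unitGame

-- Helper for inspecting a game: is the root already a singleton?
isLeaf :: Game t -> Bool
isLeaf (Single _)    = True
isLeaf (Split _ _ _) = False
```

Loaded in GHCi, `to boolIso True` evaluates the question at the root, and `from boolIso` rebuilds the value from the answer.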

  8. A silly game: unary naturals
     [Diagram: a chain of isZero nodes, each with branches labelled 0 and 1, repeating indefinitely (…)]
     Infinite tree, crucially relying on laziness (co-induction in Coq)
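The unary-naturals game can be sketched as a self-referential value, which is where laziness matters. This is an illustrative sketch only: `isZero` is the node label shown on the slide, while `unaryNatGame`, `questions`, the `Game`/`ISO` definitions and the use of `Natural` are assumptions.

```haskell
{-# LANGUAGE GADTs #-}
import Numeric.Natural (Natural)

data ISO t s = Iso { to :: t -> s, from :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

unitGame :: Game ()
unitGame = Single (Iso id id)

-- A natural is either zero (left) or the successor of a natural (right).
isZero :: ISO Natural (Either () Natural)
isZero = Iso ask bld
  where
    ask 0 = Left ()
    ask n = Right (n - 1)
    bld (Left ()) = 0
    bld (Right n) = n + 1

-- The right subtree is the game itself, so the tree is infinite.
-- Laziness means only the nodes actually visited are ever built.
unaryNatGame :: Game Natural
unaryNatGame = Split isZero unitGame unaryNatGame

-- Count the questions asked before a value is pinned down to a leaf.
questions :: Game t -> t -> Int
questions (Single _) _ = 0
questions (Split iso g1 g2) x = case to iso x of
  Left x1  -> 1 + questions g1 x1
  Right x2 -> 1 + questions g2 x2
```

The game is "silly" in the sense that a natural n costs n + 1 questions, which is why it yields a unary code.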

  9. Generic encoding and decoding
     Encode a value of the game’s type to a bitstream:
     • If the type is a singleton, there’s no information to encode!
     • Otherwise, use the forward map of the node’s isomorphism to “ask” in which partition the value lives
     • Emit a bit and continue on the left or right subtree with the deconstructed value
     Decoding may throw an error if the bitstream is too short.
     Example: (not preserved in this transcript)
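The steps above can be sketched as a pair of generic functions over any game. This is a sketch under stated assumptions, not the slide's code: the names `enc`, `dec`, `Bit`, `maybeBoolGame`, the bit convention (left answer = `I`, right answer = `O`), and the use of `Maybe` in place of a thrown error are all choices made here.

```haskell
{-# LANGUAGE GADTs #-}

data Bit = O | I deriving (Eq, Show)

data ISO t s = Iso { to :: t -> s, from :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Encoding: ask the question at each node and record the answer as a bit.
enc :: Game t -> t -> [Bit]
enc (Single _) _ = []                 -- singleton: nothing to record
enc (Split iso g1 g2) x = case to iso x of
  Left x1  -> I : enc g1 x1
  Right x2 -> O : enc g2 x2

-- Decoding: read bits as answers and rebuild the value with the inverse
-- maps. Returns the value and the unread bits, or Nothing when the bits
-- run out before a leaf is reached.
dec :: Game t -> [Bit] -> Maybe (t, [Bit])
dec (Single iso) bits = Just (from iso (), bits)
dec (Split _ _ _) [] = Nothing
dec (Split iso g1 g2) (b : bits) = case b of
  I -> do
    (x1, rest) <- dec g1 bits
    Just (from iso (Left x1), rest)
  O -> do
    (x2, rest) <- dec g2 bits
    Just (from iso (Right x2), rest)

unitGame :: Game ()
unitGame = Single (Iso id id)

boolGame :: Game Bool
boolGame = Split (Iso ask bld) unitGame unitGame
  where
    ask b = if b then Right () else Left ()
    bld = either (const False) (const True)

-- "Are you Nothing?" and, if not, play the Bool game on the payload.
maybeBoolGame :: Game (Maybe Bool)
maybeBoolGame = Split (Iso ask bld) unitGame boolGame
  where
    ask = maybe (Left ()) Right
    bld = either (const Nothing) Just
```

With these definitions, `enc maybeBoolGame (Just False)` yields `[O,I]`, and `dec maybeBoolGame [O,I,I]` yields `Just (Just False,[I])`, leaving the unread bit for whatever is decoded next.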

  10. Correctness for free*
      [Diagram relating the Set of values to Bitstrings, with example code 01001010]
      * If the ISOs are indeed isomorphisms

  11. Non-ambiguity and non-redundancy for free*
      [Diagram relating the Set of values to Bitstrings, labelled “Unambiguous codes” and “Non-redundant codes”, with example codes 01001010 and 01001110]
      * If the ISOs are indeed isomorphisms

  12. Game combinators
      • Cast a game from one type to another through an isomorphism
      • Given games for two types, construct games for the sum or product of those types
      • Dependent pairs: the type of the second component depends on the value of the first
      Combinators in action!
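The three combinators listed on this slide could be sketched roughly as follows. The names `seqI`, `castGame`, `sumGame`, `depGame` and `prodGame` are assumptions for illustration. In plain Haskell the second component's type cannot depend on the first value, so `depGame` here only lets the second *game* depend on it.

```haskell
{-# LANGUAGE GADTs #-}

data Bit = O | I deriving (Eq, Show)

data ISO t s = Iso { to :: t -> s, from :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Compose two isomorphisms.
seqI :: ISO a b -> ISO b c -> ISO a c
seqI (Iso f f') (Iso g g') = Iso (g . f) (f' . g')

-- Cast: reuse a game for t as a game for s, given an iso from s to t.
castGame :: ISO s t -> Game t -> Game s
castGame i (Single j)      = Single (i `seqI` j)
castGame i (Split j g1 g2) = Split (i `seqI` j) g1 g2

-- Sum: one question decides left or right, then play that side's game.
sumGame :: Game a -> Game b -> Game (Either a b)
sumGame = Split (Iso id id)

-- Dependent pair: play the first game, then a game chosen by its result.
depGame :: Game a -> (a -> Game b) -> Game (a, b)
depGame (Single i) f = castGame (Iso snd (\b -> (a0, b))) (f a0)
  where a0 = from i ()   -- the only inhabitant of a
depGame (Split i g1 g2) f =
  castGame (Iso ask bld)
    (sumGame (depGame g1 (f . from i . Left)) (depGame g2 (f . from i . Right)))
  where
    ask (a, b) = either (\a1 -> Left (a1, b)) (\a2 -> Right (a2, b)) (to i a)
    bld = either (\(a1, b) -> (from i (Left a1), b))
                 (\(a2, b) -> (from i (Right a2), b))

-- Product: a dependent pair whose second game ignores the first value.
prodGame :: Game a -> Game b -> Game (a, b)
prodGame ga gb = depGame ga (const gb)

-- Generic encoder, used only to exercise the combinators
-- (left answer = I, right answer = O; an arbitrary convention).
enc :: Game t -> t -> [Bit]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case to iso x of
  Left x1  -> I : enc g1 x1
  Right x2 -> O : enc g2 x2

unitGame :: Game ()
unitGame = Single (Iso id id)

boolGame :: Game Bool
boolGame = Split (Iso ask bld) unitGame unitGame
  where
    ask b = if b then Right () else Left ()
    bld = either (const False) (const True)
```

For example, `enc (prodGame boolGame boolGame) (True, False)` gives `[O,I]`: one bit per component, first component first. Because `sumGame` is just a `Split` constructor, `depGame` stays productive even on infinite games.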

  13. Combinators = co-fixpoints

  14. No silly questions please, and Every Bit Counts!
      • If possible, a strategy should not ask “silly questions” that reveal no new information, e.g.
        Are you a number smaller than 5? Yes
        Are you a number smaller than 7? Of course I am!
      • This corresponds to proper partitioning: for all isos in the game, both sets of the partition are non-empty
      Theorem: Suppose a game has proper partitioning, and there is a leaf for every element of its domain. If decoding a bitstring fails, then there is some extension of that bitstring on which decoding succeeds.
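A small experiment can make proper partitioning concrete. The sketch below assumes a hypothetical `rangeGame` for the integers `lo..hi` in which every split produces two non-empty halves, plus helpers `bitstrings` and `extendable` for checking the theorem on a finite example. All names, the `Maybe`-based decoder and the bit convention are assumptions, and the ISOs are isomorphisms only on the intended range.

```haskell
{-# LANGUAGE GADTs #-}

data Bit = O | I deriving (Eq, Show)

data ISO t s = Iso { to :: t -> s, from :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bit]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case to iso x of
  Left x1  -> I : enc g1 x1
  Right x2 -> O : enc g2 x2

dec :: Game t -> [Bit] -> Maybe (t, [Bit])
dec (Single iso) bits = Just (from iso (), bits)
dec (Split _ _ _) [] = Nothing
dec (Split iso g1 g2) (b : bits) = case b of
  I -> do
    (x1, rest) <- dec g1 bits
    Just (from iso (Left x1), rest)
  O -> do
    (x2, rest) <- dec g2 bits
    Just (from iso (Right x2), rest)

-- Game for the integers lo..hi (inclusive, assuming lo <= hi).
-- Each Split halves the range into two non-empty parts, so no question
-- is silly; a one-element range becomes a leaf and costs no bits.
rangeGame :: Int -> Int -> Game Int
rangeGame lo hi
  | lo == hi  = Single (Iso (const ()) (const lo))
  | otherwise = Split (Iso ask (either id id))
                      (rangeGame lo mid) (rangeGame (mid + 1) hi)
  where
    mid = (lo + hi) `div` 2
    ask x = if x <= mid then Left x else Right x

-- All bitstrings of exactly n bits.
bitstrings :: Int -> [[Bit]]
bitstrings 0 = [[]]
bitstrings n = [b : bs | b <- [O, I], bs <- bitstrings (n - 1)]

-- Does some extension of the bits (by at most n extra bits) decode?
extendable :: Game t -> Int -> [Bit] -> Bool
extendable g n bits =
  any (\ext -> maybe False (const True) (dec g (bits ++ ext)))
      (concatMap bitstrings [0 .. n])
```

For `rangeGame 0 4` the codes have lengths 3, 3, 2, 2, 2, and every bitstring of up to three bits either decodes or can be extended to one that does, which is the behaviour the theorem describes.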

  15. But what does that mean?
      Theorem: … blablablah …
      That feels highly compact! Can we take this domain to be the “set of well-typed programs”?
      EVERY bitstring represents a non-empty set of elements in the domain.

  16. Simple types
      Problem: Devise a game for STLC with no “silly questions”!
      • Idea 1: Parameterize the game on environment (for open terms) and type
      ☹ Not every environment/type combination is inhabited. To avoid asking “silly” questions (at game construction time, not at encoding/decoding time) we have to solve inhabitation problems.

  17. Some ingenuity required
      • Idea 2: Parameterize on environment and a pattern of the form [formula lost in extraction], where [symbol lost] is a wildcard
      All environment/pattern combinations are inhabited, so there is no need to solve hard problems at game construction time.
      A provably EVERY BIT COUNTS encoding for STLC (and the proof did not kill us)

  18. The STLC game
      • Can we play a game for variables with this pattern in this environment?
      • Are you a variable?
      • Are you an application?
      • Application game: play the game for the argument, *get* the argument, and play the game for the function using the argument’s type

  19. Pearly too!
      Haskell code for STLC, on one slide (code not preserved in this transcript).
      See the paper for details, games for several statistical compression schemes, and even more game transformations.

  20. Future directions
      • Do it for real! E.g. .NET CIL, GHC Core
      • Integrate arithmetic coding: put probabilities on the arcs of the tree
      • Develop a “methodology” for codecs for typed programming languages (=> no ingenuity required?)

  21. What’s left, after the fun?
      An elegant characterization of codecs as Q&A strategies, and a DSL to program them
      • Q&A strategies can give rise to non-redundant, compact coding schemes
      • Offer cheap verification
      • And are fun to program with
      Download and play: http://research.microsoft.com/people/dimitris
