Modern Compiler Internal Representations Silvius Rus 1/23/2002
Presentation Navigator • Introduction • Challenges • Staged compilation • Generate efficient code • Case studies • Conclusions
Traditional Compiler Organization • Pass: output type • Read code as text: ASCII characters • Lexical scanner: language words • Syntactic parser: language phrases • Translation: attribute grammar phrases • Output generated code: binary stream • Focus on pipelining due to memory window constraints
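The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny expression grammar, the `PUSH`/`ADD`/`MUL` output format, and the names `scan`, `Translator` and `compile_expr` are all hypothetical and do not come from the presentation. The point is that each pass consumes the previous pass's output one item at a time and that no explicit tree is ever built.

```python
import re
from typing import Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)


def scan(text: str) -> Iterator[Token]:
    """Lexical scanner: characters -> language words, produced lazily."""
    for word in re.findall(r"\d+|[+*()]", text):
        yield ("num" if word.isdigit() else "op", word)


class Translator:
    """Parser and translator fused into one pass (syntax-directed translation).

    Hypothetical grammar:
        expr   := term ('+' term)*
        term   := factor ('*' factor)*
        factor := number | '(' expr ')'
    """

    def __init__(self, tokens: Iterator[Token]) -> None:
        self.tokens = tokens
        self.look: Optional[Token] = next(self.tokens, None)  # one-token window
        self.code: List[str] = []

    def advance(self) -> None:
        self.look = next(self.tokens, None)

    def expr(self) -> None:
        self.term()
        while self.look == ("op", "+"):
            self.advance()
            self.term()
            self.code.append("ADD")

    def term(self) -> None:
        self.factor()
        while self.look == ("op", "*"):
            self.advance()
            self.factor()
            self.code.append("MUL")

    def factor(self) -> None:
        if self.look is None:
            raise SyntaxError("unexpected end of input")
        kind, text = self.look
        if kind == "num":
            self.code.append(f"PUSH {text}")
            self.advance()
        elif text == "(":
            self.advance()
            self.expr()
            if self.look != ("op", ")"):
                raise SyntaxError("expected ')'")
            self.advance()
        else:
            raise SyntaxError(f"unexpected {text!r}")


def compile_expr(text: str) -> List[str]:
    """Run the whole pipeline: text -> tokens -> phrases -> generated code."""
    translator = Translator(scan(text))
    translator.expr()
    if translator.look is not None:
        raise SyntaxError(f"unexpected {translator.look[1]!r}")
    return translator.code


print(compile_expr("1 + 2 * (3 + 4)"))
```

Because the scanner is a generator and the translator keeps only one token of lookahead, the whole input never has to be held in memory as tokens or as a tree, which is the kind of memory-window constraint the slide refers to.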
Traditional Compiler Internal Representation • Grammatical structure not always built explicitly • Implicit, built-in semantics • Simple data structures: • Transition tables • Token streams and stacks
Compiler Challenges • Versatile: • Understand multiple languages • Generate output for various architectures • Generate efficient code: • Fast: as fast as code written directly in the output language • Portable: runs on multiple platforms • Verifiable: runs provably within a specified class of behavior • Secure: provably respects certain security requirements • Extendable: must be extended in order to: • Incorporate new input languages and/or target systems • Take advantage of advances in run-time environments (such as ISA changes, multithreading, distributed/parallel execution) • Shared IR: L front ends + A back ends instead of L × A compilers (L + A < L × A)
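The L+A versus L*A remark can be checked with a little arithmetic. The sketch below assumes it refers to the usual argument for a shared IR: L front ends plus A back ends, rather than one dedicated compiler per (language, architecture) pair. The function name and the target names are hypothetical.

```python
from typing import Sequence, Tuple


def translators_needed(languages: Sequence[str], targets: Sequence[str]) -> Tuple[int, int]:
    """Return (direct, shared_ir) component counts.

    direct:    one dedicated compiler per (language, target) pair -> L * A
    shared_ir: one front end per language plus one back end per target -> L + A
    """
    direct = len(languages) * len(targets)
    shared_ir = len(languages) + len(targets)
    return direct, shared_ir


# Languages taken from the slides; the target names are only illustrative.
print(translators_needed(["Fortran", "C", "C++", "Java"], ["x86", "SPARC", "MIPS"]))
```

The inequality is strict only when both counts are at least 2 and not both exactly 2 (2 + 2 equals 2 * 2), so the saving grows as languages and targets are added.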
Understand Multiple Languages - Output for Multiple Targets • Abstract IR: • Same representation for Fortran, C, C++, Java, … • Possible only for conceptually similar languages • Good points: • Perform complex transformations on a single representation • Bad points: • Language semantics may either get lost or need an additional language-specific representation • Architecture-specific characteristics are more profitable to exploit than common (abstractable) ones
Staged Compilation • Stage 1: • Load source file (text) into IR1 – machine independent • Optimize IR1 • Stream IR1 to text file • Save/reload, pipe, HTTP, … text file • SUIF files, Java bytecode, .NET assembly • Stage 2: • Load text file into IR2 – machine dependent • Perform machine specific optimization on IR2 • Generate executable code or interpret IR2
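A minimal sketch of the two stages, assuming a toy stack-machine IR serialized as JSON text. The opcode names, the JSON encoding and the function names are hypothetical; real formats such as SUIF files, Java bytecode or .NET assemblies are far richer. Stage 1 performs a machine-independent optimization (constant folding) and streams the IR to text. Stage 2 reloads the text and, in this sketch, interprets it instead of generating machine code.

```python
import json
from typing import Dict, List


def fold_constants(ops: List[list]) -> List[list]:
    """Stage 1 optimization: replace PUSH a, PUSH b, ADD/MUL by a single PUSH."""
    out: List[list] = []
    for op in ops:
        both_constant = len(out) >= 2 and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"
        if op[0] in ("ADD", "MUL") and both_constant:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(["PUSH", a + b if op[0] == "ADD" else a * b])
        else:
            out.append(list(op))
    return out


def stage1(ops: List[list]) -> str:
    """Optimize IR1 and stream it to text (could be saved, piped, or sent over HTTP)."""
    return json.dumps(fold_constants(ops))


def stage2(text: str, env: Dict[str, int]) -> int:
    """Load the text form and interpret it on the consuming machine."""
    stack: List[int] = []
    for op in json.loads(text):
        if op[0] == "PUSH":
            stack.append(op[1])
        elif op[0] == "LOAD":
            stack.append(env[op[1]])
        elif op[0] in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == "ADD" else a * b)
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")
    return stack.pop()


# (2 + 3) * x, with x only known in stage 2
shipped = stage1([["PUSH", 2], ["PUSH", 3], ["ADD"], ["LOAD", "x"], ["MUL"]])
print(shipped)
print(stage2(shipped, {"x": 4}))
```

The work done in stage 1 is paid once by the producer, while stage 2 only sees the already simplified text form.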
Staged Compilation • Prepare IR1 so that stage 2 is very cheap • QuickSilver • Insert templated optimized object code in bytecode • Pack speculative optimization validation predicates in bytecode • Keep method dependence graphs explicitly in bytecode • Microsoft .NET • Explicit type/class information in IL • Preformatted, quickly accessible metadata • Strings, tables, heaps • Custom data • Allow embedding of native code
Generate Fast And Portable Code • Fast code • IR close to machine structure • Mapping data to registers • Mapping operations to opcodes • Scheduling instructions for superscalar/VLIW processors • Portable code • Machine description must be totally abstracted • QuickSilver: templated optimized code
Generate Verifiable Code • Microsoft .NET IL • Static and dynamic type safety - reflection • Managed code • Carries a minimum of information about itself • Usually signed by compiler in Stage 1 • Managed data • Only accessible from managed code • Garbage collected • Managed pointers
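To make "verifiable" concrete, here is a small sketch of one kind of static check an intermediate language makes possible: proving, before execution, that a stack program never pops from an empty stack or exceeds a depth limit. It is not the .NET verifier; the opcode table, the `max_depth` parameter and the function name are assumptions chosen for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical opcode table: name -> (values popped, values pushed)
STACK_EFFECT: Dict[str, Tuple[int, int]] = {
    "PUSH": (0, 1),
    "LOAD": (0, 1),
    "ADD": (2, 1),
    "MUL": (2, 1),
    "POP": (1, 0),
}


def verify_stack(ops: Sequence[tuple], max_depth: int = 8) -> int:
    """Check straight-line stack code without running it.

    Returns the final stack depth, or raises ValueError on an unknown
    opcode, a stack underflow, or a depth above max_depth.
    """
    depth = 0
    for index, op in enumerate(ops):
        name = op[0]
        if name not in STACK_EFFECT:
            raise ValueError(f"op {index}: unknown opcode {name!r}")
        pops, pushes = STACK_EFFECT[name]
        if depth < pops:
            raise ValueError(f"op {index}: stack underflow in {name}")
        depth = depth - pops + pushes
        if depth > max_depth:
            raise ValueError(f"op {index}: stack deeper than {max_depth}")
    return depth


print(verify_stack([("PUSH", 1), ("PUSH", 2), ("ADD",)]))
```

A real verifier would also track the type of every stack slot and handle branches, but the principle is the same: the check runs over the representation, not over an execution.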
Generate Secure Code • Hard to define limits • Make sure you run what you mean to • Limit rights • Per user • Per software component • QuickSilver: digests • .NET IL: • Code is signed by encrypting a hash of the original • Permissions are set per module
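"Make sure you run what you mean to" can be sketched with a digest check. The example assumes SHA-256 from Python's standard `hashlib`; the presentation does not say which hash QuickSilver or .NET use, and the function names are hypothetical. Full signing would additionally encrypt the digest with the producer's private key, which is left out here.

```python
import hashlib
import hmac


def digest(module_bytes: bytes) -> str:
    """Stage 1: compute a digest of the shipped module (SHA-256 is an assumption)."""
    return hashlib.sha256(module_bytes).hexdigest()


def verify(module_bytes: bytes, expected_digest: str) -> bool:
    """Stage 2: refuse to run a module whose digest does not match.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(digest(module_bytes), expected_digest)


module = b"PUSH 2; PUSH 3; ADD"
recorded = digest(module)
print(verify(module, recorded))                  # unmodified module
print(verify(module + b"; CALL evil", recorded)) # tampered module
```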
Generate Efficient Code • IR may also provide support for: • Versioning (QuickSilver, .NET) • Culture (.NET)
Compiler Internal Representation - General Organization • High-level - completely machine independent • Abstract Syntax Tree • Control Flow Graph • Control Dependence Graph • Data Dependence Graph • Static Single Assignment • Medium-level - dependent on classes of machines • Virtual machine code, such as stack machine • Low-level - dependent on particular ISA • Assembly, machine instruction graphs
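As an illustration of one of the high-level forms listed above, the following sketch renames straight-line code into Static Single Assignment form, where every variable is assigned exactly once. It deliberately ignores control flow, so no phi functions are needed; the statement encoding `(target, operator, operands)` and the `name_N` versioning scheme are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, List[str]]  # (target, operator, operands)


def to_ssa(statements: List[Statement]) -> List[Statement]:
    """Rename straight-line code so that each variable is assigned once.

    Operands that were never assigned (program inputs) keep their names.
    """
    version: Dict[str, int] = {}
    renamed: List[Statement] = []
    for target, operator, operands in statements:
        # Uses refer to the most recent version of each variable.
        new_operands = [f"{v}_{version[v]}" if v in version else v for v in operands]
        # Each definition creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", operator, new_operands))
    return renamed


program = [
    ("x", "+", ["a", "b"]),  # x = a + b
    ("x", "*", ["x", "c"]),  # x = x * c
    ("y", "+", ["x", "x"]),  # y = x + x
]
for target, operator, operands in to_ssa(program):
    print(target, "=", f" {operator} ".join(operands))
```

With unique names, the data dependence between a definition and its uses can be read directly from the text, which is one reason SSA simplifies many optimizations.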
Case Study: Polaris • High-level representation • Abstract Syntax Tree • Control Flow Graph • Control Dependence Graph • Data Dependence Graph • Gated Static Single Assignment • Some generality • Backends for various parallel execution systems
Case Study: SUIF2 • Multi-level representation • CFG, CDG, … • Quads • Machsuif • Custom annotations • Multiple frontends: Fortran, C, Java • Multiple backends: SUIF VM, C, assembly • Decoupled passes communicate only via SUIF • Extendable: OSUIF
Case Study: Promis • Switch to Promis organization presentation • Switch to Promis IR presentation
Case Study: KCC • Kuck and Associates (KAI) C++ compiler: • C++ dedicated internal representation • Advanced C++ specific optimization • Proprietary C++ specific object format • Interprocedural optimization with modular compilation • C++ specific debug information – usable with KDB • Outputs C with calls to proprietary run-time library • Uses GNU gcc to generate machine code
Case Study: Jalapeno QuickSilver • Quasi-static images • Java bytecode + proprietary format • Representation allows for optimizations • Explicit method dependence graph • Templated optimized object code • Speculative optimization validation predicates
Case Study: .NET • Advertised as a nine-figure (USD) project • CLI (ECMA standard) • Common type system • Type info in intermediate code • Common exception system • Throw in Visual Basic, catch in C++ • Support for security, culture, versioning • Support for per-use charging • Custom info can be passed to describe features specific to the original language • Pipeline: 30+ languages → MSIL → native code
Other Compilers – Open Source • GNU compiler: • C, Fortran, Java, C++ front-ends • Generates code for all major architectures • Low level internal representation • New version (3.x) has SSA • SGI open source project: discontinued
Other Compilers – Commercial • Fortran, C, C++, Java compilers from OS and/or hardware vendors • HP, SGI, Intel, Microsoft, Sun • Other commercial compiler producers: • Borland, Watcom, etc. • Internal representation – company secret
Conclusions • Internal representation evolved • Programming paradigms • Changes in hardware • Changes in compiler/run-time system technology • New issues: security, verifiability, culture, versioning • Tendency: E Pluribus Unum