Week 1 – Lecture 1



  1. Week 1 – Lecture 1 Compiler Construction • Introduction • The Textbook • Assessment • Overview

  2. The Big Picture • In this course we will be constructing a compiler! • Moving from a High Level Language to a Low Level Language • Compilers are complex programs • > 10,000 lines of code • Integrate aspects from many different areas of CS • Formal language theory, algorithms, data structures, HLL & LLL (obviously), user interaction (error reporting)

  3. What is a compiler? • A specialization of a language translator, taking a program in a Source language (L1) to an equivalent program in a Target language (L2) • Usually in CS: • the Source is a high-level programming language • the Target is machine code for a micro-processor • e.g. C → x86 processor

  4. Applications of Compiler Techniques • Potential Source languages include: • Natural languages (English, French,….) • Circuit layout languages • Mark-up languages (HTML, XML, …) • Command line languages (SQL interface) • Potential Target languages include: • Natural languages • Printer drivers • Markup languages • e.g. HTML to RTF converter • Could involve many of the aspects we will cover in compiler construction

  5. Compilers for Programming Languages • If we had 1 compiler for each {Source, Target} pair then we would have a lot of compilers! • Source Languages: Pascal, Sather, C++, C, C#, Java, Prolog, Fortran, Haskell, Lisp • Target Languages: x86 (MMX), ARM, AMD K6, SPARC, PowerPC 750 (G3), JVM
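
(With the ten source languages and six target machines listed here, one compiler per pair would mean 10 × 6 = 60 separate compilers; the shared intermediate representation introduced on the next slides reduces this to 10 front-ends plus 6 back-ends.)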

  6. Modularity for Code Generation • One Source language is compiled to an Intermediate Representation, from which separate back-ends generate code for each target machine (x86, ARM, G4) • → Compiler portability (man gcc lists the different target machines)

  7. Modularity for Source Languages? • In principle, several Sources (C, Java, Prolog) could be compiled through the same Intermediate Representation to the Targets • Typically compilers only compile one source language – but the techniques used are very similar and are shared across different compilers

  8. Typical Compiler • Source → Front-end (Analysis; the course now) → Intermediate Representation (independent of Source and Target languages) → Back-end (Synthesis; from around week 6) → Target • Ideally: • For a new Source language – we can add a new front-end to an existing back-end • For a new Target language – we can add a new back-end to an existing front-end

  9. Front End • Knowledge about the source language • Lexical structure (tokens) • Syntax • Programming constructs • Conditionals, iteration etc • Semantics • Type checking • Error-reporting • UI component • Often basic (and unhelpful!) • May vary if part of an IDE or standalone • (Diagram: Source program → Lexical analyser → Syntax analyser → Semantic analyser, with the Symbol table and Error Handler shared by all three phases)

  10. Lexical Analysis • Example source: int max = 20, x; read(x); if ( x <= max ) print(‘ok’); else print(‘too big’); • Lexical tasks the compiler has to perform: • group together the 3 characters ‘max’ to form the single variable identifier max • group together the 2 characters ‘<=’ to form the single relational operator <= (less than or equal to)
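
A minimal hand-written sketch of this grouping in C (the token names, scanner interface and embedded input string are illustrative assumptions, not the course's lexer, which will be generated with flex):

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical token kinds for a fragment such as "if ( x <= max )" */
    enum token { T_IDENT, T_NUMBER, T_LE, T_OTHER, T_EOF };

    static const char *src = "if ( x <= max )";   /* input characters */
    static int pos = 0;

    /* Return the next token, copying its characters into lexeme.
       Groups 'm','a','x' into one T_IDENT and '<','=' into one T_LE. */
    static enum token next_token(char *lexeme)
    {
        int n = 0;
        while (isspace((unsigned char)src[pos])) pos++;
        if (src[pos] == '\0') return T_EOF;

        if (isalpha((unsigned char)src[pos])) {          /* identifier */
            while (isalnum((unsigned char)src[pos])) lexeme[n++] = src[pos++];
            lexeme[n] = '\0';
            return T_IDENT;
        }
        if (isdigit((unsigned char)src[pos])) {          /* integer literal */
            while (isdigit((unsigned char)src[pos])) lexeme[n++] = src[pos++];
            lexeme[n] = '\0';
            return T_NUMBER;
        }
        if (src[pos] == '<' && src[pos + 1] == '=') {    /* two chars, one token */
            strcpy(lexeme, "<=");
            pos += 2;
            return T_LE;
        }
        lexeme[0] = src[pos++];                          /* single-character token */
        lexeme[1] = '\0';
        return T_OTHER;
    }

    int main(void)
    {
        char lexeme[64];
        enum token t;
        while ((t = next_token(lexeme)) != T_EOF)
            printf("token %d: %s\n", (int)t, lexeme);
        return 0;
    }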

  11. Syntactic Analysis • Recognise the if .. then … else structure • Group the x <= max into a single expression with a relational operator • Recognise the format of the variable declaration list • Such that x is correctly declared to be an int • Loops, program blocks (begin…end) • Arithmetic expressions, etc
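
A sketch of the kind of grammar rules this phase works from, using the production notation introduced on slide 17 (the exact rules of the course language are an assumption here):

    statement           → if_statement | declaration | ...
    if_statement        → if ( boolean_expression ) statement else statement
    boolean_expression  → expression relational_operator expression
    relational_operator → <= | < | == | ...
    declaration         → int declarator_list ;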

  12. Semantic analysis • Check that x<=max is a sensible thing to do • If x was a boolean and max a string then we would have a type error • Check that the ‘20’ is in fact an integer and so can be assigned to an int • And also (can be split over several phases) • Keep a note of all the variables used so we make sure they all refer to the same value (in memory)
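
A sketch of that check in C (the type tags, node layout and error reporting are illustrative assumptions, not the course's semantic analyser):

    #include <stdio.h>

    enum type { TY_INT, TY_BOOL, TY_STRING, TY_ERROR };   /* hypothetical type tags */

    struct expr { enum type type; };    /* each analysed expression carries a type */

    /* x <= max is only sensible if both operands are integers (toy rule) */
    static enum type check_relational(const struct expr *lhs, const struct expr *rhs)
    {
        if (lhs->type == TY_INT && rhs->type == TY_INT)
            return TY_BOOL;             /* a comparison yields a boolean */
        fprintf(stderr, "type error: operands of <= must be int\n");
        return TY_ERROR;
    }

    int main(void)
    {
        struct expr x = { TY_INT }, max = { TY_INT };
        return check_relational(&x, &max) == TY_BOOL ? 0 : 1;
    }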

  13. Data Structures • Stream of text as the source file • Group together text into larger units from a limited set • Nearly all programming constructs can be represented as tree structures • (Diagram: tree for an if statement – an if node with a Boolean expression child and a statement child for each of the then and else branches; the Boolean expression node expands to two expressions joined by a relational operator)

  14. Data Structures • Lexical Analyzer → Stream of tokens (enumerated type) • e.g. NUMBER OPERATOR NUMBER • Syntax Analyzer / Parser → Tree of program structure • e.g. a program node with assignment, if_statement, while_loop and output_statement children
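
A sketch of how these two data structures might be declared in C (names and fields are illustrative, not the course's actual definitions):

    /* Output of the Lexical Analyzer: a stream of values of an enumerated type */
    enum token_kind { NUMBER, OPERATOR, IDENTIFIER, KEYWORD_IF, KEYWORD_ELSE };

    struct token {
        enum token_kind kind;
        const char     *lexeme;       /* e.g. "20", "<=", "max" */
    };

    /* Output of the Syntax Analyzer / Parser: a tree of program structure */
    enum node_kind { PROGRAM, ASSIGNMENT, IF_STATEMENT, WHILE_LOOP, OUTPUT_STATEMENT };

    struct ast_node {
        enum node_kind    kind;
        struct ast_node **children;   /* e.g. condition, then-branch, else-branch */
        int               n_children;
    };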

  15. Back-end • Knowledge about target processor / virtual machine • Instruction set • ‘costs’ of different: • op-codes • instructions • Registers • Memory • (Diagram: Semantic analyser → Intermediate code generator → Code optimiser → Code generator, with the Symbol table manager and Error handler shared by all phases)

  16. Putting it together • The compiler: Source program → Lexical analyser → Syntax analyser → Semantic analyser → Intermediate code generator → Code optimiser → Code generator → Target assembly program (the Symbol table and Error Handler are used by every phase) • A language-processing system: Skeletal source program → preprocessor → Source program → compiler → Target assembly program → assembler → Relocatable machine code → Loader / link-editor → Absolute machine code

  17. Grammars • We define/describe HL languages with grammars • A Grammar consists of: • T, set of Terminals • N, set of Non-terminals • N ∩ T = ∅ • P, set of Productions • α → β • where α and β are sequences of members of T ∪ N • S, special member of N, the Start symbol • G = {T, N, P, S}
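
For example, a tiny grammar with T = {a, b}, N = {S}, Start symbol S and the Productions below generates the strings ab, aabb, aaabbb, … (i.e. a^n b^n for n ≥ 1):

    S → a S b
    S → a b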

  18. Chomsky’s Grammar Hierarchy • Type 0 – Unrestricted Grammar • Type 1 – Context-Sensitive Grammar • Type 2 – Context-Free Grammar • Type 3 – Regular Grammar

  19. Grammars • Type 0 (unrestricted) • α → β • α and β are unrestricted sequences, α is not null • languages formed from Type 0 grammars can be recognised by non-deterministic Turing machines • Type 1 (context sensitive) • α A β → α B β • A becomes B in the context of α … β • Complex for computer analysis

  20. Grammars • Type 2 (context free) • A → γ • A is a Non-terminal • γ is a member of (T ∪ N)* (can be empty) • Equivalent to a push-down automaton • Type 3 (regular) • A → wB, A → w (right linear) • w is a string of Terminals • A and B are Non-Terminals • Finite state automata
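
As a concrete Type 3 sketch, identifiers made of letters and digits and starting with a letter can be generated right-linearly (letter and digit stand for the obvious sets of terminals):

    Ident → letter Rest | letter
    Rest  → letter Rest | digit Rest | letter | digit

By contrast, the a^n b^n grammar given after slide 17 is context free but not regular, which is why nested, bracket-matching constructs (begin … end blocks, balanced parentheses) belong to the Syntax Analysis phase rather than the lexer.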

  21. In a compiler • Use the minimum complexity grammars that let us successfully cope with HL programming languages (and process them efficiently) • Regular grammars (= regular expressions) in the Lexical Analysis phase • ‘recognise the words’ • Context-free grammars in the Syntax Analysis phase • ‘recognise the phrases’ • → define our HLL as a grammar based on the output of the Lexical Analysis • Deal with context sensitivity in the Semantic Analysis phase

  22. Overall Front-End View • Source program (Text file) → Lexical Analyser (Flex, Regular grammar) → tokens → Syntax Analyser (Bison, Context-free grammar) → Tree structure → Semantic Analyser → Type-safe Tree structure → Intermediate Representation (Tree / Linearized tree) → Back-end

  23. The Textbook • Compilers: Principles, Techniques & Tools • Aho, Sethi & Ullman • Addison-Wesley • {‘The Dragon Book’}

  24. Assessment • Building a compiler for a new language • Front-end • Lexical analysis • Parsing • Back-end • Generating assembler code • Some formal and some practical • The formal work is mostly at the front-end

  25. Programming & Tools • Lexical analysis generator – lex / flex • Parser generator – yacc / bison • C / C++ • To implement the remainder of the compiler • Unix environment • makefiles will be useful for coordinating lex and yacc

  26. Instant Compilation • Consider the program:

    main() { int a = 3; a = a + 1; }

  Given a reasonably sensible assembly language a hand-compilation might be:

    LDA #3
    STA 1
    LDA 1
    ADD a, #1
    STA 1

  27. & an Instant Compiler could look like …

    switch ( source_code_construct ) {
      case INT_DEC:
        print( "LDA #", INT.value );
        print( "STA 1" );
        break;
      case INT_ADD:
        print( "LDA 1" );
        print( "ADD a,#", ADD.value );
        print( "STA 1" );
        break;
    } /* end switch */

  28. The Problems … • Not efficient (LDA #4; STA 1 would suffice) • Only works for 1 variable • Only works at one location in memory • (usually we let the assembler deal with symbolic addresses) • Only has 2 programming constructs! • Not even slightly portable: • 1 instruction set & 1 source language

  29. More problems… • No error reporting • type checking? • Assumes: • Program is correct • Recognition of programming language constructs • int a = 3 → INT_DEC • Access to values • INT.value, ADD.value • 1:1 relationship between integers and memory locations

  30. Solutions • We can view compilers as a solution to all of these problems • E.g. • Only compile correct programs to object code • Recognise all constructs in the language • Improve the efficiency of code • Execution speed • Memory usage • Meaningful error messages to the user • Cope with different target architectures

  31. Why are compilers called compilers? • In early compilers one of the main tasks was connecting the object program to • standard library functions, I/O devices • collecting information from different sources (e.g. libraries) • OS and processor dependent • This is now performed by ‘linkers’ • Compile – ‘construct by collecting from different sources’
