1 / 35

Compilers:

Compilers:. Overview. The Course Project. Course has hidden agenda: programming design and experience. Substantial project in parts. Provides example of how complicated problem might be approached. Validation (testing) is part of the project.

tynice
Télécharger la présentation

Compilers:

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compilers: Overview

  2. The Course Project • Course has hidden agenda: programming design and experience. • Substantial project in parts. • Provides example of how complicated problem might be approached. • Validation (testing) is part of the project. • Similar to software engineering objectives in your sophomore group project.

  3. Generations of programming languages • 1GL: 0-1 bit machine languages • 2GL: assembly languages • 3GL: Fortran, Cobol, Lisp, C, C++, Java, imperative languages • 4GL: SQL, NOMAD (like SQL) • 5GL: Prolog, OPS5 (production system), declarative languages • . • . • . • nGL: intelligent interpreters that can translate human language into computer programs

  4. Two phases: Compile then run Many compilers can be installed on the same computer. Programs of different programming languages can be executed on the same machine. C compiler, Intel Pentium PC C compiler, Sun Spark workstation C++ compiler, AMD machine Etc.

  5. One shot: Translate and run • Statement by statement interpretation

  6. Translate then run • The Java approach • Prepare the file foo.java using an editor • Invoke the translator: javac foo.java • This creates foo.class • Run java: java foo javac bytecode java

  7. Language translators • Translator • General term for translating from one language (code) to another language (code) • Compilers • Compile the entire source code (program) • Do extensive preprocessing • Usually produce binary machine codes • Byte-compilers • Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) independent of the actual physical machine. • Generate intermediate language code • Interpreters • Interpreters run programs as is with little or no preprocessing • Matlab • Generate intermediate language code which is then compiled into machine code and executed, all in one shot.

  8. Structure of a compiler • Lexical Analysis • Parsing • Semantic Analysis • Optimization • Code Generation The first 3, at least, can be understood by analogy to how humans comprehend English.

  9. Lexical Analysis • First step: recognize letters and words. • Words are smallest unit above letters This is a sentence. • Note the • Capital “T” (start of sentence symbol) • Blank “ “ (word separator) • Period “.” (end of sentence symbol) • Programming languages are typically more cryptic than English: *p->f+=-.12345e-5

  10. More Lexical Analysis • Lexical analyzer divides program text into “words” or “tokens” if x == y then z = 1; else z = 2; • Tokens: if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;

  11. Parsing • Once words are understood, the next step is to understand sentence structure • Parsing = Diagramming sentences • The diagram is a tree.

  12. This line is a longer sentence article noun verb article adjective noun subject object sentence Diagramming a Sentence

  13. x == y z 1 z 2 relation assign assign predicate then-stmt else-stmt if-then-else Parsing Programs • Parsing program expressions is the same • Consider: If x == y then z = 1; else z = 2; • Diagrammed:

  14. Semantic Analysis • Once sentence structure is understood, we can try to understand “meaning” • But meaning is too hard for compilers • Compilers perform limited analysis to catch inconsistencies • for (int i = 0; i < 10; i += 1) A[i]=i;for (int i = 0; i < 10; i += 1) A[i]=2*i; • Some do more analysis to improve the performance of the program

  15. Semantic Analysis in English • Example: Jack said Jerry left his assignment at home. What does “his” refer to? Jack or Jerry? • Even worse: Jack said Jack left his assignment at home. How many Jacks are there? Which one left the assignment?

  16. Programming languages define strict rules to avoid such ambiguities This C++ code prints “4”; the inner definition is used { int Jack = 3; { int Jack = 4; cout << Jack; } } Semantic Analysis in Programming

  17. Optimization • No strong counterpart in English, but akin to editing • Automatically modify programs so that they • Run faster • Use less memory • In general, conserve some resource • Not an emphasis in the project.

  18. Optimization Example (in C) for (int i = 0; i < N; i += 1) { for (int j = 0; j < N; j += 1) { A[i][j] += B[i]*C[j] } } for (i = 0; i < N; i += 1) { double tmp1 = B[i]; double* tmp2 = &A[i]; for (int j = 0; j < N; j +=1) tmp2[j] += tmp1*C[j]; } Outer products between vectors B and C.

  19. Optimization is tricky for (i = 0; i < N; i += 1) A[i] *= D/A[0]; is NOT tmp1 = D/A[0]; for (i = 0; i < N; i += 1) A[i] *= tmp1; A[0] <- A[0] * D/A[0] A[1] <- A[1] * D/A[0] A[2] <- A[2] * D/A[0] . . . A[N] <- A[N] * D/A[0] What is the right way to optimize this?

  20. Optimization Example X = Y * 0 is the same as X = 0 NO! Valid for integers, but not for floating point numbers

  21. Examples of common optimizations in PLs • Dead code elimination • for (int i = 0; i < 10; i += 1) A[i]=i;for (int i = 0; i < 10; i += 1) A[i]=2*i; • Evaluating repeated expressions only once • for (int j = 0; j < N; j +=1) C[j] = A[j]+B[k]; • Replace expressions by simpler equivalent expressions • 1+2+3+…+N • (1+N)*N/2 • Move constant expressions out of loops • for (int j = 0; j < N; j +=1) C[j] = A[j]+3*3

  22. Optimization example • Evaluate expressions at compile time • Java constants • The static modifier causes the variable to be available without loading an instance of the class where it is defined. • The final modifier causes the variable to be unchangeable. • public static final int MAX_UNITS = 25;

  23. Optimization Example class Demo{ public :inline void cube(int n) {   cout<<n*n*n; }  };void main(){. .   cube(10);. .  cube (6);} • Inline procedures • C++ supports inline functions, not Java • Each function call cube(n) will be replaced by the code of cube function itself by the compiler. • No parameter passing to subroutine • No subroutine address call In C++, to print a value to the screen, write the word cout, followed by the insertion operator (<<). Here it is simply the output of the inline function.

  24. Intermediate Languages • Many compilers perform translations between successive intermediate forms • JavaScript • C • 3-address code • assembly • Intel machine code • All but first and last are intermediate languages internal to the compiler • Typically there is at least one intermediate language involved. • IL’s generally ordered in descending level of abstraction • Highest is source • Lowest is assembly

  25. Intermediate Languages (Cont.) • IL’s are useful because lower levels expose features hidden by higher levels • array index addressing • registers • memory layout • etc. • But lower levels obscure high-level meaning • Reverse engineering • Try reading the assembly code to figure out what the program is doing.

  26. Code Generation • Having understood the program, translate it into something useful (executable) • Classical compilers produce machine code or assembly. Target: bare hardware. • Java implementations, many “scripting languages” produce virtual machine code (“bytecode”). Target: interpreter or JIT (Just In Time) compiler.

  27. Machine Code Generator Should... • Translate all the instructions in the intermediate representation to assembly language • Allocate space for the variables, arrays etc. • Adhere to calling conventions • Create the necessary symbolic information for variables and identifiers

  28. Machines understand... LOCATION DATA 0046 8B45FC 0049 4863F0 004c 8B45FC 004f 4863D0 0052 8B45FC 0055 4898 0057 8B048500 000000 005e 8B149500 000000 0065 01C2 0067 8B45FC 006a 4898 006c 89D7 006e 033C8500 000000 0075 8B45FC 0078 4863C8 007b 8B45F8 007e 4898 0080 8B148500 000000

  29. Machines understand... LOCATION DATA ASSEMBLY INSTRUCTION 0046 8B45FC movl -4(%rbp), %eax 0049 4863F0 movslq %eax,%rsi 004c 8B45FC movl -4(%rbp), %eax 004f 4863D0 movslq %eax,%rdx 0052 8B45FC movl -4(%rbp), %eax 0055 4898 cltq 0057 8B048500 movl B(,%rax,4), %eax 000000 005e 8B149500 movl A(,%rdx,4), %edx 000000 0065 01C2 addl %eax, %edx 0067 8B45FC movl -4(%rbp), %eax 006a 4898 cltq 006c 89D7 movl %edx, %edi 006e 033C8500 addl C(,%rax,4), %edi 000000 0075 8B45FC movl -4(%rbp), %eax 0078 4863C8 movslq %eax,%rcx 007b 8B45F8 movl -8(%rbp), %eax 007e 4898 cltq 0080 8B148500 movl B(,%rax,4), %edx

  30. Memory Registers ALU Control Overview of a modern processor • ALU • Control flow • Memory • Registers

  31. Arithmetic and Logic Unit • Performs most of the data operations • Has the form: • OP <oprnd1>, <oprnd2> • <oprnd2> = <oprnd1> OP <oprnd2> • OP <oprnd1> • Operands are: • Immediate Value $25 • Register %rax • Memory 4(%rbp) • Operations are: • Arithmetic operations (add, sub, imul) • Logical operations (and, nor) • Unitary operations (inc, dec) Memory Registers ALU Control

  32. Program (character stream) Token Stream Parse Tree Intermediate Representation Binary executable Assembly code Lexical Analyzer (Scanner) Syntax Analyzer (Parser) Intermediate Code Generator High-level IR Low-level IR Code Generator Assembler & linker Processor

  33. Compilers Today • The overall structure of almost every compiler adheres to our outline • The proportions have changed since FORTRAN • Early: lexing, parsing most complex, expensive • Today: optimization dominates all other phases, lexing and parsing are cheap

  34. Trends in Compilation • Optimization is especially important for • scientific programs • advanced processors (Digital Signal Processors, advanced speculative architectures) • Small devices where speed = longer battery life • Parallelism and parallel architectures (esp., multicore) • Dynamic features: dynamic types, dynamic loading. • Compilers • More needed and more complex • Driven by increasing gap between • new languages • new architectures • Lots of research

  35. Why study languages and compilers ? • Application of a wide range of theoretical techniques • Data Structures • Theory of Computation • Algorithms • Computer Architecture • Good SW engineering experience • Learn to build a large and reliable system • Improve understanding of program behavior • Increase ability to learn new languages • Better understanding of programming languages

More Related