Recap

Recap Roman Manevich Mooly Sagiv

Outline • Subjects Studied • Questions & Answers

Lexical Analysis (Scanning) • input • program text (file) • output • sequence of tokens • Read input file • Identify language keywords and standard identifiers • Handle include files and macros • Count line numbers • Remove whitespaces • Report illegal symbols • [Produce symbol table]

Summary • For most programming languages lexical analyzers can be easily constructed automatically • Exceptions: • Fortran • PL/1 • Lex/Flex/Jlex are useful beyond compilers

Syntax Analysis (Parsing) • input • Sequence of tokens • output • Abstract Syntax Tree • Report syntax errors • unbalanced parenthesizes • [Create “symbol-table” ] • [Create pretty-printed version of the program] • In some cases the tree need not be generated (one-pass compilers)

Pushdown Automaton input u t w $ V control parser-table $ stack

Pushdown automata Deterministic Report an error as soon as the input is not a prefix of a valid program Not usable for all context free grammars context free grammar parser tokens Efficient Parsers cup “Ambiguity errors” parse tree

Top-Down (Predictive Parsing) LL Construct parse tree in a top-down matter Find the leftmost derivation For every non-terminal and token predict the next production Preorder tree traversal Bottom-Up LR Construct parse tree in a bottom-up manner Find the rightmost derivation in a reverse order For every potential right hand side and token decide when a production is found Postorder tree traversal Kinds of Parsers

2 3 4 5 Top-Down Parsing 1 input t1 t2

3 2 1 Bottom-Up Parsing input t1 t2 t4 t5 t6 t7 t8

Example Grammar for Predictive LL Top-Down Parsing expression  digit | ‘(‘ expression operator expression ‘)’ operator  ‘+’ | ‘*’ digit  ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

static int Parse_Expression(Expression **expr_p){ Expression *expr = *expr_p = new_expression() ; /* try to parse a digit */ if (Token.class == DIGIT) { expr->type=‘D’; expr->value=Token.repr –’0’; get_next_token(); return 1; } /* try parse parenthesized expression */ if (Token.class == ‘(‘) { expr->type=‘P’; get_next_token(); if (!Parse_Expression(&expr->left)) Error(“missing expression”); if (!Parse_Operator(&expr->oper)) Error(“missing operator”); if (Token.class != ‘)’) Error(“missing )”); get_next_token(); return 1; } return 0; }

Parsing Expressions • Try every alternative production • For P  A1 A2 … An | B1 B2 … Bm • If A1 succeeds • Call A2 • If A2 succeeds • Call A3 • If A2 fails report an error • Otherwise try B1 • Recursive descent parsing • Can be applied for certain grammars • Generalization: LL1 parsing

int P(...) { /* try parse the alternative P  A1 A2 ... An */ if (A1(...)) { if (!A2()) Error(“Missing A2”); if (!A3()) Error(“Missing A3”); .. if (!An()) Error(Missing An”); return 1; } /* try parse the alternative P  B1 B2 ... Bm */ if (B1(...)) { if (!B2()) Error(“Missing B2”); if (!B3()) Error(“Missing B3”); .. if (!Bm()) Error(Missing Bm”); return 1; } return 0;

Predictive Parser for Arithmetic Expressions • Grammar • C-code? • E  E + T • E  T • T  T * F • T  F • 5 F  id • 6 F  (E)

Input A context free grammar A stream of tokens Output A syntax tree or error Method Construct parse tree in a bottom-up manner Find the rightmost derivation in (reversed order) For every potential right hand side and token decide when a production is found Report an error as soon as the input is not a prefix of valid program Bottom-Up Syntax Analysis

Constructing LR(0) parsing table • Add a production S’  S$ • Construct a finite automaton accepting “valid stack symbols” • States are set of items A • The states of the automaton becomes the states of parsing-table • Determine shift operations • Determine goto operations • Determine reduce operations • Report an error when conflicts arise

$ 2: S E $  E 14: T (E ) 7: E E +T 1: S E$ 4: E T 6: E E +T 10: T i 12: T  (E) 2: S E $ 7: E E +T T E 5: E T i 11: T i + ( i 13: T (E) 4: E T 6: E E +T 10: T i 12: T  (E) 7: E E +T 10: T i 12: T  (E) i ( + T 8: E E +T  ) 15: T (E) 

$ 2: S E $  E 14: T (E ) 7: E E +T Parsing “(i)$” 1: S E$ 4: E T 6: E E +T 10: T i 12: T  (E) 2: S E $ 7: E E +T T E 5: E T i 11: T i + ( i 13: T (E) 4: E T 6: E E +T 10: T i 12: T  (E) 7: E E +T 10: T i 12: T  (E) i ( + T 8: E E +T  ) 15: T (E) 

Summary (Bottom-Up) • LR is a powerful technique • Generates efficient parsers • Generation tools exit LALR(1) • Bison, yacc, CUP • But some grammars need to be tuned • Shift/Reduce conflicts • Reduce/Reduce conflicts • Efficiency of the generated parser

Summary (Parsing) • Context free grammars provide a natural way to define the syntax of programming languages • Ambiguity may be resolved • Predictive parsing is natural • Good error messages • Natural error recovery • But not expressive enough • But LR bottom-up parsing is more expressible

Abstract Syntax • Intermediate program representation • Defines a tree - Preserves program hierarchy • Generated by the parser • Declared using an (ambiguous) context free grammar (relatively flat) • Not meant for parsing • Keywords and punctuation symbols are not stored (Not relevant once the tree exists) • Big programs can be also handled (possibly via virtual memory)

Semantic Analysis • Requirements related to the “context” in which a construct occurs • Examples • Name resolution • Scoping • Type checking • Escape • Implemented via AST traversals • Guides subsequent compiler phases

Abstract InterpretationStatic analysis • Automatically identify program properties • No user provided loop invariants • Sound but incomplete methods • But can be rather precise • Non-standard interpretation of the program operational semantics • Applications • Compiler optimization • Code quality tools • Identify potential bugs • Prove the absence of runtime errors • Partial correctness

Basic Compiler Phases

Overall Structure

Techniques Studied • Simple code generation • Basic blocks • Global register allocation • Activation records • Object Oriented • Assembler/Linker/Loader

Two Phase SolutionDynamic ProgrammingSethi & Ullman • Bottom-up (labeling) • Compute for every subtree • The minimal number of registers needed (weight) • Top-Down • Generate the code using labeling by preferring “heavier” subtrees (larger labeling)

“Global” Register Allocation • Input: • Sequence of machine code instructions(assembly) • Unbounded number of temporary registers • Output • Sequence of machine code instructions(assembly) • Machine registers • Some MOVE instructions removed • Missing prologue and epilogue

Graph Coloring with Coalescing Build: Construct the interference graph Simplify: Recursively remove non MOVE nodes with less than K neighbors; Push removed nodes into stack Coalesce: Conservatively merge unconstrained MOV related nodes with fewer than K “heavy” neighbors Freeze: Give-Up Coalescing on some low-degree MOV related nodes Potential-Spill: Spill some nodes and remove nodes Push removed nodes into stack Select: Assign actual registers (from simplify/spill stack) Actual-Spill: Spill some potential spills and repeat the process

higher addresses administrative stack pointer frame pointer frame size lower addresses previous frame A Typical Stack Frame argument 2 outgoing parameters argument 1 lexical pointer return address dynamic link registers locals temporaries current frame outgoing parameters argument 2 argument 1 next frame

Heap Memory Management • Part of the runtime system • Utilities for dynamic memory allocation • Utilities for automatic memory reclamation • Garbage Colletion

Garbage Collection • Techniques • Mark and sweep • Copying collection • Reference counting • Modes • Generational • Incremental vs. Stop the world

1000 In mainbefore foo(argv[2]) 996 data segment 992 fp 5000 abcdefgh0 988 sp 984 980 989 988 987 983 979 975

1000 inside foo(argv[2]) 996 data segment 992 5000 abcdefgh0 fp 988 984 980 989 988 sp 987 983 979 975

1000 before strcpy 996 data segment 992 5000 abcdefgh0 fp 988 984 980 989 988 987 983 979 sp 975

1000 inside strcpy 996 data segment 992 5000 abcdefgh0 988 984 980 989 988 987 983 979 975 fp sp

1000 return from strcpy 996 data segment 992 5000 abcdefgh0 fp 988 984 980 989 988 987 983 979 sp 975

1000 Return from foowhere to? 996 data segment 992 5000 abcdefgh0 fp 988 984 980 989 988 sp 987 983 979 975

Recap

Recap

Presentation Transcript

Recap

Recap…

Recap

RECAP

Recap

Recap

Recap

RECAP

Recap

Recap

RECAP

Recap

Recap

RECAP

Recap

Recap

Recap

RECAP

RECAP

Recap