
Recap



Presentation Transcript


  1. Recap Roman Manevich Mooly Sagiv

  2. Outline • Subjects Studied • Questions & Answers

  3. Lexical Analysis (Scanning) • input • program text (file) • output • sequence of tokens • Read input file • Identify language keywords and standard identifiers • Handle include files and macros • Count line numbers • Remove whitespaces • Report illegal symbols • [Produce symbol table]
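The scanner duties listed above (read the input, classify lexemes, count line numbers, remove whitespace, report illegal symbols) can be sketched in a few lines of C. This is a minimal illustration, not the course's scanner: the token classes, the `struct scanner` type, and the set of accepted symbols are all assumptions made for the example, and keywords, include files and macros are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes; the slides do not define any. */
enum token_class { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_SYMBOL, TOK_ILLEGAL };

struct token {
    enum token_class cls;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
    int line;            /* line on which the lexeme starts */
};

struct scanner {
    const char *pos;     /* next unread character */
    int line;            /* current line number, starting at 1 */
};

void scanner_init(struct scanner *s, const char *text) {
    s->pos = text;
    s->line = 1;
}

struct token next_token(struct scanner *s) {
    /* Remove whitespace, counting line numbers along the way. */
    while (isspace((unsigned char)*s->pos)) {
        if (*s->pos == '\n') s->line++;
        s->pos++;
    }

    struct token t = { TOK_EOF, s->pos, 0, s->line };
    unsigned char c = (unsigned char)*s->pos;
    if (c == '\0') return t;

    if (isalpha(c) || c == '_') {
        /* Identifier: letter or underscore, then letters, digits, underscores. */
        t.cls = TOK_IDENT;
        while (isalnum((unsigned char)*s->pos) || *s->pos == '_') s->pos++;
    } else if (isdigit(c)) {
        t.cls = TOK_NUMBER;
        while (isdigit((unsigned char)*s->pos)) s->pos++;
    } else {
        /* Assumed symbol set; anything else is reported as illegal. */
        t.cls = strchr("+-*/()=;", c) ? TOK_SYMBOL : TOK_ILLEGAL;
        s->pos++;
    }
    t.length = (size_t)(s->pos - t.start);
    return t;
}
```

A caller would invoke `scanner_init` once and then call `next_token` until it returns `TOK_EOF`. Returning an explicit `TOK_ILLEGAL` token, rather than aborting, leaves the error-reporting policy to the caller.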

  4. Summary • For most programming languages lexical analyzers can be easily constructed automatically • Exceptions: • Fortran • PL/1 • Lex/Flex/Jlex are useful beyond compilers

  5. Syntax Analysis (Parsing) • input • Sequence of tokens • output • Abstract Syntax Tree • Report syntax errors • unbalanced parentheses • [Create “symbol-table” ] • [Create pretty-printed version of the program] • In some cases the tree need not be generated (one-pass compilers)

  6. Pushdown Automaton • [Figure: a control unit reads the input (u t w $), consults the parser-table, and maintains a stack (V … $)]

  7. Efficient Parsers • Pushdown automata • Deterministic • Report an error as soon as the input is not a prefix of a valid program • Not usable for all context free grammars • [Figure: a context free grammar is given to CUP, which produces either a parser or “Ambiguity errors”; the parser maps tokens to a parse tree]

  8. Kinds of Parsers • Top-Down (Predictive Parsing) LL • Construct parse tree in a top-down manner • Find the leftmost derivation • For every non-terminal and token predict the next production • Preorder tree traversal • Bottom-Up LR • Construct parse tree in a bottom-up manner • Find the rightmost derivation in reverse order • For every potential right hand side and token decide when a production is found • Postorder tree traversal

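  9. Top-Down Parsing • [Figure: parse tree with nodes numbered 1 to 5 above the input t1 t2]

  10. Bottom-Up Parsing • [Figure: parse tree with nodes numbered 1 to 3 above the input t1 t2 t4 t5 t6 t7 t8]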

  11. Example Grammar for Predictive LL Top-Down Parsing • expression → digit | '(' expression operator expression ')' • operator → '+' | '*' • digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

  12. Example Grammar for Predictive LL Top-Down Parsing • expression → digit | '(' expression operator expression ')' • operator → '+' | '*' • digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

  13. static int Parse_Expression(Expression **expr_p) {
          Expression *expr = *expr_p = new_expression();

          /* try to parse a digit */
          if (Token.class == DIGIT) {
              expr->type = 'D';
              expr->value = Token.repr - '0';
              get_next_token();
              return 1;
          }

          /* try to parse a parenthesized expression */
          if (Token.class == '(') {
              expr->type = 'P';
              get_next_token();
              if (!Parse_Expression(&expr->left)) Error("missing expression");
              if (!Parse_Operator(&expr->oper)) Error("missing operator");
              /* The grammar requires a second operand here; the field name
                 `right` is assumed by analogy with `left`. */
              if (!Parse_Expression(&expr->right)) Error("missing expression");
              if (Token.class != ')') Error("missing )");
              get_next_token();
              return 1;
          }

          /* neither alternative applies */
          return 0;
      }

  14. Parsing Expressions • Try every alternative production • For P → A1 A2 … An | B1 B2 … Bm • If A1 succeeds • Call A2 • If A2 succeeds • Call A3 • If A2 fails report an error • Otherwise try B1 • Recursive descent parsing • Can be applied to certain grammars • Generalization: LL(1) parsing

  15. int P(...) {
          /* try to parse the alternative P → A1 A2 ... An */
          if (A1(...)) {
              if (!A2()) Error("Missing A2");
              if (!A3()) Error("Missing A3");
              ..
              if (!An()) Error("Missing An");
              return 1;
          }

          /* try to parse the alternative P → B1 B2 ... Bm */
          if (B1(...)) {
              if (!B2()) Error("Missing B2");
              if (!B3()) Error("Missing B3");
              ..
              if (!Bm()) Error("Missing Bm");
              return 1;
          }

          return 0;
      }
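Applied to the example grammar of slide 11, the scheme above can be turned into a small self-contained evaluator. The following is a sketch under simplifying assumptions that are not in the slides: tokens are single characters read directly from a string, errors set a flag instead of calling `Error`, and the names `evaluate`, `fail`, `input` and `syntax_error` are hypothetical.

```c
#include <stddef.h>

static const char *input;    /* next unread character (one character per token) */
static int syntax_error;     /* set once a committed alternative fails */

static int fail(void) { syntax_error = 1; return 0; }

/* digit → '0' | ... | '9' */
static int parse_digit(int *value) {
    if (*input < '0' || *input > '9') return 0;
    *value = *input++ - '0';
    return 1;
}

/* operator → '+' | '*' */
static int parse_operator(char *op) {
    if (*input != '+' && *input != '*') return 0;
    *op = *input++;
    return 1;
}

/* expression → digit | '(' expression operator expression ')' */
static int parse_expression(int *value) {
    /* first alternative */
    if (parse_digit(value)) return 1;

    /* second alternative: once '(' is seen, every later failure is an error */
    if (*input == '(') {
        int left, right;
        char op;
        input++;
        if (!parse_expression(&left)) return fail();
        if (!parse_operator(&op)) return fail();
        if (!parse_expression(&right)) return fail();
        if (*input != ')') return fail();
        input++;
        *value = (op == '+') ? left + right : left * right;
        return 1;
    }
    return 0;
}

/* Returns 1 and stores the value if text is exactly one expression. */
int evaluate(const char *text, int *result) {
    input = text;
    syntax_error = 0;
    return parse_expression(result) && !syntax_error && *input == '\0';
}
```

With these definitions, `evaluate("(2*(3+4))", &v)` succeeds and leaves 14 in `v`, while `evaluate("(2+)", &v)` fails because the second operand is missing.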

  16. Predictive Parser for Arithmetic Expressions • Grammar • 1 E → E + T • 2 E → T • 3 T → T * F • 4 T → F • 5 F → id • 6 F → (E) • C-code?
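One possible answer to the "C-code?" question is sketched below. The grammar as written is left-recursive, so a predictive parser cannot use it directly; the sketch assumes the usual rewriting into the iterative forms E → T { + T } and T → F { * F }, which accept the same language. It is only a recognizer, and it assumes that an `id` is a single lowercase letter and that the input contains no whitespace. The names `accepts` and `cursor` are made up for the example.

```c
#include <ctype.h>

static const char *cursor;   /* next unread character */

static int parse_E(void);

/* F → id | ( E ) */
static int parse_F(void) {
    if (islower((unsigned char)*cursor)) {   /* assumed: id is one lowercase letter */
        cursor++;
        return 1;
    }
    if (*cursor == '(') {
        cursor++;
        if (!parse_E() || *cursor != ')') return 0;
        cursor++;
        return 1;
    }
    return 0;
}

/* T → F { * F }   (left recursion T → T * F replaced by a loop) */
static int parse_T(void) {
    if (!parse_F()) return 0;
    while (*cursor == '*') {
        cursor++;
        if (!parse_F()) return 0;
    }
    return 1;
}

/* E → T { + T }   (left recursion E → E + T replaced by a loop) */
static int parse_E(void) {
    if (!parse_T()) return 0;
    while (*cursor == '+') {
        cursor++;
        if (!parse_T()) return 0;
    }
    return 1;
}

/* Returns 1 if text is derivable from E, 0 otherwise. */
int accepts(const char *text) {
    cursor = text;
    return parse_E() && *cursor == '\0';
}
```

Each loop decides whether to continue by looking at one token only, which is what makes the parser predictive.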

  17. Bottom-Up Syntax Analysis • Input • A context free grammar • A stream of tokens • Output • A syntax tree or error • Method • Construct parse tree in a bottom-up manner • Find the rightmost derivation in reverse order • For every potential right hand side and token decide when a production is found • Report an error as soon as the input is not a prefix of a valid program

  18. Constructing LR(0) parsing table • Add a production S' → S $ • Construct a finite automaton accepting “valid stack symbols” • States are sets of items A → α · β • The states of the automaton become the states of the parsing table • Determine shift operations • Determine goto operations • Determine reduce operations • Report an error when conflicts arise

  19. [Figure: LR(0) automaton for the grammar S → E $, E → T, E → E + T, T → i, T → ( E ); each state is a set of numbered items, with transitions on i, (, ), +, $, E and T]

  20. Parsing “(i)$” • [Figure: the same LR(0) automaton, used to trace the parse of the input (i)$]
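The parse of “(i)$” can be reproduced with a small table-driven shift/reduce loop. The sketch below hand-encodes an LR(0) automaton for the grammar S → E $, E → T, E → E + T, T → i, T → ( E ). The state numbers 0 to 8 are my own and do not match the item numbers in the figure, and the function names are hypothetical; only the grammar is taken from the slides.

```c
#include <stddef.h>

enum { MAX_DEPTH = 64 };

/* Shift target for (state, terminal), or -1 if the table has no entry. */
static int shift_target(int state, char c) {
    switch (state) {
    case 0: case 4: case 5:               /* states expecting the start of a T */
        return c == 'i' ? 3 : c == '(' ? 4 : -1;
    case 1: return c == '+' ? 5 : -1;     /* S → E · $,   E → E · + T */
    case 7: return c == ')' ? 8 : c == '+' ? 5 : -1;   /* T → ( E · ) */
    default: return -1;
    }
}

/* Goto target for (state, nonterminal), or -1. */
static int goto_target(int state, char nonterminal) {
    if (state == 0) return nonterminal == 'E' ? 1 : 2;
    if (state == 4) return nonterminal == 'E' ? 7 : 2;
    if (state == 5 && nonterminal == 'T') return 6;
    return -1;
}

/* For a reduce state, report the production's left side and right-side length. */
static int reduction(int state, char *lhs, int *length) {
    switch (state) {
    case 2: *lhs = 'E'; *length = 1; return 1;   /* E → T     */
    case 3: *lhs = 'T'; *length = 1; return 1;   /* T → i     */
    case 6: *lhs = 'E'; *length = 3; return 1;   /* E → E + T */
    case 8: *lhs = 'T'; *length = 3; return 1;   /* T → ( E ) */
    default: return 0;
    }
}

/* Returns 1 if input (terminated by '$') is accepted, 0 on a syntax error. */
int lr0_accepts(const char *input) {
    int stack[MAX_DEPTH];    /* stack of automaton states */
    int top = 0;
    stack[0] = 0;
    for (;;) {
        int state = stack[top];
        char lhs;
        int length;
        if (reduction(state, &lhs, &length)) {
            top -= length;                        /* pop the right-hand side */
            int next = goto_target(stack[top], lhs);
            if (next < 0) return 0;
            stack[++top] = next;
            continue;
        }
        if (state == 1 && *input == '$') return 1;    /* accept */
        int next = shift_target(state, *input);
        if (next < 0 || top + 1 >= MAX_DEPTH) return 0;
        stack[++top] = next;                      /* shift */
        input++;
    }
}
```

Because the automaton is LR(0), a reduce state reduces without inspecting the next token; an error surfaces at the first shift for which the table has no entry.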

  21. Summary (Bottom-Up) • LR is a powerful technique • Generates efficient parsers • Generation tools exist for LALR(1) • Bison, yacc, CUP • But some grammars need to be tuned • Shift/Reduce conflicts • Reduce/Reduce conflicts • Efficiency of the generated parser

  22. Summary (Parsing) • Context free grammars provide a natural way to define the syntax of programming languages • Ambiguity may be resolved • Predictive parsing is natural • Good error messages • Natural error recovery • But not expressive enough • But LR bottom-up parsing is more expressive

  23. Abstract Syntax • Intermediate program representation • Defines a tree - Preserves program hierarchy • Generated by the parser • Declared using an (ambiguous) context free grammar (relatively flat) • Not meant for parsing • Keywords and punctuation symbols are not stored (Not relevant once the tree exists) • Big programs can also be handled (possibly via virtual memory)

  24. Semantic Analysis • Requirements related to the “context” in which a construct occurs • Examples • Name resolution • Scoping • Type checking • Escape • Implemented via AST traversals • Guides subsequent compiler phases
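As an illustration of a semantic check implemented via an AST traversal, here is a minimal type-checking sketch. The node kinds, the two-type system, and the typing rules are invented for the example and are not taken from the course; note that the tree stores no keywords or punctuation, only the structure.

```c
#include <stddef.h>

/* Hypothetical types and node kinds for a toy expression language. */
enum type { TYPE_INT, TYPE_BOOL, TYPE_ERROR };
enum kind { INT_LITERAL, BOOL_LITERAL, ADD, LESS_THAN, IF_ELSE };

struct ast {
    enum kind kind;
    struct ast *child[3];   /* unused entries are NULL */
};

/* Post-order traversal: type the children first, then the node itself. */
enum type check(const struct ast *n) {
    switch (n->kind) {
    case INT_LITERAL:
        return TYPE_INT;
    case BOOL_LITERAL:
        return TYPE_BOOL;
    case ADD:          /* int + int yields int */
        return (check(n->child[0]) == TYPE_INT &&
                check(n->child[1]) == TYPE_INT) ? TYPE_INT : TYPE_ERROR;
    case LESS_THAN:    /* int < int yields bool */
        return (check(n->child[0]) == TYPE_INT &&
                check(n->child[1]) == TYPE_INT) ? TYPE_BOOL : TYPE_ERROR;
    case IF_ELSE: {    /* condition must be bool, both branches must agree */
        enum type then_type = check(n->child[1]);
        if (check(n->child[0]) != TYPE_BOOL) return TYPE_ERROR;
        if (then_type == TYPE_ERROR || then_type != check(n->child[2]))
            return TYPE_ERROR;
        return then_type;
    }
    }
    return TYPE_ERROR;
}
```

A production compiler would also record the computed type on each node so that later phases can use it, which is one way the analysis "guides subsequent compiler phases".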

  25. Abstract Interpretation: Static Analysis • Automatically identify program properties • No user provided loop invariants • Sound but incomplete methods • But can be rather precise • Non-standard interpretation of the program operational semantics • Applications • Compiler optimization • Code quality tools • Identify potential bugs • Prove the absence of runtime errors • Partial correctness

  26. Basic Compiler Phases

  27. Overall Structure

  28. Techniques Studied • Simple code generation • Basic blocks • Global register allocation • Activation records • Object Oriented • Assembler/Linker/Loader

  29. Two Phase Solution: Dynamic Programming (Sethi & Ullman) • Bottom-up (labeling) • Compute for every subtree • The minimal number of registers needed (weight) • Top-Down • Generate the code using labeling by preferring “heavier” subtrees (larger labeling)
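The two phases can be sketched as follows. This is a simplified variant that assumes binary operators only and charges one register for every leaf; the classic formulation distinguishes left and right leaves on machines with memory operands. The `struct node` layout and the function names are assumptions made for the example.

```c
#include <stddef.h>

struct node {
    struct node *left, *right;   /* both NULL for a leaf */
    int weight;                  /* registers needed, filled in by label() */
};

/* Bottom-up phase: compute the weight of every subtree. */
int label(struct node *n) {
    if (n->left == NULL && n->right == NULL) {
        n->weight = 1;           /* assumption: a leaf is loaded into one register */
        return 1;
    }
    int l = label(n->left);
    int r = label(n->right);
    /* Equal weights need one extra register to hold the first result while
       the second subtree is evaluated; otherwise the heavier side dominates. */
    n->weight = (l == r) ? l + 1 : (l > r ? l : r);
    return n->weight;
}

/* Top-down phase: at a labeled interior node, evaluate the heavier child first. */
struct node *evaluate_first(const struct node *n) {
    return (n->right->weight > n->left->weight) ? n->right : n->left;
}
```

For the balanced tree of (a + b) * (c + d) the root gets weight 3, whereas for a + (c + d) the weight stays at 2 as long as the heavier right operand is evaluated first.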

  30. “Global” Register Allocation • Input: • Sequence of machine code instructions (assembly) • Unbounded number of temporary registers • Output • Sequence of machine code instructions (assembly) • Machine registers • Some MOVE instructions removed • Missing prologue and epilogue

  31. Graph Coloring with Coalescing • Build: Construct the interference graph • Simplify: Recursively remove non-MOVE nodes with fewer than K neighbors; push removed nodes onto the stack • Coalesce: Conservatively merge unconstrained MOVE-related nodes with fewer than K “heavy” neighbors • Freeze: Give up coalescing on some low-degree MOVE-related nodes • Potential-Spill: Spill some nodes and remove them; push removed nodes onto the stack • Select: Assign actual registers (from the simplify/spill stack) • Actual-Spill: Spill some potential spills and repeat the process
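The core of the algorithm, the Simplify and Select steps, can be sketched on an adjacency matrix. This is deliberately a reduced version: it omits Coalesce, Freeze and both spill steps, and simply reports failure where the full algorithm would pick a potential spill. The function name, the matrix representation and `MAX_NODES` are assumptions made for the example.

```c
enum { MAX_NODES = 16 };

/* Try to colour an interference graph of n temporaries with k registers.
   adj[i][j] != 0 means temporaries i and j interfere.
   Returns 1 and fills color[0..n-1] on success, 0 if a spill would be needed. */
int color_graph(int n, int adj[MAX_NODES][MAX_NODES], int k, int color[MAX_NODES]) {
    int removed[MAX_NODES] = {0};
    int stack[MAX_NODES];
    int top = 0;

    /* Simplify: repeatedly remove a node with fewer than k remaining neighbours. */
    while (top < n) {
        int picked = -1;
        for (int i = 0; i < n && picked < 0; i++) {
            if (removed[i]) continue;
            int degree = 0;
            for (int j = 0; j < n; j++)
                if (!removed[j] && j != i && adj[i][j]) degree++;
            if (degree < k) picked = i;
        }
        if (picked < 0) return 0;   /* only high-degree nodes left: spill candidate */
        removed[picked] = 1;
        stack[top++] = picked;
    }

    /* Select: pop nodes and give each the lowest colour unused by its neighbours. */
    for (int i = 0; i < n; i++) color[i] = -1;
    while (top > 0) {
        int node = stack[--top];
        int used[MAX_NODES] = {0};
        for (int j = 0; j < n; j++)
            if (adj[node][j] && color[j] >= 0) used[color[j]] = 1;
        int c = 0;
        while (used[c]) c++;
        color[node] = c;
    }
    return 1;
}
```

Select always finds a colour below k, because a node was removed only when fewer than k of its neighbours were still in the graph, and only those neighbours are coloured when it is popped.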

  32. A Typical Stack Frame • [Figure: stack layout from higher addresses (top) to lower addresses (bottom). Previous frame: outgoing parameters (argument 2, argument 1). Current frame: administrative part (lexical pointer, return address, dynamic link, registers), locals, temporaries, outgoing parameters (argument 2, argument 1). Next frame below. The frame pointer, stack pointer and frame size are marked.]

  33. Heap Memory Management • Part of the runtime system • Utilities for dynamic memory allocation • Utilities for automatic memory reclamation • Garbage Collection

  34. Garbage Collection • Techniques • Mark and sweep • Copying collection • Reference counting • Modes • Generational • Incremental vs. Stop the world
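Of the techniques listed, mark and sweep is the simplest to sketch. The following toy collector works on a fixed array of objects with at most two pointer fields each; the heap layout and the names `allocate` and `collect` are invented for the example, and it runs in stop-the-world fashion.

```c
#include <stddef.h>

enum { HEAP_SIZE = 8, MAX_FIELDS = 2 };

struct object {
    int in_use;                         /* is this slot currently allocated? */
    int marked;                         /* set during the mark phase */
    struct object *field[MAX_FIELDS];   /* outgoing pointers, NULL if unused */
};

static struct object heap[HEAP_SIZE];

/* Hand out the first free slot, or NULL if the toy heap is full. */
struct object *allocate(void) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (struct object){ .in_use = 1 };
            return &heap[i];
        }
    }
    return NULL;   /* a real allocator would collect and retry here */
}

/* Mark phase: flag everything reachable from obj. */
static void mark(struct object *obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = 1;
    for (int i = 0; i < MAX_FIELDS; i++) mark(obj->field[i]);
}

/* Sweep phase: free unmarked slots and clear marks for the next cycle. */
static int sweep(void) {
    int reclaimed = 0;
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            reclaimed++;
        }
        heap[i].marked = 0;
    }
    return reclaimed;
}

/* Collect garbage given the root set; returns the number of objects reclaimed. */
int collect(struct object **roots, int root_count) {
    for (int i = 0; i < root_count; i++) mark(roots[i]);
    return sweep();
}
```

The `marked` test in `mark` is what lets the traversal terminate on cyclic structures; reference counting alone would never reclaim such cycles once they become unreachable.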

  35. In main, before foo(argv[2]) • [Figure: stack cells at addresses 1000 down to 975 with fp and sp marked; data segment at address 5000 holding abcdefgh0]

  36. Inside foo(argv[2]) • [Figure: the same stack and data segment, with fp and sp moved to foo's frame]

  37. Before strcpy • [Figure: the same stack and data segment, with sp at the lowest cells]

  38. Inside strcpy • [Figure: the same stack and data segment, with fp and sp in strcpy's frame]

  39. Return from strcpy • [Figure: the same stack and data segment, with fp and sp back in foo's frame]

  40. Return from foo: where to? • [Figure: the same stack and data segment, with fp and sp as on entry to foo]
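The slides show only stack pictures, but the sequence appears to trace a function that copies its argument into a small local buffer with `strcpy`, so that a long argument such as "abcdefgh" overruns the buffer and may overwrite the saved return address. The sketch below is a hypothetical reconstruction: the body of `foo` and the buffer size are guesses, the unbounded copy is kept in a comment only, and `copy_bounded` is a made-up helper showing one way to avoid the overrun.

```c
#include <stdio.h>
#include <string.h>

/* Guessed shape of the function traced in the slides (not compiled here):
 *
 *     void foo(const char *s) {
 *         char buf[4];
 *         strcpy(buf, s);    // unbounded: writes strlen(s) + 1 bytes
 *     }
 *
 * With s = "abcdefgh" the copy writes 9 bytes into a 4-byte buffer, so the
 * bytes past buf land on whatever the frame stores next to it.
 */

/* Bounded alternative: never writes more than size bytes and always
   terminates dst. Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t size, const char *src) {
    if (size == 0) return 0;
    int needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed < size;
}
```

Reporting truncation through the return value lets the caller decide whether a shortened string is acceptable or should be treated as an error.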
