Using Yacc
Introduction • Grammar • CFG • Recursive Rules • Shift/Reduce Parsing • See Figure 3-2. • LALR(1) • What Yacc Cannot Parse • It cannot deal with ambiguous grammars or with grammars that need more than one token of lookahead • If you give it a grammar that it cannot handle, it will tell you, so there is no problem of overcomplex parsers silently failing.
The Structure of a Yacc grammar (Definition section) %% (Rules section) %% (User subroutines section)
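The three-part layout above can be sketched as a tiny grammar file. This is only an illustrative skeleton: the token NUMBER and the rule name line are assumptions, and yylex() and main() are assumed to come from a separate lexer and the yacc library.

```yacc
%{
/* Definition section: a literal block of C code, copied into the parser. */
#include <stdio.h>
int yylex(void);                 /* supplied by the lexer */
void yyerror(const char *s);
%}

%token NUMBER                    /* hypothetical token; yacc picks its number */

%%
/* Rules section */
line: NUMBER    { printf("saw a number\n"); }
    ;
%%
/* User subroutines section: plain C code */
void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}
```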
The Definition Section • The definition section includes declarations of the tokens used in the grammar, the types of values used on the parser stack, and other odds and ends. • You don’t have to specify the number of the token. • It can also include a literal block, C code enclosed in %{ %}
The Rules Section • Since ASCII keyboards don’t have a → key, we use a colon between the left- and right-hand sides of a rule, and we put a semicolon at the end of each rule • The symbol on the left-hand side of the first rule in the grammar is normally the start symbol, though you can use a %start declaration in the definition section to override that.
Symbol Values and Actions • Every symbol in a yacc parser has a value • The semantic record • A number, a literal text string, …. • Nonterminal symbols can have any values you want, created by code in the parser • In real parsers, the values of different symbols use different data types • int, double, char *, …. • If you have multiple value types, you have to list all the value types used in a parser so that yacc can create a C union typedef called YYSTYPE to contain them • By default, yacc makes all values of type int
Symbol Values and Actions • $$: • The value of the LHS symbol • The semantic routine should assign it a value. • $i: • The value of the i-th symbol in the RHS of the production • Terminal symbol: The value was given by the lexer. • Nonterminal symbol: The value was given previously by an execution of some semantic routine.
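As a small illustration of $$ and $i, the fragment below sketches rules for adding and subtracting numbers. The symbol names expression and NUMBER are assumptions, and all values are of the default type int.

```yacc
%token NUMBER

%%
expression: expression '+' NUMBER   { $$ = $1 + $3; }  /* $1: left operand, $3: NUMBER */
          | expression '-' NUMBER   { $$ = $1 - $3; }
          | NUMBER                  { $$ = $1; }       /* value supplied by the lexer */
          ;
```

In the first rule, $2 would be the value of the '+' token, which the action does not use.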
The Lexer • The parser is the higher level routine, and calls the lexer yylex() • Yacc defines the token names in the parser as C preprocessor names in y.tab.h • See ch3-01.l • Whenever the lexer returns a token to the parser, if the token has an associated value, the lexer must store the value in yylval before returning • In the first example, we explicitly declare yylval. • In more complex parsers, yacc defines yylval as a union and puts the definition in y.tab.h
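To show the yylval hand-off without a separate lex file, here is a sketch that puts a hand-written yylex() in the user subroutines section. This swaps a lex-generated lexer for a hand-written one, and the rule name input is an assumption. Values are of the default type int, so yylval is a plain int declared by the generated parser.

```yacc
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s);
%}

%token NUMBER

%%
input: NUMBER   { printf("= %d\n", $1); }
     ;
%%
/* Hand-written lexer: returns one token per call, 0 at end of input. */
int yylex(void)
{
    int c;

    while ((c = getchar()) == ' ' || c == '\t')
        ;                           /* skip blanks */
    if (isdigit(c)) {
        int n = 0;
        do {
            n = n * 10 + (c - '0');
            c = getchar();
        } while (isdigit(c));
        ungetc(c, stdin);           /* push back the first non-digit */
        yylval = n;                 /* store the value before returning */
        return NUMBER;
    }
    if (c == '\n' || c == EOF)
        return 0;                   /* 0 tells the parser the input has ended */
    return c;                       /* any other character is its own token */
}

void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }

int main(void) { return yyparse(); }
```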
Compiling and Running a Simple Parser • See P. 59. • Note that yacc must be run before the lexer is compiled: the lexer includes y.tab.h, which yacc generates when run with the -d flag.
Arithmetic Expressions and Ambiguity • You may input an ambiguous grammar to test Yacc • There are 16 shift/reduce conflicts in the program of P.60 • There are two ways to specify precedence and associativity in a grammar: implicitly and explicitly • To specify them implicitly: • Rewrite the grammar using separate non-terminal symbols for each precedence level • See P.62 • To specify them explicitly: • Add precedence declarations to the definition section: %left '+' '-' %left '*' '/' %nonassoc UMINUS
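The explicit approach might look like the following sketch. The grammar itself is ambiguous, and the precedence declarations tell yacc how to resolve each conflict. Declarations listed later bind more tightly, so UMINUS is highest. The names expression and NUMBER are assumptions, and values are of the default type int.

```yacc
%{
int yylex(void);
void yyerror(const char *s);
%}

%token NUMBER
%left '+' '-'          /* lowest precedence, left associative */
%left '*' '/'
%nonassoc UMINUS       /* pseudo-token: highest precedence, used only by %prec */

%%
expression: expression '+' expression   { $$ = $1 + $3; }
          | expression '-' expression   { $$ = $1 - $3; }
          | expression '*' expression   { $$ = $1 * $3; }
          | expression '/' expression   { if ($3 == 0) {
                                              yyerror("divide by zero");
                                              $$ = 0;
                                          } else
                                              $$ = $1 / $3;
                                        }
          | '-' expression %prec UMINUS { $$ = -$2; }
          | '(' expression ')'          { $$ = $2; }
          | NUMBER                      { $$ = $1; }
          ;
```

The %prec UMINUS annotation gives the unary-minus rule the precedence of UMINUS rather than that of the binary '-' token.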
Exercise • Use the expression rules shown on P.62 of “lex and yacc” to write a yacc program. • Hint: ch3-01.y and ch3-01.l • Please list your source code and execution results.
When Not to Use Precedence Rules • Precedence rules can be used to suppress any shift/reduce conflict that occurs in the grammar, but this is usually a bad idea because it is hard to see what effect the rules have on the grammar • We recommend that you use precedence in only two situations • In expression grammars • To resolve the “dangling else” conflict in grammars for if-then-else language constructs
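For the dangling else case, one common idiom is sketched below. All token names here (IF, THEN, ELSE, EXPR, OTHER) and the pseudo-token LOWER_THAN_ELSE are assumptions for illustration. Giving the else-less rule a lower precedence than the ELSE token makes yacc shift ELSE, so each else binds to the nearest if.

```yacc
%token IF THEN ELSE EXPR OTHER

%nonassoc LOWER_THAN_ELSE   /* pseudo-token, used only by %prec */
%nonassoc ELSE              /* declared later, so higher precedence */

%%
stmt: IF EXPR THEN stmt %prec LOWER_THAN_ELSE
    | IF EXPR THEN stmt ELSE stmt
    | OTHER
    ;
```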
Variables and Typed Tokens • See Example 3-2, P.64 • Symbol Values and %union
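A sketch of how %union and typed symbols fit together, in the spirit of a calculator with single-letter variables. The names vbltable, dval, and vblno are assumptions, and the lexer is assumed to set yylval.dval for NUMBER and yylval.vblno for NAME.

```yacc
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
double vbltable[26];        /* hypothetical storage for variables a-z */
%}

%union {
    double dval;            /* value of a number or an expression */
    int    vblno;           /* index of a variable in vbltable */
}

%token <vblno> NAME
%token <dval>  NUMBER
%type  <dval>  expression
%left '+' '-'

%%
statement: NAME '=' expression   { vbltable[$1] = $3; }
         | expression            { printf("= %g\n", $1); }
         ;

expression: expression '+' expression   { $$ = $1 + $3; }
          | expression '-' expression   { $$ = $1 - $3; }
          | NUMBER                      { $$ = $1; }
          | NAME                        { $$ = vbltable[$1]; }
          ;
```

With these declarations yacc knows which union member each $i and $$ refers to, so the actions need no explicit field names.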