
Syntax




  1. Syntax

  2. Outline • Programming Language Specification • Lexical Structure of PLs • Syntactic Structure of PLs • Context-Free Grammar / BNF • Parse Trees • Abstract Syntax Trees • Ambiguous Grammar • Associativity and Precedence • EBNFs and Syntax Diagrams

  3. Programming Language Specification • PLs require precise definitions (i.e. no ambiguity) • Language form (syntax) • Language meaning (semantics) • Consequently, PLs are specified using formal notation: • Formal syntax: tokens, grammar • Formal semantics: operational, denotational, axiomatic

  4. Lexical Structure of PLs

  5. Lexical Structure of PLs (cont.) • Main task of the scanner: identify tokens • Tokens are the basic building blocks of programs • E.g. keywords, identifiers, numbers, punctuation marks • Lexeme – an instance of a token • One can think of programs as strings of lexemes rather than of characters • A token of a language is a category of its lexemes (or instances) • Some tokens have many possible lexemes • E.g. keyword, identifier, number • In some cases, a token has only a single possible lexeme • E.g. equal_sign, plus_op, mult_op

  6. Lexical Structure of PLs (cont.) • Consider the following Java statement: index = 2 * count + 17 ; • The lexemes and tokens of this statement are:
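
The lexeme/token table from this slide is not part of the transcript. As a rough illustration of what a scanner does with the statement, here is a minimal Python sketch. The token names `equal_sign`, `plus_op` and `mult_op` come from the previous slide; `identifier`, `int_literal` and `semicolon` are assumed names chosen for illustration, and the original table may have used different ones.

```python
import re

# Token categories and the patterns for their lexemes.
# int_literal, identifier and semicolon are assumed names, not taken from the slides.
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier",  r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not a token
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17 ;"):
    print(f"{lexeme:8} {token}")
```

Each printed row pairs a lexeme (the actual characters) with its token (the category), which is the distinction the previous slide draws.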

  7. Lexical Structure of PLs (cont.) • Tokens in a programming language are described formally by regular expressions. • Regular expressions – descriptions of patterns of characters • Regular expression operations • Basic operations • Concatenation (sequencing of items) • Choice or selection: | • Repetition: * • Grouping: ( ) • Additional operations • One or more repetitions: + • Range of characters: [ - ] • Optional: ? • Any character: .

  8. Lexical Structure of PLs (cont.) • Regular expression examples • (a|b)*c • Strings that match include ababaac, aac, bbc, c, and babc • [0-9]+ • Integer constants with one or more digits • [0-9]+(\.[0-9]+)? • Floating-point literals (the fractional part is optional, so plain integers also match) • [a-zA-Z][a-zA-Z0-9_]* • Identifiers
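
The patterns on this slide can be tried out directly. The sketch below uses Python's `re.fullmatch`, which requires the whole string to match; the dictionary keys and the helper name `matches` are made up for this example.

```python
import re

# The four patterns from the slide; the keys are illustrative names.
PATTERNS = {
    "abc_strings": r"(a|b)*c",
    "integer":     r"[0-9]+",
    "float":       r"[0-9]+(\.[0-9]+)?",
    "identifier":  r"[a-zA-Z][a-zA-Z0-9_]*",
}


def matches(name, text):
    """True if the whole of text matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


for s in ["ababaac", "aac", "bbc", "c", "babc", "abca"]:
    print(s, matches("abc_strings", s))

# The fraction is optional, so "42" matches the float pattern but "3." does not.
print(matches("float", "3.14"), matches("float", "42"), matches("float", "3."))

# Identifiers must start with a letter.
print(matches("identifier", "count_1"), matches("identifier", "1count"))
```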

  9. Lexical Structure of PLs (cont.) • Scanner generators: • lex, flex • ANTLR – ANother Tool for Language Recognition • These programs can be used to generate a program (i.e., a scanner) that extracts tokens from a stream of characters. • Many PLs provide good support for regular expressions – Java, C#, Perl, Ruby, … • Support for regular expressions in Java • java.util.regex package • split() method of the String class

  10. Syntactic Structure of PLs • Specifying the form of a programming language • Tokens – described by regular expressions • Syntax (the organization of tokens) – described by context-free grammars (CFGs)

  11. Context-Free Grammar • Context-free grammars (CFGs) are used to describe the syntax of PLs. • Proposed by Noam Chomsky – a noted linguist • BNF (Backus-Naur Form) is a notation for describing syntax. • Proposed by John Backus and Peter Naur • CFG and BNF are nearly identical and are used interchangeably. • BNF is a metalanguage for programming languages. • A metalanguage is a language that is used to describe another language.

  12. Context-Free Grammar (cont.) • CFG or BNF consists of a series of rules or productions. • Productions are made up of: • Nonterminals – structures that are broken down into further structures • Terminals – things that cannot be broken down • Metasymbols • Symbols that are part of CFG/BNF • These are not actual symbols in the language being described • Sometimes, a metasymbol is also an actual symbol in a language • One of the nonterminals is designated as the start symbol. • The start symbol stands for the entire structure being defined.

  13. Context-Free Grammar (cont.) • CFG/BNF example (Figure 4.2, page 83):
     (1) sentence → noun-phrase verb-phrase .
     (2) noun-phrase → article noun
     (3) article → a | the
     (4) noun → girl | dog
     (5) verb-phrase → verb noun-phrase
     (6) verb → sees | pets

  14. Context-Free Grammar (cont.) • The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:
     sentence ⇒ noun-phrase verb-phrase . (rule 1)
     ⇒ article noun verb-phrase . (rule 2)
     ⇒ the noun verb-phrase . (rule 3)
     ⇒ the girl verb-phrase . (rule 4)
     ⇒ the girl verb noun-phrase . (rule 5)
     ⇒ the girl sees noun-phrase . (rule 6)
     ⇒ the girl sees article noun . (rule 2)
     ⇒ the girl sees a noun . (rule 3)
     ⇒ the girl sees a dog . (rule 4)

  15. Context-Free Grammar (cont.) • Derivation – Generating sentences of the language through a sequence of applications of rules (or productions), beginning with a special nonterminal called the start symbol. • Leftmost derivation – The replaced nonterminal is always the leftmost nonterminal. • Rightmost derivation – The replaced nonterminal is always the rightmost nonterminal. • A derivation may be neither leftmost nor rightmost. Derivation order has no effect on the language generated by a grammar.
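
A leftmost derivation can be mechanized in a few lines. The sketch below encodes the sentence grammar from the earlier slide as a Python dictionary and, at each step, rewrites the leftmost nonterminal with a chosen alternative. The dictionary layout and the function name `leftmost_derivation` are illustrative choices, not something the slides prescribe.

```python
# The sentence grammar, as nonterminal -> list of alternatives.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb":        [["sees"], ["pets"]],
}


def leftmost_derivation(choices, start="sentence"):
    """Rewrite the leftmost nonterminal once per entry in choices.

    choices gives, in order, the index of the alternative to apply at each step.
    Returns every sentential form, from the start symbol to the final string.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that is still a nonterminal.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


# Alternatives chosen so that the derivation ends in "the girl sees a dog ."
for step in leftmost_derivation([0, 0, 1, 0, 0, 0, 0, 0, 1]):
    print("=>", step)
```

Picking the rightmost nonterminal instead would give a rightmost derivation of the same sentence, which is one way to see that derivation order does not change the language.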

  16. Context-Free Grammar (cont.) • A grammar for a small language:
     <program> → begin <stmt_list> end
     <stmt_list> → <stmt> | <stmt> ; <stmt_list>
     <stmt> → <var> := <expr>
     <expr> → <var> + <var> | <var> - <var> | <var>
     <var> → A | B | C
     • Derive the following program: begin A := B + C ; B := C end • Is the language defined by this grammar finite or infinite?
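
One way to explore this grammar is a small recognizer with one function per nonterminal. The following is a minimal sketch that assumes tokens are separated by whitespace; the function name `accepts` and the error handling are illustrative choices.

```python
def accepts(source):
    """Return True if source is a program of the small begin/end language.

    Assumes tokens are separated by whitespace.
    """
    tokens = source.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, found {peek()!r}")
        pos += 1

    def var():            # <var> -> A | B | C
        nonlocal pos
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expr():           # <expr> -> <var> + <var> | <var> - <var> | <var>
        var()
        if peek() in ("+", "-"):
            expect(peek())
            var()

    def stmt():           # <stmt> -> <var> := <expr>
        var()
        expect(":=")
        expr()

    def stmt_list():      # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            expect(";")
            stmt_list()

    def program():        # <program> -> begin <stmt_list> end
        expect("begin")
        stmt_list()
        expect("end")
        if peek() is not None:
            raise SyntaxError("unexpected input after 'end'")

    try:
        program()
        return True
    except SyntaxError:
        return False


print(accepts("begin A := B + C ; B := C end"))
print(accepts("begin A := B + end"))
```

Because `stmt_list` can call itself after every `;`, the recognizer accepts programs with any number of statements, which bears on the finite-or-infinite question.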

  17. Context-Free Grammar (cont.) • Left recursive rule – A BNF rule is left recursive if its left-hand side (LHS) appears at the beginning of its right-hand side (RHS). • Right recursive rule – A BNF rule is right recursive if its LHS appears at the right end of its RHS. • Examples:
     number → number digit | digit
     digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     expr → expr + expr | expr * expr | ( expr ) | number
     • Uses of recursion in BNF: • to show repetition • to describe complex structures

  18. Parse Trees • A parse tree is a graphical representation of hierarchical syntactic structure of sentences. It describes graphically the replacement process in a derivation. • A parse tree is labeled by nonterminals at interior nodes and terminals at leaves. • A parse tree better expresses the structure inherent in a derivation.

  19. Parse Trees (cont.) Problem 1:
     <assign> → <id> := <expr>
     <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
     <id> → A | B | C
     Show a leftmost derivation and a parse tree for each of the following statements:
     A := A + ( B * C )
     A := B + C + A
     A := A * ( B + C )
     A := B * ( C * ( A + B ) )

  20. Parse Trees (cont.) Problem 2: Describe, in English, the language defined by the following grammar:
     <S> → <A> <B> <C>
     <A> → a <A> | a
     <B> → b <B> | b
     <C> → c <C> | c
     Problem 3: Consider the following grammar:
     <S> → <A> a <B> b
     <A> → <A> b | b
     <B> → a <B> | a
     Which of the following sentences are in the language generated by this grammar? baab, bbbab, bbaaaaa, bbaab

  21. Parse Trees (cont.) Problem 4: Consider the following grammar:
     <S> → a <S> c <B>
     <S> → <A> | b
     <A> → c <A> | c
     <B> → d | <A>
     Which of the following sentences are in the language generated by the grammar? abcd, acccbd, acccbcc, acd, accc

  22. Abstract Syntax Trees • Parse trees are still too detailed in their structure, since every step in a derivation is expressed as nodes • An abstract syntax tree (AST), or just syntax tree, shows the essential structure of a parse tree. • An AST is more compact than the corresponding parse tree • An (abstract) syntax tree condenses a parse tree to its essential structure • Language designers and translator writers are most interested in abstract syntax. • A programmer is most interested in concrete syntax • Examples on the next two slides…

  23. Abstract Syntax Trees (cont.) [Figure: a parse tree and its corresponding AST]

  24. Abstract Syntax Trees (cont.) [Figure: a parse tree and its corresponding AST]
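
The figures for these two slides are not in the transcript. As a stand-in, the following Python sketch condenses a hand-written parse tree into an AST. It is a hypothetical example rather than the one on the slides: it assumes the rules expr → expr + expr | ( expr ) | number, treats each number as a single leaf, and represents trees as nested tuples.

```python
# Parse tree for "( 2 + 3 ) + 4": each node is (label, child, child, ...),
# and terminals are plain strings.
PARSE_TREE = (
    "expr",
    ("expr",
        "(",
        ("expr",
            ("expr", ("number", "2")),
            "+",
            ("expr", ("number", "3"))),
        ")"),
    "+",
    ("expr", ("number", "4")),
)


def to_ast(node):
    """Condense a parse tree node into an AST: an int or (operator, left, right)."""
    label, *children = node
    if label == "number":
        return int(children[0])
    if len(children) == 1:          # expr -> number: skip the chain node
        return to_ast(children[0])
    if children[0] == "(":          # expr -> ( expr ): parentheses carry no meaning of their own
        return to_ast(children[1])
    left, operator_symbol, right = children
    return (operator_symbol, to_ast(left), to_ast(right))


print(to_ast(PARSE_TREE))
```

The result keeps only operators and operands. The grouping that the parentheses expressed survives as the shape of the tree, so the parenthesis tokens and the intermediate `expr` nodes can be dropped.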

  25. Ambiguous Grammars • A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string • Example: • Grammar: expr → expr + expr | expr * expr | ( expr ) | NUMBER • Expression: 2 + 3 * 4 • Parse trees – ambiguity in operator precedence

  26. Ambiguous Grammars (cont.) • Another example: • Grammar: expr → expr + expr | expr - expr | ( expr ) | NUMBER • Expression: 2 - 3 - 4 • Parse trees – ambiguity in operator associativity
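
Ambiguity matters because different parse trees can mean different things. The sketch below writes the two candidate trees for each expression by hand, as nested tuples of the form (operator, left, right), and evaluates them. The tuple representation and the name `evaluate` are choices made for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree written as (operator, left, right) or as a plain int."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# Two parse trees for 2 + 3 * 4 (precedence ambiguity).
print(evaluate(("+", 2, ("*", 3, 4))))   # multiplication grouped first
print(evaluate(("*", ("+", 2, 3), 4)))   # addition grouped first

# Two parse trees for 2 - 3 - 4 (associativity ambiguity).
print(evaluate(("-", ("-", 2, 3), 4)))   # left-associative grouping
print(evaluate(("-", 2, ("-", 3, 4))))   # right-associative grouping
```

The first pair evaluates to 14 and 20, the second to -5 and 3, so the grammar alone does not determine the value of either expression.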

  27. Ambiguous Grammars (cont.) • Ways to resolve ambiguities in a grammar • Revise the grammar – the preferred approach • Provide a disambiguating rule (semantic help) • Revising a grammar to address precedence and associativity ambiguities • Do not write rules that allow a parse tree to grow on both the left and right sides • Use left recursive rules for left-associative operators • Use right recursive rules for right-associative operators • Add new rules that establish a “precedence cascade” between rules to specify precedence • Make sure operators with higher precedence appear lower in the cascade of rules • Revised grammar:
     expr → expr + term | term
     term → term * factor | factor
     factor → ( expr ) | NUMBER
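
The effect of the revision can be checked by counting parse trees. The sketch below is a small memoized counter, not a production parser; it assumes a grammar with no empty productions and no cycles of single-nonterminal rules, which holds for both grammars here. The ambiguous grammar is taken to have + and * as its binary operators, which is an assumption, and the names `count_trees`, `AMBIGUOUS` and `REVISED` are made up for this example.

```python
from functools import lru_cache

AMBIGUOUS = {
    "expr": [("expr", "+", "expr"), ("expr", "*", "expr"),
             ("(", "expr", ")"), ("NUMBER",)],
}
REVISED = {
    "expr":   [("expr", "+", "term"), ("term",)],
    "term":   [("term", "*", "factor"), ("factor",)],
    "factor": [("(", "expr", ")"), ("NUMBER",)],
}


def count_trees(grammar, start, tokens):
    """Count the distinct parse trees that derive tokens from start."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of ways symbol derives tokens[i:j].
        if symbol not in grammar:  # terminal: must be exactly one matching token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbol sequence rhs derives tokens[i:j].
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol takes tokens[i:k]; every remaining symbol needs >= 1 token.
        for k in range(i + 1, j - len(rhs) + 2):
            first = count_symbol(rhs[0], i, k)
            if first:
                total += first * count_sequence(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(tokens))


TOKENS = ["NUMBER", "+", "NUMBER", "*", "NUMBER"]   # the shape of 2 + 3 * 4
print(count_trees(AMBIGUOUS, "expr", TOKENS))
print(count_trees(REVISED, "expr", TOKENS))
```

For this token sequence the ambiguous grammar yields 2 trees and the revised grammar yields 1, matching the claim that the precedence cascade removes the ambiguity.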

  28. Ambiguous Grammars (cont.) Problem 1:
     <expr> → <expr> + <expr> | <expr> - <expr> | <expr> * <expr> | <expr> / <expr> | ( <expr> ) | NUMBER
     NUMBER = [0-9]+
     Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:
     30 + 5 + 2
     30 - 5 - 2
     30 * 5 * 2
     30 / 5 / 2
     30 + 5 * 2

  29. Ambiguous Grammars (cont.) • Revised unambiguous grammar:
     <expr> → <expr> + <term> | <expr> - <term> | <term>
     <term> → <term> * <factor> | <term> / <factor> | <factor>
     <factor> → ( <expr> ) | NUMBER
     NUMBER = [0-9]+

  30. Ambiguous Grammars (cont.) Problem 2: Show that the following grammar is ambiguous:
     <S> → <A>
     <A> → <A> + <A> | <id>
     <id> → a | b | c

  31. Ambiguous Grammars (cont.) • Are there other alternatives for resolving ambiguities? • Yes, but they change the language! • Fully parenthesized expressions: expr → ( expr + expr ) | ( expr - expr ) | NUMBER • Prefix expressions: expr → + expr expr | - expr expr | NUMBER
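
Prefix notation needs no precedence or associativity rules because every operator is followed by exactly two operands. The sketch below evaluates strings of the prefix grammar with one recursive function; it assumes whitespace-separated tokens, and `eval_prefix` is a name made up for this example.

```python
import re


def eval_prefix(source):
    """Evaluate expr -> + expr expr | - expr expr | NUMBER (whitespace-separated)."""
    tokens = source.split()
    pos = 0

    def expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token in ("+", "-"):
            left = expr()    # the operator is always followed by two operands
            right = expr()
            return left + right if token == "+" else left - right
        if re.fullmatch(r"[0-9]+", token):
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    value = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the expression")
    return value


print(eval_prefix("- - 2 3 4"))   # the grouping (2 - 3) - 4
print(eval_prefix("- 2 - 3 4"))   # the grouping 2 - (3 - 4)
```

The two groupings of 2 - 3 - 4 are now different strings, evaluating to -5 and 3, so each string has only one parse. The cost is the one the slide names: the language itself has changed.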

  32. Extended BNF • Adds new metasymbols (or operations) to BNF to enhance readability and writability. • These extensions do not enhance the descriptive power of BNF. • EBNF facilitates the development of parsing tools based on an approach called recursive-descent parsing. • New metasymbols added in EBNF: • { } zero or more repetitions • [ ] optional parts • ( | ) multiple choice

  33. Extended BNF (cont.) • Examples:
     BNF: <number> → <number> <digit> | <digit>
     EBNF: <number> → <digit> {<digit>}
     BNF: <expr> → <expr> + <term> | <term>
     EBNF: <expr> → <term> {+ <term>}
     BNF: <expr> → <term> ^ <expr> | <term>
     EBNF: <expr> → <term> [^ <expr>]
     BNF: <selection> → if <logic-expr> then <statement> | if <logic-expr> then <statement> else <statement>
     EBNF: <selection> → if <logic-expr> then <statement> [else <statement>]
     BNF: <for-stmt> → for <var> := <expr> to <expr> do <statement> | for <var> := <expr> downto <expr> do <statement>
     EBNF: <for-stmt> → for <var> := <expr> (to | downto) <expr> do <statement>

  34. Extended BNF (cont.) • More examples:
     BNF:
     <expr> → <expr> + <term> | <term>
     <term> → <term> * <power> | <term> / <power> | <term> % <power> | <power>
     <power> → <factor> ^ <power> | <factor>
     <factor> → (<expr>) | NUMBER
     NUMBER = [0-9]+
     EBNF:
     <expr> → <term> {+ <term>}
     <term> → <power> { * <power> | / <power> | % <power> }
     <power> → <factor> [^ <power>]
     <factor> → (<expr>) | NUMBER
     NUMBER = [0-9]+
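
The EBNF version maps almost line for line onto a recursive-descent parser: each `{ }` becomes a loop and each `[ ]` becomes an `if`. The following Python sketch evaluates expressions while parsing. The function name `calc` is made up for this example, and treating `/` as integer division is an assumption, since the slides do not define the operators' meaning.

```python
import re


def calc(source):
    """Parse and evaluate an expression using the EBNF rules, one function per rule."""
    # A number is one token; any other non-space character is a token of its own.
    tokens = re.findall(r"[0-9]+|\S", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():      # <expr> -> <term> {+ <term>}
        value = term()
        while peek() == "+":          # the { } repetition becomes a loop
            advance()
            value += term()
        return value

    def term():      # <term> -> <power> { * <power> | / <power> | % <power> }
        value = power()
        while peek() in ("*", "/", "%"):
            op = advance()
            right = power()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right       # assumption: / is integer division
            else:
                value %= right
        return value

    def power():     # <power> -> <factor> [^ <power>]
        base = factor()
        if peek() == "^":             # the [ ] option becomes an if
            advance()
            return base ** power()    # recursion on the right: right-associative
        return base

    def factor():    # <factor> -> (<expr>) | NUMBER
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and re.fullmatch(r"[0-9]+", token):
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value


print(calc("2 + 3 * 4"))
print(calc("2 ^ 3 ^ 2"))
```

The loops accumulate from left to right, which gives `+`, `*`, `/` and `%` left associativity, while the recursive call in `power` makes `^` right-associative, so `2 ^ 3 ^ 2` is read as `2 ^ (3 ^ 2)`.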

  35. Syntax Diagrams • A graphical representation of a grammar rule • An alternative to EBNF • Circles or ovals for terminals • Squares or rectangles for nonterminals • Terminals and nonterminals are connected with lines and arrows • Visually appealing, but they take up space • Rarely seen any more: EBNF is much more compact
