Language Grammars: Definitions and Parsing Techniques

Languages
• A language L over a finite alphabet Σ is a set of strings of characters from the alphabet.
• Examples are:
• L = { s | s is a grammatical English sentence } and Σ = { w | w is a word in the dictionary } (consisting of over a hundred thousand words, each treated as an atomic symbol).
• L = { s | s is a string of digits } and Σ = { 0,1,2,3,4,5,6,7,8,9 }
• L = { P | P is a string of ASCII characters forming a Java program } and Σ = the printable ASCII character set.

Grammars
A context free grammar G is a 4-tuple: G = ( V, Σ, P, S ) where
1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as “the big box”, and T, which can take on string values used to represent products in an algebraic expression.
2. Σ is a finite alphabet. The alphabet contains the symbols from which language strings are formed. Examples are the English vocabulary (consisting of over a hundred thousand words, each treated as an atomic symbol), the printable ASCII character set, and the binary alphabet {0,1}.
3. P is a finite set of productions or rules used to define the sublanguages represented by the nonterminals. In a context free grammar, a rule has the format A → X where A ∈ V and X ∈ ( V ∪ Σ )*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the format indicated by X: each terminal character in X is used as is in the A string, and for each variable in X a string from its sublanguage is substituted. Examples are <noun phrase> → <determiner> <adj-list> <noun> and T → a * T.
4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.

Grammars and Derivations
Derivations: If u, v are strings in ( V ∪ Σ )*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv “derives” uXv by application of the rule A → X. For repeated application of 0 or more rules, the symbol ⇒* is used.
Language Definition: The language L(G) defined by G is { x | x ∈ Σ*, S ⇒* x }
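A single derivation step uAv ⇒ uXv can be sketched in a few lines of Python. This is only an illustration: it borrows the decimal-number grammar that appears later in these slides, writes "." for the point symbol, and assumes that uppercase letters are nonterminals. The names RULES, derive_step and derive are made up for the example.

```python
# Hypothetical encoding of the decimal-number grammar from the slides.
# Each rule A -> X is a pair (A, X); "." stands in for the point symbol.
RULES = [("I", "dI"), ("I", "d"), ("I", ".D"), ("D", "dD"), ("D", "d")]


def derive_step(form, rule):
    """Apply rule A -> X to the leftmost occurrence of A: uAv => uXv."""
    lhs, rhs = rule
    pos = form.find(lhs)
    if pos < 0:
        raise ValueError(f"{lhs} does not occur in {form!r}")
    return form[:pos] + rhs + form[pos + 1:]


def derive(start, rule_indices):
    """Apply the rules with the given indices in order and
    return every sentential form along the way."""
    forms = [start]
    for index in rule_indices:
        forms.append(derive_step(forms[-1], RULES[index]))
    return forms


steps = derive("I", [0, 0, 2, 3, 3, 4])
print(" => ".join(steps))
# I => dI => ddI => dd.D => dd.dD => dd.ddD => dd.ddd
```

Because the last form contains no nonterminals, the chain shows that I ⇒* dd.ddd, so dd.ddd belongs to the language of this grammar.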

Parsing
• Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?
• Parsing attempts to find a sequence of rules by which S ⇒* X.

Parse tree for d d . d d d
I
  d
  I
    d
    I
      •
      D
        d
        D
          d
          D
            d
Grammar for Decimal Numbers
I → d I
I → d
I → • D
D → d D
D → d
A parse tree has intermediate nodes for nonterminals, a child node for each RHS character in the production used to replace the nonterminal, and a leaf node for each character in the language string produced by the derivation. The language is the set of strings for which there exist parse trees.
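As a rough illustration of how such a tree can be built, the sketch below parses a string with the decimal-number grammar and returns nested tuples of the form (nonterminal, child, child, ...). It is a hand-written recogniser for this one grammar, not a general method. The function names are invented for the example and "." stands in for the point symbol.

```python
def parse_I(s, pos=0):
    """I -> d I | d | . D ; returns a tree covering s[pos:]."""
    if pos < len(s) and s[pos] == "d":
        if pos + 1 == len(s):
            return ("I", "d")                      # I -> d
        return ("I", "d", parse_I(s, pos + 1))     # I -> d I
    if pos < len(s) and s[pos] == ".":
        return ("I", ".", parse_D(s, pos + 1))     # I -> . D
    raise ValueError(f"unexpected input at position {pos}")


def parse_D(s, pos):
    """D -> d D | d ; returns a tree covering s[pos:]."""
    if pos < len(s) and s[pos] == "d":
        if pos + 1 == len(s):
            return ("D", "d")                      # D -> d
        return ("D", "d", parse_D(s, pos + 1))     # D -> d D
    raise ValueError(f"unexpected input at position {pos}")


def preorder(tree):
    """List node labels, parents before children, left to right."""
    if isinstance(tree, str):
        return [tree]
    labels = [tree[0]]
    for child in tree[1:]:
        labels.extend(preorder(child))
    return labels


def leaves(tree):
    """Concatenate the leaf labels, which spell the parsed string."""
    if isinstance(tree, str):
        return tree
    return "".join(leaves(child) for child in tree[1:])


tree = parse_I("dd.ddd")
print(" ".join(preorder(tree)))   # I d I d I . D d D d D d
print(leaves(tree))               # dd.ddd
```

Reading the leaves from left to right gives back the source string, which is the property the slide describes.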

A Grammar for Sentences

A Grammar for Sentences Alphabet or Vocabulary

Top down Left to Right Parse
Repeat: select a rule to replace the leftmost nonterminal, choosing a rule whose right hand side will ultimately generate a prefix of the remaining source.

Top down Left to Right Parse
Leftmost character of the sentential form is <Det>. Select the rule <Determiner> → [the] and click <Det> to “expand”.

Top down Left to Right Parse

Lexemes and Tokens
• A lexeme is a string of terminal characters belonging to some lexical class such as adjective, determiner, noun, etc. Examples are:
• “young” - adjective - a
• “the” - determiner - d
• “woman” - noun - n
• A token is a lexeme paired with a syntactic or lexical code. Examples are:
• <“young”,a>
• <“the”,d>
• <“woman”,n>
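A lexical analyzer of this kind can be sketched as a dictionary lookup. The lexicon below is hypothetical: the codes d, a and n come from the examples above, while v (verb) and p (preposition) are assumed from the token string dnvdanpdn used in a later example.

```python
# Hypothetical lexicon mapping each lexeme to its lexical code.
LEXICON = {
    "the": "d",                                          # determiner
    "young": "a",                                        # adjective
    "woman": "n", "dog": "n", "boy": "n", "leg": "n",    # nouns
    "bit": "v",                                          # verb (assumed code)
    "in": "p",                                           # preposition (assumed code)
}


def tokenize(sentence):
    """Split a sentence into lexemes and pair each with its code."""
    tokens = []
    for lexeme in sentence.lower().split():
        if lexeme not in LEXICON:
            raise ValueError(f"unknown lexeme: {lexeme!r}")
        tokens.append((lexeme, LEXICON[lexeme]))
    return tokens


tokens = tokenize("the young woman")
print(tokens)   # [('the', 'd'), ('young', 'a'), ('woman', 'n')]

codes = "".join(code for _, code in tokenize("the dog bit the young boy in the leg"))
print(codes)    # dnvdanpdn
```

The parser then works on the string of codes only, so the grammar needs one terminal per lexical class rather than one per word.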

Finite state automata and language recognition
[Diagram: states S, I, F and D with transitions labelled d and •]
The finite state automaton has Σ = {d,•}, start state S and legal final states I and D. The transition function is represented by the diagram above or the table below:
      d    •
S     I    F
I     I    D
F     D
D     D    -
Accepts: ddd, d.dd, .ddd
Rejects: d.dd.d
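The automaton can be simulated with a dictionary keyed by (state, character). The sketch below uses the transition table as it reads in the flattened slide text, with "." for the point symbol. Two assumptions are worth flagging: the entry for state I on the point is taken to be D, and the missing entry for state F on the point is treated as "no transition". If the original diagram differs, only those entries need to change.

```python
# Transition function read from the slide's table (see assumptions above).
TRANSITIONS = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",   # assumed reading of the table
    ("F", "d"): "D",                    # no entry for ("F", "."): no transition
    ("D", "d"): "D",                    # "-" in the table: no transition on "."
}
START = "S"
FINAL = {"I", "D"}


def accepts(source):
    """Run the automaton over source; accept if it ends in a final state."""
    state = START
    for ch in source:
        state = TRANSITIONS.get((state, ch))
        if state is None:       # undefined transition: reject immediately
            return False
    return state in FINAL


for s in ["ddd", "d.dd", ".ddd", "d.dd.d"]:
    print(s, accepts(s))
# ddd True
# d.dd True
# .ddd True
# d.dd.d False
```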

Top down Left to Right Parse
• LL(1) Parsing:
• Start with the nonterminal representing the language as the unmatched sentential form.
• Repeat until the source string has been generated or until failure:
• Let X be the leftmost character of the unmatched sentential form.
• If X is a terminal, it must be the first character of the remaining source (otherwise failure).
• If X is a nonterminal, then the rules for X must not overlap as far as the 1st character generated by a rule.
• Select the rule which generates (in 1 step or more) a 1st character matching the next source character and apply this rule.

Example Parse
LL(1) parse table
Grammar
• S → NvNP
• P → λ
• P → pN
• N → dAn
• A → aA
• A → λ

LL(1) Parsing
FIRST: Define First(X) as the set of characters which can begin a string derived from grammar symbol X.
FOLLOW: Define Follow(X) as the set of characters which can follow grammar symbol X in a string derived from the start symbol S.
Pseudo Code
First:
If X is a terminal then First(X) = {X}
If X is a nonterminal and X → λ then add λ to First(X)
If X → X1X2..XkXk+1..Xn with λ in First(Xi), 1 <= i <= k, for some k with 0 <= k < n, then add First(Xk+1) – {λ} to First(X), and if λ in First(Xi), 1 <= i <= n, add λ to First(X)
Follow:
$ is in Follow(S)
If A → αBβ with β <> λ, then add First(β) – {λ} to Follow(B)
If A → αB or A → αBβ with λ in First(β), then add Follow(A) to Follow(B)
LL(1) parse table
Let T be a table with rows for nonterminals and columns for terminals and the end marker $.
If rule Ri is A → α and t is in First(α) then enter i in T(A,t).
If rule Ri is A → α and λ is in First(α) and t is in Follow(A) then enter i in T(A,t).
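The First, Follow and parse-table rules can be sketched as fixed-point loops. The code below is one possible reading, not the slides' own implementation. It assumes the example grammar S → NvNP, P → λ | pN, N → dAn, A → aA | λ, numbers the rules 1 to 6 in that order, represents λ by the empty string, and removes λ from First(Xk+1) before adding it to First(X). All function names are made up for the sketch.

```python
EPS = ""    # stands for lambda (the empty string)
END = "$"   # end-of-source marker
RULES = [("S", "NvNP"), ("P", ""), ("P", "pN"),
         ("N", "dAn"), ("A", "aA"), ("A", "")]
NONTERMINALS = {lhs for lhs, _ in RULES}


def first_of_string(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for x in symbols:
        fx = first[x] if x in NONTERMINALS else {x}
        result |= fx - {EPS}
        if EPS not in fx:          # x cannot vanish, so stop here
            return result
    result.add(EPS)                # every symbol can derive lambda
    return result


def compute_first():
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:                 # repeat until nothing new is added
        changed = False
        for lhs, rhs in RULES:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="S"):
    follow = {a: set() for a in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for i, b in enumerate(rhs):
                if b not in NONTERMINALS:
                    continue
                beta_first = first_of_string(rhs[i + 1:], first)
                new = beta_first - {EPS}
                if EPS in beta_first:      # beta can vanish (or is empty)
                    new |= follow[lhs]
                if not new <= follow[b]:
                    follow[b] |= new
                    changed = True
    return follow


def build_table(first, follow):
    table = {}
    for number, (lhs, rhs) in enumerate(RULES, start=1):
        alpha_first = first_of_string(rhs, first)
        lookaheads = alpha_first - {EPS}
        if EPS in alpha_first:
            lookaheads |= follow[lhs]
        for t in lookaheads:
            if (lhs, t) in table:
                raise ValueError(f"grammar is not LL(1) at {(lhs, t)}")
            table[(lhs, t)] = number
    return table


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
for key in sorted(table):
    print(key, table[key])
```

A conflict in build_table (two rules landing in the same cell) is the signal that the grammar is not LL(1).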

LL(1) Parsing – Computation of First & Follow

Example of First & Follow LL(1) parse table

Computation of First And Follow

Computation of Rule Signatures and LL(1) Parse Table

LL(1) parse – Example 2 : the dog bit the young boy in the leg → dnvdanpdn (tokens generated by lexical analyzer)
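A table-driven LL(1) parse of this token string might look like the sketch below. The grammar is the example grammar S → NvNP, P → λ | pN, N → dAn, A → aA | λ with rules numbered 1 to 6 in that order. The parse table is an assumption worked out by hand from the First and Follow definitions, and the stack-based driver is a common way to organise the procedure rather than something taken from the slides.

```python
# Rules numbered as in the example grammar; "" stands for lambda.
RULES = {1: ("S", "NvNP"), 2: ("P", ""), 3: ("P", "pN"),
         4: ("N", "dAn"), 5: ("A", "aA"), 6: ("A", "")}
# Hand-derived LL(1) table (assumption): (nonterminal, lookahead) -> rule number.
TABLE = {("S", "d"): 1, ("P", "$"): 2, ("P", "p"): 3,
         ("N", "d"): 4, ("A", "a"): 5, ("A", "n"): 6}
NONTERMINALS = {"S", "P", "N", "A"}


def ll1_parse(tokens, start="S"):
    """Return the list of rule numbers applied, or raise ValueError."""
    source = tokens + "$"
    stack = [start]          # unmatched sentential form, leftmost symbol on top
    pos = 0
    applied = []
    while stack:
        x = stack.pop()
        lookahead = source[pos]
        if x in NONTERMINALS:
            rule = TABLE.get((x, lookahead))
            if rule is None:
                raise ValueError(f"no rule for {x} on {lookahead!r}")
            applied.append(rule)
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(RULES[rule][1]))
        elif x == lookahead:
            pos += 1         # terminal matches the next source character
        else:
            raise ValueError(f"expected {x!r}, found {lookahead!r}")
    if source[pos] != "$":
        raise ValueError("source not fully consumed")
    return applied


print(ll1_parse("dnvdanpdn"))   # [1, 4, 6, 4, 5, 6, 3, 4, 6]
```

The returned rule numbers, read left to right, give the leftmost derivation of the token string.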

Tabular Parsing
Primitive Item: X . for X ∈ V ∪ Σ
Phrase Item: A → X . Y for some A → XY ∈ P.
• Source string z = z1 z2 . . zn where zi ∈ Σ
• Table Ti,j contains any primitive item X . for which X ⇒* zj . . zj+i-1 and any phrase item A → X . Y for which X ⇒* zj . . zj+i-1 .

Tabular Parsing Example
[Table figure; surviving item labels: N. dAn. dA.n d.An d.n d. A. a. . n.]

Tabular Parsing Example

Tabular Parsing Pseudo Code
Initialize Table
  for j = 1 to n, add zj . to T1,j
Process Rest of Table
  for j = n down to 1
    Process jth column
    for i = 1 to n-j+1
      repeat
        for each completed phrase item A → Z. in Ti,j add primitive item A. to Ti,j
        for each primitive item X. in Ti,j if A → XY ∈ P then add initial phrase item A → X . Y to Ti,j
        for each partial phrase item A → X.UY in Ti,j , Process( i,j, A → X.UY )
      until no change in Ti,j

Tabular Parsing Pseudo Code
Process( i,j, A → X.UY )
  Let k = i+j
  Process kth column
  for p = 1 to n-k+1
    for each primitive item U. in Tp,k add extended phrase item A → XU.Y to Ti+p,j
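The two pseudo code slides can be turned into a small recogniser. The sketch below is one interpretation under stated assumptions. It reads Ti,j as the cell for the substring of length i starting at position j, takes the loop bounds to be n-j+1 and n-k+1 so that spans stay inside the source, and is run on the λ-free decimal-number grammar (with "." for the point symbol) because the pseudo code does not say how λ rules are handled. Primitive items are stored as plain symbols and phrase items as (A, X, Y) triples; these encodings and the function names are invented for the example.

```python
from collections import defaultdict

# Decimal-number grammar from the slides; "." stands in for the point symbol.
RULES = [("I", "dI"), ("I", "d"), ("I", ".D"), ("D", "dD"), ("D", "d")]


def process(table, n, i, j, item):
    """Extend partial phrase item A -> X.UY in T[i,j] using column k = i+j."""
    lhs, done, rest = item
    k = i + j
    for p in range(1, n - k + 2):
        if rest[0] in table[p, k]:                 # primitive item U. in T[p,k]
            table[i + p, j].add((lhs, done + rest[0], rest[1:]))


def tabular_parse(z, start="I"):
    """Return True if start derives z, filling T[i,j] column by column."""
    n = len(z)
    table = defaultdict(set)
    for j in range(1, n + 1):
        table[1, j].add(z[j - 1])                  # primitive item z_j .
    for j in range(n, 0, -1):                      # columns right to left
        for i in range(1, n - j + 2):              # span lengths, shortest first
            cell = table[i, j]
            changed = True
            while changed:                         # repeat until no change
                before = len(cell)
                for item in list(cell):
                    if isinstance(item, str):      # primitive item X.
                        for lhs, rhs in RULES:
                            if rhs[0] == item:     # initial phrase item A -> X.Y
                                cell.add((lhs, rhs[:1], rhs[1:]))
                    elif item[2] == "":            # completed phrase item A -> Z.
                        cell.add(item[0])
                    else:                          # partial phrase item
                        process(table, n, i, j, item)
                changed = len(cell) != before
    return start in table[n, 1]


for s in ["dd.ddd", "d.d", "d.", "d.dd.d"]:
    print(s, tabular_parse(s))
# dd.ddd True
# d.d True
# d. False
# d.dd.d False
```

Processing columns from right to left means that column k = i+j is already complete when Process consults it, and the cells T[i+p,j] it writes to are visited later in the same column.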

Tabular Parsing Example An attribute grammar for converting Infix to Postfix Note: Grammar is not LL(1)

Tabular Parsing Based Translation

Tabular Parsing Example 3
