Parsing and Ambiguity Problem in Context-Free Grammars
Determine whether a string x is in the language L(G) of a CFG G = (V, Σ, R, S) and, if so, find a derivation S ⇒* x. Study parse trees as a tool for the parsing problem, explore leftmost derivations, and learn how ambiguity in CFGs arises and how it can be resolved.
Given a string x and a CFG G = (V, Σ, R, S), determine whether x ∈ L(G) and, if x ∈ L(G), find a derivation S ⇒* x. This problem is called parsing. To solve it, we first study the parse tree.
The parse tree is a graph representation of a derivation. Parse trees are defined as follows:
(1) A single vertex labeled with a nonterminal symbol is a parse tree.
(2) If A → y1y2 … yn is a rule in R, then the tree whose root is labeled A and whose children, from left to right, are labeled y1, y2, …, yn is a parse tree.
(3) If A → ε is a rule in R, then the tree whose root is labeled A and whose only child is labeled ε is a parse tree.
(4) If a leaf of one parse tree is the root of another parse tree, then their union (identifying that leaf with that root) is a parse tree.
(5) Nothing else is a parse tree.
Every derivation has a parse tree. Consider the CFG G = ({S}, {a, b, c}, R, S) where R = {S → SbS | ScS | a}. The derivation S ⇒ SbS ⇒ SbScS ⇒ abScS ⇒ abSca ⇒ abaca has the following parse tree, written in bracket notation [label children…]:
[S [S a] b [S [S a] c [S a]]]
However, a parse tree may correspond to several derivations. For example, the derivation S ⇒ SbS ⇒ SbScS ⇒ SbSca ⇒ abSca ⇒ abaca has the same parse tree as above.
Leftmost Derivation
A derivation S ⇒* y is called a leftmost derivation, written S ⇒*_left y, if y is obtained from S by a sequence of steps, each of which applies a rule to the leftmost nonterminal symbol. For example, the leftmost derivation with the parse tree above is S ⇒_left SbS ⇒_left abS ⇒_left abScS ⇒_left abacS ⇒_left abaca. Each parse tree corresponds to exactly one leftmost derivation.
For x ∈ L(G), a parse tree for S ⇒* x has at least |x| leaves (leaves labeled ε add nothing to x), and the concatenation of the leaf labels, read from left to right, is x.
Ambiguity
A string x in L(G) may have two or more parse trees witnessing S ⇒* x. The grammar G is said to be ambiguous if such a string exists. The CFG G = ({S}, {a, b, c}, R, S) with R = {S → SbS | ScS | a} is ambiguous because abaca has two parse trees:
[S [S [S a] b [S a]] c [S a]]
[S [S a] b [S [S a] c [S a]]]
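For this particular grammar the parse trees of a string can be counted directly: a tree for a substring either uses S → a, or has a root S → SbS or S → ScS whose operator sits at some position k, with independent subtrees on each side. The memoized recursion below is a hand-derived sketch for this grammar only (the name `count_parse_trees` is hypothetical); it is not a general ambiguity test, since deciding ambiguity is undecidable.

```python
from functools import lru_cache


def count_parse_trees(x: str) -> int:
    """Number of parse trees with root S and yield x for S -> SbS | ScS | a."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Trees whose yield is the substring x[i:j].
        total = 1 if x[i:j] == "a" else 0          # rule S -> a
        # Rules S -> SbS and S -> ScS: the root's operator is x[k],
        # with nonempty yields x[i:k] on the left and x[k+1:j] on the right.
        for k in range(i + 1, j - 1):
            if x[k] in "bc":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(x))


print(count_parse_trees("abaca"))    # 2
print(count_parse_trees("abacaba"))  # 5
```

A count greater than 1 exhibits ambiguity: abaca has the two trees shown above, and abacaba, with three operators, has five.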
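Removing ambiguity is an important issue in compiler theory. However, determining whether a CFG is ambiguous is undecidable. The CFG G = ({S, A}, {0, 1}, R, S) with R = {S → A00, A → ε | AA | 0 | 1} is ambiguous because 00 has (at least) two parse trees:
[S [A ε] 0 0]
[S [A [A ε] [A ε]] 0 0]
In fact 00 has infinitely many parse trees, because A → AA can be applied any number of times before every A is rewritten to ε.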
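Removing the rule A → ε, and adding the rule S → 00 so that 00 stays in the language, eliminates these extra parse trees for 00: CFG G = ({S, A}, {0, 1}, R, S) where R = {S → 00 | A00, A → AA | 0 | 1}. The rule A → AA still causes ambiguity for longer strings, however: 000 has two parse trees from A, so 00000 has two parse trees from S. Replacing the A-rules with A → 0A | 1A | 0 | 1 gives an unambiguous grammar for the same language.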
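Because ambiguity cannot be decided in general, a practical sanity check is to count parse trees on sample strings. The sketch below uses hand-written counters (hypothetical names, single-character symbols assumed) for S → 00 | A00 with A → AA | 0 | 1, and, for comparison, with the right-linear variant A → 0A | 1A | 0 | 1. It shows 00000 has two trees under A → AA but one under the variant; agreement on a few samples is evidence, not proof, of unambiguity.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees_A_binary(w: str) -> int:
    """Parse trees of w from A under A -> AA | 0 | 1."""
    total = 1 if w in ("0", "1") else 0
    for k in range(1, len(w)):      # A -> AA: split w into two nonempty parts
        total += trees_A_binary(w[:k]) * trees_A_binary(w[k:])
    return total


@lru_cache(maxsize=None)
def trees_A_linear(w: str) -> int:
    """Parse trees of w from A under the variant A -> 0A | 1A | 0 | 1."""
    if w in ("0", "1"):
        return 1                        # only A -> 0 or A -> 1 applies
    if len(w) > 1 and w[0] in "01":
        return trees_A_linear(w[1:])    # only A -> w[0]A applies
    return 0


def trees_S(w: str, trees_A) -> int:
    """Parse trees of w under S -> 00 | A00, given a tree counter for A."""
    total = 1 if w == "00" else 0       # S -> 00
    if len(w) > 2 and w.endswith("00"):
        total += trees_A(w[:-2])        # S -> A00 with A deriving the rest
    return total


for w in ["00", "000", "0100", "00000"]:
    print(w, trees_S(w, trees_A_binary), trees_S(w, trees_A_linear))
# 00 1 1
# 000 1 1
# 0100 1 1
# 00000 2 1
```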
Parsing Algorithm
A string w in (V ∪ Σ)* is a left sentential form if S ⇒*_left w. The leftmost graph g(G) of a CFG G is defined as follows:
(a) the vertex set is the set of all left sentential forms;
(b) there is a directed edge (x, y) if x ⇒_left y.
Usually g(G) is an infinite digraph; for example, it is infinite whenever L(G) is infinite, since every string in L(G) is itself a left sentential form.
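The out-edges of g(G) are easy to generate. The sketch below assumes single-character symbols, a hypothetical dict `RULES` holding the example grammar S → SbS | ScS | a, and an empty string for an ε right-hand side; `leftmost_successors` is an illustrative name.

```python
# Example grammar S -> SbS | ScS | a; single-character symbols are assumed,
# and an ε right-hand side would be written "".
RULES = {"S": ["SbS", "ScS", "a"]}


def leftmost_successors(w: str, rules=RULES) -> list:
    """Out-neighbours of w in g(G): apply each rule to the leftmost nonterminal."""
    for i, sym in enumerate(w):
        if sym in rules:
            return [w[:i] + rhs + w[i + 1:] for rhs in rules[sym]]
    return []  # no nonterminal left: w is a terminal string, a sink of g(G)


print(leftmost_successors("S"))    # ['SbS', 'ScS', 'a']
print(leftmost_successors("SbS"))  # ['SbSbS', 'ScSbS', 'abS']
print(leftmost_successors("aba"))  # []
```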
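If R contains no rule of the form A → ε, then g(G) is nondecreasing: every edge (x, y) satisfies |y| ≥ |x|, since each step replaces one nonterminal by a nonempty string. So, to parse x, it suffices to search the left sentential forms of length at most |x|, of which there are only finitely many. A depth-first or breadth-first search of this finite part of g(G), which records visited vertices so that cycles caused by rules such as A → B and B → A do not trap it, therefore solves the parsing problem.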
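A breadth-first version of this search, as a minimal sketch: it assumes single-character symbols, rules given as a dict of right-hand-side strings, and no ε-rules. It also adds one pruning step not stated in the slides: in a leftmost derivation the terminals to the left of the leftmost nonterminal never change, so a form whose terminal prefix disagrees with x can be dropped. The name `bfs_parse` is hypothetical.

```python
from __future__ import annotations

from collections import deque


def bfs_parse(x: str, rules: dict[str, list[str]], start: str = "S") -> list[str] | None:
    """Breadth-first search of g(G) for a leftmost derivation start =>* x.

    Assumes no rule A -> ε, so forms never shrink and any form longer than x
    can be discarded; together with the visited set this makes the search finite.
    """
    parent: dict[str, str | None] = {start: None}   # visited set + back-pointers
    queue = deque([start])
    while queue:
        w = queue.popleft()
        if w == x:
            path = []
            node: str | None = w
            while node is not None:                  # walk back to the start symbol
                path.append(node)
                node = parent[node]
            return path[::-1]
        # Leftmost nonterminal; a terminal string other than x is a dead end.
        i = next((k for k, s in enumerate(w) if s in rules), None)
        if i is None:
            continue
        # Extra pruning (not in the slides): the terminal prefix must match x.
        if w[:i] != x[:i]:
            continue
        for rhs in rules[w[i]]:
            y = w[:i] + rhs + w[i + 1:]
            if len(y) <= len(x) and y not in parent:
                parent[y] = w
                queue.append(y)
    return None                                      # x is not in L(G)


R = {"S": ["SbS", "ScS", "a"]}
print(bfs_parse("abaca", R))  # ['S', 'SbS', 'abS', 'abScS', 'abacS', 'abaca']
print(bfs_parse("ab", R))     # None
```

Since abaca is ambiguous, listing the rules in a different order could make the search return its other leftmost derivation instead.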