
Introduction to Context-Free Languages and Context-Free Grammars

Learn about context-free languages, context-free grammars, derivations, parse trees, and ambiguity in the context of syntax analysis. Explore push-down automata and their role in recognizing context-free languages.

jdaniels

Presentation Transcript


  1. Cairo University FCI, Compilers CS419, Lecture 11: Syntax Analysis: Context-Free Languages - Context-Free Grammars - Derivations - Parse Trees - Ambiguity - Push-Down Automata (PDA). Dr. Hussien Sharaf, Dr. Mohammad Nassef, Department of Computer Science, Faculty of Computers and Information, Cairo University. Welcome to a journey to

  2. Today • Grammar • Definition as 4-tuple • Regular Grammars (RGs) … left-linear vs. right-linear • Context Free Grammars (CFGs) • Context Sensitive Grammars (CSGs) • Context Free Languages (CFLs) … examples • Parse Trees • Derivations … leftmost vs. rightmost • Ambiguity and Disambiguation • Grammar Simplification FCI-CU-EG

  3. Regular Languages

  4. Context-Free Languages Regular Languages

  5. CFG & PDA

  6. Context-Free Languages: Context-Free Grammars; Pushdown Automata (stack + automaton)

  7. Pushdown Automaton (PDA) [Figure: input tape with tape head, stack with stack head, finite control]

  8. Pushdown Automaton (PDA) [Figure: input string, stack, states] Costas Busch - RPI

  9. What is a Grammar • A grammar is a precise description of a formal language. • It describes which sequences of symbols constitute valid words or sentences in that language. • Natural languages: • Arabic, English, French, Spanish, etc. • Programming languages: • C, C++, Java, C#, HTML, XML …

  10. What is a Grammar • A grammar G = <N, Σ, P, S> consists of the following components: • A finite set N of non-terminal symbols or variables. • A finite set Σ of terminal symbols that is disjoint from N. • A finite set P of production rules of the form (Σ ∪ N)* N (Σ ∪ N)* → (Σ ∪ N)*, where * is the Kleene star operator and ∪ denotes set union. Each production rule maps one string of symbols to another, and the left-hand side contains at least one non-terminal symbol. • A distinguished start symbol S ∈ N.

  11. Regular languages • A language is said to be a regular language if it is generated by a regular grammar. • A grammar is said to be regular if it is either right-linear or left-linear. • Specifically, a grammar G = <N, Σ, P, S> is said to be: • right-linear if each of its production rules is of the form A → xB or A → x, • left-linear if each of its production rules is of the form A → Bx or A → x, • where: • A and B are non-terminal symbols in N, and • x is a string of terminal symbols in Σ*.

  12. Example • Let A = {a,b,c}. Then the grammar for the language A* can be described by the following production rules: S → ε, S → aS, S → bS, S → cS • How do we know that this grammar describes the language A*? We must be able to describe each string of the language in terms of the grammar rules. • Exercise: prove that the string aacb is in A*.
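The exercise on slide 12 can be sketched in Python. This is only an illustration, assuming the rules S → ε and S → xS for each x in A; the function name `derive_in_a_star` is hypothetical and not part of the lecture.

```python
def derive_in_a_star(word, alphabet=("a", "b", "c")):
    """Return the sentential forms of a derivation of `word` from S,
    using S -> xS for each symbol x and finally S -> ε."""
    steps = ["S"]
    prefix = ""
    for symbol in word:
        if symbol not in alphabet:
            raise ValueError(f"{symbol!r} is not in the alphabet")
        prefix += symbol              # apply S -> symbol S
        steps.append(prefix + "S")
    steps.append(prefix)              # apply S -> ε to finish
    return steps


# Derivation of the string from the slide.
print(" => ".join(derive_in_a_star("aacb")))
```

Because every string over A can be built one symbol at a time in this way, each string in A* has a derivation, which is the argument the slide asks for.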

  13. Example • If A = {a,b,c} and P is the set of production rules, the grammar is G = <N,T,S,P> ≡ <{S,A,B}, {a,b,c}, S, P>, where P ≡ S → AB, A → ε | aA, B → ε | bB. • Let us derive the string aab: S⇒AB⇒aAB⇒aaAB⇒aaB⇒aabB⇒aab. • Note that a language can have more than one grammar, so we should not be surprised when two people come up with two different grammars for the same language.

  14. Combining grammars Suppose M and N are languages whose grammars have disjoint sets of non-terminals. Suppose also that the start symbols for the grammars of M and N are A and B respectively. We can obtain the following new languages and grammars: Union Rule: the language M ∪ N starts with the production rule S → A | B. Product Rule: the language M ∙ N starts with the production S → A B. Closure Rule: the language M* starts with the production S → AS | ε.
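The three combination rules can be sketched in Python. The representation is an assumption made for illustration: a grammar is a dictionary from each non-terminal to a list of right-hand sides, an empty list stands for ε, and the helper names (`_merge`, `union`, `product`, `closure`) are hypothetical.

```python
def _merge(*rule_sets):
    """Merge rule dictionaries, insisting that non-terminals are disjoint."""
    merged = {}
    for rules in rule_sets:
        overlap = merged.keys() & rules.keys()
        if overlap:
            raise ValueError(f"non-terminals must be disjoint: {sorted(overlap)}")
        merged.update(rules)
    return merged


def union(m_rules, m_start, n_rules, n_start, new_start="S"):
    # Union rule: S -> A | B
    return _merge({new_start: [[m_start], [n_start]]}, m_rules, n_rules)


def product(m_rules, m_start, n_rules, n_start, new_start="S"):
    # Product rule: S -> A B
    return _merge({new_start: [[m_start, n_start]]}, m_rules, n_rules)


def closure(m_rules, m_start, new_start="S"):
    # Closure rule: S -> A S | ε  (the empty list stands for ε)
    return _merge({new_start: [[m_start, new_start], []]}, m_rules)


# M = strings of a's (A -> aA | ε), N = strings of b's (B -> bB | ε).
M = {"A": [["a", "A"], []]}
N = {"B": [["b", "B"], []]}
print(product(M, "A", N, "B"))
```

With these two example grammars, the product rule yields the grammar S → AB, A → aA | ε, B → bB | ε.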

  15. Context-free languages • A language is said to be context-free if it is generated by a context-free grammar (CFG). • A grammar G = <N, Σ, P, S> is context-free if every production rule has the form A → α, with A ∈ N and α ∈ (N ∪ Σ)*. • Unlike in regular grammars, the right-hand sides of the production rules in CFGs are unrestricted and can be any combination of terminals and non-terminals. • The regular languages (RLs) form a subset of the context-free languages (CFLs). • Things that cannot be expressed by regular grammars, but are needed when parsing CFLs: • Palindromes. • Balanced brackets. • Counting.
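To illustrate why balanced brackets call for more than a finite number of states, here is a minimal Python sketch that uses an explicit stack. The function name `is_balanced` and the bracket set in `PAIRS` are assumptions for illustration, not something the slides define.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def is_balanced(text):
    """Check bracket balance with a stack of the closers still expected."""
    stack = []
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])      # remember which closer must come
        elif ch in CLOSERS:
            if not stack or stack.pop() != ch:
                return False             # wrong closer or nothing to close
    return not stack                     # every opener must be closed


print(is_balanced("(()(()))"), is_balanced("(()"))
```

The stack can grow without bound as nesting deepens, which is the kind of memory a finite automaton does not have.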

  16. CFG • A context-free grammar is a notation for defining context free languages. • It is more powerful than finite automata or REs, but still cannot define all possible languages. • Useful for nested structures, e.g., parentheses in programming languages. • Basic idea is to use “variables” (non-terminals) to stand for sets of strings. • These variables are defined recursively, in terms of one another.

  17. CFG • A CFG is used to generate the strings belonging to a CFL. • Each production has the form A → w, where A is a non-terminal and w is a string of terminals and non-terminals. • Any non-terminal can be replaced by the right-hand side of any of its productions at any point. • Language of a CFG: the set of strings of terminals that can be derived from its start symbol. • The pushdown automaton (PDA) is the machine model capable of accepting the languages defined by CFGs.

  18. CFGs: Alternate Definition Many textbooks use different symbols and terms to describe CFGs: G = (V, Σ, P, S) • V = variables, a finite set • Σ = alphabet or terminals, a finite set • P = productions, a finite set • S = start variable, S ∈ V • Productions have the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*.

  19. Definition: Context-Free Grammars • Grammar G = (V, T, S, P) • V: variables • T: terminal symbols • S: start variable • P: productions of the form A → x, where A is a variable and x is a string of variables and terminals

  20. CSG • A context-sensitive grammar is a notation for defining context-sensitive languages. • Each production has the form wAx → wyx, • where A is a non-terminal, w and x are strings of terminals and non-terminals, and y is a non-empty string of terminals and non-terminals. • The productions give rules saying "if you see A in the given context, you may replace A by the string y".

  21. CSG Example

  22. CFGs & CFLs: Example 1 {a^n b^n | n ≥ 0} One of our canonical non-RLs. S → ε | a S b Formally: G = ({S}, {a,b}, {S → ε, S → a S b}, S)
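The grammar of Example 1 translates almost directly into a recursive membership test. This Python sketch assumes the two productions S → ε and S → a S b; the function name `in_anbn` is hypothetical.

```python
def in_anbn(w):
    """Decide membership in {a^n b^n | n >= 0} by mirroring the grammar."""
    if w == "":
        return True                      # S -> ε
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        return in_anbn(w[1:-1])          # S -> a S b
    return False


for candidate in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(candidate), in_anbn(candidate))
```

Each recursive call corresponds to one application of S → a S b, so the recursion depth equals n for an accepted string.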

  23. CFGs & CFLs: Example 2 {a^m b^n c^(m+n) | m,n ≥ 0} Rewrite as {a^m b^n c^n c^m | m,n ≥ 0}: S → S’ | a S c, S’ → ε | b S’ c. Derivation example: a^4 b^3 c^7
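The derivation example for a^4 b^3 c^7 can be spelled out step by step. The sketch below assumes the grammar S → S' | a S c, S' → ε | b S' c; the function name `derive_ambncmn` is hypothetical.

```python
def derive_ambncmn(m, n):
    """List the sentential forms deriving a^m b^n c^(m+n)."""
    steps = ["S"]
    for i in range(1, m + 1):
        steps.append("a" * i + "S" + "c" * i)            # S -> a S c
    steps.append("a" * m + "S'" + "c" * m)               # S -> S'
    for j in range(1, n + 1):
        # S' -> b S' c: each b is matched by one more inner c
        steps.append("a" * m + "b" * j + "S'" + "c" * j + "c" * m)
    steps.append("a" * m + "b" * n + "c" * (n + m))      # S' -> ε
    return steps


for form in derive_ambncmn(4, 3):
    print(form)
```

The outer rule pairs each a with a trailing c, and the inner rule pairs each b with a c next to it, which is why the c count ends up as m + n.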

  24. CFGs & CFLs: Non-Example {a^n b^n c^n | n ≥ 0} This language is not context-free, so no CFG can describe it. Intuition: a stack can count up to n and then count down from n, but after that n is forgotten. • i.e., a stack used as a counter. • We will see this when using the machine that corresponds to CFGs.
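The "stack as a counter" intuition can be made concrete with a small stack-based recognizer for {a^n b^n | n ≥ 0}. This is a sketch in the spirit of a pushdown automaton, not a formal PDA definition; the function name `accepts_anbn` and the two state names are assumptions.

```python
def accepts_anbn(w):
    """Push one marker per 'a', pop one per 'b'; accept on an empty stack."""
    stack = []
    state = "push"                       # reading a's
    for ch in w:
        if state == "push" and ch == "a":
            stack.append("a")            # count up
        elif ch == "b" and stack:
            state = "pop"                # from now on only b's are allowed
            stack.pop()                  # count down
        else:
            return False
    return not stack


print(accepts_anbn("aabb"), accepts_anbn("aab"), accepts_anbn("abab"))
```

Once the b's have emptied the stack, nothing records how large n was, so the same stack cannot also check that exactly n c's follow.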

  25. Parsing • Parsing with a CFG means assigning the statements of a language to the categories defined by the CFG. • Parsing can be expressed using a special type of graph, a tree, in which no cycles exist. • A parse tree is the graph representation of a derivation. • Programmatically, a parse tree can be represented as a dynamic data structure with a single root node. Dr. Hussien M. Sharaf
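One way to represent a parse tree as a dynamic data structure with a single root is sketched below in Python. The class name `Node`, the helper `tree_yield`, and the use of the label "ε" for an empty leaf are assumptions for illustration; the example tree uses the grammar S → ε | a S b.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # terminal, non-terminal, or "ε"
    children: List["Node"] = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels from left to right, skipping ε leaves."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


# Parse tree for "aabb" under S -> a S b | ε.
tree = Node("S", [
    Node("a"),
    Node("S", [Node("a"), Node("S", [Node("ε")]), Node("b")]),
    Node("b"),
])
print(tree_yield(tree))
```

Every interior node holds a non-terminal and its children spell out the right-hand side of the production applied there, so the whole derivation is reachable from the root.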

  26. Parse tree (1) A single vertex labeled with a non-terminal symbol is a parse tree. (2) If A → y1 y2 … yn is a rule in R, then the tree with root A and children y1, y2, …, yn (in that order) is a parse tree.

  27. Derivations • CFG: S → (S), S → SS, S → є • The language described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar. • At each step, we choose a non-terminal to replace. • Derivation: S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (( )S) ⇒ (( )(S)) ⇒ (( )((S))) ⇒ (( )(( ))), where each intermediate string is a sentential form. • This example demonstrates a leftmost derivation: one where we always expand the leftmost non-terminal in the sentential form.
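The leftmost derivation on slide 27 can be replayed mechanically. In this Python sketch the grammar is stored as a dictionary of right-hand-side strings, the empty string stands for є, and the name `leftmost_derivation` and the encoding of each step as an index into the rule list are assumptions made for illustration.

```python
RULES = {"S": ["(S)", "SS", ""]}         # S -> (S) | SS | є


def leftmost_derivation(start, choices, rules=RULES):
    """Apply the chosen production to the leftmost non-terminal each step."""
    form = start
    steps = [form]
    for rhs_index in choices:
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:
            raise ValueError("no non-terminal left to expand")
        rhs = rules[form[pos]][rhs_index]
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)
    return steps


# Rule choices for the slide's derivation: 0 = (S), 1 = SS, 2 = є.
steps = leftmost_derivation("S", [0, 1, 0, 2, 0, 0, 2])
print(" => ".join(steps))
```

Every element of `steps` except the last is a sentential form that still contains a non-terminal; the last one is the derived string.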

  28. Derivations Definition: v is one-step derivable from u, written u ⇒ v, if: • u = xβz • v = xγz • β → γ in R Definition: v is derivable from u, written u ⇒* v, if there is a chain of one-step derivations of the form u ⇒ u1 ⇒ u2 ⇒ … ⇒ v

  29. Derivations Definition: Given a context-free grammar G = (Σ, NT, R, S), the language generated or derived from G is the set L(G) = {w : S ⇒* w}, where w is a string of terminals. Definition: A language L is context-free if there is a context-free grammar G = (Σ, NT, R, S) such that L is generated from G.

  30. Derivation • We derive strings in the language of a CFG by starting with the start symbol and repeatedly replacing some variable A by the right side of one of its productions. • Example: • S → aSb • S → ab • Same grammar using ‘|’ (or): • S → aSb | ab

  31. Derivation • CFG: • S → aSb • S → ab • Derivation example for “aabb” • Using S → aSb generates the incomplete string aSb, which still contains the non-terminal S. • Then S → ab is used to replace the inner S, • which generates “aabb”. • S ⇒ aSb ⇒ aabb [successful derivation of aabb]

  32. Derivation-Example : Palindrome • Describe palindromes of a’s and b’s using a CFG • 1] S → aSa 2] S → bSb • 3] S → Λ • Derive “baab” from the above grammar. • S ⇒ bSb [by 2] ⇒ baSab [by 1] ⇒ baab [by 3]

  33. CFG -Example : Even-Palindrome • i.e. {Λ, aa, bb, abba, abbaabba, … } • S → aSa | bSb | Λ • Derive abaaba: S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba (the last step uses S → Λ). • Can you modify this grammar to accept odd-length palindromes?
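The even-palindrome grammar can be read as a recursive membership test. This Python sketch assumes the productions S → aSa | bSb | Λ over the alphabet {a, b}; the function name `is_even_palindrome` is hypothetical.

```python
def is_even_palindrome(w):
    """Mirror S -> aSa | bSb | Λ: peel matching outer symbols."""
    if w == "":
        return True                                  # S -> Λ
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return is_even_palindrome(w[1:-1])           # S -> aSa or S -> bSb
    return False


for candidate in ["", "baab", "abaaba", "ab", "aba"]:
    print(repr(candidate), is_even_palindrome(candidate))
```

A single leftover middle symbol is rejected because the grammar has no production that generates one.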

  34. CFG – Example • Describe any string in (a+b)* using a CFG 1] S → Λ 2] S → Y 3] Y → aY 4] Y → bY 5] Y → a 6] Y → b • Derive “aab” from the above grammar. • S ⇒ Y [by 2] ⇒ aY [by 3] ⇒ aaY [by 3] ⇒ aab [by 6]
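One way to gain confidence that this grammar really describes (a+b)* is to enumerate everything it derives up to a small length and compare with all strings over {a, b}. The sketch below is tailored to this grammar, where every sentential form holds at most one non-terminal; the search need not terminate for arbitrary grammars. The name `language_up_to` is hypothetical.

```python
from itertools import product

RULES = {"S": ["", "Y"], "Y": ["aY", "bY", "a", "b"]}   # "" stands for Λ


def language_up_to(max_len, rules=RULES, start="S"):
    """Collect all terminal strings of length <= max_len derivable from start."""
    seen, frontier, words = {start}, [start], set()
    while frontier:
        form = frontier.pop()
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:
            words.add(form)                      # only terminals remain
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(ch not in rules for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words


expected = {"".join(p) for n in range(4) for p in product("ab", repeat=n)}
print(language_up_to(3) == expected)
```

A check up to length 3 is evidence rather than proof; the general argument is that rules 3 to 6 can emit any non-empty string one symbol at a time, and rule 1 covers the empty string.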

  35. Derivations and Parse Trees • Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B • Sample derivations: S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb and S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations use the same productions, but in different orders. This ordering difference is often uninteresting. Derivation trees give a way to abstract away ordering differences. • Parse tree for both derivations: root S with children A and B; that A has children A and A, each with the single leaf a; B has children b and B, and the inner B has the single leaf b. • Root label = start symbol. Each interior label = variable. Each parent/child relation = derivation step. Each leaf label = terminal or ε. All leaf labels together = derived string = yield.

  36. Derivations and Parse Trees • We can graphically describe a derivation using a parse tree: • the root is labeled with the start symbol, S • each internal node is labeled with a non-terminal • the children of an internal node A are the right-hand side of a production A • each leaf is labeled with a terminal • A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)

  37. Leftmost vs. Rightmost Derivations Definition. A left-most derivation of a sentential form is one in which rules transforming the left-most nonterminal are always applied Definition. A right-most derivation of a sentential form is one in which rules transforming the right-most nonterminal are always applied

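  38. Leftmost vs. Rightmost Derivations • Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B • Sample derivations for the string aabb: S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb and S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations are special: the 1st derivation is leftmost (it always picks the leftmost variable) and the 2nd derivation is rightmost (it always picks the rightmost variable). [Figure: parse tree with root S and children A and B; A has children A and A with leaves a and a; B has children b and B with leaf b]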
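Whether a derivation is leftmost or rightmost can be checked mechanically. The Python sketch below encodes the slide's grammar with the empty string standing for ε and tests each step of the two sample derivations for aabb; the function name `follows_order` is hypothetical.

```python
RULES = {
    "S": ["A", "AB"],
    "A": ["", "a", "Ab", "AA"],
    "B": ["b", "bc", "Bc", "bB"],
}


def follows_order(forms, leftmost, rules=RULES):
    """True if every step rewrites the leftmost (or rightmost) non-terminal."""
    for u, v in zip(forms, forms[1:]):
        positions = [i for i, ch in enumerate(u) if ch in rules]
        if not positions:
            return False                         # nothing left to rewrite
        pos = positions[0] if leftmost else positions[-1]
        # Some production of that non-terminal must turn u into v.
        if not any(u[:pos] + rhs + u[pos + 1:] == v for rhs in rules[u[pos]]):
            return False
    return True


first = ["S", "AB", "AAB", "aAB", "aaB", "aabB", "aabb"]
second = ["S", "AB", "AbB", "Abb", "AAbb", "Aabb", "aabb"]
print(follows_order(first, leftmost=True), follows_order(second, leftmost=False))
```

Swapping the flags makes both checks fail, which matches the slide's point that the two derivations differ only in the order in which variables are expanded.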

  39. Derivation Order: String aab [Figure: leftmost derivation and rightmost derivation]

  40. Another Example: String abbbb [Figure: leftmost derivation and rightmost derivation]

  41. Derivation Trees Example

  42. Derivation Tree

  43. Derivation Tree yield

  44. Partial Derivation Trees [Figure: partial derivation tree]

  45. [Figure: partial derivation tree]

  46. [Figure: partial derivation tree whose yield is a sentential form]
