Delve into converting NFAs to DFAs using subset construction method, eliminating non-determinism, and creating transition tables. Learn the limitations of FA and explore push-down automata. Understand the steps involved and the significance of subset construction in the conversion process. Gain insights into lexical analysis and the formal ways to approach it. Dive into the architecture and phases of a compiler, covering lexical analyzer, syntax analyzer, semantic analysis, and optimization. Discover the theoretical aspects of regular languages, grammars, and automata, including the capabilities and limitations of finite state automata.
Cairo University • FCI • Welcome to a journey to Compilers CS419 • Lecture 10: Lexical Analysis • Finite Automata (NFA to DFA), Continued • Limitations of FA + Non-regular Languages • Context-free Languages + Push-down Automata
Non-Deterministic Features of NFAs There are three main cases of non-determinism in NFAs: • Transition to a state without consuming any input. • Multiple transitions on the same input symbol. • No transition on an input symbol. To convert an NFA to a DFA, we need to eliminate this non-determinism.
Subset Construction Method Converting an NFA to a DFA with the subset construction method involves the following steps: • For every state in the NFA, determine all reachable states for every input symbol. • Each set of reachable states constitutes a single state in the converted DFA (each state in the DFA corresponds to a subset of states in the NFA). • Find the reachable states for each new DFA state, until no more new states can be found.
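The steps above can be sketched as a worklist algorithm. This is a minimal sketch, assuming an NFA without λ-transitions stored as a dictionary; the function name `subset_construction`, the helper `dfa_accepts`, and the small example NFA (strings over {a, b} ending in "ab") are illustrative assumptions and are not the NFA of Fig1.

```python
from collections import deque

def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA without λ-transitions into a DFA.

    nfa maps (state, symbol) -> set of next states; a missing key means
    "no transition". DFA states are frozensets of NFA states.
    """
    start_set = frozenset([start])
    dfa = {}                      # (frozenset, symbol) -> frozenset
    seen = {start_set}
    work = deque([start_set])
    while work:
        current = work.popleft()
        for symbol in alphabet:
            # Union of everything reachable from any member of the subset.
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:   # a new DFA state was discovered
                seen.add(target)
                work.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, start_set, dfa_accepting


# Hypothetical NFA (not Fig1): accepts strings over {a, b} ending in "ab".
example_nfa = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
dfa, dfa_start, dfa_accepting = subset_construction(
    example_nfa, start=0, accepting={2}, alphabet="ab"
)

def dfa_accepts(word):
    """Run the constructed DFA on a word over the alphabet."""
    state = dfa_start
    for ch in word:
        state = dfa[(state, ch)]
    return state in dfa_accepting
```

When a subset has no outgoing transition on a symbol, the target is the empty frozenset, which plays the role of the ∅ (dead) state in the resulting DFA.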
Subset Construction Method Fig1. NFA without λ-transitions
Subset Construction Method, Step 1: Construct a transition table showing all reachable states for every state and every input symbol. [Fig1. NFA without λ-transitions: states 1 to 5 with transitions on a and b; the diagram is not recoverable from the extraction.]
Subset Construction Method [Fig2. Transition table: one row per NFA state q, starting from the initial state, with one column for the transition from state q with input a and one for the transition from state q with input b; the table entries are not recoverable from the extraction.]
Subset Construction Method, Step 2: The set of states resulting from every transition function constitutes a new state. Calculate all reachable states for every such state and every input symbol.
Fig3. Subset Construction table (starts with the initial state).
Step 3: Repeat this process (Step 2) until no more new states are reachable.
We already have states 4 and 5, so we don't add them again.
The construction stops here, as there are no more reachable states.
Fig4. Resulting FA after applying Subset Construction to Fig1. [Diagram not recoverable from the extraction; the state labels that survive include 12345, 245, 35, 45, 5 and ∅.]
Architecture/Phases of a Compiler • Stream of characters → Scanner / Lexical Analyzer → stream of tokens → Parser / Syntax Analyzer → parse/syntax tree → Semantic Analyzer → annotated tree → Source Code Optimization → intermediate code → Code Generator → target code → Target Code Optimization → target code • The diagram labels the earlier phases Analysis (front-end) and the later phases Synthesis (back-end), and shows a Literal Table and a Symbol Table alongside the pipeline.
How can we design a Lexical Analyzer (LA)? Steps: • Identify Tokens, Lexemes, Patterns. • Write Regular Expressions for patterns. • Write Regular Definitions. • Draw Transition Diagrams. • Design Non-deterministic Finite Automata (NFA). • Transform NFA to Deterministic Finite Automata (DFA). Dr. Mohammad Nassef
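The first two steps (token classes and their regular-expression patterns) can be illustrated with a small scanner. This is a sketch under stated assumptions: the token classes NUMBER, ID, OP and SKIP and the function `tokenize` are hypothetical, and Python's `re` module stands in for the hand-built NFA/DFA that the later steps would produce.

```python
import re

# Hypothetical token classes; each pattern is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# One alternation with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (token_class, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("x = 42 + y1")` yields the pairs ID `x`, OP `=`, NUMBER `42`, OP `+`, ID `y1`: each lexeme is assigned to exactly one token class.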
Lexical Analysis: Three Views • Three equivalent formal ways to look at this approach: Regular Expressions, Finite State Automata, and Regular Grammars, all describing the Regular Languages. • Diagram labels: Specification, Implementation, Representation.
Capabilities of FSA • FSA can recognize Regular Languages. • Conversely, FSA can't recognize non-Regular Languages: • It can't count • It can't differentiate between $0^{24}1^{24}$ and $0^{27}1^{24}$ • It can't remember • It can't compare future input to past input • Palindromes • Balanced Parentheses • Statement Structure Recognition • It can't recognize any non-linear intervals like $a^{n^2}$
Hierarchy of languages: Regular Languages ⊂ Context-Free Languages ⊂ Recursive Languages ⊂ Recursively Enumerable Languages; the Non-Recursively Enumerable Languages lie outside all of these.
Today • Grammar • Definition as a 4-tuple • Regular Grammars (RGs): left-linear vs. right-linear • Context Free Grammars (CFGs) • Context Sensitive Grammars (CSGs) • Context Free Languages (CFLs): examples • Parse Trees • Derivations: leftmost vs. rightmost • Ambiguity and Disambiguation • Grammar Simplification
Languages • Finite Automata accept all regular languages and only regular languages. • Many simple languages are non-regular, so no finite automaton accepts them: - $\{a^n b^n : n = 0, 1, 2, \dots\}$ - $\{w : w \text{ is a palindrome}\}$ • Context-free Languages (CFLs) are a larger class of languages that encompasses all regular languages and many others, including the two above.
Regular Language vs. Context-free Language • Lexemes scanned by the Lexical Analyzer belong to regular languages. • It is important to make sure that each scanned lexeme belongs to some token class. • During parsing, the Syntax Analyzer should validate the structure of a statement or an instruction. • Validation includes checking the order of words/tokens and the relations among them. • Thus, parsing cannot be done using regular grammars. It must make use of context-free grammars.
[Figure: Regular Languages drawn as a subset of Context-Free Languages.]
Context-Free Languages • Context-Free Grammars • Pushdown Automata (automaton + stack)
Pushdown Automaton (PDA) • Components: tape, tape head, finite control, stack, stack head
Pushdown Automaton (PDA) • Input String • Stack • States (Costas Busch - RPI)
Pushdown Automata (PDA) • Informally: • A PDA is an NFA with a stack. • Transitions are modified to accommodate stack operations. • Questions: • What is a stack? • How does a stack help? • An FA can "remember" only a finite amount of information, whereas a PDA can "remember" an unbounded amount of (certain types of) information in a single stack memory.
Example of weakness of FA • $\{0^n1^n \mid 0 \le n\}$ is not regular, but $\{0^n1^n \mid 0 \le n \le k\}$ is regular for any fixed k. • For k=3: L = {ε, 01, 0011, 000111} [Figure: DFA for k=3 with states q0 to q7 and transitions on 0 and 1; the diagram is not recoverable from the extraction.]
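The bounded language can be recognized by a DFA whose size grows with k, which is why no single DFA covers every n. The sketch below is an illustrative assumption: the state names ("Z", i), ("O", r) and "dead" and the functions `build_bounded_dfa`, `accepts` and `num_states` are hypothetical and do not reproduce the q0 to q7 labelling of the figure, although the construction also needs 2k + 2 states (8 for k = 3).

```python
def build_bounded_dfa(k):
    """Build a DFA transition table for {0^n 1^n | 0 <= n <= k}."""
    dead = "dead"
    delta = {}
    for i in range(k + 1):               # ("Z", i): i zeros read so far
        delta[(("Z", i), "0")] = ("Z", i + 1) if i < k else dead
        delta[(("Z", i), "1")] = ("O", i - 1) if i >= 1 else dead
    for r in range(k):                   # ("O", r): r ones still owed
        delta[(("O", r), "0")] = dead
        delta[(("O", r), "1")] = ("O", r - 1) if r >= 1 else dead
    delta[(dead, "0")] = delta[(dead, "1")] = dead
    accepting = {("Z", 0), ("O", 0)}     # empty string, or all 0's matched
    return delta, ("Z", 0), accepting

def accepts(k, word):
    """Run the DFA for bound k on a word over {0, 1}."""
    delta, state, accepting = build_bounded_dfa(k)
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

def num_states(k):
    """Count the distinct states in the transition table."""
    delta, _, _ = build_bounded_dfa(k)
    return len({s for (s, _) in delta})
```

With k = 3, the DFA accepts ε, 01, 0011 and 000111 and rejects 00001111, because a fourth 0 leads to the dead state.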
FA vs. PDA • In an FA, each state remembers a finite amount of information. • To get $\{0^n1^n \mid 0 \le n\}$ with a DFA would require an infinite number of states using the preceding technique. • An unbounded stack solves the problem for $\{0^n1^n \mid 0 \le n\}$ as follows: • Read all 0's and place them on a stack • Read all 1's and match them with the corresponding 0's on the stack • Only two states are needed to do this in a PDA • Similarly for $\{0^n1^m0^{n+m} \mid n, m \ge 0\}$
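The push-then-match idea can be sketched directly with a Python list as the stack. This is a minimal sketch, not a formal PDA: the function name `accepts_0n1n` and the two phase names "push" and "pop" are assumptions chosen to mirror the two states mentioned above.

```python
def accepts_0n1n(word):
    """Recognize {0^n 1^n | n >= 0} with an explicit stack and two phases."""
    stack = []
    state = "push"                 # phase 1: reading 0's
    for ch in word:
        if state == "push" and ch == "0":
            stack.append("0")
        elif ch == "1" and stack:
            state = "pop"          # phase 2: match each 1 with a stored 0
            stack.pop()
        else:
            return False           # 0 after a 1, extra 1, or bad symbol
    return not stack               # every 0 must have been matched
```

Unlike the bounded DFA, nothing here depends on a fixed k: the stack grows with the number of 0's read.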
The States • Each transition between states is labelled with an input symbol, a pop symbol, and a push symbol.
CFG to PDA • A PDA for $\{a^n b^n : n \ge 0\}$
Example PDA [state diagram not recoverable from the extraction]
Basic Idea: 1. Push the a's on the stack 2. Match the b's on input with a's on stack 3. Match found
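The basic idea can be tried out with a small table-driven simulator. This is a sketch under stated assumptions: the transition list `ANBN`, the state names q0 to q2, the bottom-of-stack marker `$`, and the function `pda_accepts` are illustrative and are not copied from the slide's diagram. Each transition is written as (state, input symbol, pop symbol, push symbol, next state), with the empty string standing for λ.

```python
def pda_accepts(word, transitions, start, accepting, bottom="$"):
    """Nondeterministic PDA simulation; accepts by final state at end of input.

    transitions: list of (state, input, pop, push, next_state), where
    "" stands for λ (read nothing / pop nothing / push nothing).
    """
    initial = (start, 0, (bottom,))       # stack top is the last element
    frontier, seen = [initial], {initial}
    while frontier:
        state, pos, stack = frontier.pop()
        if pos == len(word) and state in accepting:
            return True
        for src, inp, pop, push, dst in transitions:
            if src != state:
                continue
            if inp and (pos >= len(word) or word[pos] != inp):
                continue                  # input symbol does not match
            if pop and (not stack or stack[-1] != pop):
                continue                  # stack top does not match
            new_stack = stack[:-1] if pop else stack
            if push:
                new_stack = new_stack + (push,)
            config = (dst, pos + len(inp), new_stack)
            if config not in seen:        # explore each configuration once
                seen.add(config)
                frontier.append(config)
    return False

# Assumed transitions for {a^n b^n : n >= 0}; not copied from the slide figure.
ANBN = [
    ("q0", "a", "",  "a", "q0"),   # push each a
    ("q0", "",  "",  "",  "q1"),   # guess that the a's are finished
    ("q1", "b", "a", "",  "q1"),   # match each b against a stored a
    ("q1", "",  "$", "$", "q2"),   # bottom marker exposed: all matched
]
```

For example, `pda_accepts("aabb", ANBN, "q0", {"q2"})` explores the branch that pushes two a's, switches to q1, pops one a per b, and reaches q2 once only `$` remains on the stack.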
Execution Example: Time 0 to Time 4 [Figures showing the input, the stack, and the current state at each time step; their contents are not recoverable from the extraction.]