This document introduces programming language description through two formal systems: regular expressions and context-free grammars. Regular expressions precisely describe the smaller components such as identifiers and numbers, while context-free grammars outline larger constructs like expressions and statements. The properties, limitations, and practical applications of each method are explored. Examples, such as parsing techniques in compilers, illustrate how these concepts function in software development. Learn the fundamentals essential for understanding programming language structure and syntax.
Description of programming languages
Using regular expressions and context-free grammars
Introduction
• Programming languages must be described in an exact (formal) language
  • There must be no room for discussion about whether a language element is legal or not
• I will introduce 2 description languages
  • Regular expressions
    • Used to describe the “small” parts of a programming language
    • Identifiers, numbers, etc.
  • Context-free grammars
    • Used to describe the “bigger” parts of a programming language
    • Expressions, statements, classes, etc.
Regular expressions defined
• We need an alphabet, called Σ
  • Example alphabets: ASCII, Unicode
• A regular expression denotes a set of strings over Σ
• Ø (the empty set) is a regular expression
• { ε } is a regular expression
  • ε means the empty string
• Every set {a}, where a is in the alphabet Σ, is a regular expression
• From two regular expressions R and S we can build more regular expressions
  • R | S means R ∪ S (union)
  • RS means concatenation: a string from R followed by a string from S
  • R* means zero or more strings from R concatenated (Kleene star); if R is {a}, then R* is {ε, a, aa, aaa, … }
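The six cases of the definition can be turned directly into a data type. The Java sketch below is an illustration that is not part of the original slides: it represents a regular expression as a tree built only from Ø, {ε}, {a}, R | S, RS and R*, and decides whether a string is in the set using Brzozowski derivatives, a matching technique swapped in here because it follows the definition case by case. All class and method names are hypothetical.

```java
// Sketch (hypothetical names): a regular expression built only from the cases of
// the definition (empty set, {epsilon}, {a}, R | S, RS, R*). Membership is decided
// with Brzozowski derivatives, a technique chosen here for illustration.
public class RegexDefinition {

    interface Re {
        boolean nullable();   // true if the empty string is in the set
        Re derive(char c);    // the set { w | c followed by w is in this set }
    }

    static final class Empty implements Re {            // the empty set
        public boolean nullable() { return false; }
        public Re derive(char c) { return this; }
    }

    static final class Eps implements Re {              // {epsilon}
        public boolean nullable() { return true; }
        public Re derive(char c) { return EMPTY; }
    }

    static final class Sym implements Re {              // {a}
        final char a;
        Sym(char a) { this.a = a; }
        public boolean nullable() { return false; }
        public Re derive(char c) { return c == a ? EPS : EMPTY; }
    }

    static final class Alt implements Re {              // R | S
        final Re r, s;
        Alt(Re r, Re s) { this.r = r; this.s = s; }
        public boolean nullable() { return r.nullable() || s.nullable(); }
        public Re derive(char c) { return new Alt(r.derive(c), s.derive(c)); }
    }

    static final class Cat implements Re {              // RS
        final Re r, s;
        Cat(Re r, Re s) { this.r = r; this.s = s; }
        public boolean nullable() { return r.nullable() && s.nullable(); }
        public Re derive(char c) {
            Re first = new Cat(r.derive(c), s);         // c is consumed inside R
            // If R can be empty, c may also be consumed by S.
            return r.nullable() ? new Alt(first, s.derive(c)) : first;
        }
    }

    static final class Star implements Re {             // R*
        final Re r;
        Star(Re r) { this.r = r; }
        public boolean nullable() { return true; }     // epsilon is always in R*
        public Re derive(char c) { return new Cat(r.derive(c), this); }
    }

    static final Re EMPTY = new Empty();
    static final Re EPS = new Eps();

    // w is in the set iff epsilon remains after deriving by each character of w.
    static boolean matches(Re re, String w) {
        for (char c : w.toCharArray()) {
            re = re.derive(c);
        }
        return re.nullable();
    }

    public static void main(String[] args) {
        // (a|b)*c : any mix of a and b, followed by one c
        Re re = new Cat(new Star(new Alt(new Sym('a'), new Sym('b'))), new Sym('c'));
        System.out.println(matches(re, "abbac")); // true
        System.out.println(matches(re, "abca"));  // false
        System.out.println(matches(re, ""));      // false
    }
}
```

No simplification is applied, so the trees grow with the input; that keeps the sketch close to the definition but would be too slow for real use.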
Regular expressions examples
• Set of unsigned integers (every non-empty digit string, so 0 and leading zeros such as 007 are included)
  • (0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
• Set of words in English
  • (a|b|…|z)(a|b|…|z)*
  • Not exactly English …
    • bbz is in the set, but it is not an English word
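The two example expressions can be checked with Java's java.util.regex package, assuming they are written out in full instead of with "…". In this sketch the hypothetical helper anyOf builds the alternation (c1|c2|...|cn) from a string of characters, so only grouping, | and * are used, exactly as in the definition. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

// Sketch (hypothetical names): the slide examples written with only grouping,
// | and *, then checked with java.util.regex.
public class RegexExamples {

    // Builds "(c1|c2|...|cn)" from the characters of an alphabet string.
    static String anyOf(String chars) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < chars.length(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(chars.charAt(i));
        }
        return sb.append(')').toString();
    }

    static final String DIGIT = anyOf("0123456789");
    static final String LETTER = anyOf("abcdefghijklmnopqrstuvwxyz");

    // (0|1|...|9)(0|1|...|9)*  and  (a|b|...|z)(a|b|...|z)*
    static final String NUMBER = DIGIT + DIGIT + "*";
    static final String WORD = LETTER + LETTER + "*";

    // Pattern.matches requires the whole input to match.
    static boolean isNumber(String s) { return Pattern.matches(NUMBER, s); }
    static boolean isWord(String s) { return Pattern.matches(WORD, s); }

    public static void main(String[] args) {
        System.out.println(NUMBER);            // (0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
        System.out.println(isNumber("2024"));  // true
        System.out.println(isNumber("007"));   // true: leading zeros are allowed
        System.out.println(isNumber(""));      // false: at least one digit is required
        System.out.println(isWord("bbz"));     // true, although not an English word
        System.out.println(isWord("Java"));    // false: 'J' is not in a..z
    }
}
```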
Regular expressions, shorthand notation
• R+ means RR*
  • 1 or more occurrences
• R? means ε | R
  • 0 or 1 occurrence
• [a-z] means a|b|c|…|z
• [a-zA-Z] means [a-z] | [A-Z]
• Examples
  • Integer: -?[0-9]+
  • Identifier: [a-zA-Z][a-zA-Z0-9]*
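One way to see that the shorthand adds convenience but no expressive power is to compare a shorthand pattern with its expansion into the basic operators. In this hedged sketch (class and method names are hypothetical), -?[0-9]+ is expanded using R? = ε | R and R+ = RR*. java.util.regex has no symbol for ε, so an empty alternative, (|-), stands in for ε | -.

```java
import java.util.regex.Pattern;

// Sketch (hypothetical names): shorthand patterns from the slide, and the
// integer pattern rewritten without ? and +.
public class ShorthandExamples {

    static final String INTEGER = "-?[0-9]+";
    static final String IDENTIFIER = "[a-zA-Z][a-zA-Z0-9]*";

    // R? = (epsilon | R) and R+ = RR*; the empty alternative plays the role of epsilon.
    static final String INTEGER_EXPANDED = "(|-)[0-9][0-9]*";

    static boolean isInteger(String s) { return Pattern.matches(INTEGER, s); }
    static boolean isIntegerExpanded(String s) { return Pattern.matches(INTEGER_EXPANDED, s); }
    static boolean isIdentifier(String s) { return Pattern.matches(IDENTIFIER, s); }

    public static void main(String[] args) {
        String[] samples = {"42", "-7", "--7", "-", "", "x1", "1x", "Total2"};
        for (String s : samples) {
            // The two integer columns should always agree.
            System.out.println("\"" + s + "\": integer=" + isInteger(s)
                + " expanded=" + isIntegerExpanded(s)
                + " identifier=" + isIdentifier(s));
        }
    }
}
```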
Regular expressions in Java
• Parts of the Java API use regular expressions
• Class String
  • String[] split(String regex)
  • "Java is my favorite language".split(" ")
    • produces the array {"Java", "is", "my", "favorite", "language"}
  • " " is a very simple regular expression
• Package java.util.regex
  • Class Pattern
  • Class Matcher
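A small sketch of both APIs named on the slide: String.split with the single-space pattern, and a compiled Pattern whose Matcher finds every identifier in a line of text, using the identifier expression from the shorthand slide. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical names): String.split, plus Pattern/Matcher used to
// collect all identifiers in a string.
public class JavaRegexDemo {

    // Compile once and reuse; a Pattern is immutable and can be shared.
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    static List<String> identifiers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(text);
        while (m.find()) {          // advance to the next matching substring
            found.add(m.group());   // the text of that match
        }
        return found;
    }

    public static void main(String[] args) {
        String[] words = "Java is my favorite language".split(" ");
        System.out.println(Arrays.toString(words));             // [Java, is, my, favorite, language]

        System.out.println(identifiers("x1 = count + 42 * y")); // [x1, count, y]
    }
}
```

Compiling the Pattern once is the usual choice when the same expression is applied many times; String.split compiles its argument on each call.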
What regular expressions can’t do
• Regular expressions can describe only simple languages
• Regular expressions have no “memory”: they cannot count how many parentheses are still open
  • So they cannot describe arbitrarily nested parenthesis structures
    • (((a + b) + c) + d)
    • if (…) { if (…) … else …} else …
• We need something stronger!
  • Context-free grammars
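To make the missing “memory” concrete: checking balanced parentheses needs a counter of currently open parentheses, and since nesting depth is unbounded, that counter can take infinitely many values. A finite automaton, and therefore a regular expression, has only finitely many states in which to store such a count. The hedged Java sketch below (hypothetical names) shows the counter-based check a program would use instead; it only looks at ( and ).

```java
// Sketch (hypothetical names): balanced parentheses need a counter that can
// grow without bound, which is the "memory" regular expressions lack.
public class Balanced {

    static boolean isBalanced(String s) {
        int open = 0;                 // parentheses opened but not yet closed
        for (char c : s.toCharArray()) {
            if (c == '(') {
                open++;
            } else if (c == ')') {
                if (open == 0) {
                    return false;     // a ')' with no matching '('
                }
                open--;
            }
        }
        return open == 0;             // every '(' has been closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(((a + b) + c) + d)")); // true
        System.out.println(isBalanced("((a + b)"));            // false
        System.out.println(isBalanced(")a + b("));             // false
    }
}
```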
Context-free grammars defined
• A context-free grammar consists of 4 parts (V, Σ, R, S)
  • V is an alphabet
  • Σ is the set of terminals, Σ ⊂ V
    • The elements of the set V − Σ are called non-terminals
  • R is a finite set of production rules, R ⊆ (V − Σ) × V*
  • S is the start symbol, S ∈ V − Σ
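The 4-tuple can be written down as a plain data structure. This minimal sketch makes assumptions of its own: every symbol is a single char, a rule is a (left side, right side) pair, and the names ContextFreeGrammar, isNonTerminal, isWellFormed and example are hypothetical. It needs Java 9 or later for Set.of, List.of and Map.entry. The example data is the grammar with productions A → Aa | Ab | a | b.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch (hypothetical names, Java 9+): a context-free grammar as the 4-tuple
// (V, Sigma, R, S). Symbols are single characters; a rule is (left side, right side).
public class ContextFreeGrammar {
    final Set<Character> alphabet;                    // V
    final Set<Character> terminals;                   // Sigma
    final List<Map.Entry<Character, String>> rules;   // R, pairs in (V - Sigma) x V*
    final char start;                                 // S

    ContextFreeGrammar(Set<Character> alphabet, Set<Character> terminals,
                       List<Map.Entry<Character, String>> rules, char start) {
        this.alphabet = alphabet;
        this.terminals = terminals;
        this.rules = rules;
        this.start = start;
    }

    // Non-terminals are the symbols in V - Sigma.
    boolean isNonTerminal(char x) {
        return alphabet.contains(x) && !terminals.contains(x);
    }

    // Checks the conditions from the definition.
    boolean isWellFormed() {
        if (!alphabet.containsAll(terminals)) return false;   // Sigma is a subset of V
        if (!isNonTerminal(start)) return false;               // S in V - Sigma (so Sigma != V)
        for (Map.Entry<Character, String> rule : rules) {
            if (!isNonTerminal(rule.getKey())) return false;   // left side in V - Sigma
            for (char x : rule.getValue().toCharArray()) {
                if (!alphabet.contains(x)) return false;       // right side in V*
            }
        }
        return true;
    }

    // The grammar with productions A -> Aa | Ab | a | b.
    static ContextFreeGrammar example() {
        return new ContextFreeGrammar(
            Set.of('a', 'b', 'A'),
            Set.of('a', 'b'),
            List.of(Map.entry('A', "Aa"), Map.entry('A', "Ab"),
                    Map.entry('A', "a"), Map.entry('A', "b")),
            'A');
    }

    public static void main(String[] args) {
        System.out.println(example().isWellFormed()); // true
        // A terminal as start symbol violates S in V - Sigma.
        ContextFreeGrammar bad = new ContextFreeGrammar(
            Set.of('a', 'b', 'A'), Set.of('a', 'b'), List.of(Map.entry('A', "a")), 'a');
        System.out.println(bad.isWellFormed());       // false
    }
}
```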
Context-free grammars examples
• Example: strings of a and b
  • Alphabet {a, b, A}
  • Terminals {a, b}
  • Non-terminals {A}
  • Start symbol A
  • Productions
    • {A → Aa, A → Ab, A → a, A → b}
• Some derivations
  • A → Aa → Aaa → Abaa → abaa
  • A → Ab → ab
  • A → Ab → bb
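For this particular grammar, a derivation of any non-empty string over {a, b} can be built mechanically: each step A → Aa or A → Ab fixes one more character, working from the right end of the string, and a final A → a or A → b produces the first character. The sketch below (hypothetical names) produces the derivations in the same order as on the slide, printed with -> in place of →.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical names): a derivation of a non-empty string over {a, b}
// in the grammar A -> Aa | Ab | a | b.
public class DeriveAB {

    static List<String> derivation(String w) {
        if (!w.matches("[ab]+")) {
            throw new IllegalArgumentException("not derivable from A: \"" + w + "\"");
        }
        List<String> steps = new ArrayList<>();
        steps.add("A");
        // A -> Aa or A -> Ab adds the characters w[n-1], ..., w[1], right to left.
        for (int i = w.length() - 1; i >= 1; i--) {
            steps.add("A" + w.substring(i));
        }
        // A -> a or A -> b produces the first character w[0].
        steps.add(w);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", derivation("abaa"))); // A -> Aa -> Aaa -> Abaa -> abaa
        System.out.println(String.join(" -> ", derivation("ab")));   // A -> Ab -> ab
        System.out.println(String.join(" -> ", derivation("bb")));   // A -> Ab -> bb
    }
}
```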
Example: Boolean expressions
• We only state the productions explicitly
  • Terminals and non-terminals can be inferred by looking at the productions
• Convention
  • Capital letters: non-terminals
  • Everything else (lowercase words, operators, parentheses): terminals
• Boolean expressions
  • E → true
  • E → false
  • E → E && E
  • E → E || E
  • E → (E)
  • E → !E
• Derivations
  • E → E && E → E && (E) → E && (E || E) →* true && (false || true)
  • Sometimes pictured as a (parse) tree
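A grammar like this can drive a recursive-descent parser. The productions E → E && E and E → E || E do not say which operator binds tighter, so this sketch assumes the usual Java precedence (! before && before ||, left to right) and rewrites the grammar into three levels that a recursive-descent parser can follow. It parses and evaluates in one pass; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical names): recursive-descent evaluation of the Boolean grammar.
// ASSUMPTION: precedence ! > && > ||, left to right, so the grammar is rewritten as
//   Or  -> And { "||" And }
//   And -> Not { "&&" Not }
//   Not -> "!" Not | "(" Or ")" | "true" | "false"
public class BoolParser {
    // Scanner: optional blanks, then one token.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(true|false|&&|\\|\\||!|\\(|\\))");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BoolParser(String input) {
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        while (end < input.length()) {
            m.region(end, input.length());
            if (!m.lookingAt()) break;        // no token starts here
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected input at index " + end);
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected " + t + " at token " + pos);
        pos++;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek().equals("||")) {
            pos++;
            boolean rhs = parseAnd();         // parse first: || would skip parsing the right side
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseNot();
        while (peek().equals("&&")) {
            pos++;
            boolean rhs = parseNot();         // parse first: && would skip parsing the right side
            value = value && rhs;
        }
        return value;
    }

    private boolean parseNot() {
        String t = peek();
        switch (t) {
            case "!": pos++; return !parseNot();
            case "(": { pos++; boolean v = parseOr(); expect(")"); return v; }
            case "true": pos++; return true;
            case "false": pos++; return false;
            default: throw new IllegalArgumentException("unexpected token '" + t + "' at " + pos);
        }
    }

    static boolean evaluate(String input) {
        BoolParser p = new BoolParser(input);
        boolean value = p.parseOr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input at token " + p.pos);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("true && (false || true)")); // true
        System.out.println(evaluate("!true || false"));          // false
        System.out.println(evaluate("true || false && false"));  // true: && binds tighter
    }
}
```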
What context-free grammars can’t do
• Context-free grammars cannot be used to check that a variable is declared before it is used
• Nor can they check a variable’s type
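Such checks are usually made after parsing, with a symbol table that records each declared name and its type. The sketch below assumes a hypothetical toy program format in which statements are already parsed into declarations and assignments; the names Stmt, declare, assign and check are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch (hypothetical toy format): statements are already parsed. A symbol table
// (name -> declared type) lets a later pass check what the grammar cannot:
// declared before use, and the right type.
public class SymbolTableCheck {

    static final class Stmt {
        final boolean isDeclaration; // true: "type name;"   false: "name = value;"
        final String name;
        final String type;           // declared type, or the type of the assigned value

        Stmt(boolean isDeclaration, String name, String type) {
            this.isDeclaration = isDeclaration;
            this.name = name;
            this.type = type;
        }
    }

    static Stmt declare(String name, String type) { return new Stmt(true, name, type); }
    static Stmt assign(String name, String valueType) { return new Stmt(false, name, valueType); }

    static List<String> check(List<Stmt> program) {
        Map<String, String> symbols = new HashMap<>(); // the symbol table
        List<String> errors = new ArrayList<>();
        for (Stmt s : program) {
            if (s.isDeclaration) {
                symbols.put(s.name, s.type);
            } else if (!symbols.containsKey(s.name)) {
                errors.add(s.name + " is used before it is declared");
            } else if (!symbols.get(s.name).equals(s.type)) {
                errors.add(s.name + " has type " + symbols.get(s.name) + ", not " + s.type);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Corresponds to: int x; x = 1; y = 2; boolean b; b = 3;
        List<Stmt> program = Arrays.asList(
            declare("x", "int"), assign("x", "int"),
            assign("y", "int"),
            declare("b", "boolean"), assign("b", "int"));
        for (String error : check(program)) {
            System.out.println(error);
        }
        // y is used before it is declared
        // b has type boolean, not int
    }
}
```

All of these statements are syntactically legal, which is exactly why the errors can only be found in a separate pass over the parsed program.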
The phases of a compiler
• Lexical analysis (scanning)
  • Using regular expressions
• Syntax analysis (parsing)
  • Using context-free grammars
• Semantic analysis
  • Using a symbol table
• Code generation
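A sketch of the first phase only, lexical analysis, using one java.util.regex pattern with a named group per token class. The token set is an assumption: integers are written [0-9]+ without a sign (so that x-1 scans as x, -, 1 rather than x, -1), identifiers use the expression from the shorthand slide, and a few single-character operators are allowed. The class name Lexer and the token labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical token set and names): the scanning phase. Each token
// class is a regular expression; the input is cut into tokens left to right.
public class Lexer {
    private static final Pattern TOKENS = Pattern.compile(
        "\\s*(?:(?<INT>[0-9]+)|(?<IDENT>[a-zA-Z][a-zA-Z0-9]*)|(?<OP>[-+*/=();]))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {                          // no token starts at pos
                if (input.substring(pos).trim().isEmpty()) {
                    break;                                 // only trailing blanks remain
                }
                throw new IllegalArgumentException("unexpected character after index " + pos);
            }
            String kind = m.group("INT") != null ? "INT"
                        : m.group("IDENT") != null ? "IDENT" : "OP";
            tokens.add(kind + "(" + m.group(kind) + ")");
            pos = m.end();                                 // continue after this token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("count = count + 42;"));
        // [IDENT(count), OP(=), IDENT(count), OP(+), INT(42), OP(;)]
        System.out.println(tokenize("x-1"));
        // [IDENT(x), OP(-), INT(1)]
    }
}
```

The token list would then be handed to the parser; the later phases are not shown here.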