Increasing power of LL(k) parsers

Bc. Jozef Lang (xlangj01), Bc. Zoltán Zemko (xzemko01)

Outline
• LL(1) parsers
• Why increase the power of LL(k) parsers?
• LL(k) parsers
• Linear-approximate LL(k) parsing
• LL-regular parsing
• Parse tree grammars
• Extended LL(1) grammars
• Conclusion

LL(1) parsing
• Deterministic top-down parsing
• Each prediction is based on a single symbol, so LL(1) is parsing with one symbol of look-ahead
• Constructing the parse table requires knowing which terminal symbols can start each non-terminal
• FIRST set: the set of terminals that can appear in the first position of the strings a non-terminal can derive (FIRSTk generalizes this to the first k positions)
• FOLLOW set: the set of all terminal symbols that can follow a non-terminal in any sentential form derivable from S# (the start symbol followed by the end marker #)
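
The FIRST and FOLLOW definitions above can be sketched as a fixed-point computation. This is a minimal sketch, not a definitive implementation: the example grammar, the dictionary representation, and all function names are assumptions made for illustration and do not come from the slides.

```python
EPS = ""   # marks the empty string inside FIRST sets
END = "#"  # end-of-input marker, as in S#

# Hypothetical example grammar (not from the slides):
#   S -> A b      A -> a A | epsilon
GRAMMAR = {
    "S": [["A", "b"]],
    "A": [["a", "A"], []],
}

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol in seq can derive the empty string
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    tail = first_of_seq(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest can vanish: inherit FOLLOW of the left-hand side
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "S")
```

For this assumed grammar, A can start with a or be empty, and the only token that can follow A is b.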

LL(1) versus strong-LL(1)
• If every entry of the parse table contains at most one element, the grammar is called strong-LL(1)
• Every LL(1) grammar is also a strong-LL(1) grammar
• A parse table entry that contains more than one element is an LL(1) conflict
• A parser with an LL(1) conflict is not deterministic and is therefore less efficient
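
Conflict detection in a parse table can be illustrated with a small sketch. It assumes the look-ahead tokens of each alternative have already been computed; the rule A -> a b | a c | d and the name build_table are hypothetical and chosen only for the example.

```python
from collections import defaultdict

def build_table(predictions):
    """predictions: list of (non_terminal, alternative, look_ahead_tokens)."""
    table = defaultdict(list)
    for non_terminal, alternative, tokens in predictions:
        for token in tokens:
            table[(non_terminal, token)].append(alternative)
    # An entry holding more than one alternative is an LL(1) conflict.
    conflicts = {key: alts for key, alts in table.items() if len(alts) > 1}
    return dict(table), conflicts

# Hypothetical rule: A -> a b | a c | d, with hand-computed look-ahead tokens.
PREDICTIONS = [
    ("A", ["a", "b"], {"a"}),
    ("A", ["a", "c"], {"a"}),
    ("A", ["d"], {"d"}),
]

TABLE, CONFLICTS = build_table(PREDICTIONS)
```

Here the entry for (A, a) holds two alternatives, so the assumed grammar has an LL(1) conflict, while the entry for (A, d) is unique.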

LL(1) versus strong-LL(1) (2)
• LL(1) conflicts can be resolved by
  • Left-recursion elimination
  • Left factoring
  • Conflict resolvers
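
The first two techniques can be sketched on a single rule. This is a simplified sketch under stated assumptions: only immediate left recursion is handled, only one common leading symbol is factored out, and the function names and the naming scheme for new non-terminals are invented for the example.

```python
def eliminate_left_recursion(nt, alts):
    """A -> A x | y   becomes   A -> y A' ;  A' -> x A' | epsilon."""
    recursive = [alt[1:] for alt in alts if alt and alt[0] == nt]
    others = [alt for alt in alts if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alts}
    new_nt = nt + "'"
    return {
        nt: [alt + [new_nt] for alt in others],
        new_nt: [tail + [new_nt] for tail in recursive] + [[]],
    }

def left_factor(nt, alts):
    """A -> x y | x z   becomes   A -> x A_1 ;  A_1 -> y | z."""
    groups = {}
    for alt in alts:
        key = alt[0] if alt else None  # group alternatives by their first symbol
        groups.setdefault(key, []).append(alt)
    result = {nt: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)  # nothing to factor
            continue
        counter += 1
        new_nt = f"{nt}_{counter}"
        result[nt].append([key, new_nt])
        result[new_nt] = [alt[1:] for alt in group]
    return result

LEFT_REC = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
FACTORED = left_factor("P", [["idf"], ["idf", "(", "args", ")"], ["idf", "[", "idx", "]"]])
```

After factoring, the choice between the alternatives of the hypothetical rule P is postponed until after the common idf token has been consumed, so one token of look-ahead is enough again.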

Why increase the power of LL(k) parsers?
• Consider the following grammar, where idf produces identifiers:
• This fragment defines expression elements such as x, sin(0.41), T[2,3]
• The first token is common to all alternatives, but the second token distinguishes between them
• A look-ahead of only one token is not enough
• It would be useful to increase the power of deterministic LL parsing

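LL(k) parsers
• It is sometimes useful to look ahead k symbols, where k > 1
• This requires defining FIRSTk sets
• Let x be a sentential form. FIRSTk(x) is the set of terminal strings given by
  $\mathrm{FIRST}_k(x) = \{\, w \mid x \Rightarrow^* w y,\ |w| = k \,\} \cup \{\, w \mid x \Rightarrow^* w,\ |w| < k \,\}$
  where y is some sentential form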
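
FIRSTk sets can be computed with the same fixed-point idea as FIRST, using concatenation truncated to k tokens. The sketch below is illustrative only: the grammar S -> a S b | epsilon, the tuple representation of token strings, and the function names are assumptions, not part of the slides.

```python
def concat_k(left, right, k):
    """Concatenate two sets of token tuples, keeping at most k tokens of each result."""
    return {(u + v)[:k] for u in left for v in right}

def first_k_seq(seq, first, k):
    """FIRSTk of a sequence of symbols, given FIRSTk of the non-terminals."""
    out = {()}
    for sym in seq:
        sym_first = first[sym] if sym in first else {(sym,)}
        out = concat_k(out, sym_first, k)
    return out

def compute_first_k(grammar, k):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no FIRSTk set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_k_seq(alt, first, k)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# Hypothetical grammar: S -> a S b | epsilon
GRAMMAR = {"S": [["a", "S", "b"], []]}

FIRST2 = compute_first_k(GRAMMAR, 2)
# FIRST2 of the sentential form S##, i.e. S followed by two end markers
FIRST2_WITH_END = first_k_seq(["S", "#", "#"], FIRST2, 2)
```

Strings shorter than k, such as the empty tuple here, are kept in the set; appending the end markers pads them to length k.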

LL(k) parsers (2)
• Assume that grammar G contains the rule A → a1 | a2 | … | an
• G is an LL(k) grammar iff, for every such rule and every prediction Ax#k, the sets FIRSTk(a1x#k) … FIRSTk(anx#k) are pairwise disjoint
• #k denotes k end markers appended to the input, so that k look-ahead symbols are always available
• Every LL(k) grammar is also an LL(k+1) grammar; the converse does not hold
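
The disjointness condition can be checked mechanically once the look-ahead sets are known. The sketch below uses hand-written look-ahead sets for a hypothetical rule A -> a b | a c; the helper names are assumptions made for the example.

```python
from itertools import combinations

def pairwise_disjoint(sets):
    """True iff no two of the given sets share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

def truncate(sets, k):
    """Shorten every look-ahead string in every set to its first k tokens."""
    return [{w[:k] for w in s} for s in sets]

# Hypothetical rule A -> a b | a c: one 2-token look-ahead set per alternative.
LOOKAHEAD_2 = [{("a", "b")}, {("a", "c")}]
LOOKAHEAD_1 = truncate(LOOKAHEAD_2, 1)

IS_LL2 = pairwise_disjoint(LOOKAHEAD_2)
IS_LL1 = pairwise_disjoint(LOOKAHEAD_1)
```

With two tokens the alternatives are distinguishable, with one they are not, which gives a small example of a rule that needs k = 2.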

LL(k) parsers (3)
• As with LL(1) parsers, a parse table must be produced, but for LL(k) parsers this is difficult
• The FOLLOWk set of a non-terminal A is defined as the union of FIRSTk(x#k) over all predictions Ax#k
• As with LL(1) parsers, the parse table is indexed by a pair consisting of a non-terminal symbol and a string of terminals, here of length k
• If every entry of such a parse table contains at most one element, the grammar is strong-LL(k)
• For k > 1 there are grammars that are LL(k) but not strong-LL(k)

LL(k) parsers (4)
• Strong-LL(k) parsers are seldom used in practice
• A similar effect can be obtained by using conflict resolvers

Linear-approximate LL(k) parsing
• The difficult construction of LL(k) parse tables can be avoided by a simple trick
• In addition to the FIRST set, introduce SECOND, THIRD, etc. sets
• The size complexity is reduced from O(t^k) to k tables of size O(t), where k is the number of sets
• A linear-approximate LL(k) grammar is weaker than an LL(k) grammar because it breaks the relationship between tokens
• Assume an LL(2) grammar whose two alternatives have the look-ahead sets {ab, cd} and {ad, cb}
• In linear-approximate LL(2), both alternatives have FIRST set {a, c} and SECOND set {b, d}, so the sets are not disjoint
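
The loss of information in the example above can be reproduced in a few lines. This is a minimal sketch: the look-ahead sets are the ones from the slide, while the function names and the representation of look-ahead strings as Python strings are assumptions.

```python
def positional_sets(lookaheads):
    """Split a set of k-token look-ahead strings into FIRST, SECOND, ... sets."""
    k = len(next(iter(lookaheads)))
    return [{w[i] for w in lookaheads} for i in range(k)]

def full_conflict(alt1, alt2):
    """Full LL(k): conflict iff some complete look-ahead string is shared."""
    return not alt1.isdisjoint(alt2)

def linear_conflict(alt1, alt2):
    """Linear-approximate LL(k): conflict iff the sets overlap at every position."""
    pairs = zip(positional_sets(alt1), positional_sets(alt2))
    return all(not a.isdisjoint(b) for a, b in pairs)

ALT1 = {"ab", "cd"}
ALT2 = {"ad", "cb"}

FIRST_SET, SECOND_SET = positional_sets(ALT1)
```

The full sets share no string, but once each position is stored separately the pairing of a with b and c with d is lost, so the two alternatives can no longer be told apart.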

LL-regular parsing
• LL(k) provides bounded look-ahead
• There are grammars in which the discriminating token can be arbitrarily far away
• Such grammars need unbounded look-ahead
• The unbounded look-ahead is itself described by a context-free grammar
• A context-free grammar can be approximated by a regular grammar
• There is no general algorithm for this approximation, but there are several heuristics

Parse tree grammar from LL(1)
• A straightforward process
• The basic idea is to create a new rule for every prediction
• The non-terminals are numbered by an increasing global counter
• The numbered non-terminals are then inserted into the prediction stack
• The newly created rules form the parse tree grammar
• Because the parser is deterministic, a parse tree grammar is obtained rather than a parse forest grammar
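
The process can be sketched with a small table-driven parser that emits one numbered rule per prediction. Everything concrete here is an assumption made for illustration: the grammar S -> a S b | epsilon, its hand-written LL(1) table, and the naming of numbered non-terminals as S_1, S_2, and so on.

```python
# Hypothetical LL(1) table for S -> a S b | epsilon
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "b"): [],
    ("S", "#"): [],
}
NONTERMINALS = {"S"}

def parse_tree_grammar(tokens, start="S"):
    """Parse tokens and return the parse tree grammar as (lhs, rhs) rules."""
    tokens = list(tokens) + ["#"]
    counter = 1                 # global counter for numbering non-terminals
    stack = [(start, counter)]  # prediction stack of (symbol, number)
    rules = []
    pos = 0
    while stack:
        sym, num = stack.pop()
        if sym in NONTERMINALS:
            key = (sym, tokens[pos])
            if key not in TABLE:
                raise SyntaxError(f"no prediction for {key}")
            rhs = []
            for s in TABLE[key]:
                if s in NONTERMINALS:
                    counter += 1            # each predicted non-terminal gets a fresh number
                    rhs.append((s, counter))
                else:
                    rhs.append((s, None))
            # One new rule per prediction.
            names = [s if n is None else f"{s}_{n}" for s, n in rhs]
            rules.append((f"{sym}_{num}", names))
            stack.extend(reversed(rhs))     # leftmost symbol ends up on top
        else:
            if tokens[pos] != sym:
                raise SyntaxError(f"expected {sym!r}, got {tokens[pos]!r}")
            pos += 1
    if tokens[pos] != "#":
        raise SyntaxError("unexpected input after end of parse")
    return rules

RULES = parse_tree_grammar(["a", "a", "b", "b"])
```

Each numbered non-terminal has exactly one rule, so the resulting grammar describes a single parse tree.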

Parse tree grammar from LL(1) (2)

Extended LL(1) grammars
• Some parsers accept Extended LL(1) grammars instead of ordinary ones
• To accept an Extended LL(1) grammar, the parser must transform it into an ordinary one without introducing LL(1) conflicts
• An advantage of Extended LL(1) grammars is that they allow a more efficient implementation in recursive descent parsers
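
The efficiency argument can be illustrated with a recursive descent routine in which a repetition in the extended rule maps directly to a loop instead of a chain of recursive calls. This is a sketch under assumptions: the extended rule list -> idf ( ',' idf )*, the token representation, and the function names are hypothetical.

```python
def parse_list(tokens):
    """Parse the hypothetical extended rule: list -> idf ( ',' idf )*"""
    tokens = list(tokens) + ["#"]  # append the end marker
    pos = 0

    def take(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def identifier():
        # Assumption: any token other than ',' and '#' counts as an identifier.
        tok = take()
        if tok in {",", "#"}:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return tok

    items = [identifier()]
    while tokens[pos] == ",":  # one token of look-ahead decides whether to repeat
        take(",")
        items.append(identifier())
    take("#")
    return items

ITEMS = parse_list(["x", ",", "y", ",", "z"])
```

An ordinary grammar for the same language would need an extra non-terminal for the tail of the list, and the parser would recurse once per list element.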

Conclusion
• LL(1) parsing is very intuitive: each step is chosen by predicting from one token
• There are situations where a look-ahead of only one symbol is not sufficient
• The power of LL parsers can be increased by extending the look-ahead to
  • a bounded length, resulting in LL(k) parsing
  • an unbounded length, resulting in LL-regular parsing
• Linear-approximate LL(2) parsing is a convenient, simplified form of LL(2) parsing

Thank you for your attention
