
Readings in Natural Language Processing: An Efficient Context-Free Parsing Algorithm

Presentation Transcript


  1. Readings in Natural Language Processing: An Efficient Context-Free Parsing Algorithm, J. Earley (1969). Presented by 이필용 (2003. 10. 15)

  2. Introduction • Context-free grammars: used extensively for describing the syntax of programming languages and natural languages • The algorithm described here seems to be the most efficient of the general algorithms, and it can also handle a larger class of grammars in linear time than most of the restricted algorithms

  3. Terminology • Example grammar of simple arithmetic expressions, grammar AE (root: E, terminal symbols: {a, +, *}, nonterminals: {E, T, P}):
     E -> T
     E -> E+T
     T -> P
     T -> T*P
     P -> a
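The productions of AE can be sketched as a plain Python dictionary; a small breadth-first enumeration (an illustrative helper, not from the paper) lists the shortest sentences the grammar generates:

```python
from collections import deque

# Grammar AE as a dict: nonterminal -> list of right-hand sides.
GRAMMAR = {"E": ["T", "E+T"], "T": ["P", "T*P"], "P": ["a"]}

def generate(root="E", max_len=3):
    """Enumerate every terminal string of length <= max_len by
    breadth-first rewriting of sentential forms."""
    seen, out, queue = set(), set(), deque([root])
    while queue:
        form = queue.popleft()
        if all(c not in GRAMMAR for c in form):
            out.add(form)            # no nonterminals left: a sentence
            continue
        for i, c in enumerate(form):
            for rhs in GRAMMAR.get(c, []):
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sorted(out)

print(generate())  # ['a', 'a*a', 'a+a']
```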

  4. Terminology • Derivation: two derivations of the same sentential form
     E => E+T => T+T => T+P => T*P+P
     E => E+T => E+P => T+P => T*P+P
  • Represented as a derivation tree:

             E
           / | \
          E  +  T
          |     |
          T     P
        / | \
       T  *  P
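The two derivations can be sanity-checked mechanically: each step must rewrite exactly one nonterminal occurrence by one production of AE (an illustrative script, not part of the paper):

```python
PRODUCTIONS = {"E": ["T", "E+T"], "T": ["P", "T*P"], "P": ["a"]}

def one_step(src, dst):
    """True iff dst follows from src by applying one production of AE
    to a single nonterminal occurrence in src."""
    for i, c in enumerate(src):
        for rhs in PRODUCTIONS.get(c, []):
            if src[:i] + rhs + src[i + 1:] == dst:
                return True
    return False

d1 = ["E", "E+T", "T+T", "T+P", "T*P+P"]  # first derivation
d2 = ["E", "E+T", "E+P", "T+P", "T*P+P"]  # second derivation
assert all(one_step(a, b) for a, b in zip(d1, d1[1:]))
assert all(one_step(a, b) for a, b in zip(d2, d2[1:]))
```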

  5. Informal Explanation • Ex) The state set obtained after scanning the first input symbol a (each state shown with its pointer, here 0):
     P -> a.    0
     T -> P.    0
     T -> T.*P  0
     E -> T.    0
     E -> E.+T  0
  • Each state in the set represents:
  - a production which we are currently scanning (a portion of the input string is derived from its right side)
  - a point in that production which shows how much of the production’s right side we have recognized so far
  - a pointer back to the position in the input string at which we began to look for that instance of the production
  - a k-symbol string which is a syntactically allowed successor to that instance of the production
  • A state is written as the production with a dot in it, followed by an integer (the pointer) and, when k > 0, the lookahead string
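One concrete way to hold such a state is a small record type (an illustrative Python sketch; the field names are ours, not Earley's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str        # left side of the production, e.g. "T"
    rhs: str        # right side of the production, e.g. "T*P"
    dot: int        # how much of the right side is recognized so far
    origin: int     # the pointer back into the input string
    lookahead: str  # the k-symbol string allowed to follow

    def next_symbol(self):
        """Return the symbol just after the dot, or None if complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

# T -> T.*P with pointer 0: the next symbol to recognize is "*".
print(State("T", "T*P", 1, 0, "+").next_symbol())  # *
```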

  6. Informal Explanation • Input string = a+a*a, k = 1 (⊣ marks the end of the input; each state shows its production, its allowed lookahead symbols, and its pointer):

     S0  φ -> .E⊣   ⊣    0
         E -> .E+T  ⊣+   0
         E -> .T    ⊣+   0
         T -> .T*P  ⊣+*  0
         T -> .P    ⊣+*  0
         P -> .a    ⊣+*  0
     X1 = a
     S1  P -> a.    ⊣+*  0
         T -> P.    ⊣+*  0
         E -> T.    ⊣+   0
         T -> T.*P  ⊣+*  0
         φ -> E.⊣   ⊣    0
         E -> E.+T  ⊣+   0
     X2 = +
     S2  E -> E+.T  ⊣+   0
         T -> .T*P  ⊣+*  2
         T -> .P    ⊣+*  2
         P -> .a    ⊣+*  2
     X3 = a
     S3  P -> a.    ⊣+*  2
         T -> P.    ⊣+*  2
         E -> E+T.  ⊣+   0
         T -> T.*P  ⊣+*  2
         φ -> E.⊣   ⊣    0
         E -> E.+T  ⊣+   0
     X4 = *
     S4  T -> T*.P  ⊣+*  2
         P -> .a    ⊣+*  4
     X5 = a
     S5  P -> a.    ⊣+*  4
         T -> T*P.  ⊣+*  2
         E -> E+T.  ⊣+   0
         T -> T.*P  ⊣+*  2
         φ -> E.⊣   ⊣    0
         E -> E.+T  ⊣+   0
     X6 = ⊣
     S6  φ -> E⊣.   ⊣    0

  7. The Recognizer • Implementation • For each nonterminal, we keep a linked list of its alternatives, for use in prediction • The states in a state set are kept in a linked list so they can be processed in order • As each state set Si is constructed, we put entries into a vector of size i. The f-th entry in this vector (0 <= f <= i) is a pointer to a list of all states in Si with pointer f, i.e. states of the form (p, j, f, a) ∈ Si for some p, j, a. The vector and lists can be discarded after Si is constructed • For the use of the completer, we also keep, for each state set Si and nonterminal N, a list of all states (p, j, f, a) ∈ Si such that Cp(j+1) = N
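The last structure, the per-nonterminal list the completer consults, can be sketched like this (tuple layout and names are ours, for illustration):

```python
from collections import defaultdict

# States here are (lhs, rhs, dot, origin) tuples; e.g. ("E", "E+T", 0, 0)
# stands for E -> .E+T with pointer 0.
def build_wanted_index(state_set, nonterminals):
    """For one state set S_i, map each nonterminal N to the states whose
    dot sits just before N (the states with Cp(j+1) = N)."""
    wanted = defaultdict(list)
    for state in state_set:
        lhs, rhs, dot, origin = state
        if dot < len(rhs) and rhs[dot] in nonterminals:
            wanted[rhs[dot]].append(state)
    return wanted

# S0 for grammar AE on input a+a*a.
S0 = [("φ", "E⊣", 0, 0), ("E", "E+T", 0, 0), ("E", "T", 0, 0),
      ("T", "T*P", 0, 0), ("T", "P", 0, 0), ("P", "a", 0, 0)]
idx = build_wanted_index(S0, {"E", "T", "P"})
print(sorted(idx))  # ['E', 'P', 'T']
```

When the completer later finishes a state for nonterminal N with pointer f, it advances exactly the states in this index for S_f instead of scanning all of S_f.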

  8. The Recognizer • Implementation • If the grammar contains null productions (A -> λ), we cannot implement the completer in a straightforward way. When performing the completer on a null state (A -> . , i), we want to add to Si each state in Si with A to the right of the dot; but some of those states may not have been added to Si yet when the null state is processed, so a single pass over Si misses them
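A compact recognizer sketch (ours, not the paper's linked-list implementation; lookahead omitted, i.e. k = 0). It sidesteps the null-production pitfall above by re-scanning each state set until no new states appear (a fixpoint), rather than the single ordered pass the paper uses:

```python
def earley_recognize(grammar, root, tokens):
    """Minimal Earley recognizer. grammar maps each nonterminal to a
    list of right-hand sides (tuples of symbols); a state is the tuple
    (lhs, rhs, dot, origin)."""
    n = len(tokens)
    S = [set() for _ in range(n + 1)]
    for rhs in grammar[root]:
        S[0].add((root, rhs, 0, 0))
    for i in range(n + 1):
        changed = True
        while changed:               # repeat until S[i] stops growing, so
            changed = False          # null completions reach late additions
            for (lhs, rhs, dot, origin) in list(S[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                    # predictor
                        for alt in grammar[sym]:
                            if (sym, alt, 0, i) not in S[i]:
                                S[i].add((sym, alt, 0, i))
                                changed = True
                    elif i < n and tokens[i] == sym:      # scanner
                        S[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                     # completer
                    for (l2, r2, d2, o2) in list(S[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            if (l2, r2, d2 + 1, o2) not in S[i]:
                                S[i].add((l2, r2, d2 + 1, o2))
                                changed = True
    return any(lhs == root and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in S[n])

AE = {"E": [("T",), ("E", "+", "T")],
      "T": [("P",), ("T", "*", "P")],
      "P": [("a",)]}
print(earley_recognize(AE, "E", list("a+a*a")))  # True
```

With grammar AE this accepts a+a*a and rejects a+; the fixpoint loop also lets it handle grammars with null productions correctly.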

  9. Time and Space Bounds • The general case: an n^3 recognizer in general
  - the number of states in any state set Si is proportional to i (~i)
  - the total time for processing the states in Si, plus the scanner and predictor operations, is ~i
  - the completer takes ~i^2 steps in Si
  - summing over i = 0, …, n+1 gives ~n^3 steps
  • Unambiguous grammars

  10. Time and Space Bounds • Linear time • Space

  11. Empirical Results • Tested against top-down and bottom-up parsers • Superior to the backtracking algorithms

     G1 (root S): S -> Ab         A -> a | Ab
     G2 (root S): S -> aB         B -> aB | b
     G3 (root S): S -> ab | aSb
     G4 (root S): S -> AB         A -> a | Ab    B -> bc | bB | Bd

     Grammar  Sentence   TD            STD           BU             SBU                   Ours
     G1       ab^n       (n^2+7n+2)/2  (n^2+7n+2)/2  9n+5           9n+5                  4n+7
     G2       a^n b      3n+2          2n+2          11*2^n+7       4n+4                  4n+4
     G3       a^n b^n    5n-1          5n-1          11*2^(n-1)-5   6n                    6n+4
     G4       ab^n cd    ~2^(n+6)      ~2^(n+2)      ~2^(n+6)       (n^3+21n^2+46n+15)/3  18n+8

  12. The practical use of the algorithm • The Form: make the recognizer into a parser. Time bounds for the parser are the same as those of the recognizer, while the space bound goes up to n^3 in general, in order to store the parse tree • The Use: probably most useful in natural language processing systems, where the power of context-free grammars is used; it can do in time n any grammar that a time-n parser can do at all

  13. Conclusion • Matches or surpasses the best previous results, for times n^3 (Younger), n^2 (Kasami), and n (Knuth), with one single algorithm • Knuth’s algorithm works only on LR(k) grammars and Kasami’s only on unambiguous ones, but ours works on them all, and possibly on others as well
