500 likes | 600 Vues
Explore the Bikel Collins Parser for enhanced parsing accuracy using the Treebank for rules extraction and frequencies analysis. Discover training methods and automated evaluation techniques. Learn to operate the parser effectively.
 
                
                E N D
LING 581: Advanced Computational Linguistics Lecture Notes February 6th
Passive Sentences? • Frequencies • Total: 5507 (approx.) • With LGS? • Without LGS?
Treebank Rules • Just how many rules are there in the WSJ treebank? • What’s the most common POS tag? • What’s the most common syntax rule?
Treebank Total # of tags: 1,253,013
Treebank Total # of rules: 978,873 # of different rules: 31,338
Treebank Total # of rules: 978,873 # of different rules: 17,554
Today’s Topic Let’s segue from Treebank search to stochastic parsers trained on the WSJ Penn Treebank Examples: • Berkeley Parser • http://tomato.banatao.berkeley.edu:8080/parser/parser.html • Stanford Parser • http://nlp.stanford.edu:8080/parser/ are all trained on the Treebank. We’ll play with Bikel’s implementation of Collins’s Parser …
Using the Treebank • What is the grammar of the Treebank? • We can extract the phrase structure rules used, and • count the frequency of rules, and construct a stochastic parser
Using the Treebank • Breakthrough in parsing accuracy with lexicalized trees • think of expanding the nonterminal names to include head information and the words that are at the leaves of the subtrees.
Bikel Collins Parser • Java re-implementation of Collins’ parser • Paper • Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. (PS) (PDF) in Computational Linguistics, 30(4), pp. 479-511. • Software • http://www.cis.upenn.edu/~dbikel/software.html#stat-parser (page no longer exists)
Bikel Collins • Download and install Dan Bikel’sparser • dbp.zip (on course homepage) • File: install.sh • Java code • but at this point I think Windows won’t work because of the shell script (.sh) • maybe after files are extracted?
Bikel Collins • Download and install the POS tagger MXPOST parser doesn’t actually need a separate tagger…
Bikel Collins • Training the parser with the WSJ PTB • See guide • userguide/guide.pdf directory: TREEBANK_3/parsed/mrg/wsj chapters 02-21: create one single .mrg file events: wsj-02-21.obj.gz
Bikel Collins • Settings:
Bikel Collins • Parsing • Command • Input file format (sentences)
Bikel Collins • Verify the trainer and parser work on your machine
Bikel Collins • File: bin/parse is a shell script that sets up program parameters and calls java
Bikel Collins • File: bin/train is another shell script
Bikel Collins • Relevant WSJ PTB files
Bikel Collins • If you have tcl/tk installed, I use a wrapper to call Dan Bikel’s code makes it easy to work the parser without memorizing the command line options
Bikel Collins • For tree viewing, you can use tregex For demos, I use my own viewer
Bikel Collins • POS tagging (MXPOST, in directory jmx) • tagger_input • $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt • Parsing • set ddf "wsj-02-21.obj.gz” • set properties "collins.properties" • parser_input • $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout • Training • set mrg "wsj-02-21.mrg” • set properties "collins.properties" • $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout Unix file descriptors 0 Standard input (stdin) • Standard output (stdout) • Standard error (stderr) GUI components frame .input text .input.t -height 4 -yscrollcommand {.input.s set} scrollbar .input.s -command {.input.tyview} frame .tagged text .tagged.t -height 9 -yscrollcommand {.tagged.s set} scrollbar .tagged.s -command {.tagged.tyview} Code proc tagger_input {} { set lines [.input.t get 1.0 end] set infile [open "/tmp/test.txt" w] puts -nonewline $infile [string trimright $lines] close $infile } proc parser_input {} { set lines [.tagged.t get 1.0 end] set infile [open "/tmp/test2.txt" w] puts -nonewline $infile [string trimright $lines] close $infile }
Statistical Parser Development • Methodology • partition corpus into a training and a test set • compare parser output on test set against withheld reference parses • Possible criteria • all or nothing: match the reference (“gold”) tree exactly • partial credit: use some “goodness of fit” metric
Statistical Parser Development • Standard Operating Procedure: 0 • one million words of 1989 Wall Street Journal (WSJ) articles • nearly 50,000 sentences (49,208), sectioned Penn Treebank (WSJ) training sections 2–21 almost 40K sentences (39,832) section 23 evaluation 24 2.4K sentences (2,416)
Computing Tree Similarity Evaluation: • we’ve got 2400 sentences to compare • How do we automate parser output scoring? • PARSEVAL: bracketing span match • A Procedure for Quantitatively Comparing the Syntactic Coverage of English Parsers (Black et al., 1991) • Computer program: EVALB (Sekine & Collins, 1997)
PARSEVAL Proposal: given trees T1 and T2 • Preprocessing stage: • Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation • Remove unary branching nodes (including part of speech tags) • Compute scores: • # Crossing Parentheses • Recall • Precision in the literature, the F-score is reported, i.e. F = 2⋅precision⋅recall ÷ (precision + recall)
PARSEVAL Proposal: given trees T1 and T2 • Preprocessing stage: • Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation • Remove unary branching nodes (including part of speech tags) would go there ⊢ go there hasbeen laughing ⊢ laughing does sing it correctly ⊢ sing it correctly is a cup is blue has a dollar does the laundry is not in here ⊢ is in here she opted to retire ⊢ she opted retire Lori’s mother ⊢ Lori mother (not deleted if copula or main verb)
PARSEVAL Proposal: given trees T1 and T2 • Preprocessing stage: • Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation • Remove unary branching nodes (including part of speech tags)
PARSEVAL Proposal: given trees T1 and T2 • Preprocessing stage: • Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation • Remove unary branching nodes (including part of speech tags) Black et al. (1991) describes other special case rules for sequences of prepositions etc.
PARSEVAL Proposal: given trees T1 and T2 • Compute scores: • # Crossing Parentheses • Recall • Precision
PARSEVAL Proposal: given trees T1 and T2 • Compute scores: • # Crossing Parentheses Parser (T1): Score = 1 Gold (T2):
PARSEVAL # Crossing Parentheses 0The1 prospect2 of3 cutting4 back5 spending6 Parser (T1): Gold (T2): easy to compute
PARSEVAL Proposal: given trees T1 and T2 (gold) • Compute scores: • # Crossing Parentheses • Recall • Precision
PARSEVAL Proposal: given trees T1 and T2 (gold) • Compute scores: • Recall • Precision |nodes(T1)∩nodes(T2 )| |nodes(T2)| |nodes(T1)∩nodes(T2 )| |nodes(T1)|
PARSEVAL Proposal: given trees T1 and T2 (gold) • Compute scores: • Recall • Precision Parser (T1): nodes(T1) ∩nodes(T2 ) Recall: 3/4 = 75% Precision: 3/5 = 60% F-score: 2⋅3/5⋅3/4 ÷ (3/5+3/4) = 69% Gold (T2):
PARSEVAL Recall and Precision: 0The1 prospect2 of3 cutting4 back5 spending6 Parser (T1): Gold (T2): also easy to compute
EVALB http://nlp.cs.nyu.edu/evalb/ Implements the basic PARSEVAL proposal (some small differences) • Preprocessing stage • Compute scores:
EVALB [6] THE PARAMETER (.prm) FILE The .prm file sets options regarding the scoring method. COLLINS.prmgives the same scoring behaviour as the scorer used in (Collins 97). The options chosen were: 1) LABELED 1 to give labelled precision/recall figures, i.e. a constituent must have the same span *and* label as a constituent in the goldfile. 2) DELETE_LABEL TOP Don't count the "TOP" label (which is always given in the output of tgrep) when scoring. 3) DELETE_LABEL -NONE- Remove traces (and all constituents which dominate nothing but traces) when scoring. For example .... (VP (VBD reported) (SBAR (-NONE- 0) (S (-NONE- *T*-1)))) (. .))) would be processed to give .... (VP (VBD reported)) (. .))) 4) DELETE_LABEL , -- for the purposes of scoring remove punctuation DELETE_LABEL : DELETE_LABEL `` DELETE_LABEL '' DELETE_LABEL . 5) DELETE_LABEL_FOR_LENGTH -NONE- -- don't include traces when calculating the length of a sentence (important when classifying a sentence as <=40 words or >40 words) 6) EQ_LABEL ADVP PRT Count ADVP and PRT as being the same label when scoring.
EVALB • To run the scorer: • > evalb -p Parameter_fileGold_fileTest_file • For example to use the sample files: • > evalb -p sample.prmsample.gldsample.tst
EVALB Gold standard: • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A-SBJ-1 (P this)) (B-WHATEVER (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (-NONE- *)) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (: *)) Test: • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (C (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (U test)))) • (S (C (P this)) (B (Q is) (A (R a) (U test)))) • (S (A (P this)) (B (Q is) (R a) (A (T test)))) • (S (A (P this) (Q is)) (A (R a) (T test))) • (S (P this) (Q is) (R a) (T test)) • (P this) (Q is) (R a) (T test) • (S (A (P this)) (B (Q is) (A (A (R a) (T test))))) • (S (A (P this)) (B (Q is) (A (A (A (A (A (R a) (T test)))))))) • (S (A (P this)) (B (Q was) (A (A (R a) (T test))))) • (S (A (P this)) (B (Q is) (U not) (A (A (R a) (T test))))) • (TOP (S (A (P this)) (B (Q is) (A (R a) (T test))))) • (S (A (P this)) (NONE *) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (S (NONE abc) (A (NONE *))) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (TT test)))) • (S (A (P This)) (B (Q is) (A (R a) (T test)))) • (S (A (P That)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test)))) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (-NONE- *)) • (S (A (P this)) (B (Q is) (A (R a) (T test))) (: *))
EVALB Results: Sent. Matched Bracket Cross Correct Tag ID Len. Stat. Recal Prec. Bracket gold test Bracket Words Tags Accracy ============================================================================ 1 4 0 100.00 100.00 4 4 4 0 4 4 100.00 2 4 0 75.00 75.00 3 4 4 0 4 4 100.00 3 4 0 100.00 100.00 4 4 4 0 4 3 75.00 4 4 0 75.00 75.00 3 4 4 0 4 3 75.00 5 4 0 75.00 75.00 3 4 4 0 4 4 100.00 6 4 0 50.00 66.67 2 4 3 1 4 4 100.00 7 4 0 25.00 100.00 1 4 1 0 4 4 100.00 8 4 0 0.00 0.00 0 4 0 0 4 4 100.00 9 4 0 100.00 80.00 4 4 5 0 4 4 100.00 10 4 0 100.00 50.00 4 4 8 0 4 4 100.00 11 4 2 0.00 0.00 0 0 0 0 4 0 0.00 12 4 1 0.00 0.00 0 0 0 0 4 0 0.00 13 4 1 0.00 0.00 0 0 0 0 4 0 0.00 14 4 2 0.00 0.00 0 0 0 0 4 0 0.00 15 4 0 100.00 100.00 4 4 4 0 4 4 100.00 16 4 1 0.00 0.00 0 0 0 0 4 0 0.00 17 4 1 0.00 0.00 0 0 0 0 4 0 0.00 18 4 0 100.00 100.00 4 4 4 0 4 4 100.00 19 4 0 100.00 100.00 4 4 4 0 4 4 100.00 20 4 1 0.00 0.00 0 0 0 0 4 0 0.00 21 4 0 100.00 100.00 4 4 4 0 4 4 100.00 22 44 0 100.00 100.00 34 34 34 0 44 44 100.00 23 4 0 100.00 100.00 4 4 4 0 4 4 100.00 24 5 0 100.00 100.00 4 4 4 0 4 4 100.00 ============================================================================ 87.76 90.53 86 98 95 16 108 106 98.15 === Summary === -- All -- Number of sentence = 24 Number of Error sentence = 5 Number of Skip sentence = 2 Number of Valid sentence = 17 Bracketing Recall = 87.76 Bracketing Precision = 90.53 Complete match = 52.94 Average crossing = 0.06 No crossing = 94.12 2 or less crossing = 100.00 Tagging accuracy = 98.15 -- len<=40 -- Number of sentence = 23 Number of Error sentence = 5 Number of Skip sentence = 2 Number of Valid sentence = 16 Bracketing Recall = 81.25 Bracketing Precision = 85.25 Complete match = 50.00 Average crossing = 0.06 No crossing = 93.75 2 or less crossing = 100.00 Tagging accuracy = 96.88
EVALB [5] HOW TO CREATE A GOLDFILE FROM THE PENN TREEBANK The gold and parsed files are in a format similar to this: (TOP (S (INTJ (RB No)) (, ,) (NP (PRP it)) (VP (VBD was) (RB n't) (NP (NNP Black) (NNP Monday))) (. .))) To create a gold file from the treebank: tgrep -wn '/.*/' | tgrep_proc.prl will produce a goldfile in the required format. ("tgrep -wn '/.*/'" prints parse trees, "tgrep_process.prl" just skips blank lines). For example, to produce a goldfile for section 23 of the treebank: tgrep -wn '/.*/' | tail +90895 | tgrep_process.prl | sed 2416q > sec23.gold You don’t have the ancient program tgrep…
EVALB • However you can use tsurgeon from the Stanford tregexyou downloaded to accomplish the same thing • Example: • file: wsj_0927.mrg
EVALB ./tsurgeon.sh -treeFile wsj_0927.mrg -s ( (S (NP-SBJ-1 (NNP H.) (NNP Marshall) (NNP Schwarz)) (VP (VBD was) (VP (VBN named) (S (NP-SBJ (-NONE- *-1)) (NP-PRD (NP (NP (NN chairman)) (CC and) (NP (NN chief) (JJ executive) (NN officer))) (PP (IN of) (NP (NP (NNP U.S.) (NNP Trust) (NNP Corp.)) (, ,) (NP (NP (DT a) (JJ private-banking) (NN firm)) (PP (IN with) (NP (NP (NNS assets)) (PP (IN under) (NP (NN management))) (PP (IN of) (NP (QP (IN about) ($ $) (CD 17) (CD billion)) (-NONE- *U*)))))))))))) (. .))) ( (S (NP-SBJ (NP (NNP Mr.) (NNP Schwarz)) (, ,) (ADJP (NP (CD 52) (NNS years)) (JJ old)) (, ,)) (VP (MD will) (VP (VB succeed) (NP (NNP Daniel) (NNP P.) (NNP Davison)) (NP-TMP (NNP Feb.) (CD 1)) (, ,) (SBAR-TMP (RB soon) (IN after) (S (NP-SBJ (NNP Mr.) (NNP Davison)) (VP (VBZ reaches) (NP (NP (NP (DT the) (NN company) (POS 's)) (JJ mandatory) (NN retirement) (NN age)) (PP (IN of) (NP (CD 65))))))))) (. .))) ( (S (NP-SBJ-1 (NP (NNP Mr.) (NNP Schwarz)) (, ,) (SBAR (WHNP-2 (WP who)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBZ is) (NP-PRD (NP (NN president)) (PP (IN of) (NP (NNP U.S.) (NNP Trust))))))) (, ,)) (VP (MD will) (VP (VB be) (VP (VBN succeeded) (NP (-NONE- *-1)) (PP-LOC (IN in) (NP (DT that) (NN post))) (PP (IN by) (NP-LGS (NP (NNP Jeffrey) (NNP S.) (NNP Maurer)) (, ,) (NP (CD 42)) (, ,) (SBAR (WHNP-3 (WP who)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBZ is) (NP-PRD (NP (JJ executive) (NN vice) (NN president)) (PP (IN in) (NP (NP (NN charge)) (PP (IN of) (NP (NP (DT the) (NN company) (POS 's)) (NN asset-management) (NN group)))))))))))))) (. .))) ( (S (NP-SBJ (NP (NNP U.S.) (NNP Trust)) (, ,) (NP (NP (DT a) (JJ 136-year-old) (NN institution)) (SBAR (WHNP-2 (WDT that)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBZ is) (NP-PRD (NP (CD one)) (PP (IN of) (NP (NP (DT the) (JJS earliest) (NN high-net) (JJ worth) (NNS banks)) (PP-LOC (IN in) (NP (DT the) (NNP U.S.)))))))))) (, ,)) (VP (VBZ has) (VP (VBN faced) (NP (NP (VBG intensifying) (NN competition)) (PP (IN from) (NP (NP (JJ other) (NNS firms)) (SBAR (WHNP-3 (WDT that)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBP have) (VP (VP (VBN established) (NP (-NONE- *RNR*-1))) (, ,) (CC and) (VP (ADVP-MNR (RB heavily)) (VBN promoted) (NP (-NONE- *RNR*-1))) (, ,) (NP-1 (NP (JJ private-banking) (NNS businesses)) (PP (IN of) (NP (PRP$ their) (JJ own))))))))))))) (. .))) • You can then redirect standard output to a file …
EVALB Example • Assemblesection 23 into one file cat~/research/TREEBANK_3/parsed/mrg/wsj/23/*.mrg > wsj_23.mrg • atthis point not one tree per physicalline • Run tsurgeon ./tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold • File wsj_23.gold contains one tree per line
Task • Run section 23 of the WSJ Treebank on the Bikel Collins parser • File: wsj-23.txt contains the sentences, one per line • File: wsj-23.lsp contains the sentences, with POS tags • Then run EVALB on the section to see how the Bikel Collins parser scores. Report back next time.