This note outlines the procedural steps required to evaluate the Bikel-Collins parser on Section 23 of the WSJ Penn Treebank. It covers setting up the necessary files, transforming the parsed trees into the required one-tree-per-line format using tsurgeon, training the parser on sections 02-21, and running it on the test sentences. The document provides commands for file manipulation, extracting sentences for parsing, and assessing performance with EVALB, emphasizing the need for proper configuration of file paths and Java classpaths. It concludes with questions encouraging students to explore the effects of training data on parser performance.
LING 581: Advanced Computational Linguistics Lecture Notes February 13th
Bikel-Collins and EVALB • How did you guys do with Bikel-Collins on section 23?
Steps • Section 23 of the WSJ Penn Treebank is used for evaluation. We need to get it into the one-tree-per-line format that EVALB expects
Steps • Assemble section 23 into a single file • cd Desktop • cat ~/research/TREEBANK_3/parsed/mrg/wsj/23/*.mrg > wsj_23.mrg • more wsj_23.mrg (in my Desktop folder)
Steps • Use tsurgeon (which comes with tregex) to transform wsj_23.mrg into a file with one tree per line (wsj_23.gold) • File: README-tsurgeon.txt
Program Command Line Options:
-treeFile <filename> specify the name of the file that has the trees you want to transform.
-po <matchPattern> <operation> apply a single operation to every tree using the specified match pattern and the specified operation.
-s print the output trees one per line, instead of pretty-printed.
Steps • Use tsurgeon (which comes with tregex) to transform wsj_23.mrg into a file with one tree per line (wsj_23.gold) • Note: (assume my current directory is my Desktop and I've installed tregex elsewhere)
../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold
Error: Could not find or load main class edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon
• Fix: edit the classpath in tsurgeon.sh to point at your install. File: tsurgeon.sh (Bourne-again shell script)
#!/bin/sh
export CLASSPATH=/Users/sandiway/Downloads/stanford-tregex-2014-01-04/stanford-tregex.jar:$CLASSPATH
java -mx100m edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon "$@"
• Then the command runs:
../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold
Steps • File: wsj_23.gold (on my Desktop)
Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before..) • One way: load in sec 23, search /.*/, save matched sentences … • Then run a Perl script to take out the file name, the 0 index, and anything starting with a * (e.g. the sketch below)
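The original script isn't reproduced in these notes, so here is a minimal sketch of the cleanup step. The layout it assumes (a leading file name, then a 0 index, then the sentence tokens, with empty categories as tokens beginning with *) is an assumption about the saved tregex output, and the script name is illustrative:

#!/usr/bin/perl
# clean-sentences.pl -- hypothetical cleanup sketch
# usage: perl clean-sentences.pl matched.txt > test.txt
use strict;
use warnings;

while (my $line = <>) {
    chomp $line;
    my @tokens = split ' ', $line;
    # drop the leading file name and the 0 index (assumed layout)
    shift @tokens if @tokens && $tokens[0] =~ /\.mrg$/;
    shift @tokens if @tokens && $tokens[0] eq '0';
    # drop empty categories: anything starting with a *
    @tokens = grep { !/^\*/ } @tokens;
    print "@tokens\n" if @tokens;
}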
Steps • Extracting the sentences for parsing • Another way: use the official Treebank .pos files (see the sketch below)
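A sketch of the .pos route. The assumptions here: the .pos files live under TREEBANK_3/tagged/pos/wsj, tokens are word/TAG pairs, [ and ] are chunk brackets, and lines of = signs (or blank lines) separate sentences — check these details against your Treebank release before relying on it:

#!/usr/bin/perl
# pos2txt.pl -- hypothetical sketch: extract sentences from PTB .pos files
# usage: perl pos2txt.pl ~/research/TREEBANK_3/tagged/pos/wsj/23/*.pos > test.txt
use strict;
use warnings;

my @words;
while (my $line = <>) {
    chomp $line;
    if ($line =~ /^=+/ or $line =~ /^\s*$/) {   # sentence separator (assumed)
        print "@words\n" if @words;
        @words = ();
        next;
    }
    $line =~ s/[\[\]]//g;                       # strip chunk brackets
    for my $tok (split ' ', $line) {
        my ($word, $tag) = $tok =~ m{^(.+)/([^/]+)$} or next;   # split at the last slash
        push @words, $word unless $tag eq '-NONE-';             # skip empty categories, if any
    }
}
print "@words\n" if @words;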
Steps • Before you can run the parser, you need to train it • First, assemble the training data
cd research/TREEBANK_3/parsed/mrg/wsj
ls
00 01 02 03 04 05 06 07 08 09 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24 MERGE.LOG
cat 0[2-9]/*.mrg 1[0-9]/*.mrg 2[01]/*.mrg > ~/Desktop/wsj-02-21.mrg
Steps • Before you can run the parser, you need to train it • ../research/dbp/bin/train 800 ../research/dbp/settings/collins.properties wsj-02-21.mrg • Executing command:
java -server -Xms800m -Xmx800m -cp /Users/sandiway/research/dbp/dbparser.jar -Dparser.settingsFile=../research/dbp/settings/collins.properties danbikel.parser.Trainer -i wsj-02-21.mrg -o wsj-02-21.observed.gz -od wsj-02-21.obj.gz
Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • Need to locate: • settings file: collins.properties • derived data file: wsj-02-21.obj.gz • sentence file: test.txt • Run: • ../research/dbp/bin/parse 300 ../research/dbp/settings/collins.properties ../research/dbparser/bin/wsj-02-21.obj.gz test.txt • Output trees (in current working directory): • test.txt.parsed
Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • If you have multiple processors, you can first split the input file into parts, manually or using the split command: • split -l 500 wsj-23.txt wsj-23 • Then merge the parses produced into one file, keeping them in order (see the sketch below)…
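The merge step is just concatenation in chunk order. A minimal sketch, assuming split produced chunks named wsj-23aa, wsj-23ab, … and that parsing each one produced wsj-23aa.parsed, etc. (the names are whatever split generated; adjust the glob to match):

#!/usr/bin/perl
# merge-parses.pl -- hypothetical sketch: stitch chunk outputs back together
# usage: perl merge-parses.pl > wsj-23.txt.parsed
use strict;
use warnings;

# sorting the chunk names restores the original sentence order,
# since split names its output files alphabetically (aa, ab, ac, ...)
for my $chunk (sort glob "wsj-23a*.parsed") {
    open my $fh, '<', $chunk or die "can't open $chunk: $!";
    print while <$fh>;
    close $fh;
}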
Steps • Running evalb • Need: • parameter file: COLLINS.prm • parser output: test.txt.parsed • gold file: test.gold
Steps • (Assume my data files are on my Desktop and EVALB is installed in my research subdirectory) • ../research/EVALB/evalb -p ../research/EVALB/COLLINS.prm test.gold test.txt.parsed • Recall 83.8, Precision 89.2 • (labeled recall = % of gold constituents the parser found; labeled precision = % of the parser's constituents that are correct)
Summary • WSJ corpus: sections 00 through 24 • Training: normally 02-21 (20 sections) • concatenate the .mrg files as input to the Bikel-Collins trainer • training is fast • Evaluation: on section 23 • use tsurgeon to get data ready for EVALB • use either the .pos files or tregex to grab the sentences for parsing • 2400+ sentences in section 23, parsing is slow… • can split the sentences up and run on multiple processors
Summary • Question: • How much training data do you need? • Homework: • How does Bikel-Collins vary in precision and recall on test data if you randomly pick 1..24 out of 25 sections to do the training with? • Test section: • I want you to pick a test section that's not section 23 • Use EVALB • plot precision and recall graphs
Perl Code • Generate the training data: the function shuffle permutes @sections, then take the first n sections for training (see the sketch below)
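The script itself isn't reproduced in these notes; here is a minimal sketch of the idea, with a Fisher-Yates shuffle standing in for the slide's shuffle function. The file paths, the output name, and the choice of held-out section are assumptions:

#!/usr/bin/perl
# make-training.pl -- hypothetical sketch: build training data from n random sections
# usage: perl make-training.pl 5
use strict;
use warnings;

my $n = shift @ARGV or die "usage: $0 n\n";
my @sections = map { sprintf "%02d", $_ } 0 .. 24;
@sections = grep { $_ ne '23' } @sections;   # hold out your test section (23 here)
die "n too large\n" if $n > @sections;

# Fisher-Yates: permute @sections in place
sub shuffle {
    my $array = shift;
    for (my $i = @$array - 1; $i > 0; $i--) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

shuffle(\@sections);
my @picked = @sections[0 .. $n - 1];
my $files = join ' ', map { "~/research/TREEBANK_3/parsed/mrg/wsj/$_/*.mrg" } @picked;
system("cat $files > ~/Desktop/wsj-train-$n.mrg") == 0 or die "cat failed\n";
print "training sections: @picked\n";

Feed the resulting training file to the trainer exactly as with wsj-02-21.mrg above, then parse and score with EVALB to get one precision/recall point per value of n.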
Training Data • What does the graph of the F-score (the harmonic mean of labeled precision and recall: F = 2·LP·LR/(LP+LR)) look like as a function of the number of sentences used for training? • [graph: F-score vs. # of sentences used for training] • report your results next time…
Robustness and Sensitivity • it's often assumed that statistical models are less brittle than symbolic models • they can get parses for ungrammatical data • but are they sensitive to noise or small perturbations?
Robustness and Sensitivity Examples • Herman mixed the water with the milk • Herman mixed the milk with the water (mix) • Herman drank the water with the milk • Herman drank the milk with the water (drink) • f(water)=117, f(milk)=21
Robustness and Sensitivity Examples • Herman mixed the water with the milk • Herman mixed the milk with the water • Herman drank the water with the milk • Herman drank the milk with the water • the parser makes different PP attachment choices: high attachment, logprob = -50.4; low attachment, logprob = -47.2
Robustness and Sensitivity First thoughts... • does milk force low attachment? (high attachment is chosen for other nouns like water, toys, etc.) • Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun
Robustness and Sensitivity First thoughts... Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun • but just one sentence (#5212) shows PP attachment to milk • Could just one sentence out of 39,832 training examples affect the attachment options? (see the count sketch below)
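A sketch of how to check these counts yourself, as a crude string scan over a one-tree-per-line file (tregex is the more precise tool; the noun test below just matches NN/NNS tags and is an approximation):

#!/usr/bin/perl
# count-milk.pl -- hypothetical sketch: count trees mentioning milk
# usage: perl count-milk.pl wsj-trees.gold
use strict;
use warnings;

my ($any, $noun) = (0, 0);
while (my $tree = <>) {
    $any++  if $tree =~ /\bmilk\b/i;      # any occurrence of the word
    $noun++ if $tree =~ /\(NNS? milk/i;   # tagged as a noun, e.g. (NN milk)
}
print "trees with milk: $any; milk as a noun: $noun\n";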
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain • [diagram: sentences + derived counts (wsj-02-21.obj.gz) → parser → parses]
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain • e.g. delete the PP with 4% butterfat altogether
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain (the Bikel-Collins parser can be retrained quickly) • either delete the PP altogether, or bump it up to the VP level (see the sketch below) • [diagram: Treebank sentences (wsj-02-21.mrg) → training → derived counts (wsj-02-21.obj.gz)]
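A sketch of the deletion variant. It assumes the training trees have been flattened to one per line (tsurgeon -s) so the PP is a contiguous substring, and that the trainer will accept that one-tree-per-line file; the exact spacing of the subtree is also an assumption, so check it against your file (or edit the pretty-printed .mrg by hand instead):

#!/usr/bin/perl
# perturb.pl -- hypothetical sketch: delete the lone milk PP from the training data
# usage: perl perturb.pl wsj-02-21.flat > wsj-02-21-perturbed.flat
use strict;
use warnings;

# the PP subtree from sentence #5212 (spacing assumed; verify against your file)
my $pp = '(PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat)))';

while (my $tree = <>) {
    $tree =~ s/\s*\Q$pp\E//;   # \Q...\E treats the parentheses as literals
    print $tree;
}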
Robustness and Sensitivity • Result: • the parser now chooses high attachment for the PP that previously adjoined to milk • Could just one sentence out of 39,832 training examples affect the attachment options? YES • Why such extreme sensitivity to perturbation? • logprobs are conditioned on many things; hence, lots of probabilities to estimate • smoothing • the model needs every piece of data, even low frequency events
Details… • Two sets of files produced by the trainer: • the observations file (wsj-02-21.observed.gz) • the derived data file (wsj-02-21.obj.gz)
Robustness and Sensitivity • (Bikel 2004): • “it may come as a surprise that the [parser] needs to access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB].”
Robustness and Sensitivity • Trainer has a memory like a phone book:
Robustness and Sensitivity • Frequency 1 observed data for: (NP (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat))))
• (mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)
modHeadWord (with IN), headWord (milk NN), modifier PP, previousMods (+START+), previousWords ((+START+ +START+)), parent NP-A, head NPB, subcat (), verbIntervening false, side right
• (mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0)
modHeadWord (+STOP+ +STOP+), headWord (milk NN), modifier +STOP+, previousMods (PP), previousWords ((with IN)), parent NP-A, head NPB, subcat (), verbIntervening false, side right
Robustness and Sensitivity • 76.8% of the observed events are singletons (frequency 1) • 94.2% occur 5 or fewer times
Robustness and Sensitivity • Full story more complicated than described here... • by picking different combinations of verbs and nouns, you can get a range of behaviors • f(drank)=0 in the training data, so we might as well have picked a nonce verb like flubbed
Another experiment • it's often assumed that statistical models are less brittle than symbolic models…
Parsing Data • All 5! (=120) permutations of • colorless green ideas sleep furiously . • (see the generation sketch below)
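A sketch of generating the permutations, keeping the period sentence-final (plain recursive enumeration, no CPAN module needed):

#!/usr/bin/perl
# permute.pl -- hypothetical sketch: emit all 120 word-order permutations
# usage: perl permute.pl > permutations.txt
use strict;
use warnings;

my @words = qw(colorless green ideas sleep furiously);

sub permute {
    my ($done, @rest) = @_;
    if (!@rest) {
        print "@$done .\n";   # the final period stays put in every sentence
        return;
    }
    for my $i (0 .. $#rest) {
        my @next = @rest;
        my ($w) = splice @next, $i, 1;
        permute([ @$done, $w ], @next);
    }
}

permute([], @words);

Each output line can then be parsed, and the permutations ranked by the log probability of their best parse.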
Parsing Data • The winning sentence was: • Furiously ideas sleep colorless green . • after training on sections 02-21 (approx. 40,000 sentences) • in its best parse, sleep takes an ADJP object with two heads, and the adverb (RB) furiously modifies the noun
Parsing Data • The next two highest scoring permutations were: • Furiously green ideas sleep colorless . (sleep takes an NP object) • Green ideas sleep furiously colorless . (sleep takes an ADJP object)
Parsing Data • (Pereira 2002) compared Chomsky's original minimal pair: • colorless green ideas sleep furiously • furiously sleep ideas green colorless • ranked #23 and #36 respectively out of 120
Parsing Data • But … • the graph (next slide) shows how arbitrary these rankings are: • furiously sleep ideas green colorless vastly outranks • colorless green ideas sleep furiously (and the top 3 sentences) when trained on randomly chosen sections covering 14,188 to 31,616 WSJ PTB sentences
Rank of Best Parse for Sentence Permutation • [graph] • Note also that: • colorless green ideas sleep furiously totally outranks the top 3 (and #36) just briefly, at a single data point (when trained on 33,605 sentences)