This talk presents learning and inference for hierarchically split PCFGs, covering treebank annotation and lexicalization, hierarchical split&merge training, adaptive splitting, parameter smoothing, and coarse-to-fine inference, with an emphasis on both efficiency and accuracy.
Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein
The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson '98] • Head lexicalization [Collins '99, Charniak '00] • Automatic clustering?
Learning Latent Annotations [Matsuzaki et al. '05] EM algorithm: • Brackets are known • Base categories are known • Only induce subcategories Just like Forward-Backward for HMMs. [Figure: forward and backward passes over a binary tree with latent nodes X1–X7 above the sentence "He was right."]
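To make the EM recipe concrete, here is a minimal sketch of the inside pass over one observed tree, in the spirit of the slide's forward-backward analogy. Everything here is illustrative: the Node layout and the rule_prob / lex_prob dictionaries are assumptions, not the Berkeley Parser's actual data structures.

```python
from collections import namedtuple

# Illustrative tree structure; not the parser's actual API.
Node = namedtuple("Node", "label children word")   # word is None for internal nodes

def inside(node, n_sub, rule_prob, lex_prob):
    """Inside pass of inside-outside EM over ONE observed tree:
    score[x] = P(words under `node` | node carries subcategory x).
    Because brackets and base categories are known, the recursion walks
    the gold tree instead of all spans, which is exactly the sense in
    which this is "just like Forward-Backward for HMMs"."""
    if not node.children:                           # preterminal over a word
        return [lex_prob[node.label, x, node.word] for x in range(n_sub)]
    left = inside(node.children[0], n_sub, rule_prob, lex_prob)
    right = inside(node.children[1], n_sub, rule_prob, lex_prob)
    a, b, c = node.label, node.children[0].label, node.children[1].label
    score = [0.0] * n_sub
    for x in range(n_sub):                          # parent subcategory
        for y in range(n_sub):                      # left-child subcategory
            for z in range(n_sub):                  # right-child subcategory
                score[x] += rule_prob[a, b, c, x, y, z] * left[y] * right[z]
    return score
```

An outside pass mirrors this top-down; multiplying inside and outside scores with each rule's probability, normalized by the sentence likelihood, gives the expected subcategory rule counts for the E-step, and the M-step simply renormalizes those counts.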
Overview • Hierarchical Training • Adaptive Splitting • Parameter Smoothing [Chart: accuracy vs. grammar size; the annotation marks the limit of computational resources]
Refinement of the DT tag [Diagram: DT is split into subcategories DT-1, DT-2, DT-3, DT-4]
Refinement of the , tag • Splitting all categories the same amount is wasteful.
Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back the splits that were least useful [Chart: likelihood with the split vs. likelihood with the split reversed]
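A hedged sketch of the roll-back criterion: at every node where a split symbol occurs, approximate how much the tree likelihood would drop if its two subcategories were collapsed again. The tuple layout below is an assumption made for illustration, not the actual implementation.

```python
import math

def merge_loss(occurrences):
    """Approximate log-likelihood loss from reversing one split, in the
    spirit of the split&merge criterion.

    `occurrences` is a list of (p1, p2, in1, in2, out1, out2) tuples:
    relative frequencies (p1 + p2 = 1), inside scores, and outside
    scores of the two subcategories at one node. Illustrative layout."""
    loss = 0.0
    for p1, p2, in1, in2, out1, out2 in occurrences:
        split_score = in1 * out1 + in2 * out2     # keep the split
        merged_in = p1 * in1 + p2 * in2           # pool the inside scores
        merged_score = merged_in * (out1 + out2)  # collapse the split
        loss += math.log(split_score) - math.log(merged_score)
    return loss
```

Splits are then sorted by this loss, and the least useful ones (roughly the bottom half per round in the split&merge procedure) are reversed.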
Number of Phrasal Subcategories [Chart: learned subcategory counts per phrasal category; NP, VP, and PP are highlighted]
Number of Lexical Subcategories [Charts: learned subcategory counts per part-of-speech tag; one build highlights POS, TO, and the , tag, the next highlights NNP, JJ, NNS, and NN]
Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
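As one concrete instance of pooling, a simple linear scheme shrinks each subcategory's rule probability toward the mean over its sibling subcategories. A minimal sketch, where the dictionary layout and the value of alpha are assumptions:

```python
def smooth(rule_probs, alpha=0.01):
    """Pool statistics across the subcategories of one base symbol:
    shrink each subcategory's probability for a given rule toward the
    mean over all sibling subcategories.

    rule_probs: {subcategory: probability} for one rule of one base
    symbol. Illustrative layout, not the parser's actual interface."""
    mean = sum(rule_probs.values()) / len(rule_probs)
    return {x: (1 - alpha) * p + alpha * mean for x, p in rule_probs.items()}
```

This keeps rarely observed subcategories from committing to noisy estimates while barely disturbing well-observed ones.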
Linguistic Candy • Proper nouns (NNP) • Personal pronouns (PRP) [Tables: learned subcategories with their most frequent words]
Linguistic Candy • Relative adverbs (RBR) • Cardinal numbers (CD) [Tables: learned subcategories with their most frequent words]
Inference • Example: "She heard the noise." • Exhaustive parsing: 1 min per sentence
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05] [Diagram: Treebank → coarse grammar (NP, VP, …) → parse and prune → refined grammar (NP-1, NP-12, NP-17, …, VP-6, VP-31, …) → parse]
Hierarchical Pruning • Consider again the span 5 to 12: parse with each grammar in sequence (coarse, split in two, split in four, split in eight), pruning any chart item whose posterior falls below a threshold t
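A sketch of that pruning cascade under stated assumptions: parse_chart and refinements below are stand-ins for a real chart parser with inside-outside posteriors and for the split hierarchy's refinement map; neither is the Berkeley Parser's actual interface.

```python
def coarse_to_fine(sentence, grammars, parse_chart, refinements, threshold=1e-4):
    """Hierarchical coarse-to-fine pruning, as a sketch.

    grammars:    list from coarsest to finest (coarse, 2-split, 4-split, ...)
    parse_chart: callable(sentence, grammar, allowed) -> {item: posterior},
                 where `item` is (start, end, symbol); stand-in for a chart
                 parser that skips items not in `allowed`.
    refinements: callable(item, next_grammar) -> items of the next grammar
                 that project onto `item`."""
    allowed = None                                  # no pruning on the first pass
    chart = {}
    for i, grammar in enumerate(grammars):
        chart = parse_chart(sentence, grammar, allowed)
        if i + 1 == len(grammars):
            break                                   # finest pass: done
        survivors = {item for item, p in chart.items() if p >= threshold}
        allowed = {ref for item in survivors
                   for ref in refinements(item, grammars[i + 1])}
    return chart
```

Each pass is cheap because the previous pass has already ruled out most spans, so the expensive finest grammar only touches items that survived every coarser grammar.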
Intermediate Grammars [Diagram: hierarchical learning produces X-Bar = G0, then G1 … G6 = G; the DT tag, for example, is refined from DT to DT1, DT2, then DT1–DT4, then DT1–DT8]
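Putting the pieces together, the whole hierarchy comes from repeated rounds of splitting, EM fitting, merging, and smoothing. A hedged sketch of that loop; every helper named below is hypothetical shorthand for a stage described above, not an actual API:

```python
def split_merge_train(treebank, rounds=6):
    """Hierarchical split&merge training, as a sketch: each round doubles
    the subcategories and then rolls back the least useful splits."""
    grammar = xbar_grammar(treebank)            # G0: raw X-Bar treebank grammar
    for _ in range(rounds):                     # produces G1 ... G6
        grammar = split_in_two(grammar)         # double every subcategory
        grammar = em_train(grammar, treebank)   # fit with inside-outside EM
        grammar = merge_least_useful(grammar)   # roll back ~half of the splits
        grammar = smooth_toward_mean(grammar)   # pool sibling statistics
    return grammar
```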
Projected Grammars [Diagram: instead of the learned intermediate grammars G1 … G6, prune with projections π0(G), π1(G), …, π5(G) of the final grammar G = G6, with X-Bar = G0]
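One way to think about a projection πi, as a hedged sketch: collapse each refined symbol to its coarser ancestor and re-estimate rule probabilities, weighting each refined rule by how often its parent symbol is expected to occur. The dict layouts and the freq argument are assumptions; the paper computes these expectations exactly from the grammar itself.

```python
def project(rule_probs, pi, freq):
    """Project a refined grammar onto a coarser symbol set: each refined
    rule A -> B C contributes its probability, weighted by the expected
    frequency of A, to the coarse rule pi(A) -> pi(B) pi(C).

    rule_probs: {(A, B, C): probability} over refined binary rules
    pi:         {refined symbol: coarse symbol}
    freq:       {refined symbol: expected frequency under the grammar}"""
    weighted, total = {}, {}
    for (a, b, c), p in rule_probs.items():
        coarse_rule = (pi[a], pi[b], pi[c])
        weighted[coarse_rule] = weighted.get(coarse_rule, 0.0) + freq[a] * p
        total[pi[a]] = total.get(pi[a], 0.0) + freq[a] * p
    return {rule: w / total[rule[0]] for rule, w in weighted.items()}
```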
Final Results (Efficiency) • Parsing the development set (1600 sentences) • Berkeley Parser: 10 min, implemented in Java • Charniak & Johnson '05 parser: 19 min, implemented in C
Extensions • Acoustic modeling [Petrov, Pauls & Klein '07] • Infinite grammars via nonparametric Bayesian learning [Liang, Petrov, Jordan & Klein '07]
Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing
Thank You! http://nlp.cs.berkeley.edu