This presentation covers the design and optimization of grammars for parsing German with latent variable grammars. It discusses how annotation improves the statistical fit of a grammar, the move from manual to automatic annotation, and the roles of hierarchical training, adaptive splitting, and parameter smoothing. Inference techniques, including coarse-to-fine decoding and a variational approximation, are also evaluated. The talk weighs the advantages and disadvantages of these approaches and presents shared-task results, highlighting the potential for multilingual parsing advances.
Parsing German with Latent Variable Grammars
Slav Petrov and Dan Klein, UC Berkeley
The Game of Designing a Grammar
• Annotation refines base treebank symbols to improve the statistical fit of the grammar
• Parent annotation [Johnson '98] (sketched below)
• Head lexicalization [Collins '99, Charniak '00]
• Automatic clustering?
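A minimal sketch of parent annotation, assuming a toy nested-tuple tree representation; the function name and tree encoding are illustrative, not from the cited work:

```python
# Sketch of parent annotation [Johnson '98]: relabel each nonterminal with
# its parent's label, e.g. an NP under S becomes NP^S. Trees are
# (label, children...) tuples and leaves are plain strings -- a toy
# representation assumed for this example.

def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):                  # leaf (terminal word)
        return tree
    label, *children = tree
    annotated = tuple(parent_annotate(c, label) for c in children)
    return (f"{label}^{parent}",) + annotated

tree = ("S", ("NP", "He"), ("VP", ("VBD", "was"), ("ADJP", "right")))
print(parent_annotate(tree))
# ('S^ROOT', ('NP^S', 'He'), ('VP^S', ('VBD^VP', 'was'), ('ADJP^VP', 'right')))
```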
Previous Work: Manual Annotation [Klein & Manning '03]
• Manually split categories
  • NP: subject vs. object
  • DT: determiners vs. demonstratives
  • IN: sentential vs. prepositional
• Advantages:
  • Fairly compact grammar
  • Linguistic motivations
• Disadvantages:
  • Performance leveled out
  • Manually annotated
Previous Work: Automatic Annotation Induction [Matsuzaki et al. '05, Prescher '05]
• Label all nodes with latent variables; same number k of subcategories for all categories
• Advantages:
  • Automatically learned
• Disadvantages:
  • Grammar gets too large
  • Most categories are oversplit while others are undersplit
Overview [Petrov, Barrett, Thibaux & Klein in ACL '06]
• Learning:
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
• Inference:
  • Coarse-to-Fine Decoding
  • Variational Approximation
• German Analysis [Petrov & Klein in NAACL '07]
Learning Latent Annotations
EM algorithm:
• Brackets are known
• Base categories are known
• Only induce subcategories
Just like Forward-Backward for HMMs.
[Figure: parse tree with latent variables X1–X7 over the sentence "He was right.", with forward and backward passes indicated]
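A minimal sketch of the constrained E-step this slide describes: because brackets and base categories are fixed, the inside pass only sums over the k subcategories at each node, exactly as the forward pass of an HMM sums over hidden states. The dictionary-based parameter containers and their shapes are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def inside(node, rule_probs, lex_probs, k):
    """Inside scores: P(words under node | node = subcategory s), shape (k,)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return lex_probs[label, children[0]]          # preterminal over a word
    left, right = children
    i_l = inside(left, rule_probs, lex_probs, k)
    i_r = inside(right, rule_probs, lex_probs, k)
    rule = rule_probs[label, left[0], right[0]]       # shape (k, k, k)
    return np.einsum("slr,l,r->s", rule, i_l, i_r)    # sum over child subcats

# Toy demo with k = 2 subcategories and uniform parameters.
k = 2
lex_probs = {("NP", "He"): np.full(k, 0.5),
             ("VBD", "was"): np.full(k, 0.4),
             ("ADJP", "right"): np.full(k, 0.3)}
rule_probs = {("S", "NP", "VP"): np.full((k, k, k), 0.25),
              ("VP", "VBD", "ADJP"): np.full((k, k, k), 0.25)}
tree = ("S", ("NP", "He"), ("VP", ("VBD", "was"), ("ADJP", "right")))
print(inside(tree, rule_probs, lex_probs, k))         # [0.06 0.06]
```

The outside pass mirrors this top-down, and expected rule counts come from inside-outside products, just as forward-backward accumulates transition counts.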
Starting Point
[Figure: annotated "Limit of computational resources"]
Refinement of the DT tag
[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]
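One way such a refinement round can be initialized, sketched under the same assumed (k, k, k) rule-table layout as above: split every subcategory in two, copy the weights, and perturb them slightly so EM can differentiate the two halves. The 1% noise level is an illustrative assumption:

```python
import numpy as np

def split_in_two(rule, rng, noise=0.01):
    """Split each subcategory of a (k, k, k) rule table into two."""
    big = np.repeat(np.repeat(np.repeat(rule, 2, 0), 2, 1), 2, 2)
    big /= 4.0                          # each child pair fans out into four
    big *= 1.0 + noise * (rng.random(big.shape) - 0.5)   # break symmetry
    return big                          # shape (2k, 2k, 2k)

rng = np.random.default_rng(0)
rule = np.full((2, 2, 2), 0.25)         # toy table with k = 2
print(split_in_two(rule, rng).shape)    # (4, 4, 4)
```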
Refinement of the , tag
• Splitting all categories the same amount is wasteful.
The DT tag revisited
• Oversplit?
Adaptive Splitting
• Want to split complex categories more
• Idea: split everything, roll back the splits that were least useful
Adaptive Splitting
• Evaluate the loss in likelihood from removing each split:

  Loss = (data likelihood with split reversed) / (data likelihood with split)

• No loss in accuracy when 50% of the splits are reversed.
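A sketch of how that ratio might be evaluated per occurrence from inside/outside scores, following the merging approximation described in the ACL '06 paper; the input containers, names, and sign convention are assumptions:

```python
import math

def split_loss(occurrences, p1, p2):
    """Approximate log-likelihood lost by merging subcategories s1 and s2.

    occurrences: list of (inside1, outside1, inside2, outside2) tuples, one
    per node where the split category occurs; p1, p2: relative frequencies
    of the two subcategories (p1 + p2 = 1).
    """
    loss = 0.0
    for in1, out1, in2, out2 in occurrences:
        split_score = in1 * out1 + in2 * out2                 # with split
        merged_score = (p1 * in1 + p2 * in2) * (out1 + out2)  # split reversed
        loss += math.log(split_score) - math.log(merged_score)
    return loss   # reverse the 50% of splits with the smallest loss
```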
Smoothing
• Heavy splitting can lead to overfitting
• Idea: smoothing allows us to pool statistics across subcategories
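A minimal sketch of such pooling, assuming linear interpolation of each subcategory's rule scores toward the mean over the sibling subcategories of the same base category; the weight alpha is an illustrative value, not the paper's tuned constant:

```python
import numpy as np

def smooth(rule, alpha=0.01):
    """rule: (k, k, k) table of scores for one base-category rule."""
    mean = rule.mean(axis=0, keepdims=True)   # pool over parent subcategories
    return (1 - alpha) * rule + alpha * mean  # shrink each subcat toward it
```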
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]
Treebank → coarse grammar → parse & prune → refined grammar → parse
[Figure: pipeline from a treebank with categories such as NP, VP to refined grammars with subcategories such as NP-1, NP-17, VP-6]
Hierarchical Pruning
Consider the span 5 to 12: prune first with the coarse grammar, then with the grammar split in two, then in four, then in eight.
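A sketch of the pruning test this implies, assuming a binary split hierarchy (so subcategory s at one level projects to s // 2 one level up) and an assumed threshold value:

```python
def allowed(span, label, sub, prev_posteriors, threshold=1e-4):
    """Keep item (span, label, sub) only if its coarser projection survived.

    prev_posteriors: posteriors from the previous, coarser pass, keyed by
    (span, label, subcategory) -- an assumed chart representation.
    """
    return prev_posteriors.get((span, label, sub // 2), 0.0) > threshold

chart = {((5, 12), "NP", 3): 0.2}
print(allowed((5, 12), "NP", 7, chart))   # True: NP-7 projects to NP-3
print(allowed((5, 12), "NP", 4, chart))   # False: NP-2 was pruned
```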
Intermediate Grammars
X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G
[Figure: the DT tag splitting from DT to DT1–DT2, then DT1–DT4, then DT1–DT8 across the learning stages]
EM State Drift (DT tag)
[Figure: membership of determiners such as "the", "that", "this", "these", "some" drifting among DT subcategories across EM iterations]
Projected Grammars
X-Bar = G0 → G1 → … → G6 = G is learned hierarchically; for pruning, each intermediate grammar is replaced by a projection πi(G) of the final grammar G (π0(G), …, π5(G)).
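A sketch of one plausible way to collapse a refined rule table onto a coarser level: marginalize the fine child subcategories and average the fine parents under their marginal probabilities. The paper derives its projections from the grammar's own expected counts; the p_sub input and the binary-hierarchy reshape are simplifying assumptions:

```python
import numpy as np

def project(rule, p_sub, shrink=2):
    """rule: (k, k, k) fine-level table P(left, right | parent);
    p_sub: (k,) marginal probabilities of the fine parent subcategories."""
    k = rule.shape[0]
    kc = k // shrink
    # sum out the fine child dimensions...
    r = rule.reshape(kc, shrink, kc, shrink, kc, shrink).sum(axis=(3, 5))
    # ...and average fine parents, conditioned on the coarse parent
    w = p_sub.reshape(kc, shrink)
    w = w / w.sum(axis=1, keepdims=True)
    return np.einsum("Pp,PpLR->PLR", w, r)

rng = np.random.default_rng(2)
rule = rng.dirichlet(np.ones(16), size=4).reshape(4, 4, 4)   # k = 4
print(project(rule, np.full(4, 0.25)).shape)                 # (2, 2, 2)
```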
Bracket Posteriors (after G0)
Bracket Posteriors (movie; final chart)
Parse Selection
Computing the most likely unsplit tree is NP-hard:
• Settle for the best derivation.
• Rerank an n-best list.
• Use an alternative objective function / variational approximation.
[Figure: several split derivations with their log-probabilities collapsing to the same unsplit parses]
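A minimal sketch of the second option, reranking an n-best list: sum the probabilities of all derivations that collapse to the same unsplit tree and return the argmax. The nested-tuple trees and 'NP-3'-style subcategory names are toy assumptions:

```python
from collections import defaultdict

def unsplit(tree):
    """Strip subcategory suffixes, e.g. 'NP-3' -> 'NP' (toy representation)."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return (label.split("-")[0],) + tuple(map(unsplit, children))

def best_parse(derivations):
    """derivations: list of (split_tree, probability) pairs."""
    scores = defaultdict(float)
    for tree, prob in derivations:
        scores[unsplit(tree)] += prob   # pool derivations of the same parse
    return max(scores, key=scores.get)

nbest = [(("NP-1", ("DT-3", "the"), ("NN-0", "dog")), 0.30),
         (("NP-2", ("DT-1", "the"), ("NN-0", "dog")), 0.25)]
print(best_parse(nbest))   # ('NP', ('DT', 'the'), ('NN', 'dog'))
```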
Efficiency Results
• Berkeley Parser: 15 min (implemented in Java)
• Charniak & Johnson '05 parser: 19 min (implemented in C)
Parsing German: Shared Task
• Two-pass parsing:
  • Determine constituency structure (F1: 85/94)
  • Assign grammatical functions
• One-pass approach:
  • Treat categories + grammatical functions as labels
Conclusions
• Split & Merge Learning:
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
• Hierarchical Coarse-to-Fine Inference:
  • Projections
  • Marginalization
• Multilingual Unlexicalized Parsing
Thank You! The parser is available at http://nlp.cs.berkeley.edu