Fast Full Parsing by Linear-Chain Conditional Random Fields


Presentation Transcript


  1. Fast Full Parsing by Linear-Chain Conditional Random Fields Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou The University of Manchester

  2. Outline • Motivation • Parsing algorithm • Chunking with conditional random fields • Searching for the best parse • Experiments • Penn Treebank • Conclusions

  3. Motivation • Parsers are useful in many NLP applications • Information extraction, Summarization, MT, etc. • But parsing is often the most computationally expensive component in the NLP pipeline • Fast parsing is useful when • The document collection is large • e.g. MEDLINE corpus: 70 million sentences • Real-time processing is required • e.g. web applications

  4. Parsing algorithms • History-based approaches • Bottom-up & left-to-right (Ratnaparkhi, 1997) • Shift-reduce (Sagae & Lavie, 2006) • Global modeling • Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008) • Reranking (Collins, 2000; Charniak & Johnson, 2005) • Forest (Huang, 2008)

  5. Chunk parsing • Parsing algorithm: 1. Identify phrases (chunks) in the current sequence. 2. Convert each recognized phrase into a new non-terminal symbol. 3. Go back to step 1 and repeat until the whole sentence is covered. • Previous work • Memory-based learning (Tjong Kim Sang, 2001): F-score 80.49 • Maximum entropy (Tsuruoka and Tsujii, 2005): F-score 85.9
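The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `chunker` stands in for any model (such as the CRF chunkers described later) that returns non-overlapping labelled spans over the current symbol sequence.

```python
def chunk_parse(tokens, pos_tags, chunker):
    """Cascaded chunk parsing (minimal sketch): repeatedly chunk the
    current symbol sequence and replace each recognized phrase with its
    non-terminal symbol, until the sentence is reduced to a single node."""
    # Each node is (symbol, subtree); initially one node per POS-tagged word.
    nodes = [(tag, (tag, word)) for word, tag in zip(tokens, pos_tags)]
    while len(nodes) > 1:
        symbols = [sym for sym, _ in nodes]
        # `chunker` is assumed to return non-overlapping (start, end, label)
        # spans over `symbols`, e.g. [(0, 2, "NP"), (5, 7, "QP")].
        spans = {start: (end, label) for start, end, label in chunker(symbols)}
        if not spans:
            break  # no phrase recognized; stop rather than loop forever
        new_nodes, i = [], 0
        while i < len(nodes):
            if i in spans:
                end, label = spans[i]
                children = [subtree for _, subtree in nodes[i:end]]
                new_nodes.append((label, (label, children)))
                i = end
            else:
                new_nodes.append(nodes[i])
                i += 1
        nodes = new_nodes
    # A complete parse yields a single root subtree.
    return [subtree for _, subtree in nodes]
```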

  6. Parsing a sentence • Example: "Estimated volume was a light 2.4 million ounces ." with POS tags VBN NN VBD DT JJ CD CD NNS . and phrase labels S, VP, NP, NP, QP (the complete tree is on slide 12).

  7. 1st iteration • Input sequence: VBN NN VBD DT JJ CD CD NNS . (Estimated volume was a light 2.4 million ounces .) • Recognized chunks: NP = [Estimated volume], QP = [2.4 million]

  8. 2nd iteration • Input sequence: NP VBD DT JJ QP NNS . (volume was a light million ounces .) • Recognized chunk: NP = [a light million ounces]

  9. 3rd iteration • Input sequence: NP VBD NP . (volume was ounces .) • Recognized chunk: VP = [was ounces]

  10. 4th iteration • Input sequence: NP VP . (volume was .) • Recognized chunk: S = [volume was .]

  11. 5th iteration • The sequence is reduced to the single root symbol S (head word: was); parsing is complete.

  12. Complete parse tree: (S (NP (VBN Estimated) (NN volume)) (VP (VBD was) (NP (DT a) (JJ light) (QP (CD 2.4) (CD million)) (NNS ounces))) (. .))

  13. Chunking with CRFs • Conditional random fields (CRFs) • Features are defined on states and state transitions • Example: the sequence VBN NN VBD DT JJ CD CD NNS . (Estimated volume was a light 2.4 million ounces .), in which the chunks NP and QP are to be recognized • The model scores label sequences with feature functions and feature weights
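The slide's figure only labels a "feature function" and a "feature weight"; the model behind it is the standard linear-chain CRF. The following is the usual textbook form of that model (the notation is mine, not taken from the slide):

```latex
% Standard linear-chain CRF over a label sequence y = (y_1, ..., y_T)
% for an input sequence x; the f_k are feature functions defined on
% states and state transitions, and the lambda_k are their weights.
p(y \mid x) \;=\; \frac{1}{Z(x)}
  \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```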

  14. Chunking with "IOB" tagging • B : beginning of a chunk • I : inside (continuation) of the chunk • O : outside of any chunk • Example:

Word: Estimated  volume  was  a   light  2.4   million  ounces  .
POS:  VBN        NN      VBD  DT  JJ     CD    CD       NNS     .
IOB:  B-NP       I-NP    O    O   O      B-QP  I-QP     O       O
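A minimal sketch of the mapping from labelled chunk spans to per-token IOB tags (the span format below is an assumption made for the example, not something taken from the slides):

```python
def spans_to_iob(n_tokens, spans):
    """Encode labelled chunk spans as per-token IOB tags.
    `spans` is a list of (start, end, label) with `end` exclusive."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags

# Example: "Estimated volume was a light 2.4 million ounces ."
print(spans_to_iob(9, [(0, 2, "NP"), (5, 7, "QP")]))
# ['B-NP', 'I-NP', 'O', 'O', 'O', 'B-QP', 'I-QP', 'O', 'O']
```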

  15. Features for base chunking • [Figure: the IOB label at the current position (marked "?") is predicted from the surrounding words and POS tags of the sentence "Estimated volume was a light 2.4 million ounces ." (tags VBN NN VBD DT JJ CD CD NNS .)]
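A sketch of what such features might look like in practice. The exact feature templates and window size are not given in the transcript, so everything below is an illustrative assumption:

```python
def base_chunking_features(words, pos_tags, i):
    """Feature strings for position i, drawn from the words and POS tags
    in a small window around it (the window size is an assumption)."""
    feats = []
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        if 0 <= j < len(words):
            feats.append(f"word[{offset}]={words[j]}")
            feats.append(f"pos[{offset}]={pos_tags[j]}")
    # Simple conjunctions of neighbouring POS tags
    if i > 0:
        feats.append(f"pos[-1]|pos[0]={pos_tags[i-1]}|{pos_tags[i]}")
    if i + 1 < len(words):
        feats.append(f"pos[0]|pos[+1]={pos_tags[i]}|{pos_tags[i+1]}")
    return feats
```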

  16. Features for non-base chunking • [Figure: in later iterations the sequence contains non-terminal symbols as well as POS tags (e.g. NP VBD DT JJ QP NNS .); each recognized chunk is represented by its head word (the NP "Estimated volume" by "volume"), and the chunk's internal words and tags (VBN NN, Estimated volume) remain visible to the chunker.]
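As with the base case, the exact templates are not preserved in the transcript; the sketch below only illustrates the idea that a position may now hold a non-terminal represented by its head word (the template names and window are assumptions):

```python
def nonbase_chunking_features(symbols, head_words, i):
    """Feature strings for position i in a later chunking pass, where
    `symbols` may contain non-terminals (e.g. "NP", "QP") as well as POS
    tags, and `head_words[i]` is the head word representing symbols[i]."""
    feats = []
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(symbols):
            feats.append(f"sym[{offset}]={symbols[j]}")
            feats.append(f"head[{offset}]={head_words[j]}")
    return feats
```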

  17. Finding the best parse • Scoring the entire parse tree • The best derivation can be found by depth-first search.
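The scoring formula on this slide does not survive in the transcript. A natural reading, and an assumption on my part, is that a parse tree built by a derivation of several chunking passes is scored by the product of the CRF probabilities of those passes:

```latex
% Hedged reconstruction: a parse tree T is built by a derivation of
% chunking steps 1..m; x_i is the symbol sequence at step i and y_i the
% chunking (IOB labelling) chosen at that step.
\mathrm{score}(T) \;=\; \prod_{i=1}^{m} p_{\mathrm{CRF}}(y_i \mid x_i)
```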

  18. Depth-first search • [Figure: the derivation is explored as a tree of decisions, with POS tagging at the root, base chunking below it, and further chunking steps beneath; alternative hypotheses at each level are expanded depth-first.]

  19. Finding the best parse

  20. Extracting multiple hypotheses from the CRF • A* search • Uses a priority queue • Suitable when the top n hypotheses are needed • Branch-and-bound • Depth-first • Suitable when a probability threshold is given • [Figure: example tag-sequence hypotheses from the CRF: BIOOOB, BIIOOB, BIOOOO, with probabilities 0.2, 0.3, 0.18]
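A minimal sketch of the branch-and-bound idea for collecting every hypothesis above a probability threshold. It assumes an `upper_bound(prefix)` function giving an admissible upper bound on the probability of any completion of a partial tag sequence; how such a bound is obtained from the CRF is not described in the transcript:

```python
def hypotheses_above_threshold(labels, length, upper_bound, sequence_prob, threshold):
    """Depth-first branch-and-bound enumeration of complete label sequences
    whose probability is at least `threshold`.
    - labels: candidate tags for each position (e.g. ["B-NP", "I-NP", "O", ...])
    - upper_bound(prefix): admissible upper bound on the probability of the
      best completion of `prefix` (assumed to come from the CRF)
    - sequence_prob(seq): probability of a complete sequence under the CRF"""
    results = []
    stack = [[]]  # partial label sequences, explored depth-first
    while stack:
        prefix = stack.pop()
        if len(prefix) == length:
            p = sequence_prob(prefix)
            if p >= threshold:
                results.append((p, prefix))
            continue
        for label in labels:
            extended = prefix + [label]
            # Prune branches that can no longer reach the threshold
            if upper_bound(extended) >= threshold:
                stack.append(extended)
    return results
```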

  21. Experiments • Penn Treebank Corpus • Training: sections 2-21 • Development: section 22 • Evaluation: section 23 • Training • Three CRF models • Part-of-speech tagger • Base chunker • Non-base chunker • Took 2 days on AMD Opteron 2.2GHz

  22. Training the CRF chunkers • Maximum likelihood + L1 regularization • L1 regularization helps avoid overfitting and produces compact models • OWL-QN algorithm (Andrew and Gao, 2007)
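The objective itself is not spelled out in the transcript; the standard form of the L1-regularized conditional log-likelihood that OWL-QN is designed to optimize is shown below (the regularization constant C is a placeholder, not a value from the paper):

```latex
% L1-regularized conditional log-likelihood over training pairs (x^(j), y^(j));
% C is the regularization strength (a hyperparameter, not given here).
\mathcal{L}(\lambda) \;=\; \sum_{j} \log p_{\lambda}\big(y^{(j)} \mid x^{(j)}\big)
  \;-\; C \sum_{k} \lvert \lambda_k \rvert
```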

  23. Chunking performance (Section 22, all sentences)

  24. Beam width and parsing performance (Section 22, all 1,700 sentences)

  25. Comparison with other parsers (Section 23, all 2,416 sentences)

  26. Discussions • Improving chunking accuracy • Semi-Markov CRFs (Sarawagi and Cohen, 2004) • Higher-order CRFs • Increasing the size of the training data • Create a treebank by parsing a large number of sentences with an accurate parser • Train the fast parser on that treebank

  27. Conclusion • Full parsing by cascaded chunking • Chunking with CRFs • Depth-first search • Performance • F-score = 86.9 (12 ms/sentence) • F-score = 88.4 (42 ms/sentence) • Available soon
