Gibbs Sampling with Treeness Constraint in Unsupervised Dependency Parsing

David Mareček and Zdeněk Žabokrtský
Institute of Formal and Applied Linguistics, Charles University in Prague
September 15, 2011, Hissar, Bulgaria

Motivations for unsupervised parsing
• We want to parse texts for which no manually annotated treebank exists
  • texts from different domains
  • texts in different languages
• We want to learn sentence structures from the corpus only
• What if the structures produced by linguists are not suitable for NLP?
• Annotation is expensive
• It is a challenge: can we beat the supervised techniques in some application?

Outline
• Parser description
  • Priors
  • Models
  • Sampling
• Sampling constraints
  • Treeness
  • Root fertility
  • Noun-root dependency repression
• Evaluation
  • on the Czech treebank
  • on all 19 CoNLL treebanks from the 2006-2007 shared tasks
• Conclusions

Basic features of our approach
• Learning is based on Gibbs sampling
• We approximate the probability of a tree by the product of the probabilities of its individual edges
• We use only POS tags to predict dependency relations
  • but we plan to use lexicalization and unsupervised POS tagging in the future
• We introduce treeness as a hard constraint in the sampling procedure
• The approach allows non-projective edges
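Written as a formula, the edge-factored approximation reads as below. The notation is ours, not the slides': T is a dependency tree over the words w_1 ... w_n, pi(i) is the position of the parent of w_i, and w_0 stands for the technical ROOT.

```latex
P(T) \approx \prod_{i=1}^{n} P\left(w_{\pi(i)} \rightarrow w_i\right)
```

Because every factor involves a single edge, one edge can be rescored while all the others are held fixed, which is what per-edge sampling relies on.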

Models
• We use two simple models in our experiments:
  • the parent POS tag conditioned on the child POS tag
  • the edge length (signed distance between the two words) conditioned on the child POS tag
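The two models can be pictured as count tables smoothed by a Dirichlet prior. The sketch below is an illustration under assumptions, not the authors' code: `DirichletCounts` and `edge_prob` are hypothetical names, the predictive formula (count + alpha) / (context count + alpha * K) is the standard Dirichlet-multinomial estimate with K possible outcomes, and both the multiplication of the two models into one edge score and the sign convention for distance (parent position minus child position) are our assumptions.

```python
from collections import Counter


class DirichletCounts:
    """Hypothetical count table for P(outcome | context) under a symmetric Dirichlet prior."""

    def __init__(self, alpha: float, num_outcomes: int) -> None:
        self.alpha = alpha                # Dirichlet hyperparameter (alpha1 or alpha2)
        self.num_outcomes = num_outcomes  # K: number of possible outcomes
        self.joint = Counter()            # (context, outcome) -> count
        self.context = Counter()          # context -> count

    def add(self, context, outcome, delta: int = 1) -> None:
        """Add (delta=1) or remove (delta=-1) one observation."""
        self.joint[(context, outcome)] += delta
        self.context[context] += delta

    def prob(self, context, outcome) -> float:
        """Posterior predictive: (c(context, outcome) + alpha) / (c(context) + alpha * K)."""
        numerator = self.joint[(context, outcome)] + self.alpha
        denominator = self.context[context] + self.alpha * self.num_outcomes
        return numerator / denominator


def edge_prob(tag_model, dist_model, tags, child, parent):
    """Score the edge parent -> child; tags[0] is the tag of the technical ROOT.

    Assumption: the two models are multiplied, and the signed distance is
    parent position minus child position.
    """
    child_tag = tags[child]
    p_tag = tag_model.prob(child_tag, tags[parent])      # parent tag | child tag
    p_dist = dist_model.prob(child_tag, parent - child)  # edge length | child tag
    return p_tag * p_dist


# Usage: repeated observations raise the probability ("the rich get richer").
tag_model = DirichletCounts(alpha=1.0, num_outcomes=4)
before = tag_model.prob("NN", "VB")  # 1/4 with no counts
tag_model.add("NN", "VB")
tag_model.add("NN", "VB")
after = tag_model.prob("NN", "VB")   # (2 + 1) / (2 + 4) = 1/2
```

The `delta=-1` path is what a Gibbs step would use to take an edge's own counts out before rescoring it.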

Gibbs sampling
• We sample each dependency edge independently
• 50 iterations
• The rich get richer (self-reinforcing behavior)
  • counts are taken from the history
• Exchangeability
  • we can treat each edge as if it were the last one in the corpus
  • numerators and denominators in the product are exchangeable
• The Dirichlet hyperparameters α1 and α2 were set experimentally
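The exchangeability argument can be checked numerically. In the hedged sketch below, `sequence_prob` is a hypothetical helper that scores a sequence of outcomes with counts taken from the history, using the Dirichlet-multinomial predictive (count + alpha) / (items so far + alpha * K); that this is the exact form of the authors' models is an assumption.

```python
from collections import Counter
from itertools import permutations
from math import isclose


def sequence_prob(outcomes, alpha: float, num_outcomes: int) -> float:
    """Probability of a sequence, scoring each item with counts from its history.

    Each factor is (count so far + alpha) / (items so far + alpha * K), so an
    outcome becomes more probable every time it is seen (the rich get richer).
    """
    counts = Counter()
    prob = 1.0
    for seen, outcome in enumerate(outcomes):
        prob *= (counts[outcome] + alpha) / (seen + alpha * num_outcomes)
        counts[outcome] += 1
    return prob


# Every ordering of the same outcomes yields the same product: only the set of
# numerators and denominators matters, not their order.
sample = ["NN", "NN", "VB"]
values = [sequence_prob(order, alpha=1.0, num_outcomes=2) for order in permutations(sample)]
assert all(isclose(v, values[0]) for v in values)
```

Because the order does not matter, any edge may be treated as the last one: remove its counts, score the candidate parents against the rest of the corpus, and add the sampled edge back.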

Basic sampling
• For each node, sample its parent according to the probability distribution
• The sampling order of the nodes is random
• Problem: this may create cycles and disconnected graphs
[Figure: the example sentence "ROOT Její dcera byla včera v zoologické zahradě." ("Her daughter was at the zoo yesterday.") with the probabilities of the sampled edges and a random sampling order of the nodes]
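One pass of the basic sampler can be sketched as below. This is a minimal illustration, not the authors' implementation: `score(child, parent)` is a hypothetical callback returning an unnormalized edge weight, word positions run from 1 to n, and position 0 is the technical ROOT. Because no constraint is applied, the sweep can produce a cycle, which the hypothetical helper `find_cycle` detects.

```python
import random
from typing import Callable, List


def sample_sweep(parents: List[int], score: Callable[[int, int], float],
                 rng: random.Random) -> List[int]:
    """Resample the parent of every word once, visiting the words in random order."""
    n = len(parents) - 1
    order = list(range(1, n + 1))
    rng.shuffle(order)  # the sampling order of the nodes is random
    for child in order:
        candidates = [p for p in range(n + 1) if p != child]  # ROOT or any other word
        weights = [score(child, p) for p in candidates]
        parents[child] = rng.choices(candidates, weights=weights, k=1)[0]
    return parents


def find_cycle(parents: List[int], start: int) -> List[int]:
    """Follow parent links from `start`; return the cycle hit, or [] if ROOT is reached."""
    path = []
    node = start
    while node != 0 and node not in path:
        path.append(node)
        node = parents[node]
    return path[path.index(node):] if node != 0 else []


def toy_score(child: int, parent: int) -> float:
    """Hypothetical score that only allows the edges between words 1 and 2."""
    return 1.0 if {child, parent} == {1, 2} else 0.0


# Usage: the toy score forces a two-node cycle (parents[0] is a ROOT placeholder).
result = sample_sweep([0, 0, 0], toy_score, random.Random(1))
print(result, find_cycle(result, 1))  # [0, 2, 1] [1, 2]
```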

Treeness constraint
• If a cycle is created:
  • choose one edge in the cycle (by sampling) and delete it
  • take the resulting subtree and attach it to one of the remaining nodes (by sampling)
[Figure: the example sentence "ROOT Její dcera byla včera v zoologické zahradě." with edge probabilities, illustrating the treeness constraint]
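The repair step can be sketched as follows, under stated assumptions: `repair_cycle`, `subtree` and `is_tree` are hypothetical helpers, `score(child, parent)` is a hypothetical strictly positive edge weight, the edge to delete is drawn with weight inversely proportional to its score (the slide only says it is chosen by sampling), and the new parent is drawn proportionally to its score among the nodes outside the detached subtree.

```python
import random
from typing import Callable, List, Set


def subtree(parents: List[int], head: int) -> Set[int]:
    """Nodes whose chain of parent links reaches `head` (head included)."""
    nodes = {head}
    for node in range(1, len(parents)):
        chain, cur = [], node
        # Stop at ROOT (0) or a detached marker (-1), at a known member, or on a loop.
        while cur > 0 and cur not in nodes and cur not in chain:
            chain.append(cur)
            cur = parents[cur]
        if cur in nodes:
            nodes.update(chain)
    return nodes


def repair_cycle(parents: List[int], cycle: List[int],
                 score: Callable[[int, int], float], rng: random.Random) -> List[int]:
    """Break `cycle` by deleting one sampled edge and reattach the formed subtree."""
    # 1) Sample the edge to delete (assumption: weight = 1 / edge score).
    cut = rng.choices(cycle, weights=[1.0 / score(c, parents[c]) for c in cycle], k=1)[0]
    parents[cut] = -1  # `cut` is now the head of a detached subtree
    # 2) Attach the subtree to one of the remaining nodes, ROOT included.
    inside = subtree(parents, cut)
    candidates = [p for p in range(len(parents)) if p not in inside]
    weights = [score(cut, p) for p in candidates]
    parents[cut] = rng.choices(candidates, weights=weights, k=1)[0]
    return parents


def is_tree(parents: List[int]) -> bool:
    """True if every word reaches ROOT (0) within n steps, i.e. there is no cycle."""
    n = len(parents) - 1
    for node in range(1, n + 1):
        cur, steps = node, 0
        while cur != 0 and steps <= n:
            cur, steps = parents[cur], steps + 1
        if cur != 0:
            return False
    return True


# Usage: words 1 and 2 point at each other; word 3 hangs under ROOT.
fixed = repair_cycle([0, 2, 1, 0], [1, 2], lambda child, parent: 1.0, random.Random(0))
```

Cutting inside the cycle and then excluding the whole detached subtree from the candidate parents is what guarantees the result is a tree again.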

Root fertility constraint
• Individual phrases tend to be attached to the technical root
• A sentence usually has only one word (the main verb) that dominates the others
• We constrain the root fertility to one
• If the root has more than one child, we resample:
  • sample one child that will stay under the root
  • resample the parents of the other children
[Figure: the example sentence "ROOT Její dcera byla včera v zoologické zahradě." with edge probabilities, illustrating the root fertility resampling]
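A hedged sketch of the root fertility step: `enforce_root_fertility` and `descendants` are hypothetical names, `score(child, parent)` is a hypothetical positive edge weight, and drawing the child that stays under ROOT proportionally to its ROOT-edge score is our assumption. Each other child draws a new parent from the words outside its own subtree, with ROOT excluded, so the structure stays a tree with exactly one root child.

```python
import random
from typing import Callable, List, Set


def descendants(parents: List[int], head: int) -> Set[int]:
    """Words whose path to ROOT (0) passes through `head`, head included (assumes a tree)."""
    result = set()
    for node in range(1, len(parents)):
        cur = node
        while cur != 0 and cur != head:
            cur = parents[cur]
        if cur == head:
            result.add(node)
    return result


def enforce_root_fertility(parents: List[int], score: Callable[[int, int], float],
                           rng: random.Random) -> List[int]:
    """Leave exactly one child under ROOT and resample the parents of the others."""
    root_children = [c for c in range(1, len(parents)) if parents[c] == 0]
    if len(root_children) <= 1:
        return parents
    # Sample the child that stays under ROOT.
    keep = rng.choices(root_children, weights=[score(c, 0) for c in root_children], k=1)[0]
    for child in root_children:
        if child == keep:
            continue
        inside = descendants(parents, child)
        # ROOT is excluded, and so is the child's own subtree (that would create a cycle).
        candidates = [p for p in range(1, len(parents)) if p not in inside]
        weights = [score(child, p) for p in candidates]
        parents[child] = rng.choices(candidates, weights=weights, k=1)[0]
    return parents


# Usage: three words all attached to ROOT end up with a single root child.
tree = enforce_root_fertility([0, 0, 0, 0], lambda child, parent: 1.0, random.Random(0))
```

The kept child is never inside another root child's subtree, so every resampled child always has at least one valid candidate parent.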

Noun-ROOT dependency repression
• Nouns (especially subjects) often take the place of verbs in governing positions
• The majority of grammars are verb-centric
• Nouns can easily be recognized as the most frequent coarse-grained tag category in the corpus
• We add the following model: [formula not preserved]
• This model is useless when unsupervised POS tagging is used

Evaluation measures
• Evaluating an unsupervised parser against gold-standard data is problematic
  • many linguistic decisions had to be made before annotating each corpus
  • how to deal with coordination structures, auxiliary verbs, prepositions, subordinating conjunctions?
• We use the following three measures:
  • UAS (unlabeled attachment score): the standard metric for evaluating dependency parsers
  • UUAS (undirected unlabeled attachment score): edge direction is disregarded (it is not a mistake if the governor and the dependent are switched)
  • NED (neutral edge direction, Schwartz et al., 2011), which treats as correct not only a node's gold parent and child but also its gold grandparent
• UAS ≤ UUAS ≤ NED
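The three measures can be computed per sentence as sketched below. `attachment_scores` is a hypothetical helper; `pred[i]` and `gold[i]` hold the parent of word i, index 0 stands for ROOT, and the NED test follows the description above (the gold parent, gold child, or gold grandparent counts as correct).

```python
from typing import List, Tuple


def attachment_scores(pred: List[int], gold: List[int]) -> Tuple[float, float, float]:
    """Return (UAS, UUAS, NED) for one sentence; index 0 of both lists is ROOT."""
    n = len(gold) - 1
    uas = uuas = ned = 0
    for word in range(1, n + 1):
        head = pred[word]
        direct = head == gold[word]                           # gold parent
        flipped = head != 0 and gold[head] == word            # gold child (reversed edge)
        grand = gold[word] != 0 and head == gold[gold[word]]  # gold grandparent
        uas += direct
        uuas += direct or flipped
        ned += direct or flipped or grand
    return uas / n, uuas / n, ned / n


# Usage: in the gold tree word 1 depends on 2, 2 on 3, and 3 on ROOT;
# the prediction reverses the edge between words 2 and 3.
gold = [0, 2, 3, 0]
pred = [0, 2, 0, 2]
print(attachment_scores(pred, gold))  # (0.3333333333333333, 0.6666666666666666, 1.0)
```

The example also shows why UAS ≤ UUAS ≤ NED: each measure accepts everything the previous one accepts.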

Evaluation on Czech
• Czech dependency treebank from the CoNLL 2007 shared task
• Punctuation removed
• Sentences of at most 15 words

Error analysis for Czech
• Many errors are caused by reversed dependencies
  • preposition – noun
  • subordinating conjunction – verb

Evaluation on 19 CoNLL languages
• We took the dependency treebanks from the CoNLL 2006 and 2007 shared tasks
• POS tags from the fifth column (POSTAG) were used
• The parser was run on the concatenated training and development sets
• Punctuation was removed
• Evaluation was done on the development sets only
• We compare our results with the state-of-the-art system, which is based on DMV (Spitkovsky et al., 2011)


Conclusions
• We introduced a new approach to unsupervised dependency parsing
• Even though only a few experiments have been done so far and only POS tags without lexicalization are used, the results seem competitive with the state-of-the-art unsupervised parsers (DMV)
• We achieve better UAS for 12 languages out of 19
• Without the noun-root dependency repression, which is useful only with supervised POS tags, we achieve better scores for 7 languages out of 19

Future work
We would like to add:
• a word fertility model, to model the number of children of each node
• lexicalization: the word forms themselves must be useful
• unsupervised POS tagging: some recent experiments show that using word classes instead of supervised POS tags can improve parsing accuracy

Thank you for your attention.
