Dependency Parsing • Some slides are based on: • PPT presentation on dependency parsing by Prashanth Mannem • Seven Lectures on Statistical Parsing by Christopher Manning
Constituency parsing • Breaks a sentence into constituents (phrases), which are then broken into smaller constituents • Describes phrase structure and clause structure (NP, PP, VP, etc.) • Structures are often recursive
[Figure: constituency parse tree with nodes S, NP, VP, VP, NP over the sentence "mom is an amazing show"]
Dependency parsing • Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies • Interested in grammatical relations between individual words (governing & dependent words) • Does not propose a recursive structure, rather a network of relations • These relations can also have labels
Dependency vs. Constituency • Dependency structures explicitly represent: • Head-dependent relations (directed arcs) • Functional categories (arc labels) • Possibly some structural categories (parts of speech) • Constituency structures explicitly represent: • Phrases (non-terminal nodes) • Structural categories (non-terminal labels) • Possibly some functional categories (grammatical functions)
Dependency vs. Constituency • A dependency grammar has a notion of a head • Officially, CFGs don’t • But modern linguistic theory and all modern statistical parsers (Charniak, Collins, …) do, via hand-written phrasal “head rules”: • The head of a Noun Phrase is a noun/number/… • The head of a Verb Phrase is a verb/modal/… • Based on a slide by Chris Manning
Dependency vs. Constituency • The head rules can be used to extract a dependency parse from a CFG parse (follow the heads) • A phrase structure tree can be obtained from a dependency tree, but the dependents of each head end up attached flat, at a single level • Based on a slide by Chris Manning
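The "follow the heads" idea can be sketched in a few lines. This is only an illustration: the tiny `HEAD_RULES` table, the POS tags and the bracketing of the example sentence are assumptions made for the sketch, not the rule tables used by Charniak's or Collins's parsers. Words are used directly as node names, which only works when no word repeats.

```python
# Hypothetical head rules: for each phrase label, the child labels that may
# act as its head, in priority order. Real tables are far larger.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBZ", "VP", "NP"],
    "NP": ["NN", "NP", "JJ"],
}

def head_child(label, children):
    """Return the index of the head child according to HEAD_RULES."""
    for wanted in HEAD_RULES.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1  # fallback: rightmost child

def extract(tree, arcs):
    """Return the lexical head of `tree`; append (head, dependent) arcs to `arcs`."""
    label, rest = tree[0], tree[1:]
    if len(rest) == 1 and isinstance(rest[0], str):
        return rest[0]  # preterminal node: (POS, word)
    heads = [extract(child, arcs) for child in rest]
    h = head_child(label, rest)
    for i, word in enumerate(heads):
        if i != h:
            # every non-head child's lexical head depends on the head child's
            arcs.append((heads[h], word))
    return heads[h]

# Assumed bracketing of the example sentence, as nested tuples.
tree = ("S",
        ("NP", ("NN", "mom")),
        ("VP", ("VBZ", "is"),
               ("NP", ("DT", "an"), ("JJ", "amazing"), ("NN", "show"))))
arcs = []
root = extract(tree, arcs)
print(root, arcs)
```

Each phrase passes the lexical head of its head child upward, and every other child's lexical head becomes a dependent of it.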
Definition: dependency graph • An input word sequence $w_1 \ldots w_n$ • Dependency graph $G = (V, E)$ where • $V$ is the set of nodes, i.e. word tokens in the input sequence • $E$ is the set of unlabeled tree edges $(i, j)$ with $i, j \in V$ • $(i, j)$ indicates an edge from $i$ (parent, head, governor) to $j$ (child, dependent)
Definition: dependency graph • A dependency graph is well-formed iff • Single head: Each word has only one head • Acyclic: The graph should be acyclic • Connected: The graph should be a single tree with all the words in the sentence • Projective: If word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
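The four conditions can be checked mechanically. The sketch below assumes a convention not stated on the slide: tokens are numbered 1..n and an artificial root node 0 heads the sentence, so "connected" becomes "every token reaches node 0 by following heads". The function name and the `require_projective` flag are illustrative.

```python
def is_well_formed(n, arcs, require_projective=True):
    """Check single-head, acyclic, connected and (optionally) projective.

    `arcs` is a list of (head, dependent) pairs over tokens 1..n;
    node 0 is an assumed artificial root.
    """
    heads = {}
    for h, d in arcs:
        if d in heads or d == 0 or not 0 <= h <= n:
            return False              # second head, head for root, or bad index
        heads[d] = h
    if set(heads) != set(range(1, n + 1)):
        return False                  # some word has no head
    # Acyclic and connected: following heads from any token must reach node 0.
    for d in range(1, n + 1):
        seen, v = set(), d
        while v != 0:
            if v in seen:
                return False          # cycle
            seen.add(v)
            v = heads[v]
    if not require_projective:
        return True

    def dominated_by(h, k):
        """True if k is h itself or a descendant of h."""
        while k != h and k != 0:
            k = heads[k]
        return k == h

    # Projective: every word strictly between a head and its dependent
    # must be dominated by that head.
    for d, h in heads.items():
        lo, hi = min(h, d), max(h, d)
        if not all(dominated_by(h, k) for k in range(lo + 1, hi)):
            return False
    return True

# "Ram saw a dog": saw is the root, Ram and dog depend on saw, a on dog.
simple = [(0, 2), (2, 1), (2, 4), (4, 3)]
print(is_well_formed(4, simple))
```

With `require_projective=False` the same function accepts trees that have crossing arcs, which is the relaxation non-projective parsers rely on.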
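Non-projective dependencies • Example: "Ram saw a dog yesterday which was a Yorkshire Terrier" • The relative clause "which was a Yorkshire Terrier" modifies "dog", but "yesterday", which lies between them, depends on "saw", so the arc from "dog" crosses another arc and the tree is non-projective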
Parsing algorithms • Dependency based parsers can be broadly categorized into • Grammar driven approaches • Parsing done using grammars • Data driven approaches • Parsing by training on annotated/un-annotated data
Unlabeled graphs • Dan Klein recently showed that labeling is relatively easy and that the difficulty of parsing lies in creating bracketing (Klein, 2004) • Therefore some parsers run in two steps: 1) bracketing; 2) labeling
Traditions • Dynamic programming • e.g., Eisner (1996), McDonald (2006) • Deterministic search • e.g., Covington (2001), Yamada and Matsumoto, Nivre (2006) • Constraint satisfaction • e.g., Maruyama, Foth et al.
Data driven • Two main approaches • Global, Exhaustive, Graph-based parsing • Local, greedy, transition-based parsing
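Graph-based parsing • Assume there is a scoring function: $s : V \times V \to \mathbb{R}$ • The score of a graph is $s(G) = \sum_{(i,j) \in E} s(i, j)$ • Parsing for input string $x$ is $G^* = \arg\max_{G \in D(G_x)} s(G)$, where $D(G_x)$ is the set of all dependency graphs for $x$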
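To make the "search over all dependency graphs" concrete, here is a brute-force sketch that scores a graph as the sum of its arc scores and enumerates every head assignment for a very short sentence. The arc scores are made-up numbers, node 0 is an assumed artificial root, and the enumeration is exponential in sentence length, so this is only a way to see the objective, not a usable parser.

```python
from itertools import product

def is_tree(heads):
    """heads[d - 1] is the head of token d; node 0 is the artificial root."""
    for d in range(1, len(heads) + 1):
        seen, v = set(), d
        while v != 0:
            if v in seen:
                return False          # following heads loops: not a tree
            seen.add(v)
            v = heads[v - 1]
    return True

def graph_score(heads, s):
    """Arc-factored score: the sum of s[(head, dependent)] over all arcs."""
    return sum(s[(h, d)] for d, h in enumerate(heads, start=1))

def parse_exhaustive(n, s):
    """Return the highest-scoring tree among all head assignments."""
    best = None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        if best is None or graph_score(heads, s) > graph_score(best, s):
            best = heads
    return best

# Made-up scores for "John(1) saw(2) Mary(3)", keyed by (head, dependent).
s = {(0, 1): 9, (0, 2): 10, (0, 3): 9,
     (1, 2): 20, (2, 1): 30, (2, 3): 30,
     (3, 2): 0, (1, 3): 3, (3, 1): 11}
best = parse_exhaustive(3, s)
print(best, graph_score(best, s))
```

Greedily taking the best head for each word would pick the cycle John and saw heading each other, which is why a tree constraint has to be part of the search.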
MST algorithm (McDonald, 2006) • Scores are based on features, independent of other dependencies • Features can be • Head and dependent word and POS separately • Head and dependent word and POS bigram features • Words between head and dependent • Length and direction of dependency
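The feature families listed above can be illustrated with a small extractor for a single candidate arc. The feature-name strings, the POS tags assigned to the example sentence and the weights are assumptions for the sketch, not MSTParser's actual templates.

```python
def arc_features(words, tags, h, d):
    """Sparse binary features for a candidate arc from position h to position d."""
    direction = "R" if h < d else "L"
    length = abs(h - d)
    feats = [
        # head and dependent word and POS, separately
        f"hw={words[h]}", f"hp={tags[h]}",
        f"dw={words[d]}", f"dp={tags[d]}",
        # head/dependent bigrams
        f"hw,dw={words[h]},{words[d]}",
        f"hp,dp={tags[h]},{tags[d]}",
        # length and direction of the dependency
        f"dir,len={direction},{length}",
    ]
    # POS tags of the words between head and dependent
    lo, hi = min(h, d), max(h, d)
    feats += [f"hp,bp,dp={tags[h]},{tags[b]},{tags[d]}" for b in range(lo + 1, hi)]
    return feats

def arc_score(weights, feats):
    """Linear score: the sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in feats)

words = ["_ROOT_", "Red", "figures", "on", "the", "screen",
         "indicated", "falling", "stocks"]
tags = ["ROOT", "JJ", "NNS", "IN", "DT", "NN", "VBD", "VBG", "NNS"]  # assumed tags
feats = arc_features(words, tags, 6, 2)        # candidate arc: indicated -> figures
weights = {"hw,dw=indicated,figures": 1.5, "dir,len=L,4": -0.5}   # toy weights
print(arc_score(weights, feats))
```

Because each feature looks at one arc only, the score of a whole tree decomposes into a sum over arcs, which is what makes the spanning-tree formulation possible.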
MST algorithm (McDonald, 2006) • Parsing can be formulated as a maximum spanning tree problem • Use the Chu-Liu-Edmonds (CLE) algorithm for MST (runs in $O(n^2)$ with an efficient implementation, considers non-projective arcs) • Uses online learning for determining the weight vector w
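A compact recursive sketch of Chu-Liu-Edmonds follows: pick the best head for every word, and if that creates a cycle, contract the cycle into one node, rescore the arcs that enter it, recurse, and expand. It is the simple contraction version (not the $O(n^2)$ implementation), it assumes a dense score table with no arcs into the root, and the function names and scores are made up for illustration.

```python
def find_cycle(heads, root):
    """Return a list of nodes forming a cycle in the head map, or None."""
    for start in heads:
        path, seen, v = [], set(), start
        while v != root and v not in seen:
            seen.add(v)
            path.append(v)
            v = heads[v]
        if v != root:
            return path[path.index(v):]
    return None

def chu_liu_edmonds(nodes, score, root=0):
    """Return {dependent: head} for the maximum spanning tree rooted at `root`."""
    heads = {d: max((h for h in nodes if h != d and (h, d) in score),
                    key=lambda h: score[(h, d)])
             for d in nodes if d != root}
    cycle = find_cycle(heads, root)
    if cycle is None:
        return heads
    cyc, c = set(cycle), max(nodes) + 1        # c is the contracted node
    new_score, enter, leave = {}, {}, {}
    for (h, d), s in score.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:                           # arc entering the cycle
            adj = s - score[(heads[d], d)]     # gain from breaking the cycle at d
            if (h, c) not in new_score or adj > new_score[(h, c)]:
                new_score[(h, c)], enter[h] = adj, d
        elif h in cyc:                         # arc leaving the cycle
            if (c, d) not in new_score or s > new_score[(c, d)]:
                new_score[(c, d)], leave[d] = s, h
        else:
            new_score[(h, d)] = s
    sub = chu_liu_edmonds([v for v in nodes if v not in cyc] + [c], new_score, root)
    result = {}
    for d, h in sub.items():
        if d == c:                             # expand: break the cycle at the entry
            result[enter[h]] = h
            result.update({v: heads[v] for v in cycle if v != enter[h]})
        elif h == c:
            result[d] = leave[d]
        else:
            result[d] = h
    return result

# Made-up scores for "John(1) saw(2) Mary(3)" with root 0, keyed by (head, dependent).
score = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
         (2, 3): 30, (3, 2): 0, (1, 3): 3, (3, 1): 11}
tree = chu_liu_edmonds([0, 1, 2, 3], score)
print(tree)
```

In this example the greedy step creates the cycle between nodes 1 and 2; contraction shows that entering the cycle at node 2 from the root costs the least, so the cycle is broken there.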
Transition-based parsing • A transition system for dependency parsing defines: • a set $C$ of parser configurations, each of which defines a (partially built) dependency graph $G$ • a set $T$ of transitions, each a function $t : C \to C$ • for every sentence $x = w_0, w_1, \ldots, w_n$: • a unique initial configuration $c_x$ • a set $Q_x$ of terminal configurations
Transition sequence • A transition sequence $C_{x,m} = (c_x, c_1, \ldots, c_m)$ for a sentence $x$ is a sequence of configurations such that $c_m \in Q_x$ and, for every $c_i$ ($c_i \neq c_x$), there is a transition $t \in T$ such that $c_i = t(c_{i-1})$ (with $c_0 = c_x$) • The graph defined by $c_m$ is the dependency graph of $x$
Transition scoring function • The score $s(c, t)$ of a transition $t$ in a configuration $c$ represents the likelihood of taking transition $t$ out of configuration $c$ • Parsing is finding the optimal transition sequence
Yamada and Matsumoto (2003) • A transition-based (shift-reduce) parser • Considers two adjacent words • Runs in iterations, continues as long as new dependencies are created • In every iteration, considers 3 different actions and chooses one using an SVM (or another discriminative learning technique) • Time complexity: $O(n^2)$ • Accuracy was shown to be close to that of state-of-the-art algorithms (e.g., Eisner’s)
Y&M (2003) Actions • Shift • Left • Right
Y&M (2003) Learning • Features (lemma, POS tag) are collected from the context
Stack-based parsing • Introducing a stack and a buffer • The buffer is a queue of all input words (left to right) • The stack begins empty; words are pushed to the stack by the defined actions • Reduces Y&M complexity to linear time
2 stack-based parsers • Nivre’s (2003, 2006) arc-standard • [Figure: arc-standard transitions over the Stack and Buffer, with the preconditions "i doesn’t have a head already" and "j doesn’t have a head already"]
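Since the transition table itself is not recoverable here, the following is a sketch of one common formulation of arc-standard, with arcs built between the top of the stack (i) and the front of the buffer (j). The choice of formulation, the shortened example sentence and the action names are assumptions. The scripted action list stands in for the classifier that would normally choose each transition.

```python
def arc_standard(n_tokens, transitions):
    """Apply scripted transitions; return (stack, buffer, arcs).

    Tokens are 0..n_tokens-1 with 0 as an artificial root. The stack begins
    empty and the buffer holds all tokens, left to right.
    """
    stack, buffer, arcs = [], list(range(n_tokens)), []
    has_head = set()
    for t in transitions:
        if t == "shift":
            stack.append(buffer.pop(0))
        elif t == "left-arc":
            # j becomes the head of i; i is removed from the stack
            i, j = stack.pop(), buffer[0]
            assert i != 0 and i not in has_head   # i doesn't have a head already
            arcs.append((j, i))
            has_head.add(i)
        elif t == "right-arc":
            # i becomes the head of j; j is consumed and i returns to the buffer
            i, j = stack.pop(), buffer.pop(0)
            assert j not in has_head              # j doesn't have a head already
            arcs.append((i, j))
            has_head.add(j)
            buffer.insert(0, i)
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, buffer, arcs

words = ["_ROOT_", "Red", "figures", "indicated", "stocks"]
script = ["shift", "shift", "left-arc", "shift", "left-arc",
          "shift", "right-arc", "right-arc", "shift"]
stack, buffer, arcs = arc_standard(len(words), script)
print([(words[h], words[d]) for h, d in arcs])
```

In this formulation a word can only be attached as a right dependent after it has collected all of its own dependents, because a right-arc removes it from further processing.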
2 stack-based parsers • Nivre’s (2003, 2006) arc-eager
Example (arc-eager) • Sentence: _ROOT_ Red figures on the screen indicated falling stocks (S = stack, Q = buffer queue) • Transitions applied, one per slide: Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce • Borrowed from Dependency Parsing (P. Mannem)
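The sequence of transitions in this example can be replayed with a small arc-eager simulator. The sketch assumes the usual arc-eager definitions (Left-arc and Right-arc link the stack top and the buffer front, Reduce pops a word that already has a head) and an initial stack that holds only _ROOT_, which is what the 16-step sequence implies. The arcs it prints are derived from the transitions, not copied from the slides.

```python
def arc_eager(words, transitions):
    """Replay arc-eager transitions; return (stack, buffer, arcs as index pairs)."""
    stack, buffer, arcs = [0], list(range(1, len(words))), []
    has_head = set()
    for t in transitions:
        if t == "shift":
            stack.append(buffer.pop(0))
        elif t == "left-arc":
            # buffer front j becomes head of stack top i; i is popped
            i, j = stack.pop(), buffer[0]
            assert i != 0 and i not in has_head
            arcs.append((j, i))
            has_head.add(i)
        elif t == "right-arc":
            # stack top i becomes head of buffer front j; j is pushed
            i, j = stack[-1], buffer.pop(0)
            assert j not in has_head
            arcs.append((i, j))
            has_head.add(j)
            stack.append(j)
        elif t == "reduce":
            # pop a word that already has its head (and all its dependents)
            assert stack[-1] in has_head
            stack.pop()
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, buffer, arcs

words = ["_ROOT_", "Red", "figures", "on", "the", "screen",
         "indicated", "falling", "stocks"]
script = ["shift", "left-arc", "shift", "right-arc", "shift", "left-arc",
          "right-arc", "reduce", "reduce", "left-arc", "right-arc", "shift",
          "left-arc", "right-arc", "reduce", "reduce"]
stack, buffer, arcs = arc_eager(words, script)
for h, d in arcs:
    print(f"{words[h]} -> {words[d]}")
```

Unlike arc-standard, a right dependent is attached as soon as it appears and stays on the stack to collect its own dependents, which is why the explicit Reduce transition is needed.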
Graph (MSTParser) vs. Transitions (MaltParser) • [Figure: accuracy on different languages] • [Figure: sentence length vs. accuracy] • [Figure: dependency length vs. precision] • Source: Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Known Parsers • Stanford (constituency + dependency) • MaltParser (dependency) • MSTParser (dependency) • Hebrew • Yoav Goldberg’s parser (http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)