
Dependency Parsing with Reference to Slovene, Spanish and Swedish

Simon Corston-Oliver and Anthony Aue, Microsoft Research. Noteworthy results: Slovene labeled DA = 72.42% (second, not significantly different from #1 at 73.44%); Swedish #1 for unlabeled DA (89.54%).



Presentation Transcript


  1. Dependency Parsing with Reference to Slovene, Spanish and Swedish • Simon Corston-Oliver • Anthony Aue • Microsoft Research

  2. Noteworthy results • Slovene • Labeled DA = 72.42% (second) • Not significantly different from #1 (73.44%) • Swedish • #1 for unlabeled DA (89.54%) • Much worse than #3 for labeled DA (79.69% vs. 82.31%)
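The slides report DA figures without defining the metric. A minimal sketch, assuming DA means per-token dependency accuracy: unlabeled DA counts tokens whose predicted head is correct, and labeled DA additionally requires the correct dependency label. The function name and the toy data are illustrative, and official scoring may differ in details such as punctuation handling.

```python
def dependency_accuracy(gold, predicted):
    """Return (unlabeled, labeled) accuracy over tokens.

    Each sequence holds one (head_index, label) pair per token.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    # Unlabeled: only the head index has to match.
    unlabeled_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: head index and label both have to match.
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return unlabeled_hits / n, labeled_hits / n


# Toy four-token sentence (made-up data); head 0 stands for the artificial root.
gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SUBJ"), (0, "ROOT"), (2, "MOD"), (2, "MOD")]
uda, lda = dependency_accuracy(gold, pred)
```

Under this definition labeled DA can never exceed unlabeled DA, which matches the gap between the two Swedish figures above.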

  3. Outline Two stage pipeline • Identify unlabeled directed dependencies • Label the dependencies
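The two-stage design can be sketched as a function that first predicts heads and then labels each resulting arc. This is only an outline of the control flow: `attach` and `label` are hypothetical callables standing in for the parser and the labeler, and the toy stand-ins below are not the authors' models.

```python
def parse_two_stage(tokens, attach, label):
    """Stage 1 predicts a head per token; stage 2 labels each (dependent, head) arc.

    Token positions are 1-based; head 0 denotes the artificial root.
    """
    heads = attach(tokens)  # stage 1: unlabeled directed dependencies
    labels = [
        label(tokens, dep, head)  # stage 2: one label per predicted arc
        for dep, head in enumerate(heads, start=1)
    ]
    return list(zip(heads, labels))


# Toy stand-ins (assumptions for illustration only).
def toy_attach(tokens):
    # First token is the root; every other token attaches to it.
    return [0] + [1] * (len(tokens) - 1)


def toy_label(tokens, dep, head):
    return "ROOT" if head == 0 else "DEP"


result = parse_two_stage(["a", "b", "c"], toy_attach, toy_label)
```

Because the labeler only sees arcs the parser has already committed to, attachment errors from stage 1 cannot be repaired in stage 2.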

  4. Parser • Unlabeled directed dependencies • Discriminatively trained linear classifier • Projective dependencies only • Parse features • Case-normalized surface form and lemma • POS of each token • POS of intervening and neighboring tokens • Combinations of these • Direction and distance of attachment
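The feature families listed on this slide could be extracted for a candidate head-dependent pair roughly as follows. This is a sketch, not the authors' feature set: the feature names, the token dictionary layout, the `<pad>` symbol, and the distance cap of 5 are all assumptions made for illustration.

```python
def arc_features(sent, head, dep):
    """Features for a candidate arc; sent is a list of dicts with
    'form', 'lemma' and 'pos' keys, and head/dep are 0-based indices."""
    h, d = sent[head], sent[dep]
    # "R" means the dependent lies to the right of its head.
    direction = "R" if head < dep else "L"
    distance = abs(head - dep)

    def pos_at(i):
        # POS of a neighboring token, with padding at sentence edges.
        return sent[i]["pos"] if 0 <= i < len(sent) else "<pad>"

    feats = [
        f"h_form={h['form'].lower()}",    # case-normalized surface form
        f"d_form={d['form'].lower()}",
        f"h_lemma={h['lemma'].lower()}",  # case-normalized lemma
        f"d_lemma={d['lemma'].lower()}",
        f"h_pos={h['pos']}",
        f"d_pos={d['pos']}",
        f"h_pos+d_pos={h['pos']}+{d['pos']}",  # one example combination
        f"h_next_pos={pos_at(head + 1)}",      # neighboring tokens
        f"d_prev_pos={pos_at(dep - 1)}",
        f"dir={direction}",
        f"dist={min(distance, 5)}",            # capped distance (assumption)
    ]
    lo, hi = sorted((head, dep))
    for k in range(lo + 1, hi):
        # POS of each token between head and dependent.
        feats.append(f"between_pos={h['pos']}+{sent[k]['pos']}+{d['pos']}")
    return feats


sent = [
    {"form": "The", "lemma": "the", "pos": "DT"},
    {"form": "dog", "lemma": "dog", "pos": "NN"},
    {"form": "barks", "lemma": "bark", "pos": "VBZ"},
]
subj_feats = arc_features(sent, head=2, dep=1)
det_feats = arc_features(sent, head=2, dep=0)
```

A linear classifier would then score each candidate arc as a weighted sum over these string-valued features.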

  5. POS features • Use fine POS tags for all languages except Dutch and Turkish • Swedish: Normalize tags for auxiliaries • Original: “vara” (be) = AV; “måst” (must) = MV • Replace with “aux” • Unlabeled DA: 89.23% → 89.45%
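The Swedish tag normalization amounts to collapsing several auxiliary-specific tags into one. A minimal sketch, assuming only the two tags named on the slide are affected; the slide does not say whether other tags were merged as well.

```python
# Tags named on the slide; whether more tags were merged is not stated.
AUX_TAGS = {"AV", "MV"}


def normalize_pos(tag):
    """Map auxiliary-specific fine POS tags to a single 'aux' tag."""
    return "aux" if tag in AUX_TAGS else tag


tags = ["PO", "AV", "MV", "NN"]  # illustrative tag sequence
normalized = [normalize_pos(t) for t in tags]
```

Merging the tags lets the parser share statistics across auxiliaries that would otherwise each have their own sparse tag.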

  6. Root identification features • Many errors identifying root in periphrastic constructions with aux and participle • E.g. German aux/modal in second position in declarative main clause; • initial with subj-aux inversion • New features: • POS sequence to left of each token • “Leftmost finite verb and not preceded by subordinating conj or relative pron” • “Sentence does (not) contain finite verb”
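The three new root features can be sketched as a function over a sentence's POS tags. The tag names `VFIN`, `SCONJ` and `RELPRON` are hypothetical placeholders, since each treebank uses its own tagset, and "not preceded by" is read here as "no such tag anywhere to the left", which is one possible interpretation of the slide.

```python
FINITE_TAGS = {"VFIN"}               # hypothetical tag for finite verbs
BLOCKER_TAGS = {"SCONJ", "RELPRON"}  # hypothetical subordinator / relative pronoun tags


def root_features(pos_tags, i):
    """Root-identification features for the token at 0-based position i."""
    left = pos_tags[:i]
    finite_positions = [k for k, t in enumerate(pos_tags) if t in FINITE_TAGS]
    is_leftmost_finite = bool(finite_positions) and finite_positions[0] == i
    # Assumption: a blocker anywhere to the left disqualifies the token.
    blocked = any(t in BLOCKER_TAGS for t in left)
    return {
        "left_pos_seq": "_".join(left) if left else "<none>",
        "leftmost_finite_unblocked": is_leftmost_finite and not blocked,
        "sentence_has_finite": bool(finite_positions),
    }


main_clause = root_features(["PRON", "VFIN", "VINF"], 1)
sub_clause = root_features(["SCONJ", "PRON", "VFIN"], 2)
verbless = root_features(["NOUN"], 0)
```

In the main-clause example the finite verb in second position fires the feature, while the same tag after a subordinating conjunction does not.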

  7. Root identification features • Danish improved • RA 94.12% → 94.72% • Spanish improved • RA 80.08% → 83.57%

  8. Labeling dependencies • Use a maximum entropy classifier (Berger et al. 1996) • Fast to train • Good probability estimates • Intended to jointly model sets of labels • Actually labeled independently • Better results with SVM?
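Independent labeling with a maximum entropy model can be illustrated as follows: each arc gets a probability distribution over labels from a softmax over summed feature weights, and the most probable label is chosen without looking at the labels of other arcs. The weights, feature strings and label set below are made up for illustration; training is not shown.

```python
import math


def maxent_probs(weights, feats, labels):
    """P(label | feats) under a maxent (softmax) model.

    weights maps (feature, label) pairs to real-valued weights.
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def label_arcs(weights, arc_feature_lists, labels):
    """Label every arc independently with its most probable label."""
    predicted = []
    for feats in arc_feature_lists:
        probs = maxent_probs(weights, feats, labels)
        predicted.append(max(probs, key=probs.get))
    return predicted


# Hypothetical weights and features, not taken from the presentation.
weights = {
    ("d_pos=NN", "SUBJ"): 1.5,
    ("d_pos=NN", "OBJ"): 0.5,
    ("dir=R", "OBJ"): 2.0,
}
labels = ["SUBJ", "OBJ"]
arcs = [["d_pos=NN", "dir=L"], ["d_pos=NN", "dir=R"]]
predicted = label_arcs(weights, arcs, labels)
probs_first = maxent_probs(weights, arcs[0], labels)
```

Because each arc is decided in isolation, nothing prevents one head from receiving two subjects, which is the kind of error that jointly modeling sets of labels would aim to avoid.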

  9. Swedish using SVMs

  10. Japanese using SVMs

  11. Conclusion • Two stage pipeline • Feature engineering important • For predicting dependencies • For labeling dependencies • Replacing maxent classifier with SVM gave boost
