
Korean Treebank & Propbank

Korean Treebank & Propbank. Martha Palmer, Narae Han, Jinyoung Choi, Shijong Ryu University of Pennsylvania May 23, 2005. Outline. Status Report Korean Treebank Korean Propbank Frames Files lemma Split Argument. Korean Treebank - Done. Virginia Corpus


Presentation Transcript


  1. Korean Treebank & Propbank Martha Palmer, Narae Han, Jinyoung Choi, Shijong Ryu University of Pennsylvania May 23, 2005

  2. Outline • Status Report • Korean Treebank • Korean Propbank • Frames Files • lemma • Split Argument

  3. Korean Treebank - Done
     • Virginia Corpus
        • 54.5 thousand words (symbols tokenized)
        • Language training in a military setting
     • Newswire Corpus
        • 131.8 thousand words (symbols tokenized)
        • Korean Press Agency news articles from June 2, 1994, to March 20, 2000

  4. Korean Propbank – Current Status
     • First subtask: 54.5K Virginia corpus
        • 9,590 predicate tokens double-annotated (100%)
     • Second subtask: 131.8K Newswire corpus
        • 3,800 predicate tokens annotated out of 23,700 (15%)
     • Frames files
        • 1,800 predicates out of 2,800 (64%)

  5. Korean Frames files • XML structure similar to the English and Chinese Frames files, for compatibility • The lemma of a Korean Frames file is the root, not the stem • Stem = Root + Derivational suffix • The root has its own predicate-argument structure • The derivational suffix has a grammatical function
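The relation Stem = Root + Derivational suffix can be illustrated with a minimal sketch. It assumes romanized forms with hyphen-separated syllables ending in the citation ending `ta`, as on the slides. The suffix table and the function name `split_stem` are hypothetical; the table lists only elements that appear in this presentation and is not a real inventory of Korean derivational morphology.

```python
# Hypothetical table: only the elements that appear on the slides.
DERIVATIONAL_SUFFIXES = {
    "hi": "passive",
    "i": "causative",
    "ha": "active (with a deverbal noun)",
    "toe": "passive (with a deverbal noun)",
    "pat": "recipient (with a deverbal noun)",
}


def split_stem(form):
    """Split a citation form such as 'meok-hi-ta' into (root, suffix).

    Assumption: syllables are hyphen-separated and the form ends in 'ta'.
    The suffix is None when the stem is a bare root.
    """
    syllables = form.split("-")
    if syllables[-1] != "ta":
        raise ValueError("expected a citation form ending in 'ta': " + form)
    stem = syllables[:-1]
    # Naive check: treat the last stem syllable as a suffix if it is listed.
    if len(stem) > 1 and stem[-1] in DERIVATIONAL_SUFFIXES:
        return "-".join(stem[:-1]), stem[-1]
    return "-".join(stem), None


for form in ["meok-ta", "meok-hi-ta", "meok-i-ta", "kong-keup-ha-ta"]:
    print(form, "->", split_stem(form))
```

Under this sketch, all three forms of "eat" share the root `meok`, which is why a single Frames file keyed on the root can cover them. A real analyzer would need morphological knowledge, since a root can itself end in a syllable that looks like a suffix.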

  6. Frames files 1 – verb root
     • frameset meok.01 "eat":
        • Roleset:
           • ArgA: causer
           • Arg0: eater
           • Arg1: food
        • ‘meok-ta’: active form
           • Arg0: SBJ
           • Arg1: OBJ
        • ‘meok-hi-ta’: passive form
           • Arg0: COMP
           • Arg1: SBJ
        • ‘meok-i-ta’: causative form
           • ArgA: SBJ
           • Arg0: COMP
           • Arg1: OBJ
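One way to picture this frameset is as a small data structure: one roleset shared by all forms, plus a per-form mapping from Arg labels to functional tags. The sketch below is an illustration only. The class `Frameset`, its fields, and the method `label_for` are hypothetical names, not the layout of the actual Korean Frames files; the role and tag values are copied from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Frameset:
    frameset_id: str
    gloss: str
    roles: dict                                # Arg label -> role description
    forms: dict = field(default_factory=dict)  # surface form -> {Arg label: tag}

    def label_for(self, form, function_tag):
        """Return the Arg label realized by `function_tag` in `form`, or None."""
        for arg, tag in self.forms[form].items():
            if tag == function_tag:
                return arg
        return None


MEOK = Frameset(
    frameset_id="meok.01",
    gloss="eat",
    roles={"ArgA": "causer", "Arg0": "eater", "Arg1": "food"},
    forms={
        "meok-ta": {"Arg0": "SBJ", "Arg1": "OBJ"},                    # active
        "meok-hi-ta": {"Arg0": "COMP", "Arg1": "SBJ"},                # passive
        "meok-i-ta": {"ArgA": "SBJ", "Arg0": "COMP", "Arg1": "OBJ"},  # causative
    },
)

# The subject carries a different role in each form.
for form in MEOK.forms:
    arg = MEOK.label_for(form, "SBJ")
    print(form, "SBJ ->", arg, "(" + MEOK.roles[arg] + ")")
```

Keeping the roles on the root and the tag mapping on each form mirrors the slide's point that the root carries the predicate-argument structure while the derivational suffix changes only how the arguments are realized.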

  7. Frames files 2 – deverbal noun
     • frameset kong-keup.01 "supply":
        • Roleset:
           • Arg0: giver
           • Arg1: thing provided
           • Arg2: receiver
        • ‘kong-keup-ha-ta’: active form
           • Arg0: SBJ
           • Arg1: OBJ
           • Arg2: COMP
        • ‘kong-keup-toe-ta’: passive form
           • Arg0: S
           • Arg1: SBJ
           • Arg2: COMP
        • ‘kong-keup-pat-ta’: recipient form
           • Arg0: COMP
           • Arg1: OBJ
           • Arg2: SBJ

  8. Split Arguments • Possessor & Possessee • Floating Quantifier • Small Clause • Deverbal Noun structure

  9. Possessor & Possessee 1
     • kho-kki-ri-ka kho-ka kil-ta. ‘Elephant’s trunk is long’
        • (S (NP-SBJ kho-kki-ri-ka) (S (NP-SBJ kho-ka) (ADJP kil-ta)))
        • Gloss: elephant-nom trunk-nom long
     • a-peo-ci-ka ton-i phil-yo-ha-ta. ‘Father needs money’
        • (S (NP-SBJ a-peo-ci-ka) (S (NP-SBJ ton-i) (ADJP phil-yo-ha-ta)))
        • Gloss: father-nom money-nom need
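The bracketed trees on this slide can be read mechanically. The following is a minimal sketch of a parser for this bracket notation, written for illustration; it is not the Treebank project's tooling, and the function names `parse`, `leaves`, and `find` are hypothetical. It uses the first tree above with the interlinear glosses removed.

```python
import re


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes one well-formed tree per input.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def find(node, label):
    """Yield every constituent with the given label, outermost first."""
    if node[0] == label:
        yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from find(child, label)


TREE = "(S (NP-SBJ kho-kki-ri-ka) (S (NP-SBJ kho-ka) (ADJP kil-ta)))"
subjects = [" ".join(leaves(n)) for n in find(parse(TREE), "NP-SBJ")]
print(subjects)
```

The tree contains two separate NP-SBJ constituents, the possessor and the possessee, which is the configuration this part of the talk groups under split arguments.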

  10. Possessor & Possessee 2
     • kho-kki-ri-yi kho-ka kil-ta. ‘Elephant’s trunk is long’
        • (S (NP-SBJ (NP kho-kki-ri-yi) (NP kho-ka)) (ADJP kil-ta))
        • Gloss: elephant-poss trunk-nom long
     • *a-peo-ci-yi ton-i phil-yo-ha-ta. ‘*Father’s money needs’
        • (S (NP-SBJ (NP a-peo-ci-yi) (NP ton-i)) (ADJP (NP-COMP *pro*) phil-yo-ha-ta))
        • Gloss: father-poss money-nom need

  11. Floating Quantifier
     • hak-saeng-i se myeong-i o-ass-ta. ‘Three students came.’
        • (S (NP-SBJ hak-saeng-i) (VP (NP-ADV se myeong-i) (VP o-ass-ta)))
        • Gloss: student-nom three-nom come-past
     • se myeong-yi hak-saeng-i o-ass-ta.
        • (S (NP-SBJ (NP se myeong-yi) (NP hak-saeng-i)) (VP o-ass-ta))
        • Gloss: three-poss student-nom come-past

  12. Small Clause 1
     • na-neun keu-reul pa-po-ro saeng-kak-ha-eoss-ta. ‘I thought of him as a fool’
        • (S (NP-SBJ na-neun) (VP (NP-OBJ keu-reul) (NP-COMP pa-po-ro) saeng-kak-haess-ta))
        • Gloss: I-nom him-acc fool-abl think-past
     • na-neun keu-reul pan-cang-eu-ro ppop-ass-ta. ‘I elected him as the class president’
        • (S (NP-SBJ na-neun) (VP (NP-OBJ keu-reul) (NP-COMP pan-cang-eu-ro) ppop-ass-ta))
        • Gloss: I-nom him-acc class president-abl elect-past

  13. Small Clause 2
     • na-neun keu-ka pa-po-ra-ko saeng-kak-ha-eoss-ta.
     • * na-neun keu-ka pan-cang-i-ra-ko ppop-ass-ta.
     • saeng-kak
        • Arg0: thinker
        • Arg1: thought
     • ppop-
        • Arg0: voter
        • Arg1: candidate
        • Arg2: position

  14. Deverbal Noun structure 1
     • na-neun eom-ma-e-ke-seo neuc-ke wa-to coh-ta-ko heo-rak-eul pat-ass-ta. ‘I had permission from mom that I can return home late’
        • (S (NP-SBJ na-neun) (VP (NP-COMP eom-ma-e-ke-seo) (VP (S (S-SBJ (NP-SBJ *pro*) (VP (ADVP neuc-ke) (VP wa-to))) (ADJP coh-ta-ko)) (VP (NP-OBJ heo-rak-eul) pat-ass-ta))))

  15. Deverbal Noun structure 2
     • na-neun eom-ma-e-ke-seo neuc-ke wa-to coh-ta-neun heo-rak-eul pat-ass-ta.
        • (S (NP-SBJ na-neun) (VP (NP-COMP eom-ma-e-ke-seo) (NP-OBJ (S (S-SBJ (NP-SBJ *pro*) (VP (ADVP neuc-ke) (VP wa-to))) (ADJP coh-ta-neun)) (NP heo-rak-eul)) pat-ass-ta))
     • pat-
        • Arg0: receiver
        • Arg1: thing gotten
        • Arg2: giver

  16. Throughput
     • Creating Frames files
        • Approximately 70 predicates per week
        • Need 14 weeks to complete Frames files
     • Annotation
        • Approximately 1,600 predicate tokens per week
        • Need 14 weeks to complete annotation for the Newswire corpus
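As a rough check on these estimates, the remaining work can be divided by the weekly rate. The sketch below takes its totals from the status slide (2,800 predicates with 1,800 done; 23,700 Newswire predicate tokens with 3,800 done) and assumes a constant weekly rate and a single annotation pass; the function name `weeks_remaining` is hypothetical.

```python
def weeks_remaining(total, done, per_week):
    """Weeks needed to finish the remaining items at a constant weekly rate."""
    return (total - done) / per_week


# Figures from the status and throughput slides.
frames_weeks = weeks_remaining(total=2800, done=1800, per_week=70)
annotation_weeks = weeks_remaining(total=23700, done=3800, per_week=1600)

print("Frames files: about", round(frames_weeks, 1), "weeks")
print("Newswire annotation: about", round(annotation_weeks, 1), "weeks")
```

The Frames files figure comes out close to the 14 weeks stated on the slide. The annotation figure under these assumptions is lower than the stated 14 weeks; the slides do not say what else the estimate accounts for.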

  17. To be done in the future • Adjudicate & publish the Korean Propbank • Revise the Korean Treebank guidelines • Write the Korean Propbank guidelines

  18. Thank You
