Explore argument structure analysis of event-nouns in Japanese, unsupervised learning, event classification, argument identification using case markers and contextual patterns. Experiments show improved accuracy using machine learning and linguistic features.
Learning-Based Argument Structure Analysis of Event-Nouns in Japanese
Mamoru Komachi, Ryu Iida, Kentaro Inui and Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology, JAPAN
19 September 2007
Our goal
• Our city, destroyed by the atomic bomb
• Our city was destroyed by the atomic bomb
• The atomic bomb destroyed our city
• the destruction of our city by the atomic bomb (nominalization)
  destroy: CAUSE = the atomic bomb, UNDERGOER = our city
• Applications: IE, MT, Summarization, …
Argument structure of event-nouns
  Kanojo-kara denwa-ga ki-ta
  She-ABL phone-NOM come-PAST
  (She phoned me.)
• Logical cases for event-nouns are often not marked by case markers
  come: NOM = phone, ABL = she
  phone: NOM = she, DAT = (me)
Task setting
  Tom-ga kinou denwa-o ka-tta
  Tom-NOM yesterday phone-ACC buy-PAST
  (Tom bought a phone yesterday.)
• Event classification (determine event-hood)
• Argument identification
  buy: NOM = Tom, ACC = phone
  phone: ? (event-hood and arguments to be determined)
Outline
• Introduction
• Argument structure analysis of event-nouns
  • Event classification
  • Argument identification
• Conclusion
• Future work
Unsupervised learning of patterns
• Encode an instance in a tree and learn contextual patterns as sub-trees with a boosting algorithm called BACT (Kudo and Matsumoto, 2004)
• Encode each instance in a flat tree, using surface text, POS, dependency relations, etc.
• Positive (having eventhood): … persuasion, destruction …
  Same phrase: "… conducted destruction of documents …" (tree labels: Depends, Common noun, Verb)
• Negative (not having eventhood): … chair, desk …
  Same phrase: "… a little chair around …" (tree labels: Adj, Prep)
Experiments of event classification
• Method: classify the eventhood of event-nouns with Support Vector Machines
• Data: 80 news articles (800 sentences)
  • 1,237 event-nouns (590 have eventhood)
• Features:
  • Grammatical features
    • HeadPOS: CommonNoun
  • Semantic features
    • SemanticCategory: Animate
  • Contextual features
    • FollowsVerbalNoun: 1
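The feature slide lists name/value pairs, and a classifier such as an SVM needs them as a numeric vector. The following is a minimal sketch of one common way to do that (binary indicator features), not the authors' actual feature extractor. The helper names `build_vocabulary` and `encode`, the second instance, and the value `Inanimate` are assumptions made for illustration.

```python
def build_vocabulary(instances):
    """Assign an index to every distinct "name=value" pair seen in the data."""
    vocab = {}
    for feats in instances:
        for name, value in feats.items():
            key = f"{name}={value}"
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def encode(feats, vocab):
    """Turn one feature dict into a binary indicator vector."""
    vec = [0] * len(vocab)
    for name, value in feats.items():
        idx = vocab.get(f"{name}={value}")
        if idx is not None:  # unseen pairs are simply ignored
            vec[idx] = 1
    return vec


# First instance uses the feature values shown on the slide;
# the second instance is hypothetical.
instances = [
    {"HeadPOS": "CommonNoun", "SemanticCategory": "Animate", "FollowsVerbalNoun": 1},
    {"HeadPOS": "CommonNoun", "SemanticCategory": "Inanimate", "FollowsVerbalNoun": 0},
]
vocab = build_vocabulary(instances)
vectors = [encode(feats, vocab) for feats in instances]
```

The resulting vectors could then be passed to any SVM implementation together with the eventhood labels.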
Results of event classification
• Baseline: use the first sense determined by corpus statistics (NAIST Text Corpus)
• Proposed: machine-learning-based classifier
• Precision = correct / event-nouns classified by the system as having event-hood
• Recall = correct / all event-nouns in the corpus
• The proposed method outperforms the baseline in precision and F-measure by using contextual patterns
• Results could improve further with more data
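The evaluation measures defined on this slide can be sketched as follows. This is an illustrative helper, not the authors' evaluation script. It assumes that the recall denominator counts the nouns annotated as having eventhood (the usual definition) and that F is the balanced F-measure; the function name and the toy labels are hypothetical.

```python
def precision_recall_f(gold, predicted):
    """Compute precision, recall and balanced F for eventhood decisions.

    gold, predicted: equal-length lists of booleans (True = has eventhood).
    """
    correct = sum(1 for g, p in zip(gold, predicted) if g and p)
    n_predicted = sum(1 for p in predicted if p)  # system says "eventhood"
    n_gold = sum(1 for g in gold if g)            # annotated as "eventhood"
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy labels for four nouns (not taken from the experiments).
gold = [True, True, False, True]
predicted = [True, False, True, True]
p, r, f = precision_recall_f(gold, predicted)
```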
Outline
• Introduction
• Argument structure analysis of event-nouns
  • Event classification
  • Argument identification
• Conclusion
• Future work
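Argument identification
• Build a classifier using the tournament model (Iida et al., 2006)
  日本 政府 による 民間 支援 が 活性 化 した。
  Japanese government-BY private sector support-NOM activate-PAST
  (The support for the private sector by the Japanese government was activated.)
  Target: NOM of 支援(する) (support)
  Candidates: 日本 (Japan), 政府 (government), 民間 (private sector), 活性 (activation)
• Training (candidate pair → side of the better candidate):
  (日本, 政府) → R
  (政府, 民間) → L
  (政府, 活性) → L
• Decoding (candidate pair → winner):
  (民間, 活性) → L: 民間
  (政府, 民間) → L: 政府
  (日本, 政府) → R: 政府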
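The decoding sequence on this slide can be sketched as a series of pairwise matches in which the winner of each match meets the next candidate to its left. This is a minimal sketch under stated assumptions: the real pairwise decision comes from a trained classifier, which is replaced here by a hypothetical score table (`TOY_SCORE`) chosen only so that the matches reproduce the slide's example. The function names are also illustrative.

```python
from typing import Callable, List, Tuple


def tournament_decode(
    candidates: List[str],
    prefer: Callable[[str, str], str],
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Run pairwise matches from the rightmost candidate leftwards.

    prefer(left, right) returns "L" or "R", naming the preferred side.
    Returns the final winner and the list of (left, right, side) matches.
    """
    champion = candidates[-1]
    matches = []
    for challenger in reversed(candidates[:-1]):
        side = prefer(challenger, champion)  # challenger stands on the left
        matches.append((challenger, champion, side))
        if side == "L":
            champion = challenger
    return champion, matches


# Hypothetical stand-in for the trained pairwise classifier.
TOY_SCORE = {"日本": 1, "政府": 3, "民間": 2, "活性": 0}


def toy_prefer(left: str, right: str) -> str:
    return "L" if TOY_SCORE[left] >= TOY_SCORE[right] else "R"


winner, matches = tournament_decode(["日本", "政府", "民間", "活性"], toy_prefer)
```

With the toy scores, the matches are (民間, 活性) → L, (政府, 民間) → L and (日本, 政府) → R, so 政府 is selected as the NOM argument, matching the slide.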
Calculation of PMI using pLSI
• Estimate point-wise mutual information using Probabilistic Latent Semantic Indexing (Hofmann, 1999), where noun n depends on verb v through case marker c (Fujita et al., 2004)
• Dimension reduction by a hidden class z
  Example: "… pay for the shoes" → <pay, for>, shoes
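The formula itself is not preserved on this slide. The sketch below assumes the standard aspect-model decomposition, in which the joint probability of a <v, c> pair and a noun n is summed over the hidden classes z as P(z) · P(<v,c> | z) · P(n | z), and PMI is the log ratio of that joint probability to the product of the marginals. The function name, parameter layout and all probability values are hypothetical; in practice the parameters would be estimated with EM from co-occurrence counts.

```python
import math


def plsi_pmi(vc, n, p_z, p_vc_given_z, p_n_given_z):
    """PMI between a (verb, case marker) pair and a noun under pLSI.

    p_z: {z: P(z)}
    p_vc_given_z: {z: {(verb, case): P((verb, case) | z)}}
    p_n_given_z: {z: {noun: P(noun | z)}}
    """
    joint = sum(
        p_z[z] * p_vc_given_z[z].get(vc, 0.0) * p_n_given_z[z].get(n, 0.0)
        for z in p_z
    )
    p_vc = sum(p_z[z] * p_vc_given_z[z].get(vc, 0.0) for z in p_z)
    p_n = sum(p_z[z] * p_n_given_z[z].get(n, 0.0) for z in p_z)
    if joint == 0.0 or p_vc == 0.0 or p_n == 0.0:
        return float("-inf")
    return math.log(joint / (p_vc * p_n))


# Toy parameters with two hidden classes (values invented for illustration).
p_z = {"z0": 0.5, "z1": 0.5}
p_vc_given_z = {
    "z0": {("pay", "for"): 0.8, ("read", "ACC"): 0.2},
    "z1": {("pay", "for"): 0.2, ("read", "ACC"): 0.8},
}
p_n_given_z = {
    "z0": {"shoes": 0.9, "book": 0.1},
    "z1": {"shoes": 0.1, "book": 0.9},
}

pmi_shoes = plsi_pmi(("pay", "for"), "shoes", p_z, p_vc_given_z, p_n_given_z)
pmi_book = plsi_pmi(("pay", "for"), "book", p_z, p_vc_given_z, p_n_given_z)
```

Because the scores go through the hidden classes, a noun can receive a non-zero score for a <v, c> pair it never co-occurred with in the data, which is the point of the dimension reduction.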
Case alternation
  kanojo-ga benkyo-sita
  she-NOM study-PAST
  (She studied.)
  kare-ga kanojo-ni benkyo-o oshie-ta
  he-NOM her-DAT study-ACC teach-PAST
  (He taught a lesson to her.)
• In NomBank, 20% of the arguments that occur outside NP are in support verb construction (Jiang and Ng, 2006)
• Case alignment dictionary:
  (ACC_event, oshie-ru (teach)) = DAT_pred → NOM_event
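The dictionary entry above can be read as a lookup: when the event-noun is the ACC argument of oshie-ru, the DAT argument of the predicate is the NOM argument of the event-noun. The following is a minimal sketch of such a lookup. The data layout and the function name `align_event_arguments` are assumptions, and only the single entry shown on the slide is included.

```python
# Key: (case of the event-noun, predicate lemma)
# Value: mapping from a case of the predicate to a case of the event-noun
CASE_ALIGNMENT = {
    ("ACC", "oshie-ru"): {"DAT": "NOM"},
}


def align_event_arguments(event_case, predicate, predicate_args):
    """Project the predicate's arguments onto the event-noun.

    predicate_args: {case: filler} for the governing predicate.
    Returns {case: filler} for the event-noun; empty if no entry applies.
    """
    mapping = CASE_ALIGNMENT.get((event_case, predicate), {})
    return {
        event_role: predicate_args[pred_case]
        for pred_case, event_role in mapping.items()
        if pred_case in predicate_args
    }


# kare-ga kanojo-ni benkyo-o oshie-ta (He taught a lesson to her.)
args_of_teach = {"NOM": "kare", "DAT": "kanojo", "ACC": "benkyo"}
event_args = align_event_arguments("ACC", "oshie-ru", args_of_teach)
```

For the example sentence this yields kanojo (she) as the NOM argument of benkyo (study).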
Experiments of argument identification
• Method: apply the Japanese zero-anaphora resolution model (Iida et al., 2006) to the argument identification task
  • In both tasks, the arguments lack case markers
  • Event classification corresponds to the anaphoricity determination task
• Data: 137 articles for training and 150 articles for testing (event-nouns: 722, NOM: 722, ACC: 278, DAT: 72)
Features
  日本 政府 による 民間 支援 が 活性化 した。
  Japanese government-BY private sector support-NOM activate-PAST
  (The support for the private sector by the Japanese government was activated.)
Accuracy of argument identification
• Case alignment dictionary and co-occurrence statistics improved accuracy
• Abbreviations: SVC = support verb construction; COOC = co-occurrence
Related work
• Jiang and Ng (2006)
  • Built a maxent classifier for NomBank (Meyers et al., 2004) based on features for PropBank (Palmer et al., 2005)
• Xue (2006)
  • Used the Chinese Treebank
• Liu and Ng (2007)
  • Applied Alternating Structure Optimization (ASO) to the task of argument identification
Conclusion
• Defined argument structure analysis of event-nouns in Japanese
• Proposed an unsupervised approach to learning contextual patterns of event-nouns for the event classification task
• Performed argument identification using co-occurrence statistics and syntactic clues
Future work
• Explore a semi-supervised approach to the event classification task
• Use more lexical resources for the argument identification task