Japanese Dependency Structure Analysis Based on Maximum Entropy Models



  1. Japanese Dependency Structure Analysis Based on Maximum Entropy Models Kiyotaka Uchimoto † Satoshi Sekine ‡ Hitoshi Isahara † † Kansai Advanced Research Center, Communications Research Laboratory ‡ New York University

  2. Outline • Background • Probability model for estimating dependency likelihood • Experiments and discussion • Conclusion

  3. Background • Japanese dependency structure analysis • Example: 太郎は赤いバラを買いました。 (Taro bought a red rose.) • Bunsetsu segmentation: 太郎は (Taro_wa, Taro) | 赤い (aka_i, red) | バラを (bara_wo, rose) | 買いました。 (kai_mashita, bought) • [Figure: dependency arcs drawn between the bunsetsus of the example sentence] • Preparing a dependency matrix • Finding an optimal set of dependencies for the entire sentence
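To make those two steps concrete, here is a minimal sketch of a dependency matrix for the example sentence; the Python representation and the probability values are invented for illustration and are not taken from the paper.

```python
# Hypothetical dependency matrix for the example sentence.
# Entry (i, j) holds the estimated likelihood that bunsetsu i
# (anterior) depends on bunsetsu j (posterior); values are invented.
bunsetsus = ["太郎は", "赤い", "バラを", "買いました。"]
dep_matrix = {
    (0, 1): 0.05, (0, 2): 0.15, (0, 3): 0.80,  # 太郎は -> 買いました。 most likely
    (1, 2): 0.90, (1, 3): 0.10,                # 赤い -> バラを most likely
    (2, 3): 1.00,                              # the only possible head for バラを
}
```

Finding the optimal set of dependencies then amounts to picking one head to the right for every bunsetsu except the last, such that no two dependencies cross and the overall score is maximal.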

  4. Background (2) • Approaches to preparing a dependency matrix • Rule-based approach • Several problems with handcrafted rules • Coverage and consistency • The rules have to be changed according to the target domain. • Corpus-based approach

  5. Background (3) • Corpus-based approach • Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998) • Probability estimation based on the maximum entropy models (Ratnaparkhi, 1997) • Maximum Entropy model • learns the weights of given features from a training corpus

  6. Probability model • Assigning one of two tags to each pair of bunsetsus: whether or not there is a dependency between them • Probabilities of dependencies are estimated from the M. E. model • Overall dependency structure of a sentence • Product of the probabilities of all its dependencies • Assumption: dependencies are independent of each other
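Under the independence assumption, this product can be computed directly from the pairwise probabilities. A minimal sketch, assuming a pair_prob table like the hypothetical dependency matrix above:

```python
import math

def structure_log_prob(heads, pair_prob):
    """Log-probability of a complete dependency structure.

    heads maps each bunsetsu (except the rightmost) to its head;
    pair_prob maps (dependent, head) pairs to estimated dependency
    probabilities. Under the independence assumption, the probability
    of the whole structure is the product of the pairwise
    probabilities, computed here as a sum of logs for stability.
    """
    return sum(math.log(pair_prob[(i, j)]) for i, j in heads.items())

# Toy values for the four-bunsetsu example sentence (invented numbers):
pair_prob = {(0, 3): 0.80, (1, 2): 0.90, (2, 3): 1.00}
heads = {0: 3, 1: 2, 2: 3}  # 太郎は->買いました。 赤い->バラを バラを->買いました。
print(math.exp(structure_log_prob(heads, pair_prob)))  # 0.80 * 0.90 * 1.00 = 0.72
```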

  7. M. E. model
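As a reference point, a sketch of the standard maximum entropy formulation (Ratnaparkhi, 1997) on which this model is based; the notation here is assumed, with a the dependency tag (1 = depends, 0 = does not), b the pair of bunsetsus together with its context, f_j binary feature functions, and λ_j the feature weights learned from the training corpus.

```latex
% Standard maximum entropy form (Ratnaparkhi, 1997); notation assumed.
P(a \mid b) = \frac{1}{Z(b)} \exp\Bigl( \sum_{j} \lambda_j \, f_j(a, b) \Bigr),
\qquad
Z(b) = \sum_{a' \in \{0,\, 1\}} \exp\Bigl( \sum_{j} \lambda_j \, f_j(a', b) \Bigr)
```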

  8. Feature sets • Basic features (expanded from the feature list of Haruno (1998)) • Attributes of a bunsetsu itself • Character strings, parts of speech, and inflection types of the bunsetsu • Attributes between bunsetsus • Existence of punctuation, and the distance between the bunsetsus • Combined features

  9. Feature sets • Basic features: a, b, c, d, e • a and b are the “Head” and “Type” of the anterior bunsetsu, c and d are the “Head” and “Type” of the posterior bunsetsu, and e covers features between the two bunsetsus • Combined features • Twin: (b,c), Triplet: (b,c,e), Quadruplet: (a,b,c,d), Quintuplet: (a,b,c,d,e) • [Figure: the example sentence 太郎は 赤い バラを 買いました。 (Taro_wa aka_i bara_wo kai_mashita) with the anterior and posterior bunsetsus and features a–e marked]
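A sketch of how these feature templates could be instantiated in code; the attribute names (head_pos, type, distance) are illustrative placeholders rather than the paper's exact feature inventory.

```python
def extract_features(anterior, posterior, between):
    """Build basic (a-e) and combined feature strings for one bunsetsu
    pair. Attribute names are illustrative, not the paper's exact set."""
    a = "a:" + anterior["head_pos"]    # "Head" of the anterior bunsetsu
    b = "b:" + anterior["type"]        # "Type" of the anterior bunsetsu
    c = "c:" + posterior["head_pos"]   # "Head" of the posterior bunsetsu
    d = "d:" + posterior["type"]       # "Type" of the posterior bunsetsu
    e = "e:" + between["distance"]     # features between the bunsetsus
    basic = [a, b, c, d, e]
    combined = [
        "twin:"  + "|".join((b, c)),
        "tri:"   + "|".join((b, c, e)),
        "quad:"  + "|".join((a, b, c, d)),
        "quint:" + "|".join((a, b, c, d, e)),
    ]
    return basic + combined

# Example: 赤い (anterior) paired with バラを (posterior)
features = extract_features(
    {"head_pos": "adjective", "type": "inflection:base"},
    {"head_pos": "noun", "type": "particle:wo"},
    {"distance": "1"},
)
```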

  10. Algorithm • Detect the dependencies in a sentence by analyzing it backwards (from right to left) • Characteristics of Japanese dependencies • Dependencies are directed from left to right • Dependencies do not cross • Every bunsetsu except the rightmost one depends on exactly one bunsetsu • In many cases, the left context is not necessary to determine a dependency • Beam search (see the sketch below)
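A minimal sketch of the backward analysis with beam search, assuming a dep_prob(i, j, heads) callback that returns the model's estimate that bunsetsu i depends on bunsetsu j given the partial analysis heads (this interface is hypothetical). The no-crossing and single-head constraints mean the legal heads for bunsetsu i are exactly the bunsetsus on the chain i+1, head(i+1), head(head(i+1)), and so on.

```python
def backward_beam_search(n, dep_prob, beam_width=5):
    """Most probable dependency structure for an n-bunsetsu sentence,
    analyzed backwards (right to left). dep_prob(i, j, heads) is a
    hypothetical callback giving the estimated probability that
    bunsetsu i depends on bunsetsu j, given the partial analysis."""
    beam = [(1.0, {})]                   # the rightmost bunsetsu has no head
    for i in range(n - 2, -1, -1):       # right to left
        hypotheses = []
        for prob, heads in beam:
            # Non-crossing candidates: the chain of heads starting at i+1.
            candidates, j = [i + 1], i + 1
            while j in heads:
                j = heads[j]
                candidates.append(j)
            for j in candidates:         # every dependency points rightwards
                hypotheses.append((prob * dep_prob(i, j, heads),
                                   {**heads, i: j}))
        hypotheses.sort(key=lambda h: h[0], reverse=True)
        beam = hypotheses[:beam_width]   # keep only the best hypotheses
    return beam[0]                       # (probability, {dependent: head})
```

Keeping only the top few hypotheses at each step matches the observation that little left context is needed to determine a dependency; in the experiments below, the retained context turns out to have almost no effect on accuracy.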

  11. Experiments • Using the Kyoto University text corpus (Kurohashi and Nagao, 1997), a tagged corpus of the Mainichi newspaper • Training: 7,958 sentences (Jan. 1st to 8th) • Testing: 1,246 sentences (Jan. 9th) • The input sentences were given correct morphological analyses and bunsetsu segmentations.

  12. Results of dependency analysis • When analyzing a sentence backwards, the previous context has almost no effect on the accuracy.

  13. Relationship between the number of bunsetsus and accuracy • [Figure: dependency accuracy vs. sentence length in bunsetsus; overall accuracy 0.8714] • The accuracy does not significantly degrade with increasing sentence length.

  14. Features and accuracy • Experiments leaving each feature set out in turn (figures show the drop in accuracy) • Useful basic features • “Type” of the anterior bunsetsu (-17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (-10.99%) • Distance between bunsetsus (-2.50%), existence of punctuation in the bunsetsu (-2.52%), and existence of brackets (-1.06%) • These features correspond to the preferential rules used in rule-based approaches • [Figure: the feature diagram of slide 9, with features a–e marked on the anterior and posterior bunsetsus]

  15. Features and accuracy • Experiments without the combined feature sets • Combined features are useful (removing them all: -18.31%) • This shows that the basic features are related to each other.

  16. Lexical features and accuracy • Experiment with the lexical features of the head word • Accuracy is better with them than without them (removing them: -0.84%) • Many idiomatic expressions were learned; they had high dependency probabilities • “応じて (oujite, according to) --- 決める (kimeru, decide)” • “形で (katachi_de, in the form of) --- 行われる (okonawareru, be held)” • With more training data, we expect to collect more of such expressions.

  17. Training data size and accuracy • Accuracy of 81.84% even with only 250 training sentences • The M. E. framework has suitable characteristics for overcoming the data sparseness problem.

  18. Comparison with related work

  19. Comparison with related work (2) • Combining a parser based on a handmade CFG with a probabilistic dependency model (Shirai, 1998) • Uses several corpora: the EDR corpus, RWC corpus, and Kyoto University corpus • The accuracy achieved by our model was about 3% higher than that of Shirai's model, while using a much smaller set of training data.

  20. Comparison with related work (3) • M. E. model (Ehara, 1998) • Uses a set of features similar to ours, but combines only two features at a time • Uses TV news articles for training and testing • Average sentence length = 17.8 bunsetsus (cf. 10 in the Kyoto University corpus) • Difference in the combined features • We also use triplet, quadruplet, and quintuplet features (+5.86%) • The accuracy of our system was about 10% higher than that of Ehara's system.

  21. Comparison with related work (4) • Maximum Likelihood model (Fujio, 1998); decision tree models and a boosting method (Haruno, 1998) • Both use sets of features similar to ours • Both use the EDR corpus for training and testing; the EDR corpus is ten times as large as ours • Their accuracy was around 85%, which is slightly worse than ours.

  22. Comparison with related work (5) • Experiments with Fujio's and Haruno's feature sets • The important factor in these statistical approaches is feature selection.

  23. Future work • Feature selection • Automatic feature selection (Berger, 1996, 1998; Shirai, 1998) • Considering new features • How to deal with coordinate structures • Taking into account a wide range of information

  24. Conclusion • Japanese dependency structure analysis based on the M. E. model • Dependency accuracy of our system: 87.2% using the Kyoto University corpus • Experiments without feature sets: some basic and combined features contribute strongly to improving the accuracy • Training data size and accuracy: good accuracy even with a small set of training data; the M. E. framework has suitable characteristics for overcoming the data sparseness problem
