Japanese Dependency Structure Analysis Based on Maximum Entropy Models

Kiyotaka Uchimoto †, Satoshi Sekine ‡, Hitoshi Isahara † • † Kansai Advanced Research Center, Communications Research Laboratory • ‡ New York University

Outline • Background • Probability model for estimating dependency likelihood • Experiments and discussion • Conclusion

Background • Japanese dependency structure analysis • Example: 太郎は赤いバラを買いました。 (Taro bought a red rose.) • Bunsetsus: 太郎は (Taro_wa, “Taro”), 赤い (aka_i, “red”), バラを (bara_wo, “rose”), 買いました (kai_mashita, “bought”) • Preparing a dependency matrix • Finding an optimal set of dependencies for the entire sentence

Background (2) • Approaches to preparing a dependency matrix • Rule-based approach • Several problems with handcrafted rules • Coverage and consistency • The rules have to be changed according to the target domain. • Corpus-based approach

Background (3) • Corpus-based approach • Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998) • Probability estimation based on the maximum entropy models (Ratnaparkhi, 1997) • Maximum Entropy model • learns the weights of given features from a training corpus

Probability model • Assigning one of two tags to each pair of bunsetsus: whether or not there is a dependency between them • Probabilities of dependencies are estimated from the M. E. model. • Overall dependencies in a sentence • Product of probabilities of all dependencies • Assumption: Dependencies are independent of each other.
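The product over independent dependencies can be sketched in a few lines of Python. This is only an illustration: the function name `sentence_log_prob`, the nested-dict layout of `dep_prob`, and every probability value are assumptions made up for this example, not outputs of the M. E. model from the talk.

```python
import math

def sentence_log_prob(heads, dep_prob):
    """Log probability of one full set of dependencies for a sentence.

    heads[i] is the index of the bunsetsu that bunsetsu i depends on.
    The last bunsetsu has no head, so len(heads) is one less than the
    number of bunsetsus. dep_prob[i][j] is the estimated probability
    that bunsetsu i depends on bunsetsu j.
    """
    # Independence assumption: the sentence probability is the product of
    # the individual probabilities, i.e. the sum of their logarithms.
    return sum(math.log(dep_prob[i][heads[i]]) for i in range(len(heads)))

# Hypothetical dependency matrix for a four-bunsetsu sentence.
dep_prob = {
    0: {1: 0.1, 2: 0.1, 3: 0.8},
    1: {2: 0.9, 3: 0.1},
    2: {3: 1.0},
}
heads = [3, 2, 3]  # illustrative analysis: 0 -> 3, 1 -> 2, 2 -> 3
prob = math.exp(sentence_log_prob(heads, dep_prob))  # 0.8 * 0.9 * 1.0
```

Summing logarithms instead of multiplying raw probabilities is a common way to avoid numerical underflow on long sentences.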

M. E. model
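The formula shown on this slide is not preserved in the transcript. For orientation only, a conditional maximum entropy model is commonly written in the general form below. The symbols are generic notation assumed here, not necessarily the slide's own: $h$ is the observed context (a pair of bunsetsus), $t$ is one of the two tags, $f_i$ are feature functions, and $\lambda_i$ are the weights learned from the training corpus.

```latex
P(t \mid h) = \frac{1}{Z(h)} \exp\Bigl( \sum_i \lambda_i f_i(h, t) \Bigr),
\qquad
Z(h) = \sum_{t'} \exp\Bigl( \sum_i \lambda_i f_i(h, t') \Bigr)
```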

Feature sets • Basic features (expanded from Haruno’s list (Haruno, 1998)) • Attributes on a bunsetsu itself • Character strings, parts of speech, and inflection types of bunsetsu • Attributes between bunsetsus • Existence of punctuation, and the distance between bunsetsus • Combined features

Feature sets • Basic features: a, b, c, d, e • Combined features • Twin: (b,c), Triplet: (b,c,e), Quadruplet: (a,b,c,d), Quintuplet: (a,b,c,d,e) • [Figure: a dependency from an anterior bunsetsu to a posterior bunsetsu in 太郎は赤いバラを買いました。; each bunsetsu has a “Head” and a “Type”; labels a, b, c, d, e]
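A minimal sketch of how the combined features listed on this slide could be assembled from the five basic features. The slot combinations come from the slide; the function name `combined_features`, the string encoding, and the example values are hypothetical choices for illustration.

```python
# Slot combinations as listed on the slide.
COMBINATIONS = {
    "twin": ("b", "c"),
    "triplet": ("b", "c", "e"),
    "quadruplet": ("a", "b", "c", "d"),
    "quintuplet": ("a", "b", "c", "d", "e"),
}

def combined_features(basic):
    """Return the basic features plus one feature per combination.

    `basic` maps each slot name ("a" to "e") to its observed value.
    """
    # One feature string per basic slot, in a stable order.
    feats = [f"{slot}={value}" for slot, value in sorted(basic.items())]
    # One conjoined feature string per combination of slots.
    for name, slots in COMBINATIONS.items():
        joined = "|".join(f"{slot}={basic[slot]}" for slot in slots)
        feats.append(f"{name}:{joined}")
    return feats

# Hypothetical slot values for one pair of bunsetsus.
basic = {"a": "noun", "b": "wo", "c": "verb", "d": "period", "e": "dist=1"}
features = combined_features(basic)
```

Each combined feature fires only when all of its slots take the given values together, which lets a model capture interactions that the basic features cannot express one at a time.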

Algorithm • Detect the dependencies in a sentence by analyzing it backwards (from right to left). • Characteristics of Japanese dependencies • Dependencies are directed from left to right • Dependencies do not cross • A bunsetsu, except for the rightmost one, depends on only one bunsetsu • In many cases, the left context is not necessary to determine a dependency • Beam search
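The backward analysis with beam search can be sketched as follows, under the constraints listed on this slide (left-to-right dependencies, no crossing, one head per bunsetsu). The function name `parse_backwards`, the `dep_prob` callback, the beam width, and the probability table are assumptions for illustration; the talk does not give the implementation.

```python
import math

def parse_backwards(n, dep_prob, beam_width=3):
    """Return (log_prob, heads) for the best analysis found.

    n is the number of bunsetsus. heads maps each index i < n - 1 to the
    index j > i that bunsetsu i depends on. dep_prob(i, j) is assumed to
    return the probability that i depends on j, with 0 < p <= 1.
    """
    beam = [(0.0, {})]  # partial analyses: (log probability, heads so far)
    for i in range(n - 2, -1, -1):  # right to left; the last bunsetsu has no head
        expanded = []
        for logp, heads in beam:
            for j in range(i + 1, n):
                # i -> j would cross k -> heads[k] when i < k < j < heads[k].
                if all(heads[k] <= j for k in range(i + 1, j)):
                    new_logp = logp + math.log(dep_prob(i, j))
                    expanded.append((new_logp, {**heads, i: j}))
        # Keep only the most probable partial analyses.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]

# Hypothetical probabilities for a four-bunsetsu sentence.
TABLE = {(0, 1): 0.1, (0, 2): 0.1, (0, 3): 0.8,
         (1, 2): 0.9, (1, 3): 0.1, (2, 3): 1.0}
best_logp, best_heads = parse_backwards(4, lambda i, j: TABLE[(i, j)])
```

Because every dependency points to the right, all the dependencies that a new one could cross are already fixed when the sentence is processed from the end, so the non-crossing check needs only the partial analysis built so far.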

Experiments • Using the Kyoto University text corpus (Kurohashi and Nagao, 1997) • a tagged corpus of the Mainichi newspaper • Training: 7,958 sentences (Jan. 1st to 8th) • Testing: 1,246 sentences (Jan. 9th) • The input sentences were morphologically analyzed and their bunsetsus were identified correctly.

Results of dependency analysis • When analyzing a sentence backwards, the previous context has almost no effect on the accuracy.

Relationship between the number of bunsetsus and accuracy • [Figure: accuracy plotted against the number of bunsetsus in a sentence; the value 0.8714 is marked] • The accuracy does not significantly degrade with increasing sentence length.

Features and accuracy • Experiments without the feature sets • Useful basic features • Type of the anterior bunsetsu (-17.41%) and the part-of-speech tag of the head word on the posterior bunsetsu (-10.99%) • Distance between bunsetsus (-2.50%), the existence of punctuation in the bunsetsu (-2.52%), and the existence of brackets (-1.06%) • Preferential rules with respect to the features • [Figure: anterior and posterior bunsetsus, each with a “Head” and a “Type”; labels a, b, c, d, e]

Features and accuracy • Experiments without the feature sets • Combined features are useful (-18.31%). • Basic features are related to each other.

Lexical features and accuracy • Experiment with the lexical features of the head word • Better accuracy than that without them (-0.84%) • Many idiomatic expressions • They had high dependency probabilities. • “応じて(oujite, according to)---決める(kimeru, decide)” • “形で(katachi_de, in the form of) ---行われる(okonawareru, be held)” • More training data • Expect to collect more of such expressions

Number of training data and accuracy • Accuracy of 81.84% even with 250 sentences • M. E. framework has suitable characteristics for overcoming the data sparseness problem.

Comparison with related works

Comparison with related works (2) • Combining a parser based on a handmade CFG and a probabilistic dependency model (Shirai, 1998) • Using several corpora: the EDR corpus, RWC corpus, and Kyoto University corpus. • Accuracy achieved by our model was about 3% higher than that of Shirai’s model. • Using a much smaller set of training data.

Comparison with related works (3) • M. E. model (Ehara, 1998) • Set of similar kinds of features to ours • Only the combination of two features • Using TV news articles for training and testing • Average sentence length = 17.8 bunsetsus • cf. 10 in the Kyoto University corpus • Difference in the combined features • We also use triplet, quadruplet, and quintuplet features (+5.86%). • Accuracy of our system was about 10% higher than that of Ehara’s system.

Comparison with related works (4) • Maximum Likelihood model (Fujio, 1998) • Decision tree models and a boosting method (Haruno, 1998) • Set of similar kinds of features to ours • Using the EDR corpus for training and testing • EDR corpus is ten times as large as our corpus. • Accuracy was around 85%, which is slightly worse than ours.

Comparison with related works (5) • Experiments with Fujio’s and Haruno’s feature sets • The important factor in the statistical approaches is feature selection.

Future work • Feature selection • Automatic feature selection (Berger, 1996, 1998; Shirai, 1998) • Considering new features • How to deal with coordinate structures • Taking into account a wide range of information

Conclusion • Japanese dependency structure analysis based on the M. E. model. • Dependency accuracy of our system • 87.2% using the Kyoto University corpus • Experiments without feature sets • Some basic and combined features contribute strongly to improving the accuracy. • Number of training data and accuracy • Good accuracy even with a small set of training data • M. E. framework has suitable characteristics for overcoming the data sparseness problem.
