
Part-of-Speech Tagging for Bengali with Hidden Markov Model




  1. Part-of-Speech Tagging for Bengali with Hidden Markov Model Sandipan Dandapat, Sudeshna Sarkar Department of Computer Science & Engineering Indian Institute of Technology Kharagpur

  2. Machine Learning to Resolve POS Tagging • HMM • Supervised (DeRose, 88; Meteer, 91; Brants, 2000; etc.) • Semi-supervised (Cutting, 92; Merialdo, 94; Kupiec, 92; etc.) • Maximum Entropy (Ratnaparkhi, 96; etc.) • TB(ED)L (Brill, 92, 94, 95; etc.) • Decision Tree (Black, 92; Marquez, 97; etc.)

  3. Our Approach • HMM-based • Simplicity of the model • Language independence • Reasonably good accuracy • Data-intensive • Sparseness problems when extending to a higher order → we adopt a first-order HMM

  4. POS Tagging Schema Language Model Raw text Disambiguation Algorithm Tagged text Possible POS Class Restriction … POS tagging

  5. POS Tagging: Our Approach First-order HMM First-order HMM: the current state depends only on the previous state Raw text Disambiguation Algorithm Tagged text Possible POS Class Restriction … POS tagging

  6. POS Tagging: Our Approach µ = (π,A,B) Model Parameters First-order HMM Raw text Disambiguation Algorithm Tagged text Possible POS Class Restriction … POS tagging
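As a concrete illustration of the model parameters µ = (π, A, B) on this slide, here is a toy instantiation in Python. The tags, words, and probability values are invented for illustration only; they are not taken from the paper's data:

```python
# Toy instantiation of mu = (pi, A, B) for a first-order HMM tagger
pi = {"NN": 0.5, "VB": 0.5}                 # pi: initial tag probabilities
A = {("NN", "VB"): 0.6, ("NN", "NN"): 0.4,  # A: tag-to-tag transition probabilities
     ("VB", "NN"): 0.9, ("VB", "VB"): 0.1}
B = {("NN", "dog"): 0.7, ("VB", "runs"): 0.8}  # B: tag-to-word emission probabilities

def sequence_probability(words, tags, pi, A, B):
    """Joint probability of a tag/word sequence under a first-order HMM:
    P(t1) * P(w1|t1) * product over i>1 of P(ti|t(i-1)) * P(wi|ti)."""
    p = pi.get(tags[0], 0) * B.get((tags[0], words[0]), 0)
    for i in range(1, len(words)):
        p *= A.get((tags[i - 1], tags[i]), 0) * B.get((tags[i], words[i]), 0)
    return p
```

For example, `sequence_probability(["dog", "runs"], ["NN", "VB"], pi, A, B)` multiplies 0.5 · 0.7 · 0.6 · 0.8.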

  7. POS Tagging: Our Approach {T} : Set of all tags TMA(wi) : Set of tags computed by Morphological Analyzer µ = (π,A,B) First-order HMM ti ∈ {T} or ti ∈ TMA(wi) Raw text Disambiguation Algorithm Tagged text … POS tagging

  8. POS Tagging: Our Approach {T} : Set of all tags TMA(wi) : Set of tags computed by Morphological Analyzer µ = (π,A,B) First-order HMM ti ∈ {T} or ti ∈ TMA(wi) Raw text Viterbi Algorithm Tagged text … POS tagging
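The Viterbi decoding step named on this slide can be sketched as follows: a minimal log-space implementation of first-order Viterbi with an optional TMA(wi)-style tag restriction. All function and variable names here are chosen for illustration and do not come from the paper:

```python
import math

def viterbi(words, tags, trans, emit, init, ma_tags=None):
    """First-order Viterbi decoding in log space.

    trans[(t1, t2)], emit[(t, w)], and init[t] are probabilities;
    ma_tags optionally maps a word to the tag set allowed by a
    morphological analyzer, restricting the search as in TMA(wi)."""
    def allowed(w):
        return ma_tags.get(w, tags) if ma_tags else tags

    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[t] = best log-probability of any tag path ending in tag t
    score = {t: logp(init.get(t, 0)) + logp(emit.get((t, words[0]), 0))
             for t in allowed(words[0])}
    back = []
    for w in words[1:]:
        new_score, ptr = {}, {}
        for t in allowed(w):
            prev, best = max(
                ((p, score[p] + logp(trans.get((p, t), 0))) for p in score),
                key=lambda x: x[1])
            new_score[t] = best + logp(emit.get((t, w), 0))
            ptr[t] = prev
        score, back = new_score, back + [ptr]
    # follow the back-pointers from the best final tag
    last = max(score, key=score.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Passing a `ma_tags` dictionary prunes each word's candidate tags before decoding, which is the effect of the morphological restriction above.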

  9. Disambiguation Algorithm Text: w1 w2 … wn Tags: t1 t2 … tn where ti ∈ {T} = set of all tags

  10. Disambiguation Algorithm Text: w1 w2 … wn Tags: t1 t2 … tn where ti ∈ TMA(wi), {T} = set of all tags

  11. Learning HMM Parameters • Supervised Learning (HMM-S) • Estimates the three parameters (π, A, B) directly from the tagged corpus
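A sketch of the supervised estimation (HMM-S): each of the three parameters is a relative-frequency count over the tagged corpus. The function name and the input format (sentences as lists of (word, tag) pairs) are assumptions made here for illustration:

```python
from collections import Counter

def train_hmm(tagged_sentences):
    """Relative-frequency estimates of (pi, A, B) from tagged sentences.
    Smoothing is applied separately, as described on the next slides."""
    init, trans, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    n_sentences = 0
    for sent in tagged_sentences:
        n_sentences += 1
        init[sent[0][1]] += 1          # tag of the first word
        for i, (word, tag) in enumerate(sent):
            tag_count[tag] += 1
            emit[(tag, word)] += 1
            if i > 0:
                trans[(sent[i - 1][1], tag)] += 1
    # pi[t] = C(t starts sentence)/N; A[(t1,t2)] = C(t1,t2)/C(t1); B[(t,w)] = C(t,w)/C(t)
    pi = {t: c / n_sentences for t, c in init.items()}
    A = {(t1, t2): c / tag_count[t1] for (t1, t2), c in trans.items()}
    B = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}
    return pi, A, B
```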

  12. Learning HMM Parameters • Semi-supervised Learning (HMM-SS) • Untagged data (observations) are used to find the model that most likely produces the observation sequence • An initial model is created from the tagged training data • Based on the initial model and the untagged data, the model parameters are updated • New model parameters are estimated using the Baum-Welch algorithm

  13. Smoothing and Unknown Word Hypothesis • Not all emissions and transitions are observed in the training data • Add-one smoothing is used to estimate both emission and transition probabilities • Not all words are known to the Morphological Analyzer • Assume open-class grammatical categories for such words
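The add-one (Laplace) smoothing mentioned above can be sketched for the transition table as follows; every unseen transition receives a small non-zero probability. The function name is chosen here for illustration:

```python
def smoothed_transition(counts, tag_count, tags):
    """Add-one smoothed transition probabilities:
    P(t2|t1) = (C(t1,t2) + 1) / (C(t1) + |T|),
    so transitions unseen in training still get non-zero mass."""
    V = len(tags)  # size of the tagset |T|
    return {(t1, t2): (counts.get((t1, t2), 0) + 1) / (tag_count.get(t1, 0) + V)
            for t1 in tags for t2 in tags}
```

The same scheme applies to the emission probabilities, with the vocabulary in place of the tagset.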

  14. Experiments • Baseline Model • Supervised bigram HMM (HMM-S) • HMM-S • HMM-S + IMA • HMM-S + CMA • Semi-supervised bigram HMM (HMM-SS) • HMM-SS • HMM-SS + IMA • HMM-SS + CMA

  15. Data Used • Tagged data: 3085 sentences (~41,000 words) • Includes data in both non-privileged and privileged mode • Untagged corpus from CIIL: 11,000 sentences (100,000 words), uncleaned • Used to re-estimate the model parameters with the Baum-Welch algorithm

  16. Tagset and Corpus Ambiguity • Tagset consists of 27 grammatical classes • Corpus ambiguity: mean number of possible tags per word • Measured on the tagged training data (Dermatas et al., 1995)
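The corpus-ambiguity measure above can be sketched as the mean number of distinct tags observed per token. This is one plausible reading of the definition; the function name and per-token weighting are assumptions made here:

```python
from collections import defaultdict

def corpus_ambiguity(tagged_corpus):
    """Mean number of distinct tags per token, measured over the
    tagged training data (in the spirit of Dermatas et al., 1995)."""
    tags_of = defaultdict(set)   # word -> set of tags it appears with
    tokens = []
    for sent in tagged_corpus:
        for word, tag in sent:
            tags_of[word].add(tag)
            tokens.append(word)
    return sum(len(tags_of[w]) for w in tokens) / len(tokens)
```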

  17. Results on Development set

  18. Results on Development set

  19. Error Analysis

  20. Results on Test Set • Tested on 458 sentences (5127 words) • Precision: 84.32% • Recall: 84.36% • Fβ=1: 84.34% Top 4 classes in terms of F-measure

  21. Results on Test Set • Tested on 458 sentences (5127 words) • Precision: 84.32% • Recall: 84.36% • Fβ=1: 84.34% Bottom 4 classes in terms of F-measure
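For reference, the reported Fβ=1 is the harmonic combination of the stated precision and recall; a one-function sketch of the standard formula:

```python
def f_beta(precision, recall, beta=1.0):
    """F-measure: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    With beta = 1 this is the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Plugging in the slide's numbers, `f_beta(84.32, 84.36)` reproduces the reported 84.34.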

  22. Further Improvement • Use suffix information to handle unknown words • Calculate the probability of a tag given the last m letters (suffix) of a word • Each symbol-emission probability for an unknown word is normalized
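The suffix-based unknown-word handling described above can be sketched as follows: estimate P(tag | suffix) from the training data, then back off from the longest known suffix and normalize the resulting distribution. The function names, the longest-suffix-first backoff, and the maximum suffix length are assumptions made here for illustration:

```python
from collections import Counter

def suffix_tag_probs(tagged_corpus, max_suffix=4):
    """P(tag | last m letters of a word), estimated from tagged data."""
    counts, totals = Counter(), Counter()
    for sent in tagged_corpus:
        for word, tag in sent:
            for m in range(1, min(max_suffix, len(word)) + 1):
                suf = word[-m:]
                counts[(suf, tag)] += 1
                totals[suf] += 1
    return {(suf, tag): c / totals[suf] for (suf, tag), c in counts.items()}

def unknown_word_emission(word, suffix_probs, tags, max_suffix=4):
    """Tag distribution for an unknown word from its longest known
    suffix; the distribution is normalized before use as emission weights."""
    for m in range(min(max_suffix, len(word)), 0, -1):
        suf = word[-m:]
        dist = {t: suffix_probs[(suf, t)]
                for t in tags if (suf, t) in suffix_probs}
        if dist:
            z = sum(dist.values())
            return {t: p / z for t, p in dist.items()}
    return {t: 1 / len(tags) for t in tags}  # no known suffix: uniform
```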

  23. Further Improvement • Accuracy reflected on development set

  24. Conclusion and Future Scope • Morphological restriction on tags gives an efficient tagging model even when only a small labeled corpus is available • Semi-supervised learning performs better compared to supervised learning • Better adjustment of emission probabilities can be adopted for both unknown and less frequent words • A higher-order Markov model can be adopted

  25. Thank You
