
Distributional Part-of-Speech Tagging


Presentation Transcript


1. Distributional Part-of-Speech Tagging
Hinrich Schütze
CSLI, Ventura Hall, Stanford, CA 94305-4115, USA
Email: schuetze@csli.stanford.edu
Presented by Masood Ghayoomi, NLP Applications, Oct 15, 2007

2. Outline of the Talk
• Introduction
• Brief review of the literature
• Presenting a hypothesis
• Introducing induction experiments
• Results
• Conclusions
• Discussion

3. Abstract of the Talk
• This paper presents an algorithm for tagging words whose part-of-speech properties are unknown.
• The algorithm categorizes word tokens in context.

4. Introduction
• Why is it needed? The growing volume of online text makes automatic techniques for text analysis necessary.

5. Related Work
• Stochastic tagging:
- Bigram or trigram models: require a relatively large tagged training text (Church, 1989; Charniak et al., 1993)
- Hidden Markov Models: require no pretagged text (Jelinek, 1985; Cutting et al., 1991; Kupiec, 1992)
• Rule-based tagging:
- Transformation-based tagging as introduced by Brill (1993): requires a hand-tagged text for training

6. Other Related Work
• Using a connectionist network to predict words, reflecting grammatical categories (Elman, 1990)
• Inferring grammatical category from bigram statistics (Brill et al., 1990)
• Using vector models in which words are clustered according to the similarity of their close neighbors in a corpus (Finch and Chater, 1992; Finch, 1993)
• Presenting a probabilistic model for entropy maximization that relies on the immediate neighbors of words in a corpus (Kneser and Ney, 1993)
• Applying factor analysis to collocations of two target words with their immediate neighbors (Biber, 1993)

7. Hypothesis for the New Tagging Algorithm
• The syntactic behavior of a word is represented with respect to its left and right context:

left neighbor ← WORD → right neighbor

• The left context vector characterizes the word's left neighbors; the right context vector characterizes its right neighbors.
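A minimal sketch (not from the paper) of how such left and right context vectors might be collected from a tokenized corpus; counting the most frequent words as neighbor features is an illustrative assumption, and the function name is made up for this example:

```python
from collections import Counter

def context_vectors(tokens, num_features=250):
    """Count, for each word type, how often each of the most frequent
    words appears as its immediate left/right neighbor. The feature
    set size is an assumption for illustration."""
    freq = Counter(tokens)
    features = [w for w, _ in freq.most_common(num_features)]
    index = {w: i for i, w in enumerate(features)}

    left = {w: [0] * len(features) for w in freq}
    right = {w: [0] * len(features) for w in freq}
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in index:                 # prev sits to the left of cur
            left[cur][index[prev]] += 1
        if cur in index:                  # cur sits to the right of prev
            right[prev][index[cur]] += 1
    return left, right
```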

8. Four POS Tag Induction Experiments
• Based on word type only
• Based on word type and context
• Based on word type and context, restricted to "natural" contexts
• Based on word type and context, using generalized left and right context vectors

9. Word Type Only
• A baseline for evaluating the performance of distributional POS tagging.
• Word types from the Brown corpus were clustered into 200 classes based on the similarity of their left and right context vectors. All occurrences of a word are assigned to the same class.
• Drawback: problematic for ambiguous words, e.g. "work", "book".
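A sketch of the clustering step under the representation above; k-means from scikit-learn stands in for the paper's actual clustering procedure, so treat the details as assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_word_types(left, right, k=200, seed=0):
    """Cluster word types by the similarity of their concatenated
    left+right context vectors; k-means is a stand-in for the
    clustering method used in the paper."""
    words = sorted(left)
    X = np.array([left[w] + right[w] for w in words], dtype=float)
    # Length-normalize rows so similarity reflects the shape of the
    # neighbor distribution rather than raw word frequency.
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return dict(zip(words, labels))
```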

10. Word Type and Context
• A word's syntactic role depends on:
- the syntactic properties of its neighbors,
- its own potential relationships with those neighbors.
• Each occurrence of a word w is therefore represented by four vectors:
- the right context vector of the preceding word,
- the left context vector of w,
- the right context vector of w,
- the left context vector of the following word.
• Drawback: fails for words whose neighbors are punctuation marks, since there are no grammatical dependencies between words and punctuation marks, in contrast to the strong dependencies between neighboring words.
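A sketch of how one occurrence could be represented by composing the four vectors listed above; padding with zero vectors at the corpus edges is an assumption:

```python
def token_vector(tokens, i, left, right):
    """Represent the occurrence tokens[i] in its context by
    concatenating four vectors: the right vector of the preceding
    word, the left and right vectors of the word itself, and the
    left vector of the following word."""
    dim = len(next(iter(left.values())))
    pad = [0] * dim                        # zero vector at corpus edges
    prev_right = right[tokens[i - 1]] if i > 0 else pad
    next_left = left[tokens[i + 1]] if i + 1 < len(tokens) else pad
    return prev_right + left[tokens[i]] + right[tokens[i]] + next_left
```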

11. Word Type and Context, Restricted to "Natural" Contexts
• To address this drawback, only words with informative contexts were considered.
• Words next to punctuation marks and words with rare neighbors (fewer than ten occurrences) were excluded.
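A sketch of such a filter; the punctuation set and the exact boundary handling are illustrative assumptions:

```python
PUNCTUATION = {".", ",", ";", ":", "!", "?", "(", ")", '"', "'"}

def has_natural_context(tokens, i, freq, min_freq=10):
    """True if tokens[i] has an informative context: no punctuation
    neighbors and no rare neighbors (fewer than min_freq occurrences)."""
    for j in (i - 1, i + 1):
        if 0 <= j < len(tokens):
            neighbor = tokens[j]
            if neighbor in PUNCTUATION or freq[neighbor] < min_freq:
                return False
    return True
```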

12. Word Type and Context, Using Generalized Left and Right Context Vectors
• Generalization: the generalized right context vector of a word records which classes of left context vectors occur to its right, and vice versa.
• In this method, the information about a word's left and right context vectors is kept separate during the computation, whereas the previous methods always used the left and right context vectors together.
• The method is applied in two steps:
- a generalized right context vector is formed for each word over the 200 classes of left context vectors;
- generalized left context vectors are formed analogously, using classes of word-based right context vectors.
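A sketch of one direction of this generalization, assuming word types have already been assigned to one of k = 200 classes by clustering their left context vectors; the mapping `left_class` is hypothetical:

```python
def generalized_right_vectors(tokens, left_class, k=200):
    """The generalized right context vector of word w counts, for each
    of the k classes of left context vectors, how often a word from
    that class occurs immediately to the right of w. The left
    direction is analogous, using classes of right context vectors."""
    gen = {}
    for prev, cur in zip(tokens, tokens[1:]):
        vec = gen.setdefault(prev, [0] * k)
        vec[left_class[cur]] += 1   # cur's left-context class, seen right of prev
    return gen
```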

13. Two Examples
• "seemed" and "would" have similar left contexts (e.g. "he", "the firefighter"), and they in turn characterize the right contexts of "he" and "the firefighter"; such verbs potentially belong to one syntactic category.
• Transitive verbs and prepositions belong to different syntactic categories, but their right contexts are nearly identical, since both require a following noun phrase.

14. Results
• The Penn Treebank parses of the Brown corpus were used.
• The results of the four experiments were evaluated by forming 16 classes of tags from the Penn Treebank tag set. For each tag t:
- frequency: the frequency of t in the corpus
- # classes: the number of induced tags i0, i1, ..., il assigned to t
- correct: the number of times an occurrence of t was correctly labeled as belonging to one of i0, i1, ..., il
- incorrect: the number of times a token of a different tag t' was miscategorized as an instance of i0, i1, ..., il
- precision: the number of correct tokens divided by the sum of correct and incorrect tokens
- recall: the number of correct tokens divided by the total number of tokens of t
- F: an aggregate score computed from precision and recall
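A small helper showing how these metrics combine, assuming F is the harmonic mean of precision and recall (a standard choice; the slide does not give the exact weighting):

```python
def precision_recall_f(correct, incorrect, total):
    """Compute the per-tag metrics defined on the slide. F is taken
    here as the harmonic mean of precision and recall."""
    precision = correct / (correct + incorrect) if correct + incorrect else 0.0
    recall = correct / total if total else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# e.g. 90 tokens of t correctly labeled, 10 tokens of other tags
# mislabeled into the same induced classes, 120 tokens of t overall:
# precision_recall_f(90, 10, 120) -> (0.9, 0.75, ~0.818)
```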

15. Result: Word Type Only
Table 1: Precision and recall for induction based on word type.

16. Result: Word Type and Context
Table 2: Precision and recall for induction based on word type and context.

17. Result: Word Type and Context; Generalized Left and Right Context Vectors
Table 3: Precision and recall for induction based on generalized context vectors.

18. Result: Word Type and Context; Restricted to "Natural" Contexts
Table 4: Precision and recall for induction for natural contexts.

19. Conclusions
• Taking context into account improves the performance of distributional tagging, as the F scores increase: 0.49 < 0.72 < 0.74 < 0.79.
• Performance with generalized context vectors is better than with word-based context vectors (0.74 vs. 0.72).

20. Discussion
• Restricting induction to "natural" contexts performs best (0.79), even though the low quality of the distributional information about punctuation marks and rare words makes tag induction difficult.
• Induction performs fairly well for typical and frequent contexts: prepositions, determiners, pronouns, conjunctions, the infinitive marker, modals, and the possessive marker.
• Tag induction fails for punctuation, rare words, and "-ing" forms; present participles and gerunds are difficult because they exhibit both verbal and nominal properties.

21. Thanks for listening!
