
Persian Part Of Speech Tagging



Presentation Transcript


  1. Persian Part Of Speech Tagging Mostafa Keikha Database Research Group (DBRG) ECE Department, University of Tehran

  2. Decision Trees • Decision Tree (DT): • A tree where the root and each internal node are labeled with a question. • The arcs represent the possible answers to the associated question. • Each leaf node represents a prediction of a solution to the problem. • A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.

  3. Decision Tree Example

  4. Decision Trees • A Decision Tree model is a computational model consisting of three parts: • A decision tree • An algorithm to create the tree • An algorithm that applies the tree to data • Creating the tree is the most difficult part. • Processing is basically a search similar to that in a binary search tree (although a DT may not be binary).
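For readers who want something concrete, here is a minimal sketch of the model described above: a tree of question nodes and the search-like procedure that applies it to a tuple. The node layout, helper names, and the toy question are illustrative assumptions, not the presentation's implementation.

```python
# Minimal decision-tree sketch: internal nodes ask a question, each answer
# leads to a child, and leaves carry the predicted class.
class DTNode:
    def __init__(self, question=None, children=None, prediction=None):
        self.question = question        # internal nodes: function tuple -> answer
        self.children = children or {}  # answer -> child DTNode
        self.prediction = prediction    # leaf nodes: predicted class label

def classify(node, item):
    """Apply the tree to one tuple: walk from the root down to a leaf."""
    while node.prediction is None:
        answer = node.question(item)
        node = node.children[answer]
    return node.prediction

# Toy usage: a one-question tree deciding whether a token is numeric.
root = DTNode(
    question=lambda tok: tok.isdigit(),
    children={
        True: DTNode(prediction="NUM"),
        False: DTNode(prediction="WORD"),
    },
)
print(classify(root, "42"))     # NUM
print(classify(root, "ketab"))  # WORD
```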

  5. Decision Tree Algorithm

  6. Using DT in POS Tagging • Compute ambiguity classes • Each term may have several possible tags • The ambiguity class of a term is the set of all its possible tags • Count the number of occurrences of each tag in each ambiguity class
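A small sketch of the ambiguity-class computation described on this slide, assuming the training data is simply an iterable of (word, tag) pairs; the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def ambiguity_classes(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs.
    Returns, for each word, its ambiguity class (the set of tags seen with
    it) and the occurrence count of each tag within that class."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    classes = {w: frozenset(c) for w, c in counts.items()}
    return classes, counts

# Toy usage with made-up tokens and tags:
corpus = [("ketab", "N"), ("ketab", "N"), ("mard", "N"),
          ("raft", "V"), ("ketab", "ADJ")]
classes, counts = ambiguity_classes(corpus)
print(classes["ketab"])  # frozenset({'N', 'ADJ'})  (order may vary)
print(counts["ketab"])   # Counter({'N': 2, 'ADJ': 1})
```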

  7. Using DT in POS Tagging • Create a decision tree over the ambiguity classes • At each level, delete the tag with the minimum number of occurrences • Example from the slide: ambiguity class {a, b, c, d} with counts 10/20/25/40 → delete a → {b, c, d} with counts 40/39/50 → delete c → {b, d} with counts 60/55 → delete d → b remains
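A sketch of the level-building rule from this slide. The slide does not spell out how the counts are re-estimated after a tag is dropped, so the counting step is left as a callback and the slide's per-level counts are hard-coded in the toy usage; all names here are hypothetical.

```python
def dt_levels(initial_tags, count):
    """initial_tags: the ambiguity class (set of tags).
    count(remaining_tags) -> dict tag -> occurrences for that level.
    At each level the tag with the fewest occurrences is deleted,
    until a single tag remains."""
    tags = set(initial_tags)
    levels = [set(tags)]
    while len(tags) > 1:
        c = count(tags)
        tags.discard(min(c, key=c.get))  # drop the minimum-occurrence tag
        levels.append(set(tags))
    return levels

# Toy usage reproducing the slide's levels; the per-level counts shown on
# the slide are hard-coded here in place of real corpus re-counting.
slide_counts = {
    frozenset("abcd"): {"a": 10, "b": 20, "c": 25, "d": 40},
    frozenset("bcd"):  {"b": 40, "c": 39, "d": 50},
    frozenset("bd"):   {"b": 60, "d": 55},
}
print(dt_levels("abcd", lambda t: slide_counts[frozenset(t)]))
# levels: {a,b,c,d} -> {b,c,d} -> {b,d} -> {b}
```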

  8. Using DT in POS Tagging • Advantages • Easy to understand • Easy to implement • Disadvantage • Context independent (the tag is chosen without looking at the surrounding words)

  9. Using DT in POS Tagging • Known Tokens Results

  10. POS tagging using HMMs • Let W be a sequence of words: W = w1, w2, …, wn • Let T be the corresponding tag sequence: T = t1, t2, …, tn • Task: find the T that maximizes P(T | W), i.e. T' = argmaxT P(T | W)

  11. POS tagging using HMMs • By Bayes' rule, P(T | W) = P(W | T) * P(T) / P(W), so T' = argmaxT P(W | T) * P(T) • Transition probability: P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * … * P(tn | t1 … tn-1) • Applying the tri-gram approximation: P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * … * P(tn | tn-2 tn-1) • Introducing a dummy tag, $, to represent the beginning of a sentence: P(T) = P(t1 | $) * P(t2 | $ t1) * P(t3 | t1 t2) * … * P(tn | tn-2 tn-1)
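The trigram transition probability with the dummy $ tag can be computed directly from n-gram counts over the training tags; the sketch below is a plain maximum-likelihood version (no smoothing yet), with hypothetical names for the count tables.

```python
from collections import Counter

def transition_prob(tags, tag_unigrams, tag_bigrams, tag_trigrams):
    """P(T) under the trigram approximation with a dummy '$' start tag:
    P(T) = P(t1|$) * P(t2|$ t1) * P(t3|t1 t2) * ... * P(tn|tn-2 tn-1).
    The arguments are Counters over tag 1/2/3-grams from a tagged corpus,
    with '$' counted once at the start of every training sentence."""
    t = ["$"] + list(tags)
    # P(t1|$); max(.., 1) only avoids division by zero for unseen contexts
    p = tag_bigrams[(t[0], t[1])] / max(tag_unigrams[(t[0],)], 1)
    for i in range(2, len(t)):
        p *= (tag_trigrams[(t[i - 2], t[i - 1], t[i])]
              / max(tag_bigrams[(t[i - 2], t[i - 1])], 1))
    return p

# Toy usage: counts taken from a single hypothetical training sentence $ N V N.
uni = Counter({("$",): 1, ("N",): 2, ("V",): 1})
bi  = Counter({("$", "N"): 1, ("N", "V"): 1, ("V", "N"): 1})
tri = Counter({("$", "N", "V"): 1, ("N", "V", "N"): 1})
print(transition_prob(["N", "V", "N"], uni, bi, tri))  # 1.0
```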

  12. POS tagging using HMMs • Smoothing the transition probabilities • Sparse data problem • Linear interpolation method: P'(ti | ti-2, ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2, ti-1), such that the λs sum to 1
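A direct translation of the interpolation formula into code, under the assumption that the unigram, bigram, and trigram probability tables have already been estimated; the table names and the toy numbers are illustrative.

```python
def smoothed_transition(t_prev2, t_prev1, t_cur, p_uni, p_bi, p_tri, lambdas):
    """Linear interpolation from the slide:
    P'(ti | ti-2, ti-1) = l1*P(ti) + l2*P(ti | ti-1) + l3*P(ti | ti-2, ti-1),
    with l1 + l2 + l3 = 1.  p_uni/p_bi/p_tri are plain dicts of
    maximum-likelihood estimates built elsewhere (hypothetical names)."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # the lambdas must sum to 1
    return (l1 * p_uni.get(t_cur, 0.0)
            + l2 * p_bi.get((t_prev1, t_cur), 0.0)
            + l3 * p_tri.get((t_prev2, t_prev1, t_cur), 0.0))

# Toy usage with made-up probability tables and lambdas:
p_uni = {"N": 0.5}
p_bi = {("V", "N"): 0.4}
p_tri = {("$", "V", "N"): 0.3}
print(smoothed_transition("$", "V", "N", p_uni, p_bi, p_tri, (0.2, 0.3, 0.5)))
# 0.2*0.5 + 0.3*0.4 + 0.5*0.3 = 0.37
```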

  13. POS tagging using HMMs • Calculation of the λs

  14. POS tagging using HMMs • Emission probability: P(W | T) ≈ P(w1 | t1) * P(w2 | t2) * … * P(wn | tn) • Context dependency: to make the model more dependent on the context, the emission probability is calculated instead as P(W | T) ≈ P(w1 | $ t1) * P(w2 | t1 t2) * … * P(wn | tn-1 tn)
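A minimal sketch of the context-dependent emission estimate P(wi | ti-1 ti) as a ratio of counts, assuming count tables over (previous tag, tag) pairs and (previous tag, tag, word) triples built from a tagged corpus; the names are hypothetical.

```python
from collections import Counter

def emission_prob(word, prev_tag, tag, pair_word_counts, pair_counts):
    """Context-dependent emission from the slide: P(wi | ti-1 ti),
    estimated as count(ti-1, ti, wi) / count(ti-1, ti).
    For the first word of a sentence, prev_tag is the dummy '$'."""
    denom = pair_counts[(prev_tag, tag)]
    return pair_word_counts[(prev_tag, tag, word)] / denom if denom else 0.0

# Toy usage: the tag pair (ART, N) was seen 4 times, twice emitting "flower".
pair_counts = Counter({("ART", "N"): 4})
pair_word_counts = Counter({("ART", "N", "flower"): 2})
print(emission_prob("flower", "ART", "N", pair_word_counts, pair_counts))  # 0.5
```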

  15. POS tagging using HMMs • A smoothing technique is applied: P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti), where the θs sum to 1 • The θs are different for different words.
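The emission smoothing formula as code, with a per-word pair of θs as the slide indicates; the probability-table names and the uniform fallback for unseen words are assumptions, not from the slides.

```python
def smoothed_emission(word, prev_tag, tag, p_w_t, p_w_tt, thetas):
    """Emission smoothing from the slide:
    P'(wi | ti-1 ti) = th1*P(wi | ti) + th2*P(wi | ti-1 ti), th1 + th2 = 1,
    where each word gets its own (th1, th2) pair.  p_w_t and p_w_tt are
    plain dicts of estimated probabilities (hypothetical names)."""
    th1, th2 = thetas.get(word, (0.5, 0.5))  # fallback for unseen words is an assumption
    return (th1 * p_w_t.get((word, tag), 0.0)
            + th2 * p_w_tt.get((word, prev_tag, tag), 0.0))
```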

  16. POS tagging using HMMs

  17. POS tagging using HMMs

  18. POS tagging using HMMs

  19. POS tagging using HMMs • Lexicon generation probability

  20. POS tagging using HMMs

  21. POS tagging using HMMs • P(N V ART N | flies like a flower) = 4.37 × 10^-6
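To tie the pieces together, here is a standard bigram Viterbi decoder that finds argmaxT P(T) * P(W | T) for a sentence like the one above; it is a simplification of the trigram model in the slides (and not the presentation's code), with hypothetical probability-table names and made-up numbers in the toy usage.

```python
import math

def viterbi(words, tags, p_trans, p_emit, start="$"):
    """Bigram Viterbi decoder: p_trans[(t_prev, t)] and p_emit[(word, t)]
    are plain dicts of probabilities.  Works in log space to avoid
    underflow on long sentences."""
    def lp(x):
        return math.log(x) if x > 0 else float("-inf")

    # best[t] = (log prob of best path ending in t, that path)
    best = {t: (lp(p_trans.get((start, t), 0.0))
                + lp(p_emit.get((words[0], t), 0.0)), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (best[tp][0] + lp(p_trans.get((tp, t), 0.0)), best[tp][1])
                for tp in tags)
            new[t] = (score + lp(p_emit.get((w, t), 0.0)), path + [t])
        best = new
    return max(best.values())[1]

# Toy usage with two tags and made-up probabilities:
tags = ["N", "V"]
p_trans = {("$", "N"): 0.8, ("$", "V"): 0.2, ("N", "V"): 0.6,
           ("N", "N"): 0.4, ("V", "N"): 0.7, ("V", "V"): 0.3}
p_emit = {("flies", "N"): 0.1, ("flies", "V"): 0.2,
          ("like", "V"): 0.3, ("like", "N"): 0.01}
print(viterbi(["flies", "like"], tags, p_trans, p_emit))  # ['N', 'V']
```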

  22. POS tagging using HMMs • Known Tokens Results

  23. Unknown Tokens Results

  24. Overall Results
