
Constraint Based Hindi Parser



  1. Constraint Based Hindi Parser LTRC, IIIT Hyderabad

  2. Introduction • Broad coverage parser • Very crucial • IL-IL MT systems, IE, co-reference resolution, etc.

  3. Why Dependency? • Phrase Structures • Intrinsically presume word order • Context Free Grammar (CFG) not well-suited for free-word order languages (Shieber, 1985) • Particularly ill-suited to Indian languages • Dependency Structures • Give flexibility • Common structures • With appropriate labels, closer to semantics

  4. Computational Paninian Grammar (CPG) • Based on Panini’s Grammar (500 BC) • Inspired by an inflectionally rich language (Sanskrit) • A dependency-based analysis

  5. Computational Paninian Grammar (The Basic Framework) • Treats a sentence as a set of modifier-modified relations • Sentence has a primary modified or the root (which is generally a verb) • Gives us the framework to identify these relations • Relations between noun constituent and verb called ‘karaka’ • karakas are syntactico-semantic in nature • Syntactic cues help us in identifying the karakas

  6. karta – karma karaka • The boy opened the lock • k1 – karta • k2 – karma • karta, karma usually correspond to agent, theme • But not always • karakas are direct participants in the activity denoted by the verb • Dependency tree: open → boy (k1), open → lock (k2)

  7. Basic karaka relations • karta – agent/doer/force • Relation label – k1 • karma – object/patient • Relation label – k2 • karana – instrument • Relation label – k3 • sampradaan – beneficiary • Relation label – k4 • apaadaan – source • Relation label – k5 • adhikarana – location in place/time/other • Relation label – k7p/k7t/k7 • For complete list of dependency relations: (Begum et al., 2008)

  8. Basic karaka relations raama phala khaataa hai ‘Ram eats fruit’

  9. Basic karaka relations • raama chaaku se saiv kaatataa hai • ‘Ram cuts the apple with a knife’

  10. Basic karaka relations raama ne mohana ko pustaka dii ‘Ram gave a book to Mohan’

  11. Why Paninian Labels • Other choices for labels could be • Grammatical relations • Subject, Object, etc. • Behavioral tests (Mohanan, 1994) • Thematic roles • Agent, patient, etc. • No concrete cues • Difficult to extract them automatically • Karakas can be computationally exploited • Syntactically grounded, Semantically loaded • Gives a level of interface

  12. Levels of Language Analysis • Morphological analysis (Morph Info.) • Analysis in local context (POS tagging) • Sentence analysis (Chunking, Parsing) • Semantic analysis (Word sense disambiguation, etc.) • Discourse processing (Anaphora resolution, Informational Structure, etc.)

  13. Example • rAma ne mohana ko puswaka xI |

  14. Example – Parsed Output • Dependency tree: xI ‘give’ → rAma (k1), xI ‘give’ → puswaka ‘book’ (k2), xI ‘give’ → mohana (k4)

  15. Parser • Two-stage strategy • Appropriate constraints formed • Stage I (Intra-clausal relations) • Dependency relations marked • Relations such as k1, k2, k3, etc. for each verb • Stage II (Inter-clausal relations & conjunct relations) • Conjuncts, relative clauses, kriya mula, etc.

  16. Demand Frame for Verb • A demand frame or karaka frame for a verb indicates the demands the verb makes • It depends on the verb and its tense, aspect and modality (TAM) label. • A mapping is specified between karaka relations and vibhaktis (post-positions, suffix).

  17. Karaka Frame • It specifies what karakas are mandatory or optional for the verb and what vibhaktis (post-positions) they take respectively • Each verb belongs to a specific verb class • Each class has a basic karaka frame • Each TAM specifies a transformation rule

  18. Example • rAma mohana ko puswaka xewA hE | • Parsed dependency tree: xewA hE ‘give is’ → rAma (k1), xewA hE → puswaka ‘book’ (k2), xewA hE → mohana (k4)

  19. Transformations • Based on the TAM of the verb • rAma ne mohana ko KilOnA xiyA | • rAma ko mohana ko KilOnA xenA padZA | • Appropriate transformation applied

  20. Example • rAma ne mohana ko puswaka xI |

  21. Karaka Frame – xe (give)

  22. Transformation Rule – yA (TAM)

  23. Karaka Frame yA TAM • rAma ne mohana ko KilOnA xiyA | • Transformed frame for xe after applying the yA transformation (k1 vibhakti: 0 → ne)
      ------------------------------------------------------------------
      arc-label   necessity   vibhakti   lextype   src-pos   arc-dir
      ------------------------------------------------------------------
      k1          m           ne         n         l         c
      k2          m           0|ko       n         l         c
      k3          d           se         n         l         c
      k4          d           ko         n         l         c
      ------------------------------------------------------------------
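
A minimal Python sketch of how a TAM transformation might act on a karaka frame. The slides show only the transformed frame, so the basic frame for xe and the rule "yA replaces the k1 vibhakti 0 with ne" are assumptions inferred from that table; `Demand`, `BASIC_XE`, `YA_RULE` and `apply_tam` are hypothetical names, not part of the parser described here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demand:
    """One row of a karaka frame, using the six columns of the table."""
    label: str           # arc-label, e.g. "k1"
    necessity: str       # "m" = mandatory, "d" = desirable/optional
    vibhakti: str        # alternatives separated by "|"; "0" = no marker
    lextype: str = "n"
    src_pos: str = "l"
    arc_dir: str = "c"


# ASSUMPTION: the basic frame for 'xe' is not given in the slides.
BASIC_XE = [
    Demand("k1", "m", "0"),
    Demand("k2", "m", "0|ko"),
    Demand("k3", "d", "se"),
    Demand("k4", "d", "ko"),
]

# ASSUMPTION: the yA rule only rewrites the vibhakti of k1.
YA_RULE = {"k1": "ne"}


def apply_tam(frame, rule):
    """Return a new frame with the vibhakti of each listed label replaced."""
    return [
        replace(d, vibhakti=rule[d.label]) if d.label in rule else d
        for d in frame
    ]


transformed = apply_tam(BASIC_XE, YA_RULE)
for d in transformed:
    print(d.label, d.necessity, d.vibhakti, d.lextype, d.src_pos, d.arc_dir)
```

Keeping the rule as data separate from the frame mirrors the idea that each verb class has one basic frame and each TAM contributes its own transformation.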

  24. Parsed Output • Dependency tree: xI ‘give’ → rAma (k1), xI ‘give’ → puswaka ‘book’ (k2), xI ‘give’ → mohana (k4)

  25. Other frames • Adjectives

  26. Steps in Parsing • SENTENCE → Morph, POS tagging, Chunking → Identify Demand Groups → Load Frames & Transform → Find Candidates → Apply Constraints & Solve → Final Parse

  27. Example: • rAma ne mohana ko KilOnA xiyA |

  28. Identify the demand group, Load and Transform DF • xiyA • Only verb • Transformed frame • Use ‘yA’ TAM info.
      ------------------------------------------------------------------
      arc-label   necessity   vibhakti   lextype   src-pos   arc-dir
      ------------------------------------------------------------------
      k1          m           ne         n         l         c
      k2          m           0|ko       n         l         c
      k3          d           se         n         l         c
      k4          d           ko         n         l         c
      ------------------------------------------------------------------

  29. Candidates • rAma ne mohana ko KilOnA xiyA | • Candidate arcs: _ROOT_ → xiyA (main), xiyA → rAma (k1), xiyA → mohana (k2), xiyA → KilOnA (k2), xiyA → mohana (k4)
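
The candidate step can be sketched in Python as a vibhakti match between noun chunks and the transformed frame. This is a simplified illustration, not the parser's code: it ignores the lextype, src-pos and arc-dir columns, and the chunk representation `(head, vibhakti)` and the function name `find_candidates` are assumptions.

```python
# Transformed frame for 'xe' under the yA TAM, taken from the table:
# arc-label -> (necessity, set of admissible vibhaktis)
FRAME = {
    "k1": ("m", {"ne"}),
    "k2": ("m", {"0", "ko"}),
    "k3": ("d", {"se"}),
    "k4": ("d", {"ko"}),
}

# ASSUMPTION: chunker output reduced to (head, vibhakti); "0" = no marker.
CHUNKS = [("rAma", "ne"), ("mohana", "ko"), ("KilOnA", "0")]


def find_candidates(verb, frame, chunks):
    """Propose an arc (verb, label, head) wherever the chunk's vibhakti
    is admissible for that label in the frame."""
    return [
        (verb, label, head)
        for head, vibhakti in chunks
        for label, (_necessity, allowed) in frame.items()
        if vibhakti in allowed
    ]


candidates = find_candidates("xiyA", FRAME, CHUNKS)
for arc in candidates:
    print(arc)
```

Under these assumptions mohana receives two competing candidates (k2 and k4), which is the ambiguity the constraints resolve in the next step.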

  30. Constraints • C1: For each of the mandatory demands in a demand frame for each demand group, there should be exactly one outgoing edge labeled by the demand from the demand group. • C2: For each of the optional demands in a demand frame for each demand group, there should be at most one outgoing edge labeled by the demand from the demand group. • C3: There should be exactly one incoming arc into each source group.

  31. Constraints • A parse of a sentence is obtained by satisfying all the above constraints • Ambiguous sentences have multiple parses • Ill formed sentences have no parse.
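
A brute-force Python sketch of constraints C1 to C3, assuming candidate arcs are given as `(demand_group, label, source)` triples. It enumerates every subset of candidates and keeps those that satisfy all three constraints; the function names and data layout are illustrative, and exhaustive enumeration stands in for whatever solver the actual parser uses.

```python
from itertools import product


def outgoing(chosen, group, label):
    """Number of chosen arcs leaving `group` with the given label."""
    return sum(1 for g, k, _ in chosen if (g, k) == (group, label))


def incoming(chosen, source):
    """Number of chosen arcs entering `source`."""
    return sum(1 for _, _, s in chosen if s == source)


def parses(candidates, mandatory, optional, sources):
    """All subsets of candidate arcs satisfying C1, C2 and C3."""
    result = []
    for bits in product((0, 1), repeat=len(candidates)):
        chosen = [c for c, b in zip(candidates, bits) if b]
        c1 = all(outgoing(chosen, g, k) == 1 for g, k in mandatory)
        c2 = all(outgoing(chosen, g, k) <= 1 for g, k in optional)
        c3 = all(incoming(chosen, s) == 1 for s in sources)
        if c1 and c2 and c3:
            result.append(chosen)
    return result


# rAma ne mohana ko KilOnA xiyA
CANDIDATES = [
    ("_ROOT_", "main", "xiyA"),
    ("xiyA", "k1", "rAma"),
    ("xiyA", "k2", "mohana"),
    ("xiyA", "k2", "KilOnA"),
    ("xiyA", "k4", "mohana"),
]
MANDATORY = [("_ROOT_", "main"), ("xiyA", "k1"), ("xiyA", "k2")]
OPTIONAL = [("xiyA", "k3"), ("xiyA", "k4")]
SOURCES = ["rAma", "mohana", "KilOnA", "xiyA"]

well_formed = parses(CANDIDATES, MANDATORY, OPTIONAL, SOURCES)

# A candidate set with no k1 arc: the mandatory demand cannot be met.
no_k1 = [c for c in CANDIDATES if c[1] != "k1"]
ill_formed = parses(no_k1, MANDATORY, OPTIONAL, ["mohana", "KilOnA", "xiyA"])

print(len(well_formed), len(ill_formed))
```

The returned list has one entry per parse, so an ambiguous sentence would yield several entries and an input that cannot satisfy the constraints yields an empty list.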

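  32. Parse - I • rAma ne mohana ko KilOnA xiyA | • Selected arcs: _ROOT_ → xiyA (main), xiyA → rAma (k1), xiyA → KilOnA (k2), xiyA → mohana (k4)

  33. Parse - I • Dependency tree: _ROOT_ → xiyA (main); xiyA → rAma (k1), xiyA → KilOnA (k2), xiyA → mohana (k4)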

  34. Integer Programming Constraints • $x_{ikj}$ represents a possible arc from demand group i to source group j with karaka label k • It takes the value 1 if the solution has that arc and 0 otherwise; it cannot take any other value. • The constraint rules are formulated as constraint equations.

  35. Constraint Equations • C1: For each demand group i, for each of its mandatory demands k, the following equalities must hold: $M_{ik}: \sum_{j} x_{ikj} = 1$ • C2: For each demand group i, for each of its optional or desirable demands k, the following inequalities must hold: $O_{ik}: \sum_{j} x_{ikj} \le 1$ • C3: For each of the source groups j, the following equalities must hold: $S_{j}: \sum_{i,k} x_{ikj} = 1$

  36. Multiple Frames • If more than one karaka frame for a verb • Call Integer Programming package for each frame • If more than one demand groups (e.g., multiple verbs) in the sentence with multiple demand frames • Call Integer Programming package for each combination of such frames
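
One way to picture "one solver call per combination of frames" is a Cartesian product over the frames available to each demand group. The Python sketch below assumes the solver is passed in as a callable; the frame names are placeholders and `frame_combinations` / `parse_with_all_frames` are hypothetical helpers.

```python
from itertools import product


def frame_combinations(frames_by_group):
    """Yield one {demand_group: frame} mapping per combination of frames."""
    groups = list(frames_by_group)
    for combo in product(*(frames_by_group[g] for g in groups)):
        yield dict(zip(groups, combo))


def parse_with_all_frames(frames_by_group, solve):
    """Call `solve` once per frame combination and pool the parses."""
    results = []
    for combo in frame_combinations(frames_by_group):
        results.extend(solve(combo))
    return results


# ASSUMPTION: placeholder frame names; real frames would be demand tables.
FRAMES = {
    "xiyA": ["xe_frame_A", "xe_frame_B"],
    "khaakara": ["khaa_frame_A", "khaa_frame_B", "khaa_frame_C"],
}

combos = list(frame_combinations(FRAMES))
print(len(combos))  # 2 * 3 = 6 solver calls
```

The number of solver calls is the product of the frame counts, so it grows quickly with the number of verbs that have several frames.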

  37. Other frames • Common karaka frame • Attached to each karaka frame • Preference given to main frame if there are clashes • Fallback karaka frame • required karaka frame is missing • Graceful degradation
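
A small Python sketch of how the common and fallback frames could combine with a verb's own frame. All frame contents here are illustrative placeholders, since the slides do not list them; whether the common frame is also attached to the fallback frame is likewise an assumption, and `frame_for` is a hypothetical helper.

```python
# ASSUMPTION: every entry below is a placeholder, label -> (necessity, vibhakti).
COMMON_FRAME = {
    "k7p": ("d", "meM"),
    "k7t": ("d", "0"),
    "k3": ("d", "placeholder"),   # deliberately clashes with the xe frame
}
FALLBACK_FRAME = {
    "k1": ("d", "0|ne"),
    "k2": ("d", "0|ko"),
}
VERB_FRAMES = {
    "xe": {
        "k1": ("m", "ne"),
        "k2": ("m", "0|ko"),
        "k3": ("d", "se"),
        "k4": ("d", "ko"),
    },
}


def frame_for(verb_root):
    """Use the verb's frame if present, else the fallback frame, and attach
    the common frame; on a clash the main frame's entry wins."""
    main = VERB_FRAMES.get(verb_root, FALLBACK_FRAME)
    return {**COMMON_FRAME, **main}


print(frame_for("xe")["k3"])        # main frame wins the clash
print(sorted(frame_for("unknown")))  # fallback plus common entries
```

Placing the main frame last in the merge is what gives it preference on clashes, and the lookup default is what provides graceful degradation when a verb has no frame.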

  38. Stage I: Types being handled • Simple Verbs • Non-finite verbs • wA_huA • wA_hI • nA • kara • 0_rahe, etc. • Copula • Genitive

  39. Example (Complex Sentence) • rAma ne phala khaakara mohana ko KilOnA xiyA • Gloss: Ram ‘ERG’ fruit ‘having eaten’ Mohan ‘DAT’ toy gave • ‘Having eaten the fruit, Ram gave the toy to Mohan’

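  40. Candidates • rAma ne phala khaakara mohana ko KilOnA xiyA | • X1: xiyA → rAma (k1) • X2: xiyA → KilOnA (k2) • X3: xiyA → mohana (k2) • X4: xiyA → phala (k2) • X5: xiyA → mohana (k4) • X6: khaakara → phala (k2) • X7: xiyA → khaakara (vmod) • X8: _ROOT_ → xiyA (main)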

  41. Constraint Equations • Verb ‘xe’ • Mandatory Demands (C1) • k1 → x1 = 1 • k2 → x2 + x3 + x4 = 1 • Optional Demands (C2) • k4 → x5 <= 1 • Verb ‘khaa’ • Mandatory Demands (C1) • k2 → x6 = 1 • vmod → x7 = 1 • _ROOT_ • C1 • main → x8 = 1

  42. Constraint Equations (contd.) • Incoming Arcs into Source (C3) • rAma: x1 = 1 • phala: x4 + x6 = 1 • khaa: x7 = 1 • mohana: x3 + x5 = 1 • KilOnA: x2 = 1 • xe: x8 = 1
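
The eight-variable system above is small enough to solve by trying every 0/1 assignment. The Python sketch below does that in place of calling an integer programming package; the constraint encoding `(variables, relation, right-hand side)` and the function name `solve` are illustrative choices, not the parser's interface.

```python
from itertools import product

EQ, LE = "==", "<="

# Each constraint: (indices of the variables summed, relation, right-hand side)
CONSTRAINTS = [
    ((1,), EQ, 1),        # xe, k1
    ((2, 3, 4), EQ, 1),   # xe, k2
    ((5,), LE, 1),        # xe, k4 (optional)
    ((6,), EQ, 1),        # khaa, k2
    ((7,), EQ, 1),        # khaa, vmod
    ((8,), EQ, 1),        # _ROOT_, main
    ((1,), EQ, 1),        # incoming: rAma
    ((4, 6), EQ, 1),      # incoming: phala
    ((7,), EQ, 1),        # incoming: khaa
    ((3, 5), EQ, 1),      # incoming: mohana
    ((2,), EQ, 1),        # incoming: KilOnA
    ((8,), EQ, 1),        # incoming: xe
]


def holds(x, variables, relation, rhs):
    """Check one constraint against an assignment x (index -> 0/1)."""
    total = sum(x[v] for v in variables)
    return total == rhs if relation == EQ else total <= rhs


def solve(constraints, n_vars):
    """Return every 0/1 assignment of x1..xn that meets all constraints."""
    solutions = []
    for bits in product((0, 1), repeat=n_vars):
        x = dict(enumerate(bits, start=1))
        if all(holds(x, *c) for c in constraints):
            solutions.append(x)
    return solutions


solutions = solve(CONSTRAINTS, 8)
for x in solutions:
    print([f"x{i}" for i, value in x.items() if value == 1])
```

The single solution sets x1, x2, x5, x6, x7 and x8 to 1: x2 is forced by KilOnA, which rules out x3 and x4, and that in turn forces x5 for mohana and x6 for phala.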

  43. Solution Graph • _ROOT_ → xiyA (main) • xiyA → rAma (k1) • xiyA → KilOnA (k2) • xiyA → mohana (k4) • xiyA → khaakara (vmod) • khaakara → phala (k2)

  44. References • Akshar Bharati and Rajeev Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-93). Association for Computational Linguistics, New Jersey, USA. • Akshar Bharati, Rajeev Sangal and T. Papi Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proceedings of ICON-2002: International Conference on Natural Language Processing. • Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai and Rajeev Sangal. 2008. Dependency Annotation Scheme for Indian Languages. In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP). Hyderabad, India. • S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8, 333–343. • Tara Mohanan. 1994. Argument Structure in Hindi. CSLI Publications.

  45. THANKS!!
