
Two-Stage Constraint Based Sanskrit Parser

Two-Stage Constraint Based Sanskrit Parser. Akshar Bharati, IIIT, Hyderabad. Brief outline: dependency; Paninian framework; vibhakti-karaka correspondence; karaka frames (basic + transformation); source groups, demand groups; constraints; three basic constraints.


Presentation Transcript


  1. Two-Stage Constraint Based Sanskrit Parser • Akshar Bharati, IIIT, Hyderabad

  2. Brief outline • Dependency • Paninian framework • vibhakti-karaka correspondence • karaka frames (basic + transformation) • Source groups, demand groups • Constraints • Three basic constraints • Constraints as Integer programming equations

  3. Notions from Paninian Framework – a) Karaka relations • The parser uses the notion of karaka relations between verbs and nouns in a sentence. • The notion of karaka relations is central to the Paninian model. • The karaka relations are syntactico-semantic (or semantico-syntactic) relations between the verbals and other related constituents in a sentence.

  4. Notions from Paninian Framework – Demand Frames • For the task of karaka assignment, the core parser uses the fundamental principles of 'akanksha' (demand unit) and 'yogyata' (qualification of the source unit). • Ex: CAwraH (student) vixyAlayam (school) gacCawi (go) • The verb frame for this form of “gacCawi” is the demand frame Gam1.

  5. Demand Frame • Gam1:
     -----------------------------------------------------------------
     arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
     -----------------------------------------------------------------
     K1         m          1         n         l        ds
     K2         m          2         n         l        ds
     K3         m          3         n         l        ds
     K5         m          5         n         l        ds
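A minimal sketch of how such a demand frame could be held in memory and matched against source words. The class names, the `find_candidates` helper, and the matching rule (same lex-type and same vibhakti) are assumptions made for illustration; the slides do not show the parser's actual data structures, and the src-pos and arc-dir columns are stored but not interpreted here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameRow:
    arc_label: str
    necessity: str  # code as given on the slide, e.g. 'm'
    vibhakti: str
    lex_type: str
    src_pos: str
    arc_dir: str

@dataclass(frozen=True)
class Source:
    word: str
    lex_type: str
    vibhakti: str

# Rows copied from the Gam1 frame on the slide.
GAM1_FRAME = [
    FrameRow("K1", "m", "1", "n", "l", "ds"),
    FrameRow("K2", "m", "2", "n", "l", "ds"),
    FrameRow("K3", "m", "3", "n", "l", "ds"),
    FrameRow("K5", "m", "5", "n", "l", "ds"),
]

def find_candidates(frame, sources):
    """Return (arc_label, word) pairs whose lex-type and vibhakti match a row.

    Assumed matching rule, for illustration only.
    """
    return [
        (row.arc_label, src.word)
        for src in sources
        for row in frame
        if row.lex_type == src.lex_type and row.vibhakti == src.vibhakti
    ]

if __name__ == "__main__":
    sources = [Source("CAwraH", "n", "1"), Source("vixyAlayam", "n", "2")]
    print(find_candidates(GAM1_FRAME, sources))
```

For the example sentence, the noun with vibhakti 1 matches only the K1 row and the noun with vibhakti 2 matches only the K2 row.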

  6. Constraint Based Parsing • Computational Paninian Model • Integer Programming with basic constraints • For each mandatory karaka in a karaka chart, there should be exactly one outgoing edge labelled by that karaka from the demand group • For each desirable or optional karaka in a karaka chart, there should be at most one outgoing edge labelled by that karaka from the demand group • There should be exactly one incoming arc into each source group
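The three constraints can be read as equations over 0/1 variables, one variable per candidate arc. The sketch below checks them by brute-force enumeration rather than with an integer programming solver, so it is only an illustration of the constraints, not the parser's method. The toy problem keeps only k1 and k2 as mandatory labels, and the extra k2 candidate for CAwraH is hypothetical, added to show how the constraints prune an ambiguous arc.

```python
from itertools import product

def solve(candidates, mandatory, optional, sources):
    """Enumerate all 0/1 assignments over candidate arcs and keep the valid ones.

    candidates: list of (demand, label, source) arcs
    mandatory / optional: lists of (demand, label) pairs
    sources: list of source words
    """
    solutions = []
    for bits in product((0, 1), repeat=len(candidates)):
        chosen = [arc for arc, bit in zip(candidates, bits) if bit]

        def outgoing(demand, label):
            return sum(1 for d, l, _ in chosen if (d, l) == (demand, label))

        def incoming(source):
            return sum(1 for _, _, s in chosen if s == source)

        ok = (
            all(outgoing(d, l) == 1 for d, l in mandatory)      # constraint 1
            and all(outgoing(d, l) <= 1 for d, l in optional)   # constraint 2
            and all(incoming(s) == 1 for s in sources)          # constraint 3
        )
        if ok:
            solutions.append(chosen)
    return solutions

if __name__ == "__main__":
    candidates = [
        ("gacCawi", "k1", "CAwraH"),
        ("gacCawi", "k2", "vixyAlayam"),
        ("gacCawi", "k2", "CAwraH"),  # hypothetical ambiguous candidate
    ]
    mandatory = [("gacCawi", "k1"), ("gacCawi", "k2")]
    print(solve(candidates, mandatory, [], ["CAwraH", "vixyAlayam"]))
```

Enumeration is exponential in the number of candidate arcs, which is why a real system would hand the same equations to an integer programming solver.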

  7. Parser: two-stage strategy • Stage I (intra-clausal relations) • Dependency relations marked • Relations such as k1, k2, k3, etc. for each verb • Stage II (inter-clausal relations & conjunct relations) • Conjuncts and relative clauses

  8. Steps in Parsing • SENTENCE -> Morph, POS tagging, Chunking -> Identify Demand Groups -> Load Frames & Transform -> Find Candidates -> Apply Constraints & Solve -> Is Complex? • YES: STAGE - II • NO: Final Parse

  9. Morph, Chunked, Tagged data
     ((
     1    ((  NP   <fs af='CAwra,n,m,sg,,o,1,1'>
     1.1  CAwraH  NN  <fs af='CAwra,n,m,sg,,o,1,1'>
          ))
     2    ((  NP   <fs af='vixyAlaya,n,m,sg,,d,2,2'>
     2.1  vixyAlayam  NN  <fs af='vixyAlaya,n,m,sg,,d,2,2'>
          ))
     3    ((  VGF  <fs af='gam1,v,,sg,3,,karwari_lat,' gaNaH='BvAxiH' paxI='parasmEpaxI' XAwuH='gamLz'>
     3.1  gacCawi  VM  <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>
          ))
     ))

  10. CAwraH <fs af='CAwra,n,m,sg,,o,1,1'> • vixyAlayam <fs af='vixyAlaya,n,m,sg,,d,2,2'> • gacCawi <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>
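A small sketch of reading one of these `<fs ...>` feature structures. The eight field names follow the usual Shakti Standard Format reading of the `af` attribute (root, category, gender, number, person, case, vibhakti or TAM, suffix); that ordering is an assumption here, since the slides do not name the fields, and `parse_fs` is a hypothetical helper.

```python
import re

# Assumed field order of the comma-separated 'af' value.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "vibhakti_or_tam", "suffix",
)

def parse_fs(fs):
    """Split an <fs ...> string into (morph fields from af, other attributes)."""
    attrs = dict(re.findall(r"(\w+)='([^']*)'", fs))
    af = attrs.pop("af", "")
    morph = dict(zip(AF_FIELDS, af.split(",")))
    return morph, attrs

if __name__ == "__main__":
    noun = "<fs af='CAwra,n,m,sg,,o,1,1'>"
    verb = ("<fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' "
            "gaNaH='BvAxiH' XAwuH='gamLz'>")
    print(parse_fs(noun))
    print(parse_fs(verb))
```

Under that reading, the noun's vibhakti field is what gets compared with the vibhakti column of the demand frame.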

  11. Demand Frame • Gam1:
     -----------------------------------------------------------------
     arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
     -----------------------------------------------------------------
     K1         m          1         n         l        ds
     K2         m          2         n         l        ds
     K3         m          3         n         l        ds
     K5         m          5         n         l        ds

  12. CAwraH vixyAlayam gacCawi • Arcs: k1 (gacCawi -> CAwraH), k2 (gacCawi -> vixyAlayam)

  13. Sanskrit Example • CAwraH vixyAlayam gacCawi

  14. Steps (Stage II) • Input: Output of STAGE - I • Identify New Demand Groups -> Load Frames & Transform -> Find Candidates -> Apply Constraints & Solve (with a Repair step) -> FINAL PARSE

  15. Example – Relative Clause
      vaha puswaka jo    rAma ne   mohana ko   xI   hE prasixXa hE
      that book    which Ram  ERG. Mohana DAT. gave is famous   is
      ‘The book which Ram gave to Mohana is famous’

  16. Output after Stage - I • _ROOT_ -main-> hE; hE -k1-> puswaka (with vaha); hE -k1s-> prasixXa • _ROOT_ -main-> xI; xI -k2-> jo; xI -k1-> rAma; xI -k4-> mohana

  17. Identify the demand group • xiyA ‘give’ • Main verb of the relative clause

  18. Identify the demand group, Load and Transform DF • jo ‘which’ transformation (special) • Transforms the demand frame of the main verb of the relative clause
     ---------------------------------------------------------------------------
     arc-label    necessity  vibhakti  lextype  src-pos  arc-dir  oprt
     ---------------------------------------------------------------------------
     nmod__relc   m          any       n        r|l      p        insert
     ---------------------------------------------------------------------------

  19. Karaka Frame: main verb of relative clause • vaha puswaka jo rAma ne mohana ko xI prasixXa hE | • that book which Ram ERG. Mohana DAT. gave famous is • ‘The book which Ram gave to Mohana is famous’ • Transformed frame for xe after applying the jo transformation (new row inserted after transformation):
     ---------------------------------------------------------------------------
     arc-label    necessity  vibhakti  lextype  src-pos  arc-dir  oprt
     ---------------------------------------------------------------------------
     nmod__relc   m          any       n        r|l      p        insert
     ---------------------------------------------------------------------------
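The effect of an `insert` transformation can be sketched as appending the transformation's row to a copy of the verb's frame. Everything here is illustrative: the base rows for xe are hypothetical stand-ins (only the arc labels k1, k2, k4 are taken from the Stage II output), `apply_transformation` is an invented name, and other `oprt` values are rejected because the slides only show `insert`.

```python
def apply_transformation(frame, transform_rows):
    """Return a new frame with every 'insert' row of the transformation appended."""
    result = [dict(row) for row in frame]  # leave the base frame untouched
    for row in transform_rows:
        if row["oprt"] != "insert":
            raise ValueError(f"operation not covered by this sketch: {row['oprt']}")
        result.append({k: v for k, v in row.items() if k != "oprt"})
    return result

# Hypothetical base frame for xe 'give'; only the labels are attested on the slides.
XE_BASE = [{"arc_label": "k1"}, {"arc_label": "k2"}, {"arc_label": "k4"}]

# The jo transformation row as shown on the slide.
JO_TRANSFORM = [{
    "arc_label": "nmod__relc", "necessity": "m", "vibhakti": "any",
    "lextype": "n", "src_pos": "r|l", "arc_dir": "p", "oprt": "insert",
}]

if __name__ == "__main__":
    for row in apply_transformation(XE_BASE, JO_TRANSFORM):
        print(row)
```

With the extra row in place, the relative-clause verb can take part in an nmod__relc arc with a noun, which is the arc Stage II looks for.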

  20. Possible candidates • vaha puswaka jo rAma ne mohana ko xI hE prasixXa hE | • Candidate arc label: nmod__relc

  21. Output after Stage - II • _ROOT_ -main-> hE; hE -k1-> vaha puswaka; hE -k1s-> prasixXa • puswaka -nmod__relc-> xiyA hE; xiyA -k1-> rAma; xiyA -k2-> jo; xiyA -k4-> mohana

  22. Example II – Coordination
      rAma Ora siwA kala      Aye |
      Ram  and Sita yesterday came
      ‘Ram and Sita came yesterday’

  23. Output of Stage - I • _ROOT_ -dummy-> rAma; _ROOT_ -main-> Aye; _ROOT_ -dummy-> Ora • Aye -k1-> siwA; Aye -k7t-> kala

  24. For Stage – II (Constraint Graph) • Nodes: _ROOT_, rAma, Aye, Ora, siwA, kala • Arc labels shown: main, k1, k7t, ccof, ccof

  25. Candidate Arcs • Nodes: _ROOT_, rAma, Aye, Ora, siwA, kala • Arc labels shown: main, k1, k1, k1, ccof, ccof

  26. Solution Graph • Nodes: _ROOT_, rAma, Aye, Ora, siwA, kala • Arc labels shown: main, k1, k7t, ccof, ccof

  27. Parse tree (output after Stage II) • _ROOT_ -main-> Aye; Aye -k1-> Ora; Aye -k7t-> kala • Ora -ccof-> rAma; Ora -ccof-> siwA

  28. Results for Hindi

  29. Results • CBP: results when only the first parse is considered • CBP’’: results when the best of the first 25 parses is considered • CBP was tested on 220 sentences • These are the results published in IALP-2008

  30. Work Progress in Sanskrit • The existing constraint based parser for Sanskrit can parse simple sentences. • Over 2000 demand charts • Two-stage parsing needs more development • Experiments performed with 268 simple sentences • Re-ranking of parses is not done; only the first parse is considered for results • Results are not very accurate due to data problems

  31. Results in Sanskrit • Labelled attachment score: 540 / 1213 * 100 = 44.52 % • Unlabeled attachment score: 876 / 1213 * 100 = 72.22 % • Label accuracy score: 566 / 1213 * 100 = 46.66 %
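The three scores above are simple ratios over the number of scored tokens. The sketch below recomputes the reported percentages from the counts on the slide and shows how the counts could be obtained from gold and predicted (head, label) pairs; the function names and the toy arcs are illustrative assumptions, not the evaluation script used for these results.

```python
def percentage(correct, total):
    """Score as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)

def attachment_scores(gold, predicted):
    """gold, predicted: equal-length lists of (head, label), one pair per token."""
    total = len(gold)
    las = sum(g == p for g, p in zip(gold, predicted))       # head and label match
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted))  # head matches
    la = sum(g[1] == p[1] for g, p in zip(gold, predicted))   # label matches
    return {
        "LAS": percentage(las, total),
        "UAS": percentage(uas, total),
        "LA": percentage(la, total),
    }

if __name__ == "__main__":
    # Counts from the slide.
    print(percentage(540, 1213), percentage(876, 1213), percentage(566, 1213))
    # Toy example: the second token has the right head but the wrong label.
    gold = [(3, "k1"), (3, "k2"), (0, "main")]
    pred = [(3, "k1"), (3, "k1"), (0, "main")]
    print(attachment_scores(gold, pred))
```

The gap between the unlabeled and labelled scores means most errors are in choosing the karaka label rather than the head.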

  32. Treebank requirement • Proper gold-standard tagged, chunked and dependency-marked data for Sanskrit will improve the accuracy of the parser • Annotation with proper tools • It will also help us in using machine learning methods to train statistical parsers for Sanskrit

  33. Further work on Constraint Based Parsing. • Extension of the parser using treebank data • Hybrid approaches • Soft Constraints • Pruning of the graph in data driven parsers using Constraint Graph • Allow learning of the parser from the treebank data • Better performance

  34. What we expect from data
     ((
     1    ((  NP   <fs af='CAwra,n,m,sg,,o,1,1' drel='k1:3' name='1'>
     1.1  CAwraH  NN  <fs af='CAwra,n,m,sg,,o,1,1'>
          ))
     2    ((  NP   <fs af='vixyAlaya,n,m,sg,,d,2,2' drel='k2:3' name='2'>
     2.1  vixyAlayam  NN  <fs af='vixyAlaya,n,m,sg,,d,2,2'>
          ))
     3    ((  VGF  <fs af='gam1,v,,sg,3,,karwari_lat,' name='3' gaNaH='BvAxiH' paxI='parasmEpaxI' XAwuH='gamLz'>
     3.1  gacCawi  VM  <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>
          ))
     ))
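A short sketch of pulling the dependency arcs out of annotated data like the sample above. It assumes that `drel='k1:3'` means "label k1, head is the chunk whose `name` is 3", which is the usual reading of this attribute; `read_arcs` is a hypothetical helper, not part of any existing toolkit.

```python
import re

def read_arcs(ssf_text):
    """Return (dependent_name, label, head_name) for every chunk carrying a drel."""
    arcs = []
    for fs in re.findall(r"<fs ([^>]*)>", ssf_text):
        attrs = dict(re.findall(r"(\w+)='([^']*)'", fs))
        if "drel" in attrs:
            label, head = attrs["drel"].split(":")
            arcs.append((attrs["name"], label, head))
    return arcs

SAMPLE = """
1 (( NP <fs af='CAwra,n,m,sg,,o,1,1' drel='k1:3' name='1'>
2 (( NP <fs af='vixyAlaya,n,m,sg,,d,2,2' drel='k2:3' name='2'>
3 (( VGF <fs af='gam1,v,,sg,3,,karwari_lat,' name='3' gaNaH='BvAxiH'>
"""

if __name__ == "__main__":
    print(read_arcs(SAMPLE))
```

Arcs in this form are what a gold treebank would supply for scoring the parser or for training a statistical one.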

  35. THANKS!!
