
Correlates between Performance, Prosodic and Phrase Structures in Bangla and Hindi: Insights from a Psycholinguistic Experiment

Kalika Bali (1), Monojit Choudhury (1), Diptesh Chatterjee (2), Sankalan Prasad (2), Arpit Maheswari (3)
(1) Microsoft Research Lab India, (2) IIT Kharagpur, (3) IIT Bombay


Presentation Transcript


  1. Correlates between Performance, Prosodic and Phrase Structures in Bangla and Hindi: Insights from a Psycholinguistic Experiment. Kalika Bali (1), Monojit Choudhury (1), Diptesh Chatterjee (2), Sankalan Prasad (2), Arpit Maheswari (3). (1) Microsoft Research Lab India, (2) IIT Kharagpur, (3) IIT Bombay. Contact: monojitc@microsoft.com

  2. Syntactic Processing Pipeline
     • शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
     • POS/Morphological Analysis: शिमला\NP से\PP मनाली\NP ना\RP जाकर\PL सीधे\RB दिल्ली\NP से\PP विमान\NN ले\VM लो\VA |
     • Parsing: शिमला\NP से\PP मनाली\NP ना\RP जाकर\PL सीधे\RB दिल्ली\NP से\PP विमान\NN ले\VM लो\VA |

  3. Syntactic Processing Pipeline
     • शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
     • POS/Morphological Analysis: शिमला\NP से\PP मनाली\NP ना\RP जाकर\PL सीधे\RB दिल्ली\NP से\PP विमान\NN ले\VM लो\VA |
     • Chunking: [शिमला\NP से\PP] मनाली\NP [ना\RP जाकर\PL] सीधे\RB [दिल्ली\NP से\PP] विमान\NN [ले\VM लो\VA]
     • Parsing: शिमला\NP से\PP मनाली\NP ना\RP जाकर\PL सीधे\RB दिल्ली\NP से\PP विमान\NN ले\VM लो\VA |

  4. Chunking in Speech Technology
     • Chunks correspond to prosodic boundaries and are therefore useful for speech synthesis
     • शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
     • शिमला से - मनाली - ना - जाकर - सीधे - दिल्ली से - विमान - ले लो |
     • शिमला - से मनाली - ना जाकर - सीधे दिल्ली - से विमान ले - लो |
     • शिमला से मनाली - ना जाकर - सीधे दिल्ली से - विमान ले लो |

  5. What is a Chunk? • Theoretical Perspective: • Chunk  {Phrase, Clause, …} • Chunk  {“Modifier + Modified”, “Main verb + Aux”, …} • Cognitive Perspective: • Realized in speech through Prosodic boundaries • Perceived by the speaker as “more connected” • Computational Perspective: • Easy to identify (local context) • Helps in parsing/speech processing

  6. Abney’s CHUNKS • “the non-recursive core of an intra-clausal constituent, extending from the beginning of the constituent to its head, but not including post-head dependents.” – Abney, 1995 • Maximal: a chunk that is not contained inside another chunk • Philosophy: linguistic theories should explain human intuition and performance • Based on the “performance structure” of the native speakers – Gee and Grosjean, 1983; Abney, 1991

  7. Objective of the present study • Empirical investigation of the nature of chunks in Indian languages from a cognitive perspective • Evidence from Prosody • Native speaker intuition • Compare with • Phrase structure • Other suggestions of chunks • Hindi and Bangla • Motivated by (Gee and Grosjean, 1983; Abney, 1991)

  8. Chunks in Indian Languages
     • Relatively free word order
        • मैं ये काम खत्म कर लूँगा, नहीं तो बाद में समय न मिलेगा
        • ये काम मैं खत्म कर लूँगा, नहीं तो बाद में समय न मिलेगा
        • मैं ये काम खत्म कर लूँगा, बाद में नहीं तो समय न मिलेगा
     • Consequences
        • No concept of verb phrase (Bharati et al., 1995)
        • Clausal connectors need not indicate a clause boundary
     • Chunk = Local Word Groups (Bharati et al., 1995)
        • मैं [ये काम] खत्म [कर लूँगा], [नहीं तो] [बाद में] समय न मिलेगा

  9. Chunks in Indian Languages (contd.)
     • LWG in agglutinative languages?
        • আমি [এই কাজটা] শেষ [করে ফেলবো], কারণ পরে সময় পাবনা৷
     • Alternative suggestions
        • Maximal recognizable phrases (Ray et al., 2003)
           • मैं [ये काम] [खत्म कर लूँगा], [नहीं तो] [बाद में] समय [न मिलेगा]
        • Nested chunking based on non-intrusive fragments (Das et al., 2005)
           • [मैं [ये काम] [खत्म [कर लूँगा]]], [[नहीं तो] [बाद में] समय [न मिलेगा]]

  10. Experimental Methodology
     • Subjects: 6 native speakers for each language
     • Each subject was given 10 sentences (as text) and asked to:
        • Read them out in a natural way
        • Divide every sentence into two parts, and then recursively divide each part into two, such that the words within each partition are more closely related to each other
     • Example: ((खबर)(सुनते ही))((मैं)(तुंरत)((घर से )(भागा)))
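A bracketed response like the example on this slide can be read into a nested structure with a few lines of Python. This is only a minimal sketch: the function names `parse_partition` and `leaf_groups` are hypothetical and are not part of the study's tooling, and it assumes that round brackets are the only grouping marks in the annotation.

```python
import re

def parse_partition(bracketed):
    """Parse a bracketed partition string into nested lists of word groups."""
    root = []
    stack = [root]
    # Tokens are single brackets or maximal runs of non-bracket text.
    for tok in re.findall(r"[()]|[^()]+", bracketed):
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            text = tok.strip()
            if text:
                stack[-1].append(text)
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return root

def leaf_groups(tree):
    """Return the innermost word groups in left-to-right order."""
    groups = []
    for node in tree:
        if isinstance(node, str):
            groups.append(node)
        else:
            groups.extend(leaf_groups(node))
    return groups

example = "((खबर)(सुनते ही))((मैं)(तुंरत)((घर से )(भागा)))"
tree = parse_partition(example)
print(len(tree))          # number of top-level parts
print(leaf_groups(tree))  # innermost groups chosen by the subject
```

The innermost groups are the units a subject never split further, which is one way to read off candidate chunks from a single response.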

  11. Sentence Selection • Near translations to facilitate comparison across languages • Coverage of various syntactic phenomena • Embedded/relative clauses • Sentence and phrase-level adverbs • Conjuncts • Noun Groups: • Compound Nouns, Named Entities and MWE • Qualifier + Adjectives* + Determiner + Noun • Noun + [complex] Postpositions • Verb Groups: • Polar +Vector + Auxiliaries + Particles • Noun + Verb

  12. Prosodic Structure
     • Identify major (> 7 ms) and minor (> 2.5 ms) breaks
     • Count the number of subjects having a break at each boundary
     • Hindi: शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
       - - - 3| - - 6| - - - 6| - - -
     • Bangla: সিমলা হয়ে মানালী না গিয়ে সোজা দিল্লী থেকেই ফ্লাইট নিয়ে যান৷
       - 2*| - 6| - - - 6| - - - 6| - - -
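The counting step on this slide can be sketched as follows. The thresholds are the ones stated on the slide; everything else is an assumption made for illustration: the function names, the input layout (one list of pause durations per subject, one value per word boundary), and the numbers in the example, which are made up and are not the study's measurements.

```python
MAJOR_MS = 7.0   # threshold for a major break, as stated on the slide
MINOR_MS = 2.5   # threshold for a minor break, as stated on the slide

def classify_pause(duration_ms):
    """Label a pause as 'major', 'minor', or None (no break)."""
    if duration_ms > MAJOR_MS:
        return "major"
    if duration_ms > MINOR_MS:
        return "minor"
    return None

def count_breaks(pauses_by_subject):
    """Count, per word boundary, how many subjects produced each break type.

    pauses_by_subject: list of per-subject lists, each holding one pause
    duration in milliseconds for every word boundary (hypothetical layout).
    """
    n_boundaries = len(pauses_by_subject[0])
    counts = []
    for i in range(n_boundaries):
        labels = [classify_pause(subject[i]) for subject in pauses_by_subject]
        counts.append({"major": labels.count("major"),
                       "minor": labels.count("minor")})
    return counts

# Made-up durations for three hypothetical subjects and three boundaries.
pauses = [
    [0.0, 3.0, 9.0],
    [1.0, 8.0, 7.5],
    [0.5, 2.5, 12.0],
]
for boundary, c in enumerate(count_breaks(pauses)):
    print(boundary, c)
```

Keeping major and minor counts separate mirrors the slide's notation, where a count is attached to each boundary and some counts carry an asterisk.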

  13. Performance Structure
     शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो|
     (शिमला से मनाली ना जाकर)(सीधे दिल्ली से विमान ले लो|)
     ((शिमला से मनाली)(ना जाकर))((सीधे दिल्ली से)(विमान ले लो|))
     (((शिमला से) मनाली)(ना जाकर))((सीधे (दिल्ली से))(विमान (ले लो|)))
     शिमला 3 से 2 मनाली 1 ना 3 जाकर 0 सीधे 2 दिल्ली 3 से 1 विमान 2 ले लो
     शिमला 0 से 1 मनाली 2 ना 0 जाकर 3 सीधे 1 दिल्ली 0 से 2 विमान 1 ले लो

  14. Performance Structure
     शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो|
     (शिमला से मनाली ना जाकर)(सीधे दिल्ली से विमान ले लो|)
     ((शिमला से मनाली)(ना जाकर))((सीधे दिल्ली से)(विमान ले लो|))
     (((शिमला से) मनाली)(ना जाकर))((सीधे (दिल्ली से))(विमान (ले लो|)))
     शिमला 3 से 2 मनाली 1 ना 3 जाकर 0 सीधे 2 दिल्ली 3 से 1 विमान 2 ले लो
     शिमला 0 से 1 मनाली 2 ना 0 जाकर 3 सीधे 1 दिल्ली 0 से 2 विमान 1 ले लो
     - - - *| - - | - - - *| - - -
     3 2 2 1 1 1 0 0 0 0
     शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो|
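One plausible reading of the numbers on this slide is that each word boundary is scored by how early the subject split the sentence there: the first split gets the highest strength, later splits get lower values, and boundaries that were never split get 0. The sketch below implements that reading. The function name `boundary_strengths` and the scoring rule are assumptions inferred from the example, not a procedure stated by the authors, and the sentence-final danda is left out of the input.

```python
import re

def boundary_strengths(bracketed):
    """Return (words, strengths) for a fully bracketed recursive partition.

    strengths[i] scores the boundary between words[i] and words[i + 1]:
    0 if the two words were never separated, otherwise
    (number of split rounds) - (round in which they were separated, from 0).
    """
    words = []
    split_round = []    # None means the pair was never separated
    depth = 0           # current bracket nesting depth
    low = 0             # lowest depth seen since the previous word
    separated = False   # has any bracket appeared since the previous word?
    for tok in re.findall(r"[()]|[^()\s]+", bracketed):
        if tok == "(":
            depth += 1
            separated = True
        elif tok == ")":
            depth -= 1
            low = min(low, depth)
            separated = True
        else:
            if words:
                split_round.append(min(low, depth) if separated else None)
            words.append(tok)
            low = depth
            separated = False
    rounds = max((r for r in split_round if r is not None), default=-1) + 1
    strengths = [0 if r is None else rounds - r for r in split_round]
    return words, strengths

response = "(((शिमला से) मनाली)(ना जाकर))((सीधे (दिल्ली से))(विमान (ले लो)))"
words, strengths = boundary_strengths(response)
print(strengths)  # [0, 1, 2, 0, 3, 1, 0, 2, 1, 0]
```

Under this assumption the output matches the second numbered row on the slide (0 1 2 0 3 1 0 2 1), with an extra 0 for the unsplit pair ले लो.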

  15. Final Dataset

  16. Observations
     • Do the subjects agree on the boundaries?
        • Yes: always for clause boundaries, and often for phrase boundaries.
        • There is a lot of confusion within phrases.
     • Are the prosodic and performance structures similar?
        • Both show major breaks at clause boundaries.
        • If there is a major break in one structure, there is at least a minor break in the other.
     • Are the structures in Hindi and Bangla similar?
        • Except for a few cases, they are indeed very similar.

  17. Observations (contd.)
     दृढ किन्तु बहुत मृदु स्वरों में
     - 6| - 3| - 3| - 3| - -

  18. Observations (contd.)
     शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
     - - - 3| - - 6| - - - 6| - - -
     - - *| - *| - - | - *| - - | - *| - -
     সিমলা হয়ে মানালী না গিয়ে সোজা দিল্লী থেকেই ফ্লাইট নিয়ে যান ৷
     - 2*| - 6| - - - 6| - - - 6| - - -
     - - *| - *| - - | - *| - - *| - - -

  19. Chunks are often larger than LWG
     [गमले के][टुकडों को] फैक मत देना |
     - - 4| - - 6| - - - - - *| - - | - | - -
     खबर सुनते ही मैं तुंरत [घर से] भागा |
     - - - 6| - - 6| - - - - *| - - | - - *| - - -
     खबर सुनते ही मैं [घर से] तुंरत भागा |
     - - - 6| - - - 2| - - - | - - | - *| - - *| - -

  20. Extra-Syntactic Factors Governing Chunk Boundaries
     • Chunk length
        টবের 2*| ভাঙ্গা টুকরো গুলো 5| ফেলে দিয়ো না
        এক 2*| বিশাল 2| পথ অবরোধের 3| আয়োজন করেছিলো
     • Familiarity with the text
        তৃণমূল কংগ্রেসের 2| সদস্যরা
        तृणमूल 3| कांग्रेस के सदस्यों 1*| ने
     • Focus of the utterance
     • Phonology

  21. Conclusion • Cognitive reality of chunks • Agreement across speakers, structures, languages • Chunks are NOT • Local word groups • Phrases • Chunks are • Completely context dependent • Governed by syntactic plus extra-syntactic factors • A theory of chunks is necessary at least for speech applications

  22. Thank You for your kind attention Contact: monojitc@microsoft.com
