Correlates between Performance, Prosodic and Phrase Structures in Bangla and Hindi: Insights from a Psycholinguistic Experiment
Kalika Bali¹, Monojit Choudhury¹, Diptesh Chatterjee², Sankalan Prasad², Arpit Maheswari³
¹Microsoft Research Lab India, ²IIT Kharagpur, ³IIT Bombay
Syntactic Processing Pipeline
शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो | ("Instead of going from Shimla to Manali, take a flight straight from Delhi.")
• POS/Morphological Analysis: शिमला\NP से\PP मनाली\NP ना\RP जाकर\PL सीधे\RB दिल्ली\NP से\PP विमान\NN ले\VM लो\VA |
• Chunking: [शिमला\NP से\PP] मनाली\NP [ना\RP जाकर\PL] सीधे\RB [दिल्ली\NP से\PP] विमान\NN [ले\VM लो\VA]
• Parsing: [parse tree figure not reproduced]
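The chunker's bracketed output can be read back into chunks with a few lines of code. The following is a minimal Python sketch, assuming the `word\TAG` notation shown on the slide and treating every unbracketed word as a one-word chunk; `parse_chunks` and `to_break_text` are hypothetical helpers, not part of any published tool. Joining the chunks with ` - ` gives a break-annotated string similar in form to the lines on the next slide.

```python
import re

# One match per chunk: either a bracketed group "[w\T w\T ...]"
# or a bare "w\T" token, which we assume is a one-word chunk.
CHUNK_RE = re.compile(r"\[([^\]]*)\]|([^\s\[\]]+)")


def parse_chunks(tagged: str) -> list[list[tuple[str, str]]]:
    """Split chunker output into chunks of (word, tag) pairs."""
    chunks = []
    for bracketed, bare in CHUNK_RE.findall(tagged):
        body = bracketed or bare
        chunk = []
        for token in body.split():
            # Split on the last backslash: "शिमला\NP" -> ("शिमला", "NP").
            word, _, tag = token.rpartition("\\")
            chunk.append((word, tag))
        chunks.append(chunk)
    return chunks


def to_break_text(chunks: list[list[tuple[str, str]]], sep: str = " - ") -> str:
    """Render chunks as plain text with a break marker between chunks."""
    return sep.join(" ".join(word for word, _ in chunk) for chunk in chunks)


if __name__ == "__main__":
    # Raw strings are required: "\N" would otherwise start a Unicode escape.
    tagged = (r"[शिमला\NP से\PP] मनाली\NP [ना\RP जाकर\PL] सीधे\RB "
              r"[दिल्ली\NP से\PP] विमान\NN [ले\VM लो\VA]")
    print(to_break_text(parse_chunks(tagged)))
```

Printing the result gives `शिमला से - मनाली - ना जाकर - सीधे - दिल्ली से - विमान - ले लो`. The pattern also accepts the unspaced form (`मनाली\NP[ना\RP ...`) seen in the original slide text.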
Chunking in Speech Technology
• Chunks correspond to prosodic boundaries and are therefore useful for speech synthesis.
शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
शिमला से - मनाली - ना - जाकर - सीधे - दिल्ली से - विमान - ले लो |
शिमला - से मनाली - ना जाकर - सीधे दिल्ली - से विमान ले - लो |
शिमला से मनाली - ना जाकर - सीधे दिल्ली से - विमान ले लो |
What is a Chunk?
• Theoretical perspective:
  • Chunk: {Phrase, Clause, …}
  • Chunk: {"Modifier + Modified", "Main verb + Aux", …}
• Cognitive perspective:
  • Realized in speech through prosodic boundaries
  • Perceived by the speaker as "more connected"
• Computational perspective:
  • Easy to identify (local context)
  • Helps in parsing and speech processing
Abney's Chunks
• "the non-recursive core of an intra-clausal constituent, extending from the beginning of the constituent to its head, but not including post-head dependents" (Abney, 1995)
• Maximal chunk: a chunk that is not contained inside another chunk
• Philosophy: linguistic theories should explain human intuition and performance
• Based on the "performance structure" of native speakers (Gee and Grosjean, 1983; Abney, 1991)
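The notion of a maximal chunk can be stated operationally. Below is a small Python sketch, assuming chunks are given as hypothetical `(start, end)` word-index spans with an exclusive end; it keeps only the spans that are not contained in any other span. The spans in the example are illustrative, not data from the study.

```python
Span = tuple[int, int]  # (start, end) word indices, end exclusive


def maximal_chunks(spans: list[Span]) -> list[Span]:
    """Return the spans that are not contained in any other span."""
    unique = set(spans)
    return sorted(
        s for s in unique
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in unique)
    )


# Illustrative nested spans over a 13-word sentence.
nested = [(0, 6), (1, 3), (3, 6), (4, 6), (6, 13), (6, 8), (8, 10), (11, 13)]
print(maximal_chunks(nested))  # → [(0, 6), (6, 13)]
```

Overlapping spans where neither contains the other are both kept, since the definition only excludes containment.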
Objective of the Present Study
• Empirical investigation of the nature of chunks in Indian languages from a cognitive perspective, using:
  • Evidence from prosody
  • Native-speaker intuition
• Comparison with:
  • Phrase structure
  • Other proposed definitions of chunks
• Languages: Hindi and Bangla
• Motivated by Gee and Grosjean (1983) and Abney (1991)
Chunks in Indian Languages
• Relatively free word order ("I will finish this work, otherwise there will be no time later."):
  • मैं ये काम खत्म कर लूँगा, नहीं तो बाद में समय न मिलेगा
  • ये काम मैं खत्म कर लूँगा, नहीं तो बाद में समय न मिलेगा
  • मैं ये काम खत्म कर लूँगा, बाद में नहीं तो समय न मिलेगा
• Consequences:
  • No concept of a verb phrase (Bharati et al., 1995)
  • Clausal connectors need not indicate a clause boundary
  • Chunk = Local Word Group, LWG (Bharati et al., 1995)
  • मैं [ये काम] खत्म [कर लूँगा], [नहीं तो][बाद में] समय न मिलेगा
Chunks in Indian Languages (contd.)
• LWGs in agglutinative languages?
  • আমি [এই কাজটা] শেষ [করে ফেলবো], কারণ পরে সময় পাবনা৷ ("I will finish this work, because I won't get time later.")
• Alternative suggestions:
  • Maximal recognizable phrases (Ray et al., 2003): मैं [ये काम][खत्म कर लूँगा], [नहीं तो][बाद में] समय [न मिलेगा]
  • Nested chunking based on non-intrusive fragments (Das et al., 2005): [मैं [ये काम][खत्म [कर लूँगा]]], [[नहीं तो][बाद में] समय [न मिलेगा]]
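To make the difference between these proposals concrete, one can compare where each places chunk boundaries. The Python sketch below is illustrative only: it assumes the square-bracket notation used on this slide and the previous one, treats every unbracketed word as a one-word chunk, and skips commas; `chunk_sizes` and `boundaries` are hypothetical helper names. It compares the LWG analysis (Bharati et al., 1995) with the maximal recognizable phrases (Ray et al., 2003) of the same Hindi sentence.

```python
import re

# A bracketed chunk "[w w]" or a bare word; commas are skipped as punctuation.
CHUNK_RE = re.compile(r"\[([^\]]*)\]|([^\s\[\],]+)")


def chunk_sizes(bracketed: str) -> tuple[list[str], list[int]]:
    """Return the word sequence and the size of each chunk (bare word = 1-word chunk)."""
    words, sizes = [], []
    for inside, bare in CHUNK_RE.findall(bracketed):
        chunk = (inside or bare).replace(",", " ").split()
        words.extend(chunk)
        sizes.append(len(chunk))
    return words, sizes


def boundaries(sizes: list[int]) -> set[int]:
    """Chunk boundaries as word offsets: k means 'between word k-1 and word k'."""
    out, pos = set(), 0
    for size in sizes[:-1]:
        pos += size
        out.add(pos)
    return out


lwg = "मैं [ये काम] खत्म [कर लूँगा], [नहीं तो][बाद में] समय न मिलेगा"
ray = "मैं [ये काम][खत्म कर लूँगा], [नहीं तो][बाद में] समय [न मिलेगा]"

w1, s1 = chunk_sizes(lwg)
w2, s2 = chunk_sizes(ray)
assert w1 == w2, "both analyses must cover the same words"
only_lwg = sorted(boundaries(s1) - boundaries(s2))
print([(w1[k - 1], w1[k]) for k in only_lwg])  # → [('खत्म', 'कर'), ('न', 'मिलेगा')]
```

On this sentence every boundary of the maximal-phrase analysis is also an LWG boundary; the LWG analysis additionally splits खत्म | कर and न | मिलेगा, so the maximal recognizable phrases are coarser here.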
Experimental Methodology
• Subjects: 6 native speakers for each language
• Each subject was given 10 sentences in written form and asked to:
  • Read them aloud naturally
  • Divide every sentence into two parts, then recursively divide each part into two, such that the words within each part are more closely related to each other
• Example: ((खबर)(सुनते ही))((मैं)(तुरंत)((घर से)(भागा))) ("As soon as I heard the news, I immediately ran from home.")
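The bracketed answers can be turned into a tree for further analysis. A minimal Python sketch, assuming round brackets as written on this slide, with each multi-word leaf (e.g. `घर से`) kept as one string and a bracketed single item such as `(खबर)` collapsed to the item itself; `parse_bracketing` is a hypothetical helper, not the authors' tooling.

```python
def parse_bracketing(text: str):
    """Parse "((a)(b c))((d)...)" into nested lists; a multi-word leaf stays one string."""
    stack: list[list] = [[]]
    buf: list[str] = []

    def flush() -> None:
        # Emit any pending leaf text into the innermost open group.
        leaf = "".join(buf).strip()
        if leaf:
            stack[-1].append(leaf)
        buf.clear()

    for ch in text:
        if ch == "(":
            flush()
            stack.append([])
        elif ch == ")":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            # "(खबर)" is just the word itself, not a one-element group.
            stack[-1].append(group[0] if len(group) == 1 else group)
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    top = stack[0]
    return top[0] if len(top) == 1 else top


print(parse_bracketing("((खबर)(सुनते ही))((मैं)(तुरंत)((घर से)(भागा)))"))
```

The subject's answer is not always strictly binary (here `(मैं)(तुरंत)((घर से)(भागा))` is a three-way split), so the parser accepts any number of children per group.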
Sentence Selection
• Near-translations across the two languages, to facilitate comparison
• Coverage of various syntactic phenomena:
  • Embedded/relative clauses
  • Sentence- and phrase-level adverbs
  • Conjuncts
  • Noun groups:
    • Compound nouns, named entities and MWEs
    • Qualifier + Adjectives* + Determiner + Noun
    • Noun + [complex] postpositions
  • Verb groups:
    • Polar + Vector + Auxiliaries + Particles
    • Noun + Verb
Prosodic Structure
• Identify major (> 7ms) and minor (> 2.5ms) breaks
• Count the number of subjects with a break at each word boundary ("n|" after a word: n of the 6 subjects paused there)
Hindi: शिमला से मनाली 3| ना जाकर 6| सीधे दिल्ली से 6| विमान ले लो |
Bangla: সিমলা 2*| হয়ে 6| মানালী না গিয়ে 6| সোজা দিল্লী থেকেই 6| ফ্লাইট নিয়ে যান৷
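The counting step can be sketched as follows. This Python sketch assumes one measured pause duration per word boundary per subject, which is an assumption about how the recordings were processed; the thresholds are copied from the slide as stated, and the durations in the example are made up for illustration. `classify_pause` and `count_breaks` are hypothetical names.

```python
from typing import Optional

# Thresholds as given on the slide (> 7 ms major, > 2.5 ms minor).
MAJOR_MS = 7.0
MINOR_MS = 2.5


def classify_pause(pause_ms: float) -> Optional[str]:
    """Label a pause at a word boundary as 'major', 'minor', or None (no break)."""
    if pause_ms > MAJOR_MS:
        return "major"
    if pause_ms > MINOR_MS:
        return "minor"
    return None


def count_breaks(pauses_by_subject: list[list[float]]) -> tuple[list[int], list[int]]:
    """For each boundary, count subjects with any break and subjects with a major break."""
    n_boundaries = len(pauses_by_subject[0])
    any_break = [0] * n_boundaries
    major = [0] * n_boundaries
    for pauses in pauses_by_subject:
        if len(pauses) != n_boundaries:
            raise ValueError("every subject needs one pause value per boundary")
        for i, p in enumerate(pauses):
            label = classify_pause(p)
            if label is not None:
                any_break[i] += 1
            if label == "major":
                major[i] += 1
    return any_break, major


# Hypothetical pause durations (ms) for 3 subjects at 4 word boundaries.
pauses = [
    [0.0, 8.0, 3.0, 1.0],
    [0.5, 9.2, 0.0, 2.6],
    [3.1, 7.5, 1.0, 0.0],
]
print(count_breaks(pauses))  # → ([1, 3, 1, 1], [0, 3, 0, 0])
```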
Performance Structure
Recursive bisection of the sentence:
शिमला से मनाली ना जाकर सीधे दिल्ली से विमान ले लो |
(शिमला से मनाली ना जाकर)(सीधे दिल्ली से विमान ले लो |)
((शिमला से मनाली)(ना जाकर))((सीधे दिल्ली से)(विमान ले लो |))
(((शिमला से) मनाली)(ना जाकर))((सीधे (दिल्ली से))(विमान (ले लो |)))
Split level of each boundary (0 = first split, 3 = never split):
शिमला 3 से 2 मनाली 1 ना 3 जाकर 0 सीधे 2 दिल्ली 3 से 1 विमान 2 ले लो
Boundary strength (3 minus the split level):
शिमला 0 से 1 मनाली 2 ना 0 जाकर 3 सीधे 1 दिल्ली 0 से 2 विमान 1 ले लो
Strengths sorted: 3 2 2 1 1 1 0 0 0 0
Resulting breaks (| = major, *| = minor): शिमला से मनाली *| ना जाकर | सीधे दिल्ली से *| विमान ले लो
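The numbering in this example can be reproduced mechanically from the bracketing. Below is a Python sketch of one reading consistent with the slide: a boundary gets the depth of the bisection at which it was split (0 for the first split), boundaries that are never split get the next level, strength is the number of levels minus that value, and the top two strength values are marked as major (`|`) and minor (`*|`) breaks. The tree encoding (strings for unsplit word groups, tuples or lists for splits) and all function names are hypothetical.

```python
def split_levels(node, depth=0):
    """Return (words, levels); levels[i] is the depth at which the boundary
    after words[i] was split, or None if it was never split."""
    if isinstance(node, str):
        tokens = node.split()
        return tokens, [None] * (len(tokens) - 1)
    words, levels = [], []
    for i, child in enumerate(node):
        child_words, child_levels = split_levels(child, depth + 1)
        if i > 0:
            levels.append(depth)  # boundary between sibling subtrees
        words.extend(child_words)
        levels.extend(child_levels)
    return words, levels


def boundary_strengths(tree):
    words, levels = split_levels(tree)
    split = [lv for lv in levels if lv is not None]
    n_levels = max(split) + 1 if split else 0
    # Never-split boundaries get the level after the deepest split.
    values = [n_levels if lv is None else lv for lv in levels]
    strengths = [n_levels - v for v in values]
    return words, values, strengths


def mark_breaks(words, strengths):
    """Mark the strongest boundaries '|' (major) and the next level '*|' (minor)."""
    if not words:
        return ""
    ranks = sorted(set(strengths), reverse=True)
    major = ranks[0] if ranks and ranks[0] > 0 else None
    minor = ranks[1] if len(ranks) > 1 and ranks[1] > 0 else None
    out = words[0]
    for word, s in zip(words[1:], strengths):
        out += " | " if s == major else " *| " if s == minor else " "
        out += word
    return out


tree = (
    (("शिमला से", "मनाली"), "ना जाकर"),
    (("सीधे", "दिल्ली से"), ("विमान", "ले लो")),
)
words, values, strengths = boundary_strengths(tree)
print(values)                           # [3, 2, 1, 3, 0, 2, 3, 1, 2, 3]
print(strengths)                        # [0, 1, 2, 0, 3, 1, 0, 2, 1, 0]
print(sorted(strengths, reverse=True))  # [3, 2, 2, 1, 1, 1, 0, 0, 0, 0]
print(mark_breaks(words, strengths))
```

The last line prints `शिमला से मनाली *| ना जाकर | सीधे दिल्ली से *| विमान ले लो`. Treating only the top two strength values as breaks is an assumption that happens to match this one example.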
Observations
• Do the subjects agree on the boundaries?
  • Yes: always for clause boundaries, and often for phrase boundaries
  • There is a lot of confusion within phrases
• Are the prosodic and performance structures similar?
  • Both show major breaks at clause boundaries
  • If there is a major break in one structure, there is at least a minor break in the other
• Are the structures in Hindi and Bangla similar?
  • Except for a few cases, they are indeed very similar
Example (prosodic breaks): दृढ 6| किन्तु 3| बहुत 3| मृदु 3| स्वरों में ("in firm but very gentle tones")
Example, prosodic vs. performance structure ("n|" = n subjects paused; in the performance lines | = major break, *| = minor break):
Hindi prosody: शिमला से मनाली 3| ना जाकर 6| सीधे दिल्ली से 6| विमान ले लो
Hindi performance: शिमला से *| मनाली *| ना जाकर | सीधे *| दिल्ली से | विमान *| ले लो
Bangla prosody: সিমলা 2*| হয়ে 6| মানালী না গিয়ে 6| সোজা দিল্লী থেকেই 6| ফ্লাইট নিয়ে যান
Bangla performance: সিমলা হয়ে *| মানালী *| না গিয়ে | সোজা *| দিল্লী থেকেই *| ফ্লাইট নিয়ে যান
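The observation that a major break in one structure is matched by at least a minor break in the other can be checked automatically once both structures are labeled per boundary. A minimal Python sketch, assuming each structure is a hypothetical list with one label (`"major"`, `"minor"` or `None`) per word boundary; the example labels are invented, not taken from the experiment.

```python
from typing import Optional, Sequence

Mark = Optional[str]  # "major", "minor", or None (no break) at one word boundary


def violations(a: Sequence[Mark], b: Sequence[Mark]) -> list[int]:
    """Boundaries where one structure has a major break but the other has no break."""
    if len(a) != len(b):
        raise ValueError("structures must cover the same word boundaries")
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if (x == "major" and y is None) or (y == "major" and x is None)
    ]


# Hypothetical marks over 5 word boundaries.
prosodic = [None, "major", "minor", None, "major"]
performance = ["minor", "minor", None, None, "major"]
print(violations(prosodic, performance))  # → []
```

An empty list means the pair of structures satisfies the observation; any returned index points at a boundary that contradicts it.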
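Chunks Are Often Larger than LWGs
• [गमले के][टुकड़ों को] फेंक मत देना | ("Don't throw away the pieces of the flowerpot.")
  Prosody: गमले के 4| टुकड़ों को 6| फेंक मत देना
  Performance: गमले के *| टुकड़ों को | फेंक | मत देना
• खबर सुनते ही मैं तुरंत [घर से] भागा | ("As soon as I heard the news, I immediately ran from home.")
  Prosody: खबर सुनते ही 6| मैं तुरंत 6| घर से भागा
  Performance: खबर *| सुनते ही | मैं तुरंत *| घर से भागा
• खबर सुनते ही मैं [घर से] तुरंत भागा |
  Prosody: खबर सुनते ही 6| मैं घर से 2| तुरंत भागा
  Performance: खबर | सुनते ही | मैं *| घर से *| तुरंत भागा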
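Extra-Syntactic Factors Governing Chunk Boundaries
• Chunk length:
  • টবের 2*| ভাঙ্গা টুকরো গুলো 5| ফেলে দিয়ো না ("Don't throw away the broken pieces of the flowerpot.")
  • এক 2*| বিশাল 2| পথ অবরোধের 3| আয়োজন করেছিলো ("had organized a huge road blockade")
• Familiarity with the text ("members of the Trinamool Congress"):
  • তৃণমূল কংগ্রেসের 2| সদস্যরা
  • तृणमूल 3| कांग्रेस के सदस्यों 1*| ने
• Focus of the utterance
• Phonology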
Conclusion
• Chunks have cognitive reality:
  • Agreement across speakers, structures and languages
• Chunks are NOT:
  • Local word groups
  • Phrases
• Chunks are:
  • Completely context-dependent
  • Governed by syntactic plus extra-syntactic factors
• A theory of chunks is necessary, at least for speech applications
Thank you for your kind attention.
Contact: monojitc@microsoft.com