EXTRACTING COMPLEX PREDICATES IN HINDI ACROSS PARALLEL CORPORA
This guide provides a detailed methodology for detecting complex predicates (CPs) in Hindi through parallel English-Hindi corpora. A complex predicate is defined as a multi-word expression functioning as a single verb, offering expressive richness to the language. The work emphasizes identifying light verbs and their morphological forms, as well as the semantic distinctions necessary for successful CP detection. The significance of this task lies in enhancing linguistic analysis and resource creation for applications like WordNet construction.
EXTRACTING COMPLEX PREDICATES IN HINDI ACROSS PARALLEL CORPORA Guide: Prof. Amitabha Mukerjee. By: Amit Kumar (10074), Ankit Modi (10104)
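INTRODUCTION A Complex Predicate (CP) is a multi-word compound that functions as a single verb.
Ex: उसने किताब वापस कर दिया (He returned the book.)
Ex: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं | (I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help.)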
INTRODUCTION CP = Word + Light Verb.
Ex: उसने किताब वापस कर दिया, where कर दिया (CP) = कर (W) + दिया (LV).
A Light Verb (LV) is a verb that has little semantic content of its own and therefore forms a predicate with some additional expression, which is usually a noun.
Ex: देना, लेना, पाना, उठाना
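A minimal sketch of how light verbs and their inflected forms could be looked up in a tokenized Hindi sentence. The form table is a small hand-written illustration, not the light-verb list used in this work, and `find_light_verbs` is a hypothetical helper name.

```python
# Illustrative only: a few light verbs (infinitive) with some inflected forms.
# The list used in the actual work is larger; these entries are assumptions.
LIGHT_VERB_FORMS = {
    "देना": ["देना", "देने", "देता", "देती", "देते", "दिया"],
    "लेना": ["लेना", "लेने", "लेता", "लेती", "लेते", "लिया"],
    "करना": ["करना", "करने", "करता", "करती", "करते", "किया"],
}

# Invert the table so that any surface form maps back to its light verb.
FORM_TO_LEMMA = {
    form: lemma
    for lemma, forms in LIGHT_VERB_FORMS.items()
    for form in forms
}


def find_light_verbs(tokens):
    """Return (index, surface form, lemma) for every token that is a known LV form."""
    return [
        (i, tok, FORM_TO_LEMMA[tok])
        for i, tok in enumerate(tokens)
        if tok in FORM_TO_LEMMA
    ]


tokens = "उसने किताब वापस कर दिया".split()
hits = find_light_verbs(tokens)
print(hits)  # the final token दिया is matched as a form of देना
```

Whitespace tokenization is enough for this illustration; a real corpus would also need punctuation handling.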
PROBLEM STATEMENT Given a parallel English-Hindi corpus, detect complex predicates (CPs), using the fact that a CP is a multi-word expression whose meaning is distinct from that of its light verb (LV).
MOTIVATION
• CPs improve the expressiveness of a language, and Hindi has them in abundance.
• Detecting CPs is a difficult task.
• Their detection provides an important resource for tasks such as WordNet construction and linguistic analysis.
Framework
Aligned English-Hindi corpus:
English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help
Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |
1. Search for Hindi LVs and their morphological forms.
2. Search for the equivalent English meaning of the LVs.
3. Scan left of those LVs whose English meaning is not found.
4. Collect the Hindi word (W) if it is not a stop word, or else keep scanning.
5. CP = W + LV, unless W is an exit word.
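The framework steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the light-verb table, the stop-word list and the exit-word list are tiny hand-picked stand-ins, the slides do not define "exit word" (here it is assumed to mean a word, such as a postposition, that rules out a CP), and `extract_cps` is a hypothetical name.

```python
import re

# Assumed stand-in: Hindi LV surface form -> English forms of its literal meaning.
LV_MEANINGS = {
    "करना": {"do", "does", "did", "done", "doing"},
    "कर": {"do", "does", "did", "done", "doing"},
    "लेने": {"take", "takes", "took", "taken", "taking"},
    "आते": {"come", "comes", "came", "coming"},
    "दिया": {"give", "gives", "gave", "given", "giving"},
}
STOP_WORDS = {"भी", "ही", "तो", "नहीं"}  # assumed: skipped while scanning left
EXIT_WORDS = {"के", "की", "का", "को", "से", "में", "पर", "कि"}  # assumed: no CP


def extract_cps(english, hindi):
    """Return (W, LV) pairs found in one aligned sentence pair."""
    en_tokens = set(re.findall(r"[a-z']+", english.lower()))
    hi_tokens = hindi.split()
    cps = []
    for i, tok in enumerate(hi_tokens):
        meanings = LV_MEANINGS.get(tok)
        if meanings is None:
            continue  # step 1: token is not a known LV form
        if meanings & en_tokens:
            continue  # step 2: literal meaning appears in the English side
        j = i - 1
        while j >= 0 and hi_tokens[j] in STOP_WORDS:
            j -= 1  # steps 3-4: scan left, skipping stop words
        if j >= 0 and hi_tokens[j] not in EXIT_WORDS:
            cps.append((hi_tokens[j], tok))  # step 5: CP = W + LV
    return cps


EN = ("I also enjoy working with the children's parents who often come "
      "to me for advice - it's good to know you can help")
HI = ("मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर "
      "सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |")

result = extract_cps(EN, HI)
print(result)
```

With these toy lists, आते is skipped because "come" occurs in the English sentence, while करना, लेने and कर have no literal English counterpart and are paired with the word to their left.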
Sample Results As of now, we have extracted 10,000 CPs, but we need to add more morphological forms to the Hindi LV list.
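One possible way to grow the list of LV forms is to generate candidate inflections from each verb stem and keep only those attested in a Hindi corpus. The sketch below is an assumption, not the procedure used in this work: the suffix list covers only a few regular endings, the irregular perfective forms are entered by hand, and both function names are hypothetical.

```python
# Assumed subset of regular endings: infinitive forms and imperfective participles.
REGULAR_SUFFIXES = ["ना", "ने", "नी", "ता", "ती", "ते"]

# Irregular perfective forms have to be listed by hand (illustrative subset).
# Note: some of these (e.g. की, लिए) are homographs of postpositions and
# would need extra filtering in practice.
IRREGULAR_FORMS = {
    "दे": ["दिया", "दी", "दिए"],
    "ले": ["लिया", "ली", "लिए"],
    "कर": ["किया", "की", "किए"],
}


def candidate_forms(stem):
    """Bare stem, regular stem+suffix forms, then any hand-listed irregulars."""
    regular = [stem] + [stem + suffix for suffix in REGULAR_SUFFIXES]
    return regular + IRREGULAR_FORMS.get(stem, [])


def attested_forms(stem, corpus_tokens):
    """Keep only the candidate forms that actually occur in the corpus."""
    vocabulary = set(corpus_tokens)
    return [form for form in candidate_forms(stem) if form in vocabulary]


corpus = "मदद कर सकते हैं काम करना अच्छा लगता है".split()
print(candidate_forms("ले"))
print(attested_forms("कर", corpus))
```

Filtering against corpus vocabulary keeps the generated list from filling up with forms that never occur.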
Resources
• English-Hindi parallel corpora: http://ufal.mff.cuni.cz/euromatrixplus/downloads.html
• List of Hindi light verbs: Reverse Complex Predicates by Shakthi Poornima, Department of Linguistics, University at Buffalo, SUNY
• Morphological forms of English verbs: http://www.englishpage.com/irregularverbs/irregularverbs.html
• Morphological forms of Hindi verbs: extracted from the large Hindi corpus (Blog Corpus)
References
• [1] Mining Complex Predicates in Hindi Using a Parallel Hindi-English Corpus, R. Mahesh K. Sinha, IIT Kanpur
• [2] Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora, Amitabha Mukerjee, Ankit Soni and Achla M. Raina, IIT Kanpur
• [3] Complex Predicates in Indian Languages and Wordnets, Pushpak Bhattacharyya, Debasri Chakrabarti and Vaijayanthi M. Sarma. Language Resources and Evaluation 40(3-4): 331-355
• Wikipedia: 1. http://en.wikipedia.org/wiki/Light_verb 2. http://en.wikipedia.org/wiki/Compound_verb
Thank you. Questions?
Other Approaches
• [2] This problem was solved using word alignment and POS tagging of parallel sentences.
• [3] Derivation of complex predicates has also been dealt with linguistically and computationally.
• CPs have been mined using computational methods and then categorized using statistical analysis [Sriram and Joshi, 2005].
• Chakrabarti et al. (2008) present a method for automatic extraction of CPs only from a corpus, based on linguistic features.