1 / 17

Hindi Wordnet at IIT Bombay

Hindi Wordnet at IIT Bombay. Current Team: Pushpak Bhattacharyya, Prabhakar Pandey, Laxmi Kashyap, Salil Joshi, Arun Karthikeyan, Prachur Goel and many previous PhD, Masters and Bachelor Students and Research Staff. Great Language Diversity of India. Languages and the speaker population.

skule
Télécharger la présentation

Hindi Wordnet at IIT Bombay

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hindi Wordnet at IIT Bombay Current Team: Pushpak Bhattacharyya, Prabhakar Pandey, Laxmi Kashyap, Salil Joshi, Arun Karthikeyan, Prachur Goel and many previous PhD, Masters and Bachelor Students and Research Staff

  2. Great Language Diversity of India

  3. Languages and the speaker population

  4. Languages and the speaker population (contd.)

  5. Major Language Processing Initiatives • Mostly from the Government: Ministry of IT, Ministry of Human Resource Development, Department of Science and Technology • Recently great drive from the industry: NLP efforts with Indian language in focus • Google • Microsoft • IBM Research Lab • Yahoo • TCS IIT Bombay Natural Language Processing Group heavily supported by Government and Industry

  6. What is Hindi Wordnet • Wordnet – A lexical database • Hindi Wordnet Inspired by the English WordNet • Built conceptually • Synsets or the Synonymy Sets are the basic building blocks • Different organizing principles for different syntactic categories

  7. Example Entry in Hindi Wordnet • Synset {गाय,गऊ, गैया, धेनु} {gaaya ,gauu, gaiyaa, dhenu}, Cow • Gloss • Text definition सींगवाला एक शाकाहारी मादा चौपाया (siingwaalaa eka shaakaahaarii maadaa choupaayaa) (a horny, herbivorous, four-legged female animal) • Example sentence हिन्दू लोग गाय को गो माता कहते हैं एवं उसकी पूजा करते हैं। (hinduu loga gaaya ko go maataa kahate hain evam usakii puujaa karate hain) (The Hindus considers cow as mother and worship it.)

  8. Relations in Wordnet • Synonymy • Hypernymy / Hyponymy • Antonymy • Meronymy / Holonymy • Gradation • Entailment • Troponymy

  9. चौपाया,पशु (chaupaayaa, pashu) Four-legged animal शाकाहारी (shaakaahaarii) herbivorous Hypernym पूँछ (puunchh ) Tail गाय, गऊ (gaaya ,gauu) Cow Attribute meronym सींगवाला एक शाकाहारी मादा चौपाया (siingwaalaa eka sakaahaarii maadaa choupaayaa) A horny, herbivorous, four-legged female animal) Gloss थन (thana) udder Hyponym Ability Verb पगुराना( paguraanaa) ruminate Antonym कामधेनु kaamadhenu A kind of cow मैनी गाय mainii gaaya A kind of cow बैल (baila) Ox WordNet Sub-Graph: Hindi

  10. Statistics

  11. Impact, Use and Visibility of Hindi Wordnet Free download with API under GPL Available from LDC (linguistics data consortium), Upenn: topmost linguistic data repository in the worlds Commercial license purchased by Google for work on Indian language search engine To be available from ELRA: language data repository of Europe Available from LDC-IL: LDC of India

  12. Impact, Use and Visibility of created resources (continued) Daily reference form all over the world More than 2 Lakh hits so far since 2006 More than 3000 downloads Pivot for wordnets of many Indian languages Base resource used by many researchers for IL work on translation, summarization, cross lingual search

  13. Hindi Wordnet giving rise to other Indian Language wordnets Bengali Wordnet Dravidian Language Wordnet Sanskrit Wordnet Punjabi Wordnet Hindi Wordnet North East Language Wordnet Marathi Wordnet Konkani Wordnet English Wordnet

  14. Linked wordnets Immense Lexical Resource Great benefits to machine translation, cross lingual search Very useful for language teaching, pedagogy, comparative linguistics Akin to Eurowordnet, but critical differences due to typical Indian language characteristics

  15. Pan-India Dictionary Standard based on wordnet

  16. Recognition • P.K.Patwardhan Award of IIT Bombay, 2008 • Research Grant from Microsoft Research India for Multilingual database creation based on Hindi Wordnet • IBM India research grant for Unstructured Information Management with Hindi Wordnet as component

  17. International Global Wordnet Conference, Jan 31-Feb 4, 09 A major International Event Granted to IIT Bombay Because of The success Of Hindi Wordnet

More Related