Swaran Lata , Director and HoD slata@mit

Challenges of development of Language Technology and services in multicultural and multilingual Indian Scenario. Swaran Lata, Director and HoD, slata@mit.gov.in, Technology Development for Indian Languages Programme (TDIL), Dept of Information Technology, Govt. of India.

Presentation Transcript


  1. Challenges of development of Language Technology and services in multicultural and multilingual Indian Scenario • Swaran Lata, Director and HoD, slata@mit.gov.in • Technology Development for Indian Languages Programme (TDIL), Dept of Information Technology, Govt. of India

  2. Organization of Presentation • India – cultural diversity • Linguistic Diversity in India • Present Knowledge Society and Indian Scenario • ICT scenario in India • Internet penetration – Haves & Have-Nots • Mind-set – still an inhibition • Bridging the gap – service delivery reaching citizens' doorsteps • Localization – key enabler • Challenges and Issues • TDIL's efforts • National Roll-Out Plan – a big step forward • Localization of Applications • Putting Standards in place • Collaboration and Hand-holding

  3. India – A civilization more than 5000 years old • Vast ancient knowledge base • Diverse culture and heritage – probably one of the most spectacular in the world • One of the largest economies in the world today • Rapid strides in information and communications technology • Yet… a widening knowledge divide among various strata of citizens

  4. Linguistic Diversity in India • According to Census 2001, India has 122 major languages and 2371 dialects. • Of these 122 languages, 22 are constitutionally recognized. • Linguistic diversity in India is very rich and wide • One language, many scripts • Many languages, one script • Cultural differences by region, even where different languages use the same script • Wide differences even for the same language across different countries
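The "many languages, one script" point can be made concrete with Unicode character names: Hindi and Marathi text both resolve to the Devanagari block, and Bengali and Assamese text both resolve to the Bengali block, so the script alone cannot identify the language. A minimal sketch using only Python's standard `unicodedata` module; treating the first word of a character's name as its script is a simplifying heuristic, not a full implementation of the Unicode Script property, and `dominant_script` is a hypothetical helper name:

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent script among the letters and marks in `text`.

    Heuristic: the first word of a Unicode character name ("DEVANAGARI",
    "BENGALI", "LATIN", ...) stands in for the character's script.
    """
    counts = Counter()
    for ch in text:
        # Keep letters (L*) and combining marks (M*); skip digits, spaces, punctuation.
        if not unicodedata.category(ch).startswith(("L", "M")):
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


samples = {"Hindi": "हिन्दी", "Marathi": "मराठी", "Bengali": "বাংলা", "Assamese": "অসমীয়া"}
for language, word in samples.items():
    print(language, dominant_script(word))
# Hindi and Marathi both print DEVANAGARI; Bengali and Assamese both print BENGALI.
```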

  5. Marathi and Hindi: though both use the same script, Devanagari, their content varies, depicting cultural and linguistic differences

  6. Present ICT scenario in India • Despite a reputation as an emerging technology powerhouse, India's scores on the 2009 Connectivity Scorecard are poor in the vital consumer and business segments. • These poor scores should not be surprising, since many of the individual metrics the Scorecard uses effectively measure “penetration rates.” • This means that India is judged as a whole, and not by the pockets of ICT excellence that it undoubtedly possesses. • India scores especially low on broadband and Internet penetration rates. • Broadband penetration in India is below 2 percent of households, compared to 20 percent of households or more in Turkey, Chile and Mexico. • On the consumer usage front, India is not a strong performer: below 10 percent of the population regularly uses the Internet. The country is hampered by a relatively low literacy rate.
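Why a national penetration rate hides pockets of excellence follows from a simple weighted average. As an illustrative assumption (not the Scorecard's actual methodology), split households into a well-connected group of size N_c with penetration p_c and the rest of the country, of size N_o with penetration p_o, with rates expressed as fractions between 0 and 1:

```latex
\[
p = \frac{N_c\,p_c + N_o\,p_o}{N_c + N_o} = w\,p_c + (1 - w)\,p_o,
\qquad w = \frac{N_c}{N_c + N_o}
\]
\[
p_c \le 1 \;\Rightarrow\; p \le w + (1 - w)\,p_o \le w + p_o
\]
```

So when the well-connected pockets hold only a small share w of households, the national rate p stays close to p_o, however high p_c is.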

  7. Global Broadband Divide: India is still in the low broadband-penetration region (source: http://www.itu.int)

  8. Low rural tele-density compared to urban tele-density

  9. Mind-set: Still favouring English as the medium of excellence • English and Hindi serve as link languages • Learning English is viewed as a passport to better economic and social prospects; even people from low-income strata now think so. • Due to the recent surge in ICT and ICT-enabled services, English has become the second most common medium of instruction at school level • Study by the National University of Educational Planning and Administration (NUEPA): under Sarva Shiksha Abhiyan, the number of students opting for English grew by 150% between 2003 and 2008, while the corresponding figure for Hindi was only 32% • Examples: Uttar Pradesh, West Bengal and … now use English as the medium of instruction in schools and colleges
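The NUEPA enrolment figures are relative growth rates, so it helps to read them as growth factors over 2003-08. This is worked arithmetic on the quoted percentages only; the underlying enrolment counts are not given on the slide:

```latex
\[
g_{\text{English}} = 1 + \frac{150}{100} = 2.5,
\qquad
g_{\text{Hindi}} = 1 + \frac{32}{100} = 1.32,
\qquad
\frac{g_{\text{English}}}{g_{\text{Hindi}}} = \frac{2.5}{1.32} \approx 1.89
\]
```

English-medium enrolment in 2008 was thus 2.5 times its 2003 level, against 1.32 times for Hindi. Because these are growth rates, they do not by themselves show which medium has more students in absolute terms.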

  10. Result: • Although Hindi (ranked 3rd) and Bengali (ranked 8th) are among the top 10 languages spoken in the world, no Indian language is among the top 10 languages used on the Internet. • Minuscule Internet usage in Indian languages • Confinement of knowledge • Low usage of knowledge sources and applications

  11. UNESCO’s VISION for Multilingualism in Cyberspace • Language constitutes the foundation of communication and is fundamental to cultural and historical heritage. • Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development. • Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one’s language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society. • Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues. • Huge sections of the world’s population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development.

  12. An Uneven Growth • The Indian software export industry is expanding its global presence at a very fast pace • However, its roots are not expanding within the country • Fallout: domestic requirements in Indian languages are not being addressed • Result: information and knowledge are not available to vast sections of citizens • Expanding software exports vs. low penetration in the Indian market

  13. Requirements: • Reaching citizens' doorsteps with better services for wider dissemination of knowledge • Localization of software solutions, content and services as per local requirements

  14. Common Services Centre – Its Objectives • CSC is a strategic cornerstone of the National e-Governance Plan (NeGP): the front-end service interface for major G2C services • CSC is one of the three infrastructure pillars of e-governance which the government is committed to building, to ensure “anytime, anywhere” web-enabled delivery of government services • To provide e-governance services • 100,000 CSCs for 600,000 villages (roughly one CSC per cluster of six villages) • To cater to the service needs of major rural areas • Being implemented in a Public-Private Partnership (PPP) model

  15. Local Language Interface – Not Just Desirable but an Essential Component • The success of CSC hinges upon effective delivery of G2C applications to rural masses • Since most citizens communicate in their local languages, a local language interface to G2C solutions at CSCs is essential • Hosting content in local languages helps citizens interact better in today's knowledge society • Thus, a local language interface is “not just desirable but an essential component”

  16. NeGP – Mission Mode Projects: Education, Gram Panchayats, Municipalities, National ID, Pensions, Central Excise, Road Transport, Police, India Portal, e-Posts, EDI, GIS, e-Courts, eBiz, Land Records, Land Registration, e-Office, Core Policies, Banking, Insurance, Gateway, e-Procurement, Common Service Centres, Passport, Visa, MCA21, Income Tax, Commercial Taxes, Treasuries, Employment Exchanges, Civil Supplies, Agriculture • Initiatives have already been taken to enable G2C applications such as Land Records, Civil Supplies and Municipal applications with an Indian language interface

  17. Service Delivery Model of CSC Requires Language Interface

  18. Localization Requirements for Service Delivery Applications • To ensure seamless access to services, language components, localization and interfaces are required at: • Storage level – server end • Data exchange – traffic (language tags need to be properly embedded) • Display & rendering • Language interface for differently-abled citizens, for more inclusive societal benefits
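The storage and data-exchange requirements above can be sketched in a few lines of Python: store and transmit text as UTF-8, and carry a BCP 47 language tag (such as `hi-IN`) with it, here both inside the payload and in the standard HTTP `Content-Language` header. The record layout (`lang`, `text` fields) and the helper names are hypothetical illustrations, not a prescribed e-Governance format:

```python
import json


def encode_record(text: str, lang: str) -> bytes:
    """Storage / exchange step: serialize text plus its language tag as UTF-8 JSON.

    The field names "lang" and "text" are a hypothetical schema.
    """
    record = {"lang": lang, "text": text}
    # ensure_ascii=False keeps Indic characters as real characters instead of
    # \uXXXX escapes; .encode("utf-8") fixes the bytes that are stored or sent.
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def decode_record(payload: bytes) -> tuple[str, str]:
    """Receiving end: decode the UTF-8 bytes and recover (language tag, text)."""
    record = json.loads(payload.decode("utf-8"))
    return record["lang"], record["text"]


def http_headers(lang: str) -> dict[str, str]:
    """Headers that declare both the character encoding and the language of the body."""
    return {
        "Content-Type": "application/json; charset=utf-8",
        "Content-Language": lang,
    }


payload = encode_record("भूमि अभिलेख", "hi-IN")  # "land records" in Hindi
print(http_headers("hi-IN"))
print(decode_record(payload))
```

Each Devanagari code point occupies three bytes in UTF-8, which is worth budgeting for at the storage level.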

  19. Globalization of IT

  20. Globalization & Localization

  21. Key Enablers: Standards • Localization Tools • Locale Data Repository • Localization Awareness • Training • Technologies • Linguistic Resources • Certification

  22. The Tree of Localization Complexities • Quality Assurance: testing methodologies; metrics for linguistic testing; certification by Government for linguistic compliance • Language Technologies: machine translation; optical character recognition; speech technologies; cross-lingual information retrieval • Training: certified localization professionals; PG specialization in localization; PhD programmes • Locale Data: presentation of dates, times, numbers, lists and other values; collation and sorting; alternate calendars, which may include holidays, work rules, weekday/weekend; currency; tax or regulatory regime • Standards: encoding standards; multimodal input device standards; fonts & rendering engines; transliteration & translation • Education & Outreach: guidelines; best practices; case studies; consultancy; showcasing of tools & technologies • Localization Tools: project management; translation memory; translation tools; natural language text processing (parsing, spell checking, grammar checking, etc.); automatic testing tools • Linguistic Resources: parallel corpora; speech corpora; lexical resources; ontologies; dictionaries; thesaurus; reference terminologies • Shipping Issues: minimizing time lag; benchmarking w.r.t. the English version; political sensitivity; pricing issues

  23. Globalization and Localization Issues: Language Issues • Language issues arise because languages around the world differ in display, alphabets, grammar and syntactic rules. • Bidirectional scripts • Capitalization, Uppercasing and Lowercasing • Code Pages • Complex Script Awareness • Fonts • Input Method Editors • Keyboards • Line and Word Breaks • Mirroring Awareness • Unicode
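Complex script awareness matters even for something as simple as truncating a label to fit a screen: the visible Devanagari syllable क्षि is four Unicode code points, and cutting between them leaves a broken conjunct or a stray vowel sign. The sketch below uses a simplified heuristic (never let the cut-off remainder start with a combining mark, never end the kept text on a virama); it is not a full implementation of the Unicode text-segmentation rules (UAX #29), and `safe_truncate` is a hypothetical helper name:

```python
import unicodedata


def _is_virama(ch: str) -> bool:
    # Matches DEVANAGARI SIGN VIRAMA, BENGALI SIGN VIRAMA, TAMIL SIGN VIRAMA, etc.
    return unicodedata.name(ch, "").endswith("SIGN VIRAMA")


def safe_truncate(text: str, limit: int) -> str:
    """Cut `text` to at most `limit` code points without splitting a syllable.

    Simplified heuristic: move the cut point left while the next character is a
    combining mark (category M*) or the previous character is a virama.
    """
    if len(text) <= limit:
        return text
    cut = limit
    while cut > 0 and (
        unicodedata.category(text[cut]).startswith("M") or _is_virama(text[cut - 1])
    ):
        cut -= 1
    return text[:cut]


word = "\u0928\u092e\u0938\u094d\u0924\u0947"  # नमस्ते: 6 code points
print(len(word))                # 6
print(word[:4])                 # naive cut: नमस् (ends in a bare virama)
print(safe_truncate(word, 4))   # नम
```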

  24. Formatting Issues • From the user's perspective, formatting issues are the primary source of discrepancies when working with applications originally written for another language or culture/locale. • Developers should use the National Language Support (NLS) APIs in Windows or the System.Globalization namespace in .NET to handle most of these issues automatically. • Addresses • Currency • Dates • Numerals • Paper Sizes • Telephone Numbers • Time • Units of Measure
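For the "Numerals" item: Indian scripts have their own decimal digits in Unicode, and in each script block the digits run contiguously from that block's DIGIT ZERO. A minimal sketch that converts ASCII digits for display, and relies on the fact that Python's `int()` already accepts any Unicode decimal digits when parsing user input. The ISO 15924-style script keys and the helper name are chosen for illustration:

```python
# Code point of DIGIT ZERO in each script block; digits 1-9 follow contiguously.
ZERO = {
    "Deva": 0x0966,  # Devanagari (Hindi, Marathi, Nepali, ...)
    "Beng": 0x09E6,  # Bengali script (Bengali, Assamese)
    "Gujr": 0x0AE6,  # Gujarati
    "Mlym": 0x0D66,  # Malayalam
}


def to_native_digits(s: str, script: str) -> str:
    """Replace ASCII digits 0-9 in `s` with the digits of the given script."""
    zero = ZERO[script]
    table = str.maketrans({str(d): chr(zero + d) for d in range(10)})
    return s.translate(table)


year = to_native_digits("2001", "Deva")
print(year)        # २००१
print(int(year))   # 2001: int() parses any Unicode decimal digits
```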

  25. Localization – A Tool for Increasing Financial Sustainability • Training local youth in localized content creation • Working with Self-Help Groups to uplift their businesses • Identifying dynamically changing local content that helps in local professions • E-Tutor • Entertainment during non-office hours

  26. TDIL's Efforts • A sustained, major national initiative for more than a decade • Has led to the development and consolidation of various language tools, resources and components • Continuous and untiring representation in various international and national standards bodies: ISO, UNICODE, W3C, IETF, ELRA and BIS • Represented and included 22 Indian languages in UNICODE • First in India to launch consortium-mode projects in technology-intensive areas such as Machine Translation, Cross-lingual Information Access and Text to Speech, to develop state-of-the-art technologies in Indian languages • Promotes futuristic research in language technology

  27. National Roll-Out Plan – A Big Step Forward • CDs containing software tools and fonts for all 22 officially recognized languages released in the public domain for free use • They contain fonts, localized OpenOffice, keyboard drivers, e-mail clients and Firefox browsers in Indian languages • Freely downloadable from the Indian Language Data Centre – http://www.ildc.gov.in • Already crossed ~41 lakh downloads and 7.0 lakh shipments (1 lakh = 100,000) • NASSCOM could take an active role in proliferating the benefits of these language CDs • These free CDs would also help NGOs and CSC operators develop and promote local language content.

  28. CDs containing Indian Language Software Tools

  29. Putting Standards in place: UNICODE • UNICODE – the default text encoding standard • Compatible with ISO 10646 • Seamless data storage and search when data is stored in UNICODE • All 22 officially recognized Indian languages, including Vedic Sanskrit, represented in UNICODE • Declared the text encoding standard for all e-Governance applications
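One caveat behind "seamless data storage and search": Unicode can encode the same visible text in more than one way. For example, the Devanagari letter क़ (qa) exists both as the single code point U+0958 and as क (U+0915) followed by the nukta sign U+093C. Search should therefore compare normalized forms. A minimal sketch with the standard `unicodedata.normalize`; choosing NFC as the default is an assumption here (NFD gives the same answer for this comparison), and `normalized_contains` is a hypothetical helper name:

```python
import unicodedata


def normalized_contains(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Substring search that treats canonically equivalent spellings as equal."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


# "क़लम" (pen) stored with the precomposed letter U+0958 ...
stored = "\u0958\u0932\u092e"
# ... and typed by a user as KA + NUKTA.
query = "\u0915\u093c\u0932\u092e"

print(query in stored)                     # False: the raw code points differ
print(normalized_contains(stored, query))  # True
```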

  30. Extracting Knowledge from Our Vast Ancient Knowledge Base: UNICODE encoding for Vedic Sanskrit and Grantha scripts is key to computerizing this knowledge base

  31. Capturing Region-Specific Requirements: Common Locale Data Repository (CLDR) • The Unicode CLDR provides key building blocks for software to support the world's languages. • CLDR is by far the largest and most extensive standard repository of locale data. • This data is used by a wide spectrum of companies for software internationalization and localization: adapting software to the conventions of different languages for common tasks such as formatting dates, times, time zones, numbers and currency values, sorting text, etc. • Locale data for Indian languages is being revised • CLDR data for six languages (Hindi, Nepali, Bengali, Assamese, Malayalam and Gujarati) has been finalized • Other languages are in process

  32. Example of CLDR: Hindi • All region-specific requirements have been captured and put in the Hindi locale repository
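Digit grouping is one region-specific requirement captured in locale data. Indian usage groups the last three digits and then pairs (1,00,000 for one lakh). CLDR-style decimal patterns express this as `#,##,##0.###`, compared with `#,##0.###` for three-digit grouping. The sketch below derives the primary and secondary group sizes from such a pattern and applies them to an integer. It is a hand-written illustration of the pattern convention, not CLDR's or ICU's own code, and it formats integers only:

```python
def grouping_sizes(pattern: str) -> tuple[int, int]:
    """Return (primary, secondary) group sizes for a CLDR-style decimal pattern.

    Primary: digit placeholders after the last comma of the integer part.
    Secondary: placeholders between the last two commas (equal to the primary
    size if there is only one comma). (0, 0) means no grouping.
    """
    groups = pattern.split(".")[0].split(",")
    if len(groups) == 1:
        return 0, 0
    primary = len(groups[-1])
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    return primary, secondary


def format_integer(n: int, pattern: str) -> str:
    """Group the digits of integer `n` according to `pattern`."""
    primary, secondary = grouping_sizes(pattern)
    digits = str(abs(n))
    sign = "-" if n < 0 else ""
    if primary == 0 or len(digits) <= primary:
        return sign + digits
    head, parts = digits[:-primary], [digits[-primary:]]
    while len(head) > secondary:
        parts.append(head[-secondary:])
        head = head[:-secondary]
    parts.append(head)
    return sign + ",".join(reversed(parts))


INDIAN = "#,##,##0.###"  # Indian-style grouping (lakh / crore)
WESTERN = "#,##0.###"    # three-digit grouping

print(format_integer(4100000, INDIAN))   # 41,00,000 (41 lakh)
print(format_integer(4100000, WESTERN))  # 4,100,000
```

Keeping the pattern as data rather than hard-coding the grouping is what lets one formatter serve every locale in the repository.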

  33. Putting Standards in place… Contd.: W3C • The World Wide Web Consortium (W3C) develops web standards for interoperable web solutions across platforms, devices and access methods • Ensures interoperability across major browsers such as IE, Firefox and Opera • Work has already started on representing all Indian languages in W3C standards • Desirable: proactive participation by industry and industry bodies such as NASSCOM

  34. Putting Standards in place… Contd. • Keyboard layouts • OpenType fonts (SakalBharti fonts) • Locale data • Language tags (for language negotiation on the Internet) • Domain names in Indian languages • IT terminology … and standards for major linguistic resources and tools
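Language tags make negotiation between browser and server possible: the browser lists preferred languages in the HTTP `Accept-Language` header, and the server picks the best one it can serve. A simplified sketch in the spirit of the RFC 4647 "lookup" scheme (try the full tag, then drop subtags from the right). It ignores wildcards and other refinements of the full specifications, and the available-locale list and helper names are hypothetical:

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag.strip():
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(header: str, available: list[str], default: str) -> str:
    """Pick the best available locale for an Accept-Language header."""
    avail = {a.lower(): a for a in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        # Lookup-style fallback: "mr-in" -> "mr".
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in avail:
                return avail[candidate]
            parts.pop()
    return default


AVAILABLE = ["hi", "bn", "mr", "en"]  # hypothetical locales served by a portal
print(negotiate("mr-IN, hi;q=0.8, en;q=0.5", AVAILABLE, default="en"))  # mr
print(negotiate("ta-IN, ta;q=0.9", AVAILABLE, default="en"))            # en
```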

  35. Collaboration and Hand-Holding • Collaborative efforts are required for wider proliferation and sustained initiatives. • Government, industry bodies and academia need to join hands to address the challenges of local language computing, and to bring services closer to the doorsteps of millions of citizens in their own languages

  36. धन्यवाद Thank You • Swaran Lata, Director and HoD, slata@mit.gov.in • Contact: 011-24364365
