Infrastructures and plans boosting Language Technology Research and Innovation

Stelios Piperidis, Athena RC, Greece, spip@ilsp.gr


Presentation Transcript


  1. Infrastructures and plans boosting Language Technology Research and Innovation. Stelios Piperidis, Athena RC, Greece, spip@ilsp.gr

  2. Multilingual Europe Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage. While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years. All stakeholders – researchers, LT user and provider industries, language communities, funding programmes, policy makers – should team up for a major dedicated push.

  3. Objectives META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.

  4. Four EU-Funded Projects http://www.meta-net.eu/members Initial project: T4ME (FP7; 13 partners, 10 countries) Three ICT-PSP consortia since Feb. 2011: CESAR, METANET4U, META-NORD All EU member states and several non-member states covered. META-NET in Nov. 2012: 60 members in 34 countries.

  5. META-VISION Language White Paper Series

  6. Language White Paper Series • Reports on the state of our languages in the digital age and the level of support through language technology. • Series covers 30 languages. • Key communication instruments to address decision makers and journalists. • Inform about societal and technological problems and challenges as well as economic opportunities. • >2 years in the making. • >200 national experts as contributors. • >8,000 copies printed and distributed to politicians and journalists.

  7. 30 Languages Covered * = Official EU language • Basque • Bulgarian* • Catalan • Czech* • Danish* • Dutch* • English* • Estonian* • Finnish* • French* • Galician • German* • Greek* • Hungarian* • Icelandic • Irish* • Italian* • Latvian* • Lithuanian* • Maltese* • Norwegian • Polish* • Portuguese* • Romanian* • Serbian • Slovak* • Slovene* • Spanish* • Swedish* • Croatian

  8. Cross-Lingual Ranking • In four application areas, each language is assigned to one of five clusters, ranging from excellent LT support to weak/no support: • Machine Translation • Speech Processing • Text Analysis • Resources • Results finalised at a meeting in Berlin with representatives of all 30 languages (October 21/22, 2011).

  9. Cluster assignment per application area (excellent / good / moderate / fragmentary / weak or no support)
     Machine Translation
       excellent: (none)
       good: English
       moderate: French, Spanish
       fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
       weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish
     Text Analysis
       excellent: (none)
       good: English
       moderate: Dutch, French, German, Italian, Spanish
       fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
       weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian
     Speech
       excellent: (none)
       good: English
       moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
       fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
       weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian
     Resources
       excellent: (none)
       good: English
       moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
       fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
       weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese

  10. Europe’s Languages and LT [Figure: the 30 languages placed on a scale running from "good support through Language Technology" to "weak or no support". Labels in extraction order: English, Dutch, French, German, Italian, Spanish, Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene, Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish.]

  11. Key Observations • When it comes to Language Technology support, there are massive differences between Europe’s languages and between technology areas. • LT support for English is ahead of any other language. • Even support for English is far from perfect. • The gap between English and the other languages keeps widening! • Several languages – Icelandic, Latvian, Lithuanian, Maltese – receive the weakest score in all four areas! • At least 21 European languages are in danger of digital extinction! (These are the languages put into the “weak or no support” category at least once.)
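
The two counts in the key observations can be checked against the cluster assignments on slide 9. The following is a minimal sketch, assuming only the "weak or no support" lists as transcribed there; the variable and function names are made up for illustration and are not part of any META-NET tooling.

```python
# Languages in the "weak or no support" cluster for each application area,
# copied from the cross-lingual ranking slide of this presentation.
WEAK_OR_NO_SUPPORT = {
    "Machine Translation": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian",
    },
    "Speech": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese",
    },
}


def weak_at_least_once(clusters):
    """Languages that appear in the weakest cluster of any area (set union)."""
    return set().union(*clusters.values())


def weak_in_all_areas(clusters):
    """Languages that appear in the weakest cluster of every area (intersection)."""
    return set.intersection(*clusters.values())


at_risk = weak_at_least_once(WEAK_OR_NO_SUPPORT)
weakest = weak_in_all_areas(WEAK_OR_NO_SUPPORT)

print(len(at_risk))     # number of languages rated weakest at least once
print(sorted(weakest))  # languages rated weakest in all four areas
```

With the lists above, the union contains 21 languages and the intersection is Icelandic, Latvian, Lithuanian and Maltese, which agrees with the figures quoted on the slide.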

  12. META-VISION Strategic Research Agenda

  13. Three Ingredients • Appropriate Programme: Vision & Agenda • Appropriate Actors: Research & Commercialisation • Appropriate Support: Funding

  14. Strategic Research Agenda • META-NET Strategic Research Agenda for Multilingual Europe 2020. • Addresses the problems we found during the white paper study. • Three priority research themes and application/innovation scenarios. • Can put Europe ahead of its competitors in this technology area. • 190+ contributors. • Final version ready today! • SRA will be presented to the EC and national bodies.

  15. Strategic Research Agenda

  16. Priority Themes: 3 + 2 • Three Priority Research Themes: • Translation Cloud • Social Intelligence and e-Participation • Socially-Aware Interactive Assistant • Two additional themes: • European Language Technology Platform • Core Technologies for Language Analysis and Production

  17. META-SHARE Open Resource Infrastructure

  18. The power of data • Scientific data has the potential to transform and drastically improve our lives • Evidence from many domains – geo & earth sciences, biotechnology – shows that data & tools become valuable through opening and sharing • Both for research and for technology development & evaluation • Supporting innovative applications • Making the Human Genome Project results accessible leveraged a ~€3 billion R&D investment into ~€500 billion in economic activity • “Alzheimer’s researchers recently pooled genetic data and discovered 5 new genes and important evidence about the disease” • “Data is too valuable to be locked away” http://www.meta-net.eu

  19. Strategic Research Agenda

  20. LRs in the SRA

  21. LRs Discovery? Availability? • According to past and recent studies, only a portion of language resources (LRs) is known / announced / shared / traded / ... • ... despite the fact that data collection, cleaning, annotation, curation and maintenance is a very costly business • To make progress and enable the development of useful applications, we need all the scientific, technical, legal, organisational and societal mechanisms that allow the necessary resources to be shared, recycled and repurposed

  22. META-SHARE rationale • Language resources (data and tools) are dynamic living entities • they evolve over time in various dimensions (quantity, annotation levels, conversion to new formats, addition of new languages) • they are usually the product of collaborative work • they may come with varying restrictions, ... • Need solutions that enable every language resource provider, at any granularity level (individual/lab/organisation), to • Create their own repository of LRs • Describe, document and update LR descriptions • Link to a network of repositories of other providers • Keep track of the use of their LRs, trade LRs, … • Need solutions that enable every language resource consumer to • Discover what LRs suitable for their purposes exist • Get information about them, download / acquire them

  23. META-SHARE: what it is • META-SHARE tries to match the needs and expectations of LR providers and consumers by enhancing the visibility, documentation, identification, availability and preservation of language data and (basic language processing) tools • It launches a long-term multidimensional endeavour by which language resources will contribute to boosting research, technology and innovation through wide availability, pooling, openness and sharing

  24. META-SHARE architecture [Diagram: LR repositories, each with its own inventory, and external repos feed the META-SHARE inventories through metadata harvesting (resources provision services). The META-SHARE portal sits on top with user-oriented and support services: registration – authentication – authorisation, search / browse, licence, download, statistics, mappings, reporting, recommenders, billing / payment.]

  25. META-SHARE provider side • All facilities for creating your own META-SHARE-compliant repository and linking to the META-SHARE network: • Open source repository software • Functionalities for documenting, updating descriptions, storing/linking LRs • Provider support services (helpdesks, forum, knowledge base) • Each repository maintains an inventory with the metadata (MD) of all its LRs and exports the MD for harvesting • Harvested MD are stored in synchronised central servers
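
The harvesting flow on the provider side (local inventory, metadata export, central storage) can be pictured with a small sketch. It is an illustration under stated assumptions only: `MetadataRecord`, `Repository` and `CentralInventory` are hypothetical names, the record fields are invented, and none of this reflects the actual META-SHARE software or its metadata schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified description of one language resource."""
    resource_id: str
    name: str
    language: str
    licence: str


@dataclass
class Repository:
    """A provider-side repository that keeps an inventory of its own metadata."""
    name: str
    inventory: dict = field(default_factory=dict)

    def describe(self, record: MetadataRecord) -> None:
        # Describing a resource again under the same id updates its description.
        self.inventory[record.resource_id] = record

    def export_metadata(self) -> list:
        # Only metadata is exported; the resources themselves stay with the provider.
        return list(self.inventory.values())


@dataclass
class CentralInventory:
    """Stand-in for a central server that stores harvested metadata."""
    records: dict = field(default_factory=dict)

    def harvest(self, repositories) -> int:
        for repo in repositories:
            for record in repo.export_metadata():
                # Keyed by (repository, id) so a re-harvest overwrites stale copies.
                self.records[(repo.name, record.resource_id)] = record
        return len(self.records)


repo_a = Repository("repo-a")
repo_b = Repository("repo-b")
repo_a.describe(MetadataRecord("lr-1", "Example news corpus", "Greek", "CC-BY"))
repo_b.describe(MetadataRecord("lr-1", "Example speech lexicon", "Maltese", "CC-0"))

central = CentralInventory()
first_count = central.harvest([repo_a, repo_b])

# The provider updates a description; the next harvest picks up the change.
repo_a.describe(MetadataRecord("lr-1", "Example news corpus v2", "Greek", "CC-BY"))
second_count = central.harvest([repo_a, repo_b])

print(first_count, second_count, central.records[("repo-a", "lr-1")].name)
```

The sketch ignores deletions and the synchronisation between several central servers; it only shows why keying harvested records by repository and identifier keeps the central copy in step with provider-side updates.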

  26. META-SHARE user side • Users (LR consumers) can: • search the central inventory • browse using multiple facets • access the actual resources by visiting the respective repositories to get legally interoperable licence(s) to download and use them • get support through an online user forum and helpdesks dedicated to technical, metadata and legal issues • access a knowledge base
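
Searching an inventory and browsing it by facets can be sketched in a few lines. This is a toy model under stated assumptions: the inventory entries, the facet names (`language`, `type`, `licence`) and the functions `search`, `facet_counts` and `browse` are all invented for illustration and do not describe the META-SHARE portal's real interface.

```python
from collections import Counter

# Invented inventory entries; real metadata records are far richer.
INVENTORY = [
    {"name": "Example parallel corpus", "language": "Greek", "type": "corpus", "licence": "CC-BY"},
    {"name": "Example speech corpus", "language": "Maltese", "type": "corpus", "licence": "CC-0"},
    {"name": "Example tokeniser", "language": "Greek", "type": "tool", "licence": "BSD"},
    {"name": "Example lexicon", "language": "Irish", "type": "lexicon", "licence": "CC-BY"},
]


def search(records, text):
    """Case-insensitive substring search over resource names."""
    needle = text.lower()
    return [r for r in records if needle in r["name"].lower()]


def facet_counts(records, facet):
    """How many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


def browse(records, **selected):
    """Keep the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


corpora = search(INVENTORY, "corpus")
greek_corpora = browse(corpora, language="Greek")

print(facet_counts(INVENTORY, "language"))
print([r["name"] for r in greek_corpora])
```

Facet counts are what lets a browsing interface show how many resources remain under each value before the user narrows the selection further.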

  27. Join META-SHARE as ... • Third Party Consumers • Associate members • Depositing-only Members • Repository Service Providers: local repositories, hosting (non-local) repositories • Core and User Support Service Providers

  28. Legal provisions for LR sharing • Language Resources Sharing Charter – high level principles • Memorandum of Understanding – aka membership agreement • Licensing templates and deposition agreements • Inclusive mix of open and openness-inspired models • Creative Commons licences (starting with Creative Commons Zero (CC-0) and all possible combinations along the CC differentiation of rights of use) • META-SHARE Commons licences, a fully developed CC-based licensing tool that allows META-SHARE members to make their resources available inside the network only • META-SHARE “No Redistribution” licences, allowing use and exploitation of the Resources while permitting the LR Owner to have full control over the Resource distribution • Software tools and web services are provided either through one of the standard Open Source licences or under a custom commercial licence.

  29. META-SHARE today… • A network of 24 language resources repositories in 19 EU countries, with >1550 LRs • META-SHARE software, open source, under a permissive licence (BSD), to set up a language resource repository • Legal instruments catering for a range of uses • Software-based services for both LR providers and LR consumers • User support services • User Forum • helpdesks • Mapping services to big resource inventories (CLARIN, OLAC, …)

  30. In the immediate future… • More META-SHARE nodes and their language resources will be integrated – integration of ELRA supported initiatives, LRE Map, Language Library • Adoption of the META-SHARE platform and framework by ELRA • Full deployment of the services of META-SHARE members – from software availability, maintenance and technical assistance to language resources storage and preservation as well as support related to metadata and legal issues • Coordination with upcoming initiatives (iCordi, Research Data Alliance, …) • Official launch: 25 January 2013

  31. META-NET Conclusions

  32. Conclusions Our white paper press campaign shows that Europe is extremely interested in and passionate about its languages. Two Parliamentary Questions in the European Parliament addressed the “digital extinction of languages” topic. Now is the time to move forward with a continent-wide, systematic push and to invest in strategic research. A modest investment is required. This push will generate countless opportunities. Horizon 2020 and Connecting Europe Facility can provide sufficient resources to make our visions for Europe’s citizens and economy a reality.

  33. http://www.meta-net.eu

  34. Q/A Thank you very much! office@meta-net.eu http://www.meta-net.eu http://www.facebook.com/META.Alliance
