Cooperation for Arabic Language Resources and Tools – The MEDAR Project

Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa Yaseen. Presented by: Bente Maegaard, University of Copenhagen, Co-ordinator of MEDAR.

Presentation Transcript


  1. Cooperation for Arabic Language Resources and Tools – The MEDAR Project
     Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa Yaseen
     Presented by: Bente Maegaard, University of Copenhagen, Co-ordinator of MEDAR

  2. MEDAR: Background and mission
     Mission
     • Support the development of language technology, language resources and tools for the Arabic language
     • Important for the people, the economy and the culture in the Arab countries
     But current efforts are too small and too fragmented.
     • MEDAR is funded by the European Commission and focuses on the Mediterranean area, but our scope for collaboration is much broader (all Arab countries, all continents), and we also want to include other Semitic languages in the future.

  3. MEDAR partners
     • University of Copenhagen, Denmark (coord.)
     • ELDA, France
     • University of Balamand, Lebanon
     • Al-Ahlyya Amman University, Jordan
     • Universiteit Utrecht, The Netherlands
     • ILSP - Athena, Greece
     • RDI, Egypt
     • Birzeit University, West Bank and Gaza Strip
     • ENSIAS, University of Mohammed V Soussi, Morocco
     • CEA, France
     • CNRS, France
     • The Open University, United Kingdom
     • Université Lumière Lyon 2, France
     • IBM, Egypt
     • Sakhr, Egypt

  4. MEDAR Objectives and ‘streams’
     1) Technical stream
        • Survey of players, projects, products
        • BLARK for Arabic
        • Focus on multilingual tools, develop MT
     2) Roadmap stream
        • Cooperation roadmap
        • Network creation
     3) Dissemination stream

  5. Multilingual sub-project • Focus: Machine Translation • English-Arabic • Into Arabic • Important to use Open Source • Education and training

  6. MT system, corpora
     • MOSES was chosen as the MT system
        • Wide community
        • Existing experiments on English-Arabic
        • Previous experience of consortium partners
     • Basic MOSES system developed by Balamand
     • Enhanced system provided by IBM Cairo and Dublin City University
     • Partners collected a parallel corpus and monolingual corpora

  7. Evaluation - 1
     Automatic evaluation
     • 10,000-word evaluation corpus
     • Embedded in a 200,000-word masking corpus
     • Four human translations have been produced and validated
     Human evaluation
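A minimal sketch of how an automatic evaluation of this shape could work, assuming a clipped n-gram precision of the kind used in BLEU-style metrics. The slides do not name the metric, so the scoring function, the segment ids and the function names below are illustrative assumptions, not the MEDAR evaluation package. The idea shown is that systems translate every segment of the masking corpus, while only the hidden evaluation segments are scored, each against several human reference translations.

```python
from collections import Counter


def clipped_precision(hypothesis, references, n=1):
    """Modified n-gram precision of one hypothesis against several references."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp_counts = ngrams(hypothesis.split())
    if not hyp_counts:
        return 0.0
    # For each n-gram, keep the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split()).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each hypothesis count so repeated words cannot inflate the score.
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


def score_masked(system_output, references, eval_ids):
    """Average the score over the evaluation segments only (hypothetical helper)."""
    scores = [clipped_precision(system_output[i], references[i]) for i in sorted(eval_ids)]
    return sum(scores) / len(scores)


# Toy data: the system translates all segments, but only s1 and s3 are scored.
system_output = {"s1": "the cat sat", "s2": "a dog", "s3": "hello big world"}
references = {
    "s1": ["the cat sat down", "a cat sat"],
    "s3": ["hello there", "hi world"],
}
eval_ids = {"s1", "s3"}
masked_score = score_masked(system_output, references, eval_ids)
```

Because participants cannot tell which segments belong to the evaluation subset, they cannot tune their output to it; that is the usual motivation for a masking corpus.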

  8. Evaluation - 2 • Second evaluation campaign will take place in June • External participants have been invited and expressed interest

  9. Resources for the community • MT systems, the baselines developed in the project will be made publicly available according to the original licenses (MOSES, Giza++ ..) • Training data, through ELRA, fair conditions • Evaluation package, through ELRA, fair conditions

  10. Cooperation roadmap
      Roadmap concept
      • Set goals
      • Define the steps to get there
      • Define timeline
      The MEDAR roadmap covers 3 periods
      • 2010-2012
      • 2012-2014
      • 2013-2015

  11. Elements of the roadmap
      • Players and human resources, education
      • Technology and R&D
      • E-infrastructure: internet penetration, mobile penetration
      • Market
      A few examples are presented here; please refer to the booklet.

  12. Players and human resources, Education
      Players need a skilled work force, and there are not enough HLT experts
      • We need HLT-enabled professionals
      • Typically one could add
         • Linguistics, phonetics, language or speech processing to engineers’ education
         • Computing, machine learning, language or speech processing to linguists’ education
      • Do this in collaboration with other universities in the region, and with e.g. universities in Europe or the US

  13. Players and human resources, Education - 2
      • Staff exchange
      • Student grants
      • Participation of (more) Arabic partners in EU-funded projects
      MEDAR has chosen this as an area to investigate further. Partners will elaborate a cooperation scheme.

  14. Technology • BLARK - Basic building blocks: LR and tools • Reusable • Can be shared with other players • Follow standards • We need more resources and tools for Semitic languages, and they need to be shared. Free or cheap. • Essential for education, research and first development

  15. Technology - 2
      Driving applications
      • Fight illiteracy through HLT: speech-enabled software etc.
      • Collaborate to make this happen
      • Governments could introduce eGovernment etc.
      • Many basic technologies are needed
      • Discussion ongoing with other parties
         • Agree what they are
         • Agree on distribution of tasks, if possible

  16. E-infrastructure - Internet users

  17. Penetration rates

  18. Market
      Important factors
      • Piracy (38% worldwide, 60% in Middle-East and Africa)
      • Fight piracy (this is ongoing)
      • Provide IT services, not products which can be copied

  19. Conclusions
      • Long-term goal of MEDAR
         • Create better conditions for the development of language and speech technology for Arabic, in order to support the people, the culture, the economy
         • Through collaboration and networking
      • Therefore we welcome all comments and invite broad cooperation
         • Not only for Arabic, but also for other Semitic languages
         • And also with partners outside the EU/Mediterranean Arabic countries

  20. MEDAR: Mediterranean Arabic Language and Speech Technology
      Acknowledgement: All MEDAR partners
      See the full Roadmap report and other information at www.medar.info