
WP2: Metadata generation and glossary creation in eLearning

Lothar Lemnitzer. Review meeting, 29 August 2008, Luxembourg.


Presentation Transcript


  1. WP2: Metadata generation and glossary creation in eLearning. Lothar Lemnitzer. Review meeting, 29 August 2008, Luxembourg

  2. Outline • Achievements at M24 • Improvement of tools, second cycle

  3. Achievements (1) Achievements reached in the first two years of the project: • Corpora of learning objects in eight languages, with linguistic annotation, keywords marked and definitions marked • Keyword extractor for metadata generation (KWE) • Definition extractor for glossary creation (GCD)

  4. Achievements (2) • Integration of the tools into the ILIAS LMS via web services • Quantitative evaluation of the corpora and tools • Validation of the tools in user-centered usage scenarios for all languages (first round)

  5. Activities of the final phase • Improve the performance of the tools • Implement / integrate linguistic processing chains • Proper embedding of the tools in the context of teaching and learning (WP5) • Documentation

  6. Key Word Extractor • Implemented an additional distribution measure, Average Reduced Frequency (ARF), which measures word commonness • Implemented a voting mechanism, where each method has a vote and keyword candidates are ordered by votes • Slight modifications to the language models
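The two additions on the Key Word Extractor slide can be illustrated with a minimal sketch. The slides do not give the formula or the voting details, so this assumes the commonly cited definition of Average Reduced Frequency (cyclic distances between occurrences, each capped at the average distance) and a simple scheme in which every method votes for its top-k candidates. The function names, the method names and the tie-breaking rule are hypothetical and are not taken from the project code.

```python
from collections import Counter


def average_reduced_frequency(positions, corpus_size):
    """ARF of one word, given the token positions where it occurs.

    Assumed definition: with f occurrences in a corpus of N tokens,
    v = N / f is the average distance between occurrences and
    ARF = (1 / v) * sum(min(d_i, v)) over the cyclic distances d_i.
    A word spread evenly over the corpus gets ARF close to f, while
    a word clustered in one place gets ARF close to 1.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    positions = sorted(positions)
    v = corpus_size / f
    # The first distance wraps around from the last occurrence to the first.
    distances = [positions[0] + corpus_size - positions[-1]]
    distances += [b - a for a, b in zip(positions, positions[1:])]
    return sum(min(d, v) for d in distances) / v


def rank_by_votes(rankings, top_k):
    """Order candidates by how many methods place them in their top_k.

    `rankings` maps a (hypothetical) method name to its ranked candidates.
    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    votes = Counter()
    for ranked_candidates in rankings.values():
        for candidate in ranked_candidates[:top_k]:
            votes[candidate] += 1
    return sorted(votes, key=lambda c: (-votes[c], c))


if __name__ == "__main__":
    # Evenly spread word vs. a word clustered in one passage (N = 100).
    print(average_reduced_frequency([0, 25, 50, 75], 100))
    print(average_reduced_frequency([10, 11, 12, 13], 100))

    # Invented rankings from three hypothetical scoring methods.
    rankings = {
        "method_a": ["ontology", "metadata", "learning"],
        "method_b": ["metadata", "ontology", "course"],
        "method_c": ["metadata", "glossary", "ontology"],
    }
    print(rank_by_votes(rankings, top_k=2))
```

Under this definition, both words occur four times, but only the evenly spread one keeps a score near its raw frequency, which is why ARF can serve as a signal of how "common" a word is across a text.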

  7. Key Word Extractor – Lexical Chaining Outcome of the German pilot experiment: • Wordnets are too general to capture the LO-specific vocabulary • The domain ontology is too specific to generate strong chains • A combination of both resources is desirable However, the risk of failing to improve the results led us to abandon these experiments

  8. Glossary Candidate Detector • Used machine learning to improve the precision of the tool (4 languages) • Machine learning methods: Naive Bayes classifier, Balanced Random Forests • Machine learning as the single method (Polish) vs. ML as a post-processing step (Dutch, Portuguese) • ML for the most frequent definition types (Dutch, Portuguese) vs. for all definition types (Polish)
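The "ML as a post-processing step" setup can be sketched as follows: a rule-based grammar proposes definition candidates, and a classifier then discards the ones it considers non-definitions, trading some recall for precision. This is only an illustration under stated assumptions. It uses a small multinomial Naive Bayes with add-one smoothing written in plain Python; the feature names, the training examples and the class name `NaiveBayesFilter` are invented for the example and do not reflect the project's actual features or implementation.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial Naive Bayes over symbolic features (add-one smoothing)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = {True: Counter(), False: Counter()}
        self.vocabulary = set()

    def train(self, examples):
        # Each example is (list of feature strings, is_definition flag).
        # Both classes must be present in the training data.
        for features, is_definition in examples:
            self.class_counts[is_definition] += 1
            self.feature_counts[is_definition].update(features)
            self.vocabulary.update(features)

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
        for feature in features:
            score += math.log((self.feature_counts[label][feature] + 1) / denominator)
        return score

    def is_definition(self, features):
        return self.log_score(features, True) > self.log_score(features, False)


def filter_candidates(candidates, classifier):
    """Keep only the grammar-proposed candidates the classifier accepts."""
    return [c for c in candidates if classifier.is_definition(c["features"])]


if __name__ == "__main__":
    # Invented training data: hypothetical features of copula sentences.
    training = [
        (["copula=is", "np_first", "indef_article_after_verb"], True),
        (["copula=is", "np_first", "indef_article_after_verb"], True),
        (["copula=is", "pronoun_first"], False),
        (["copula=is", "np_first", "adjective_after_verb"], False),
    ]
    classifier = NaiveBayesFilter()
    classifier.train(training)

    # Candidates as a rule-based grammar might propose them (invented).
    candidates = [
        {"text": "A browser is a program that displays web pages.",
         "features": ["copula=is", "np_first", "indef_article_after_verb"]},
        {"text": "It is important to save your work.",
         "features": ["copula=is", "pronoun_first"]},
    ]
    for kept in filter_candidates(candidates, classifier):
        print(kept["text"])
```

When ML is used as the single method instead, the classifier would be applied to every sentence rather than only to grammar-matched candidates. Balanced Random Forests, the other method named on the slide, are not shown here.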

  9. GCD Evaluation – Dutch

  10. GCD Evaluation – Polish

  11. GCD Evaluation – Portuguese, Copula-Definitions

  12. Glossary Candidate Detector - Conclusions • ML (in combination with rule-based grammars) achieves state-of-the-art results for all languages • The approach to be chosen depends on whether high precision or high recall is preferred • The ML method and the careful choice of features are critical to success
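The precision/recall trade-off mentioned in the conclusions can be made concrete with the standard F-beta measure, where beta > 1 weights recall more heavily and beta < 1 weights precision more heavily. The counts below are invented purely for illustration and are not results from the project's evaluation.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from raw counts of a definition detector."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Invented counts: a strict filter that keeps few false positives
    # but misses half of the true definitions.
    p, r = precision_recall(true_positives=40, false_positives=10, false_negatives=40)
    print(f"precision={p:.2f} recall={r:.2f}")
    print(f"F1={f_beta(p, r, 1.0):.3f}  F2={f_beta(p, r, 2.0):.3f}")
```

A glossary tool whose output is reviewed by a tutor might favour recall (a higher beta), since discarding a wrong candidate is cheaper than finding a missed definition by hand; this is a design consideration, not a claim made on the slides.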

  13. Linguistic Processing Chain • Languages: Czech, Dutch, English, Polish, Portuguese and Romanian • Rationale: enables us to add new documents in these languages • Based on the linguistic processing tools of the partners • Integrated into ILIAS (WP4)

  14. Documentation • Javadoc documentation of the code (available from the website and SourceForge) • Documentation of the command-line interface of the stand-alone tools • Documentation of the integration procedure and the integrated tools / interface (WP4)

  15. Thank you for your attention

  16. Demo We simulate a tutor who adds a learning object and then generates and edits additional data
