
Prof. Robert Wyatt

Metadata Domain-Knowledge Driven Search Engine in “HyperManyMedia” E-learning Resources. University of Louisville, Dept. of Computer Engineering and Computer Science & Western Kentucky University, Office of Distance Learning. Prof. Robert Wyatt.


Presentation Transcript


  1. Metadata Domain-Knowledge Driven Search Engine in “HyperManyMedia” E-learning Resources. University of Louisville, Dept. of Computer Engineering and Computer Science & Western Kentucky University, Office of Distance Learning. Prof. Robert Wyatt

  2. Overview • Motivation & Insight • What is the problem? Why is it an interesting problem? • Proposed Architecture • System Implementation • Evaluation Methodology • Evaluation Measures • Results • Conclusion and Future Work

  3. Motivation “Nearly 30% of all U.S. higher education students were taking at least one online course in the fall of 2007, a nearly 20% increase over the number reported the previous year” http://www.sloan-c.org/publications/survey/

  4. Insight of the Project

  5. Insight of the Project

  6. Insight of the Project

  7. Insight of the Project

  8. Insight of the Project

  9. Insight of the Project

  10. Insight of the Project

  11. Insight of the Project

  12. Insight of the Project

  13. Insight of the Project

  14. Insight of the Project

  15. What is the problem? Why is it an interesting problem? • Western Kentucky University hosts “HyperManyMedia”, an open-source repository of lectures. • Thousands of online lectures are available in different formats: text, PowerPoint, audio, video, podcast, vodcast, and RSS. • This web-based platform is the main medium of communication between WKU online faculty and online students.

  16. What is the problem? Why is it an interesting problem? • Searching for a specific college, course name, topic, or media format is time-consuming, and the results are not always accurate. • Searching for combinations of criteria is impossible (e.g., finding all video lectures in the business college related to accounting).

  17. Platform: “HyperManyMedia”

  18. Platform: “HyperManyMedia”

  19. Motivation

  20. Motivation

  21. Proposed Architecture

  22. Proposed Architecture

  23. System Architecture

  24. Proposed Architecture

  25. Proposed Architecture

  26. System Implementation 1) Domain-knowledge Extraction • As of November 2007, more than 2400 lectures from 11 different colleges: “English”, “Social Work”, “History”, “Chemistry”, “Accounting”, “Math”, “Management”, “Consumer and Family Sciences”, “Architect and Manufacturing Sciences”, “Engineering”, and “Communication Disorders” • Each lecture is delivered in six different media formats: • Text • PowerPoint • Audio • Video • Podcast • Vodcast

  27. System Implementation 2) Parsing Learning Objects (Lectures) and Adding Metadata • Parsing the webpages (lectures) • Adding metadata • college name • course name • professor name • lecture name • media format
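The five metadata fields above are what make combined searches possible (for example, all video lectures in the business college related to accounting). A minimal sketch of that idea follows. The `LectureMetadata` class, the `filter_lectures` helper, and the sample records are hypothetical illustrations, not part of the HyperManyMedia code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LectureMetadata:
    """Hypothetical record holding the five metadata fields listed on the slide."""
    college: str
    course: str
    professor: str
    lecture: str
    media_format: str


def filter_lectures(lectures, **criteria):
    """Return lectures whose metadata matches every given field (case-insensitive)."""
    def matches(lec):
        return all(
            getattr(lec, field).lower() == value.lower()
            for field, value in criteria.items()
        )
    return [lec for lec in lectures if matches(lec)]


# Illustrative records only; the names are placeholders, not repository data.
catalog = [
    LectureMetadata("Business", "Accounting", "Professor A", "Lecture 1", "video"),
    LectureMetadata("Business", "Accounting", "Professor A", "Lecture 1", "audio"),
    LectureMetadata("Arts", "English", "Professor B", "Lecture 1", "video"),
]

# Combined query: video lectures in the business college related to accounting.
video_accounting = filter_lectures(
    catalog, college="business", course="accounting", media_format="video"
)
```

Exact matching on structured fields is the simplest possible design. A real index would more likely store these fields alongside the lecture text so that they can also influence ranking.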

  28. System Implementation • Nutch provides search and indexing components together with a powerful fetcher (crawler robot); it is designed to handle crawling, indexing, and searching of several billion frequently updated web pages. • The Nutch search engine was implemented in two stages: • first, as a “Generic” search engine; • second, as an enhanced “Metadata” search engine.

  29. System Implementation • Nutch Scoring is based on a combination of the Vector Space Model (VSM) and the Boolean Model. • It applies the Boolean Model first to select the most relevant documents for the query; then, it uses the Vector Space Model as a content-based ranking algorithm.
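The two-step scoring described above can be sketched roughly as follows: a Boolean filter selects candidate documents, then a vector-space score orders them. This is a simplified illustration, not Nutch's or Lucene's actual formula. The AND semantics, the smoothed idf, and all function names are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def boolean_filter(query_terms, docs):
    """Boolean step: keep documents containing every query term (AND is an assumption)."""
    return {
        doc_id: toks
        for doc_id, toks in docs.items()
        if all(term in toks for term in query_terms)
    }


def tfidf_vector(tokens, idf):
    """Map each term to term frequency times inverse document frequency."""
    counts = Counter(tokens)
    return {term: counts[term] * idf.get(term, 0.0) for term in counts}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, raw_docs):
    """Filter with the Boolean step, then rank the survivors by cosine similarity."""
    docs = {doc_id: tokenize(text) for doc_id, text in raw_docs.items()}
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    idf = {t: math.log(1 + n / count) for t, count in df.items()}  # smoothed idf
    q_terms = tokenize(query)
    candidates = boolean_filter(q_terms, docs)
    q_vec = tfidf_vector(q_terms, idf)
    scored = [
        (cosine(q_vec, tfidf_vector(toks, idf)), doc_id)
        for doc_id, toks in candidates.items()
    ]
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy corpus for illustration only.
docs = {
    "d1": "accounting lecture on balance sheets",
    "d2": "accounting accounting basics",
    "d3": "history of chemistry",
}
ranking = search("accounting", docs)
```

In this toy corpus `d3` never reaches the ranking step because the Boolean filter removes it, which is the division of labor the slide describes.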

  30. System Implementation Search Engine Scoring Mechanism

  31. System Implementation 3) Reconfiguring Search Engine Scoring • Modified Nutch Search Engine's Boosting Mechanism: we changed Nutch's boosting algorithm to accommodate metadata, knowing that Nutch uses (3) • We modified Nutch's boosting score as shown in (4)
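Equations (3) and (4) are not reproduced in this transcript, so the authors' modified formula cannot be shown here. As a rough illustration of what "boosting to accommodate metadata" can mean in general, the sketch below weights matches in metadata fields more heavily than matches in body text. The field names, the boost values, and the scoring rule are all assumptions made for illustration.

```python
# Hypothetical per-field boosts; these values are assumptions, not the authors' settings.
FIELD_BOOSTS = {
    "content": 1.0,
    "college": 2.0,
    "course": 2.0,
    "professor": 1.5,
    "lecture": 1.5,
    "media_format": 1.5,
}


def boosted_score(query_terms, doc_fields, boosts=FIELD_BOOSTS):
    """Sum, over fields, of boost times the fraction of query terms found in that field."""
    if not query_terms:
        return 0.0
    score = 0.0
    for field, text in doc_fields.items():
        tokens = set(text.lower().split())
        hits = sum(1 for term in query_terms if term.lower() in tokens)
        score += boosts.get(field, 1.0) * hits / len(query_terms)
    return score


# A document whose metadata also matches outranks one matching in body text only.
with_metadata = boosted_score(
    ["accounting"], {"content": "intro to accounting", "course": "accounting"}
)
content_only = boosted_score(["accounting"], {"content": "accounting notes"})
```

Under this assumed rule, a lecture tagged with a matching course name scores higher than one that merely mentions the term in its text.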

  32. System Implementation Designing and Embedding the Parser, Indexer, and Query Plugins

  33. System Implementation 4) Encapsulating the Metadata Search Engine within the “HyperManyMedia” Platform

  34. System Implementation 4) Encapsulating the Metadata Search Engine within the “HyperManyMedia” Platform

  35. Evaluation Methodology Research Questions: 1) Will there be an increase in precision when using the metadata search engine compared to the generic search engine? 2) Will relevant documents be ranked higher when using the metadata search engine?

  36. Evaluation Methodology • Selection of Queries: • A great deal of research on search engine queries has found that searchers rarely use Boolean operators; typically, this usage is around 10%. Another study observed that most queries contain between 1 and 3 terms, and these are primarily noun phrases. • Accordingly, we ran our comparison between the two search engines (generic and metadata) based on “single-term”, “two-term”, and “three-term” queries without Boolean operators. • The queries were taken from query logs of our “HyperManyMedia” search engine, collected during two semesters (fall & winter terms in 2007-2008).

  37. Evaluation Measures • Most ranking algorithms are evaluated for ranking quality based on precision and recall. • One of the limitations of the recall measure is the difficulty of counting the number of relevant documents in the corpus. • We used SEREET, a recently proposed algorithm for measuring ranking efficiency. • This algorithm evaluates the performance of search engines based on a comparison between the order of relevant documents and retrieved documents. • This algorithm starts with 100 points at the top of the ranking and deducts points each time that a relevant document is not found.
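The slide gives only the outline of SEREET (start at 100 points, deduct for each result that is not relevant), not the deduction schedule. The sketch below fills that gap with an assumed rank-weighted penalty so that misses near the top of the list cost more. It is a SEREET-like illustration under that assumption, not the published algorithm, and the function name is hypothetical.

```python
def sereet_like_score(ranked_ids, relevant_ids):
    """Start at 100 and deduct a rank-weighted penalty for each non-relevant result.

    The penalty schedule is an assumption: position i of n costs (n - i + 1)
    units, scaled so that a list with no relevant results ends at 0.
    An empty result list is scored 0 by convention here.
    """
    n = len(ranked_ids)
    if n == 0:
        return 0.0
    total_weight = n * (n + 1) / 2
    score = 100.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id not in relevant_ids:
            score -= 100.0 * (n - position + 1) / total_weight
    return score


# The same number of misses costs more when the miss is at the top of the list.
miss_at_top = sereet_like_score(["a", "b", "c"], {"b", "c"})
miss_at_bottom = sereet_like_score(["a", "b", "c"], {"a", "b"})
```

Unlike plain precision, a score of this kind separates two result lists with the same number of relevant documents but a different ordering, which matches the second research question.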

  38. Evaluation Methodology 2) Precision: Precision is the ratio of the number of relevant documents retrieved to the total number of retrieved documents (5)
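Equation (5) is not reproduced in this transcript. The standard definition of precision, which matches the slide's wording, can be written as follows; the set notation is ours, not necessarily the authors'.

```latex
\[
\text{Precision} =
  \frac{\left|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}\right|}
       {\left|\{\text{retrieved documents}\}\right|}
\]
```

For example, if a query returns 10 documents and 8 of them are relevant, the precision is 8/10 = 0.8.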

  39. Evaluation measures 3) Selection of Ranking Algorithm:

  40. Precision Results Will there be an increase in precision when using the metadata search engine? We found that the metadata-driven search engine has a significant impact on the precision with overall precision values equal to 0.810 (for single-term queries), 0.856 (for two-term queries), and 0.925 (for three-term queries), compared to 0.619 (for single-term queries), 0.717 (for two-term queries), and 0.851 (for three-term queries) for the generic search engine.

  41. Precision Results

  42. SEREET Results (2) Will relevant documents be ranked higher when using a metadata search engine? We found that the metadata-driven search engine has a significant impact on the ranking performance, with overall SEREET values equal to 0.803 (for single-term queries), 0.846 (for two-term queries), and 0.914 (for three-term queries), compared to 0.597 (for single-term queries), 0.684 (for two-term queries), and 0.834 (for three-term queries) for the generic search engine.
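To make the reported numbers easier to compare, the short sketch below computes the absolute and relative gain of the metadata engine over the generic engine for each query length. The values are those reported on the precision and SEREET slides. The `gains` helper and the dictionary layout are illustrative.

```python
# Overall values as reported on the slides.
RESULTS = {
    "precision": {"metadata": [0.810, 0.856, 0.925], "generic": [0.619, 0.717, 0.851]},
    "sereet": {"metadata": [0.803, 0.846, 0.914], "generic": [0.597, 0.684, 0.834]},
}
QUERY_LENGTHS = ["single-term", "two-term", "three-term"]


def gains(metric):
    """Absolute and relative gain of metadata over generic, per query length."""
    rows = {}
    values = zip(QUERY_LENGTHS, RESULTS[metric]["metadata"], RESULTS[metric]["generic"])
    for label, metadata, generic in values:
        rows[label] = {
            "absolute": round(metadata - generic, 3),
            "relative_pct": round(100 * (metadata - generic) / generic, 1),
        }
    return rows


precision_gains = gains("precision")
sereet_gains = gains("sereet")
```

On both measures the absolute gain is largest for single-term queries and shrinks as queries get longer, which suggests the metadata helps most when the query itself carries little context.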

  43. SEREET Results

  44. Conclusion • In this work, we presented a metadata domain-knowledge driven search engine for “HyperManyMedia” E-learning resources. • Our Precision and SEREET ranking results showed a significant improvement in retrieving resources relevant to the submitted queries when we used the metadata search engine.

  45. Future Work • A hybrid metadata and semantically enriched search engine, built on top of the domain knowledge (of E-learning) • Personalized ontology-based learner profiles • Visualization of online student communities with their associated learning objects and their relationships.

  46. Questions?
