
AN ADAPTATION OF THE VECTOR-SPACE MODEL FOR ONTOLOGY-BASED INFORMATION RETRIEVAL



  1. AN ADAPTATION OF THE VECTOR-SPACE MODEL FOR ONTOLOGY-BASED INFORMATION RETRIEVAL Authors: Pablo Castells, Miriam Fernández, and David Vallet PRESENTED BY: AMALA RANGNEKAR

  2. OVERVIEW • INTRODUCTION • EARLIER MODELS’ ISSUES • PROPOSED SYSTEM • SEMI-AUTOMATIC ANNOTATION • WEIGHTING ANNOTATIONS • ANNOTATION ISSUES • QUERY PROCESSING • RANKING ALGORITHM • ISSUES, COMBSUM SOLUTION • EXPERIMENTS • FINAL OBSERVATIONS • COMPARISON WITH CONVENTIONAL SYSTEM • STRENGTHS • CURRENT ISSUES

  3. INTRODUCTION • Most search engines use keyword-based techniques to return documents in response to user queries. • This approach is essentially Boolean: a document either matches the keywords or it does not (‘yes/no’). • A more intelligent IR approach, using semantic search in combination with the present method, is needed. • Any reasons/examples as to why?

  4. EXAMPLE: a keyword query for ‘US population’ [Fig. 1]

  5. EARLIER MODELS’ ISSUES • The absence of a ‘weight’ for each term in the query. • The ‘RELEVANCE’ of a term is not proportional to its ‘FREQUENCY’. • Not making use of the ‘RARITY’ of a term. HOW WOULD THIS HELP?? E.g., ‘arachnocentric’ (of spiders): a term this rare is highly discriminative, so documents containing it should rank high for queries that use it.
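A minimal sketch of the inverse-document-frequency intuition behind the ‘rarity’ point; the collection sizes below are made up for illustration:

```python
import math

def idf(num_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency: the rarer the term, the higher the weight."""
    return math.log(num_docs / docs_with_term)

# Hypothetical collection of 1,000,000 documents.
print(idf(1_000_000, 500_000))  # common term, e.g. 'the'      -> ~0.69
print(idf(1_000_000, 50))       # rare term, 'arachnocentric'  -> ~9.9
```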

  6. PROPOSED SYSTEM • ‘Conceptual searching’ techniques for heterogeneous KBs have drawbacks. Do you know KIM? (https://www.ontotext.com/sites/default/files/publications/KIM_SAP_ISWC168.pdf) • Ranking: our concern is to rank the documents annotated by the query answers, not the answers themselves.

  7. PROPOSED SYSTEM • Domain Concept: the superclass / base concept (root) of the ontology. • Topic: a ‘property’ of a class, used for classification. • Document: the proxy for the information source to be searched upon.
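A minimal sketch of these three roles as data structures; the Python classes and fields are hypothetical, purely to make the model concrete:

```python
from dataclasses import dataclass, field

@dataclass
class DomainConcept:
    """Root (superclass) of every concept in the domain ontology."""
    name: str
    topic: str | None = None                         # classification 'property'
    labels: list[str] = field(default_factory=list)  # usual text forms (slide 9)

@dataclass
class Document:
    """Proxy for the information source the system actually searches."""
    doc_id: str
    text: str

bank = DomainConcept("Bank", topic="Finance", labels=["bank", "banking house"])
news = Document("d1", "The bank reported record net income this quarter.")
```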

  8. [Fig. 2]

  9. SEMI-AUTOMATIC ANNOTATION • Every Domain Concept instance stores a multi-valued property called ‘label’ (the most usual text forms of the instance). • Whenever an occurrence of a label is found, an annotation is created linking the instance and the document. Instance → Annotation → Document [Fig. 3]
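A minimal sketch of this label-lookup step; the structures and names are hypothetical, not the authors’ code:

```python
# Each instance carries a multi-valued 'label' property; an annotation
# links an instance to a document in which one of its labels occurs.
labels = {
    "NASDAQ": ["NASDAQ", "Nasdaq Stock Market"],
    "Picasso": ["Pablo Picasso", "Picasso"],
}

def annotate(doc_id: str, text: str) -> list[tuple[str, str]]:
    """Create (instance, document) annotations by scanning for known labels."""
    lower = text.lower()
    return [(inst, doc_id)
            for inst, forms in labels.items()
            if any(form.lower() in lower for form in forms)]

print(annotate("d1", "Shares on the Nasdaq Stock Market rose today."))
# [('NASDAQ', 'd1')]
```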

  10. WEIGHTING ANNOTATIONS • A ‘weight’ is assigned to every annotation instead of to the doc. • It shows the relevance of the instance to the doc. • The weight is computed by an adaptation of the TF-IDF algorithm. • Weight $d_x$ for any instance $x$ occurring in doc $d$: $d_x = \frac{freq_{x,d}}{\max_y freq_{y,d}} \cdot \log \frac{|D|}{n_x}$

  11. WEIGHTING ANNOTATIONS Adaptation of the TF-IDF algorithm • $freq_{x,d}$: number of occurrences in $d$ of the keywords attached to $x$ • $\max_y freq_{y,d}$: frequency of the most repeated instance in $d$ • $n_x$: number of documents annotated with $x$ • $D$: the set of all documents in the search space
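A minimal sketch of the weight formula above as code; the toy numbers are made up:

```python
import math

def annotation_weight(freq_xd: int, max_freq_d: int,
                      num_docs: int, docs_with_x: int) -> float:
    """d_x = (freq_{x,d} / max_y freq_{y,d}) * log(|D| / n_x)."""
    return (freq_xd / max_freq_d) * math.log(num_docs / docs_with_x)

# Instance x occurs 3 times in d; the most repeated instance occurs 5 times;
# x annotates 10 of the 1,000 documents in the search space.
print(annotation_weight(3, 5, 1000, 10))  # ~2.76
```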

  12. ANNOTATION ISSUES • METONYMY (e.g., Table Tennis = Ping Pong). SOLUTION?? • Extending labeling schemes. UNRESOLVED ISSUES: • SYNECDOCHE (e.g., ‘Picasso … the painter also …’: later references by description are not matched) • Counting imprecision

  13. QUERY PROCESSING RDQL queries are used to express: • Ontology instances • Document properties • Classification values Variables can be weighted: • Manually • Automatically
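A hedged sketch of what such an RDQL query might look like for the bank example used later (slide 19). Only the RDQL syntax is real; the finance vocabulary (ont:Bank, ont:tradesOn, ont:netIncome) is invented for illustration:

```
SELECT ?news
WHERE (?news, <rdf:type>, <ont:NewsItem>),
      (?news, <ont:about>, ?bank),
      (?bank, <rdf:type>, <ont:Bank>),
      (?bank, <ont:tradesOn>, <ont:NASDAQ>),
      (?bank, <ont:netIncome>, ?income)
AND ?income > 2000000
USING ont FOR <http://example.org/finance#>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
```

Each SELECT variable can then carry a weight (set manually or automatically) when the results are fed into the ranking step.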

  14. QUERY PROCESSING [Fig. 4]

  15. RANKING ALGORITHM Semantic similarity value between query and doc: • O: the set of all classes & instances in the ontology • D: the set of all documents • $q_x$: component of the extended query vector • $V_q$: the set of variables in the SELECT clause of q [Diagram: ranking / retrieval / annotation]

  16. RANKING ALGORITHM • w: weight vector (values in [0, 1]) • T: tuples in the query result set • D: doc search space • $d_x$: weight of the annotation of doc d with instance x • q ∈ Q: an RDQL query • Similarity (cosine of the document and extended query vectors): $sim(d, q) = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}|\,|\vec{q}|}$
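A minimal sketch of that cosine similarity over sparse instance-weight vectors; the documents, instances, and weights are toy data:

```python
import math

def cosine_sim(d: dict[str, float], q: dict[str, float]) -> float:
    """sim(d, q) = (d . q) / (|d| |q|) over sparse instance-weight vectors."""
    dot = sum(w * q.get(x, 0.0) for x, w in d.items())
    norm = (math.sqrt(sum(w * w for w in d.values()))
            * math.sqrt(sum(w * w for w in q.values())))
    return dot / norm if norm else 0.0

doc = {"NASDAQ": 2.1, "CitiBank": 1.4}    # annotation weights d_x
query = {"NASDAQ": 1.0, "CitiBank": 1.0}  # extended query vector
print(round(cosine_sim(doc, query), 3))   # 0.981
```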

  17. ISSUES, COMBSUM SOLUTION • Normalizing is required. • An incomplete KB results in a lower similarity value even for relevant docs. • The method needs to be combined with a keyword-based algorithm. Any suggestions for solutions?? • CombSUM: sum the normalized scores of the semantic and keyword-based searches.
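A minimal sketch of CombSUM fusion: min-max normalize each system’s scores, then add them per document. The runs below are made up:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one system's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def comb_sum(*runs: dict[str, float]) -> dict[str, float]:
    """CombSUM: the fused score of a doc is the sum of its normalized scores."""
    fused: dict[str, float] = {}
    for run in map(normalize, runs):
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused

semantic = {"d1": 0.9, "d2": 0.2, "d3": 0.0}  # ontology-based similarities
keyword = {"d1": 4.0, "d2": 7.0, "d3": 1.0}   # keyword-based scores
print(comb_sum(semantic, keyword))  # {'d1': 1.5, 'd2': 1.222..., 'd3': 0.0}
```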

  18. EXPERIMENTS KIM domain ontology and KB. Complete KB includes: • 281 classes • 138 properties • 35,689 instances. Automatic generation of concept-keyword mapping: • 3 × 10⁶ annotations • Average observed response time below 30 sec • Weight of query variables set to 1

  19. QUERY A: News about banks that trade on NASDAQ, with fiscal net income > 2 million dollars. Keyword-based: • Limited expressive power • Fails to express the query condition. Semantic search: • Handles the condition • Annotates relevant instances. Ontology: • KB is large, but not massive. • The KB doesn’t contain all banks, hence precision is lower at 100% recall. [Fig. 5]

  20. QUERY B: News about telecom companies. Keyword-based: • Performs better here, as the KB contains few telecom instances. Semantic: • Since the keyword-based result is better, the linear combination value is also better. Ontology: • Low precision • KB incomplete. [Fig. 6]

  21. QUERY C: News about insurance companies in the USA. Ontology: • Performance is hurt by incorrect annotations (‘Kaye’ is both a company and a person’s name). Semantic: • Since the keyword-based result is better, the linear combination value is also better. [Fig. 7]

  22. FINAL OBSERVATION • An average comparison of the system over 20 queries. • Results: situations where ontology-only search performs badly are compensated for on average. [Fig. 8]

  23. COMPARISON WITH CONVENTIONAL SYSTEM [Fig. 9]

  24. STRENGTHS • Better recall: query for specific instances using class hierarchies & rules. • Better precision: using weights, reducing ambiguities (by extending labels), and using structured semantic queries. • Combination of conditions on concepts. Better results: • With an increase in the number of clauses in the formal query • With a complete, high-quality KB

  25. CURRENT ISSUES Further work needed, as follows: • Automatic annotation. • Advanced NLP to replace human supervision. • Score combination strategy. • Model extension with a profile of user interests for personalized search. ANY MORE?

  26. THANK YOU FOR LISTENING. ANY QUESTIONS?
