Semantic Search

Andisheh Keikha, Ryerson University. Ebrahim Bagheri, Ryerson University. May 7th, 2014.

Presentation Transcript


  1. Semantic Search • Andisheh Keikha, Ryerson University • Ebrahim Bagheri, Ryerson University • May 7th, 2014

  2. Outline Search Process Query Processing Document Ranking Search Result Clustering and Diversification What is the Goal Contributions

  3. Search Process • Simple search • Query: keywords • Find documents which have those keywords • Rank them based on query • Result: ranked documents
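
The simple search process on this slide can be sketched as follows. This is a minimal illustration, not the presenters' system: the whitespace tokenizer, the "any keyword matches" rule, and the occurrence-count score are all assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def simple_search(query, documents):
    """Return (doc_id, score) pairs for documents containing any query keyword.

    The score is the total number of query-keyword occurrences in the
    document, an assumed stand-in for a real ranking function.
    """
    keywords = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[k] for k in keywords)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties are broken by doc_id for a stable order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy collection.
docs = {
    "d1": "gain weight with a balanced diet",
    "d2": "weight training builds muscle and weight",
    "d3": "semantic search with ontologies",
}
ranked = simple_search("gain weight", docs)
```

Note that "d3" is dropped because it shares no keyword with the query, which is exactly the limitation that the later slides on query expansion address.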

  4. Outline Search Process Query Processing Document Ranking Search Result Clustering and Diversification What is the Goal Contributions

  5. Query Processing • Query length • Correlated with performance in the search task • A query is a small collection of keywords • Hard to find relevant documents based on only two or three words • Solution • Query reformulation • Query expansion

  6. Query Processing • Relevant documents • WordNet (Synonym, hyponym, …) • … • Disambiguation • Query Expansion • Selection of new terms

  7. Query Processing • Query Expansion • Selection of new terms • Weighting those terms
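
The two steps named on this slide, selecting new terms and weighting them, might look like the sketch below. The `RELATED` dictionary is a hypothetical stand-in for a resource such as WordNet, and the fixed weight of 0.5 for added terms is an assumption, not a value from the presentation.

```python
def expand_query(query, related_terms, expansion_weight=0.5):
    """Return {term: weight}; original terms get 1.0, added terms get less.

    `related_terms` maps a query term to candidate expansion terms
    (hypothetical data; a real system would consult a thesaurus or ontology).
    """
    original = query.lower().split()
    weights = {term: 1.0 for term in original}
    for term in original:
        for new_term in related_terms.get(term, []):
            # setdefault never down-weights a term already in the query.
            weights.setdefault(new_term, expansion_weight)
    return weights


# Hypothetical related-term table.
RELATED = {"weight": ["mass", "fat"], "gain": ["increase"]}
expanded = expand_query("gain weight", RELATED)
```

Keeping the original terms at a higher weight than the added ones is one common design choice; it limits the drift in meaning that poorly chosen expansion terms can cause.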

  8. Outline Search Process Query Processing Document Ranking Search Result Clustering and Diversification What is the Goal Contributions

  9. Document Ranking • Probabilistic Methods • What is the probability that this document is relevant to this query? • Formula annotations: the event that the document is judged as relevant to the query; the document description
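
The formula from this slide did not survive in the transcript. As a hedged reconstruction of the general idea only, write R for the event that the document is judged relevant to the query and D for the document description (both symbols are assumptions). Probabilistic methods rank documents by P(R | D) or, equivalently, by the odds of relevance, which Bayes' rule rewrites as:

```latex
\[
  \frac{P(R \mid D)}{P(\bar{R} \mid D)}
  = \frac{P(D \mid R)\, P(R)}{P(D \mid \bar{R})\, P(\bar{R})}
\]
```

Since the prior odds P(R) / P(\bar{R}) are the same for every document under a given query, the ranking is determined by the likelihood ratio P(D | R) / P(D | \bar{R}).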

  10. Document Ranking • Language Models • What is the probability of generating query Q, given document d with language model M_d? • Formula annotation: maximum likelihood estimate of the probability
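
A minimal sketch of query-likelihood scoring with a unigram language model follows. The maximum likelihood estimate is the term count divided by the document length. The interpolation with collection statistics (Jelinek-Mercer smoothing, `lam=0.5`) is an addition assumed here so that a single unseen term does not zero out the whole score; the slide itself only mentions the maximum likelihood estimate.

```python
import math
from collections import Counter


def query_log_likelihood(query, document, collection, lam=0.5):
    """Return log P(Q | M_d) for a smoothed unigram model.

    Assumes `document` and `collection` are non-empty strings and that
    `collection` is the concatenation of all documents.
    """
    doc_tokens = document.lower().split()
    coll_tokens = collection.lower().split()
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(coll_tokens)
    log_p = 0.0
    for term in query.lower().split():
        p_doc = doc_counts[term] / len(doc_tokens)      # maximum likelihood estimate
        p_coll = coll_counts[term] / len(coll_tokens)   # background estimate
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term occurs nowhere in the collection
        log_p += math.log(p)
    return log_p


# Hypothetical two-document collection.
d1 = "gain weight fast"
d2 = "lose weight fast"
collection = d1 + " " + d2
score_d1 = query_log_likelihood("gain weight", d1, collection)
score_d2 = query_log_likelihood("gain weight", d2, collection)
```

Documents are then ranked by descending score; here `d1` outranks `d2` because it actually contains "gain".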

  11. Outline Search Process Query Processing Document Ranking Search Result Clustering and Diversification What is the Goal Contributions

  12. Search Result Clustering and Diversification

  13. Outline Search Process Query Processing Document Ranking Search Result Clustering and Diversification What is the Goal Contributions

  14. What is the Goal • Searching on Google

  15. What is the Goal • Searching on Google • I want all of these searches to show the same results, since they have the same meaning, and the user's intent when searching for one of them is to learn about all of them.

  16. Outline • Search Process • Query Processing • Document Ranking • Search Result Clustering and Diversification • What is the Goal • Contributions • Query Expansion • Query Expansion(Tasks to Decide) • Document Ranking

  17. Contributions • How? • New Semantic Query Expansion Method • New Semantic Document Ranking Method

  18. Outline • Search Process • Query Processing • Document Ranking • Search Result Clustering and Diversification • What is the Goal • Contributions • Query Expansion • Query Expansion(Tasks to Decide) • Document Ranking

  19. Query Expansion • Example: “Gain Weight” • Desirable keywords in expanded query: “Gain, weight, muscle, mass, fat” • Diagram: “Gain weight” linked to “Mass”, “Fat”, and “Muscle” • What are these relations?

  20. Query Expansion • Digging in DBpedia and Wikipedia • http://en.wikipedia.org/wiki/Weight_gain • http://dbpedia.org/page/Muscle • http://dbpedia.org/page/Adipose_tissue
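
One way to "dig" in DBpedia programmatically is to ask its public SPARQL endpoint for everything linked from a resource. The sketch below only builds the query string and makes no network call. The helper name `build_neighbour_query` is hypothetical, and the mapping from a `dbpedia.org/page/...` URL to the corresponding `dbpedia.org/resource/...` IRI is the standard DBpedia convention rather than something stated in the slides.

```python
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"  # public endpoint


def build_neighbour_query(resource_name, limit=50):
    """Build a SPARQL query listing the properties of one DBpedia resource.

    Only object values that are IRIs (other entities) are kept, since
    those are the candidates for query expansion.
    """
    resource_iri = f"http://dbpedia.org/resource/{resource_name}"
    return (
        "SELECT ?property ?value WHERE { "
        f"<{resource_iri}> ?property ?value . "
        "FILTER(isIRI(?value)) "
        f"}} LIMIT {limit}"
    )


query = build_neighbour_query("Muscle", limit=10)
```

The resulting string can be sent to `DBPEDIA_SPARQL_ENDPOINT` with any HTTP or SPARQL client; which returned properties are worth following is exactly the open question raised on the following slides.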

  21. Outline • Search Process • Query Processing • Document Ranking • Search Result Clustering and Diversification • What is the Goal • Contributions • Query Expansion • Query Expansion(Tasks to Decide) • Document Ranking

  22. Query Expansion(Tasks to Decide) • How to map query phrases into Wikipedia components? • Which properties and their related entities should be selected? • Can those properties be selected automatically for each phrase, or should they be fixed for the whole algorithm? • If the selection is automatic, what is the process?

  23. Query Expansion(Tasks to Decide) • Are DBpedia and Wikipedia enough to decide, or should we use other ontologies? • How should we weight the extracted entities (terms, senses) in order to select the expanded query from among them?
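
The weighting question is left open in the slides. One possible answer, shown purely as an assumption, is to give each property type a fixed weight, sum the weights of every property through which a candidate was reached, and keep the top-scoring candidates. The property names and weights below are hypothetical.

```python
from collections import defaultdict


def select_expansion_terms(candidates, property_weights, top_k=3):
    """Score candidate terms and return the `top_k` as (term, score) pairs.

    `candidates` is a list of (term, property) pairs, one per path through
    which the term was extracted; unknown properties contribute 0.
    """
    scores = defaultdict(float)
    for term, prop in candidates:
        scores[term] += property_weights.get(prop, 0.0)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical extraction result for the query "gain weight".
candidates = [
    ("muscle", "link"),
    ("muscle", "category"),
    ("fat", "link"),
    ("mass", "redirect"),
    ("obesity", "category"),
]
weights = {"redirect": 1.0, "link": 0.5, "category": 0.25}
selected = select_expansion_terms(candidates, weights)
```

Fixed per-property weights correspond to the "fixed for the whole algorithm" option on the previous slide; an automatic variant would learn or adapt these weights per phrase.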

  24. Outline • Search Process • Query Processing • Document Ranking • Search Result Clustering and Diversification • What is the Goal • Contributions • Query Expansion • Query Expansion(Tasks to Decide) • Document Ranking

  25. Document Ranking • Are the documents annotated? • Yes • Rank documents using the entities extracted in the query expansion phase. • No • Rank the documents based on the semantics of the expanded query rather than its terms or phrases. • Define probabilities over senses rather than terms in the query and documents.

  26. Document Ranking • Are the documents annotated? • Yes • Rank documents using the entities extracted in the query expansion phase. • No • Rank the documents based on the semantics of the expanded query rather than its terms or phrases. • Define probabilities over senses rather than terms in the query and documents. • Callout: Documents are not annotated, so how?

  27. Document Ranking • Semantic similarity between two non-annotated documents (the expanded query and the document) • There are papers that use the WordNet ontology with a “topic-specific PageRank algorithm” to measure the similarity of two sentences (or phrases or words). • Its application to information retrieval has not been seen yet.

  28. Document Ranking • Semantic similarity between two non-annotated documents (the expanded query and the document) • There are papers that use the WordNet ontology with a “topic-specific PageRank algorithm” to measure the similarity of two sentences (or phrases or words). • Its application to information retrieval has not been seen yet. • Callout: Find the aspects of the different algorithms that are most beneficial in the information retrieval domain (two large documents).

  29. Document Ranking • Semantic similarity between two non-annotated documents (the expanded query and the document) • There are papers that use the WordNet ontology with a “topic-specific PageRank algorithm” to measure the similarity of two sentences (or phrases or words). • Its application to information retrieval has not been seen yet. • Callout: It is more reasonable to apply the algorithm on DBpedia (instead of WordNet), in the entity domain (instead of the sense domain).
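
The general idea behind topic-specific (personalized) PageRank similarity can be sketched as follows: run a random walk that restarts at the entities of one text, do the same for the other text, and compare the two resulting distributions. This is a toy illustration under stated assumptions, not the algorithm of the cited papers or of the presenters: the entity graph is invented, cosine similarity is one of several possible comparison measures, and every node is assumed to appear as a key of `graph`.

```python
import math


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank that teleports only to `seeds`.

    `graph` maps each node to a list of neighbours; all seeds and all
    neighbours are assumed to be keys of `graph`.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for node, neighbours in graph.items():
            if not neighbours:
                continue  # a dangling node simply leaks its mass in this sketch
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


def cosine(u, v):
    """Cosine similarity between two {node: score} vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical entity graph with two disconnected topics.
graph = {
    "weight_gain": ["muscle", "adipose_tissue"],
    "muscle": ["weight_gain", "protein"],
    "adipose_tissue": ["weight_gain"],
    "protein": ["muscle"],
    "search_engine": ["query"],
    "query": ["search_engine"],
}
query_sig = personalized_pagerank(graph, {"weight_gain"})
doc_a_sig = personalized_pagerank(graph, {"muscle", "adipose_tissue"})
doc_b_sig = personalized_pagerank(graph, {"search_engine"})
sim_a = cosine(query_sig, doc_a_sig)
sim_b = cosine(query_sig, doc_b_sig)
```

Document A shares no entity with the query yet still scores above zero, because the walk spreads probability to neighbouring entities; document B lies in a disconnected part of the graph and scores zero.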

  30. Document Ranking • Apply search result clustering and diversification, based on the different semantics of the query.
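
As a minimal sketch of this step, assume each ranked result has already been assigned to one sense of the query (how that assignment is made is outside this example). Clustering then groups results by sense, and diversification interleaves the clusters so that every sense appears near the top. The round-robin strategy and the sense labels are assumptions for illustration.

```python
from itertools import zip_longest


def cluster_and_diversify(ranked_results):
    """Group results by sense, then interleave the groups round-robin.

    `ranked_results` is a list of (doc_id, sense) pairs in relevance order.
    Returns (clusters, diversified_doc_ids).
    """
    clusters = {}
    for doc_id, sense in ranked_results:
        clusters.setdefault(sense, []).append(doc_id)
    diversified = []
    # Each tier holds the i-th best document of every sense cluster.
    for tier in zip_longest(*clusters.values()):
        diversified.extend(doc for doc in tier if doc is not None)
    return clusters, diversified


# Hypothetical results for an ambiguous query, already labelled with senses.
results = [
    ("d1", "animal"),
    ("d2", "animal"),
    ("d3", "car"),
    ("d4", "animal"),
    ("d5", "software"),
]
clusters, diversified = cluster_and_diversify(results)
```

Within each sense the original relevance order is preserved; only the order across senses changes.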

  31. Reference
      1. Selvaretnam, B., and Belkhatir, M. (2011). Natural language technology and query expansion: issues, state-of-the-art and perspectives. Journal of Intelligent Information Systems, 38(3), 709-740.
      2. Carpineto, C., and Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1), 1-50.
      3. Hiemstra, D. (1998). A linguistically motivated probabilistic model of information retrieval. In Research and Advanced Technology for Digital Libraries, pp. 569-584. Springer Berlin Heidelberg.
      4. Sparck Jones, K., Walker, S., and Robertson, S. E. (2000). A probabilistic model of information retrieval: development and comparative experiments, Part 1. Information Processing & Management, 36(6), 779-808.
      5. Sparck Jones, K., Walker, S., and Robertson, S. E. (2000). A probabilistic model of information retrieval: development and comparative experiments, Part 2. Information Processing & Management, 36(6), 809-840.
      6. Di Marco, A., and Navigli, R. (2013). Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 39(3), 709-754.
      7. Pilehvar, M. T., Jurgens, D., and Navigli, R. (2013). Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013).
