Problems in Semantic Search

Presentation Transcript


  1. Problems in Semantic Search
     Krishnamurthy Viswanathan and Varish Mulwad
     {krishna3, varish1} AT umbc DOT edu

  2. Agenda
     • Introduction
     • Swoogle
     • Cool things others do
     • Swoogle facts/figures
     • Our ideas
     • References

  3. Why is Semantic Search significant?

  4. Swoogle
     • Swoogle is a search engine for Semantic Web (SW) documents
     • It offers the following services:
       • Search SW ontologies and documents
       • Search SW terms, i.e. URIs that have been defined as classes and properties
       • Provide metadata of SW documents and support browsing the Semantic Web

  5. Swoogle
     • Swoogle supports two relevant query types:
       • Ontology: searches a small collection that consists only of Semantic Web ontologies
       • Document: searches all SW documents; this search space is much larger
     • Swoogle indexes only the document's URL, the terms defined in the document, explicit descriptions of the document, and the namespaces used by the document
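The indexing restriction on slide 5 has a practical consequence: text that appears only inside a document's data is not searchable. The following is a minimal sketch of that idea, not Swoogle's actual implementation (which the slides say uses Lucene). The `SwDocument` class, its field names, and the example document are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SwDocument:
    """Hypothetical record for one Semantic Web document."""
    url: str
    defined_terms: List[str] = field(default_factory=list)
    description: str = ""
    namespaces: List[str] = field(default_factory=list)
    # Literal values inside the document; deliberately NOT indexed below.
    body_literals: List[str] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: List[SwDocument]) -> Dict[str, Set[str]]:
    """Map each token to the URLs of documents whose indexed fields contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        # Only the four fields named on the slide are indexed.
        indexed_text = [doc.url, doc.description, *doc.defined_terms, *doc.namespaces]
        for text in indexed_text:
            for token in tokenize(text):
                index[token].add(doc.url)
    return index


docs = [
    SwDocument(
        url="http://example.org/people.owl",
        defined_terms=["http://example.org/people.owl#Person"],
        description="A small ontology about people",
        namespaces=["http://xmlns.com/foaf/0.1/"],
        body_literals=["Alice Smith"],
    )
]
index = build_index(docs)

print(sorted(index.get("person", set())))  # matched via a defined term
print(sorted(index.get("alice", set())))   # empty: body literals are not indexed
```

In this sketch a search for "person" finds the document through its defined term, while "alice" finds nothing because the literal lives only in the unindexed body.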

  6. Swoogle capabilities
     • Web search:
       • Basic metadata: e.g. url, desc, ns
       • Document metadata: hasEncoding, hasLength, etc.
       • RDF metadata: hasGrammar, hasCntTriple, etc.
       • Advanced search using Lucene features
     • REST-based services: compose an HTTP GET query and retrieve the results as RDF/XML

  7. Examples of REST queries
     • A query is represented as a URL:
       • REST_QUERY ::= SERVICE_URI ? PARAMS
     • Example: search for SW documents that are classified as ontologies (ontoRatio > 0)
       • queryType: e.g. search_swd_ontology
       • searchString: user constructed (see manual)
       • key: e.g. demo
     • http://logos.cs.umbc.edu:8080/swoogle31/q?queryType=search_swd_ontology&searchString=person&key=demo
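The REST_QUERY grammar on slide 7 can be sketched in a few lines of Python. This is an illustration only: the helper name `build_swoogle_query` is hypothetical, the service URI and parameter values are copied from the slide, and the endpoint may no longer be reachable, so the sketch builds the URL without sending the request.

```python
from urllib.parse import urlencode

# Service URI taken from the slide; it may no longer be live.
SERVICE_URI = "http://logos.cs.umbc.edu:8080/swoogle31/q"


def build_swoogle_query(query_type: str, search_string: str, key: str,
                        service_uri: str = SERVICE_URI) -> str:
    """Compose REST_QUERY ::= SERVICE_URI ? PARAMS as a single URL string."""
    params = urlencode({
        "queryType": query_type,        # which service to call
        "searchString": search_string,  # user-constructed search string
        "key": key,                     # key parameter, "demo" on the slide
    })
    return f"{service_uri}?{params}"


url = build_swoogle_query("search_swd_ontology", "person", "demo")
print(url)
```

Using `urlencode` rather than string concatenation means search strings with spaces or reserved characters are escaped correctly. According to slide 6, an HTTP GET on such a URL returns results as RDF/XML.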

  8. Cool things other semantic search engines do …

  9. Sindice
     • Sindice is a Semantic Web search engine created at the Digital Enterprise Research Institute (DERI)
     • Interesting things to note about Sindice:
       • Architecture
       • Indexing

  10. Sindice
     • Sindice builds its architecture on cloud computing paradigms
     • Sindice uses Hadoop / Nutch to distribute crawling across multiple machines
     • Collected data is stored in HBase, a distributed column store

  11. Sindice
     • Sindice indexes based on:
       • Inverse Functional Properties (IFPs)
       • URIs
       • Literals (keywords)
     • IFP: an OWL property whose value uniquely identifies its subject (in effect, a cardinality restriction on the inverse property)
     • Benefit: faster retrieval
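The three lookup keys on slide 11 can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not Sindice's implementation: the `LookupIndex` class and the example documents are hypothetical, and `foaf:mbox` is used as the example IFP because the FOAF vocabulary declares it inverse functional.

```python
from collections import defaultdict

FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
# Assumption for this sketch: the set of known inverse functional properties.
INVERSE_FUNCTIONAL = {FOAF_MBOX}


class LookupIndex:
    """Maps URIs, IFP (property, value) pairs, and keywords to source documents."""

    def __init__(self):
        self.by_uri = defaultdict(set)
        self.by_ifp = defaultdict(set)
        self.by_keyword = defaultdict(set)

    def add_triple(self, doc_url, subject, predicate, obj, obj_is_literal=False):
        """Record which document mentions this triple under each lookup key."""
        self.by_uri[subject].add(doc_url)
        if obj_is_literal:
            for word in obj.lower().split():
                self.by_keyword[word].add(doc_url)
        else:
            self.by_uri[obj].add(doc_url)
        if predicate in INVERSE_FUNCTIONAL:
            # The same (property, value) pair identifies the same entity,
            # even when documents use different subject URIs for it.
            self.by_ifp[(predicate, obj)].add(doc_url)


index = LookupIndex()
# Two hypothetical documents describe the same person under different URIs.
index.add_triple("http://a.example/doc", "http://a.example/alice",
                 FOAF_MBOX, "mailto:alice@example.org")
index.add_triple("http://b.example/doc", "http://b.example/people#a",
                 FOAF_MBOX, "mailto:alice@example.org")
index.add_triple("http://a.example/doc", "http://a.example/alice",
                 "http://xmlns.com/foaf/0.1/name", "Alice Smith", obj_is_literal=True)

print(sorted(index.by_ifp[(FOAF_MBOX, "mailto:alice@example.org")]))
print(sorted(index.by_keyword["alice"]))
```

The IFP lookup returns both documents even though they name the person with different URIs, which is the kind of retrieval a plain URI index cannot provide.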

  12. Watson: A gateway to the Semantic Web
     • From the Knowledge Media Institute at the Open University in the UK
     • Interesting things to note about Watson:
       • Considers implicit semantic relationships
       • Quality of semantic documents
       • "Rich access" to semantic data

  13. Watson
     • Implicit relationships between Semantic Web documents
       • Equivalence (duplicate detection)
     • Quality of semantic documents
     • "Richer" access to semantic data
       • Web interface for humans
       • SPARQL endpoint
       • Java/SOAP and REST APIs

  14. Others
     • Semantic Web Search Engine (SWSE)
       • Pipelined architecture for crawling and indexing
       • Improved index and storage structure
     • Falcons
       • Class subsumption reasoning
       • Includes a triple store

  15. PowerAqua
     • Multi-ontology question answering (QA) system powered by PowerMap and Watson
     • Takes input in the form of natural language (NL) queries:
       • Factual queries that can be expressed as one or more linguistic triples
       • Common wh-questions

  16. PowerAqua
     • Key challenges in answering NL questions:
       • Locating the ontologies relevant to a particular query
       • Identifying semantically sound relationships
       • Combining information from multiple queries

  17. Swoogle facts/figures
     • The search engine components currently run on 4 machines
     • These machines host the crawler, the Lucene index, the MySQL database, etc., and access the NFS
     • Swoogle accesses approximately 20,000 pages every day (these are queued)
     • About 1,731,371 pure SW documents have been discovered

  18. Swoogle facts/figures
     • The Swoogle crawler has a large queue of documents waiting to be crawled and indexed
     • Swoogle accesses metadata and index files over NFS, which slows information retrieval

  19. Our Ideas: Research and Engineering
     • Acquire new hardware
     • Parallelize Swoogle
     • Focus on a particular domain
     • Position Swoogle as a search engine for agents

  20. Our Ideas: Research and Engineering
     • Improve Swoogle's indexing scheme
     • Analyze Swoogle's ranking scheme
     • Make use of Swoogle metadata
     • Improve the usability of the website
     • Google-like services

  21. References
     • Li Ding et al., "Swoogle: A Search and Metadata Engine for the Semantic Web", Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
     • P. Mika, G. Tummarello, "Web Semantics in the Clouds", IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
     • E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, "Sindice.com: A document-oriented lookup index for open linked data", International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
     • Mathieu d'Aquin et al., "Watson: A Gateway for the Semantic Web", Poster session of the European Semantic Web Conference, ESWC 2007.
     • Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, "Searching Semantic Web Objects Based on Class Hierarchies", WWW 2008 Workshop on Linked Data on the Web, 2008.

  22. Questions?