
Search Engines for Semantic Web Knowledge

Search Engines for Semantic Web Knowledge. Tim Finin University of Maryland, Baltimore County Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi.


Presentation Transcript


  1. Search Engines for Semantic Web Knowledge. Tim Finin, University of Maryland, Baltimore County. Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi. http://creativecommons.org/licenses/by-nc-sa/2.0/ This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

  2. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  3. Google has made us smarter

  4. But what about our agents? Agents still have a very minimal understanding of text and images. [Figure labels: tell, register]

  5. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  6. XML helps “XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.” -- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.

  7. Semantic Web adds semantics “The Semantic Web will globalize KR, just as the WWW globalized hypertext” -- Tim Berners-Lee

  8. Semantic Web 101 • RDF/XML • rdf:RDF tag • namespaces, ontologies • Semantic graph, URIs as nodes & links • triples
     <?xml version="1.0" encoding="utf-8"?>
     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:foaf="http://xmlns.com/foaf/0.1/"
              xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
       <uni:Student>
         <foaf:name>Li Ding</foaf:name>
         <foaf:mbox rdf:resource="mailto:dingli1@umbc.edu"/>
       </uni:Student>
     </rdf:RDF>
     [Graph labels: rdf:type uni:Student, foaf:name "Li Ding"]
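To make the "triples" bullet on slide 8 concrete, here is a minimal sketch that reads the slide's RDF/XML and lists the triples it encodes. It uses only Python's standard library and handles just this simple node/property shape, not RDF/XML in general. Two assumptions are not in the talk: the `uni` namespace is taken to be `http://ebiquity.umbc.edu/ontologies/uni/` (the slide omits the colon after `http`), and the unnamed student is given a made-up blank-node label such as `_:b0`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's document, with attribute quoting repaired (assumption: uni namespace URI).
RDF_XML = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:dingli1@umbc.edu"/>
  </uni:Student>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def triples(rdf_xml):
    """Return (subject, predicate, object) tuples for a flat RDF/XML document."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    out = []
    for i, node in enumerate(root):
        # Nodes without rdf:about get a hypothetical blank-node label.
        subject = node.get(f"{{{RDF_NS}}}about", f"_:b{i}")
        # A typed node element such as <uni:Student> implies an rdf:type triple.
        out.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            out.append((subject, uri(prop.tag), obj))
    return out


for t in triples(RDF_XML):
    print(t)
```

The document yields three triples: one `rdf:type` link to `uni:Student`, one `foaf:name` literal and one `foaf:mbox` resource. A real application would use a full RDF parser rather than this hand-rolled walk.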

  9. But what about our agents? A Google for knowledge on the Semantic Web is needed by software agents and programs. [Figure labels: Swoogle, tell, register]

  10. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  11. http://swoogle.umbc.edu/ • Running since summer 2004 • 1.4M RDF documents, 250M RDF triples, 10K ontologies

  12. Swoogle Architecture [Figure. Component labels: Discovery (SwoogleBot, Bounded Web Crawler, Google Crawler, Candidate URLs); Analysis (SWD classifier, Ranking, …); Index (IR Indexer, SWD Indexer); Search Services (Web Server serving html to humans, Web Service serving rdf/xml to machines); Semantic Web metadata; document cache; the Web; the Semantic Web. Legend: information flow, Swoogle's web interface.]

  13. A Hybrid Harvesting Framework [Figure. Labels: manual submission; Swoogle sample dataset; inductive learner; seeds R, M and H; meta crawling (Google API calls); bounded HTML crawling; RDF crawling; the Web.]

  14. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  15. Applications and use cases • Supporting Semantic Web developers • Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc. • Searching specialized collections • Spire: aggregating observations and data from biologists • InferenceWeb: searching over and enhancing proofs • SemNews: Text Meaning of news stories • Supporting SW tools • Triple shop: finding data for SPARQL queries

  16. 80 ontologies were found that had these three terms. By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. Let’s look at this one.

  17. Basic Metadata
      hasDateDiscovered: 2005-01-17
      hasDatePing: 2006-03-21
      hasPingState: PingModified
      type: SemanticWebDocument
      isEmbedded: false
      hasGrammar: RDFXML
      hasParseState: ParseSuccess
      hasDateLastmodified: 2005-04-29
      hasDateCache: 2006-03-21
      hasEncoding: ISO-8859-1
      hasLength: 18K
      hasCntTriple: 311.00
      hasOntoRatio: 0.98
      hasCntSwt: 94.00
      hasCntSwtDef: 72.00
      hasCntInstance: 8.00

  18. These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace. All of this is available in RDF form for the agents among us.

  19. Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.

  20. We can also search for terms (classes, properties) like terms for “person”.

  21. 10K terms associated with “person”! Ordered by use. Let’s look at foaf:Person’s metadata.

  22. UMBC Triple Shop • http://sparql.cs.umbc.edu/ • Online SPARQL RDF query processing based on HP’s Jena and Joseki with several interesting features • Selectable level of inference over model • Automatically finds SWDs for given queries using Swoogle backend database • Provides a dataset creation wizard • Datasets can be stored on our server or downloaded • Tag, share and search over saved datasets

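  23. Web-scale semantic web data access [Figure: message flow between an agent and a data access service that indexes RDF data from the Web.] (1) Search vocabulary: the agent sends ask (“person”); the service searches URIrefs in the SW vocabulary and replies inform (“foaf:Person”). (2) Compose query: the agent sends ask (“?x rdf:type foaf:Person”); the service searches URLs in the SWD index and replies inform (doc URLs). (3) Populate RDF database: the agent fetches the docs. (4) The agent queries its local RDF database.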
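The exchange on slide 23 can be sketched as a small in-memory simulation. This is not Swoogle's API: the dictionaries, the function names (`search_vocabulary`, `search_documents`, `fetch`, `answer`) and the `example.org` URLs are hypothetical stand-ins for the service's vocabulary index, its SWD index and the Web.

```python
# Hypothetical stand-ins for the data access service and the Web.
VOCABULARY = {"person": ["foaf:Person"]}  # keyword -> matching terms
SWD_INDEX = {  # term -> URLs of documents that use it
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}
DOCS = {  # URL -> triples found in that document
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
}


def search_vocabulary(keyword):
    """ask("person") -> inform("foaf:Person")"""
    return VOCABULARY.get(keyword.lower(), [])


def search_documents(term):
    """ask("?x rdf:type foaf:Person") -> inform(doc URLs)"""
    return SWD_INDEX.get(term, [])


def fetch(url):
    """Fetch one document from the (simulated) Web."""
    return DOCS[url]


def answer(keyword):
    """Run the whole flow and return subjects typed with a matching term."""
    terms = set(search_vocabulary(keyword))  # step 1: search vocabulary
    local_db = set()
    for term in terms:
        for url in search_documents(term):  # step 2: find relevant documents
            local_db.update(fetch(url))  # step 3: populate the local database
    # Step 4: query the local database only; the service is no longer involved.
    return sorted(s for s, p, o in local_db if p == "rdf:type" and o in terms)


print(answer("person"))
```

The point of the design is that the service only answers "which terms?" and "which documents?"; the agent does its own fetching and querying.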

  24. Who knows Anupam Joshi? Show me their names, email addresses and pictures.

  25. The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles

  26. No FROM clause! Constraints on where the data comes from

  27. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT DISTINCT ?p2name ?p2mbox ?p2pix
      WHERE {
        ?p1 foaf:name "Anupam Joshi" .
        ?p1 foaf:mbox ?p1mbox .
        ?p2 foaf:knows ?p3 .
        ?p3 foaf:mbox ?p1mbox .
        ?p2 foaf:name ?p2name .
        ?p2 foaf:mbox ?p2mbox .
        OPTIONAL { ?p2 foaf:depiction ?p2pix } .
      }
      ORDER BY ?p2name
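Because the query has no FROM clause, something has to decide which documents to load. One plausible first step, sketched here as an assumption rather than a description of how Triple Shop works internally, is to pull the vocabulary terms out of the query so they can be looked up in a document index. The function `query_terms` is hypothetical and uses simple regular expressions, not a real SPARQL parser.

```python
import re

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:name "Anupam Joshi" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name"""

PREFIX_DECL = r"PREFIX\s+(\w+):\s*<([^>]+)>"


def query_terms(query):
    """Return the full URIs of all prefixed names used in a SPARQL query."""
    prefixes = dict(re.findall(PREFIX_DECL, query, flags=re.IGNORECASE))
    # Remove the declarations and string literals so they cannot be mistaken for terms.
    body = re.sub(PREFIX_DECL, "", query, flags=re.IGNORECASE)
    body = re.sub(r'"[^"]*"', "", body)
    names = re.findall(r"\b(\w+):(\w+)", body)
    return sorted({prefixes[p] + local for p, local in names if p in prefixes})


for term in query_terms(QUERY):
    print(term)
```

For the query above this yields the four FOAF properties `depiction`, `knows`, `mbox` and `name`, each expanded to its full URI. A search engine could then return the documents that use those properties as candidate data sources.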

  28. Swoogle found 292 RDF data files that appear relevant to answering our query

  29. Let’s save the dataset before we use it

  30. And tag it so we and others can find it more easily.

  31. Here we are using it to get an answer to “Who knows Anupam Joshi”

  32. He has many friends!

  33. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  34. Will it Scale? How? Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling. We think Swoogle’s centralized approach can be made to work for the next few years if not longer.

  35. How much reasoning? • SwoogleN (N<=3) does limited reasoning • It’s expensive • It’s not clear how much should be done • More reasoning would benefit many use cases • e.g., type hierarchy • Recognizing specialized metadata • e.g., that ontology A maps some terms from B to C
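As an illustration of the "type hierarchy" bullet, the sketch below shows how following subclass links changes search results. It is not Swoogle's reasoner. The `foaf:Person` to `foaf:Agent` link comes from the FOAF vocabulary; the `uni:Student` to `foaf:Person` link and the sample instances are assumptions made up for this example.

```python
# Subclass links: class -> direct superclasses.
SUBCLASS_OF = {
    "uni:Student": ["foaf:Person"],  # assumption, for illustration only
    "foaf:Person": ["foaf:Agent"],
}

# Hypothetical instance data.
TRIPLES = [
    ("ex:li", "rdf:type", "uni:Student"),
    ("ex:tim", "rdf:type", "foaf:Person"),
]


def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subclass links."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, triples, subclass_of):
    """Find subjects typed as cls directly or through one of its subclasses."""
    return sorted(
        s
        for s, p, o in triples
        if p == "rdf:type" and (o == cls or cls in superclasses(o, subclass_of))
    )


print(instances_of("foaf:Person", TRIPLES, SUBCLASS_OF))
print(instances_of("foaf:Person", TRIPLES, {}))
```

With the hierarchy, a search for `foaf:Person` also finds the student; with an empty hierarchy it finds only the directly typed instance. The cost the slide mentions shows up here as a graph traversal per candidate type, which adds up at web scale.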

  36. This talk • Motivation • Semantic web 101 • Swoogle Semantic Web search engine • Use cases and applications • State of the Semantic Web • Conclusions

  37. Conclusion • The web will contain the world’s knowledge in forms accessible to people and computers • We need better ways to discover, index, search and reason over SW knowledge • SW search engines address different tasks than HTML search engines • So they require different techniques and APIs • Swoogle-like systems can help create consensus ontologies and foster best practices • Swoogle is for Semantic Web 1.0 • Semantic Web 2.0 will make different demands

  38. For more information http://ebiquity.umbc.edu/ Annotated in OWL
