Introduction to Semantic Web Design B. Ramamurthy
Introduction • The web in its current form is an application on the internet that delivers information. Ex: browsing daily news • Current applications involving the web integrate data and information. Ex: online shopping • The next-generation web is expected to integrate a variety of resources and devices and to support knowledge sharing among machines. • Exploit the economies of scale made possible by machine processing of knowledge. • How do we tell machines about resources and specify concepts? How can machines acquire knowledge? How can they share knowledge with one another? How do we enable them to make decisions based on it? • We need to specify resources, concepts, knowledge, and other artifacts used in human decision making in a form usable by machines. • Machines can then integrate and analyze information, make decisions, and collect knowledge. • In this lecture we will examine the technology, tools, frameworks, and applications enabling the next-generation web, the semantic web. • We will also discuss an intelligent search engine serving municipal services, a real semantic web application (Chapter 4)
References for today’s discussion • W3Schools tutorials (http://www.w3schools.com) • Alistair Miles, “Taxonomies and the semantic web”, CISTRANA workshop, Feb 2006, Rutherford Appleton Lab
HTML, XML, RDF, and OWL • HTML: • HTML stands for Hyper Text Markup Language • An HTML file is a text file containing small markup tags • The markup tags tell the Web browser how to display the page • XML: • XML stands for eXtensible Markup Language • XML is a markup language much like HTML • XML was designed to carry data, not to display data • XML tags are not predefined. You must define your own tags • XML is designed to be self-descriptive • XML is a W3C Recommendation
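The point that XML tags are not predefined can be illustrated with a minimal Python sketch. The tag names (services, service, name, channel) and the sample data are invented for this example; only the standard library parser is used.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document: XML predefines none of these tags,
# the author of the document chose them to describe the data.
XML_TEXT = """
<services>
  <service id="s1">
    <name>Special collection of large items</name>
    <channel>online</channel>
  </service>
  <service id="s2">
    <name>Change of address</name>
    <channel>online</channel>
  </service>
</services>
"""


def service_names(xml_text: str) -> List[str]:
    """Return the text of the <name> child of every <service> element."""
    root = ET.fromstring(xml_text.strip())
    return [service.findtext("name") for service in root.findall("service")]


NAMES = service_names(XML_TEXT)
print(NAMES)
```

The markup carries the data and its structure; how (or whether) it is displayed is left to whatever program consumes it.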
HTML, XML… RDF, .. • RDF: • RDF stands for Resource Description Framework • RDF is a framework for describing resources on the web • RDF provides a model for data, and a syntax so that independent parties can exchange and use it • RDF is designed to be read and understood by computers • RDF is not designed for being displayed to people • RDF is commonly written in XML (the RDF/XML syntax) • RDF is a part of the W3C's Semantic Web Activity • RDF is a W3C Recommendation • Let's discuss the details.
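The RDF data model describes a resource through statements of the form (subject, predicate, object). A minimal sketch of that model in plain Python follows; it is not an RDF library, and the `ex:` names are hypothetical identifiers chosen for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements; the ex: names are illustrative, not a real vocabulary.
TRIPLES: List[Triple] = [
    ("ex:LargeItemCollection", "rdf:type", "ex:Service"),
    ("ex:LargeItemCollection", "ex:handles", "ex:WashingMachine"),
    ("ex:AddressChange", "rdf:type", "ex:Service"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which resources are described as services?
SERVICES = [t[0] for t in match(TRIPLES, p="rdf:type", o="ex:Service")]
print(SERVICES)
```

Because every statement has the same three-part shape, independent parties can merge their triples into one collection and query it with the same pattern matching.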
HTML…OWL • OWL: • OWL stands for Web Ontology Language • OWL is built on top of RDF • OWL is for processing information on the web • OWL was designed to be interpreted by computers • OWL was not designed for being read by people • OWL is commonly written in XML (RDF/XML) • OWL has three sublanguages: OWL Lite, OWL DL, and OWL Full • OWL is a web standard (a W3C Recommendation)
Web ontology • Figure labels: Natural language (Ex: English); Ontology; Programming language (Ex: Pascal); Web ontology • A programming language is a strictly syntaxed language for expressing algorithms (steps) for execution by a computing device. • A web ontology is for expressing web-related concepts. • Web Ontology Language (OWL) is a technology for accomplishing this. • Protégé-OWL is a tool that implements OWL.
Taxonomy and web ontology • Taxonomy is the science of classification. F: Taxonomy • An ontology is a specification of a conceptualization. F: Ontology • XML allows for meaningful tags. T: XML • Resource Description Framework (RDF) is an XML-based language for describing resources on the web (www). T: RDF • Web Ontology Language (OWL) T: OWL • RDF is an assertional language intended to be used to express propositions using precise formal vocabularies, particularly those specified using RDFS [RDF-VOCABULARY], for access and use over the World Wide Web, and is intended to provide a basic foundation for more advanced assertional languages with a similar purpose. The overall design goals emphasize generality and precision in expressing propositions about any topic, rather than conformity to any particular processing model.
RDF and OWL • OWL is a semantic extension of RDF: it allows for specification of logical dependencies between information structures. (as defined by Miles: ref 2) • OWL works on structured information • RDF is for structuring information. • OWL is an information model.
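A logical dependency between information structures can be sketched with the simplest case, the subclass rule that OWL inherits from RDFS. The sketch below is a toy forward-chaining loop under that assumption, not an OWL reasoner, and the `ex:` facts are hypothetical.

```python
# Hypothetical facts stated explicitly by some author.
FACTS = {
    ("ex:WashingMachine", "rdfs:subClassOf", "ex:LargeItem"),
    ("ex:LargeItem", "rdfs:subClassOf", "ex:Waste"),
    ("ex:myMachine", "rdf:type", "ex:WashingMachine"),
}


def infer(triples):
    """Apply two rules until no new fact appears:
    (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        new = set()
        for a, b in subclass_pairs:
            for s, p, o in facts:
                if p == "rdfs:subClassOf" and s == b:
                    new.add((a, "rdfs:subClassOf", o))
                if p == "rdf:type" and o == a:
                    new.add((s, "rdf:type", b))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


INFERRED = infer(FACTS)
# Nobody stated that ex:myMachine is waste; the rules derive it.
print(("ex:myMachine", "rdf:type", "ex:Waste") in INFERRED)
```

RDF alone only stores the three stated facts; the dependency rules are what let a machine conclude something that was never written down.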
Semantic stack • Figure: layers of the semantic web • OWL • RDFS • RDF • URI • XML
Intelligent Search Engine for online access to municipal services (Ch 4): problem definition • Citizens can perform 80% of the city services from home • Anyone looking for a service must be able to locate it easily. • You can collect, categorize and list all the services (.. Taxonomy) • However, searching through this list with a traditional search engine may not yield the expected results. • Search results are based on the description of the services and the co-occurrence of the words in the query. • Ex: A citizen who wants to dispose of a washing machine has to search for “special collection of large items” • We cannot force citizens to learn government language • When a service is looked up, a set of related services should be made available • The search engine is a first step in the roadmap to citizen self-service
Zaragoza municipal services roadmap (Fig. 4.1) • Figure labels: Positioning; Intelligent search engine; Citizen channels; Citizen self-service; Interface; Functionality; Content; Scope; Technology
Application of semantic web • Three ways that Zaragoza used the semantic web are: • Statistical approach to interpretation of citizen requests. (fig. 4.3) • Enhanced keyword-based approach to interpretation of citizen requests. (fig. 4.4) • Applying semantic distance to interpreting citizen requests. (fig. 4.5)
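The slides do not say how semantic distance is computed. One common choice, assumed here purely for illustration, is the number of edges on the shortest path between two concepts in a concept graph. The graph below is hypothetical.

```python
from collections import deque

# Hypothetical undirected concept graph; an edge links two related concepts.
GRAPH = {
    "washing machine": {"appliance"},
    "appliance": {"washing machine", "large item"},
    "large item": {"appliance", "special collection"},
    "special collection": {"large item"},
    "parking permit": set(),
}


def semantic_distance(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(semantic_distance(GRAPH, "washing machine", "special collection"))
```

Under this assumption, a request about a washing machine sits three steps from the "special collection" service, while an unrelated concept such as a parking permit has no path to it at all.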
Usage of the three methods • The first approach is the cheapest and consumes the fewest resources; the semantic web approach is the most expensive. • The Zaragoza architecture arranges the three in a pipeline in which each stage is triggered only when the previous stage did not produce satisfactory results.
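The pipeline idea, cheapest stage first and later stages only on failure, can be sketched as follows. The three stage functions are stand-ins with made-up lookup tables, and "satisfactory" is assumed to mean "returned at least one result"; the real system's criteria are not given in the slides.

```python
def statistical(query):
    # Stand-in for stage 1: exact lookup in a table of frequent requests.
    frequent = {"change address": ["Change of address"]}
    return frequent.get(query.lower(), [])


def enhanced_keyword(query):
    # Stand-in for stage 2: match single query words against service keywords.
    keywords = {"permit": ["Construction permit"], "tax": ["Vehicle tax"]}
    hits = []
    for word in query.lower().split():
        hits.extend(keywords.get(word, []))
    return hits


def semantic(query):
    # Stand-in for stage 3: the most expensive, ontology-based stage.
    if "washing machine" in query.lower():
        return ["Special collection of large items"]
    return []


STAGES = [
    ("statistical", statistical),
    ("enhanced keyword", enhanced_keyword),
    ("semantic", semantic),
]


def search(query, stages=STAGES):
    """Run the stages in order and stop at the first one that returns results."""
    for name, stage in stages:
        results = stage(query)
        if results:
            return name, results
    return None, []


print(search("dispose of a washing machine"))
```

Ordering the stages by cost means most queries never reach the expensive semantic stage, which is the resource argument made above.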
How does it work? • Traditional search engines retrieve documents based on occurrences of keywords, whereas the Zaragoza SOA (ZS) has an understanding of its services, information and data. • ZS knows that persons can change addresses, car owners pay taxes, construction work requires permits, building bars near schools is not good, etc. • All this information is stored in an ontology: a computer-understandable description of what e-services are. • This ontology allows ZS to understand citizens' queries and thus return meaningful results. • ZS also uses natural language understanding software to translate citizens' free-text queries into the ontology. (see fig. 4.6)
Citizen-city government interaction (Fig. 4.6 modified) • Figure labels: Natural language query; Semantic query; Result; NLP Knowledge Tagger (KT); Semantic Distance Analyzer (SD); ZS domain ontology
Search vs. Intelligent Search • Search: searches for keywords; results in a ranked list of documents; users need to invest time and effort to filter the right piece of information out of the overall results. • Intelligent search: searches for keywords and semantic concepts; results in the actual relevant document; perceived as a search engine that understands the user.
ZS Domain Ontology • Development of an ontology starts with detailed study of the services offered by the city. • Objective is to extract all relevant terms belonging to this domain from existing documents. • ZS ontology contains four main classes: agent, process, event, object
ZS Domain Ontology (contd.) • Agent: entity participating in an action • Process: A series of actions that a citizen can do using the online services offered by the city government. • Event: any social gathering or activity. • Object: any entity that exists in the city which can be used for or by a service offered by the city government.
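As a data-structure sketch, the four main classes can be modelled as an enumeration, with each domain concept assigned to one of them. The concept names and their assignments below are hypothetical examples, not entries from the actual ZS ontology.

```python
from dataclasses import dataclass
from enum import Enum


class TopClass(Enum):
    AGENT = "agent"      # entity participating in an action
    PROCESS = "process"  # series of actions a citizen can do online
    EVENT = "event"      # social gathering or activity
    OBJECT = "object"    # entity in the city used for or by a service


@dataclass(frozen=True)
class Concept:
    name: str
    top_class: TopClass


# Hypothetical concept assignments.
CONCEPTS = [
    Concept("citizen", TopClass.AGENT),
    Concept("change of address", TopClass.PROCESS),
    Concept("street festival", TopClass.EVENT),
    Concept("washing machine", TopClass.OBJECT),
]


def by_class(concepts, top_class):
    """Names of all concepts that fall under the given main class."""
    return [c.name for c in concepts if c.top_class is top_class]


print(by_class(CONCEPTS, TopClass.OBJECT))
```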
Using the ontology • Approach is to establish a semantic similarity between a question provided by a citizen and the FAQs already available. • Ontology needs to be complete in order to contain all the necessary terms to satisfy the requests. • Ontology is completed with a number of thesauri to identify synonyms. Ex: baby and infant • Context information is used to tackle any ambiguity.
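Matching a citizen's question against existing FAQs, with a thesaurus resolving synonyms, can be sketched as below. The similarity measure (Jaccard overlap of normalized terms), the thesaurus entries, the stopword list, and the FAQs are all assumptions made for illustration; the slides do not specify them.

```python
# Hypothetical thesaurus: maps a synonym to its canonical ontology term.
THESAURUS = {"infant": "baby", "newborn": "baby", "relocate": "move"}
STOPWORDS = {"a", "an", "the", "do", "i", "how", "to", "my", "for", "of", "can"}

FAQS = ["How do I register a baby?", "How do I pay my car tax?"]


def terms(text):
    """Lowercase, strip punctuation, drop stopwords, map synonyms to canonical terms."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return {THESAURUS.get(w, w) for w in words if w and w not in STOPWORDS}


def similarity(a, b):
    """Jaccard overlap of the two term sets (0.0 when either set is empty)."""
    ta, tb = terms(a), terms(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_faq(question, faqs=FAQS):
    """Return the FAQ whose terms overlap most with the question."""
    return max(faqs, key=lambda faq: similarity(question, faq))


print(best_faq("register an infant"))
```

Without the thesaurus step, "infant" and "baby" would share no term and the question would score zero against the relevant FAQ, which is why completing the ontology with synonyms matters.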
Natural Language Processing for ZS • The knowledge tagger automatically annotates text according to the domain ontology • It uses a series of linguistic analyzers, sentence splitters, simple tokenizers, spell checkers and morphological databases. • The outcome of this analysis is an annotated text equivalent of the query. • The query is then synthesized in terms of the domain ontology: RDQL, SPARQL, … SQL
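The two steps, annotating a free-text query with ontology concepts and then synthesizing a formal query, can be sketched in a heavily simplified form. The lexicon, the `ex:` prefix IRI, and the `ex:relatesTo` property are hypothetical; the real knowledge tagger relies on full linguistic analysis rather than phrase lookup.

```python
import re

# Hypothetical lexicon mapping surface phrases to ontology concepts.
LEXICON = {
    "washing machine": "ex:WashingMachine",
    "dispose": "ex:Disposal",
}


def tag(query):
    """Return (phrase, concept) pairs found in the query, longest phrases first."""
    text = " ".join(re.findall(r"[a-z]+", query.lower()))
    found = []
    for phrase in sorted(LEXICON, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((phrase, LEXICON[phrase]))
    return found


def to_sparql(concepts):
    """Build a SPARQL SELECT asking for services related to every concept."""
    patterns = "\n".join(f"  ?service ex:relatesTo {c} ." for c in concepts)
    return (
        "PREFIX ex: <http://example.org/zs#>\n"
        "SELECT ?service WHERE {\n" + patterns + "\n}"
    )


TAGS = tag("How do I dispose of a washing machine?")
QUERY = to_sparql([concept for _, concept in TAGS])
print(QUERY)
```

The annotated concepts, not the citizen's original words, drive the generated query, so the citizen does not need to know the government's name for the service.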
Semantic Annotation of city services • Collect and index the information about services • Semantic processing results in ontological entities: concepts, instances, attributes, and relations • The output of this process is a set of semantically described services that can be checked against citizens' queries.
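Checking annotated services against a query can be sketched with an inverted index from concept to services. The annotations below are hypothetical, and requiring a service to match every query concept is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical annotations: service -> ontology concepts found in its description.
ANNOTATIONS = {
    "Special collection of large items": {"ex:WashingMachine", "ex:Disposal"},
    "Change of address": {"ex:Address", "ex:Citizen"},
}


def build_index(annotations):
    """Invert the annotations: concept -> set of services that mention it."""
    index = defaultdict(set)
    for service, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(service)
    return index


def lookup(index, query_concepts):
    """Services annotated with every concept in the query."""
    sets = [index.get(c, set()) for c in query_concepts]
    return set.intersection(*sets) if sets else set()


INDEX = build_index(ANNOTATIONS)
print(lookup(INDEX, ["ex:WashingMachine", "ex:Disposal"]))
```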
Overall Architecture of ZS • Figure labels: Search clients; Search systems (web services); Ontology subsystem (ontology systems, ontology cache, web services); NLP subsystem (NLP systems, NLP cache, web services); Persistence (RDBMS)
Summary • The Zaragoza SOA (ZS) is a powerful SOA that uses semantic knowledge to better serve the city's citizens. • Its roadmap is open, with the ability to extend the system through its web services (WS) interface.
Networked SOA for Zaragoza • Figure labels: Zaragoza web site; Services; Customer; Service ontology; Customer ontology; Basic layer; Enterprise layer; Intermediary layer; Semantic search