
On-To-Knowledge IST-1999-10132 Content-driven Knowledge Management through Evolving Ontologies


Presentation Transcript


  1. On-To-Knowledge • IST-1999-10132 • Content-driven Knowledge Management through Evolving Ontologies • Rob Engels, Dieter Fensel, Frank van Harmelen, Victor Iosif, Arjohn Kampman, Uwe Krohn, Ulrich Reimer, Rudi Studer and York Sure • www.ontoknowledge.org

  2. Contents • The overall goals • The overall architecture and language • Ontology building and instantiation • Storing and manipulating meta information • Querying the semantic web • Case Studies • Conclusions

  3. 1. The overall goals • The competitiveness of companies active in areas with a high rate of change depends heavily on how they maintain and access their knowledge. • Large companies have intranets with several million pages. Finding, creating and maintaining information is a hard problem in such a weakly structured representation medium. • Knowledge Management is about acquiring, maintaining, and accessing the knowledge of an organization.

  4. The overall goals • With the large number of on-line documents, several document management systems have arisen. However, these systems have severe weaknesses: • Word matching as the search method. • Information retrieval instead of query answering. • Document exchange between departments is possible only with considerable effort. • Different views on documents are not supported. • Information maintenance is not supported.

  5. The overall goals • Ontologies will allow structural and semantic definitions of documents, providing completely new possibilities compared with existing document management systems: • Intelligent search instead of keyword matching. • Query answering instead of information retrieval. • Document exchange between departments via transformation operators. • Definition of views on documents. • Support of information maintenance.

  6. The overall goals • The goal of the On-To-Knowledge project is to support efficient and effective knowledge management. It focuses on acquiring, representing, and accessing weakly-structured on-line information sources: • Acquiring: Text mining and extraction techniques are applied to extract semantic information from textual information. • Representing: XML, RDF, and OIL are used for describing syntax and semantics of semi-structured information sources. • Accessing: Novel semantic web search technology and knowledge sharing facilities.

  7. 2. The overall architecture and language • The On-To-Knowledge tool suite (grouped into user tools, expert tools and a backbone) consists of: • an Ontology-based knowledge sharing facility; • an Ontology-based presentation platform; • an Ontology-based search engine; • an Ontology editor and semi-automatic Ontology construction tools; • inference engines and query engine for meta data and schema information; • persistent storage of Ontologies and meta data; and • extraction tools for meta data.

  8. The overall architecture and language • Open architecture, maximal reliance on existing standards • XML, RDF, HTTP, SOAP, JDBC, RQL, … • Client-server approach… • allows tools to be used over the Internet • requires minimal installation of tools locally • All tools use DAML+OIL as ontology language • OIL Core is the minimum requirement • Tool scalability is targeted at supporting • O(10³) classes • O(10⁵) data statements

  9. The overall architecture and language • [Architecture diagram: users access the system through OntoShare, RDFferret and Spectacle (via RQL and RDF); the knowledge engineer builds OIL-Core ontologies with OntoEdit; Sesame hosts the OIL-Core ontology repository and the annotated data repository (RDF); OntoWrapper and OntoExtract populate it from external data repositories, attaching meta data such as "this text is about cars" even when the text itself is unreadable to the machine.]

  10. The overall architecture and language: OIL (Ontology Inference Layer) • RDF Schema defines a simple ontology modeling language on top of RDF that can be used to define the vocabulary and structure of meta information. • OIL adds a simple Description Logic to RDF Schema: it allows one to define axioms that logically describe classes, properties, and their hierarchies. • Currently, a sub-dialect of OIL called DAML+OIL is the starting point of a W3C web ontology language standardization group, which should start soon. • RDF provides a simple data model for representing formal semantics of information, i.e. meta-information.
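
To make this layering concrete, here is a minimal sketch (using the Python rdflib library, which is not part of the OTK tool suite) of the kind of vocabulary and structure RDF Schema lets one define. The Researcher/last_name names echo the RQL examples later in this deck; the namespace URI and the example value are placeholders.

```python
# Minimal RDF Schema sketch with rdflib (illustrative only; not an OTK tool).
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/otk#")   # placeholder namespace
g = Graph()
g.bind("ex", EX)

# Classes and a subclass axiom: every Researcher is an Employee.
g.add((EX.Employee, RDF.type, RDFS.Class))
g.add((EX.Researcher, RDF.type, RDFS.Class))
g.add((EX.Researcher, RDFS.subClassOf, EX.Employee))

# A property with an explicit domain and range.
g.add((EX.last_name, RDF.type, RDF.Property))
g.add((EX.last_name, RDFS.domain, EX.Researcher))
g.add((EX.last_name, RDFS.range, RDFS.Literal))

# Meta-information about one instance.
g.add((EX.pers05, RDF.type, EX.Researcher))
g.add((EX.pers05, EX.last_name, Literal("Petersen")))  # invented example value

print(g.serialize(format="turtle"))
```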

  11. The overall architecture and language: OIL (Ontology Inference Layer) • Credits: Thanks to Ian Horrocks from Manchester!

  12. The overall architecture and language: OIL (Ontology Inference Layer) • OIL provides a layered architecture that offers different levels of complexity (in the OIL proposal: Core OIL, Standard OIL, Instance OIL and Heavy OIL).

  13. 3. Ontology building and instantiation • OntoEdit: Manual building of Ontologies. • OntoExtract: Semi-automatic Ontology construction from natural language sources. • OntoWrapper: Semi-automatic Ontology construction from semi-structured and structured information sources.

  14. Ontology building and instantiation: OntoEdit • OntoEdit is a graphical Ontology Engineering Environment: • It helps in creating, modifying and browsing ontologies. • It is flexible and expandable through a plug-in framework.

  15. Ontology building and instantiation: OntoEdit

  16. Ontology building and instantiation: CORPORUM • Structured documents: OntoWrapper uses screen-scraping to extract information from fixed places on specific sites (e.g. names, email addresses, telephone numbers). • Unstructured documents: OntoExtract extracts initial ontologies from natural language on web pages. OntoExtract is able to: • provide initial ontologies, • refine existing ontologies, • find relations between key terms in documents, • find instances of concepts within documents.
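
As an illustration of the kind of wrapper-style extraction described above (a generic sketch, not CORPORUM or OntoWrapper code; the sample page and the regular expressions are invented), fixed facts such as e-mail addresses and telephone numbers can be pulled from known places on a page:

```python
# Illustrative wrapper-style extraction (not OntoWrapper itself).
import re

sample_page = """
Contact: Jane Doe <jane.doe@example.org>, tel. +47 22 33 44 55
Backup:  John Roe <john.roe@example.org>, tel. +47 99 88 77 66
"""

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+\d{2}(?: \d{2}){4}")   # matches the invented format above

contacts = {
    "emails": EMAIL.findall(sample_page),
    "phones": PHONE.findall(sample_page),
}
print(contacts)
# {'emails': ['jane.doe@example.org', 'john.roe@example.org'],
#  'phones': ['+47 22 33 44 55', '+47 99 88 77 66']}
```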

  17. Ontology building and instantiation: CORPORUM • CORPORUM's linguistic functionality is based on a tokenizer, a morphologic component, and a relation-determining engine. • This allows CORPORUM to extract concepts from texts that are more than just words; concepts can also be generated by the engine. • Relations between such concepts are defined (e.g. subClassOf relations, or InstanceOf relations). • Through semantic analysis of a domain, the tool can automatically generate thesauri of words within a domain. • Visualisation of such semantic structures can then be used for navigation and browsing through document sets.

  18. Corporum-OntoExtract • OntoExtract allows for analysis of natural language. • The component exports its Central Concept Area in the form of a light-weight ontology (syntax: DAML+OIL). • It finds classes, sub-class relationships and instances. • Finally, it identifies cross-taxonomic relations between concepts that are often not easily recovered from "standard" ontologies.

  19. Corporum-OntoExtract • How OntoExtract currently works: • it parses, tokenizes and analyses text, • generates nodes and relations between them, • enhances specific aspects of the discovered knowledge items using a background repository (containing general knowledge of the world, represented in Sesame), • and submits the final analysis results to the RDFS server Sesame.

  20. [Example RDF graph in Sesame: the classes motorcycle and holidays (each of rdf:type rdf:Class) are linked by a weaklyRelatedTo relation, holidays has a hasSize value of "long", and the instance MC_001 has rdf:type motorcycle and hasColor "black". The diagram distinguishes Sesame domain knowledge from Sesame background knowledge.]

  21. [The same example RDF graph, shown without the domain/background knowledge labels.]
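
A rough rdflib rendering of the graph from slides 20-21 (illustrative only; the namespace and the exact property URIs are assumptions, and rdfs:Class is used where the slide writes rdf:Class):

```python
# The example graph from slides 20-21, rebuilt with rdflib (illustrative).
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/extracted#")   # placeholder namespace
g = Graph()
g.bind("ex", EX)

# Two classes discovered by the extraction step ...
g.add((EX.motorcycle, RDF.type, RDFS.Class))      # slide writes rdf:Class
g.add((EX.holidays, RDF.type, RDFS.Class))

# ... a weak cross-taxonomic relation between them ...
g.add((EX.motorcycle, EX.weaklyRelatedTo, EX.holidays))
g.add((EX.holidays, EX.hasSize, Literal("long")))

# ... and an instance with an attribute.
g.add((EX.MC_001, RDF.type, EX.motorcycle))
g.add((EX.MC_001, EX.hasColor, Literal("black")))

rdf_xml = g.serialize(format="xml")   # RDF/XML, the kind of data sent to Sesame
print(rdf_xml)
```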

  22. 4. Storing and manipulating meta information • Sesame: A repository and querying facility for RDF Schema. • Functionality: • Persistent storage of RDF data and RDF Schema. • Query engine for RDF Query Language (RQL). • Data upload and download in RDF format. • Communication over HTTP. World's first! • Features: • Designed for scalability. • Independent of repository types (databases, files, in-memory data structures, ...). • Modular design allows for other functional modules. • Architecture allows support for other communication protocols.
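
The exact HTTP interface of the 2001-era Sesame server is not reproduced here; the following is only a sketch of how RDF upload over HTTP could look, with an entirely hypothetical endpoint URL and content type, using the Python requests library.

```python
# Hypothetical sketch of uploading RDF to a Sesame-style repository over HTTP.
# The URL and content type are assumptions, not the real Sesame API.
import requests

SESAME_URL = "http://localhost:8080/sesame/repositories/otk/statements"  # made up

with open("extracted.rdf", "rb") as f:
    response = requests.post(
        SESAME_URL,
        data=f.read(),
        headers={"Content-Type": "application/rdf+xml"},
    )
response.raise_for_status()
print("upload accepted:", response.status_code)
```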

  23. Sesame: architecture • Protocol handlers (HTTP handler, further handlers): accept incoming requests. • Request router: routes requests to the functional modules. • Functional modules: RDF Admin, RQL engine, further modules. • Repository abstraction layer: provides database independence. • Repository: persistent storage.
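
One way to picture the repository abstraction layer is as an interface that the functional modules program against, so the physical store can be swapped. This is a sketch in Python, not the actual implementation; the class and method names are invented.

```python
# Sketch of a repository abstraction layer: functional modules use this
# interface, so the physical store (database, file, memory) can be exchanged.
from abc import ABC, abstractmethod


class Repository(ABC):
    @abstractmethod
    def add_statement(self, subj: str, pred: str, obj: str) -> None: ...

    @abstractmethod
    def statements(self):
        """Iterate over all (subj, pred, obj) triples."""


class InMemoryRepository(Repository):
    def __init__(self):
        self._triples = []

    def add_statement(self, subj, pred, obj):
        self._triples.append((subj, pred, obj))

    def statements(self):
        return iter(self._triples)


# A database- or file-backed Repository could be substituted here without
# touching the query engine or admin module that uses it.
repo: Repository = InMemoryRepository()
repo.add_statement("ex:MC_001", "rdf:type", "ex:motorcycle")
print(list(repo.statements()))
```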

  24. Sesame: RQL query engine • RQL: • tailored to the RDF graph model • currently the only query language for RDF Schema semantics • based on OQL with features like • Set of core queries (Class, Property, subClassOf, …) • Filters (select-from-where) • Boolean expressions • Functional composition of queries • supports path expressions • For navigating the RDF graph model • Allowing mixed RDF data and schema queries

  25. Sesame: RQL query examples • Basic queries: • 1. "Give me all instances of class Researcher": Researcher • 2. "Give me all subclasses of Researcher": subClassOf(Researcher) • More advanced queries: • 3. "Give me all researchers whose last name starts with 'P'": select R from Researcher{R}.last_name{N} where N like "P*" • 4. "Give me all properties a Researcher can have, and their domain and range": select @P, domain(@P), range(@P) from {:Researcher} @P {}
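
RQL itself is not supported by today's common toolkits. As a rough modern stand-in (not RQL and not an OTK component), query 3 above could be expressed in SPARQL against the schema sketched earlier; the data below is invented.

```python
# Query 3 ("researchers whose last name starts with 'P'") re-expressed in
# SPARQL with rdflib, as a stand-in for RQL (illustrative only).
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/otk#")   # placeholder namespace
g = Graph()
g.add((EX.pers05, RDF.type, EX.Researcher))
g.add((EX.pers05, EX.last_name, Literal("Petersen")))
g.add((EX.pers07, RDF.type, EX.Researcher))
g.add((EX.pers07, EX.last_name, Literal("Andersen")))

query = """
PREFIX ex: <http://example.org/otk#>
SELECT ?r ?name
WHERE {
  ?r a ex:Researcher ;
     ex:last_name ?name .
  FILTER STRSTARTS(?name, "P")
}
"""
for row in g.query(query):
    print(row.r, row.name)   # only pers05 / "Petersen" matches
```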

  26. 5. Querying the Semantic Web • For user access, OTK provides: • a tool for querying the Semantic Web: RDFferret • an Ontology-based presentation platform: Spectacle • an Ontology-based knowledge sharing facility: OntoShare

  27. Querying the Semantic Web: RDFferret • Impractical to create RDF annotations that exhaustively cover the content of a given document • RDF searches might produce low recall • RDF searches produce high precision • Search both: RDF annotations and text content (low threshold, high ceiling) • Use well-proven IR techniques (ranking, stemming, ...)
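
A toy illustration of the hybrid idea (not RDFferret's actual ranking; the documents, weights and scores are invented): combine a full-text match score with a match against RDF annotations, so that annotated hits rank higher while unannotated documents are still found.

```python
# Toy hybrid ranking: full-text score plus a bonus for matching RDF annotations.
# Weights and data are invented; this is not RDFferret's algorithm.
docs = {
    "doc1": {"text": "black motorcycle for long holidays",
             "annotations": {"ex:motorcycle", "ex:holidays"}},
    "doc2": {"text": "motorcycle maintenance handbook",
             "annotations": set()},           # no RDF annotations at all
}

def score(doc, text_terms, concepts, w_text=1.0, w_rdf=2.0):
    text_hits = sum(t in doc["text"] for t in text_terms)
    rdf_hits = len(concepts & doc["annotations"])
    return w_text * text_hits + w_rdf * rdf_hits

query_terms = ["motorcycle", "holidays"]
query_concepts = {"ex:holidays"}

ranking = sorted(docs, key=lambda d: score(docs[d], query_terms, query_concepts),
                 reverse=True)
print(ranking)   # ['doc1', 'doc2'] -- both found, the annotated one ranked first
```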

  28. Querying the Semantic Web: RDFferret

  29. Ontology-based presentation: Spectacle • Spectacle: personalized information disclosure • Personalization is ontology-based • Spectacle can personalize: • the content itself (WHAT) • the content presentation (HOW) • the content organisation/navigation (WHERE) • Example personalizations based on: • Experience: beginner vs expert user • Role: maintainer vs end-user • Task: learning vs problem solving • Etc.

  30. Ontology-based presentation: Spectacle • [Diagram: ontology-based information presentation. Information sources (RDF data, DBMS content, documents) are classified against the ontology; the classified data is then combined with user profiles, navigation structure and layout to produce the presentation.]

  31. Proactively sharing information: OntoShare • Sharing information throughout an organisation is a key knowledge management issue • OntoShare supports and encourages information sharing: • A user asks OntoShare to share some information • On sharing, the page is assigned to an ontological category (class) and matched against each user's ontology-based profile. • OntoShare automatically extracts keywords & summaries from the information and suggests changes to user profiles based on user activity • OntoShare proactively emails selected users when information of interest is shared
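
A minimal sketch of the matching step (invented profiles, scores and threshold; not OntoShare's implementation): when a page is shared into an ontological category, users whose profiles overlap with that category and its keywords could be selected for notification.

```python
# Toy matching of a shared page against user interest profiles (illustrative;
# categories, keywords and the threshold are invented, not OntoShare's logic).
shared_page = {
    "category": "KnowledgeManagement",
    "keywords": {"ontology", "intranet", "search"},
}

profiles = {
    "alice": {"categories": {"KnowledgeManagement"}, "keywords": {"ontology"}},
    "bob":   {"categories": {"ECommerce"},           "keywords": {"payments"}},
}

def interest(profile, page):
    cat = 1.0 if page["category"] in profile["categories"] else 0.0
    kw = len(page["keywords"] & profile["keywords"]) / max(len(page["keywords"]), 1)
    return cat + kw

to_notify = [u for u, p in profiles.items() if interest(p, shared_page) >= 1.0]
print(to_notify)   # ['alice'] -- she would receive a notification e-mail
```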

  32. 6. Case Studies • Large Intranets: Swiss Life • Customer Relationship Management: BT • Virtual Enterprise: Enersearch • A general Methodology: AIFB

  33. Large Intranets: Swiss Life • Swiss Life is a large insurance company with a huge intranet and other distributed information sources. • Efficient knowledge management for this information is of high strategic importance. • OTK technology is applied in two case studies with Swiss Life: • Searching a Large Document on IAS (International Accounting Standard). • Skills Management (SkiM).

  34. Searching a Large Document on IAS (International Accounting Standard) • Approach • Automatic generation of a light-weight ontology with weighted semantic associations between concepts • Use of that ontology to support query reformulation by adding relevant ontology terms • Goals • Fast and reliable access to relevant passages of a large document on IAS
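
The query-reformulation idea can be sketched as follows (an illustrative toy with invented association weights and terms, not the Swiss Life system): terms strongly associated with a query term in the generated ontology are added to the query.

```python
# Toy query expansion over weighted semantic associations (invented weights;
# not the actual IAS case-study ontology).
associations = {
    "goodwill": {"intangible asset": 0.8, "amortisation": 0.6, "inventory": 0.1},
    "lease":    {"finance lease": 0.9, "operating lease": 0.85, "asset": 0.3},
}

def expand(query_terms, threshold=0.5):
    """Add every associated term whose weight reaches the threshold."""
    expanded = set(query_terms)
    for term in query_terms:
        for related, weight in associations.get(term, {}).items():
            if weight >= threshold:
                expanded.add(related)
    return expanded

print(expand(["goodwill"]))
# contains 'goodwill', 'intangible asset', 'amortisation' (set order may vary)
```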

  35. Skills Management (SkiM) • Goals • Access the knowledge and skills of employees • Identify, manage, use and advance skills • Approach • Use of manually built ontologies • - to describe skills, job functions, education in a controlled vocabulary • - to generate annotated homepages from the skills descriptions • Exploit ontologies for a more specific search on homepages for people with certain skills

  36. Customer Relationship Management: BT • Disseminating Customer Handling Rules: • Offer a cost-effective channel for the dissemination of all sorts of rules and instructions. • Health & Safety. • Sales scripts for Telesales Representatives. • Timely information on new products and services. • Disseminating Best Practice: • Promote behaviours that are acceptably consistent across call centres. • Help managers to become more aware of Best Practice resources on the BT Intranet. • Help to build communities of best practice.

  37. Customer Relationship Management: BT • Staying Alert (interest profiles): • Learn a user's interests and preferences autonomously (with minimal feedback from the user). • Adapt to the changing needs of the user over time. • Where possible, user profiles should be acquired automatically, with the user's role being limited to reviewing and correcting/refining the profile.

  38. Virtual Enterprise: EnerSearch • Goals • Improve the usefulness of the EnerSearch website through semantic methods • Especially for the shareholder representatives in virtual organizations • Approach • Ontology development • Manual (OntoEdit/AIFB) • Automatic (OntoExtract/CognIT) • Information modes: (i) keyword search (ii) semantic ontology-based search (RDFferret) (iii) browsing with knowledge visualization (Spectacle) • Evaluation by end-user studies: (i) pre-trial interviews (ii) end-user test (iii) post-trial studies

  39. A general methodology: the OTK Knowledge Management Methodology • The process runs from a feasibility study through ontology kickoff, refinement and evaluation to maintenance & evolution, producing in turn a baseline ontology, a target ontology and the ontology-based application (GO!). • Feasibility study: identify people, focus the domain, select tools from the OTK tool suite, GO / No-GO decision. • Kickoff: requirement specification, analyze knowledge sources, develop the baseline ontology. • Refinement: knowledge elicitation with domain experts, develop and refine the target ontology. • Evaluation: check requirements, test in the target application, analyze usage patterns. • Maintenance & evolution: deployment, manage the organizational maintenance process (Who is responsible? How is it done?).

  40. 7. Conclusions • The semantic web is based on machine-processable semantics of data. • It will significantly change information access by enabling a higher level of service provided by computers. • It is based on new web languages such as XML, RDF, and OIL, and on tools that make use of these languages. • Applications lie in areas such as knowledge management and electronic commerce. • Many research projects on these topics have been started in the EU and the US. • On-To-Knowledge is one of the first to break the ice.
