
Semantic Web & Semantic Web Processes

A course at Universidade da Madeira, Funchal, Portugal, June 16-18, 2005. Dr. Amit P. Sheth: Professor, Computer Science, University of Georgia; Director, LSDIS lab; CTO/Co-founder, Semagix, Inc. Special thanks: Cartic Ramakrishnan, Karthik Gomadam.



  2. Agenda 1
    Part I
    • What is the Semantic Web?
    • What makes the Semantic Web
    • Ontologies: importance of relationships and knowledge
    • Representation and languages
      • Why XML is not enough
      • Describing Semantic Web resources: RDF and RDFS
      • OWL
    • Query processing and storage
    Part II
    • Metadata, enabling techniques and technologies
    • Ontology and knowledge engineering: ontology design, ontology population and maintenance, ontology freshness
    • Automated metadata extraction and annotation
    • Computation and reasoning with a focus on relationships
    • Example commercial Semantic Web platform

  3. Agenda 2
    Part III
    • Semantic Web applications: search, integration, analysis
      • Pan-Web and consumer-centric
      • Enterprise
    Part IV
    • Semantic Web Services and Processes
      • What are Web services?
      • What are Web processes?
      • Creating Web processes: annotation, discovery, composition, etc.
      • Semantic Web service/process tools

  4. Part I
    • What is the Semantic Web?
    • What makes the Semantic Web
    • Ontologies: importance of relationships and knowledge
      • Types and examples of ontologies
      • Metadata and semantic annotation: metadata classifications
    • Representation and languages
      • Why XML is not enough
      • RDF (describing Semantic Web resources) and RDFS: RDF as a triple, RDF as a graph (with an example in RDF/S)
      • OWL
    • RDF query processing and storage

  5. Three generations of information systems: where we have come from, where we are going
    • Generation I (1980s): Data (schema, "semantic data modeling"). Heterogeneous databases / federated databases research. Systems: Mermaid, DDTS, Intervisio
    • Generation II (1990s): Metadata (domain model). Metadata-based integration, mediator systems, digital libraries. Systems: InfoHarness, VisualHarness, AdaptX/Harness
    • Generation III (2000s): Semantics (ontology, context, relationships, KB). Semantic Web technologies and platforms. Systems: InfoQuilt, OBSERVER, MediaAnywhere, Semagix Freedom

  6. Broad Scope of Semantic (Web) Technology
    [Figure: a three-dimensional space of semantic agreement. Degree of agreement: informal, semi-formal, formal. Agreement about: data/info., function, execution, QoS. Scope of agreement: common sense, general purpose/broad based, domain/industry, task/app. Regions labeled "Current Semantic Web focus", "Semantic Web Processes", and "Lots of useful semantic technology (interoperability, integration)".]
    Other dimensions: how agreements are reached, …
    Cf.: Guarino, Gruber

  7. What is the Semantic Web?
    • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, James Hendler, Ora Lassila, "The Semantic Web", Scientific American, May 2001)
    • Ontologies
    • RDF/RDFS or OWL syntax: machine processable
    • Semantic metadata: annotation of Web resources

  8. "An ontology is a specification of a conceptualization" (T. Gruber)
    • A conceptualization is the way we think about a domain
    • A specification provides a formal way of writing it down
    [Figure: "Building ontologies from the ground up", when users set out to model their professional activity (Mark Musen)]

  9. Conceptualization and Ontology
    [Figure: within everything that can be expressed in the language, the ontology constrains the possible interpretations of what can be expressed.]
    Source: http://www.w3c.it/events/minerva20040706/guarino.pdf

  10. Central Role of Ontology
    • Ontology represents agreement: a common terminology/nomenclature
    • Ontology is populated with extensive domain knowledge or known facts/assertions
    • Key enabler of semantic metadata extraction from all forms of content:
      • unstructured text (across 150 file formats)
      • semi-structured content (HTML, XML)
      • structured data
    • Ontology is in turn the centerpiece that enables
      • resolution of semantic heterogeneity
      • semantic integration
      • semantically correlating/associating objects and documents

  11. Types of Ontologies (or things close to ontology)
    • Upper ontologies: modeling of time, space, process, etc.
    • Broad-based or general-purpose ontologies/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet
    • Domain-specific or industry-specific ontologies
      • News: politics, sports, business, entertainment
      • Financial market
      • Terrorism
      • Pharma
      • GlycO, ProPreO
      • GO (a nomenclature), UMLS-inspired ontology, …, MGED
    • Application-specific and task-specific ontologies
      • Anti-money laundering
      • Equity research
      • Repertoire management
      • Financial irregularity
    Developing ontologies at the two ends of this spectrum calls for fundamentally different approaches.

  12. Building ontology
    Three broad approaches:
    1. Social process/manual: many years, committees
       • Can be based on a metadata standard
    2. Automatic taxonomy generation (statistical clustering/NLP): limitations in quality, dependence on the corpus, naming
    3. Descriptional component (schema) designed by domain experts; description base (assertional component, extension) populated by automated processes from trusted knowledge sources
    Option 2 is being investigated in several research projects; Option 3 is currently supported by Semagix Freedom.
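Option 3 splits the work between experts (the schema) and automated processes (the instances). The following is a minimal sketch of that split, not how Semagix Freedom works: the schema, the classes `Company`, `City`, `StockExchange`, the property names, and the "trusted" records are all hypothetical. Records are turned into assertions only when the expert-designed schema defines their class and properties; everything else is set aside for review.

```python
from typing import Dict, List, Tuple

# Descriptional component (schema), designed by domain experts.
# Hypothetical classes and properties: each class maps to the
# properties an instance of that class may carry.
SCHEMA: Dict[str, List[str]] = {
    "Company": ["headquartered_in", "traded_on"],
    "City": [],
    "StockExchange": [],
}

# Hypothetical records from a trusted knowledge source:
# (entity name, class name, {property: value}).
TRUSTED_RECORDS = [
    ("Acme_Corp", "Company", {"headquartered_in": "Atlanta", "mascot": "Roadrunner"}),
    ("Atlanta", "City", {}),
    ("NYSE", "StockExchange", {}),
    ("Mars", "Planet", {}),
]

Assertion = Tuple[str, str, str]


def populate(records, schema) -> Tuple[List[Assertion], List[Tuple[str, str]]]:
    """Build the description base (assertions) from records, keeping only
    classes and properties that the expert-designed schema defines."""
    assertions: List[Assertion] = []
    rejected: List[Tuple[str, str]] = []
    for entity, cls, props in records:
        if cls not in schema:
            rejected.append((entity, f"unknown class {cls}"))
            continue
        assertions.append((entity, "rdf:type", cls))
        for prop, value in props.items():
            if prop in schema[cls]:
                assertions.append((entity, prop, value))
            else:
                rejected.append((entity, f"property {prop} not allowed for {cls}"))
    return assertions, rejected


if __name__ == "__main__":
    kb, problems = populate(TRUSTED_RECORDS, SCHEMA)
    for triple in kb:
        print(triple)
    print("rejected:", problems)
```

Keeping the rejected records, rather than silently dropping them, mirrors the point on slide 15 that a small share of decisions still needs manual attention.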

  13. SUMO -- http://ontology.teknowledge.com/

  14. Part of the CYC Upper Ontology http://www.cyc.com/cyc/technology/whatiscyc_dir/whatdoescycknow

  15. SWETO (Semantic Web Testbed Ontology): Current Status
    • Developed using Semagix technology for free non-commercial use by the SW community; some initial users
    • The v1.4 population includes over 800,000 entities and over 1,500,000 explicit relationships among them
    • We continue to populate the ontology from diverse sources, extending it into multiple domains; new smaller and larger releases due soon; RDF and OWL versions
    • Significant information for provenance/trust support [UMBC partnership]
    • 97% of disambiguation performed automatically, 2% manually; not yet of high enough quality to serve as an evaluation test set (e.g., low connectivity)
    • Working on a test harness, quality measures, and benchmarks

  16. Expressiveness Range: Knowledge Representation and Ontologies
    [Figure: a spectrum from simple taxonomies to expressive ontologies. Points along it: catalog/ID, terms/glossary, thesauri ("narrower term" relation), informal is-a, formal is-a, formal instance, frames (properties), value restriction, disjointness, inverse, part of…, general logical constraints. Artifacts placed on the spectrum include KEGG, TAMBIS, BioPAX, EcoCyc, CYC, DB schema, UMLS, RDF, RDFS, DAML, WordNet, OO, OWL, IEEE SUO, GO, SWETO, GlycO, and a Pharma ontology.]
    Dimensions after McGuinness and Finin

  17. Gene Ontology (GO)
    • Comprises three independent "ontologies":
      • molecular function of gene products
      • cellular component of gene products
      • biological process, representing the gene product's higher-order role
    • Uses these terms as attributes of gene products in the collaborating databases (gene product associations)
    • Allows queries across databases using GO terms, linking biological information across species
    http://www.geneontology.org/

  18. GO = Three Ontologies
    • Molecular Function
      • elemental activity or task
      • example: DNA binding
    • Cellular Component
      • location or complex
      • example: cell nucleus
    • Biological Process
      • goal or objective within the cell
      • example: secretion
    http://www.geneontology.org/
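Because collaborating databases tag their gene products with the same GO terms, one term can serve as a query key across all of them. The sketch below illustrates only that idea: the two databases and their gene products are invented, and term names replace GO identifiers to keep it short. It also matches terms exactly; real GO is a directed acyclic graph, so a fuller query would include products annotated with more specific descendant terms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gene-product annotations in two collaborating databases.
# Each product is tagged with terms from the three GO ontologies
# (molecular function, cellular component, biological process).
DATABASE_A: Dict[str, Set[str]] = {
    "productA1": {"DNA binding", "cell nucleus"},
    "productA2": {"secretion"},
}
DATABASE_B: Dict[str, Set[str]] = {
    "productB1": {"DNA binding"},
    "productB2": {"cell nucleus", "secretion"},
}


def query_by_go_term(term: str,
                     databases: Dict[str, Dict[str, Set[str]]]) -> List[Tuple[str, str]]:
    """Return (database, gene product) pairs annotated with exactly this term."""
    hits = []
    for db_name, annotations in databases.items():
        for product in sorted(annotations):
            if term in annotations[product]:
                hits.append((db_name, product))
    return hits


if __name__ == "__main__":
    dbs = {"A": DATABASE_A, "B": DATABASE_B}
    print(query_by_go_term("DNA binding", dbs))
    print(query_by_go_term("secretion", dbs))
```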

  19. GlycO
    • GlycO: a domain ontology embodying knowledge of the structure and metabolism of glycans
    • Contains 770 classes describing structural features of glycans
    • URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
    • A focused ontology for the description of glycomics
    • Models the biosynthesis, metabolism, and biological relevance of complex glycans
    • Models complex carbohydrates as sets of simpler structures connected by rich relationships

  20. GlycO statistics: ontology schema can be large and complex
    • 770 classes
    • 142 slots
    • Instances extracted with Semagix Freedom:
      • 69,516 genes (from PharmGKB and KEGG)
      • 92,800 proteins (from SwissProt)
      • 18,343 publications (from CarbBank and MedLine)
      • 12,308 chemical compounds (from KEGG)
      • 3,193 enzymes (from KEGG)
      • 5,872 chemical reactions (from KEGG)
      • 2,210 N-glycans (from KEGG)

  21. GlycO taxonomy
    [Figures: the first levels of the GlycO taxonomy; most relationships and attributes in GlycO.]
    GlycO exploits the expressiveness of OWL-DL. Cardinality constraints, value constraints, and existential and universal restrictions on the range and domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships.

  22. Query and visualization

  23. A biosynthetic pathway
    [Figure labels: N-glycan_alpha_man_4, N-glycan_beta_GlcNAc_9, N-acetyl-glucosaminyl_transferase_V; "GNT-I attaches GlcNAc at position 2"; "GNT-V attaches GlcNAc at position 6".]
    Reaction:
    UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
    With KEGG glycan identifiers: UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021

  24. The impact of GlycO
    • GlycO models classes of glycans with unprecedented accuracy
    • Implicit knowledge about glycans can be deductively derived
    • Experimental results can be validated against the model

  25. N-Glycosylation Process (NGP)
    By N-glycosylation process, we mean the identification and quantification of glycopeptides.
    [Figure: workflow. Cell culture → extract → glycoprotein fraction → proteolysis → glycopeptides fraction → separation technique I → n glycopeptide fractions → PNGase → n peptide fractions → separation technique II → n*m peptide fractions → mass spectrometry → ms data and ms/ms data → data reduction → ms peaklist and ms/ms peaklist. Downstream steps: binning, peptide identification, data correlation, and signal integration (producing an N-dimensional array and a peptide list), leading to glycopeptide identification and quantification.]

  26. ProPreO: Experimental Proteomics Process Ontology
    • ProPreO models the phases of a proteomics experiment using five fundamental concepts:
      • Data (example: a peaklist file from ms/ms raw data)
      • Data_processing_applications (example: the MASCOT* search engine)
      • Hardware: embodies instrument types used in proteomics (example: ABI_Voyager_DE_Pro_MALDI_TOF)
      • Parameter_list: describes the different types of parameter lists associated with experimental phases
      • Task (example: component separation, used in chromatography)
    *http://www.matrixscience.com/

  27. Semantic Annotation of Scientific Data
    ms/ms peaklist data:
      830.9570 194.9604 2
      580.2985 0.3592
      688.3214 0.2526
      779.4759 38.4939
      784.3607 21.7736
      1543.7476 1.3822
      1544.7595 2.9977
      1562.8113 37.4790
      1660.7776 476.5043
    Annotated ms/ms peaklist data:
      <ms-ms_peak_list>
        <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
        <parent_ion_mass>830.9570</parent_ion_mass>
        <total_abundance>194.9604</total_abundance>
        <z>2</z>
        <mass_spec_peak mz="580.2985" abundance="0.3592"/>
        <mass_spec_peak mz="688.3214" abundance="0.2526"/>
        <mass_spec_peak mz="779.4759" abundance="38.4939"/>
        <mass_spec_peak mz="784.3607" abundance="21.7736"/>
        <mass_spec_peak mz="1543.7476" abundance="1.3822"/>
        <mass_spec_peak mz="1544.7595" abundance="2.9977"/>
        <mass_spec_peak mz="1562.8113" abundance="37.4790"/>
        <mass_spec_peak mz="1660.7776" abundance="476.5043"/>
      </ms-ms_peak_list>

  28. Semantic Annotation of Scientific Data
    Annotated ms/ms peaklist data:
      <ms-ms_peak_list>
        <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
        <parent_ion_mass>830.9570</parent_ion_mass>
        <total_abundance>194.9604</total_abundance>
        <z>2</z>
        <mass_spec_peak mz="580.2985" abundance="0.3592"/>
        <mass_spec_peak mz="688.3214" abundance="0.2526"/>
        <mass_spec_peak mz="779.4759" abundance="38.4939"/>
        <mass_spec_peak mz="784.3607" abundance="21.7736"/>
        <mass_spec_peak mz="1543.7476" abundance="1.3822"/>
        <mass_spec_peak mz="1544.7595" abundance="2.9977"/>
        <mass_spec_peak mz="1562.8113" abundance="37.4790"/>
        <mass_spec_peak mz="1660.7776" abundance="476.5043"/>
      </ms-ms_peak_list>
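The annotation step turns an anonymous column of numbers into named, typed elements. A minimal sketch of that conversion with Python's standard `xml.etree.ElementTree` follows, assuming that the first raw line holds parent ion mass, total abundance, and charge, and that each later line is an (m/z, abundance) pair. The element and attribute names `ms-ms_peak_list` and `mz` are assumptions: `/` is not legal in XML names, so the slide's `ms/ms_peak_list` and `m/z` cannot be used as-is. Linking `instrument` to an actual ProPreO concept is left out.

```python
import xml.etree.ElementTree as ET

# Raw ms/ms peaklist from the slide.
RAW_PEAKLIST = """830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043"""

# Instrument name copied from the slide (spelling kept as an identifier).
INSTRUMENT = "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"


def annotate_peaklist(raw: str, instrument: str, mode: str = "ms/ms") -> ET.Element:
    """Wrap a raw peaklist in descriptive XML elements (assumed layout)."""
    rows = [line.split() for line in raw.strip().splitlines()]
    parent_mass, total_abundance, charge = rows[0]
    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", instrument=instrument, mode=mode)
    ET.SubElement(root, "parent_ion_mass").text = parent_mass
    ET.SubElement(root, "total_abundance").text = total_abundance
    ET.SubElement(root, "z").text = charge
    for mz, abundance in rows[1:]:
        ET.SubElement(root, "mass_spec_peak", mz=mz, abundance=abundance)
    return root


if __name__ == "__main__":
    annotated = annotate_peaklist(RAW_PEAKLIST, INSTRUMENT)
    ET.indent(annotated)  # pretty-printing; available in Python 3.9+
    print(ET.tostring(annotated, encoding="unicode"))
```

Values are kept as strings so that the annotated file reproduces the raw numbers digit for digit.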

  29. Syntax for Ontologies and Metadata
    • Why not use XML?
    • Why use OWL?
    • Or, for that matter, why RDF?
    • So many questions …

  30. From XML to OWL
    [Figure: an arrow of increasing expressive power, from NO SEMANTICS (XML) to SEMANTICS (OWL).]
    • XML
      • is a surface syntax for structured documents
      • imposes no semantic constraints on the meaning of these documents.
    • XML Schema
      • is a language for restricting the structure of XML documents.
    • RDF
      • is a data model for objects ("resources") and relations between them,
      • provides a simple semantics for this data model, and
      • these data models can be represented in an XML syntax.
    • RDF Schema
      • is a vocabulary for describing properties and classes of RDF resources,
      • with a semantics for generalization hierarchies of such properties and classes.
    • OWL
      • adds more vocabulary for describing properties and classes:
        • relations between classes (e.g., disjointness),
        • cardinality (e.g., "exactly one"),
        • equality, richer typing of properties,
        • characteristics of properties (e.g., symmetry), and enumerated classes.
    Relationships as first-class objects: the key to semantics.
    http://en.wikipedia.org/wiki/Semantic_web#Components_of_the_Semantic_Web
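Two of the OWL additions listed above have direct consequences over plain triples. A symmetric property licenses new facts, and a disjointness axiom exposes contradictions. The sketch below applies just these two ideas. It is not an OWL reasoner: it inspects only explicitly asserted types, with no subclass reasoning. The axioms (`competes_with` as an `owl:SymmetricProperty`, `Athlete` `owl:disjointWith` `Team`) are assumptions, and `Bad_Entry` is an invented, deliberately inconsistent individual. Names otherwise follow the sample on slide 40.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Facts reusing slide 40 names; "sample:Bad_Entry" is invented.
FACTS: Set[Triple] = {
    ("sample:Miami_Heat", "sample:competes_with", "sample:LA_Lakers"),
    ("sample:Kobe_Bryant", RDF_TYPE, "sample:Athlete"),
    ("sample:Bad_Entry", RDF_TYPE, "sample:Athlete"),
    ("sample:Bad_Entry", RDF_TYPE, "sample:Team"),
}
# Assumed axioms: competes_with is symmetric; Athlete and Team are disjoint.
SYMMETRIC_PROPERTIES = {"sample:competes_with"}
DISJOINT_CLASSES = [("sample:Athlete", "sample:Team")]


def apply_symmetry(triples: Set[Triple], symmetric: Set[str]) -> Set[Triple]:
    """For each (s, p, o) whose p is symmetric, also assert (o, p, s)."""
    result = set(triples)
    for s, p, o in triples:
        if p in symmetric:
            result.add((o, p, s))
    return result


def disjointness_clashes(triples: Set[Triple],
                         disjoint: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """List individuals explicitly typed with two classes declared disjoint."""
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    clashes = []
    for individual in sorted(types):
        for a, b in disjoint:
            if a in types[individual] and b in types[individual]:
                clashes.append((individual, a, b))
    return clashes


if __name__ == "__main__":
    closed = apply_symmetry(FACTS, SYMMETRIC_PROPERTIES)
    print(("sample:LA_Lakers", "sample:competes_with", "sample:Miami_Heat") in closed)
    print(disjointness_clashes(closed, DISJOINT_CLASSES))
```

One pass of the symmetry rule is enough, because applying it again only reproduces facts that are already present.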

  31. From an Alphabet to a Language
    • XML
      • "XML is only the first step to ensuring that computers can communicate freely. XML is an alphabet for computers and as everyone traveling in Europe knows, knowing the alphabet doesn't mean you can speak Italian or French." (Business Week, March 18, 2002)
      • Example cited by Nicola Guarino in http://www.w3c.it/events/minerva20040706/guarino.pdf
    • RDF/RDFS and OWL would then be akin to the language computers use to communicate
    • Ontologies represented in these languages would be akin to the exact interpretations of the concepts being communicated

  32. Syntax for Ontologies and Metadata
    • RDF
      • A simple W3C standard used to describe Web resources
      • Relationships in RDF (properties) are binary relationships between two resources, or between a resource and a literal
      • The first resource takes the role of subject; the second resource (or the literal) takes the role of object
      • The subject, predicate, and object compose an RDF statement
    http://www.w3.org/RDF/

  33. What is RDF?
    • Resource Description Framework
    • Proposed as the base Semantic Web language
    • Data model for describing properties of resources
    • Statements about properties and values of Web resources
    • Machine-understandable metadata

  34. RDF Elements
    • Resource:
      • Something that can be described/referenced
      • Identified by a URI
    • Property:
      • Relationship from a resource to a value:
        • another resource, or
        • an atomic value/literal
    • Statement:
      • resource -> property -> value

  35. RDF Statement

  36. RDF Model
    • Formal data model
    • Directed labeled graph
      • Nodes: resources or literals
      • Edges: properties (relationships/attributes)
      • Labels: URIs of nodes and edges
    • Collection of triples
      • subject (resource)
      • predicate (property)
      • object (resource or literal)
    • W3C recommendation
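The two views of the RDF model above, a collection of triples and a directed labeled graph, hold the same information. The short sketch below shows how one turns into the other. It makes two simplifying assumptions: prefixed names such as `sample:Kobe_Bryant` stand in for full URIs, and literals are marked by enclosing double quotes, loosely following N-Triples. Each triple becomes one edge from subject to object, labeled with the predicate.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str    # a resource
    predicate: str  # a property; becomes the edge label
    object: str     # a resource or a literal


# Prefixed names stand in for URIs; quoted strings are literals.
TRIPLES = [
    Triple("sample:Kobe_Bryant", "rdf:type", "sample:Athlete"),
    Triple("sample:Kobe_Bryant", "rdfs:label", '"Kobe Bryant"'),
    Triple("sample:Kobe_Bryant", "sample:plays_for", "sample:LA_Lakers"),
    Triple("sample:LA_Lakers", "rdfs:label", '"LA Lakers"'),
]


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Directed labeled graph: subject node -> list of (edge label, object node)."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.object))
    return graph


def nodes(triples: List[Triple]) -> Set[str]:
    """Every subject and every object is a node; literals are nodes too."""
    return {t.subject for t in triples} | {t.object for t in triples}


if __name__ == "__main__":
    g = to_graph(TRIPLES)
    for label, target in g["sample:Kobe_Bryant"]:
        print(f"sample:Kobe_Bryant --{label}--> {target}")
    print(len(nodes(TRIPLES)), "nodes,", len(TRIPLES), "edges")
```

Literals only ever appear as objects. In the graph they are therefore leaf nodes with no outgoing edges.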

  37. Graph Model

  38. Triple Model

  39. RDF Syntax
    • Formal syntax
    • Encoded in XML
    • Unambiguous property names and values
    • RDF adds rules for interpretation
    • W3C recommendation

  40. Example
    <sample:Athlete rdf:about="&sample;Kobe_Bryant">
      <rdfs:label xml:lang="en">Kobe Bryant</rdfs:label>
      <sample:plays_for rdf:resource="&sample;LA_Lakers"/>
    </sample:Athlete>
    <sample:Athlete rdf:about="&sample;Shaquille_ONeal">
      <rdfs:label xml:lang="en">Shaquille O'Neal</rdfs:label>
      <sample:plays_for rdf:resource="&sample;Miami_Heat"/>
    </sample:Athlete>
    <sample:Team rdf:about="&sample;LA_Lakers">
      <rdfs:label xml:lang="en">LA Lakers</rdfs:label>
    </sample:Team>
    <sample:Team rdf:about="&sample;Miami_Heat">
      <rdfs:label xml:lang="en">Miami Heat</rdfs:label>
      <sample:competes_with rdf:resource="&sample;LA_Lakers"/>
    </sample:Team>
    <sample:Coach rdf:about="&sample;sample1_Instance_8">
      <rdfs:label xml:lang="en">sample1_Instance_8</rdfs:label>
      <sample:coaches rdf:resource="&sample;LA_Lakers"/>
    </sample:Coach>
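Each typed element in the example yields an `rdf:type` triple for its `rdf:about` subject, and each child element yields one more triple. Its object is either the `rdf:resource` URI or the element's text as a literal. The sketch below reads that pattern with Python's standard `xml.etree.ElementTree`. It handles only this subset of RDF/XML: no `rdf:nodeID`, `rdf:parseType`, or nested node elements. It assumes the hypothetical namespace `http://example.org/sample#` in place of the `&sample;` entity, which the slide leaves undefined. For real data, a complete RDF parser such as the one in rdflib is the safer choice.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Hypothetical namespace standing in for the slide's &sample; entity.
SAMPLE = "http://example.org/sample#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:sample="{SAMPLE}">
  <sample:Athlete rdf:about="{SAMPLE}Kobe_Bryant">
    <rdfs:label xml:lang="en">Kobe Bryant</rdfs:label>
    <sample:plays_for rdf:resource="{SAMPLE}LA_Lakers"/>
  </sample:Athlete>
  <sample:Team rdf:about="{SAMPLE}Miami_Heat">
    <rdfs:label xml:lang="en">Miami Heat</rdfs:label>
    <sample:competes_with rdf:resource="{SAMPLE}LA_Lakers"/>
  </sample:Team>
</rdf:RDF>"""


def uri(clark_name: str) -> str:
    """ElementTree reports names as '{namespace}local'; RDF concatenates them."""
    namespace, local = clark_name[1:].split("}")
    return namespace + local


def extract_triples(xml_text: str) -> List[Tuple[str, str, str]]:
    triples = []
    root = ET.fromstring(xml_text)
    for node in root:  # typed node element, e.g. <sample:Athlete rdf:about=...>
        subject = node.get(f"{{{RDF}}}about")
        triples.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:  # property element, e.g. <sample:plays_for .../>
            target = prop.get(f"{{{RDF}}}resource")
            if target is None:  # no rdf:resource, so the content is a literal
                lang = prop.get(XML_LANG)
                target = f'"{prop.text}"' + (f"@{lang}" if lang else "")
            triples.append((subject, uri(prop.tag), target))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(DOC):
        print(triple)
```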

  41. What is RDFS?
    • RDF Vocabulary Description Language (RDF Schema)
    • Extension of RDF: same data model
      • graph or triples
    • A hierarchy of classes
    • A hierarchy of properties relating classes
    • W3C recommendation

  42. RDF Schema and RDF Instances

  43. [Figure: an RDF schema layer above an RDF instance layer, joined by typeOf (instance) links. Schema: classes such as Passenger, Customer, Client, FFlyer, Ticket, Flight, Payment, CCard, Cash, and Bank Account, related by subClassOf (isA) and by properties such as purchased, for, paidby, creditedto, holder, number, amount, fname, lname, ffid, and fflierno, with subPropertyOf links between properties; value types String and float. Instances: resources &r1 to &r11 with literal values such as "Abdulaziz", "Marwan", "Alomari", "Al-Shehhi", "M’mmed", "Atta", and "XYZ123".]

  44. RDFS Core Classes
    • rdfs:Class
      • The class of resources that are RDF classes
      • Instance of rdfs:Class
    • rdfs:Resource
      • All things being described
      • The class of everything in RDF(S)
      • Instance of rdfs:Class
    • rdf:Property
      • The class of RDF properties
      • Instance of rdfs:Class
    http://www.w3.org/TR/rdf-schema/

  45. RDFS Core Properties
    • rdf:type
      • States that a resource is an instance of a class
      • Instance of rdf:Property
    • rdfs:subClassOf
      • All instances of one class are also instances of another class
      • Instance of rdf:Property
    • rdfs:subPropertyOf
      • All resources related by one property are also related by another property
      • Instance of rdf:Property

  46. RDFS Core Properties (continued)
    • rdfs:range
      • States that all values of a property are instances of one or more classes
      • If several range classes are declared, every value is (inferred to be) an instance of all of them
      • Instance of rdf:Property
    • rdfs:domain
      • States that all resources having the given property are instances of one or more classes
      • If several domain classes are declared, every such resource is (inferred to be) an instance of all of them
      • Instance of rdf:Property
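In RDFS these properties drive inference rather than validation: stating a domain or range adds type facts. The sketch below derives those facts by forward chaining to a fixpoint. It uses the entailment rules numbered rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11 in the W3C RDF Semantics document and omits the rest. `plays_for`, `Athlete`, and `Team` come from slide 40. `Person` and `member_of` are hypothetical additions, and so are the domain, range, and hierarchy declarations. The nested loop is quadratic and only suits small examples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROPERTY = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Schema and data; sample:Person and sample:member_of are hypothetical.
FACTS: Set[Triple] = {
    ("sample:plays_for", DOMAIN, "sample:Athlete"),
    ("sample:plays_for", RANGE, "sample:Team"),
    ("sample:plays_for", SUBPROPERTY, "sample:member_of"),
    ("sample:Athlete", SUBCLASS, "sample:Person"),
    ("sample:Shaquille_ONeal", "sample:plays_for", "sample:Miami_Heat"),
}


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs2, 3, 5, 7, 9 and 11 until nothing new is derived."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p == s2 and p2 == DOMAIN:       # rdfs2: subject gets domain type
                    new.add((s, TYPE, o2))
                if p == s2 and p2 == RANGE:        # rdfs3: object gets range type
                    new.add((o, TYPE, o2))
                if p == s2 and p2 == SUBPROPERTY:  # rdfs7: restate with super-property
                    new.add((s, o2, o))
                if p == TYPE and p2 == SUBCLASS and o == s2:  # rdfs9: super-class type
                    new.add((s, TYPE, o2))
                if p in (SUBCLASS, SUBPROPERTY) and p2 == p and o == s2:  # rdfs5, rdfs11
                    new.add((s, p, o2))          # transitivity of the hierarchies
        if new <= closure:
            return closure
        closure |= new


if __name__ == "__main__":
    for triple in sorted(rdfs_closure(FACTS) - FACTS):
        print(triple)
```

From the single data triple, the closure adds that `Shaquille_ONeal` is an `Athlete` and a `Person`, that `Miami_Heat` is a `Team`, and that `Shaquille_ONeal` is `member_of` `Miami_Heat`. No value is ever rejected, which is why the range and domain statements above read as inferences rather than checks.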

  47. OWL, W3C definition • “language for defining structured, Web-based ontology which enables richer integration and interoperability of data across application boundaries” http://www.w3.org/2004/OWL/

  48. OWL Use Cases
    • Web portals
    • Multimedia collections
    • Corporate Web site management
    • Design documentation
    • Agents and services
    • Ubiquitous computing

  49. OWL Design Goals
    • Shared ontologies
    • Ontology evolution
    • Ontology interoperability
    • Inconsistency detection
    • Expressivity vs. scalability
    • Ease of use
    • Compatibility with other standards
    • Internationalization

  50. What's in OWL, but not in RDF
    • Ability to be distributed across many systems
      • by means of owl:imports (similar to #include in C/C++)
    • Scalable to Web needs (?)
    • Compatible with Web standards for:
      • accessibility, and
      • internationalization
    • Open and extensible
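Unlike C's `#include`, `owl:imports` brings in the whole imports closure, meaning every ontology imported directly or indirectly. Ontologies on the Web can also import each other in a cycle. The sketch below computes that closure. The ontology URIs are hypothetical, and the `upper` ontology importing `app` back is contrived purely to show cycle handling. Fetching and parsing the documents is out of scope; the `imports` mapping stands in for what a loader would read from each ontology header.

```python
from typing import Dict, List, Set

# Hypothetical ontology URIs and their owl:imports targets.
IMPORTS: Dict[str, List[str]] = {
    "http://example.org/onto/app": ["http://example.org/onto/domain"],
    "http://example.org/onto/domain": ["http://example.org/onto/upper"],
    "http://example.org/onto/upper": ["http://example.org/onto/app"],  # contrived cycle
}


def imports_closure(ontology: str, imports: Dict[str, List[str]]) -> Set[str]:
    """The ontology plus everything it imports, directly or indirectly."""
    closure: Set[str] = set()
    pending = [ontology]
    while pending:
        current = pending.pop()
        if current in closure:
            continue  # already loaded; this also stops import cycles
        closure.add(current)
        pending.extend(imports.get(current, []))
    return closure


if __name__ == "__main__":
    for uri in sorted(imports_closure("http://example.org/onto/app", IMPORTS)):
        print(uri)
```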