Metadata • The Semantic Web • Directories and Thesauri • XML is not enough • Topic maps • RDF CS3352
Sources of Knowledge for finding documents [DeRose99] • “The user, including their current explicit query and any historical or profile information the system may have gained earlier. • The documents in the library or on the web, including their nominal "content" and whatever metadata has been attached • The world, about which the system may have certain information, such as dictionaries and thesauri of natural language terms; basic knowledge of object categories ("dog is-a animal"), and much more…” [Diagram: text and images; mark-up, links and catalogue databases; ontologies, thesauri and knowledge]
What is metadata? • Data cataloguing resources • Administrative cataloguing: acquisition history, author… • Structural: size, image format… • Data describing the content and meaning of resources, e.g. a photograph described as “royal UK male trophy presenter” and “footballer trophy winner”
Metadata Representation • Expressive, so we can say what we want • Compositional, so that we can build complex terms out of simple pieces • Controlled, so we only say consistent and coherent things • Incremental, so we can keep adding descriptions
Dublin Core • A standard for metadata defined by the digital library community (others: MARC, VRA…) • 15 elements: Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights • Core elements defined in RFC 2413: • http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt • http://www.ariadne.ac.uk • http://www.ukoln.ac.uk • From: Metadata for images, Michael Day, http://www.ukoln.ac.uk
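A Dublin Core description is essentially a set of element/value pairs drawn from the 15 names above. The sketch below, a minimal illustration rather than any official DCMI tool, stores a record as a plain dictionary (each element may repeat, so values are lists) and reports keys that are not core elements. The record values are illustrative only.

```python
# The 15 Dublin Core elements listed on the slide (RFC 2413).
DC_ELEMENTS = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def validate_dc(record: dict[str, list[str]]) -> list[str]:
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(key for key in record if key.lower() not in DC_ELEMENTS)


# Illustrative record: every element is optional and may be repeated.
record = {
    "Title": ["Being a Dog Is a Full-Time Job"],
    "Creator": ["Charles M. Schulz"],
    "Identifier": ["ISBN 0836217462"],
    "Mood": ["cheerful"],  # not a Dublin Core element
}
print(validate_dc(record))  # ['Mood']
```

A fixed element set like this is what makes records from different catalogues comparable; it says nothing, however, about what the *values* mean.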
Metadata on the Web yesterday • Meta tags
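Meta tags put name/content pairs in the page header. A minimal sketch of how a crawler might harvest them, using Python's standard `html.parser`; the page and its tag values are hypothetical. The collector recovers the strings, but nothing tells it what "rowing" or "Olympic Games" mean.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from the <meta> tags of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""


# Hypothetical page: the keywords are free text with no agreed vocabulary.
PAGE = """<html><head>
<meta name="keywords" content="Olympic Games, rowing, Steve Redgrave">
<meta name="description" content="Home page of a British rower">
<title>Rowing</title>
</head><body>...</body></html>"""

collector = MetaTagCollector()
collector.feed(PAGE)
collector.close()
print(collector.meta)
```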
Metadata on the Web yesterday
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
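To a program, the book example is a labelled tree. The sketch below (Python's standard `xml.etree.ElementTree`; the `friendships` helper is hypothetical, and the XML declaration is omitted for brevity) navigates it easily, but the reading of `friend-of` as a relationship between two characters lives in the helper, not in the markup.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>"""

book = ET.fromstring(BOOK_XML)


def friendships(root):
    """Return (character, friend) pairs; "friend-of" is just a tag name to the parser."""
    pairs = []
    for character in root.findall("character"):
        name = character.findtext("name")
        for friend in character.findall("friend-of"):
            pairs.append((name, friend.text))
    return pairs


print(book.get("isbn"), book.findtext("title"))
# Only one direction is recorded: the parser cannot infer that friendship is symmetric.
print(friendships(book))
```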
World Wide Web • Tim Berners-Lee reprise… “... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.” Berners-Lee 1996
Web = Data + Information - Knowledge • Browse the links • Search using words: "steamer"; "steamer, tank" • Search using experience: link structure is content; rhetorical narratives • Search using indexes: metadata and classifications
Query: “Find a very successful European team-based sports person” • Answering it needs: • Metadata • Knowledge • Inference • Candidate resources: • Resource describing UK soccer players and their careers • Resource listing sporting competitions, including the FA Cup and the Superbowl • Steve Redgrave’s home page • Resource describing the Olympic Games • Resource that lists teams that have won the FA Cup
[Diagram: example sports ontology] • Concepts: Event, Competition, Sports Tournament, Soccer Tournament, Tennis Tournament, Olympic Games, Sport, Soccer, Rowing, Coxless Fours, People, Sports Person, Soccer player, Rower, Country, UK, Europe • Relations: nationality, win, participates, holds, partof • Facts: UK partof Europe; Soccer participants = 11; Coxless Fours participants = 4 • Defined classes: Rower who wins the Olympic Games; UK Rower who wins the Olympic Games > 2 times; Soccer player who wins the FA Cup once • Named competitions: Wimbledon, FA Cup
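The query above can only be answered by combining metadata (facts about people) with knowledge (the ontology) through inference. A minimal sketch, assuming facts stored as (subject, predicate, object) triples: the `instance-of`/`is-a` split, the hypothetical "Player X", and the rule that a sport with more than one participant per side is "team-based" are illustrative assumptions. Redgrave's five Olympic golds satisfy the diagram's "> 2 times" condition.

```python
# Facts from the example ontology as (subject, predicate, object) triples.
FACTS = {
    ("UK", "partof", "Europe"),
    ("Rower", "is-a", "Sports Person"),
    ("Soccer player", "is-a", "Sports Person"),
    ("Steve Redgrave", "instance-of", "Rower"),
    ("Steve Redgrave", "nationality", "UK"),
    ("Steve Redgrave", "participates", "Coxless Fours"),
    ("Player X", "instance-of", "Soccer player"),  # hypothetical individual
    ("Player X", "nationality", "UK"),
    ("Player X", "participates", "Soccer"),
}
PARTICIPANTS = {"Soccer": 11, "Coxless Fours": 4}  # from the ontology diagram
OLYMPIC_WINS = {"Steve Redgrave": 5, "Player X": 0}


def objects(subject, predicate):
    return {o for s, p, o in FACTS if s == subject and p == predicate}


def reaches(start, predicate, target):
    """True if target can be reached from start by following predicate links."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(objects(node, predicate))
    return False


def sports_people():
    return {s for s, p, o in FACTS
            if p == "instance-of" and reaches(o, "is-a", "Sports Person")}


def european(person):
    return any(reaches(c, "partof", "Europe") for c in objects(person, "nationality"))


def team_based(person):
    # More than one participant per side counts as a team sport;
    # sports missing from PARTICIPANTS are treated as individual (assumption).
    return any(PARTICIPANTS.get(e, 1) > 1 for e in objects(person, "participates"))


def very_successful(person):
    return OLYMPIC_WINS.get(person, 0) > 2  # "win Olympic Games > 2 times"


answer = sorted(p for p in sports_people()
                if european(p) and team_based(p) and very_successful(p))
print(answer)  # ['Steve Redgrave']
```

No single resource states "Steve Redgrave is a very successful European team-based sports person"; the answer follows only from chaining `partof`, `is-a` and the participant counts.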
A Shared Understanding • Metadata • Data describing the content and meaning of resources • But everyone must speak the same language… • Terminologies • Shared and common vocabularies • For search engines, agents, curators, authors and users • But everyone must mean the same thing… • Ontologies • Shared and common understanding of a domain • Essential for exchange and discovery
Ontologies • “The [reusable] specification of conceptualizations, used to help programs and humans share knowledge” [Gruber93] • An ontology will include: • a vocabulary of terms, and • some specification of their meaning • Ontologies impose structure on the domain and constrain the possible interpretations of terms [Uschold99] • A precise notion of what meaning means • Ontologies provide: • a shared and common understanding of a domain that can be communicated across people and applications
Ontology: a precise notion of what meaning means • formal, explicit, rigorous • unambiguous • for agents, not just people • machine computable • from machine-readable to machine-understandable • uses knowledge representation and reasoning to supply the meaning
What is an Ontology? A spectrum from informal to formal (from Debbie McGuinness): • Catalog/ID • Terms/glossary • Thesauri: “narrower term” relation • Informal is-a • Formal is-a • Formal instance • Frames (properties) • Value restrictions • General logical constraints: disjointness, inverse, part-of…
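Two steps along this spectrum can be sketched briefly: formal is-a is transitive, so a class inherits every ancestor, and general logical constraints such as disjointness let a program detect contradictions. The class hierarchy and the disjointness axiom below are hypothetical, built from concept names in the sports example; a real ontology language would define these semantics precisely.

```python
# Hypothetical class hierarchy built from the sports example's concept names.
SUBCLASS_OF = {
    "Soccer Tournament": {"Sports Tournament"},
    "Tennis Tournament": {"Sports Tournament"},
    "Sports Tournament": {"Competition"},
    "Competition": {"Event"},
}
# A general logical constraint: nothing is both kinds of tournament (assumption).
DISJOINT = [frozenset({"Soccer Tournament", "Tennis Tournament"})]


def superclasses(cls):
    """All ancestors of cls: formal is-a is transitive."""
    result, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def consistent(asserted_types):
    """True unless an individual's types (plus their ancestors) hit a disjointness axiom."""
    types = set(asserted_types)
    for t in asserted_types:
        types |= superclasses(t)
    return not any(pair <= types for pair in DISJOINT)


print(superclasses("Soccer Tournament"))
print(consistent({"Soccer Tournament"}))                       # e.g. the FA Cup
print(consistent({"Soccer Tournament", "Tennis Tournament"}))  # a contradiction
```

A thesaurus "narrower term" link carries no such guarantees, which is why it sits at the informal end of the spectrum.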
Ontologies and E-Anything Simple ontologies provide: • Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language) • Organization (and navigation support) • Expectation setting (left side of many web pages) • Browsing support (tagged structures such as Yahoo!) • Search support (query expansion approaches such as FindUR, e-Cyc) • Sense disambiguation • Conflict detection • Structured, comparative search • Generalization/ Specialization • … From Debbie McGuinness
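Query expansion is one of the simplest of these uses. The sketch below is a generic illustration, not how FindUR or e-Cyc actually work: it expands a query with every narrower term from a hypothetical thesaurus fragment, breadth-first, then matches documents by substring.

```python
from collections import deque

# Hypothetical thesaurus fragment: term -> narrower terms.
NARROWER = {
    "sports person": ["soccer player", "rower", "tennis player"],
    "soccer player": ["goalkeeper"],
}


def expand(term):
    """Breadth-first expansion of a query term with all its narrower terms."""
    expanded, queue = [term], deque([term])
    while queue:
        for narrower in NARROWER.get(queue.popleft(), []):
            if narrower not in expanded:  # guards against cycles
                expanded.append(narrower)
                queue.append(narrower)
    return expanded


def search(query, documents):
    """Return ids of documents whose text mentions any expanded term."""
    terms = expand(query.lower())
    return [doc_id for doc_id, text in documents.items()
            if any(term in text.lower() for term in terms)]


# Hypothetical documents.
DOCS = {
    "d1": "Steve Redgrave, rower, wins fifth Olympic gold",
    "d2": "FA Cup final report",
    "d3": "A goalkeeper remembers the Cup",
}
print(expand("sports person"))
print(search("Sports person", DOCS))  # ['d1', 'd3']
```

Neither matching document contains the words "sports person"; the shared vocabulary supplies the link.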
The Semantic Web • http://www.semanticweb.org
Metadata on the web tomorrow • Resources annotated with metadata using knowledge as a shared vocabulary • Metadata held outside the resource • Knowledge structures for holding the ontology • XML DTDs • Product classifications • Directories • Home > Recreation > Sports > Events > International Games > Olympic Games > • W3C: RDF and RDFS • Resource Description Framework • Topic maps • DAML+OIL
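In RDF, metadata held outside the resource is a set of subject/predicate/object statements about the resource's URI. A minimal sketch that writes such statements as N-Triples lines: the page and directory-category URIs are hypothetical, the predicates use the Dublin Core element namespace, and the `URI` marker class is an illustrative device (a real application would use an RDF library).

```python
# Dublin Core element namespace; the resource and category URIs below are hypothetical.
DC = "http://purl.org/dc/elements/1.1/"


class URI(str):
    """Marks a string as a URI reference rather than a literal."""


def term(value):
    if isinstance(value, URI):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) triples, one N-Triples line each."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


page = URI("http://example.org/redgrave.html")
triples = [
    (page, URI(DC + "title"), "Steve Redgrave home page"),
    (page, URI(DC + "subject"), URI("http://example.org/directory/Olympic_Games")),
]
print(to_ntriples(triples))
```

Because the object of `dc:subject` is itself a URI, it can point into a shared directory or ontology rather than carrying free-text keywords.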
XML is not good for describing ontologies • XML defines grammars to verify and structure documents • The grammar enforces constraints on tags • Different grammars can define the same content • XML lacks a semantic model: it only has a surface model, which is a tree. [Diagram: XML tree with nodes course, title, teacher, students, name, http] <course date="..."><title>...</title><teacher>...</teacher> <name>...</name> <http>...</http><students>...</students></course> • node = label + attr/values + contents
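The claim that different grammars can encode the same content can be checked directly. In this sketch two hypothetical grammars describe the same course; their surface trees differ, and the only place they are recognised as equivalent is in hand-written extraction code, outside the XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical grammars for the same course information.
DOC_A = ("<course><title>Semantic Web</title>"
         "<teacher>Smith</teacher></course>")
DOC_B = ('<course title="Semantic Web">'
         '<taughtBy><person name="Smith"/></taughtBy></course>')


def shape(elem):
    """The surface model: tag, attribute names and children, recursively."""
    return (elem.tag, sorted(elem.attrib), [shape(child) for child in elem])


# The meaning of each grammar lives in this extraction code, not in the XML.
def facts_from_a(root):
    return {("title", root.findtext("title")), ("teacher", root.findtext("teacher"))}


def facts_from_b(root):
    return {("title", root.get("title")),
            ("teacher", root.find("taughtBy/person").get("name"))}


a, b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
print(shape(a) == shape(b))                # False: different trees
print(facts_from_a(a) == facts_from_b(b))  # True: same content
```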
XML is not good for describing ontologies • The meaning of XML documents is intuitively clear • “semantic” markup tags are domain terms • But computers do not have intuition • Tag names per se do not provide semantics • The semantics are encoded outside the XML specification • XML makes no commitment on: • domain-specific ontological vocabulary • ontological modelling primitives • Using XML alone therefore requires pre-arranged agreement on both, which is feasible only for closed collaboration: • agents in a small & stable community • pages on a small & stable intranet
XML DTDs and XML Schema • A DTD does not distinguish between objects and relations • XML Schema’s type extension mechanism is a red herring: it can’t be used to model ontological subtypes • XML has been used as a serialisation syntax for other markup languages, e.g. SMIL, XOL
<class>
  <name>person</name>
</class>
<slot>
  <name>year-of-birth</name>
  <domain>person</domain>
  <slot-cardinality>1</slot-cardinality>
</slot>
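To an XML parser, a `<class>` and a `<slot>` are just elements; nothing marks one as an object and the other as a relation. This sketch reads the XOL-style fragment (with its `<domain>` start tag written out in full, and wrapped in a hypothetical `<root>` element so it parses as one document) and checks that each slot's domain names a declared class. The `Slot` dataclass and `load` function are illustrative, not part of XOL.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XOL_FRAGMENT = """<root>
  <class><name> person </name></class>
  <slot>
    <name>year-of-birth</name>
    <domain>person</domain>
    <slot-cardinality>1</slot-cardinality>
  </slot>
</root>"""


@dataclass
class Slot:
    name: str
    domain: str
    cardinality: int


def load(xml_text):
    """Read classes and slots; check each slot's domain names a declared class.

    The parser treats <class> and <slot> alike as elements: the
    object/relation distinction and this check exist only in our code.
    """
    root = ET.fromstring(xml_text)
    classes = {c.findtext("name").strip() for c in root.findall("class")}
    slots = [
        Slot(s.findtext("name").strip(),
             s.findtext("domain").strip(),
             int(s.findtext("slot-cardinality")))
        for s in root.findall("slot")
    ]
    for slot in slots:
        if slot.domain not in classes:
            raise ValueError(f"slot {slot.name!r} has undeclared domain {slot.domain!r}")
    return classes, slots


classes, slots = load(XOL_FRAGMENT)
print(classes, slots)
```

A document with `<domain>animal</domain>` is still well-formed XML; only the ontology-aware code above rejects it.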
Requirements for an Ontology Language • Well designed • Useful and proven modelling primitives • Intuitive to human users • Can say simple things simply • Expressive enough to capture many ontologies • Efficient, sound and complete reasoning support • Well defined • Clear syntax, to read ontologies • Formal semantics, to understand (process) ontologies and facilitate machine interpretation of that semantics • Compatible • Easy mapping to/from other ontology languages • Maximum compatibility with XML and RDF(S)
Sem Web Research Issues • Ontology creation • Millions of ontologies will be built • Ontology engineering is difficult and time-consuming • Ontology learning • Scalable RDF repositories (everything is built on top of the same data model!) • Infrastructure • Scalable reasoning services for different languages • Resource-ID management • Versioning of ontologies and corresponding metadata
Sem Web Research Issues • Metadata management • Legacy data (HTML, XML, ...) -> legacy data migration: • Annotation of Web documents (HTML, PDF, ...) • Semi-automation using information extraction • XML wrapper / transformer • Database converter / exporter • Maintenance of metadata, ontologies and resources • Sources, ontologies and metadata have to be maintained in a consistent way • An organizational process is needed • Tools are needed • Metadata have to reflect changes in the sources • Metadata have to reflect changes in the ontologies
Selected Semantic Web Projects • COHSE • http://inanna.ecs.soton.ac.uk/cohse/ • Ontobroker • http://ontobroker.aifb.uni-karlsruhe.de/ • SHOE • http://www.cs.umd.edu/projects/plus/SHOE/