Max Völkel 6.9.2007 I-Semantics, Graz. A Semantic Web Content Model and Repository. Outline. Motivation Analysis: Web vs. Semantic Web Developing a unified Semantic Web Content Model In three easy steps Implementation. How to model structure + content in one model?. Background
Max Völkel, 6.9.2007, I-Semantics, Graz • A Semantic Web Content Model and Repository
Outline • Motivation • Analysis: Web vs. Semantic Web • Developing a unified Semantic Web Content Model • In three easy steps • Implementation
How to model structure + content in one model? • Background • Wikis, Personal Semantic Wikis, Semantic Desktop, … • Two motivations: • Bring flexibility and expressivity of RDF to the end-user • Allow RDF to model and represent content as well – not only its metadata • Goal: Unified Model • As usable as the web • But how to represent semantics? Semantic queries? • As expressive and as flexible as the semantic web • How to represent binary data (desktop files, web resources) in RDF? • Unified search • “Give me all papers written by author X which contain Y”
Analysis: The Web • Granularity: gets smaller • Web 1.0: homepages, portals • Web 2.0: micro-content • Renderable representations • Freedom of formalisation • Less semantic HTML is less portable, but works • [Diagram labels: HTTP URI, HTTP, Representation, metadata (Encoding, ChangeDate, MimeType), Content (HTML, JPG, CSS, JS, PDF, …)]
Analysis: The Semantic Web • Flexible, very expressive • Not expressive enough: • Literals cannot be addressed • Statements cannot be addressed (but reified) • 10 different node types: complex for end-users • Existing formal knowledge can be re-used
Requirements for a SWCM • Content granularity • Expressivity • Binary Content • Freedom of formalisation • Human-usable • Renderable representations • Human-type- and memorizable names (e.g. like WikiWords) • Inverse Relations • Knowledge re-use • Standard CMS features: • Access rights addressable parts • Versioning addressable parts
Comparison
Feature                                  Web        Sem. Web   Desired
Content granularity                      mid/large  small      any
  (Goal: from small comments to full web pages/files)
Expressivity                             -          +          ++
Binary content                           ++         ~          ++
Freedom of formalisation                 +          -          +
Human-usable                             ++         -          ++
  • Renderable representations
  • Human-type- and memorizable names (e.g. like WikiWords)
  • Inverse relations
Knowledge re-use                         -          ++         +
Standard CMS features:
  Access rights addressable parts        +          ~          +
  Versioning addressable parts           +          ~          +
1 Creating the SWCM: Step 1: A Human-Usable RDF
Step 1: A Human-Usable RDF • Items have a URI and can have a Literal • Addressable Literals • [Diagram labels: URI, Literal (0..1), Item]
Step 1: A Human-Usable RDF • Statements connect Items • Expressivity of RDF • [Diagram labels: URI, Literal (0..1), Item, source, target, Statement, relation]
Step 1: A Human-Usable RDF • Addressable Statements • Syntactic sugar over reification • [Diagram labels: URI, Literal (0..1), Item, source, target, Statement, relation]
Step 1: A Human-Usable RDF • Address Items via human-type-able name (e.g. WikiWords) • Human-usable naming • [Diagram labels: URI, Literal (0..1), Item, source, target, Statement, NameItem, relation]
Step 1: A Human-Usable RDF • Statements (Item, NameItem, Item) • Decision that relations should be human-name-able • [Diagram labels: URI, Literal (0..1), Item, source, target, Statement, NameItem, relation]
Step 1: A Human-Usable RDF • Relations always have an inverse • Item-centric rendering easier for tools • [Diagram labels: URI, Literal (0..1), Item, source, target, Statement, NameItem, relation, inverse, Relation]
Step 1: A Human-Usable RDF • A Model contains Items • A Model has a URI • [Diagram labels: Model, URI, Literal (0..1), Item (0..n), source, target, Statement, NameItem, relation, inverse, Relation]
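The Step 1 model can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the swecr API: the class and method names (`SwcmStep1Sketch`, `nameItem`, `relation`, `state`, `related`) and the URI scheme (`model URI + "#" + name`) are hypothetical. It shows Items with an optional literal, NameItems, Relations that always carry an inverse, and Statements that are themselves addressable Items.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the Step 1 model; all names here are assumptions. */
final class SwcmStep1Sketch {

    /** An Item has a URI and 0..1 literal, which makes literals addressable. */
    static class Item {
        final String uri;
        final String literal; // null when the Item has no literal
        Item(String uri, String literal) { this.uri = uri; this.literal = literal; }
    }

    /** A NameItem is an Item whose literal is a human-typeable name. */
    static class NameItem extends Item {
        NameItem(String uri, String name) { super(uri, name); }
    }

    /** A Relation is a NameItem that always has an inverse. */
    static final class Relation extends NameItem {
        Relation inverse; // assigned in Model.relation()
        Relation(String uri, String name) { super(uri, name); }
    }

    /** A Statement is itself an Item, so it can be addressed by URI. */
    static final class Statement extends Item {
        final Item source;
        final Relation relation;
        final Item target;
        Statement(String uri, Item source, Relation relation, Item target) {
            super(uri, null);
            this.source = source;
            this.relation = relation;
            this.target = target;
        }
    }

    /** A Model has a URI and contains 0..n Items. */
    static final class Model {
        final String uri;
        private final Map<String, NameItem> namesToItems = new HashMap<>();
        private final List<Statement> statements = new ArrayList<>();

        Model(String uri) { this.uri = uri; }

        /** Returns the NameItem for a name, creating it on first use. */
        NameItem nameItem(String name) {
            NameItem existing = namesToItems.get(name);
            if (existing != null) return existing;
            NameItem created = new NameItem(uri + "#" + name, name);
            namesToItems.put(name, created);
            return created;
        }

        /** Creates a relation together with its inverse and links the two. */
        Relation relation(String name, String inverseName) {
            Relation forward = new Relation(uri + "#" + name, name);
            Relation backward = new Relation(uri + "#" + inverseName, inverseName);
            forward.inverse = backward;
            backward.inverse = forward;
            namesToItems.put(name, forward);
            namesToItems.put(inverseName, backward);
            return forward;
        }

        /** Adds a statement (source, relation, target) with a generated URI. */
        Statement state(Item source, Relation relation, Item target) {
            Statement s = new Statement(uri + "#stmt" + statements.size(), source, relation, target);
            statements.add(s);
            return s;
        }

        /** Items reachable from item via relation, also reading inverse statements backwards. */
        List<Item> related(Item item, Relation relation) {
            List<Item> result = new ArrayList<>();
            for (Statement s : statements) {
                if (s.relation == relation && s.source == item) result.add(s.target);
                if (s.relation == relation.inverse && s.target == item) result.add(s.source);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Model model = new Model("urn:example:model");
        NameItem fzi = model.nameItem("FZI");
        NameItem max = model.nameItem("Max");
        Relation employs = model.relation("employs", "worksFor");
        model.state(fzi, employs, max);
        for (Item employer : model.related(max, employs.inverse)) {
            System.out.println("Max worksFor " + employer.literal);
        }
    }
}
```

Because every Relation carries its inverse, an item-centric view of Max can list FZI even though the only stored statement starts at FZI.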
2 Creating the SWCM: Step 2: Include Binary Content
Step 2: Include Binary Content • From addressable literals to addressable representations • [Diagram labels: URI, Literal (0..1), Item]
Step 2: Include Binary Content • [Diagram labels: URI, Representation (0..1), Item]
Step 2: Include Binary Content • Representations on the web have some built-in properties • Metadata: Mime-type, encoding, change-date • Data: the actual content itself • [Diagram labels: URI, Representation (0..1), Encoding, ChangeDate, MimeType, Content, Item]
Step 2: Include Binary Content • In SWCM, representations have an author • Like in wikis, blogs, web pages, … • Can be “anonymous” • [Diagram labels: URI, Representation (0..1), Encoding, ChangeDate, MimeType, Content, author, Item]
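As a rough sketch of Step 2, a Representation can be modelled as the content bytes plus the metadata named on the slides (MIME type, encoding, change date, author). The class name `SwcmStep2Sketch`, the factory method `ofText`, and the literal value used for the anonymous author are assumptions made for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

/** Illustrative sketch of Step 2; names and defaults are assumptions. */
final class SwcmStep2Sketch {

    /** Placeholder used when no author is given (hypothetical value). */
    static final String ANONYMOUS = "anonymous";

    /** A representation: web-like metadata plus the actual content bytes. */
    static final class Representation {
        final String mimeType;
        final String encoding;
        final Instant changeDate;
        final String author;
        private final byte[] content;

        Representation(String mimeType, String encoding, Instant changeDate,
                       String author, byte[] content) {
            this.mimeType = mimeType;
            this.encoding = encoding;
            this.changeDate = changeDate;
            this.author = (author == null) ? ANONYMOUS : author;
            this.content = content.clone(); // defensive copy
        }

        /** Convenience factory for plain text, the case an addressable literal covers. */
        static Representation ofText(String text, String author) {
            return new Representation("text/plain", StandardCharsets.UTF_8.name(),
                    Instant.now(), author, text.getBytes(StandardCharsets.UTF_8));
        }

        /** Decodes the content using the recorded encoding. */
        String asText() {
            return new String(content, Charset.forName(encoding));
        }

        int size() {
            return content.length;
        }
    }

    /** An Item has a URI and 0..1 Representation. */
    static final class Item {
        final String uri;
        Representation representation; // null when the Item has no content
        Item(String uri) { this.uri = uri; }
    }

    public static void main(String[] args) {
        Item item = new Item("urn:example:item1");
        item.representation = Representation.ofText("FZI Forschungszentrum Informatik", null);
        System.out.println(item.representation.mimeType + " by " + item.representation.author);
    }
}
```

A plain literal is then just the special case of a `text/plain` representation, which is what lets one model hold both small text and binary files.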
3 Creating the SWCM: Step 3: Merge Step 1 and Step 2
The Semantic Web Content Model • [Diagram labels: Structure, Content, Model, URI, Item (0..n), source, target, Statement, NameItem, relation, inverse, Relation, Representation (0..1), Encoding, ChangeDate, MimeType, Content, author]
The Semantic Web Content Model • We expect end-users to understand the circled parts • [Diagram: same labels as on the previous slide, with some parts circled]
Swecr is implemented in two layers • swecr.model interface • swecr.core interface
The swecr.model API (see www.swecr.org) • [Diagram labels: IRepository, IModel (0..n), RDF2Go.URI, IItem (0..n), IContentItem, INameItem, IStatement (source, target), IRelation (inverse), IContent (0..1), IBinContent, INameContent, IMimeType, author, ChangeDate] • Notes: 1. Content of an INameItem is unique within its IModel. 2. Mimetype always = “text/plain”
Swecr.core: Some Content stored in RDF
:FZI a swcm:NameItem , swcm:Item ;
    swcm:hasChangeDate "2007-08-24T16:07:29Z"^^xsd:dateTime ;
    swcm:hasContent "FZI Forschungszentrum Informatik" .
:employs a swcm:NameItem , swcm:Item , swcm:Relation ;
    swcm:hasAuthor swcm:anonymous-author ;
    swcm:hasChangeDate "2007-08-24T16:07:32Z"^^xsd:dateTime ;
    swcm:hasContent "employs" ;
    swcm:hasInverse :employedBy .
:worksFor a swcm:NameItem , swcm:Item , swcm:Relation ;
    swcm:hasAuthor swcm:anonymous-author ;
    swcm:hasChangeDate "2007-08-24T16:07:33Z"^^xsd:dateTime ;
    swcm:hasContent "works for" ;
    swcm:hasInverse :employs .
Implemented in two layers • Statements stored in two RDF models: user model and index model
<urn:rnd:-1d72b0a2:11498a0d25f:-7fff> a swcm:Item , swcm:Statement ;
    swcm:hasChangeDate "2007-08-24T16:07:30Z"^^xsd:dateTime ;
    swcm:stmtRelation :employs ;
    swcm:stmtSource :FZI ;
    swcm:stmtTarget :Max .
• Query answering (redundant):
:FZI :employs :Max .
:Max :worksFor :FZI .
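The redundant triples used for query answering can be derived mechanically from a stored statement and the inverse of its relation. The following is a minimal sketch of that idea; how swecr.core actually populates its index model is not shown on the slides, and `SwcmIndexSketch`, `indexTriples`, and the string-based triple format are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: derive plain triples (forward and inverse) for an index model. */
final class SwcmIndexSketch {

    /**
     * Returns the forward triple for a statement and, if the relation has a
     * known inverse, the redundant inverse triple as well.
     */
    static List<String> indexTriples(String source, String relation, String target,
                                     Map<String, String> inverseOf) {
        List<String> triples = new ArrayList<>();
        triples.add(source + " " + relation + " " + target + " .");
        String inverse = inverseOf.get(relation);
        if (inverse != null) {
            triples.add(target + " " + inverse + " " + source + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, String> inverseOf = new HashMap<>();
        inverseOf.put(":employs", ":worksFor");
        inverseOf.put(":worksFor", ":employs");
        for (String triple : indexTriples(":FZI", ":employs", ":Max", inverseOf)) {
            System.out.println(triple);
        }
    }
}
```

Storing both directions trades space for simpler queries: a query for everything about `:Max` only needs to match on the subject position.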
But where to store binaries? • [Diagram labels: swecr.model interface, swecr.core interface, ModelSetImpl, RDF ModelSet (user model, index model), ?]
BinStore – a simple binary store • Intuition: the simplest web-like API that could possibly work (and allow random access) • Data model: URI → Metadata + InputStream / OutputStream • Simple implementation on files • Future: consider JCR • API (Binary Store, BinStoreImpl): • getReadHandle: InputStream readStream(); getMimeType(), getSize() • getWriteHandle: writeStream( InputStream, MimeType ); setMimeType( MimeType ) • getRandomAccessHandle • delete( URI )
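To make the handle-based API concrete, here is an in-memory sketch that follows the method names listed on the slide. It is not BinStoreImpl: the exact signatures, the use of `String` for the MIME type, the default MIME type, the error handling, and the omission of `getRandomAccessHandle` are all assumptions. The slide describes a file-based implementation; a map of byte arrays is used here only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Illustrative in-memory binary store; signatures are assumptions, not the swecr API. */
final class BinStoreSketch {

    /** Stored bytes plus metadata for one URI. */
    private static final class Entry {
        byte[] data = new byte[0];
        String mimeType = "application/octet-stream";
    }

    /** Read access to one stored binary. */
    static final class ReadHandle {
        private final Entry entry;
        private ReadHandle(Entry entry) { this.entry = entry; }
        InputStream readStream() { return new ByteArrayInputStream(entry.data); }
        String getMimeType() { return entry.mimeType; }
        long getSize() { return entry.data.length; }
    }

    /** Write access to one stored binary. */
    static final class WriteHandle {
        private final Entry entry;
        private WriteHandle(Entry entry) { this.entry = entry; }

        /** Replaces the stored content with everything readable from the stream. */
        void writeStream(InputStream in, String mimeType) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            try {
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            entry.data = buffer.toByteArray();
            entry.mimeType = mimeType;
        }

        void setMimeType(String mimeType) { entry.mimeType = mimeType; }
    }

    private final Map<URI, Entry> entries = new HashMap<>();

    /** Fails if nothing has been stored under the URI. */
    ReadHandle getReadHandle(URI uri) {
        Entry entry = entries.get(uri);
        if (entry == null) {
            throw new IllegalArgumentException("No content stored for " + uri);
        }
        return new ReadHandle(entry);
    }

    /** Creates an empty entry on first use. */
    WriteHandle getWriteHandle(URI uri) {
        Entry entry = entries.get(uri);
        if (entry == null) {
            entry = new Entry();
            entries.put(uri, entry);
        }
        return new WriteHandle(entry);
    }

    /** Returns true if an entry existed and was removed. */
    boolean delete(URI uri) {
        return entries.remove(uri) != null;
    }
}
```

A caller would obtain a write handle for a URI, stream the bytes in together with a MIME type, and later read size and type back through a read handle for the same URI.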
Persistence in an RDF ModelSet and a Binary Store • Full text queries need a full text index • [Diagram labels: swecr.model interface, swecr.core interface, ModelSetImpl, BinStoreImpl, RDF ModelSet (user model, index model), Binary Store]
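The unified search from the motivation (“Give me all papers written by author X which contain Y”) amounts to intersecting a structured lookup with a full-text lookup. The sketch below illustrates this with two in-memory maps; it is not the swecr query engine, and `UnifiedQuerySketch` and its methods are hypothetical. A real system would use an RDF store and a full-text index rather than hash maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: combine a structured lookup with a toy full-text index. */
final class UnifiedQuerySketch {

    // Structured side: author name -> URIs of items written by that author.
    private final Map<String, Set<String>> itemsByAuthor = new HashMap<>();
    // Full-text side: lower-cased token -> URIs of items containing it.
    private final Map<String, Set<String>> itemsByToken = new HashMap<>();

    void addAuthor(String itemUri, String author) {
        itemsByAuthor.computeIfAbsent(author, k -> new HashSet<>()).add(itemUri);
    }

    /** Splits text on anything that is not a letter or digit and indexes each token. */
    void indexText(String itemUri, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                itemsByToken.computeIfAbsent(token, k -> new HashSet<>()).add(itemUri);
            }
        }
    }

    /** Items written by the author whose text contains the word (sorted by URI). */
    Set<String> writtenByContaining(String author, String word) {
        Set<String> result = new TreeSet<>(
                itemsByAuthor.getOrDefault(author, Collections.<String>emptySet()));
        result.retainAll(
                itemsByToken.getOrDefault(word.toLowerCase(), Collections.<String>emptySet()));
        return result;
    }

    public static void main(String[] args) {
        UnifiedQuerySketch index = new UnifiedQuerySketch();
        index.addAuthor("urn:example:paper1", "Max");
        index.indexText("urn:example:paper1", "A semantic web content model");
        index.addAuthor("urn:example:paper2", "Max");
        index.indexText("urn:example:paper2", "A simple binary store");
        System.out.println(index.writtenByContaining("Max", "semantic"));
    }
}
```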
The complete swecr.core • [Diagram labels: swecr.model interface, swecr.core interface, RDF ModelSet, Binary Store, Query Engine, BinStoreImpl, IndexingModelSet, IndexingBinStore, ModelSetImpl, TextIndexImpl, Bin2Text (Aperture), AdapterServer; legend: existing component, in progress] • Download from www.swecr.org
Example: Wiki-page • Example: A wiki-page in SWCM • Title of wiki page → NameItem • Content of wiki page → Item • Relation between title and page content → Statement • Who uses it? • WavesWiki (part of BMBF project, http://waves.fzi.de) • SemFS, a Semantic File System (presented at I-KNOW in 2006) • Conceptual Data Structures (end-user personal KM tool) • Interest from XWiki and Cognium Systems
Summary • SWCM is a content management model combining the usability of the web with the expressivity and flexibility of the semantic web • [Diagram labels: Item, source, target, Statement, NameItem, relation, inverse, Relation] • Future Work: • Refactor core layer into smaller parts (services) • Create RDF with binaries – API • Unified queries (like the LuceneSAIL or LARQ) • Crawling of external resources (indexed locally, stored remotely) • From structured text to SWCM models (see paper)