
Tools for Next Generation of CMS: XML, RDF, & GRDDL




Presentation Transcript

  1. Tools for Next Generation of CMS: XML, RDF, & GRDDL • Chimezie Ogbuji (chee-meh) • Cleveland Clinic Foundation, Cardiothoracic Surgery Research • ogbujic@ccf.org / chimezie@gmail.com

  2. Background (CT Research Roadmap) • A large, relational registry for Cardiothoracic procedures • Relatively small research department with very little software engineering experience • Traditional CMS and DBMS were insufficient • Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB) • Need to replace a productive, integrated research pipeline • Data entry, clinical Q&A, patient follow-up, concurrent study management, ... • 100+ research papers per year

  3. Background (Institute of Medicine Proposal) • The Computer-Based Patient Record: An Essential Technology for Health Care • ISBN: 0309055326 • An old but still very relevant set of requirements from the IOM (still unfulfilled) • A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc. • Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture.

  4. CPR: Functional Requirements • Uniform, extensible record content • (Standard) record formats • System performance • Linkages • Intelligence • Reporting Capabilities • Security • Multi-views • Accessibility

  5. Definitions: KR / CMS • What is Knowledge Representation (KR)? • What is a Knowledge Base (KB)? • A database system which facilitates deductive reasoning over a KR • Commonly called rule-based systems • What are Expert Systems? • What is a Content Management System (CMS)?
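The Python sketch below illustrates "deductive reasoning over a KR". It forward-chains one rule over subject/predicate/object facts until no new facts appear. The facts, the `subClassOf` predicate string, and the rule itself are hypothetical stand-ins, not part of the talk. Real rule-based systems add indexing and conflict resolution on top of this basic loop.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]
Rule = Callable[[Set[Fact]], Iterable[Fact]]


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply every rule repeatedly until no new facts appear (a fixpoint)."""
    known = set(facts)
    while True:
        derived = {fact for rule in rules for fact in rule(known)}
        new = derived - known
        if not new:
            return known
        known |= new


def subclass_transitivity(known: Set[Fact]) -> Iterable[Fact]:
    """Hypothetical rule: A subClassOf B and B subClassOf C imply A subClassOf C."""
    for (a, p1, b) in known:
        if p1 != "subClassOf":
            continue
        for (c, p2, d) in known:
            if p2 == "subClassOf" and c == b:
                yield (a, "subClassOf", d)


facts = {
    ("CABG", "subClassOf", "CardiacSurgery"),
    ("CardiacSurgery", "subClassOf", "Surgery"),
}
closure = forward_chain(facts, [subclass_transitivity])
print(("CABG", "subClassOf", "Surgery") in closure)  # True
```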

  6. Knowledge Representation • Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)

  7. Content Management System: The What • The terms CMS and Content Repository are essentially interchangeable • Modern content repositories are best characterized by JSR 170 / 283 • “.. a high-level information management system that is a superset of traditional data repositories” • Integrated support for the XPath data model is the most prominent feature (native document management)

  8. Content Repository Feature Set • Modern CMS standards cover document management effectively • Read/write access • Versioning • Event monitoring • Document-level access control • Concurrent access • Cross-linking • Profiles and Document Types

  9. Anatomy of a JSR 170 Implementation • Apache Jackrabbit • Component-based • Content Applications • Content Repository API • Implementation

  10. Knowledge Bases and CMS • What of the requirements that Expert Systems meet? • Document management and knowledge management systems are historically isolated from each other • XML & RDF are contemporary manifestations of these methodologies • They have remained as isolated as their predecessors • They typically only coincide with regard to syntax

  11. XML & RDF: Eating and Having Your Cake • Classic example of where the document-oriented approach falls short: • Modern EHR cannot facilitate dynamic research • Unified infrastructure for document and knowledge management is needed • One of the earliest examples: • 4Suite Server version 0.10.0 (December 2000) • Current state of the art (GRDDL): • Gleaning Resource Descriptions from Dialects of Languages

  12. GRDDL: The Elevator Pitch • Provides a way to normalize RDF concrete syntaxes • The problem: • Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...) • The authoritative concrete syntax is not without issues • The solution: • Define mappings from XML dialects to RDF graphs • Use Turing-complete XML pipelines • English as a second language analogy
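This is a minimal sketch of "define mappings from XML dialects to RDF graphs". It assumes a made-up patient-record dialect and vocabulary, and every `example.org` URI is hypothetical. GRDDL transformations are usually XSLT. Plain Python with `xml.etree.ElementTree` stands in here only so the example runs without extra libraries.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RECORD_NS = "http://example.org/record"   # hypothetical XML dialect namespace
EX = "http://example.org/vocab#"          # hypothetical RDF vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

SOURCE = """<patient xmlns="http://example.org/record" id="p1">
  <name>Jane Doe</name>
  <procedure code="CABG"/>
</patient>"""


def patient_to_rdf(xml_text: str, base: str) -> Set[Triple]:
    """Map the (made-up) patient dialect to a set of RDF triples.

    Objects are plain strings here; a real RDF toolkit would distinguish
    URIs from literals such as the patient's name.
    """
    ns = {"r": RECORD_NS}
    root = ET.fromstring(xml_text)
    subject = base + "#" + root.get("id", "")
    graph = {(subject, RDF_TYPE, EX + "Patient")}
    name = root.find("r:name", ns)
    if name is not None and name.text:
        graph.add((subject, EX + "name", name.text))
    for proc in root.findall("r:procedure", ns):
        graph.add((subject, EX + "underwent", EX + proc.get("code", "")))
    return graph


for triple in sorted(patient_to_rdf(SOURCE, "http://example.org/records/p1")):
    print(triple)
```

The same source document could feed other mappings as well. Each one is a separate "faithful rendition" of some portion of the XML.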

  13. The GRDDL Picture

  14. GRDDL: The Components • Faithful Rendition • “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.” • Various mechanisms for nominating transformations: • Specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links • GRDDL-aware agents compute GRDDL results (RDF graphs)
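Two of the nomination mechanisms can be checked from the source document alone, and the sketch below covers just those. It assumes the GRDDL namespace `http://www.w3.org/2003/g/data-view#` and the profile URI `http://www.w3.org/2003/g/data-view`. The namespace-document and profile-document mechanisms need network access to fetch those documents, so they are left out. `nominated_transformations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def nominated_transformations(xml_text: str, base: str) -> List[str]:
    """Return transformation URIs nominated inside the document itself."""
    root = ET.fromstring(xml_text)
    found: List[str] = []
    # 1. grddl:transformation on the root element: space-separated URI references.
    attr = root.get("{%s}transformation" % GRDDL_NS)
    if attr:
        found += [urljoin(base, ref) for ref in attr.split()]
    # 2. XHTML: <link rel="transformation"> in a head declaring the GRDDL profile.
    head = root.find("{%s}head" % XHTML_NS)
    if head is not None and GRDDL_PROFILE in head.get("profile", "").split():
        for link in head.findall("{%s}link" % XHTML_NS):
            if "transformation" in link.get("rel", "").split():
                found.append(urljoin(base, link.get("href", "")))
    return found


record = ('<record xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
          'grddl:transformation="record2rdf.xsl"/>')
print(nominated_transformations(record, "http://example.org/records/"))
```

A GRDDL-aware agent would then run each nominated transformation and merge the resulting graphs.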

  15. The CMS Alternative: “Dual Representation” • Persist XML in synchrony with its faithful rendition • Changes to the XML trigger calculation and storage of corresponding RDF • “Dual Representation” • Implemented by 4Suite Server Document Definitions • The basis of how we capture patient records with maximum syntactic and semantic expressivity
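Dual representation can be sketched as an in-memory store whose write path recomputes the RDF whenever the XML changes. `DualRepository` and `status_mapping` are hypothetical names. A plain Python function stands in for the document definition, which in 4Suite is typically an XSLT. Each graph reuses its document's URI.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Callable[[ET.Element, str], Set[Triple]]


class DualRepository:
    """Hypothetical store keeping each XML document and its RDF rendition in sync."""

    def __init__(self, mapping: Mapping) -> None:
        self.mapping = mapping                      # stand-in for a document definition
        self.xml: Dict[str, str] = {}               # document URI -> XML source
        self.graphs: Dict[str, Set[Triple]] = {}    # graph URI (= document URI) -> triples

    def put(self, uri: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)   # raises ParseError before anything is stored
        self.xml[uri] = xml_text
        # Every write recomputes the rendition and replaces the previous graph.
        self.graphs[uri] = self.mapping(root, uri)

    def delete(self, uri: str) -> None:
        self.xml.pop(uri, None)
        self.graphs.pop(uri, None)


def status_mapping(root: ET.Element, uri: str) -> Set[Triple]:
    # Hypothetical mapping: expose the root's status attribute as one triple.
    return {(uri, "http://example.org/vocab#status", root.get("status", "unknown"))}


repo = DualRepository(status_mapping)
repo.put("http://example.org/records/p1", '<record status="open"/>')
repo.put("http://example.org/records/p1", '<record status="closed"/>')
print(repo.graphs["http://example.org/records/p1"])
```

Replacing the whole graph on each write is the simplest way to keep the two forms consistent. A real repository might diff graphs or run the update transactionally.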

  16. Document Definition • The document definition is the mapping • Usually an XSLT document

  17. Content Repository Architecture

  18. Overlap between Content Repository APIs

  19. Dual Representation: Advantages • Maximum expressiveness and versatility of content • Unified naming convention and access control (more on this later) • Uniform, concrete RDF syntaxes • For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.) • Cheap support for XML & RDF content negotiation • Use of RDF as a semantic index for XML
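Content negotiation becomes cheap once both forms are stored, because a request only has to pick one. The sketch below is simplified and assumes hypothetical media-type assignments. It ignores wildcards such as `*/*` and other details of full HTTP negotiation.

```python
from typing import Optional

# Hypothetical mapping from media types to the two stored representations.
REPRESENTATIONS = {
    "application/xml": "xml",
    "text/xml": "xml",
    "application/rdf+xml": "rdf",
    "text/turtle": "rdf",
}


def negotiate(accept: str) -> Optional[str]:
    """Choose 'xml' or 'rdf' from an HTTP Accept header using q-values.

    Simplified: wildcards are ignored, q=0 means "not acceptable",
    and ties go to the first media type listed.
    """
    best, best_q = None, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media in REPRESENTATIONS and q > best_q:
            best, best_q = REPRESENTATIONS[media], q
    return best


print(negotiate("application/rdf+xml;q=0.9, application/xml;q=0.5"))  # rdf
```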

  20. Document Definition: Similarities • GRDDL • RDDL • Resource Directory Description Language • Human-readable descriptive material about a target • A directory of individual resources related to a target • Nature and Purpose • Schema, stylesheet, etc. • Lives at a namespace URI • WXS's targetNamespace • Common theme is a set of definitions for a document or a class of documents
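For comparison with document definitions, the sketch below reads an RDDL directory. It assumes the RDDL namespace `http://www.rddl.org/`, with XLink attributes carrying each resource's nature (`xlink:role`) and purpose (`xlink:arcrole`). The sample document and `record.xsd` are hypothetical. Check the RDDL specification before relying on particular nature or purpose URIs.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"


class RddlResource(NamedTuple):
    href: str     # the related resource itself
    nature: str   # xlink:role: what kind of resource it is
    purpose: str  # xlink:arcrole: what it is for


def rddl_resources(xml_text: str) -> List[RddlResource]:
    """Collect every rddl:resource entry from a namespace document."""
    root = ET.fromstring(xml_text)
    return [
        RddlResource(
            res.get(f"{{{XLINK_NS}}}href", ""),
            res.get(f"{{{XLINK_NS}}}role", ""),
            res.get(f"{{{XLINK_NS}}}arcrole", ""),
        )
        for res in root.iter(f"{{{RDDL_NS}}}resource")
    ]


NAMESPACE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource xlink:type="simple"
        xlink:href="record.xsd"
        xlink:role="http://www.w3.org/2001/XMLSchema"
        xlink:arcrole="http://www.rddl.org/purposes#schema-validation"/>
  </body>
</html>"""

for entry in rddl_resources(NAMESPACE_DOC):
    print(entry.href, entry.purpose)
```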

  21. Registering a Document to a Class • Namespace registration works well for the web (the W3C TAG's preferred approach) • What if you don't control the content served from the namespace of an existing vocabulary? • Atom, DocBook, etc. • A CMS is better suited for a 'closed' / 'controlled' approach • Persist membership metadata in the CMS
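The sketch below models the "closed" approach with a hypothetical `DefinitionRegistry` inside the CMS. Membership metadata stored per document takes precedence, and the root element's namespace is only a fallback. This lets a CMS treat one Atom feed differently from the rest without controlling the Atom namespace document. The definition names are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class DefinitionRegistry:
    """Hypothetical CMS-side registry assigning documents to document definitions."""

    def __init__(self) -> None:
        self.by_namespace: Dict[str, str] = {}  # root namespace -> definition name
        self.membership: Dict[str, str] = {}    # document URI -> definition (persisted metadata)

    def register_namespace(self, ns: str, definition: str) -> None:
        self.by_namespace[ns] = definition

    def assign(self, doc_uri: str, definition: str) -> None:
        self.membership[doc_uri] = definition

    def definition_for(self, doc_uri: str, xml_text: str) -> Optional[str]:
        # Explicit membership stored in the CMS wins over the namespace lookup.
        if doc_uri in self.membership:
            return self.membership[doc_uri]
        tag = ET.fromstring(xml_text).tag
        ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        return self.by_namespace.get(ns)


ATOM = "http://www.w3.org/2005/Atom"
reg = DefinitionRegistry()
reg.register_namespace(ATOM, "atom-feed")
reg.assign("/news/cardiac-feed", "clinical-news")
feed = '<feed xmlns="http://www.w3.org/2005/Atom"/>'
print(reg.definition_for("/news/cardiac-feed", feed))  # clinical-news
print(reg.definition_for("/news/other-feed", feed))    # atom-feed
```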

  22. SemanticDB and Dual Representation

  23. Document and Graph Granularity • Tying documents to graphs normalizes the content granularity • Documents and their RDF graphs can be treated uniformly: • Naming convention • Targeted querying • Access control management

  24. JSR Fine-Grained Control

  25. 'Controlled' Naming Convention

  26. Controlled Naming Convention: Continued • RDF Dataset (from SPARQL): • A collection of named graphs • The RDF is stored in a graph with the same URI as the XML source document • When RDF is used as the primary cross-document 'index' you can: • SELECT ?graph WHERE { GRAPH ?graph { ... } } • document($graph)/.. XPath .. • The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
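The two-step pattern on this slide can be shown in miniature: query the named graphs, then dereference each graph name as a document. The dataset here is a hypothetical dictionary mapping graph URI to triples. `graphs_matching` only approximates a single SPARQL `GRAPH` pattern, and `ElementTree.findall` supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]   # graph URI -> triples (named graphs)


def matches(t: Triple, s: Optional[str], p: Optional[str], o: Optional[str]) -> bool:
    """None plays the role of an unbound SPARQL variable."""
    return all(want is None or have == want for have, want in zip(t, (s, p, o)))


def graphs_matching(dataset: Dataset, s: Optional[str], p: Optional[str],
                    o: Optional[str]) -> List[str]:
    """Rough analogue of SELECT ?graph WHERE { GRAPH ?graph { s p o } }."""
    return sorted(g for g, triples in dataset.items()
                  if any(matches(t, s, p, o) for t in triples))


EX = "http://example.org/vocab#"    # hypothetical vocabulary
documents = {                        # XML sources keyed by document URI
    "urn:doc:1": "<record><surgeon>Smith</surgeon></record>",
    "urn:doc:2": "<record><surgeon>Jones</surgeon></record>",
}
dataset: Dataset = {                 # each graph shares its source document's URI
    "urn:doc:1": {("urn:doc:1#r", EX + "procedure", EX + "CABG")},
    "urn:doc:2": {("urn:doc:2#r", EX + "procedure", EX + "ValveRepair")},
}

# 1) Use the RDF as a cross-document index; 2) reuse each graph name as a document URI.
for graph_uri in graphs_matching(dataset, None, EX + "procedure", EX + "CABG"):
    root = ET.fromstring(documents[graph_uri])
    print(graph_uri, [e.text for e in root.findall("surgeon")])
```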

  27. Uniform Access Control for XML/RDF CMS • Traditionally, Access Control Lists are associated with an object • Example: a file or directory in a filesystem • Assign document / graph ACLs to a single URI • Certain users / groups can query the RDF but cannot read the XML • De-identification of EHR: HIPAA • The 4Suite repository supports unified XML/RDF ACL
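Because a document and its graph share one URI, a single ACL entry can govern both. The hypothetical sketch below uses two permissions, so that one group may query the RDF but not read the XML. This fits the de-identification case only if the gleaned graph omits identifying fields, which is an assumption. `UnifiedAcl`, `Perm`, and the group names are illustrative.

```python
from enum import Flag
from typing import Dict, Set


class Perm(Flag):
    NONE = 0
    QUERY_RDF = 1   # may see the document's triples in query results
    READ_XML = 2    # may retrieve the full XML source


class UnifiedAcl:
    """Hypothetical ACL keyed on one URI that names both a document and its graph."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, Perm]] = {}   # uri -> {principal: perms}

    def grant(self, uri: str, principal: str, perms: Perm) -> None:
        current = self.entries.setdefault(uri, {}).get(principal, Perm.NONE)
        self.entries[uri][principal] = current | perms

    def allowed(self, uri: str, principals: Set[str], perm: Perm) -> bool:
        acl = self.entries.get(uri, {})
        return any(perm in acl.get(p, Perm.NONE) for p in principals)


doc = "http://example.org/records/p1"   # one URI names both the XML and its graph
acl = UnifiedAcl()
acl.grant(doc, "research-staff", Perm.QUERY_RDF)
acl.grant(doc, "care-team", Perm.QUERY_RDF | Perm.READ_XML)

print(acl.allowed(doc, {"research-staff"}, Perm.QUERY_RDF))  # True
print(acl.allowed(doc, {"research-staff"}, Perm.READ_XML))   # False
```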

  28. Going Forward • The SPARQL RDF dataset needs to be generalized • There is a long list of representation problems solved by a formal named graph specification • RDF graphs need to be first-class objects in CMS • Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation • Where do the 4Suite Repository API and JSR 170 / 283 overlap? • How do we generalize Document Definitions?

  29. A Proposal for XML/RDF CMS

  30. Primary Takeaways • We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems • CMS standards are needed for the next generation of semantic / rich web applications • These standards can preemptively level the landscape of toolkits in this space

  31. References • D. Nuescheler et al., JSR 170: Content Repository for Java • http://jcp.org/en/jsr/detail?id=170 • D. Connolly, Gleaning Resource Descriptions from Dialects of Languages (GRDDL) • http://www.w3.org/TR/grddl/ • J. Borden, T. Bray, Resource Directory Description Language • http://www.rddl.org/ • E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF • http://www.w3.org/TR/rdf-sparql-query/ • Fourthought Inc., 4Suite • http://4Suite.org
