Electronic Notebooks: An Interface Component for Semantic Records Systems
James D. Myers, Michael Peterson, K Prasad Saripalli, Tara Talbott
Mathematics and Computational Science Directorate
Pacific Northwest National Laboratory

Outline
• Why have an electronic notebook?
• The changing science/IT landscape
• Semantic repositories
• Scientific Annotation Middleware
• ENs on semantic repositories
• The ELN on SAM

PNNL Electronic Laboratory Notebook (ELN), ~1995+
• Secure shared WWW based space
• Hierarchical Chapters/Pages/Notes
• Add/View/Search Notes
• File upload, sketch, text, equations, forms, image capture, …
• Interactive views of data
• Editor/Viewer APIs
• Cross-out capability
• Digital Signatures/Timestamps
• Java Client, Perl and Java (2001+) servers
• …

What distinguishes ENs from other tools?
• Emphasis on multimedia human-entered information
• Chronological, page-oriented display
• Master/personal project record
• Records functionality:
  • Non-repudiation - digital signatures and timestamps
  • Persistence/completeness - write-once/no deletions/audit trail
  • Standardized lifecycle - signing/witnessing policies, archiving, retention schedules, …

The Systems Science Revolution
• Community Resources
• Bi-directional flow/feedback of information
• Partial results being combined to produce new knowledge
• Experiment/Theory/Model comparisons
• Multiscale optimizations
• Rapid Evolution
• High Complexity
• Shifting/Emerging disciplinary boundaries
• Resources will be distributed, with multiple curators

Advances in Problem Solving Environments/Grids/Semantic Technologies
• Multiple applications recording data:
  • Pedigree/Provenance
  • Experiment Metadata
  • Project Organization
  • Workflow
  • Categorization
  • Detected Features
  • Instrument logs
  • …
  • Replica Locations
  • Endorsements
  • Community Annotations
  • …
• How do we provide EN capabilities in this larger context?

Semantic Repositories
• Use self-describing metadata/relationships
  • Triple-stores
  • RDF
  • OWL
• Aggregate information generated by multiple applications
• Allow browsing, searching, and reasoning across integrated information
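
To make the triple-store idea concrete, here is a minimal in-memory sketch in Python. It is not SAM code and not an RDF library: the `TripleStore` class, the `ex:` identifiers, and the predicate names are all hypothetical, chosen only to show how statements from several applications can sit in one store and be queried by pattern.

```python
from collections import namedtuple

# A statement is (subject, predicate, object); all names are illustrative.
Triple = namedtuple("Triple", "subject predicate obj")


class TripleStore:
    """Tiny in-memory triple store used only to illustrate the idea."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
# Statements contributed by different (hypothetical) applications.
store.add("ex:run42", "ex:producedBy", "ex:simulationCode")
store.add("ex:run42", "ex:hasUnits", "meters")
store.add("ex:note7", "ex:annotates", "ex:run42")

# Everything known about one resource, regardless of who recorded it.
about_run42 = store.match(subject="ex:run42")
# Everything that annotates that resource.
annotations = store.match(predicate="ex:annotates", obj="ex:run42")
```

Because no schema is fixed in advance, a new application can add a new predicate without any change to the store.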

Scientific Annotation Middleware (SAM)
5 yr DOE funded research project
• Develop middleware to create semantic repositories
• Enable the sharing of this information among portals and problem solving environments, software agents, scientific applications, and electronic notebooks
  • With different levels of sophistication
  • Without global schema
• Improve the completeness, accuracy, and availability of the scientific record
http://www.scidac.org/SAM/

SAM Architecture
[Architecture diagram. Service layers: Notebook Services, Semantic Services, Metadata Services. Protocol labels: "DAV, DASL, JMS, SAM Extensions" and "DAV, JDBC, GridFTP". Stores: Database, Web, DataGrid.]

Web Distributed Authoring and Versioning (WebDAV)
• An early web service
• Put/Get data with arbitrary properties (dynamic)
• Properties can be discovered and accessed independently
• DASL, Versioning, Transactions, …
• Widely supported (MS Office, databases, file system drivers, …)
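
As an illustration of "data with arbitrary properties", the sketch below builds the XML bodies for the two standard WebDAV property methods: PROPPATCH (set a property) and PROPFIND (read it back). The `DAV:` namespace and the element names `propertyupdate`, `set`, `prop`, and `propfind` come from the WebDAV specification; the property namespace `http://example.org/props/` and the `units` property are assumptions made for the example. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
EX_NS = "http://example.org/props/"  # hypothetical property namespace

ET.register_namespace("D", DAV_NS)
ET.register_namespace("ex", EX_NS)


def proppatch_body(properties):
    """Build a PROPPATCH body that sets each name/value pair as a property."""
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    for name, value in properties.items():
        ET.SubElement(prop_el, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def propfind_body(names):
    """Build a PROPFIND body that asks for the named properties only."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop_el = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in names:
        ET.SubElement(prop_el, f"{{{EX_NS}}}{name}")
    return ET.tostring(root, encoding="unicode")


set_units = proppatch_body({"units": "meters"})
ask_units = propfind_body(["units"])
```

Each body would be sent as the payload of an HTTP request using the PROPPATCH or PROPFIND method against the resource URL, so the properties travel separately from the file content itself.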

Binary Format Description (BFD) Language
• XML Language to describe ASCII, Binary, and XML data formats
• Generic Parser to extract and semantically tag data in files/streams
• The meaning of data can be captured, regardless of format, for future use
• Data Format Description Language Standard
[Diagram: File Format 1 and BFD Description 1 feed the BFD Parser, which emits XML Format 1; an XSLT Processor with an XSL Stylesheet (reformat) turns XML Format 1 into XML Format 2.]
Example description:
  <XSIL>
    <Param Name="units">meters</Param>
    <Param Name="numColumns" Type="int"/>
    <Vector Name="orbitData">
      <Dim><XBFDvalue-of select="/XSIL/Param[@Name='numColumns']"/></Dim>
      <Dim>4</Dim>
    </Vector>
    <Stream Type="remote" XBFDStreamnumber="0" Encoding="binary"/>
  </XSIL>
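
The idea of a generic parser driven by a format description can be sketched in a few lines of Python. This is not BFD syntax and not the BFD parser: the `DESCRIPTION` dictionary, its keys, and the byte layout (one little-endian integer header followed by rows of 64-bit floats) are assumptions invented for the illustration. The point is only that the parsing code stays generic while the description carries the names that give the bytes meaning.

```python
import struct

# Hypothetical description of a binary layout: one header field, then a
# table whose column count is read from that header field.
DESCRIPTION = {
    "header": [("numColumns", "<i")],  # little-endian 32-bit integer
    "vector": {
        "name": "orbitData",
        "rows": 2,
        "cols_from": "numColumns",
        "item": "<d",                  # little-endian 64-bit float
    },
}


def parse(data, description):
    """Extract named values from raw bytes according to the description."""
    offset = 0
    tagged = {}
    for name, fmt in description["header"]:
        (tagged[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)

    vec = description["vector"]
    cols = tagged[vec["cols_from"]]
    item_size = struct.calcsize(vec["item"])
    rows = []
    for _ in range(vec["rows"]):
        row = []
        for _ in range(cols):
            (value,) = struct.unpack_from(vec["item"], data, offset)
            row.append(value)
            offset += item_size
        rows.append(row)
    tagged[vec["name"]] = rows
    return tagged


# Sample bytes: numColumns = 3, followed by 2 rows of 3 floats.
raw = struct.pack("<i", 3) + struct.pack("<6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
parsed = parse(raw, DESCRIPTION)
```

Supporting a different file layout would mean writing a new description rather than new parsing code.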

SAM Metadata Services Layer
• Jakarta Slide DAV server plus configurable:
  • Mapping to Data Store(s)
  • Property Generation from binary/ASCII/XML files
  • Dynamic Virtual Translations
  • Server generated Properties and Relationships
    • Timestamp, size, CopyOf
[Diagram labels: Fortran Application, …, ELN; DAV+; RDF Export; Prop1, Prop2; Translated Content; BFD; Web Service; XSLT; hastranslation; ‘Local Disk’; Content; DAV]

SAM Semantic Services Layer
• SAM Metadata Layer plus configurable:
  • Relation-scoped Queries
  • Translation of DAV Properties to RDF Triples
  • RDF/GXL Pedigree Generation
  • …
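
One way to picture the translation of DAV properties to RDF triples is the sketch below, which turns a resource URL and a dictionary of property values into N-Triples lines. It is an illustration under stated assumptions, not SAM's actual mapping: the function name, the example URLs, the property namespace, and the rule "a value that looks like a URL becomes a resource, anything else becomes a literal" are all hypothetical.

```python
def properties_to_ntriples(resource_url, properties, namespace):
    """Render each property of a resource as one N-Triples statement."""
    lines = []
    for name, value in sorted(properties.items()):
        predicate = f"<{namespace}{name}>"
        if value.startswith(("http://", "https://")):
            # Assumed rule: a URL value names another resource (a relationship).
            obj = f"<{value}>"
        else:
            # Otherwise emit a plain literal with minimal escaping.
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{resource_url}> {predicate} {obj} .")
    return lines


triples = properties_to_ntriples(
    "http://example.org/dav/run42.dat",
    {"units": "meters", "CopyOf": "http://example.org/dav/run41.dat"},
    "http://example.org/props/",
)
```

With the properties exported this way, relationships such as CopyOf can be followed by any RDF-aware tool, which is what makes pedigree generation possible on top of a DAV store.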

Back to ENs…
• What is needed to be able to provide
  • Unstructured human entry of information?
  • Chronological, page-oriented display?
  • A master/personal project record?
  • Records functionality?

Creating Notes
• A ‘standard’ ELN client can create notes
  • Stored as content with a hasNote relationship with pages, notes
• Plus… any app can store notes the same way
• Page generation - works as before

ENs as a Primary View?
• Instruments, PSEs, etc. may organize parts of the experiment that an EN should not duplicate
• Define other relationships as part of the EN chapter/page/note hierarchy:
[Diagram: a notebook hierarchy (Notebook1; Chapter1, Chapter2; Page1, Page2) alongside a project hierarchy (Project; Experiment1, Experiment2; Data1, Data2) that is defined by the PSE and interpreted by the EN]
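
A rough sketch of how a notebook client might interpret relationships defined by another application as part of its own hierarchy follows. The `hasNote` relationship is taken from the slides; `hasChapter`, `hasPage`, `hasExperiment`, `hasData`, and `derivedFrom` are hypothetical names, as is the `outline` function. The client simply treats a configurable set of relationship names as containment and ignores the rest.

```python
# Relationship names the notebook view treats as "contains".
# Only hasNote appears in the slides; the others are assumed for illustration.
CONTAINMENT = {"hasChapter", "hasPage", "hasNote", "hasExperiment", "hasData"}


def outline(relations, root, containment=CONTAINMENT):
    """Return an indented outline of everything reachable from root."""
    children = {}
    for parent, relation, child in relations:
        if relation in containment:
            children.setdefault(parent, []).append(child)

    lines = []
    seen = set()

    def walk(node, depth):
        if node in seen:  # guard against relationship cycles
            return
        seen.add(node)
        lines.append("  " * depth + node)
        for child in children.get(node, []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


relations = [
    ("Notebook1", "hasChapter", "Chapter1"),
    ("Chapter1", "hasPage", "Page1"),
    # Structure recorded by a (hypothetical) problem solving environment.
    ("Project", "hasExperiment", "Experiment1"),
    ("Experiment1", "hasData", "Data1"),
    ("Data1", "hasNote", "Note1"),
    ("Data1", "derivedFrom", "Data0"),  # not containment, so not displayed
]

project_view = outline(relations, "Project")
notebook_view = outline(relations, "Notebook1")
```

The same display code serves both trees, so the notebook does not need its own copy of the structure that the PSE already maintains.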

Records?
• Digital Signatures, Timestamps, etc. are services that can be exposed as repository services and associated metadata
• But
  • What do we sign (content/metadata)?
  • Where is the edge of the record?
  • How deep do we travel through the web of relations?
  • How do we stop other applications from changing/deleting signed content?

Multiple Options
• Simple:
  • Sign content plus defined subset of metadata
  • Stop at edge of server
  • Treat relationship cycles as links
  • Lock content and metadata subset when signed
• Advanced:
  • Multiple self-describing signatures (e.g. XMLSignature)
  • Allow records across servers via trust, cached metadata/data
  • Define fine-grained retention schedules
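
The "simple" option, signing the content plus a defined subset of metadata, can be sketched as follows. Two loud assumptions: the signed field names (`author`, `created`, `title`) are invented for the example, and an HMAC with a shared key stands in for a real digital signature, which would use a public/private key pair and, for non-repudiation, a trusted timestamp. The sketch only shows the canonicalize-then-sign pattern and why metadata outside the subset can still change.

```python
import hashlib
import hmac
import json

SIGNED_FIELDS = ("author", "created", "title")  # hypothetical metadata subset


def canonical_record(content, metadata, fields=SIGNED_FIELDS):
    """Serialize content hash plus the signed metadata subset deterministically."""
    subset = {k: metadata[k] for k in fields if k in metadata}
    envelope = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "metadata": subset,
    }
    # Sorted keys and fixed separators give the same bytes for the same record.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(content, metadata, key):
    """HMAC stand-in for a digital signature over the canonical record."""
    return hmac.new(key, canonical_record(content, metadata), hashlib.sha256).hexdigest()


def verify(content, metadata, key, signature):
    return hmac.compare_digest(sign(content, metadata, key), signature)


key = b"demo-key"
note = b"Observed color change at 14:05."
meta = {
    "author": "a.researcher",
    "created": "2003-05-01T14:05:00Z",
    "title": "Run 42",
    "viewCount": "3",  # outside the signed subset
}
signature = sign(note, meta, key)

unsigned_change = dict(meta, viewCount="4")
signed_change = dict(meta, title="Edited")
```

Here a change to `viewCount` leaves the signature valid, while a change to `title` or to the note content invalidates it, which is the behavior the locking rule on the slide is meant to protect.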

SAM Notebook Services Layer
• SAM Metadata and Semantic Layers plus:
  • Notebook Management, Page Display, …
  • Digital Signatures
  • Canonicalization
  • Notarized Timestamps
  • Data/Signature Migration Capabilities
  • Notebook API, Notebook Components
• Supports ELN 5.1, Annotation Applet, new portal-based EN client
[Figure label: EN Portlets]

Collaborations
• Collaboratory for Multiscale Chemical Science (CMCS)
  • SAM as primary data system, pedigree, notebook
• NEESgrid/CHEF Portal/NMI Grid User Computing Environments
  • ELN, SAM as a metadata/pedigree store?
• Genomes-To-Life
  • SAM as annotation/metadata repository, notebook
• Internal PNNL Projects
  • Concept Map Repository, Interface to Lustre, Biological Data Annotation
• DOE2000 Notebook Community (1500+ email addresses)
  • Upgrades to DOE2K Notebooks
  • E.g. Columbia University Environmental Science Lab Notebooks

A Scientific Content Repository Vision
• Notebooks become just one view of the scientific information
• Applications contribute data, metadata, and relationships directly
• Records functionality provided by middleware, available to multiple applications
• Content is stored in multiple repositories managed independently
• The scientific record becomes richer and re-integrated

Acknowledgments
• Carina Lansing, PNNL
• Al Geist, Jens Schwidder, David Jung, ORNL
• U.S. Department of Energy
  • Mathematical, Information and Computational Sciences Division of the Office of Science
• Pacific Northwest National Laboratory
  • Pacific Northwest National Laboratory is a multiprogram national laboratory operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract DE-AC06-76RL0 1830
• Oak Ridge National Laboratory
  • Oak Ridge National Laboratory is a multiprogram national laboratory operated by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725