CASPAR All Hands #1 July 2006 Prague
Aims of the meeting • To think • To make sure we produce something good • If we aim only at perfection then we will fail • To try to enable us to do something great • To develop a common view of what needs to be done • To develop more detailed plans • To ensure the CASPAR team works together • Use OAIS terminology • Avoid using the word “Metadata” • To anticipate problems and find solutions
EU guidelines: What makes a well-managed project Keeping track • Your project should have been extensively planned at the proposal stage. However, it is in the nature of research that things will soon start to go off the planned track. Active monitoring is essential, and early decisions to take corrective action or amend plans must be agreed so that control of the project is not lost. There are various methodologies available to help, as well as commercial software and training courses. The 'Kick-off' meeting is a good time to establish positive working practices among all partners and set the tone for all future conduct. This important phase of the project should itself be well prepared and managed so that good practices are firmly established by consent. • Project management is not only about doing the work and getting paid, but also protecting, publishing, and utilising the knowledge generated. Management of intellectual property and exploitation of results, both the anticipated direct results and any unexpected spin-offs, is fundamental to achieving a highly rated project. A project Web site with both public and private areas can be a useful tool both for project management and for stimulating dissemination and use of your knowledge. Remedial actions • Project contractors must take responsibility for keeping each other in line. It is not the Commission's job to police your project. If one or more partners are not meeting their obligations the project must have the mechanisms to warn them, impose sanctions on them, and eventually to reject them from the partnership.
Deliverables • Month 3 (end June) • Web site • Quality Plan • Quarterly report • Month 6 (end Sept) • Key Performance Indicators • Balanced Scorecard tool
Issues of Digital Preservation • Fundamental issues • OAIS Reference Model • Things outside/beyond OAIS
BITS are the important things • In the future there will be sequences of bits, plus some other things. • We need to ensure that those sequences of bits are understandable/useable • It will avoid confusion if we distinguish between those sequences of bits which are mostly: • Rendered • e.g. images, recorded sound/video • Not rendered – processed • e.g. science data • There is overlap
Information is the important thing Information: Any type of knowledge that can be exchanged. In an exchange, it is represented by data. An example is a string of bits (the data) accompanied by a description of how to interpret a string of bits as numbers representing temperature observations measured in degrees Celsius (the representation information). • What information? • Documents…… • Data……. • Original bits? • Look and feel? • Behaviour? • Performance?
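The temperature example in the definition above can be sketched in Python. This is only an illustration under stated assumptions: the byte layout (three big-endian 32-bit floats) and the names `data_object`, `REPRESENTATION_INFO` and `interpret` are invented for the example and do not come from OAIS.

```python
import struct

# The data: a string of bits with no meaning of its own.
# (Hypothetical payload: three big-endian 32-bit floats.)
data_object = struct.pack(">3f", 21.5, 22.0, 19.25)

# The representation information: how to interpret those bits.
# (Hypothetical structure, chosen only for this sketch.)
REPRESENTATION_INFO = {
    "struct_format": ">3f",
    "quantity": "temperature",
    "unit": "degrees Celsius",
}

def interpret(bits: bytes, rep_info: dict) -> list:
    """Apply the representation information to the bits to obtain information."""
    values = struct.unpack(rep_info["struct_format"], bits)
    return [
        {"quantity": rep_info["quantity"], "value": v, "unit": rep_info["unit"]}
        for v in values
    ]

observations = interpret(data_object, REPRESENTATION_INFO)

# The same bits read with the wrong interpretation (as 32-bit integers)
# give numbers that look plausible but mean nothing.
misread = struct.unpack(">3i", data_object)

for obs in observations:
    print(obs["value"], obs["unit"])
print("misread as integers:", misread)
```

The point of the sketch is that the bits alone do not say whether they are floats, integers or text; only the accompanying description turns them into temperature observations.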
Can one just guess? <VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"> <RESOURCE> <TABLE name="6dfgs_E7_subset" nrows="875"> <PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz"> <DESCRIPTION>URL of data file used to create this table.</DESCRIPTION> </PARAM> <PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/> <FIELD arraysize="15" datatype="char" name="TARGET"> <DESCRIPTION>Target name</DESCRIPTION> </FIELD> <FIELD arraysize="11" datatype="char" name="DEC" unit="DMS"> <DATA> <FITS> <STREAM encoding='base64'> U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg <family> <father>John</father> <mother>Mary</mother> <son>Paul</son> </family>
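As a small illustration of why guessing is not enough, the first line of the base64 stream on this slide can be decoded with Python's standard library. The sketch assumes the reader already knows the stream is base64 (the `encoding` attribute says so); what the decoded bytes then mean is a further layer that needs its own representation information.

```python
import base64

# First line of the <STREAM encoding='base64'> content, copied from the slide.
encoded = "U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm"

# Knowing the encoding removes one layer: the bytes turn out to be ASCII text.
decoded = base64.b64decode(encoded).decode("ascii")
print(repr(decoded))
```

The decoded text begins with `SIMPLE  =` and contains the words "Standard FITS", which hints at a FITS header. Interpreting the rest of the stream still requires knowledge of the FITS format itself.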
Can we just read/stare? • representation information rules • sfqsftfoubujpo jogpsnbujpo svmft • You have a file • JHOVE tells you it is WORD version 7
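The scrambled phrase on this slide is the readable phrase with every letter moved one place forward in the alphabet. A minimal sketch of undoing it follows; the function name `shift_letters` is made up for the example, and the shift of one is exactly the kind of knowledge that staring at the text does not supply.

```python
def shift_letters(text: str, offset: int) -> str:
    """Shift each lowercase letter by `offset` places, wrapping around at 'z'."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        else:
            # Leave spaces and any other characters untouched.
            result.append(ch)
    return "".join(result)

scrambled = "sfqsftfoubujpo jogpsnbujpo svmft"
recovered = shift_letters(scrambled, -1)
print(recovered)  # representation information rules
```

Here the rule "shift back by one" plays the role of representation information: with it the text is trivially readable, without it the reader has to guess.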
Can we specify: • What to preserve? • How to preserve it? • …… • No!
What is actually needed? • Can we simply ask people who run archives?
Unending supplies of …………….. MONEY Which is not realistic!!
So….. • Need to cover many issues including financial, scientific, technical, legal and sociological ones. • How to publish • How to preserve the bits • How to preserve information • Terminology • Conceptual foundations • … • While still being practical…….and with a hope of being testable Question: how can you tell if someone has done a really good job in preserving your digital information? Answer: live a very long time
What can we rely on in the Long Term? • The bits - let us for the moment put to one side the issue of BIT PRESERVATION (but it is an issue) • Paper documents that people can read • ISO standards • Additional information we collect – either now or in the far future by our successors • Some kind of remote access • Some kind of computers • People?
[Diagram; labels: Auxiliary data, Space/Time Separation, Application data, Producer packaging, DATA, DATA + DOCUMENTATION, DOCUMENTATION, Local Notes & Conventions, Lost Documentation, Information leakage]
OAIS • Enter Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13 • No framework widely recognized for developing specific digital archive standards • Begin by developing a ‘Reference Model’ to establish common terms and concepts • Ensure broad participation, including traditional archives (Not restricted to space communities; all participation is welcome!) • Focus on data in electronic forms, but recognize that other forms exist in most archives • Follow up with additional archive standards efforts as appropriate
What is a Reference Model? • A framework • for understanding significant relationships among the entities of some environment, and • for the development of consistent standards or specifications supporting that environment. • A reference model • is based on a small number of unifying concepts • is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment • may be used as a basis for education and explaining standards to a non-specialist.
Organizational Approach • An “Open” process • Important to stimulate dialogue with broad archive/user communities • Results of US and International workshops put on WEB • Support e-mail comments/critiques • Broad international workshops held • Many workshops in USA sponsored by NASA plus workshops in UK and France • Contributions from many disciplines – Libraries, Commerce, Space Agencies, Users….. • Issue resolution at ISO/Consultative Committee for Space Data Systems international workshops
Technical Approach • Investigate other Reference Models. • ISO “Seven Layer” Communications Reference Model • ISO Reference Model for Open Distributed Processing • ISO TC211 Reference Model for Geomatics • Define what is meant by ‘archiving of data’ • Break ‘archiving’ into a few functional areas (e.g., ingest, storage, access, and preservation planning) • Define a set of interfaces between the functional areas • Define a set of data classes for use in Archiving • Choose formal specification techniques • Data flow diagrams for functional models and interfaces • Unified Modeling Language (UML) for data classes
Results • Reference Model targeted to several categories of reader • Archive designers • Archive users • Archive managers, to clarify digital preservation issues and assist in securing appropriate resources • Standards developers • Adopted terminology that crosses various disciplines • Traditional archivists • Scientific data centers • Digital libraries
Reference Model Status • Already widely adopted as starting point in digital preservation efforts • Digital libraries (e.g., Netherlands National Library) • Traditional archives (e.g., US National Archives) • Scientific data centers (e.g., National Space Science Data Center) • Commercial Organizations (e.g., Aerospace Industries Association preservation working team) • Approved for publication as final CCSDS and ISO (14721:2002) standards • CCSDS version is available at: • http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf • ISO version is CCSDS doc plus cover sheet
Reference Model for an Open Archival Information System Technical Overview
Open Archival Information System (OAIS) • Open • Reference Model standard(s) are developed using a public process and are freely available • Information • Any type of knowledge that can be exchanged • Independent of the forms (i.e., physical or digital) used to represent the information • Data are the representation forms of information • Archival Information System • Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
Document Organization • Introduction • Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms • OAIS Concepts and Responsibilities • High level view of OAIS functionality and information models • OAIS external environment • Minimum responsibilities to become an “OAIS” • Detailed Models • Functional model descriptions and information model perspectives • Preservation perspectives • Media migration, compression, format conversions, and access service preservation • Archive Interoperability • Criteria to distinguish types of cooperation among archives • Annexes • Scenarios of existing archives, compatibility with other standards
Purpose, Scope, and Applicability • Framework for understanding and applying concepts needed for long-term digital information preservation • Long-term is long enough to be concerned about changing technologies • Starting point for model addressing non-digital information • Provides set of minimal responsibilities to distinguish an OAIS from other uses of ‘archive’ • Framework for comparing architectures and operations of existing and future archives • Basis for development of additional related standards • Addresses a full range of archival functions • Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation • Does NOT specify any implementation
Model View of an OAIS Environment • Producer is the role played by those persons, or client systems, who provide the information to be preserved • Management is the role played by those who set overall OAIS policy as one component in a broader policy domain • Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest [Diagram: OAIS (archive) with Producer, Consumer and Management]
OAIS Responsibilities The OAIS must: • Negotiate for and accept appropriate information from information Producers. • Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation. • Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided. • Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information. • Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original. • Make the preserved information available to the Designated Community.
Designated Community • Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
Designated Community examples • general English reading public educated to High School and above, with access to a Web Browser (HTML 4.0 capable) • GIS data: GIS researchers - undergraduates and above, having an understanding of the concepts of Geographic data; having access to current (2005, USA) GIS tools/computer software e.g. ArcInfo? (2005) • Astronomer (undergraduate and above) with access to FITS software such as FITSIO, familiar with astronomical spectrographic instruments • Student of Middle English with an understanding of TEI encoding and access to an XML rendering environment. • Variant 1: Cannot understand TEI • Variant 2: Cannot understand TEI and no access to XML rendering environment • Variant 3: No understanding of Middle English but does understand TEI and XML • A DC consisting of two groups: the publishers of scholarly journals and their readers, each of whom have different rights to access material and different services offered to them.
Outline of Talk • Problems which OAIS addresses • Reference Model overview • Issues & Problems • Some Applications • Follow-on Activities
Misunderstandings about the Designated Community • It is a homogeneous community • NO • Who defines it? • The archive
Misunderstandings… • “The OAIS Designated Community concept assumes a identifiable and relatively homogenous user community…” • WRONG • Just saying “we preserve for everyone” is not believable/testable
OAIS Information Definition • Information is always expressed (i.e., represented) by some type of data • Data interpreted using its Representation Information yields Information • Information Object preservation requires clear identification and understanding of the Data Object and its associated Representation Information [Diagram: Data Object, interpreted using its Representation Information, yields Information Object]
Information Package Definition • An Information Package is a conceptual container holding two types of information • Content Information • Preservation Description Information (PDI) [Diagram: Information Package containing Content Information and Preservation Description Information]
Information Package Variants • Submission Information Package • Negotiated between Producer and OAIS • Sent to OAIS by a Producer • Archival Information Package • Information Package used for preservation • Includes complete set of Preservation Description Information for the Content Information • Dissemination Information Package • Includes part or all of one or more Archival Information Packages • Sent to a Consumer by the OAIS
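The three package variants can be sketched as simple Python data classes. OAIS deliberately does not specify any implementation, so everything here is an assumption made for illustration: the class and function names, the use of a dictionary for PDI, and the PDI keys.

```python
from dataclasses import dataclass, field

@dataclass
class InformationPackage:
    """Conceptual container: Content Information plus its PDI."""
    content_information: bytes
    pdi: dict = field(default_factory=dict)  # Preservation Description Information

@dataclass
class SubmissionInformationPackage(InformationPackage):
    """Negotiated with, and sent by, a Producer."""

@dataclass
class ArchivalInformationPackage(InformationPackage):
    """Used for preservation; carries the complete PDI."""

@dataclass
class DisseminationInformationPackage:
    """Part or all of one or more AIPs, sent to a Consumer."""
    parts: list

def ingest(sip: SubmissionInformationPackage, archive_pdi: dict) -> ArchivalInformationPackage:
    # The archive completes the PDI supplied by the Producer (hypothetical step).
    return ArchivalInformationPackage(sip.content_information, {**sip.pdi, **archive_pdi})

def disseminate(aips: list) -> DisseminationInformationPackage:
    # Here the DIP simply bundles whole AIPs; a real archive might send only parts.
    return DisseminationInformationPackage(parts=list(aips))

# Illustrative usage with made-up content and PDI keys.
sip = SubmissionInformationPackage(b"observations", {"provenance": "Producer X"})
aip = ingest(sip, {"fixity": "checksum-goes-here"})
dip = disseminate([aip])
print(aip.pdi)
```

The sketch only mirrors the direction of travel described on the slide: a Producer submits a SIP, the archive turns it into an AIP with complete PDI, and a Consumer receives a DIP built from one or more AIPs.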
External Data Flow View [Diagram: a Producer sends Submission Information Packages to the OAIS, which holds Archival Information Packages; a Consumer sends queries and orders to the OAIS and receives result sets and Dissemination Information Packages]
Detailed Models Overview
Overview of Detailed Models • It was decided to do both a functional and an information model of the OAIS • Both models were aimed at: • Using the models to better communicate OAIS Concepts • Using a well established, formal modeling technique • Staying as implementation independent as possible • Avoiding detailed designs
Detailed Models Information Model