
APARSEN Metadata for preservation, curation and interoperability

APARSEN: Metadata for preservation, curation and interoperability. Workshop on Research Metadata in Context, 7-8 Sept 2010, Nijmegen. David Giaretta, APA and STFC. Digital preservation: ensure that digitally encoded information is understandable and usable over the long term.


Presentation Transcript


  1. APARSEN: Metadata for preservation, curation and interoperability • Workshop on Research Metadata in Context, 7-8 Sept 2010, Nijmegen • David Giaretta, APA and STFC

  2. Digital Preservation • Ensure that digitally encoded information is understandable and usable over the long term • Long term could start at just a few years • Easy to make claims • Difficult to provide proof • Reference Model for an Open Archival Information System (OAIS, ISO 14721) • The basic standard for work in digital preservation • Defines terminology and compliance criteria

  3. Definitions (OAIS) • Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term. • Long Term: A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future. • Slide callouts: not just BIT preservation; not just rendering; information, not just DATA or documents; authenticity

  4. Basic concept • Digital preservation had been dominated by libraries and (state) archives • However, the focus there was on “rendered objects”, with a tendency to think data is an “easy” add-on • HOWEVER: • Need to deal with DATA, which is processed into new things, not just rendered • Need to follow OAIS, which gives a finer-grained view • Need to test and prove that things work • On “metadata”: “CASPAR banned the use of the term metadata unless absolutely necessary”

  5. Data… Level 2 GOME Satellite instrument data

  6. Contains numbers – need meaning

  7. ...to process to this

  8. ...or this

  9. ...through complex processing schemes

  10. Just format? • “sfqsftfoubujpo jogpsnbujpo svmft” • You have a file • JHOVE tells you it is WORD version 7

  11. ..with some extra information.. • “representation information rules” • Format registries are useful but not enough: a format can be used for multiple purposes, e.g. audio files used to store configuration parameters
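
One way to read slides 10 and 11 together: the unreadable string on slide 10 appears to be the phrase on slide 11 with every letter shifted forward by one place in the alphabet. The short Python sketch below illustrates that reading; the function name `shift_letters` is ours and not from the talk. It shows that the characters alone are not enough, while one small piece of representation information (here, the shift) makes them understandable.

```python
def shift_letters(text: str, shift: int) -> str:
    """Shift each lowercase ASCII letter by `shift` places, wrapping around."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # leave spaces and other characters untouched
    return "".join(out)


encoded = "sfqsftfoubujpo jogpsnbujpo svmft"  # string shown on slide 10
decoded = shift_letters(encoded, -1)          # the "extra information": shift back by one
print(decoded)
```

A format identification tool would report both strings as plain text; only the decoding rule distinguishes noise from a message.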

  12. Examples (cont) • “504b0304140000000800f696….” • “This is a ZIP file which contains Word files, each of which contains an encoded message which needs the key ‘!D$G^AJU*KI’ to decode it using encryption method SHA7”
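
The hex string on this slide begins with `50 4b 03 04`, which is the signature of a ZIP local file header. As a rough illustration (not the tooling used in the talk), the sketch below reads only the bytes that are visible on the slide, following the published ZIP header layout; the function name `describe_prefix` is hypothetical, and the elided remainder of the string is left alone.

```python
import struct

ZIP_LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"  # bytes 50 4b 03 04


def describe_prefix(hex_prefix: str) -> dict:
    """Identify a ZIP local file header from the first bytes of a file."""
    raw = bytes.fromhex(hex_prefix)
    if len(raw) < 10 or raw[:4] != ZIP_LOCAL_HEADER_SIGNATURE:
        return {"format": "unknown"}
    # Three little-endian 16-bit fields follow the signature.
    version_needed, flags, method = struct.unpack("<HHH", raw[4:10])
    return {
        "format": "zip",
        "version_needed": version_needed,  # 20 means version 2.0
        "flags": flags,
        "compression_method": method,      # 8 means DEFLATE
    }


info = describe_prefix("504b0304140000000800f696")  # bytes visible on the slide
print(info)
```

This recovers the container format and nothing more. That the archive holds Word files, and that each needs a key to decode, is exactly the information the second bullet has to supply separately.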

  13. Examples (cont) • LaTeX file containing an EPS (Encapsulated PostScript) version of an image • Web page containing a Java applet generating random numbers • SWISS-PROT data • Foreign-language emails

  14. XML enough? – can stare at this and probably understand it <family> <father>John</father> <mother>Mary</mother> <son>Paul</son> </family>

  15. ..but what about this? <VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"> <RESOURCE> <TABLE name="6dfgs_E7_subset" nrows="875"> <PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz"> <DESCRIPTION>URL of data file used to create this table.</DESCRIPTION> </PARAM> <PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/> <FIELD arraysize="15" datatype="char" name="TARGET"> <DESCRIPTION>Target name</DESCRIPTION> </FIELD> <FIELD arraysize="11" datatype="char" name="DEC" unit="DMS"> <DATA> <FITS> <STREAM encoding='base64'> U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg
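
The VOTable above makes the layering concrete: inside the XML sits a base64 stream, and inside that sits another format. A minimal Python sketch, decoding only the first line of the stream as it appears on the slide, shows the start of a FITS header emerging:

```python
import base64

# First line of the <STREAM encoding='base64'> content, copied from the slide.
first_line = "U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm"

decoded = base64.b64decode(first_line)
text = decoded.decode("ascii")
print(repr(text))  # begins with the FITS keyword SIMPLE
```

Understanding the XML syntax is therefore only the first step: a reader also needs the VOTable schema, the base64 rule, and the FITS standard before the numbers can be extracted.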

  16. Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

  17. Figure 8 Some aspects of acousmatic production

  18. Diagram: classification axes: Simple vs. Complex; Static vs. Dynamic; Rendered vs. Non-Rendered

  19. Information Model & Representation Information • Diagram: an Information Object is a Data Object interpreted using Representation Information (1+) • Representation Information is itself interpreted using further Representation Information • A Data Object is a Physical Object or a Digital Object • A Digital Object consists of 1+ Bit Sequences • The Information Model is key • Recursion ends at the KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
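
The recursion on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CASPAR code: the class `RepInfo`, the function `missing_rep_info`, and the particular dependency chain (names borrowed from later slides, edges guessed) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    name: str
    needs: list = field(default_factory=list)  # further RepInfo needed to interpret this one


def missing_rep_info(rep, knowledge_base, seen=None):
    """Collect the names of RepInfo outside the community's knowledge base.

    The recursion stops at anything the Designated Community already understands.
    """
    if seen is None:
        seen = set()
    if rep.name in knowledge_base or rep.name in seen:
        return []
    seen.add(rep.name)
    missing = [rep.name]
    for dep in rep.needs:
        missing.extend(missing_rep_info(dep, knowledge_base, seen))
    return missing


# Assumed chain: dictionary -> its specification -> XML -> Unicode.
unicode_spec = RepInfo("Unicode specification")
xml_spec = RepInfo("XML specification", [unicode_spec])
dict_spec = RepInfo("Dictionary specification", [xml_spec])
fits_dict = RepInfo("FITS dictionary", [dict_spec])

# A community that already knows XML only needs the first two items.
print(missing_rep_info(fits_dict, {"XML specification"}))
```

Changing the knowledge base changes the answer, which is the point of the slide's remark that this knowledge varies over time and region.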

  20. Representation Information Network

  21. Modules and Dependencies: defining the Designated Community • Diagram nodes: FITS FILE; MULTIMEDIA PERFORMANCE DATA; FITS DICTIONARY; FITS STANDARD; DICTIONARY SPECIFICATION; C3D; DirectX; MAX/MSP; FITS JAVA s/w; PDF STANDARD; 3D scene data files; 3D motion data files; motion to music mapping strategy; XML SPECIFICATION; PDF s/w; JAVA VM; UNICODE SPECIFICATION; README.txt; ENGLISH LANGUAGE; TEXT EDITOR; WINDOWS XP

  22. Diagram nodes: FITS FILE; DDL DESCRIPTION; FITS STANDARD; FITS DICTIONARY; DDL SOFTWARE; FITS JAVA SOFTWARE; DICTIONARY SPECIFICATION; DDL DEFINITION; PDF STANDARD; JAVA VM; PDF SOFTWARE; XML SPECIFICATION; UNICODE SPECIFICATION

  23. Each “this” refers to a node in the slide's diagram • In principle we could use this, plus the Dictionaries, to understand the keywords and so extract the numbers • If we can run this, then we can use this in a generic application to extract the numbers • If we cannot run the Java Virtual Machine, then we use this source code to re-write the software in another programming language such as C • If we can run this, then we can run the Java software to extract the numbers • If we cannot run this, then we can use an emulator or use its RepInfo to re-create a Java VM • If we cannot run the DDL software, then we can look at the DDL definition and write some software to extract the numbers
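
The chain of fallbacks on this slide can be read as an ordered list of strategies, each usable only if certain things are still available. The Python sketch below is one possible reading. The transcript has lost which diagram node each "this" pointed at, so the mapping of strategies to requirements is an assumption, as are the names `STRATEGIES` and `choose_strategy`.

```python
from typing import Optional

# Ordered from least to most effort. Each entry pairs a strategy with what
# must still be runnable or readable. The requirement sets are assumptions.
STRATEGIES = [
    ("run the FITS Java software", {"Java VM", "FITS Java software"}),
    ("run the generic DDL software on the DDL description", {"DDL software", "DDL description"}),
    ("write new software from the DDL definition", {"DDL definition", "DDL description"}),
    ("write new software from the FITS standard and dictionaries", {"FITS standard", "FITS dictionary"}),
]


def choose_strategy(available: set) -> Optional[str]:
    """Return the first strategy whose requirements are all available."""
    for strategy, required in STRATEGIES:
        if required <= available:  # subset test
            return strategy
    return None


today = {"Java VM", "FITS Java software", "FITS standard", "FITS dictionary"}
later = today - {"Java VM"}  # the Java VM can no longer be run
print(choose_strategy(today))
print(choose_strategy(later))
```

Keeping several independent routes in the network means the loss of one component degrades convenience rather than destroying access.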

  24. Rep • Info • Virtualisation /DISCIPLINE

  25. Virtualisation

  26. 2-D array: Height; Width; Bits per Pixel • 2-D image: Height; Width; Bits per Pixel; Co-ordinate system; Time • 2-D astronomical image: Height; Width; Bits per Pixel; Astronomical co-ordinate system; Time – EPOCH; Bandpass

  27. General Table: Number of columns; Names of columns; Number of rows; Value in cell at any row, column • Time series: Number of columns; Names of columns; Number of rows; Value in cell at any row, column; Time corresponding to any row • Science data table: Number of columns; Names of columns; Number of rows; Value in cell at any row, column; Type of column value; Column “metadata”; Table “metadata”
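
As an illustration of the table virtualisation on slide 27, the Python sketch below models the General Table operations and a Time series that adds one more. The class and method names are ours, and treating one column as the time stamp is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class GeneralTable:
    column_names: List[str]
    rows: List[List[Any]]

    def number_of_columns(self) -> int:
        return len(self.column_names)

    def number_of_rows(self) -> int:
        return len(self.rows)

    def value(self, row: int, column: int) -> Any:
        return self.rows[row][column]


@dataclass
class TimeSeries(GeneralTable):
    time_column: int = 0  # assumption: one column holds the time stamp

    def time_for_row(self, row: int) -> Any:
        return self.value(row, self.time_column)


def column_values(table: GeneralTable, column: int) -> List[Any]:
    """Generic code that relies only on the General Table operations."""
    return [table.value(r, column) for r in range(table.number_of_rows())]


series = TimeSeries(["time", "flux"], [[0.0, 1.5], [1.0, 1.7]])
print(column_values(series, 1))
print(series.time_for_row(1))
```

Software written against the general view keeps working on every more specialised table, which is what makes the virtualised description worth preserving.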

  28. Operations: Get the Root; Get the number of children for a node; Get child number “i” • Diagram: a Root node with descendant nodes Node 1 to Node 9
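
The three operations named on slide 28 are enough to traverse any hierarchy. A minimal Python sketch follows; the class names are hypothetical, and the shape of the example tree is invented because the transcript does not preserve the diagram's edges.

```python
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])


class TreeView:
    """Exposes only the three operations named on the slide."""

    def __init__(self, root):
        self._root = root

    def get_root(self):
        return self._root

    def number_of_children(self, node):
        return len(node.children)

    def get_child(self, node, i):
        return node.children[i]


def walk(view, node=None):
    """Depth-first listing of labels using only the three operations."""
    if node is None:
        node = view.get_root()
    labels = [node.label]
    for i in range(view.number_of_children(node)):
        labels.extend(walk(view, view.get_child(node, i)))
    return labels


# Invented shape: the root has two children, the first of which has two of its own.
tree = Node("Root node", [Node("Node 1", [Node("Node 3"), Node("Node 4")]), Node("Node 2")])
print(walk(TreeView(tree)))
```

Because `walk` never touches `Node` directly, the same traversal would work over an XML document, a directory, or any other structure that can answer the three questions.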

  29. Image: Earth Observation Image; Artistic Image; Cultural Heritage Image; Astronomical Image (Optical Astronomical Image; X-ray Astronomical Image)

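  30. Archival Information Package • described by a Package Description, which is derived from the package contents • delimited by Packaging Information, which identifies the Content Information and the Preservation Description Information • Content Information is further described by Preservation Description Information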

  31. Preservation Description Information: Access Rights Information; Reference Information; Provenance Information; Context Information; Fixity Information
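
To make the package structure on slides 30 and 31 concrete, here is a small Python sketch of an AIP carrying the five kinds of Preservation Description Information. It is a simplification under stated assumptions: the class and field names are ours, all values are placeholders, and SHA-256 stands in for Fixity Information only as an example, since OAIS does not prescribe a particular mechanism.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    reference: str       # how the content is identified
    provenance: str      # where it came from and what happened to it
    context: str         # how it relates to other information
    fixity_sha256: str   # assumption: a SHA-256 digest serves as fixity
    access_rights: str   # who may access or use it


@dataclass
class ArchivalInformationPackage:
    content: bytes  # stands in for the data object of the Content Information
    pdi: PreservationDescriptionInformation

    def fixity_ok(self) -> bool:
        """Check that the content still matches the recorded digest."""
        return hashlib.sha256(self.content).hexdigest() == self.pdi.fixity_sha256


content = b"placeholder bytes for a data file"
pdi = PreservationDescriptionInformation(
    reference="hypothetical-id-001",
    provenance="placeholder provenance note",
    context="placeholder context note",
    fixity_sha256=hashlib.sha256(content).hexdigest(),
    access_rights="placeholder access statement",
)
aip = ArchivalInformationPackage(content, pdi)
damaged = ArchivalInformationPackage(content + b"!", pdi)
print(aip.fixity_ok(), damaged.fixity_ok())
```

A matching digest shows only that the bits are unchanged; whether they remain understandable depends on the Representation Information, which this sketch leaves out.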

  32. has Representation Information Provenance has

  33. Cost sharing DRM • USE DATA • Use application to find data in Repository • Create DIP with enough RepInfo for the user (via DC profile) • Obtain more RepInfo from Registry if necessary Preservable infrastructure

  34. APARSEN • Integration 1000: 1100 Common Vision; 1200 Staff and experience exchange; 1300 Common standards; 1400 Common testing environments; 1500 Internal W/S & symposia; 1600 Common tools, software repository and market place • Technical 2000: 2100 Preservation Services; 2200 Identifiers & citability; 2300 Storage solutions; 2400 Authenticity & Provenance; 2500 Interoperability & intelligibility; 2600 Annotation, Reputation & data quality; 2700 Scalability • Economic/Legal 3000: 3100 Digital Rights & access management; 3200 Cost/benefit data collection and modelling; 3300 Peer Review & 3rd party Certification; 3400 Brokerage services; 3500 Data policies and governance; 3600 Business cases • Spreading excellence 4000: 4100 External W/S & symposia; 4200 Formal qualifications; 4300 Training courses; 4400 Awareness raising; 4500 Liaison with other stakeholders; 4600 International liaison • Management 5000: 5100 Financial management; 5200 Technical co-ord.; 5300 Evaluate impact of the Network of Excellence • JPA labels on the slide: JPA Integration; JPA Research; JPA Spreading excellence

  35. JPA Research • Technical 2000: 2100 Preservation Services; 2200 Identifiers & citability; 2300 Storage solutions; 2400 Authenticity & Provenance; 2500 Interoperability & intelligibility; 2600 Annotation, Reputation & data quality; 2700 Scalability • Economic/Legal 3000: 3100 Digital Rights & access management; 3200 Cost/benefit data collection and modelling; 3300 Peer Review & 3rd party Certification; 3400 Brokerage services; 3500 Data policies and governance; 3600 Business cases

  36. Diagram labels (most appear twice on the slide): Persistent ID resolver; RepInfo Registry; Authenticity tools; Processing Context; Certification; Orchestration/Brokering; Knowledge Gap Manager • Storage; Compute Resource; Local Authentication; Local Authorisation • WAN; LAN; Router; Switch; Cable; Interconnects; Gateways; Management • Translators; Thesauri; Cross-references; Discipline repositories • Resource Registries; Process ID; Scheduler; Shibboleth • Repositories; Users; Automated systems • FUTURE • Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved • Non-maintainability of essential hardware, software or support environment may make the information inaccessible • The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity • Access and use restrictions may not be respected in the future • Loss of ability to identify the location of data • The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future • The ones we trust to look after the digital holdings may let us down

  37. Links • CASPAR: http://www.casparpreserves.eu • CASPAR source code: http://sourceforge.net/projects/digitalpreserve/ • OAIS Reference Model: http://public.ccsds.org/publications/archive/650x0b1.pdf • Updated OAIS draft: http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx • CASPAR validation report: http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-validation-evaluation-report/at_download/file • PARSE.Insight: www.parse-insight.eu • Alliance for Permanent Access: www.alliancepermanentaccess.eu • Digital Curation Centre: www.dcc.ac.uk

  38. END
