FRBR Applied to Scientific Data
FRBR Applied to Scientific Data Joseph A. Hourclé 2008-Sept-22 ASIS&T PVC
Functional Requirements for Bibliographic Records (FRBR) • Reference Model for the design of bibliographic catalog systems. • Defines four different concepts of ‘book’ that might be cataloged. • Work • Expression • Manifestation • Item
FRBR Group 1 Entities • Work • A distinct intellectual or artistic creation • Expression • The intellectual or artistic realization of a work in the form of alpha-numeric, … sound, image, object, movement, etc … • Manifestation • The physical embodiment of an expression of a work • Item • A single exemplar of a manifestation
What questions can we ask of each level? • Work • Who wrote it? What is the subject? • Expression • What language is it in? • Manifestation • What size is the font or book? • Item • Is the individual copy available to me?
Why ask these questions? • Work • Who wrote it? What is the subject? • Determine interest / Applicability • Expression • What language is it in? • Usability / Accessibility (of content) • Manifestation • What size is the font or book? • Usability / Accessibility (of content within carrier) • Item • Is the individual copy available to me? • Availability / Accessibility (of the carrier)
Two Extra Entities • Sensor • Converts information about its environment to a digital signal • Observation • Data created by the sensor • Necessary to unambiguously track if two works are different interpretations of the same data
In this model … • Item • Is a logical item that might be identified via a URL. • Two items of the same manifestation would be bytewise identical copies • Manifestation • A logical embodiment, to include aspects of the carrier • How each datum is organized within the package • File format and encoding • Typically contains multiple expressions • Two manifestations of the same expression contain identical values within each datum
In this model … • Work • Calibrated state of the data • Translation of the sensor output to remove sensor issues or to physical units • Two works of the same observation would be interpretations of the same raw sensor data • Also includes catalogs and metadata • But through other expressions, not directly derived from the observation • Expression • The numeric values encoded in the file • Two expressions of the same work would have been generated from the same calibration of the observation
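
The chain of entities described in these slides (sensor, observation, work, expression, manifestation, item) can be sketched as a set of linked records. This is only an illustrative sketch, not the author's implementation: every class, field name, and sample value below is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sensor:
    name: str  # hypothetical instrument label


@dataclass
class Observation:
    sensor: Sensor
    raw: List[int]  # raw sensor output, before any calibration


@dataclass
class Work:
    observation: Observation
    calibration: str  # hypothetical label for the calibrated state


@dataclass
class Expression:
    work: Work
    values: List[float]  # the numeric values that get encoded in a file


@dataclass
class Manifestation:
    expressions: List[Expression]  # a package typically holds several
    file_format: str


@dataclass
class Item:
    manifestation: Manifestation
    url: str  # where this particular copy can be retrieved


def same_observation(a: Work, b: Work) -> bool:
    """True when two works are interpretations of the same observation."""
    return a.observation is b.observation


# Hypothetical sample data.
sensor = Sensor("example-imager")
obs = Observation(sensor, [10, 12, 11])
dark_subtracted = Work(obs, "dark-subtracted")
physical_units = Work(obs, "physical-units")
later_obs = Work(Observation(sensor, [10, 12, 11]), "dark-subtracted")

print(same_observation(dark_subtracted, physical_units))  # same raw data
print(same_observation(dark_subtracted, later_obs))       # separate observation
```

The comparison uses object identity on purpose: two separate observations that happen to record the same raw values are still different observations.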
Limitations • Scientific Discipline • Each discipline has different requirements for attributes describing their data • Digital Objects • Does not deal with digitization from analog sources or generation of physical items • Non-Human Workflow • May need to model software and other aspects of the data workflow
Limitations • Data Collection vs. Data Granule • Do we model each successive data object, or the full set of aggregated objects? • Similar to tracking journals vs. articles • Individual Objects vs. Dynamic Packaging • Scientific archives are moving to packaging on distribution, rather than storing the data in files • Data Archives Without Attached Metadata • Metadata is tracked as a supplementary work that may be contained in the same manifestation to prepare for this eventuality
Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma
http://virtualsolar.org/ joseph.a.hourcle@nasa.gov
Different Observations • 171Å • 195Å • 284Å • 304Å
Different Expressions • Downsampled data • 2x2 binned • 5-min averages • 8bit vs. 16bit pixels • Lossy compression • JPEG / JPEG2000 • Datum extrapolation to fit a different coordinate system • Any form of data loss • Any form of data ‘creation’ to fill in missing data
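
To see why 2x2 binning produces a different expression, the sketch below bins a small image and compares the result with the original values. It assumes binning by averaging (some pipelines sum instead), and the function name and pixel values are made up for illustration.

```python
def bin_2x2(image):
    """Average each non-overlapping 2x2 block of a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4x4 image.
original = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [4, 4, 8, 8],
]
binned = bin_2x2(original)
print(binned)  # [[2.0, 6.0], [3.0, 8.0]]
```

The sixteen original values cannot be recovered from the four binned ones, so the numeric content has changed and the result is a new expression of the same work.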
Different Manifestations • Changes in Carrier / Packaging: • Different metadata attached • Different file formats • FITS vs. CDF vs. HDF • Different aggregation • individual images vs. an hourly collection
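
The idea that two manifestations of one expression carry identical values in different packaging can be illustrated with two encodings of the same numbers. JSON text and packed binary are used here only as stand-ins available in the Python standard library; they are not FITS, CDF, or HDF, and the sample values are invented.

```python
import json
import struct

# Hypothetical numeric values of one expression.
values = [1.5, 2.25, -3.0]

# Two different carriers for the same values.
as_json = json.dumps(values).encode("utf-8")
as_binary = struct.pack("<%dd" % len(values), *values)  # little-endian doubles

# Decode each carrier back to its values.
decoded_json = json.loads(as_json.decode("utf-8"))
decoded_binary = list(struct.unpack("<%dd" % len(values), as_binary))

print(as_json != as_binary)            # the bytes differ
print(decoded_json == decoded_binary)  # the values do not
```

The files differ byte for byte, yet every datum decodes to the same value, which is the test for "same expression, different manifestation" used in this model.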
Different Items • Bytewise identical • Stored in different locations
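
One way to check that two items belong to the same manifestation is to compare a cryptographic digest of their bytes. This is a minimal sketch under that assumption; the helper names, example locations, and file contents are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_manifestation(copy_a: bytes, copy_b: bytes) -> bool:
    """Treat two items as copies of one manifestation if their digests match."""
    return digest(copy_a) == digest(copy_b)


# Hypothetical items: location -> file contents.
items = {
    "http://archive-a.example.org/image.dat": b"\x00\x01\x02\x03",
    "http://archive-b.example.org/image.dat": b"\x00\x01\x02\x03",
    "http://archive-c.example.org/image.dat": b"\x00\x01\x02\x04",
}
a, b, c = items.values()
print(same_manifestation(a, b))  # identical bytes, different locations
print(same_manifestation(a, c))  # one byte differs
```

Comparing digests rather than full files lets an archive publish a short checksum that a mirror can verify without transferring the data twice.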