The Metadata Perspective

Peter Kunszt, CERN
GGF10 PNPA Workshop, Berlin

Overview
• Metadata: what is it? An overview.
• Lessons learned
• Requirements
• Suggestions

Definition of Metadata
Metadata is data too! Roughly speaking, it is descriptive data. Metadata can:
• Describe the data itself: what the data is about, parameters, characteristics, statistics, etc.
• Describe methods: algorithms, input/output parameters, etc.
• Describe middleware: service data, parameters, configuration, versions, owners, etc.
• Describe authentication and authorization: user lists, passwords, access control lists, tokens, etc.
• Describe modeling: UML diagrams, database schemata, etc.
• Describe history: provenance data, i.e. who generated which data using which method, etc.
• Describe virtualization: virtual data generation parameters, pipelining
• Describe operation: logging and monitoring, etc.

Aspects of Metadata
Metadata is bound to a context:
• Its semantics are specific to the context.
• Its usage patterns are specific to the semantics.
• Its requirements are specific to the context and the semantics.
What is the context?
• Application data, e.g. metadata on HEP events
• Middleware, e.g. service descriptions (storage, computing, ...)
• Virtual Organization or resource provider, e.g. security policies
• Logging and monitoring, e.g. LDAP, MDS, R-GMA, ...

Where is it?
• Explicitly in the context (e.g. the job description language, input/output files). This is not the topic here; this talk is about metadata (MD) catalogs.
• Dedicated catalogs in application space, for example:
  • CMS RefDB
  • ATLAS Metadata Interface (AMI)
  • BaBar Metadata Catalog
• Dedicated catalogs in middleware space:
  • MCAT (virtual data catalog)
  • EDG Replica Metadata Catalog
  • VO management service
• Grid interfaces providing access to existing metadata catalogs:
  • OGSA-DAI
  • Spitfire

How has it been used to date? Lessons learned
• Dedicated services like CMS RefDB work well.
• Generic, one-size-fits-all metadata catalogs are not used as much (e.g. the RMC).
• Frameworks are hard to adopt and to use (e.g. Spitfire).
• A lack of dedicated catalogs may lead to the abuse of monitoring and information services.
• The boundary between the application and middleware layers is blurred.
Conclusions
• The narrower the context, the better.
• Everyone doing their own metadata is good, BUT
• everyone defining a proprietary interface is bad.
• User-controllable metadata is good.

Requirements – ideas
• Metadata catalogs must have a clear context.
• Differentiate between the grid middleware layer and the application layer.
• Commonalities to standardize on:
  • Common security mechanisms
  • Common exposure of interfaces (WSDL)
  • Common mechanisms for describing the data content (e.g. common methods to expose the schema)
  • Common query mechanisms
  • Common error reporting (SOAP Faults)
• Catalogs should be able to call each other.
• Users should be able to store their own metadata (e.g. the big success of SDSS SkyServer MyDB).
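The requirements above can be made concrete with a small sketch. It assumes a hypothetical common interface (`MetadataCatalog` with `schema()` to expose the content description and `query()` as the common query mechanism) and a shared `CatalogError` type standing in for SOAP Faults. In the talk these would be Web Services described in WSDL; here plain Python classes play that role. `ChainedCatalog` illustrates catalogs calling each other by splitting a query by attribute and intersecting the results, which assumes both catalogs use the same entry identifiers.

```python
from abc import ABC, abstractmethod


class CatalogError(Exception):
    """Hypothetical common error type (a stand-in for SOAP Faults)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


class MetadataCatalog(ABC):
    """Hypothetical common interface that every metadata catalog would expose."""

    @abstractmethod
    def schema(self) -> dict[str, type]:
        """Describe the content: attribute names and their types."""

    @abstractmethod
    def query(self, conditions: dict[str, object]) -> list[str]:
        """Return sorted IDs of entries whose attributes equal all conditions."""


class InMemoryCatalog(MetadataCatalog):
    """Toy catalog holding entry ID -> {attribute: value} in memory."""

    def __init__(self, attributes: dict[str, type], entries: dict[str, dict]):
        self._attributes = attributes
        self._entries = entries

    def schema(self) -> dict[str, type]:
        return dict(self._attributes)

    def query(self, conditions: dict[str, object]) -> list[str]:
        unknown = set(conditions) - set(self._attributes)
        if unknown:
            # Report errors through the common error type, not a catalog-specific one.
            raise CatalogError("UnknownAttribute", ", ".join(sorted(unknown)))
        return sorted(
            entry_id
            for entry_id, attrs in self._entries.items()
            if all(attrs.get(k) == v for k, v in conditions.items())
        )


class ChainedCatalog(MetadataCatalog):
    """Answers what it can locally and forwards the remaining conditions to another catalog."""

    def __init__(self, local: MetadataCatalog, remote: MetadataCatalog):
        self.local = local
        self.remote = remote

    def schema(self) -> dict[str, type]:
        return {**self.remote.schema(), **self.local.schema()}

    def query(self, conditions: dict[str, object]) -> list[str]:
        local_attrs = self.local.schema()
        local_part = {k: v for k, v in conditions.items() if k in local_attrs}
        remote_part = {k: v for k, v in conditions.items() if k not in local_attrs}
        results = None
        if local_part or not remote_part:
            results = set(self.local.query(local_part))
        if remote_part:
            remote_ids = set(self.remote.query(remote_part))  # may raise CatalogError
            results = remote_ids if results is None else results & remote_ids
        return sorted(results)


# Usage: event run numbers live in one catalog, data-quality flags in another.
runs = InMemoryCatalog({"run": int}, {"evt1": {"run": 7}, "evt2": {"run": 8}})
quality = InMemoryCatalog({"quality": str},
                          {"evt1": {"quality": "good"}, "evt2": {"quality": "good"}})
chain = ChainedCatalog(runs, quality)
print(chain.query({"run": 7, "quality": "good"}))  # ['evt1']
```

Splitting by attribute and intersecting is only one possible delegation strategy; the essential point is that both catalogs expose the same schema, query and error conventions, so one can call the other without catalog-specific glue.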

A Metadata Scenario
• Virtual files / virtual collection concept (from HEPCAL)
[Diagram components: Metadata Catalog, Virtual MD Query Result, File Catalog, FileList; annotation: "Query Interface needs standardization".]
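One way to read the scenario diagram, in the spirit of the HEPCAL virtual-file concept, is sketched below as an assumption rather than as the talk's design: a metadata query returns only logical file names (a virtual collection), and a separate file catalog resolves them into a concrete file list. All class and function names, the `lfn:` prefix and the `srm://` URLs are hypothetical; the `query()` step is the interface the slide says needs standardization.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical application catalog: logical file name -> metadata attributes."""
    entries: dict[str, dict[str, object]] = field(default_factory=dict)

    def query(self, **conditions: object) -> list[str]:
        # The result is a virtual collection: logical names only, no physical locations.
        return sorted(
            lfn for lfn, attrs in self.entries.items()
            if all(attrs.get(k) == v for k, v in conditions.items())
        )


@dataclass
class FileCatalog:
    """Hypothetical file/replica catalog: logical file name -> physical replicas."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, lfns: list[str]) -> dict[str, list[str]]:
        # Logical names with no registered replica map to an empty list.
        return {lfn: list(self.replicas.get(lfn, [])) for lfn in lfns}


def file_list(md: MetadataCatalog, fc: FileCatalog, **conditions: object) -> dict[str, list[str]]:
    """Metadata query -> virtual result -> concrete file list."""
    virtual_result = md.query(**conditions)
    return fc.resolve(virtual_result)


md = MetadataCatalog({"lfn:evt1": {"run": 7}, "lfn:evt2": {"run": 8}})
fc = FileCatalog({"lfn:evt1": ["srm://se1.example.org/data/evt1"]})
print(file_list(md, fc, run=7))  # {'lfn:evt1': ['srm://se1.example.org/data/evt1']}
```

Keeping the two catalogs separate means the metadata catalog never needs to know where files physically live, which is what makes the query result "virtual".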

Suggestions on how to proceed
• Accept Web Service interfaces as the common base interface framework.
• Define interfaces inside application- and middleware-specific domains, based on existing services and the specific needs of the given community.
• Identify missing or required interfaces from clients and users.
• Compare the interfaces at a common forum (like this one).
• Define how to proceed: factor out or standardize commonalities, with interoperability as the aim. Propagate findings to GGF groups wherever relevant, and spawn new working groups with a very specific focus!
• This is an iterative process.

Conclusion
• Metadata is closely tied to its context and the semantics thereof.
• Generic metadata services vs. specialized services:
  • A generic service to store key-value pairs might be useful for users to store their own data (exploit DAIS).
  • Try to use common mechanisms for security, discovery, query and error reporting.
• Suggestion: work on specialized services that solve a well-understood problem of a user community, and identify commonalities as a second step (bottom-up approach).
• Maintain good communication between metadata service providers; GGF can be the forum for this.
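A generic key-value service for user-owned metadata, as suggested above, could look roughly like the following minimal in-memory sketch. `UserMetadataStore` and its methods are hypothetical; writes always go to the caller's own namespace, and a simple owner-controlled share list stands in for the common security mechanisms the talk asks for. A real service would sit on a proper data access layer (for instance one exposed through DAIS-style interfaces) rather than on a Python dictionary.

```python
from collections import defaultdict


class UserMetadataStore:
    """Hypothetical generic key-value store with one namespace per user."""

    def __init__(self):
        # owner -> item name -> {key: value}
        self._data = defaultdict(lambda: defaultdict(dict))
        # owner -> set of users allowed to read the owner's metadata
        self._readers = defaultdict(set)

    def put(self, caller: str, item: str, key: str, value: str) -> None:
        # Writes always land in the caller's own namespace.
        self._data[caller][item][key] = value

    def share(self, caller: str, reader: str) -> None:
        # Only the owner (the caller) can grant read access to their namespace.
        self._readers[caller].add(reader)

    def _check_read(self, caller: str, owner: str) -> None:
        if caller != owner and caller not in self._readers[owner]:
            raise PermissionError(f"{caller} may not read metadata owned by {owner}")

    def get(self, caller: str, owner: str, item: str) -> dict[str, str]:
        self._check_read(caller, owner)
        return dict(self._data[owner].get(item, {}))

    def find(self, caller: str, owner: str, key: str, value: str) -> list[str]:
        """Return the owner's items whose `key` equals `value`."""
        self._check_read(caller, owner)
        return sorted(item for item, kv in self._data[owner].items() if kv.get(key) == value)


store = UserMetadataStore()
store.put("alice", "run7.root", "calibration", "v2")
store.put("alice", "run8.root", "calibration", "v1")
store.share("alice", "bob")
print(store.find("bob", "alice", "calibration", "v1"))  # ['run8.root']
```

The store imposes no schema at all, which is what makes it generic; the trade-off, as the lessons-learned slide notes, is that such one-size-fits-all catalogs carry no context-specific semantics.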