This overview discusses the critical role of metadata in high energy physics (HEP), highlighting its various aspects, including descriptive data, methods, and contexts specific to applications and middleware. Key lessons learned indicate that specialized metadata services outperform generic catalogs, emphasizing the need for well-defined contexts and common standards. Suggestions for improvement focus on standardizing interfaces, enhancing interoperability, and ensuring user control over metadata. Collaboration in groups like GGF can further advance these initiatives for better metadata practices in scientific computing.
The Metadata Perspective Peter Kunszt CERN GGF10 PNPA Workshop, Berlin
Overview • Metadata – what is it? An overview. • Lessons learned • Requirements • Suggestions
Definition of Metadata Metadata is data too! ~ Descriptive data • Describe the data itself: what is the data about, parameters, characteristics, statistics, .. • Describe methods: algorithms, input/output parameters, .. • Describe middleware: service data, parameters, configuration, versions, owner, .. • Describe authentication and authorization: user lists, passwords, access control lists, tokens, .. • Describe modeling: UML diagrams, database schemata, .. • Describe history: provenance data, who has generated what data using what method, .. • Describe virtualization: virtual data generation parameters, pipelining • Describe operation: logging and monitoring, ..
Aspects of Metadata Bound to a context. • Semantics specific to the context. • Usage patterns specific to the semantics • Requirements specific to the context and semantics What is the context? • Application data – e.g. metadata on HEP events • Middleware specific – e.g. service description (storage, computing…) • Virtual Organization, resource provider – e.g. security policies • Logging and monitoring – e.g. LDAP, MDS, R-GMA, ..
Where is it? • Explicitly in the context (like the job description language, input/output files, etc.). Not the topic here; I talk about MD catalogs. • Dedicated catalog in application space. Examples: • CMS RefDB • ATLAS Metadata Interface (AMI) • BaBar Metadata Catalog • Dedicated catalog in middleware space • MCAT (virtual data catalog) • EDG Replica Metadata Catalog • VO management service • Metadata Grid Interface provisioning to existing catalogs • OGSA-DAI • Spitfire
How was it used to date? Lessons learned • Dedicated services like CMS RefDB work well. • Generic one-size-fits-all metadata catalogs are not used as much (RMC). • Frameworks are hard to adopt and to use (Spitfire). • Lack of dedicated catalogs may lead to the abuse of monitoring and information services. • The boundary between the application and middleware layers is blurred. Conclusions • The narrower the context, the better • Everyone doing their own metadata is good, BUT • Everyone defining a proprietary interface is bad • User-controllable metadata is good
Requirements – ideas • Metadata catalogs must have a clear context • Differentiation between the grid middleware and application layer • Commonalities to be standardized on: • Common security mechanisms • Common exposure of interfaces (WSDL) • Common mechanism of describing the data content (like common methods to expose the schema) • Common query mechanisms • Common error reporting (SOAP Faults) • Catalogs should be able to call each other • Users should be able to store their own metadata (e.g. big success of SDSS SkyServer MyDB)
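The commonalities listed above (schema exposure, query, error reporting) can be sketched as one shared interface that each context-specific catalog implements. This is only an illustration under stated assumptions: the names `MetadataCatalog`, `CatalogError`, `describe_schema` and `query` are hypothetical and do not come from any existing grid service, and a plain Python exception stands in for a SOAP Fault.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CatalogError(Exception):
    """Hypothetical common error type (stands in for a SOAP Fault)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


class MetadataCatalog(ABC):
    """Hypothetical common interface; each community supplies its own catalog."""

    context: str  # the narrow context this catalog serves

    @abstractmethod
    def describe_schema(self) -> dict[str, str]:
        """Return attribute name -> type name so clients can discover the content."""

    @abstractmethod
    def query(self, **conditions) -> list[dict]:
        """Return entries whose attributes equal all given conditions."""


class InMemoryCatalog(MetadataCatalog):
    """Toy implementation backed by a list of dicts, for illustration only."""

    def __init__(self, context, schema, entries):
        self.context = context
        self._schema = dict(schema)
        self._entries = list(entries)

    def describe_schema(self):
        return dict(self._schema)

    def query(self, **conditions):
        # Report unknown attributes through the common error type.
        unknown = set(conditions) - set(self._schema)
        if unknown:
            raise CatalogError("UNKNOWN_ATTRIBUTE", ", ".join(sorted(unknown)))
        return [
            entry for entry in self._entries
            if all(entry.get(k) == v for k, v in conditions.items())
        ]
```

A client written against `MetadataCatalog` would not need to know which community's catalog it is talking to, which is the interoperability goal stated above.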
A Metadata Scenario • Virtual files / virtual collection concept (from HEPCAL) • Query interface needs standardization [Diagram labels: Metadata Catalog, Virtual MD Query, Result FileList, File Catalog]
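One possible reading of this scenario is a two-step lookup: a metadata query returns a file list, and the file catalog then resolves each file to its physical locations. The sketch below assumes that reading; the dictionary layouts, the function names `query_metadata` and `resolve_files`, and the sample file names and URLs are all hypothetical.

```python
def query_metadata(metadata_catalog, **conditions):
    """Return the sorted logical file names whose metadata match all conditions."""
    return sorted(
        lfn for lfn, attrs in metadata_catalog.items()
        if all(attrs.get(key) == value for key, value in conditions.items())
    )


def resolve_files(file_catalog, file_list):
    """Map each logical file name to its known replicas (empty list if none)."""
    return {lfn: list(file_catalog.get(lfn, [])) for lfn in file_list}


# Hypothetical catalogs: metadata keyed by logical file name, replicas by the same key.
metadata_catalog = {
    "lfn:run1.dat": {"detector": "tracker", "year": 2003},
    "lfn:run2.dat": {"detector": "calorimeter", "year": 2003},
    "lfn:run3.dat": {"detector": "tracker", "year": 2004},
}
file_catalog = {
    "lfn:run1.dat": ["srm://site-a.example/run1.dat"],
    "lfn:run3.dat": ["srm://site-a.example/run3.dat", "srm://site-b.example/run3.dat"],
}

file_list = query_metadata(metadata_catalog, detector="tracker")
replicas = resolve_files(file_catalog, file_list)
```

Only the first function touches metadata semantics, so it is the part where a standardized query interface would matter; the second step is a plain name lookup.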
Suggestions on how to proceed • Accept Web Service interfaces as the common base interface framework • Define interfaces inside application- and middleware-specific domains based on existing services and the specific needs of the given community. • Identify missing interfaces or required interfaces from clients and users • Compare the interfaces at a common forum (like this) • Define how to proceed: Factor out commonalities or standardize commonalities. The aim is to be interoperable. Propagate findings to groups in GGF wherever relevant, spawn new working group with a very specific focus! • Iterative process..
Conclusion • Metadata is closely tied to context and the semantics thereof. • Generic metadata services vs. specialized services: • Generic service to store key-value pairs might be useful to users to store their own data (exploit DAIS) • Try to use common mechanisms for security, discovery, query and error reporting. • Suggestion to work on specialized services, solving a well-understood problem of a user community. Identify commonalities as a second step (bottom up approach) • Maintain a good communication between metadata service providers – GGF can be the forum for this.
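The generic key-value service for user-owned metadata mentioned above could look roughly like the following. It is a minimal in-memory sketch, not a description of DAIS or of any existing service; the class `UserMetadataStore` and its methods are hypothetical, and no authentication is modeled.

```python
class UserMetadataStore:
    """Toy per-user key-value metadata store (illustrative only)."""

    def __init__(self):
        # (owner, item) -> {key: value}; each user sees only their own entries.
        self._data = {}

    def set(self, owner, item, key, value):
        """Attach or overwrite one key-value pair on an item owned by `owner`."""
        self._data.setdefault((owner, item), {})[key] = value

    def get(self, owner, item):
        """Return a copy of all key-value pairs `owner` stored for `item`."""
        return dict(self._data.get((owner, item), {}))

    def find(self, owner, key, value):
        """Return the sorted items of `owner` whose `key` equals `value`."""
        return sorted(
            item for (o, item), attrs in self._data.items()
            if o == owner and attrs.get(key) == value
        )
```

Because the store assigns no meaning to keys or values, it stays generic; anything with richer semantics would belong in a specialized catalog, as the conclusion suggests.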