Explore requirements and data structures for the IMAS focusing on plasma physics, auxiliary systems, provenance, ontology, and data storage. Establish a unique data model, hierarchical tree structures, and modular design for efficient data management and access.
Integrated Modelling Analysis Suite: requirements towards Data Structures, Data Descriptions & Code/Component Interfaces F. Imbeaux IM Design Team, CEA IRFM 9 June 2011
Outline • Data categories • Requirements (Team’s recommendation on the requirements) • Provenance • Ontology • Data model • Data reference • Data storage and access
Strategic requirement • From Strategic Requirement IM1, the IMAS shall describe the ITER plasma and its interactions with structures and auxiliary systems • Consequently, the IMAS data model shall encompass plasma physics and the description / behavior of the auxiliary systems
Data Categories (1/3) • The IMAS data is categorized as follows
Data Categories (2/3) • The IMAS data is categorized as follows
Data Categories (3/3) • The IMAS data is categorized as follows
Requirement - Provenance • Owing to the variety and complexity of the workflows executed by the IMAS (IM1, Use Case categories), data description and provenance shall be recorded, together with information about its validity
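As a minimal sketch of what a recorded provenance entry could look like, the following assumes that each stored result carries a description, the name of the component that produced it, references to its inputs, and a validity flag. The class, field names, and record identifiers are hypothetical and are not part of any IMAS specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Provenance:
    """Hypothetical provenance record attached to one stored result."""
    description: str                                   # what the data represent
    produced_by: str                                   # component or workflow that wrote them
    inputs: List[str] = field(default_factory=list)    # ids of the input records
    validated: bool = False                            # validity information


def lineage(record_id: str, records: Dict[str, Provenance]) -> List[str]:
    """Return the ids of all upstream records, each listed once."""
    seen: List[str] = []
    stack = list(records[record_id].inputs)
    while stack:
        rid = stack.pop()
        if rid not in seen:
            seen.append(rid)
            stack.extend(records[rid].inputs)
    return seen


# Illustrative records only; names are invented for this example.
records = {
    "raw": Provenance("raw magnetics signals", "acquisition", validated=True),
    "eq": Provenance("equilibrium reconstruction", "equilibrium_code", inputs=["raw"]),
    "transport": Provenance("transport simulation", "transport_code", inputs=["eq", "raw"]),
}
print(sorted(lineage("transport", records)))
```

Following the `inputs` references backwards is what lets a user of a workflow result find out which data and which components it depends on.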
Requirement - Ontology • The IMAS will have several components solving a given physical problem • A range of models of varying complexity is needed • Benchmarking exercises • New models will be regularly installed, from IO and from ITER Parties • Need to avoid binary (pairwise) interfaces between components (N² problem) • A common ontology shall be developed in order to have standardized interfaces between IMAS physics components. Those standardized interfaces shall cover Physics Concepts data and Tokamak structure / Auxiliary system data • Relevant physics components from ITER Member Institutions shall be interfaced according to the IMAS Ontology in order to be used within the IMAS. The IMAS infrastructure design shall minimize the interfacing effort for new physics components.
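The scaling argument behind the N² problem can be made concrete with a short count. The sketch below assumes that, without a common ontology, every component needs a dedicated converter towards every other component, whereas with a common ontology each component needs only one reader and one writer. The function names are hypothetical.

```python
def pairwise_adapters(n: int) -> int:
    """Directed converters needed if every component is interfaced to every other one."""
    return n * (n - 1)


def ontology_adapters(n: int) -> int:
    """Converters needed if each component only reads from / writes to a shared ontology."""
    return 2 * n


for n in (3, 10, 30):
    print(n, pairwise_adapters(n), ontology_adapters(n))
```

The pairwise count grows quadratically with the number of components, while the ontology-based count grows linearly, which is why adding a new physics component stays cheap in the second case.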
Requirement – Unique data model within ITER • The Tokamak structure / Auxiliary systems Data Category is not restricted to IMAS; these data are in fact primarily used / defined by the group responsible for the structure / auxiliary system. • Team’s recommendation: There is a strong benefit in having a unique data model that applies both within and outside the IMAS. Defining this unique data model requires discussion and coordination between the various IO groups for harmonization. • The IMAS shall be able to directly access the data it needs in its original form, without having to translate / duplicate data in an IMAS-specific format • Any exception to the previous requirement (e.g. CAD data) which cannot be exploited directly by common programming languages (standard API/get) requires a definition in the unique data model and automated procedures to put these data
Requirement – Unique data model for experimental and simulated data • The data model to be used by the IMAS shall encompass both experimental and simulated data • The data structures shall not depend on whether they represent experimental and/or simulated data. • A cataloguing system shall allow referencing the various instances of experimental and simulated data, and the results of analyses performed on them, in the ITER DataBase in an unambiguous and user-friendly way
Requirement – Modularity of the data model • The structure of the data model shall be modular in order to reflect • the logic of elementary physics concepts and subsystems • the division of responsibilities among the various sub-systems composing ITER. • Helps maintenance and tracking provenance
Requirement – Hierarchical tree structures in the data model • From the team’s experience, it is critical to organize the data model as a hierarchical tree structure • Helps organize the huge number of signals contained in the data model, and eases the readability and thus maintainability of the data model • In view of coupling components in the IMAS, defining high-level hierarchical objects allows expressing the interfaces between components in a much more compact and user-friendly way than having to list explicitly a large number of individual signals. • Helps tracking provenance • The data model shall be organized as a hierarchical tree structure
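To illustrate the compactness argument, the sketch below represents one high-level object as a nested dictionary and lists the individual signals it contains. The node names and numerical values are illustrative assumptions, not taken from the IMAS data model.

```python
def leaf_paths(node, prefix=""):
    """Yield the slash-separated path of every leaf signal under a nested dict."""
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            yield from leaf_paths(child, path)   # descend into a sub-structure
        else:
            yield path                           # a leaf is an individual signal


# Hypothetical high-level object; names and values are illustrative only.
equilibrium = {
    "global": {"ip": 15.0e6, "b0": 5.3},
    "profiles_1d": {"psi": [0.0, 0.5, 1.0], "q": [1.0, 1.5, 3.0]},
}

paths = list(leaf_paths(equilibrium))
print(paths)
```

A component interface can then be declared as "takes an `equilibrium`" rather than enumerating every path in `paths`, and the list of signals can still be recovered from the tree when needed.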
Requirement – Tokamak generic data model • The Model Improvement and Model Validation Use Cases, in particular, require the capability to access and model experiments from facilities other than ITER. Therefore: • The physics and technology definitions used by the data model shall aim at maximal generality, i.e. avoid ITER-specific definitions as much as possible.
Requirement – Standards for coordinates and units • The part of the ITER data model which is used or produced by the IMAS shall respect the IO Integrated Modelling standards for coordinates and units (2F5MKL document)
Requirement – Data Types • The IMAS data model shall include the following data types: • Simple types: Integers, Strings, Reals, Complex numbers, and multi-dimensional arrays of these types • Complex types: structured objects composed of the above Simple Types, as well as one-dimensional arrays of such hierarchical objects
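The two families of types listed above can be sketched with Python dataclasses: leaves are integers, strings, reals, complex numbers or arrays of them, and a structured object is built from such leaves, possibly containing a one-dimensional array of sub-structures. All class and field names are hypothetical, and nested lists stand in for multi-dimensional arrays.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profiles1D:
    psi: List[float] = field(default_factory=list)   # 1D array of reals
    q: List[float] = field(default_factory=list)     # 1D array of reals


@dataclass
class TimeSlice:
    time: float = 0.0                                 # real
    profiles: Profiles1D = field(default_factory=Profiles1D)  # nested structure


@dataclass
class Equilibrium:
    comment: str = ""                                 # string
    n_iterations: int = 0                             # integer
    mode_amplitude: complex = 0j                      # complex number
    psi_rz: List[List[float]] = field(default_factory=list)        # 2D array of reals
    time_slices: List[TimeSlice] = field(default_factory=list)     # 1D array of structures


eq = Equilibrium(comment="illustrative values only")
eq.time_slices.append(
    TimeSlice(time=1.0, profiles=Profiles1D(psi=[0.0, 1.0], q=[1.0, 3.0]))
)
print(eq.time_slices[0].profiles.q)
```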
Requirement – Data Reference • Access to the data shall be provided with all of the following options: • By pulse number + time within pulse (+ run number) (+ pulsetype) • By absolute time (+ run number) • By pulse events (+ run number) (L-H transition, ITB, 15th ELM, ...) • By pulse segment (+ run number) • Typically, all (raw) experimental data will be grouped in a single run number for a given pulse. If some data need a corrected/new calibration, they will be stored with a new run number. • Similarly, different simulations of the same "pulse" store their results in different run numbers (and with a different pulsetype from experimental data). The default run number should point to the "best"/validated processed data for pulsetype=pulse and to the "best"/validated simulated data for pulsetype=simulation. • In addition, within a given IMAS workflow several instances of the same data structure may be involved, e.g. equilibria computed with different resolutions, or synthetic diagnostics calculated by different methods. These instances should belong to the same pulse, run number and pulsetype (corresponding to the results of the workflow), and thus another index is needed to reference them.
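The referencing scheme described above can be sketched as a composite key made of pulse number, run number, pulsetype and the additional index that distinguishes several instances of the same data structure. The class name, the field name `occurrence`, and the dictionary used as a store are assumptions made for illustration; referencing by absolute time, event or pulse segment is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical composite key for one instance of a data structure."""
    pulse: int
    run: int
    pulsetype: str = "pulse"      # "pulse" (experimental) or "simulation"
    occurrence: int = 0           # extra index for several instances in one workflow


store = {}  # stand-in for the real database

# Raw experimental data in one run, recalibrated data in a new run.
store[DataRef(1000, run=1)] = "raw experimental data"
store[DataRef(1000, run=2)] = "data with corrected calibration"

# Two equilibria from the same simulation workflow, told apart by occurrence.
store[DataRef(1000, run=1, pulsetype="simulation", occurrence=0)] = "equilibrium, coarse grid"
store[DataRef(1000, run=1, pulsetype="simulation", occurrence=1)] = "equilibrium, fine grid"

print(store[DataRef(1000, 1, "simulation", 1)])
```

Because the key is frozen (hashable), two references with identical fields address the same entry, while a change in any single field addresses a different one.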
Requirement – Data Storage and Access • It is understood that data storage and access are the responsibility of CODAC/IT. It is assumed that CODAC/IT will, where possible, use generic storage and access methods for the whole ITER data model. It is recommended that the IMAS use these generic methods, provided the IMAS-specific requirements are accommodated. Depending on the context/Use Case, the IMAS may use a generic API or require the implementation of Use Case-specific APIs. • The data storage and access methods shall be able to handle the IMAS Data Types and Hierarchical Structure
Requirement – Data Access Functionalities • For the access functionalities, the API shall • Be able to GET/PUT individual signals from/to the storage to/from all programming languages of the IMAS • Be able to GET/PUT high-level data objects defined by the IMAS Ontology from/to the storage to/from all programming languages of the IMAS • Provide in-memory caching mechanisms to improve access performance between components executed on the same node • Manage time: for time-dependent data (data varying during a pulse), the GET/PUT functions should work on the whole time base (pulse) of the signals or perform a variety of operations, such as GET/PUT at time slice level (with or without interpolation) and resampling on another time base (list to be refined later) • Provide a specific QRTGET to access quasi-real-time data for quasi-real-time analyses (Live Display and Level 2 Reconstruction Use Cases) • Provide specific methods to access data of extreme size, typically produced by HPC codes, which may reside on a dedicated storage system outside ITER computing facilities • Plasma Research and Plasma Operation Use Cases require access to ITER data from ITER Partner Institutions, therefore: • Use of the API to GET/PUT data remotely shall be transparent and symmetric
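The time-management item above (GET at time slice level with or without interpolation, and resampling on another time base) can be sketched for a single signal as follows. The function names, the choice of linear interpolation, the nearest-sample rule when interpolation is off, and the clamping outside the time base are all assumptions of this example, not behavior specified for the IMAS API.

```python
from bisect import bisect_left


def get_slice(times, values, t, interpolate=True):
    """Return the signal value at time t; times must be sorted and non-empty."""
    # Outside the time base: clamp to the first/last sample (an assumption).
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)          # first index with times[i] >= t
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    if interpolate:
        w = (t - t0) / (t1 - t0)       # linear interpolation weight
        return values[i - 1] + w * (values[i] - values[i - 1])
    # Without interpolation: nearest stored sample, ties go to the earlier one.
    return values[i - 1] if (t - t0) <= (t1 - t) else values[i]


def resample(times, values, new_times):
    """Resample the signal on another time base by linear interpolation."""
    return [get_slice(times, values, t) for t in new_times]


times = [0.0, 1.0, 2.0]
values = [0.0, 10.0, 20.0]
print(get_slice(times, values, 0.5))
print(get_slice(times, values, 0.25, interpolate=False))
print(resample(times, values, [0.5, 1.5]))
```

A GET on the whole time base would simply return `times` and `values` unchanged; the slice and resampling operations are thin layers on top of that.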