Event Model. Language Independent representation, common structures and approaches. Program of Work Meeting, June 28-19, 2000 P. Mato, CERN.
Event Model: What is it?
• For us [LHCb], the Event Model is the structure of the transient event data, and of the relationships between them, that is made available to the Algorithms.
• The behavior of the data objects is intentionally limited to the bare minimum (separation between data and algorithms).
• The transient data store is organized as a tree of data objects (file-system like):
  • Identification (name and type)
  • Navigability
• From now on, the focus is on the specific structure of the nodes.
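The file-system-like organization described above can be illustrated with a minimal sketch. All names here (`DataObject`, `Node`, `put`, `get`, the `/Event/MC/Track` path) are assumptions made for illustration and are not the actual LHCb interfaces. The sketch only shows identification by name and type, and navigation down the tree.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical base class for anything kept in the store: data only, no algorithms.
struct DataObject {
    virtual ~DataObject() = default;
};

// Two illustrative payload types (assumed, not the real event classes).
struct MCTrack : DataObject { double px = 0.0, py = 0.0, pz = 0.0; };
struct MCVertex : DataObject { double x = 0.0, y = 0.0, z = 0.0; };

// One node of the tree: an optional payload plus named children.
class Node {
public:
    // Register an object under a path such as "/Event/MC/Track",
    // creating intermediate nodes as needed.
    void put(const std::string& path, std::shared_ptr<DataObject> obj) {
        Node* n = this;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '/')) {
            if (part.empty()) continue;  // skip the leading '/'
            std::unique_ptr<Node>& child = n->children_[part];
            if (!child) child.reset(new Node);
            n = child.get();
        }
        n->object_ = std::move(obj);
    }

    // Identification by name AND type: returns nullptr if the path
    // does not exist or the stored object is not a T.
    template <typename T>
    std::shared_ptr<T> get(const std::string& path) const {
        const Node* n = this;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '/')) {
            if (part.empty()) continue;
            auto it = n->children_.find(part);
            if (it == n->children_.end()) return nullptr;
            n = it->second.get();
        }
        return std::dynamic_pointer_cast<T>(n->object_);
    }

private:
    std::shared_ptr<DataObject> object_;
    std::map<std::string, std::unique_ptr<Node>> children_;
};
```

An algorithm would then ask the store for, say, `root.get<MCTrack>("/Event/MC/Track")` and receive a null pointer if either the name or the type does not match.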
Event Model Definition: Past practices
• ALEPH: Data Definition Language, DDL files (ADAMO)
• LHCb SICb: DDF files
• Code generation:
  • Fortran statement functions
  • Documentation generation

    OBJECT
    #
    NAME: ATMC         ! MC tracks bank
    FANOUT: AXRE
    AUTHOR: A.Tsaregorodtsev
    VERSION: 0
    PARTITIONS: 10     ! each partition corresponds to a pile-up member
    NOBJECTS: 500      ! initial number of objects
    PARAMETERS: 14     ! number of parameters per object without ID and refs
    #
    # ( Type may be F - float, I-integer, B-bit pattern, H-Hollerith )
    # Name  Type    Min     Max   Accuracy
    #
    XV      F     -500.    500.   0.01    ! X vertex position
    YV      F     -500.    500.   0.01    ! Y vertex position
    ZV      F     -100.   2000.   0.01    ! Z vertex position
    PX      F    -5000.   5000.   0.01    ! X component of the momentum
    PY      F    -5000.   5000.   0.01    ! Y component of the momentum
    PZ      F    -5000.   5000.   0.01    ! Z component of the momentum
    E       F        0.   5000.   0.01    ! Energy
    NT      I        0.   1000.   1.0     ! MC track number
    NV      I        0.   1000.   1.0     ! MC vertex number
    IPRT    I        0.    200.   1.0     ! particle ID
    NPTR    I        0.   1000.   1.0     ! MC track number for the parent track
    IBOS    I        0.      1.   1.0     ! Oscillation flag for the B-particle
    IFLH    B        0.      0.   1.0     ! Flavor history code
    HEL     F       -1.      1.   0.0001  ! Helicity
    #
    REFERENCE: 13
    #
    # Reference banks
    #
    xxHT AVMC VRPR WDRW ….
    #
    END OBJECT

    SUBSCHEMA X1glb : 'Trigger Level1 Global Section'
      AUTHOR   'B.Rensch'
      REVIEWER 'U.N. Known'
      VERSION  '1.0'
      DATE     'January on the 13th year of KohlEra'

      DEFINE ATTRIBUTE
        Date = INTE [0,99999999];
        Time = INTE [0,99999999];
        Nam8 = CHA8;
      END ATTRIBUTE

      DEFINE ESET
        TrigType : 'Available TrigTypes; ID=0 if NOT defined/available'
          SIZE 20,20
          = (TrigType     = CHA8 : 'Name of TrigType',
             Title        = CH32 : 'Short Explanation/Usage',
             LstMod       = CH32 : 'Date of last Modification');
        Info : 'Some general information for Trigger_Setup'
          SIZE 1,1
          = (FillStatus   = INTE : '.gt.0=Ok; X1TT,X1TH,X1CA filled',
             DaqStatus    = INTE : 'To avoid calamities-->X1mist',
             TrigType     = CHA8 : 'Current TrigType',
             Activity     = CHA8 : 'Current Activity',
             FillNr       = INTE : 'Current FillNumber',
             RunNr        = INTE : 'Current RunNumber',
             GBXperSEC    = INTE : 'GBX per second',
             SDnam(12)    = CHA8 : 'SDnames known by Trigger',
             TrigMnem(32) = CHA8 : 'Mnemonic of TriggerBits',
             BitDetU(32)  = BITP : 'Bit i on if SDnam(i) used',
             X1wtchNode   = CH32 : 'AMSaddr of X1wtch',
             AutoDisab    = INTE : 'AutoDisable by X1_DAQcntrl',
             HowSetup     = INTE : 'Fork SetupTasks 0=VAX/1=FIC',
             Reserve(30)  = INTE : 'One never knows..........');
      END ESET
    END SUBSCHEMA
Event Model Definition: Current practices
• LHCb
  • Using handcrafted C++ class definitions following some rules (inheritance from some base class, factory declaration, relationships with smart pointers, …)
  • Independent of the storage/interactivity/networking technology
• CMS
  • Using the Objectivity DDL language (very close to C++), with special Objectivity data definition constructs (VArray, ooRef, …)
• ATLAS
  • For the time being, same approach as LHCb
• ALICE
  • Entirely based on ROOT and CINT. They claim not to distinguish data from algorithms.
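To make the LHCb-style rules listed above concrete (common base class, factory declaration, relationships through smart pointers), here is a minimal sketch. It is not the Gaudi code: `DataObject`, `ObjectFactory`, `declare`, `create` and the members of `MCParticle` and `MCVertex` are hypothetical names chosen for illustration, and standard-library smart pointers stand in for whatever smart pointer type the experiment actually uses.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Rule 1 (assumed form): every event class inherits from a common base.
class DataObject {
public:
    virtual ~DataObject() = default;
    virtual std::string typeName() const = 0;
};

// Rule 2 (assumed form): each class is declared to a factory keyed by name,
// so that generic code can create objects without knowing the concrete type.
class ObjectFactory {
public:
    using Creator = std::function<std::unique_ptr<DataObject>()>;

    static ObjectFactory& instance() {
        static ObjectFactory factory;
        return factory;
    }
    // Returns false if the name was already declared.
    bool declare(const std::string& name, Creator creator) {
        return creators_.emplace(name, std::move(creator)).second;
    }
    // Returns nullptr for an unknown name.
    std::unique_ptr<DataObject> create(const std::string& name) const {
        auto it = creators_.find(name);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

class MCVertex;

// Rule 3 (assumed form): relationships are expressed with smart pointers.
class MCParticle : public DataObject {
public:
    std::string typeName() const override { return "MCParticle"; }
    double energy = 0.0;
    std::weak_ptr<MCVertex> originVertex;  // non-owning back link
};

class MCVertex : public DataObject {
public:
    std::string typeName() const override { return "MCVertex"; }
    double x = 0.0, y = 0.0, z = 0.0;
    std::vector<std::shared_ptr<MCParticle>> products;  // owning links
};

// Factory declarations, one per class; easy to forget when written by hand.
bool mcParticleDeclared = ObjectFactory::instance().declare(
    "MCParticle", [] { return std::unique_ptr<DataObject>(new MCParticle); });
bool mcVertexDeclared = ObjectFactory::instance().declare(
    "MCVertex", [] { return std::unique_ptr<DataObject>(new MCVertex); });
```

Even in this reduced form, two small classes need a fair amount of supporting text, and nothing checks that the rules were followed.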
Current Problems
• Handcrafting C++ class definitions according to a set of rules is error prone
• Header files are very long and difficult to understand
  • For example: MCVertex.h has 200 lines for only 4 data members
• Hand-written converters are a maintenance problem:
  • Persistency
  • Language conversions
• No interactive manipulation of Event objects
Is a DDL a solution?
• We could describe the Event Model with a language-independent DDL
• From a single source we could generate:
  • C++ header files
  • Other language header files
  • Data dictionaries (RTTI)
• What about the object methods?
[Diagram: a user DDL source from which .h, .java, Cnv .cpp and Data Dictionary outputs are generated]
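The single-source idea can be sketched as follows. This is only an illustration under stated assumptions: the in-memory description (`ClassDef`, `FieldDef`) stands in for a parsed DDL file, the two generator functions are hypothetical, and the Java output assumes field types whose names happen to coincide in both languages (`double`, `int`).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one DDL entry (what a DDL parser might produce).
struct FieldDef {
    std::string type;     // e.g. "double", "int"
    std::string name;     // data member name
    std::string comment;  // documentation carried into the generated code
};

struct ClassDef {
    std::string name;
    std::vector<FieldDef> fields;
};

// Generate the text of a C++ definition from the description.
std::string generateCppHeader(const ClassDef& c) {
    std::ostringstream out;
    out << "struct " << c.name << " {\n";
    for (const FieldDef& f : c.fields)
        out << "  " << f.type << " " << f.name << "; // " << f.comment << "\n";
    out << "};\n";
    return out.str();
}

// Generate the text of a Java class from the same description.
// Assumption: no type mapping is needed for the types used here.
std::string generateJavaClass(const ClassDef& c) {
    std::ostringstream out;
    out << "public class " << c.name << " {\n";
    for (const FieldDef& f : c.fields)
        out << "  public " << f.type << " " << f.name << "; // " << f.comment << "\n";
    out << "}\n";
    return out.str();
}
```

The sketch covers data members only. The open question on the slide, what to do about object methods, is exactly what such a generator does not answer.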
Data Definition Languages
• IDL (Interface Definition Language) (OSF, OMG)
  • Used to define interfaces (CORBA, DCOM) in distributed applications.
• ODL (Object Definition Language) (ODMG)
  • Language for describing database schemas.
  • The Objectivity DDL (Data Definition Language) is a variation based on C++ syntax
• Both IDL & ODL are C-like languages with emphasis on interface (functionality) or database (storage and relations)
• None of them is directly usable
Defining the Data Model with C++ Classes
• ROOT CINT
  • The C++ header files are pre-processed by CINT
  • Streamer methods generated (persistency)
  • Runtime Type Information (RTTI) generated (interactivity)
  • Limitations with the C++ features (references, templates, …)
• Using Debug Information (by Expresso)
  • Typically, compilers are able to generate debug information that can be used to provide RTTI functionality
  • No limitation with the C++ features
  • Strong dependency on compilers and platforms
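What generated RTTI makes possible for interactivity can be shown with a hand-written stand-in. In the sketch below the dictionary (`ClassInfo`, `FieldInfo`) is typed in by hand. In the approaches above it would instead come from CINT or from compiler debug information, whose actual formats are not reproduced here. All names are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain event class; not the real LHCb MCVertex.
struct MCVertex {
    double x;
    double y;
    double z;
    int nProducts;
};

// Minimal runtime description of a class: field names, types and offsets.
enum class FieldType { Double, Int };

struct FieldInfo {
    std::string name;
    FieldType type;
    std::size_t offset;
};

struct ClassInfo {
    std::string name;
    std::vector<FieldInfo> fields;
};

// Hand-written here; a tool would generate this from the class definition.
const ClassInfo& mcVertexInfo() {
    static const ClassInfo info{
        "MCVertex",
        {{"x", FieldType::Double, offsetof(MCVertex, x)},
         {"y", FieldType::Double, offsetof(MCVertex, y)},
         {"z", FieldType::Double, offsetof(MCVertex, z)},
         {"nProducts", FieldType::Int, offsetof(MCVertex, nProducts)}}};
    return info;
}

// Generic browser: prints any object using only its runtime description.
std::string dump(const void* obj, const ClassInfo& info) {
    std::ostringstream out;
    const char* base = static_cast<const char*>(obj);
    out << info.name << " {";
    for (const FieldInfo& f : info.fields) {
        out << " " << f.name << "=";
        if (f.type == FieldType::Double) {
            double v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out << v;
        } else {
            int v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out << v;
        }
    }
    out << " }";
    return out.str();
}
```

A script or GUI could call `dump` on any class that has a `ClassInfo`, without being compiled against that class.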
What do we want a DDL for?
• Object Persistency automation
  • Code generation: streamers (ROOT), converters (GAUDI)
  • Data Dictionary or RTTI: generic converters, simple schema evolution
• Interactivity
  • Interfacing scripting or GUI with the Data Model
  • Data browsing capabilities
• Distributed Computing
  • Support for distributed applications
  • Exchanging serialized Event Data objects over the network
• Language Mixing
  • Object interaction across language boundaries
  • Conversion of objects from one language to another (code generation)
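The "generic converter" use of a data dictionary can be sketched as a single pair of functions that serialize any described class, instead of one hand-written converter per class. The names (`FieldInfo`, `mcVertexFields`, `toBuffer`, `fromBuffer`) are hypothetical. The buffer uses the machine's native byte layout, so this form is an assumption-laden simplification and would not be portable across platforms as written.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical plain event class; not the real LHCb MCVertex.
struct MCVertex {
    double x;
    double y;
    double z;
    int nProducts;
};

// Dictionary entry: where a field lives in the object and how big it is.
struct FieldInfo {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

// Hand-written here; in practice this would be generated from the DDL.
const std::vector<FieldInfo>& mcVertexFields() {
    static const std::vector<FieldInfo> fields{
        {"x", offsetof(MCVertex, x), sizeof(double)},
        {"y", offsetof(MCVertex, y), sizeof(double)},
        {"z", offsetof(MCVertex, z), sizeof(double)},
        {"nProducts", offsetof(MCVertex, nProducts), sizeof(int)}};
    return fields;
}

// Generic converter, transient to flat buffer: copies each described field
// in dictionary order, skipping any padding in the object.
std::vector<unsigned char> toBuffer(const void* obj,
                                    const std::vector<FieldInfo>& fields) {
    std::vector<unsigned char> buf;
    const unsigned char* base = static_cast<const unsigned char*>(obj);
    for (const FieldInfo& f : fields)
        buf.insert(buf.end(), base + f.offset, base + f.offset + f.size);
    return buf;
}

// Generic converter, flat buffer to transient. Returns false if the buffer
// length does not match the dictionary.
bool fromBuffer(const std::vector<unsigned char>& buf, void* obj,
                const std::vector<FieldInfo>& fields) {
    unsigned char* base = static_cast<unsigned char*>(obj);
    std::size_t pos = 0;
    for (const FieldInfo& f : fields) {
        if (pos + f.size > buf.size()) return false;
        std::memcpy(base + f.offset, buf.data() + pos, f.size);
        pos += f.size;
    }
    return pos == buf.size();
}
```

The same dictionary could drive persistency and network exchange alike. Schema evolution would additionally need the field names stored alongside the data, which this sketch omits.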
Do we need to impose any limitations?
• Not every data model can be expressed in all languages
  • If we want to keep the door open to language interoperability or language replacement, we need to impose some restrictions.
• Is it a good idea to have many ways to implement relationships, links to Monte Carlo data, etc.?
  • Simplifications and uniformity are always welcome
• Such limitations were imposed in the past (Fortran era), and experiments still succeeded in producing good physics!
  • Should not be a problem
Proposal
• Start a serious investigation of the existing options for a data definition language
  • IDL-like
  • C++ (using a C++ parser)
• Build a prototype with the current event model to generate code in various languages, and converters (to Java?)