This talk outlines how Monte Carlo (MC) files can be stored in, and retrieved from, the SAM data-handling system used by CDF at Fermilab. It covers storing files from any SAM station with sam store, writing the Python description file that carries each file's metadata, reading that metadata back with sam get metadata, and selecting files for a dataset definition. The main difficulty is choosing metadata so that users' dataset queries return exactly the files they intend. The talk closes by recommending that the MC producers in each physics group agree on an access policy.
MC MetaData
Rick St. Denis, Glasgow University
Rob Kennedy, FNAL
For the CDF DH group
Outline
• Storing files from anywhere
• Describing the file you want to store
• Accessing the information
• Choosing files for a dataset
• Recommendation
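Storing Files from any SAM Station
• Install a SAM station
• sam store --descrip=ImportClasses.py --source=. --dest=/pnfs/cdf/sam/SA/SA00/SA0001/SA0001.0/
  (--descrip names the Python description file, --source the local directory holding the files, here the current directory, and --dest the destination directory in /pnfs)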
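For scripted or repeated storage, the command on this slide can be assembled from Python, the language the description file is already written in. This is a minimal sketch that uses only the three options shown on the slide; the helper names sam_store_command and run_sam_store are hypothetical, and actually running the command assumes a configured SAM station with the sam client on the PATH. By default it only prints the command.

```python
from __future__ import annotations

import shlex
import subprocess


def sam_store_command(descrip: str, source: str, dest: str) -> list[str]:
    """Build the argument list for `sam store` using only the options on the slide."""
    return [
        "sam", "store",
        f"--descrip={descrip}",  # Python description file for the files
        f"--source={source}",    # local directory holding the files
        f"--dest={dest}",        # destination directory under /pnfs
    ]


def run_sam_store(descrip: str, source: str, dest: str, dry_run: bool = True) -> list[str]:
    """Print the command; execute it only when dry_run is False.

    Assumption: a working SAM station with the `sam` client on PATH.
    """
    cmd = sam_store_command(descrip, source, dest)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    run_sam_store("ImportClasses.py", ".", "/pnfs/cdf/sam/SA/SA00/SA0001/SA0001.0/")
```

Keeping dry_run on by default makes it easy to review the exact command before anything is written to /pnfs.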
The Description File
    from import_classes import *
    appfamily = AppFamily('generator', '6.203', 'pythia')  # family, version, name
    # 'filename' is not defined in this excerpt; the file examined in the later slides is test1_rec.root
    t = SAMMCFile(filename,
        Events(1, 10, 10),   # events 1 to 10, 10 in total
        "generated",         # data tier
        appfamily,
        "5/9/2003 13:48",    # start time
        "5/9/2003 13:48",    # end time
        18,                  # file size, reported as 18 [KB]
        {
            'Global': {
                'ProducedByName':'lecci',
                'OriginName':'GridKa',
                'Phase':'unspecified',
                'FacilityName':'cdf-fzkka',
                'ProducedForName':'kerzel',
                'RunType':'Monte Carlo',
                'GroupName':'cdf',
                'WorkRequestID':'03021105838',
                'Stream':'m',
                'Description':'10 events b bbar test MC',
            },
The Description File (more)
            'Generated': {
                #'CDFRelease':'4.9.1hpt1',
                'AppFamily':'generator',
                'AppName':'pythia',
                'AppVersion':'6.203',
                #'UseQQ':'false',
                'RanSeed1':'48572',
                #'RanSeed2':'64245',
                'RunNumber':150110,
                'FirstEvent':'1',
                'LastEvent':'10',
                'TotalEvents':'10',
                'NumRecords':'10',
The Description File (YOU Decide)
                #'Pythia_MSel':'5',
                #'PtLt':'99999.0',
                #'PtGt':'5.0',
                #'EtaLt':'4.2',
                #'EtaGt':'-4.2',
                #'CardfileVersion':'v00-04-20',
                #'CardfileDir':'np',
                #'UseEvtGen':'null',
                #'UseComphep':'null',
                #'KinMassGt':'2.0',
                #'CollisionEnergy':'1960.0',
                #'MultiplePartonInteractions':'on',
                #'UseOnetop':'null',
                #'PDFLibFunc':'null',
                #'Production':'cc',
The Description File (YOU Decide)
                #'HppMass':'200.0',
                #'Decay':'incl',
                #'SVX':'enabled',
                #'COT':'enabled',
                #'Muon':'enabled',
                #'Calorimeter':'enabled',
                #'TOF':'enabled',
                #'PassiveMaterial':'enabled',
                #'BeampipeCentral':'true',
                #'TOFGeometyModel':'Survey',
                #'COTSuperLayer_DriftModel':'Garfield',
The Description File (YOU Decide)
                #'COTSuperLayer_CreateMCOT':'false',
                #'COTSuperLayer_COTM':'true',
                #'CDFHalfLadder_PickSVX_CDM':'geometric',
                #'CDFHalfLadder_SVX_CDM_noise':'hits',
                #'CDFHalfLadder_PropagatedSI':'true',
                #'CDFHalfLadder_CreateSIXD':'true',
                #'TOF3Pack_level':'usrtuned',
                #'TOFReco_Pulses':'simple',
                #'TOFReco_Extrapolator':'gemeometric',
                #'TOFReco_TZero':'NegLog',
            }
        }
    )
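Because the description file is plain Python, a producer can sanity-check the parameter dictionary before running sam store. This sketch is not part of the SAM tools: check_metadata and the POLICY_GLOBAL_KEYS list are hypothetical, standing in for whatever keys a physics group's policy makes mandatory, and the event-count check assumes FirstEvent..LastEvent is an inclusive range (consistent with events 1 to 10 giving 10 events in the example).

```python
from __future__ import annotations

# Hypothetical policy: Global keys a physics group might require on every MC file.
POLICY_GLOBAL_KEYS = ("ProducedByName", "OriginName", "FacilityName",
                      "GroupName", "RunType", "Description")


def check_metadata(params: dict) -> list[str]:
    """Return a list of problems in a {'Global': {...}, 'Generated': {...}} dict."""
    problems = []
    global_params = params.get("Global", {})
    for key in POLICY_GLOBAL_KEYS:
        if not global_params.get(key):
            problems.append(f"Global.{key} is missing or empty")

    generated = params.get("Generated", {})
    try:
        first = int(generated["FirstEvent"])
        last = int(generated["LastEvent"])
        total = int(generated["TotalEvents"])
    except (KeyError, ValueError) as exc:
        problems.append(f"Generated event fields unreadable: {exc!r}")
        return problems

    span = last - first + 1  # assumes an inclusive event range
    if span != total:
        problems.append(f"TotalEvents is {total} but FirstEvent..LastEvent covers {span}")
    return problems


# A subset of the values from the description file on the slides.
EXAMPLE = {
    "Global": {"ProducedByName": "lecci", "OriginName": "GridKa",
               "FacilityName": "cdf-fzkka", "GroupName": "cdf",
               "RunType": "Monte Carlo", "Description": "10 events b bbar test MC"},
    "Generated": {"FirstEvent": "1", "LastEvent": "10", "TotalEvents": "10"},
}

if __name__ == "__main__":
    print(check_metadata(EXAMPLE) or "metadata OK")
```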
Accessing information
• Access information to see how files were produced
• Access information to make datasets based on the information stored with files
    sam get metadata --file=test1_rec.root
    File Type: SAMMC Data File
    File Name: test1_rec.root
    File ID: 1493397
    File Size: 18 [KB]
    CRC Data: unknown crc value [unknown crc type]
    File Start Time: 05/09/2003 13:48:00
    File End Time: 05/09/2003 13:48:00
    Physical Stream: m
    File Format Info: unknown file format
Accessing information
    First Event: 1
    Last Event: 10
    Total Events: 10
    Application Family: generator
    Application Name: pythia
    Application Version: 6.203
    Import Process ID: 0
    Node Name: cdf.fzk.de
    Work Group: cdf
    User Name: kerzel
    Produced For: kerzel
    Produced By: lecci
    Origin Location: gridka
    Origin Facility: cdf-fzkka
    Physics Channel:
    Description: 10 events b bbar test MC
Accessing information
    MC Phase: unspecified
    Run Number: 150110
    Run Type: monte carlo
    Run Start Time: 05/09/2003 13:48:00
    Run End Time: 05/09/2003 13:48:00
    Run Description: 10 events b bbar test MC
    Run CM Energy: 0.0
    Parent Files: []
    Split: 0
    Merge: 0
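If the sam get metadata output is captured as text, its labelled fields can be turned into a dictionary for scripting, for example to check how a file was produced. A minimal sketch, assuming one "Label: value" field per line as laid out on these slides; parse_metadata_fields is a hypothetical helper, not part of the SAM client.

```python
from __future__ import annotations


def parse_metadata_fields(text: str) -> dict[str, str]:
    """Parse 'Label: value' lines into a dict.

    Splits on the first colon only, so values such as '05/09/2003 13:48:00'
    keep their own colons. Per-parameter 'Key: ...' lines are skipped.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, lines without a label, and the per-key entries.
        if not line or ":" not in line or line.startswith("Key:"):
            continue
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields


# A few lines from the output on the slides.
SAMPLE = """\
File Name: test1_rec.root
File Size: 18 [KB]
File Start Time: 05/09/2003 13:48:00
Application Name: pythia
Application Version: 6.203
Produced By: lecci
Physics Channel:
Run Number: 150110
"""

if __name__ == "__main__":
    info = parse_metadata_fields(SAMPLE)
    print(info["Application Name"], info["Application Version"], info["Produced By"])
```

Empty fields such as Physics Channel come back as empty strings rather than being dropped, which makes missing metadata easy to spot.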
Accessing information by key
    Key: appfamily = generator (Category: generated)
    Key: appname = pythia (Category: generated)
    Key: appversion = 6.203 (Category: generated)
    Key: firstevent = 1 (Category: generated)
    Key: lastevent = 10 (Category: generated)
    Key: numrecords = 10 (Category: generated)
    Key: ranseed1 = 48572 (Category: generated)
    Key: runnumber = 150110 (Category: generated)
    Key: totalevents = 10 (Category: generated)
    Key: facilityname = cdf-fzkka (Category: global)
    Key: groupname = cdf (Category: global)
    Key: originname = GridKa (Category: global)
    Key: phase = unspecified (Category: global)
    Key: producedbyname = lecci (Category: global)
    Key: producedforname = kerzel (Category: global)
    Request ID: 0
    Data Tier: generated
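The per-key listing shows the names that dataset dimensions refer to: the description-file keys appear lower-cased (ProducedByName becomes producedbyname) while values keep their case (GridKa). A small parser can group these entries by category. It assumes the exact "Key: name = value (Category: cat)" line format shown on the slide; parse_keys is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches lines such as "Key: appname = pythia (Category: generated)".
KEY_LINE = re.compile(
    r"^Key:\s*(?P<key>\S+)\s*=\s*(?P<value>.*?)\s*\(Category:\s*(?P<cat>[^)]+)\)$"
)


def parse_keys(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key:' lines from the metadata output by category."""
    by_category = defaultdict(dict)
    for line in text.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:  # non-key lines such as "Request ID: 0" are ignored
            by_category[match["cat"].strip()][match["key"]] = match["value"]
    return dict(by_category)


SAMPLE_KEYS = """\
Key: appname = pythia (Category: generated)
Key: runnumber = 150110 (Category: generated)
Key: facilityname = cdf-fzkka (Category: global)
Key: originname = GridKa (Category: global)
Request ID: 0
"""

if __name__ == "__main__":
    for category, keys in parse_keys(SAMPLE_KEYS).items():
        print(category, keys)
```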
Creating a Dataset
    sam create dataset definition --defname=mcbot3 --group=cdf --defdesc='bottom mc' --dim='facilityname cdf-fzkka and groupname cdf and runnumber 150110 and appname pythia and appversion 6.203'
The hard part: anticipating what users will query on, so that each dataset definition selects exactly the files they intend!
The definition name is then used in your DHInput configuration as dataset=mcbot3.
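Whether a definition selects a unique set of files can be rehearsed offline before it is created. The sketch below is a hypothetical, equality-only stand-in for a SAM dimension query (the real dimension language is richer): matching_files filters a local list of metadata dictionaries, and dim_string joins constraints with "and" in the form used on this slide. The second catalog entry is invented, representing a second production with identical generator settings, so that the run number is what makes the selection unique.

```python
from __future__ import annotations


def dim_string(constraints: dict[str, str]) -> str:
    """Join 'key value' pairs with 'and', as in the --dim option on the slide."""
    return " and ".join(f"{key} {value}" for key, value in constraints.items())


def matching_files(catalog: list[dict[str, str]], constraints: dict[str, str]) -> list[str]:
    """Hypothetical offline stand-in for a dimension query: equality tests only."""
    return [entry["filename"] for entry in catalog
            if all(entry.get(key) == value for key, value in constraints.items())]


CATALOG = [
    {"filename": "test1_rec.root", "facilityname": "cdf-fzkka", "groupname": "cdf",
     "runnumber": "150110", "appname": "pythia", "appversion": "6.203"},
    # Invented entry: a second production with identical generator settings.
    {"filename": "test2_rec.root", "facilityname": "cdf-fzkka", "groupname": "cdf",
     "runnumber": "150111", "appname": "pythia", "appversion": "6.203"},
]

LOOSE = {"facilityname": "cdf-fzkka", "groupname": "cdf",
         "appname": "pythia", "appversion": "6.203"}
TIGHT = {**LOOSE, "runnumber": "150110"}

if __name__ == "__main__":
    print(matching_files(CATALOG, LOOSE))  # both files match
    print(matching_files(CATALOG, TIGHT))  # only test1_rec.root matches
    print(dim_string(TIGHT))
```

Running candidate definitions against a realistic local catalog is one way for producers to find out which keys their group's policy must require.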
Recommendation
• The MC producers in each of the five physics groups should decide how their data will be accessed and develop a policy
• Start with Saverio and the B group to try this out and settle how we want to specify the data