Doing analysis in a large collaboration: Overview • The experiment: • Collider runs for many weeks every year. • A lot of data to look at! • In 2007, ~70M minimum bias events. • Computers are needed to analyze such a vast dataset. • Analysis Software: • Find relevant data • Develop analysis code • Interface to the data • Analysis code: plots and correlations of experimentally measured quantities • Submitting jobs to the batch farm • Plotting results
Computing resources • Facilities available for STAR • PDSF (in Berkeley, part of NERSC) • Parallel Distributed Systems Facility • RCF (in Brookhaven Lab) • RHIC Computing Facility • Log in to PDSF: pdsf.nersc.gov • Note for PDSF: make sure you have a .chos file. I selected /auto/redhat8 as my environment. • Without an environment properly set, perl won’t work! • Web page: • http://www.nersc.gov/nusers/resources/PDSF/ • FAQ • USER accounts • Monitoring of farm conditions • … • Log in to RCF: rssh.rhic.bnl.gov • Web page: • http://www.rhic.bnl.gov/RCF/
Doing Analysis 101 • For real or simulated data that has already been produced into a standard format: • Step 1: Find data (HPSS, NFS, local). Tools: FileCatalog • Step 2: Run analysis on data. Tools: MuDST, StMcEvent • Step 3: Plot results of analysis. Tools: ROOT classes
Tools: • FileCatalog (get_file_list.pl) • http://www.star.bnl.gov/STAR/comp/sofi/FileCatalog/ • Finding Files (of course) that satisfy certain conditions: • production library used, trigger setup used, run numbers, collision system, … • i.e. it is a database of our data. • Examples of usage found in link above. • Production Location • STAR → Computing → Production Location • http://www.star.bnl.gov/webdata/pub/overall.html#PLOC
Tools: • Scheduler • Used to submit jobs to the RCAS linux farm in batch mode. • RCAS: ~10 interactive nodes, ~150 batch nodes. • How to use it: • XML script that specifies • files to be used (e.g. using a catalog query) • macro to be executed (i.e. analysis to be done) • what to do with the output • STAR → Computing → Scheduler • http://www.star.bnl.gov/public/comp/Grid/scheduler/
Analyzing Example: Real Data, Step I • Find events of interest • Au+Au collisions, 200 GeV. • p+p collisions, 200 GeV. • Many collision systems and triggers have been used. • Example: Looking at “Minimum bias” triggers • This trigger is meant to capture almost all interactions. • Every trigger detector introduces a bias; this trigger is meant to reduce that bias as much as possible. • Trigger ID: • Each file can have events that were selected by various trigger conditions. • The ID picks out a given trigger condition. • 2004 list of triggers: • http://www.star.bnl.gov/protected/common/common2004/trigger2004/triggers2004.html • 2009 data: • http://www.star.bnl.gov/protected/common/common2009/trigger2009/trigidtable.html
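The trigger ID selection described above can be sketched in plain C++. This is only an illustration under stated assumptions: `ToyEvent`, `firedTrigger` and `countTriggered` are hypothetical names, not STAR classes, and the ID values used with them are placeholders rather than entries from the trigger tables linked above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in for an event record (assumption): it only stores the list of
// trigger IDs whose conditions the event satisfied.
struct ToyEvent {
    std::vector<unsigned int> triggerIds;
};

// True if the event was selected by the trigger condition with this ID.
bool firedTrigger(const ToyEvent& ev, unsigned int id) {
    return std::find(ev.triggerIds.begin(), ev.triggerIds.end(), id)
           != ev.triggerIds.end();
}

// Count the events in a file-like collection that fired a given trigger ID.
std::size_t countTriggered(const std::vector<ToyEvent>& events, unsigned int id) {
    return static_cast<std::size_t>(
        std::count_if(events.begin(), events.end(),
                      [id](const ToyEvent& ev) { return firedTrigger(ev, id); }));
}
```

An analysis would typically apply a check like `firedTrigger` at the start of the per-event code and skip events that do not carry the ID of interest.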
A file catalog query… • Output: path and filename of each file. • Query conditions: • Production: P10ic, production from 2010, real data, official production (pp2009 data) • Filetype: daq_reco_mudst: came from DAQ (real data), processed through the reconstruction chain, stored into a micro Data Summary Tape. • Storage: NFS. Files on hard disks accessible through the Network File System, as opposed to files stored on tape in HPSS.
Command:
~/afsdir/wrk/jpsi/offline/> get_file_list.pl -keys 'path,filename' -cond 'production=P10ic,filetype=daq_reco_mudst,storage=NFS,sanity=1,filename~st_upsilon' -limit 10
Output:
/star/data81/reco/production2009_200Gev_Single/ReversedFullField/P10ic/2009/138/10138054::st_upsilon_adc_10138054_raw_8350001.MuDst.root
/star/data76/reco/production2009_200Gev_Single/ReversedFullField/P10ic/2009/139/10139015::st_upsilon_adc_10139015_raw_7350001.MuDst.root
/star/data81/reco/production2009_200Gev_Single/ReversedFullField/P10ic/2009/138/10138054::st_upsilon_adc_10138054_raw_5350001.MuDst.root
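In the query output shown above, the two requested keys (path and filename) are separated by "::". A minimal sketch of turning such lines into full file paths follows; `joinCatalogEntry` and `joinCatalogEntries` are hypothetical helpers written for this example, not part of the FileCatalog tools.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn one output line of the form "path::filename"
// into "path/filename". Lines without the "::" separator are returned
// unchanged.
std::string joinCatalogEntry(const std::string& line) {
    const std::string sep = "::";
    const std::size_t pos = line.find(sep);
    if (pos == std::string::npos) return line;
    return line.substr(0, pos) + "/" + line.substr(pos + sep.size());
}

// Convert every line of a query result into a full file path.
std::vector<std::string> joinCatalogEntries(const std::vector<std::string>& lines) {
    std::vector<std::string> paths;
    paths.reserve(lines.size());
    for (const std::string& line : lines) {
        paths.push_back(joinCatalogEntry(line));
    }
    return paths;
}
```

The resulting list could then be handed to whatever code opens the MuDst files for analysis.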
What’s in a MuDST? • The result of the reconstruction of an event. • Trigger information. • Signal in ZDC, BBC, VPD, EMC, … • Track information. • Number of points found by tracker and used in fit • Momentum at first point and last point (and helix parameters) • Covariance matrix of track fit • dE/dx • nSigma, PID probability • charge • id of track in event (unique number for a track in an event) • type (0=global, 1=primary) • χ² • Topology Map: bit pattern of hits in detector • Distance of closest approach to vertex (for global) • Position of first and last points • Use a TBrowser to check the contents of a file. ROOT objects can be drawn quickly.
Define an Analysis Task • Examples: • Multiplicity Distribution • ~Probability to find events with Nch tracks. • Nch: number of charged particles in the event (typically, per unit rapidity at midrapidity). • pT distribution of charged tracks for all events. • ~Probability to find a track with a given pT.
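Both example tasks amount to filling a histogram and dividing by the number of entries. The sketch below uses a small stand-in class, `ToyHist`, instead of a ROOT histogram so that it is self-contained; the class, the function `fillDistributions`, the binning, and the representation of an event as a list of track pT values are all assumptions made for illustration. No rapidity selection is applied here.

```cpp
#include <cstddef>
#include <vector>

// Minimal fixed-width histogram; a stand-in for a ROOT TH1, not the ROOT API.
class ToyHist {
public:
    ToyHist(std::size_t nBins, double lo, double hi)
        : counts_(nBins, 0.0), lo_(lo), hi_(hi), entries_(0.0) {}

    // Add one entry; values outside [lo, hi) are ignored.
    void fill(double x) {
        if (counts_.empty() || x < lo_ || x >= hi_) return;
        std::size_t bin = static_cast<std::size_t>(
            (x - lo_) / (hi_ - lo_) * counts_.size());
        if (bin >= counts_.size()) bin = counts_.size() - 1;  // rounding guard
        counts_[bin] += 1.0;
        entries_ += 1.0;
    }

    // Fraction of in-range entries that fell into this bin.
    double probability(std::size_t bin) const {
        if (bin >= counts_.size() || entries_ == 0.0) return 0.0;
        return counts_[bin] / entries_;
    }

    double entries() const { return entries_; }

private:
    std::vector<double> counts_;
    double lo_;
    double hi_;
    double entries_;
};

struct ToyDistributions {
    ToyHist mult;  // filled once per event with the number of tracks
    ToyHist pt;    // filled once per track with its pT
};

// Each inner vector holds the pT values (GeV/c) of the tracks of one event.
ToyDistributions fillDistributions(
        const std::vector<std::vector<double> >& eventTrackPt) {
    // Bin edges at half-integers so each integer multiplicity has its own bin.
    ToyDistributions d{ToyHist(10, -0.5, 9.5), ToyHist(10, 0.0, 5.0)};
    for (const std::vector<double>& tracks : eventTrackPt) {
        d.mult.fill(static_cast<double>(tracks.size()));
        for (double pt : tracks) d.pt.fill(pt);
    }
    return d;
}
```

The multiplicity histogram is filled once per event and the pT histogram once per track, which is why the two distributions are normalized by different numbers of entries.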
The “Maker” framework • “Makers” are a way to standardize the way we do analysis: • All have to “prepare” or initialize • e.g. book histograms and trees • All do something every event • e.g. calculate or obtain distributions of interest • All have to clean up when the job is done • e.g. write the histograms
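The three phases listed above (initialize, process every event, clean up) can be sketched as a small class hierarchy. This is a toy illustration, not the STAR Maker base class: `ToyMaker`, `ToyMultiplicityMaker`, `runChain`, the method names, and the convention that 0 means success are all assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Toy base class with the three-phase life cycle (illustrative only).
class ToyMaker {
public:
    virtual ~ToyMaker() {}
    virtual int init() = 0;                                  // book histograms
    virtual int make(const std::vector<double>& trackPt) = 0; // once per event
    virtual int finish() = 0;                                // write results
};

// Example analysis: count how many events have each track multiplicity.
class ToyMultiplicityMaker : public ToyMaker {
public:
    explicit ToyMultiplicityMaker(std::size_t maxMult)
        : maxMult_(maxMult), eventsSeen_(0), finished_(false) {}

    int init() {
        counts_.assign(maxMult_ + 1, 0);  // "book" one counter per multiplicity
        eventsSeen_ = 0;
        finished_ = false;
        return 0;
    }

    int make(const std::vector<double>& trackPt) {
        // Multiplicities above maxMult_ go into the last counter.
        const std::size_t n = trackPt.size() < maxMult_ ? trackPt.size() : maxMult_;
        ++counts_[n];
        ++eventsSeen_;
        return 0;
    }

    int finish() {
        finished_ = true;  // a real Maker would write its histograms here
        return 0;
    }

    std::size_t count(std::size_t mult) const {
        return mult < counts_.size() ? counts_[mult] : 0;
    }
    std::size_t eventsSeen() const { return eventsSeen_; }
    bool finished() const { return finished_; }

private:
    std::size_t maxMult_;
    std::vector<std::size_t> counts_;
    std::size_t eventsSeen_;
    bool finished_;
};

// Driver: initialize once, call make() for every event, then clean up.
int runChain(ToyMaker& maker, const std::vector<std::vector<double> >& events) {
    if (maker.init() != 0) return 1;
    for (std::size_t i = 0; i < events.size(); ++i) {
        if (maker.make(events[i]) != 0) return 1;
    }
    return maker.finish();
}
```

Because the driver only talks to the base-class interface, a different analysis can be swapped in without changing the event loop, which is the standardization the Maker idea is meant to provide.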
Example Maker code: • In PDSF: • /auto/pdsfdv39/starspec/pdsfdv34/starspec/calderon/tutorials/StRoot/StMuDstExampleAnalysisMaker • All Makers live under a directory called StRoot • Compilation of the analysis code is done in the same directory where the StRoot directory (or link) is found • cons +StMuDstExample • Running is done in the same directory where compilation was done. Example in StRoot/macros/examples/ • root4star -b -q 'RunMuDstExample.C(500)'
Plotting the results • Open the output file (dummyFile00.root) in ROOT. Can issue C++ commands on the ROOT classes interactively. • Set a sensible color scheme • gStyle->SetPalette(1,0); • Create canvases (TCanvas) • TCanvas* cnv1 = new TCanvas("cnv1","multiplicity",600,600); • For drawing histograms: • TH1::Draw() • mMult->Draw(); • Can change line color, width, style • mMult->SetLineColor(2); • mMult->SetLineWidth(3); • mMult->SetLineStyle(11); • Can draw markers • mMult->SetMarkerStyle(20); • mMult->Draw("P"); • For reproducibility, can also put all the commands into a macro, and just execute the macro: • .x plotMultExample.C
Analyzing Example: Simulation, Step I • Generate events of interest • e.g. Lambda_c, Upsilon, J/Psi particles according to a distribution • e.g. use event generators • PYTHIA • HERWIG • HIJING • NEXUS • AMPT • MPC • For large datasets, an official request is made to STAR simulation (Maxim Potekhin, simulation leader)