User Community Report Dimitri Bourilkov University of Florida UltraLight Visit to NSF Arlington, VA, January 4, 2006

Physics Analysis User Group: Motivation and Mission
• Establish a community of physicists - early adopters and users:
  • first within UltraLight (expert users)
  • later outside users
• This community uses the system being developed, e.g.:
  • starts actual physics analysis efforts exploiting the test-bed
  • provides a user perspective on the problems being solved

Physics Analysis User Group
• Organizes early adoption of the system
• Identifies the features of the system that are most valuable from the users' perspective, to be released early in production (or at a useful level of functionality)
• This is "where the rubber meets the road" and will provide rapid user feedback to the development team

Physics Analysis User Group
• Evolving dialog with the applications WG on:
  • the scope (what is most valuable for physics analysis)
  • priorities for implementing features
  • composition and timing of releases, aligned with the milestones of the experiments
• Develops, in collaboration with the applications group, a suite of functional tests that can be used for:
  • measuring the progress of the project
  • educating new users and lowering the threshold for adopting the system
  • demonstrating the UltraLight services in action in education/outreach workshops

Physics Analysis User Group
• Studies in depth the software frameworks of HEP applications (e.g. ORCA/COBRA or the new software framework for CMS, ATHENA for ATLAS), the data and metadata models, and the steps to best integrate the systems
• Maintains close contact with the people in charge of software development in the experiments and responds to their requirements and needs
• Provides expert help with synchronization and integration between UltraLight and the software systems of the experiments

Physics Analysis User Group
• Contributes to ATLAS/CMS physics preparation milestones (UL members are already active in LHC physics, and several analyses are officially recognized in CMS)
• In the longer term, enables physics analysis and LHC physics research
• In the shorter term, actively involved in SC|05 activities, culminating in the Bandwidth Challenge in Seattle in November
• Prepared a tutorial on data analysis with ROOT for the E&O workshop in Miami, June 2005

CMS Data Samples for Testing
• For initial testing, generated seven samples of Z’ events with masses from 0.2 to 5 TeV: fully simulated with OSCAR and reconstructed with ORCA on local Tier2 resources at UF; in total 42k events, ~2 GB in ROOT trees (ExRootAnalysis format); used for SC|05
• Additional data for different channels: QCD background, top events, bosons + jets, SUSY points; ~35 GB, same format
• In addition, ~100k single-muon or di-muon events were generated over the summer on the FNAL LPC Tier1 resources

Prototype CMS Analysis
• Developed stand-alone C++ code to analyze the ExRootAnalysis trees:
  • Lightweight, with no external dependencies besides ROOT; used for the iGrid2005 and SC|05 demos and by users at UF for analysis and CMS production validation
  • Some of the information, e.g. trigger bits, is harder to access than in the CMS framework (need to load ORCA libraries)

Visualization
• Before detector simulation (PYTHIA 4-vectors): CMKINViewer (DB)
• After reconstruction: COJAC (Julian Bunn)

Collaborative Community Tools: CAVES / CODESH Projects
• Concentrate on the interactions between scientists collaborating over extended periods of time
• Seamlessly log, exchange and reproduce results and the corresponding methods, algorithms and programs
• Automatic and complete logging and reuse of work / analysis sessions: collect all user activities on the command line + code of all executed programs
• Extend the power of users working / performing analyses in their habitual way, giving them virtual logbook capabilities
• CAVES is used in normal analysis sessions with ROOT
• CODESH is a UNIX shell with virtual logbook capabilities
• Build functioning collaboration suites - close to users!
• Formed a team: CODESH: DB & Vaibhav Khandelwal; CAVES: DB & Sanket Totala

Choice of Scenarios
Case 1: Simple
• User 1: Does some analysis and produces a result with tag analX_user1.
• User 2: Browses all current tags in the repository and fetches the session stored with tag analX_user1.
Case 2: Complex
• User 1: Does some analysis and produces a result with tag analX_user1.
• User 2: Browses all current tags in the repository and fetches the session stored with tag analX_user1.
• User 2: Modifies the program obtained from the session of user 1 and stores it, along with a new result, with tag analX_user2_mod_code.
• User 1: Browses the repository, finds that his program was modified and decides to extract that session using the tag analX_user2_mod_code.
This scenario can be extended to include an arbitrary number of steps and users in a working group, or groups in a collaboration.
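
The two scenarios can be sketched with a toy in-memory repository. This is only an illustration of the store / browse / fetch flow by tag: the class and method names (`Session`, `TagRepository`, `store`, `browse`, `fetch`) and the session contents are hypothetical and are not the CAVES or CODESH interface, which the slides do not show.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Session:
    """A logged analysis session (hypothetical structure, for illustration)."""
    user: str
    commands: tuple  # command-line history of the session
    code: str        # source of the program that produced the result


@dataclass
class TagRepository:
    """Toy in-memory stand-in for the shared repository of tagged sessions."""
    _sessions: dict = field(default_factory=dict)

    def store(self, tag: str, session: Session) -> None:
        # Refuse to overwrite an existing tag so earlier results stay reproducible.
        if tag in self._sessions:
            raise ValueError(f"tag already exists: {tag}")
        self._sessions[tag] = session

    def browse(self) -> list:
        """Return all current tags, sorted."""
        return sorted(self._sessions)

    def fetch(self, tag: str) -> Session:
        return self._sessions[tag]


repo = TagRepository()

# Case 1: user 1 stores a result; user 2 browses the tags and fetches it.
repo.store("analX_user1", Session("user1", ("run analysis",), "cut = 10"))
fetched = repo.fetch(repo.browse()[0])

# Case 2: user 2 modifies the fetched program and stores it under a new tag;
# user 1 then finds the new tag and extracts that session.
modified = replace(fetched, user="user2", code="cut = 20")
repo.store("analX_user2_mod_code", modified)
back_at_user1 = repo.fetch("analX_user2_mod_code")

print(repo.browse())
print(back_at_user1.code)
```

Keeping each tag immutable means the original session of user 1 remains retrievable after user 2 stores the modified one.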

CAVES / CODESH Architectures: Scalable and Distributed
• First prototypes use popular tools: Python, ROOT and CVS
• E.g. all ROOT commands plus the CAVES commands, or all UNIX shell commands plus the CODESH commands, are available

Working Releases - CODESH
• Virtual logbook for "shell" sessions
• Parts can be local (private) or shared
• Tracks environment variables, aliases, invoked program code, etc. during a session
• Reproduces complete working sessions
• Complex CMS ORCA example operational
• All CMS data generation for the community group done at the LPC is stored in CODESH, and the knowledge is available

Working Releases - CAVES
[Figure: Higgs W+W-]

Large Scale Data Transfers
• Network aspect: Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in the kernel AND the application
• On a local connection with 1 GbE and RTT 0.19 ms, to fill the pipe we need around 2*BDP:
    2*BDP = 2 * 1 Gb/s * 0.00019 s = ~48 KBytes
  Or, for a 10 Gb/s LAN: 2*BDP = ~480 KBytes
• Now on the WAN: from Florida to Caltech the RTT is 115 ms, so to fill the pipe at 1 Gb/s we need:
    2*BDP = 2 * 1 Gb/s * 0.115 s = ~28.8 MBytes, etc.
• User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune kernels, get the highest possible disk read/write speed, etc.
• Tables turned: the WAN outperforms disk speeds!
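
The 2*BDP arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `window_bytes` is illustrative, and it assumes decimal units (1 Gb/s = 1e9 bits/s) and 8 bits per byte.

```python
def window_bytes(rate_bits_per_s: float, rtt_s: float, factor: float = 2.0) -> float:
    """TCP window in bytes needed to keep a path full: factor * rate * RTT / 8."""
    return factor * rate_bits_per_s * rtt_s / 8.0


GBPS = 1e9  # assumption: decimal gigabits per second

cases = [
    ("LAN, 1 Gb/s, RTT 0.19 ms", 1 * GBPS, 0.19e-3),
    ("LAN, 10 Gb/s, RTT 0.19 ms", 10 * GBPS, 0.19e-3),
    ("WAN Florida-Caltech, 1 Gb/s, RTT 115 ms", 1 * GBPS, 115e-3),
]

for label, rate, rtt in cases:
    w = window_bytes(rate, rtt)
    print(f"{label}: {w / 1e3:.1f} KB = {w / 1e6:.2f} MB")
```

The three cases give about 47.5 KB, 475 KB and 28.75 MB, which round to the ~48 KBytes, ~480 KBytes and ~28.8 MBytes quoted above.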

bbcp Tests
bbcp was selected as a starting tool for WAN tests:
• Supports multiple streams, highly tunable (window size etc.), peer-to-peer type
• Well supported by Andy Hanushevsky from SLAC
• Is used successfully in BaBar
• I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and network (WAN)
• Starting point Florida → Caltech: < 0.5 MB/s on the WAN, very poor performance

Evolution of Tests Leading to SC|05
• End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over the UL network
• Tuning of kernels and bbcp window sizes: coordinated iterative procedure
• Current status (for file sizes ~2 GB):
  • 6-6.5 Gb/s with iperf
  • up to 6 Gb/s memory to memory
  • 2.2 Gb/s ramdisk → remote disk write
  • the speed was the same writing to a SCSI disk (which is supposedly limited to less than 80 MB/s) or to a RAID array, so de facto the data always go first to the memory cache (the Caltech node has 16 GB of RAM)
• Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK

bbcp Examples Florida → Caltech
    [bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
    ------------------------------------------------------------
    Client connecting to 192.84.86.66, TCP port 5001
    TCP window size: 256 MByte (default)
    ------------------------------------------------------------
    [ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
    [ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
    [ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
    [ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
    [ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
    bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792

    bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:/dev/null
    bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
    bbcp: Creating /dev/null/big2.root
    Source cpu=5.654 mem=0K pflt=0 swap=0
    File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
    24 buffers used with 0 reorders; peaking at 0.
    Target cpu=3.768 mem=0K pflt=0 swap=0
    1 file copied at effectively 260594.2 KB/s

    bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:dimitri
    bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
    bbcp: Creating ./dimitri/big2.root
    Source cpu=5.455 mem=0K pflt=0 swap=0
    File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
    24 buffers used with 0 reorders; peaking at 0.
    Target cpu=10.065 mem=0K pflt=0 swap=0
    1 file copied at effectively 150063.7 KB/s
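
To compare the bbcp figures (reported in KB/s) with the iperf figures (reported in Gbits/sec), the rates can be pulled out of the log and converted. The sketch below is a hedged illustration: the helper names are made up for this example, and it assumes that bbcp's "KB" means 1024 bytes, which the slides do not state; with 1000-byte kilobytes the results would be about 2% lower.

```python
import re

KB = 1024  # assumption: bbcp reports kilobytes of 1024 bytes


def kb_per_s_to_gbit_per_s(kb_per_s: float, kb: int = KB) -> float:
    """Convert a rate in KB/s to Gb/s (decimal gigabits)."""
    return kb_per_s * kb * 8 / 1e9


def rates_from_log(text: str) -> list:
    """Return every 'NNN.N KB/s' figure found in a bbcp log, in order."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?) KB/s", text)]


# Two lines copied from the disk-write example on this slide.
log = """File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
1 file copied at effectively 150063.7 KB/s"""

for rate in rates_from_log(log):
    print(f"{rate:.1f} KB/s = {kb_per_s_to_gbit_per_s(rate):.2f} Gb/s")
```

Under this assumption the 279678.1 KB/s transfer rate corresponds to roughly 2.3 Gb/s.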

Data Transfers and Analysis
• CMS service challenges
• PhEDEx: CMS system for data transfers Tier0 → Tier1 → Tier2
• Get expertise with the system
• Provide user feedback
• Integrate storage/transfer (SRM/dCache/PhEDEx) with the network
• Analysis of data from the cosmic runs in collaboration with FNAL Tier1 (muons, calorimetry)

Outlook on Data Transfers
• The UltraLight network already performs very well
• The hard problem from the user perspective now is to match it with servers capable of sustained rates for large files > 20 GB (when the memory caches are exhausted); fast disk writes are key (RAID arrays)
• To fill 10 Gb/s pipes we need several pairs (3-4) of servers
• In ramdisk tests we achieved 1.2 GB/s on read and 0.3 GB/s on write (cp, dd, bbcp)
• Next step: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL, CERN

UltraLight Analysis Environment
• Interact closely with the application group on integration of UltraLight services in the experiments’ software environments
• Clarens web-services-oriented framework
• MCPS job submission
• Grid Analysis Environment, etc.
See the talk by Frank van Lingen (Application group)

Align with ATLAS/CMS/OSG Milestones
• ATLAS/CMS software stacks are complex and still developing
  • Integration work is challenging and constantly evolving
• Data and Service Challenges 2006
  • Exercise computing services together with LCG + centers
  • System scale: 50% of a single experiment’s needs in 2007
• Computing, Software, Analysis (CSA) Challenges 2006
  • Ensure readiness of software + computing systems for data
  • 10M’s of events through the entire system (incl. Tier2)
• Extensive needs for Tier2-to-Tier2 data exchanges; collaboration with DISUN

Outlook
• Dedicated groups of people (expert users) are available for data transfer and analysis tasks
• Excellent collaboration with the networking and application groups
• Team for developing collaboration tools formed
• Explore commonalities and increase the participation of ATLAS at the analysis stage
• SC|05 was a great success, laying a solid foundation for the next steps
• We are actively involved in LHC physics preparations, e.g. the CMS Physics TDR
• The Physics Analysis group will play a key role in achieving successful integration of UltraLight applications in the experiments’ analysis environments

Backup slides

Linux Kernel Tunings
• Edit sysctl.conf to add the following lines:
    net.core.rmem_default = 268435456
    net.core.wmem_default = 268435456
    net.core.rmem_max = 268435456
    net.core.wmem_max = 268435456
    net.core.optmem_max = 268435456
    net.core.netdev_max_backlog = 300000
    net.ipv4.tcp_low_latency = 1
    net.ipv4.tcp_timestamps = 0
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_rmem = 268435456 268435456 268435456
    net.ipv4.tcp_wmem = 268435456 268435456 268435456
    net.ipv4.tcp_mem = 268435456 268435456 268435456
• Enable the changes in sysctl.conf on the fly by executing:
    sysctl -p /etc/sysctl.conf
• Sizes ~256 MB worked best (bigger were not helpful)
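
A quick way to sanity-check such settings is to parse them and compare the socket buffer maxima with the window a given path needs. The sketch below is illustrative only: `parse_sysctl` and `covers_window` are hypothetical helpers, the embedded text repeats a few of the lines above, and the required window is the 2*BDP value for 1 Gb/s at the 115 ms Florida to Caltech RTT.

```python
def parse_sysctl(text):
    """Parse 'key = v1 v2 ...' lines into {key: [int, ...]}; skip comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def covers_window(settings, needed_bytes):
    """True if both socket buffer maxima are at least needed_bytes."""
    return all(settings[k][0] >= needed_bytes
               for k in ("net.core.rmem_max", "net.core.wmem_max"))


conf = """
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_rmem = 268435456 268435456 268435456
"""

settings = parse_sysctl(conf)
needed = 2 * 1e9 * 0.115 / 8  # 2*BDP in bytes for 1 Gb/s at RTT 115 ms

print(settings["net.core.rmem_max"][0] == 256 * 1024**2)  # 268435456 is 256 MiB
print(covers_window(settings, needed))
```

The same check can be rerun with other rates or RTTs to see where a 256 MB maximum stops covering 2*BDP.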

bbcp Examples Caltech → Florida
    [uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 512 MByte (WARNING: requested 256 MByte)
    ------------------------------------------------------------
    [ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
    [ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
    [ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
    [ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
    [ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
    [ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

    bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.179:/dev/null
    bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
    bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
    bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
    bbcp: Creating /dev/null/big2.root
    Source cpu=5.962 mem=0K pflt=0 swap=0
    File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
    24 buffers used with 0 reorders; peaking at 0.
    Target cpu=4.053 mem=0K pflt=0 swap=0
    1 file copied at effectively 263793.4 KB/s