Site Lightning Report: MWT2
Mark Neubauer, University of Illinois at Urbana-Champaign
US ATLAS Facilities Meeting @ UC Santa Cruz, Nov 14, 2012
Midwest Tier-2
The Team: Rob Gardner, Dave Lesny, Mark Neubauer, Sarah Williams, Ilija Vukotic, Lincoln Bryant, Fred Luehring
A three-site Tier-2 consortium
Midwest Tier-2 Focus of this talk: Illinois Tier-2
Tier-2 @ Illinois
History of the project:
• Fall 2007 onward: Development and operation of a Tier-3gs
• 08/26/10: Torre's US ATLAS IB talk
• 10/26/10: Tier2@Illinois proposal submitted to US ATLAS Computing Management
• 11/23/10: Proposal formally accepted
• 10/5/11: First successful test of ATLAS production jobs run on the Campus Cluster (CC)
  • Jobs read data from our Tier-3gs cluster
Tier-2 @ Illinois
History of the project (cont):
• 03/1/12: Successful Tier2@Illinois pilot
  • Squid proxy cache, Condor head node, job flocking from UC
• 4/4/12: First hardware into the Taub cluster
  • 16 compute nodes (dual X5650, 48 GB memory, 160 GB disk, IB): 196 cores
  • 60 × 2 TB drives in a DDN array: 120 TB raw
• 4/17/12: perfSONAR nodes online
Illinois Tier-2
History of the project (cont):
• 4/18/12: Tier2@Illinois in production (T2 on Taub)
Illinois Tier-2 Stable operation: Last two weeks
Illinois Tier-2 Last day on MWT2:
Why at Illinois?
• National Center for Supercomputing Applications (NCSA)
• National Petascale Computing Facility (NPCF): Blue Waters
• Advanced Computation Building (ACB)
  • 7,000 sq. ft. with 70” raised floor
  • 2.3 MW of power capacity
  • 250 kW UPS
  • 750 tons of cooling capacity
• Experience in HEP computing
(Slide photos: NCSA Building, ACB, NPCF)
Tier-2 @ Illinois
• Pros (ATLAS perspective)
  • Free building, power, cooling, and core infrastructure support, with plenty of room for future expansion
  • Pool of expertise, heterogeneous HW
  • Bulk pricing important given DDD (Dell Deal Demise)
  • Opportunistic resources
• Challenges
  • Constraints on hardware, pricing, architecture, timing
  • Deployed in a shared campus cluster (CC) in the ACB
  • “Taub” is the first instance of the CC
  • Tier2@Illinois on Taub is in production within MWT2
Tier-2 @ Illinois
Current CPU and disk resources (arithmetic sketch below):
• 16 compute nodes (taubXXX)
  • dual X5650, 48 GB memory, 160 GB disk, IB: 196 cores, ~400 job slots
• 60 × 2 TB drives in a Data Direct Networks (DDN) array: 120 TB raw, ~70 TB usable
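A minimal arithmetic sketch (not from the slides) of where the rounded figures come from. It assumes hyper-threading is enabled on the dual X5650 nodes and one Condor job slot per hardware thread; it does not try to re-derive the ~70 TB usable figure, which folds in RAID parity and filesystem overhead whose exact layout is not given on the slide.

```python
# Back-of-the-envelope numbers for the Taub Tier-2 share (illustrative only).
# Assumption not on the slide: one job slot per hardware thread, with
# hyper-threading enabled on the dual X5650 compute nodes.

NODES = 16
THREADS_PER_NODE = 2 * 6 * 2   # 2 sockets x 6 cores (X5650) x 2 HT threads = 24

job_slots = NODES * THREADS_PER_NODE   # 384, i.e. the "~400 job slots" quoted

DRIVES, TB_PER_DRIVE = 60, 2
raw_tb = DRIVES * TB_PER_DRIVE         # 120 TB raw; the ~70 TB usable figure
                                       # already accounts for RAID parity and
                                       # filesystem overhead (layout not given)

print(f"job slots ~ {job_slots}")
print(f"raw storage = {raw_tb} TB")
```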
Tier-2 @ Illinois
• Utility nodes / services (*.campuscluster.illinois.edu); a pool-query sketch follows this list:
  • Gatekeeper (mwt2-gt)
    • Primary schedd for the Taub Condor pool
    • Flocks other jobs to the UC and IU Condor pools
  • Condor head node (mwt2-condor)
    • Collector and Negotiator for the Taub Condor pool
    • Accepts flocked jobs from other MWT2 gatekeepers
  • Squid (mwt2-squid)
    • Proxy cache for CVMFS and Frontier for Taub (backup for IU/UC)
  • CVMFS replica server (mwt2-cvmfs)
    • Replica of the master CVMFS server
  • dCache storage node (mwt2-s1)
    • Pool node for GPFS data storage (installed; dCache deployment in progress)
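To illustrate how these pieces fit together, here is a minimal sketch (not from the slides) using the HTCondor Python bindings, assumed to be installed on a submit host, to ask the mwt2-condor collector which schedds and worker-node daemons it is advertising. The hostname is the one listed above; in this layout the local gatekeeper (mwt2-gt) and any flocked-in MWT2 gatekeepers would show up among the schedd ads, while the Taub compute nodes appear as startd slots.

```python
# Illustrative only: query the Taub pool's collector (mwt2-condor) for the
# daemons it advertises. Assumes the htcondor Python bindings are installed
# and the collector is reachable from this host.
import htcondor

coll = htcondor.Collector("mwt2-condor.campuscluster.illinois.edu")

# Schedds known to the pool: the local gatekeeper (mwt2-gt) plus any
# flocked-in MWT2 gatekeepers should appear here.
for ad in coll.query(htcondor.AdTypes.Schedd, "true", ["Name", "TotalRunningJobs"]):
    print("schedd:", ad.get("Name"), "running:", ad.get("TotalRunningJobs"))

# Worker-node (startd) slots on the Taub compute nodes.
for ad in coll.query(htcondor.AdTypes.Startd, "true", ["Name", "State", "Activity"]):
    print("slot:", ad.get("Name"), ad.get("State"), ad.get("Activity"))
```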
Next CC Instance (to be named): Overview
• Mix of Ethernet-only and Ethernet + InfiniBand connected nodes
  • assume 50-100% will be IB-enabled
• Mix of CPU-only and CPU+GPU nodes
  • assume up to 25% of nodes will have GPUs
• New storage device and support nodes
  • added to the shared storage environment
  • allow for other protocols (SAMBA, NFS, GridFTP, GPFS)
• VM hosting and related services
  • persistent services and other needs directly related to use of compute/storage resources
Next CC Instance (basic configuration)
• Dell PowerEdge C8220: 2-socket Intel Xeon E5-2670
  • 8-core Sandy Bridge processors @ 2.60 GHz
  • 1 “sled” = 2 Sandy Bridge processors
  • 8 sleds in 4U: 128 cores (see the arithmetic sketch below)
• Memory configuration options:
  • 2 GB/core, 4 GB/core, 8 GB/core
• Options:
  • InfiniBand FDR (GigE otherwise)
  • NVIDIA M2090 (Fermi) GPU accelerators
• Storage via DDN SFA12000
  • can be added in 30 TB (raw) increments
(Photo: Dell C8220 compute sled)
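A small arithmetic sketch (illustrative, not from the slides) showing how the per-chassis core count quoted above follows from the C8220 configuration, and what the quoted per-core memory options mean per sled.

```python
# Per-chassis figures for the Dell PowerEdge C8220 configuration above.
SOCKETS_PER_SLED = 2
CORES_PER_SOCKET = 8        # Intel Xeon E5-2670 (Sandy Bridge, 2.60 GHz)
SLEDS_PER_CHASSIS = 8       # 8 sleds in a 4U chassis

cores_per_sled = SOCKETS_PER_SLED * CORES_PER_SOCKET      # 16
cores_per_chassis = cores_per_sled * SLEDS_PER_CHASSIS    # 128, as quoted

# Memory options are quoted per core; translate them to per-sled totals.
for gb_per_core in (2, 4, 8):
    print(f"{gb_per_core} GB/core -> {gb_per_core * cores_per_sled} GB per sled")

print(f"cores per 4U chassis: {cores_per_chassis}")
```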
Summary and Plans
• New Tier-2 @ Illinois
  • Modest (currently) resource integrated into MWT2 and in production use
  • Cautious optimism: deploying a Tier-2 within a shared campus cluster has been a success
• Near-term plans
  • Buy into the 2nd campus cluster instance
    • $160k of FY12 funds with a 60/40 CPU/disk split
  • Continue dCache deployment
  • LHCONE @ Illinois due to turn on 11/20/12
  • Virtualization of Tier-2 utility services
  • Better integration into MWT2 monitoring