In a collaboration meeting on March 22, 2002, Ian Bird discussed the importance of evolving computing technologies for Hall D's experiments. With rapidly advancing compute power, storage options, and networking capabilities, resources will become increasingly available. The discussion emphasized innovative strategies for managing the expected petabytes of data and the challenge of efficiently utilizing emerging grid computing techniques. Key initiatives include robust data replication, intelligent job scheduling, and a collaborative approach to leverage high throughput computing for the advancement of physics research.
Computing for Hall D
Ian Bird
Hall D Collaboration Meeting, March 22, 2002
Ian.Bird@jlab.org
Data Volume per experiment per year (raw data, in units of 10⁹ bytes). But: collaboration sizes!
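For a feel of the scale implied by the chart's 10⁹-byte units and the petabytes anticipated in the talk, here is a minimal worked example in Python. The one-petabyte-per-year volume and the transfer rates are illustrative round numbers chosen for the sketch, not Hall D design figures.

```python
# Illustrative scale arithmetic only; the volume and rates below are assumed
# round numbers for the example, not Hall D design figures.
PB = 1e15                      # bytes; the chart's unit is 1e9 bytes (GB)
raw_data_per_year = 1 * PB     # assume one petabyte of raw data per year

seconds_per_year = 3.15e7
avg_rate_mb_s = raw_data_per_year / seconds_per_year / 1e6
print(f"average acquisition rate : {avg_rate_mb_s:.0f} MB/s")   # ~32 MB/s

reprocess_rate_mb_s = 100      # assumed sustained read rate for one pass
days = raw_data_per_year / (reprocess_rate_mb_s * 1e6) / 86400
print(f"one reprocessing pass at {reprocess_rate_mb_s} MB/s: {days:.0f} days")  # ~116 days
```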
Technologies
• Technologies are advancing rapidly
  • Compute power
  • Storage – tape and disk
  • Networking
• What will be available 5 years from now?
  • Difficult to predict – but it will not be a problem to provide any of the resources that Hall D will need…
  • e.g., computing:
Intel Linux Farm (photo captions)
• First purchases: 9 duals per 24" rack
• FY00: 16 duals (2U) + 500 GB cache disk (8U) per 19" rack
• FY01: 4 CPUs per 1U
• Recently: 5 TB IDE cache disk (5 × 8U) per 19" rack
Compute power
• Blades
• Low-power chips
  • Transmeta, Intel
• Hundreds in a single rack
• "An RLX System 300ex chassis holds twenty-four ServerBlade 800i units in a single 3U chassis. This density achievement packs 336 independent servers into a single 42U rack, delivering 268,800 MHz, over 27 terabytes of disk storage, and a whopping 366 gigabytes of DDR memory." (The arithmetic behind these figures is sketched below.)
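A quick back-of-the-envelope check of the quoted density figures, using only the numbers in the quote itself; the per-blade values are derived from the rack totals, not vendor specifications.

```python
# Back-of-the-envelope check of the RLX blade-density figures quoted above.
# All inputs come from the quote; per-blade numbers are derived, not specs.

RACK_UNITS = 42          # standard rack height (42U)
CHASSIS_UNITS = 3        # one RLX System 300ex chassis is 3U
BLADES_PER_CHASSIS = 24  # ServerBlade 800i units per chassis

chassis_per_rack = RACK_UNITS // CHASSIS_UNITS           # 14 chassis
blades_per_rack = chassis_per_rack * BLADES_PER_CHASSIS  # 336 servers

total_mhz = 268_800      # quoted aggregate clock
total_disk_tb = 27       # quoted disk capacity
total_mem_gb = 366       # quoted DDR memory

print(f"servers per rack : {blades_per_rack}")                                 # 336
print(f"MHz per blade    : {total_mhz / blades_per_rack:.0f}")                 # 800
print(f"disk per blade   : {total_disk_tb * 1000 / blades_per_rack:.0f} GB")   # ~80
print(f"memory per blade : {total_mem_gb / blades_per_rack:.2f} GB")           # ~1.09
```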
Technologies
• As well as computing, developments in storage and networking will also make rapid progress
• Grid computing techniques will bring these technologies together
• Facilities – new Computer Center planned
• Issues will not be technology, but:
  • How to use them intelligently
  • The Hall D computing model
  • People
  • Treating computing seriously enough to assign sufficient resources
(Data-)Grid Computing
Particle Physics Data Grid Collaboratory Pilot
• Who we are: four leading Grid computer science projects and six international high energy and nuclear physics collaborations
• The problem at hand today: petabytes of storage, teraops/s of computing, thousands of users, hundreds of institutions, 10+ years of analysis ahead
• What we do: develop and deploy Grid services for our experiment collaborators, and promote and provide common Grid software and standards
PPDG Experiments
• ATLAS – A Toroidal LHC ApparatuS at CERN. Runs 2006 on. Goals: TeV physics – the Higgs and the origin of mass… http://atlasinfo.cern.ch/Atlas/Welcome.html
• BaBar – at the Stanford Linear Accelerator Center. Running now. Goals: study CP violation and more. http://www.slac.stanford.edu/BFROOT/
• CMS – the Compact Muon Solenoid detector at CERN. Runs 2006 on. Goals: TeV physics – the Higgs and the origin of mass… http://cmsinfo.cern.ch/Welcome.html/
• D0 – at the D0 colliding beam interaction region at Fermilab. Runs soon. Goals: learn more about the top quark, supersymmetry, and the Higgs. http://www-d0.fnal.gov/
• STAR – Solenoidal Tracker At RHIC at BNL. Running now. Goals: quark-gluon plasma… http://www.star.bnl.gov/
• Thomas Jefferson National Laboratory. Running now. Goals: understanding the nucleus using electron beams… http://www.jlab.org/
PPDG Computer Science Groups
• Condor – develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership. http://www.cs.wisc.edu/condor/ (See the matchmaking sketch after this list.)
• Globus – developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, and computational and information resources that are managed by diverse organizations in widespread locations. http://www.globus.org/
• SDM – Scientific Data Management Research Group – optimized and standardized access to storage systems. http://gizmo.lbl.gov/DM.html
• Storage Resource Broker – client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets. http://www.npaci.edu/DICE/SRB/index.html
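Condor's high-throughput model rests on matchmaking: jobs state their requirements, machines advertise their resources, and a matchmaker pairs compatible ones, respecting the fact that the machines have distributed owners. The Python sketch below is a minimal, hypothetical illustration of that idea; the attribute names and matching rule are invented for clarity and are not Condor's ClassAd implementation.

```python
# Minimal sketch of Condor-style matchmaking: jobs state requirements,
# machines advertise resources, a matchmaker pairs compatible ones.
# Attribute names and the matching rule are illustrative, not ClassAds.

machines = [
    {"name": "farm01", "cpus": 2, "mem_mb": 512,  "owner_idle": True},
    {"name": "farm02", "cpus": 4, "mem_mb": 2048, "owner_idle": False},
    {"name": "desk17", "cpus": 1, "mem_mb": 256,  "owner_idle": True},
]

jobs = [
    {"id": "recon-001", "min_cpus": 1, "min_mem_mb": 256},
    {"id": "simul-042", "min_cpus": 2, "min_mem_mb": 1024},
]

def match(job, machine):
    """A machine is usable only when its owner is idle (distributed ownership)
    and it meets the job's resource requirements."""
    return (machine["owner_idle"]
            and machine["cpus"] >= job["min_cpus"]
            and machine["mem_mb"] >= job["min_mem_mb"])

for job in jobs:
    candidates = [m["name"] for m in machines if match(job, m)]
    print(job["id"], "->", candidates or "no match, job stays queued")
```

In this toy run the simulation job stays queued because the only machine large enough is busy with its owner's work, which is exactly the opportunistic behaviour high throughput computing is designed around.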
Delivery of End-to-End Applications & Integrated Production Systems
• PPDG focus:
  • Robust data replication
  • Intelligent job placement and scheduling
  • Management of storage resources
  • Monitoring and information of global services
• Relies on Grid infrastructure:
  • Security & policy
  • High-speed data transfer
  • Network management
• Aim: allow thousands of physicists to share data & computing resources for scientific processing and analyses
• (Diagram layers: operators & users at the top; resources – computers, storage, networks – at the bottom)
Project Activities, End-to-End Applications and Cross-Cut Pilots
• Project Activities are focused Experiment – Computer Science collaborative developments:
  • Replicated data sets for science analysis – BaBar, CMS, STAR
  • Distributed Monte Carlo production services – ATLAS, D0, CMS
  • Common storage management and interfaces – STAR, JLAB
• End-to-End Applications are used in experiment data handling systems to give real-world requirements, testing and feedback:
  • Error reporting and response
  • Fault-tolerant integration of complex components
• Cross-Cut Pilots for common services and policies:
  • Certificate Authority policy and authentication
  • File transfer standards and protocols
  • Resource monitoring – networks, computers, storage
Year 0.5-1 Milestones (1)
Align milestones to experiment data challenges:
• ATLAS – production distributed data service – 6/1/02
• BaBar – analysis across partitioned dataset storage – 5/1/02
• CMS – distributed simulation production – 1/1/02
• D0 – distributed analyses across multiple workgroup clusters – 4/1/02
• STAR – automated dataset replication – 12/1/01
• JLAB – policy-driven file migration – 2/1/02
Year 0.5-1 Milestones (2)
• Common milestones with EDG:
  • GDMP – robust file replication layer – joint project with EDG Work Package (WP) 2 (Data Access)
  • Support of Project Month (PM) 9 WP6 TestBed milestone; will participate in integration fest at CERN – 10/1/01
  • Collaborate on PM21 design for WP2 – 1/1/02
  • Proposed WP8 application tests using PM9 testbed – 3/1/02
• Collaboration with GriPhyN:
  • SC2001 demos will use common resources, infrastructure and presentations – 11/16/01
  • Common, GriPhyN-led grid architecture
  • Joint work on monitoring proposed
Year ~0.5-1 "Cross-cuts"
• Grid file replication services used by >2 experiments:
  • GridFTP – production releases
  • Integrated with D0-SAM, STAR replication
  • Interfaced through SRB for BaBar, JLAB
  • Layered use by GDMP for CMS, ATLAS
• SRB and Globus replication services:
  • Include robustness features (a minimal replication sketch follows this list)
  • Common catalog features and API
• GDMP/Data Access layer continues to be shared between EDG and PPDG
• Distributed job scheduling and management used by >1 experiment:
  • Condor-G, DAGMan, Grid-Scheduler for D0-SAM, CMS
  • Job specification language interfaces to distributed schedulers – D0-SAM, CMS, JLAB
• Storage resource interface and management:
  • Consensus on API between EDG, SRM, and PPDG
  • Disk cache management integrated with data replication services
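To make "robustness features" concrete, here is a minimal Python sketch of the kind of logic a robust replication layer needs: copy, verify with a checksum, retry with backoff, and record the new replica in a catalog. The function names, the dictionary catalog, and the use of a plain local copy in place of a real transfer tool such as GridFTP are assumptions for illustration only.

```python
# Minimal sketch of a robust file replication step: copy, verify, retry,
# then record the replica in a catalog. shutil.copy stands in for a real
# transfer tool such as GridFTP; the catalog is just a dict here.
import hashlib
import shutil
import time

replica_catalog = {}  # logical file name -> list of physical locations

def checksum(path, algo="md5"):
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate(lfn, source, destination, max_attempts=3):
    """Copy source to destination, verify the checksum, register the replica."""
    expected = checksum(source)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copy(source, destination)
            if checksum(destination) == expected:
                replica_catalog.setdefault(lfn, []).append(destination)
                return True
        except OSError as err:
            print(f"attempt {attempt} failed: {err}")
        time.sleep(2 ** attempt)  # back off before retrying
    return False
```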
Year ~1 other goals
• Transatlantic application demonstrators:
  • BaBar data replication between SLAC and IN2P3
  • D0 Monte Carlo job execution between Fermilab and NIKHEF
  • CMS & ATLAS simulation production between Europe and the US
• Certificate exchange and authorization:
  • DOE Science Grid as CA?
• Robust data replication:
  • Fault tolerant
  • Between heterogeneous storage resources
• Monitoring services:
  • MDS2 (Metacomputing Directory Service)?
  • Common framework
  • Network, compute and storage information made available to scheduling and resource management (a small publishing sketch follows this list)
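As a hedged illustration of the last point, the sketch below gathers basic compute and storage information from a host and publishes it into a simple in-memory directory that a scheduler could query. The attribute names and the directory structure are invented for this example and are not the MDS2 schema.

```python
# Sketch: publish per-host compute/storage information into a simple
# directory a scheduler could query. Attribute names are illustrative.
import os
import shutil
import socket
import time

resource_directory = {}  # hostname -> latest status record

def collect_status(scratch_path="/tmp"):
    total, used, free = shutil.disk_usage(scratch_path)
    return {
        "host": socket.gethostname(),
        "cpus": os.cpu_count(),
        "load_1min": os.getloadavg()[0],   # POSIX only
        "scratch_free_gb": free / 1e9,
        "timestamp": time.time(),
    }

def publish(record):
    resource_directory[record["host"]] = record

if __name__ == "__main__":
    publish(collect_status())
    for host, status in resource_directory.items():
        print(host, f"load={status['load_1min']:.2f}",
              f"free={status['scratch_free_gb']:.1f} GB")
```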
PPDG activities as part of the Global Grid Community
• Coordination with other Grid projects in our field:
  • GriPhyN – Grid Physics Network
  • European DataGrid
  • Storage Resource Management collaboratory
  • HENP Data Grid Coordination Committee
• Participation in experiment and Grid deployments in our field:
  • ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
  • iVDGL/DataTAG – International Virtual Data Grid Laboratory
  • Use DTF computational facilities?
• Active in standards committees:
  • Internet2 HENP Working Group
  • Global Grid Forum
What should happen now?
• The collaboration needs to define its computing model
  • It really will be distributed – grid based
  • Although the compute resources can be provided, it is not obvious that the vast quantities of data can really be analyzed efficiently by a small group
  • Do not underestimate the task
• The computing model will define requirements for computing – some of which may require some lead time
• Ensure software and computing is managed as a project equivalent in scope to the entire detector
  • It has to last at least as long, and it runs 24×365
  • The complete software system is more complex than the detector, even for Hall D where the reconstruction is relatively straightforward
  • It will be used by everyone
• Find and empower a computing project manager now