
GridX1: A Canadian Computational Grid for HEP Applications






Presentation Transcript


  1. GridX1: A Canadian Computational Grid for HEP Applications A. Agarwal, P. Armstrong, M. Ahmed, B.L. Caron, A. Charbonneau, R. Desmarais, I. Gable, L.S. Groer, R. Haria, R. Impey, L. Klektau, C. Lindsay, G. Mateescu, Q. Matthews, A. Norton, W. Podaima, S. Popov, D. Quesnel, S. Ramage, R. Simmonds, R.J. Sobie, B. St. Arnaud, D.C. Vanderster, M. Vetterli, R. Walker CANARIE Inc., Ottawa, Ontario, Canada; Institute of Particle Physics of Canada; National Research Council, Ottawa, Ontario, Canada; TRIUMF, Vancouver, British Columbia, Canada; University of Alberta, Edmonton, Canada; University of Calgary, Calgary, Canada; Simon Fraser University, Burnaby, British Columbia, Canada; University of Toronto, Toronto, Ontario, Canada; University of Victoria, Victoria, British Columbia, Canada

  2. Overview • Motivation • The GridX1 Framework • Middleware, Metascheduling, Monitoring • User Applications • BaBar and ATLAS • Web Services for GridX1

  3. Motivation • Particle physics (HEP) simulations are “embarrassingly parallel”; multiple instances of serial (integer) jobs • We want to exploit the unused cycles at non-HEP sites • Support dedicated and shared facilities • Each shared facility may have unique configuration requirements • Minimal software demands on sites • We want to develop a general grid • Open to other applications (serial, integer)

  4. The GridX1 Resources • GridX1 has used 8 shared clusters: Alberta (2), NRC Ottawa, WestGrid, Victoria (2), McGill, Toronto • Total resources: more than 2500 CPUs, 100 TB disk, 400 TB tape • Site requirements: OS: Red Hat Enterprise Linux, Scientific Linux, CentOS, SUSE Linux; LRMS: PBS or Condor batch system; Network: external network access needed for worker nodes; most sites have 1 Gbit/s network connectivity

  5. The GridX1 Infrastructure • Grid Middleware • Virtual Data Toolkit: packaged version of Globus Toolkit 2.4 • VDT is more stable than vanilla GT2 • We are evaluating GT4 and web services (more on this later) • Security and User Management • GridX1 hosts require an X.509 certificate issued by the Grid Canada Certificate Authority • User certificates from trusted CAs around the world are accepted • Authorization is managed at the site level in a grid-mapfile • User certificates are mapped to local Unix accounts
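The grid-mapfile authorization described above is a simple text file mapping certificate Distinguished Names to local accounts. A sketch with hypothetical DNs and account names (the actual GridX1 mappings are not shown in the slides):

```
# /etc/grid-security/grid-mapfile
# "<certificate subject DN>"  <local unix account>
"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Jane Doe"   atlas01
"/C=CA/O=Grid/OU=triumf.ca/CN=John Smith"    babar01
```

When a user presents a proxy certificate, the gatekeeper looks up the DN and runs the job as the mapped local account, so sites keep their usual Unix-level accounting and quotas.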

  6. GridX1 Resource Brokering • We use Condor-G for resource brokering • Flexible and scalable • Collector: accepts resource advertisements from clusters • Scheduler: queues jobs, submits to resources • Negotiator: performs matchmaking between tasks and resources • Jobs specify Rank and Requirements • e.g., Rank = -EstimatedWaitTime; Requirements: OS == Linux
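The Rank and Requirements expressions above live in the job's submit description. A minimal Condor-G sketch, assuming a hypothetical gatekeeper hostname and executable name:

```
# Hedged sketch of a Condor-G submit description; hostname and
# executable are hypothetical, not taken from the GridX1 setup.
universe        = grid
grid_resource   = gt2 mercury.uvic.ca/jobmanager-pbs
executable      = atlas_sim.sh
output          = job.out
error           = job.err
log             = job.log
# Prefer the resource with the shortest estimated wait time
rank            = -EstimatedWaitTime
# Only match Linux clusters
requirements    = (OpSys == "LINUX")
queue
```

In a brokered setup the negotiator, rather than the user, effectively chooses which advertised cluster satisfies the Requirements and maximizes the Rank.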

  7. Condor-G: A Scalable Metascheduler • The system scales: to increase job throughput, we add another Condor scheduler. [Figures: Condor-G system for BaBar; Condor-G system for ATLAS]

  8. Condor-G Adapted for ATLAS • We have had success with Condor-G on GridX1 • These techniques were applied to build a Condor-G executor to submit jobs to ATLAS-LCG sites: • Site information is extracted from the BDII and converted to ClassAds • The Condor-G executor running at UVic extracts jobs from the ATLAS Prodsys DB and submits them to Condor-G • Condor matchmaking matches jobs to ATLAS and Canadian sites
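The BDII-to-ClassAd step above could be sketched roughly as follows. The attribute names follow the GLUE CE schema, but the selection of attributes and the output layout are illustrative assumptions; the actual extraction tool used on GridX1 is not detailed in the slides.

```python
def glue_to_classad(glue):
    """Convert a dict of GLUE CE attributes (as read from the BDII via
    LDAP) into a ClassAd-style string the negotiator can match jobs against.
    The chosen attributes and formatting are a hypothetical sketch."""
    ad = [
        ("Name", '"%s"' % glue["GlueCEUniqueID"]),
        ("OpSys", '"%s"' % glue.get("GlueHostOperatingSystemName", "Linux")),
        ("FreeCPUs", str(glue.get("GlueCEStateFreeCPUs", 0))),
        ("EstimatedWaitTime", str(glue.get("GlueCEStateEstimatedResponseTime", 0))),
    ]
    return "[ " + "; ".join("%s = %s" % kv for kv in ad) + " ]"

# Example: a single CE record pulled from the BDII (values hypothetical)
ad = glue_to_classad({
    "GlueCEUniqueID": "mercury.uvic.ca:2119/jobmanager-pbs-atlas",
    "GlueCEStateFreeCPUs": 12,
})
```

Each advertised CE becomes one ClassAd, so the same Condor matchmaking used for GridX1 clusters can rank LCG sites without any extra middleware on the remote side.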

  9. GridX1 Monitoring GridX1 is monitored using a Google Maps Mashup

  10. GridX1 Monitoring A web-based dynamic resource monitor Employs Web 2.0/AJAX techniques

  11. Applications: ATLAS Status 2004-2005 • GridX1 was used by the ATLAS experiment via the LCG-TRIUMF gateway • Over 20,000 ATLAS jobs successfully completed • Job success rate was similar to that of the LCG (50%)

  12. Applications: ATLAS Status 2006 • Currently many GridX1 sites receive jobs directly from the ATLAS-LCG Condor-G executor • HEP clusters are being commissioned as ATLAS Tier-2 sites and are linking directly to the LCG • Non-HEP clusters will be connected using an interface • ATLAS Tier-1 centre being built at TRIUMF • A 10G lightpath from SURFnet, connecting CERN to the Tier-1 centre at TRIUMF, is to be handed over to CANARIE on November 1 • 1G lightpaths are currently being established from the University of Toronto and UVic to TRIUMF

  13. Applications: ATLAS Future Plans • Effort will be focused on recommissioning a GridX1 interface to facilitate the addition of non-HEP sites • Non-LCG resources are integrated into the LCG without all of the LCG middleware • Greatly simplifies the management of shared resources • VMs such as Xen can be used to simplify the requirements at non-HEP sites • CHEP 2006 paper: Evaluation of Virtual Machines for HEP Grids • We showed that the ATLAS kit validation suffered a negligible performance penalty when run on a Xen virtual machine • We plan to research deploying pre-packaged ATLAS and BaBar images to GridX1 sites

  14. Applications: BaBar Status • Monthly successful job output is plotted at the bottom of the slide • GridX1 production has peaked at 30,000 jobs per month • GridX1 provides ~50% of total Canadian BaBar production • ~15% of global production • Plan to move all Canadian BaBar production to GridX1

  15. Investigating Service-Oriented Grid Middleware • Targeted metascheduler and registry services • Deployed a GT4 testbed at UVic and NRC • Metascheduler service: based on Condor-G • Registry service: WS-MDS • Current development: exploring an SOA grid

  16. A Metascheduler Service based on Condor-G • GT4 Condor-G JobManager • MDS ClassAd extraction tool • Information provider: GLUE CE schema with required Condor-G extensions [Figure: Condor-G Job Manager]

  17. Summary • Built upon proven technologies: VDT, Condor-G • GridX1 allows us to exploit unused resources at HEP and non-HEP sites • Dynamic grid monitor available at http://monitor.gridx1.ca/ • GridX1 usage by the ATLAS and BaBar applications is successful • Used for ATLAS DC2 during July 2004 - June 2005 • Receiving jobs from the ATLAS executor in 2006 • ~1000 BaBar jobs run daily • Moving towards a web-services-based architecture
