
Asia Pacific Grid: Towards a production Grid


Presentation Transcript


  1. Asia Pacific Grid: Towards a production Grid Yoshio Tanaka, Grid Technology Research Center, Advanced Industrial Science and Technology, Japan

  2. Contents • Updates from PRAGMA 5 • Demo at SC2003 (climate simulation using Ninf-G) • Joint demo with NCHC • Joint demo with TeraGrid • Experiences and Lessons Learned • Towards a production Grid

  3. Why the climate simulation? • The climate simulation is used as a test application to evaluate the progress of resource sharing between institutions • It lets us confirm • Globus-level resource sharing • Globus is correctly installed • Mutual authentication based on GSI works • High-level middleware (GridRPC)-level resource sharing • The JobManager works well • The network configuration of the cluster is usable (note that most clusters use private IP addresses)

  4. Behavior of the System • Ninf-G client (AIST) • Servers: NCSA cluster (225 CPUs), AIST cluster (50 CPUs), Titech cluster (200 CPUs), KISTI cluster (25 CPUs)
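The client/server structure above follows the GridRPC programming model that Ninf-G implements: a single client binds function handles to remote servers and invokes the simulation on each of them. The sketch below shows roughly what such a client looks like using the standard GridRPC C API; the server host names, the remote function name "climate/run_sim", and its argument list are hypothetical placeholders, not the actual demo code.

    /* Minimal GridRPC client sketch in C (Ninf-G provides the GridRPC API).
     * Host names, the remote function name, and its arguments are
     * hypothetical placeholders, not the actual climate simulation code. */
    #include <stdio.h>
    #include "grpc.h"

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handles[2];
        grpc_sessionid_t ids[2];
        double result[2];
        int i;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <client-config-file>\n", argv[0]);
            return 1;
        }

        /* Read the client configuration file (server list, options). */
        if (grpc_initialize(argv[1]) != GRPC_NO_ERROR) {
            fprintf(stderr, "grpc_initialize failed\n");
            return 1;
        }

        /* Binding a handle triggers GSI authentication and a job
         * submission through the Globus gatekeeper on each server. */
        grpc_function_handle_init(&handles[0], "cluster-a.example.org",
                                  "climate/run_sim");
        grpc_function_handle_init(&handles[1], "cluster-b.example.org",
                                  "climate/run_sim");

        /* Start the remote simulations asynchronously, then wait. */
        for (i = 0; i < 2; i++)
            grpc_call_async(&handles[i], &ids[i], 100, &result[i]);
        grpc_wait_all();

        for (i = 0; i < 2; i++) {
            printf("server %d returned %f\n", i, result[i]);
            grpc_function_handle_destruct(&handles[i]);
        }

        grpc_finalize();
        return 0;
    }

A failure at the Globus level (CA trust, grid-mapfile, gatekeeper reachability) surfaces here as a failed handle initialization, while GridRPC-level problems surface in the call itself, which is why slide 3 separates the two levels of checks.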

  5. Terrible 3 weeks (PRAGMA 5 ~ SC2003) • Increased resources • 14 clusters -> 22 clusters • 317 CPUs -> 853 CPUs • Installed Ninf-G and the climate simulation on TeraGrid • The account was given on Nov. 4th • Ported Ninf-G2 to the IA64 architecture

  6. Necessary steps for the demo • Apply for an account at each site • Add an entry to the grid-mapfile • Test globusrun • Authentication • Is my CA trusted? Do I trust your CA? • Is my entry in the grid-mapfile? • DNS lookup • Reverse lookup is used for server authentication • Firewall / TCP wrapper • Can I connect to the Globus gatekeeper? • Can the Globus jobmanager connect to my machine? • Jobmanager • Is the queuing system (e.g. PBS, SGE) installed appropriately? • Does the jobmanager script work as expected? • In the case of TeraGrid • Obtained my user certificate from the TeraGrid CA (NCSA CA) • Asked Titech and KISTI to trust the NCSA CA • It was not feasible to ask TeraGrid to trust the AIST GTRC CA
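One of the checks listed above, the reverse DNS lookup used for server authentication, is easy to verify ahead of a demo. The following is a small illustrative C sketch (not part of the Globus Toolkit) that resolves a host name and then reverse-resolves each of its addresses to spot missing or inconsistent reverse entries.

    /* Forward/reverse DNS consistency check for one host name.
     * Illustrative helper only; not part of the Globus Toolkit. */
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(int argc, char *argv[])
    {
        struct addrinfo hints, *res, *p;
        char host[NI_MAXHOST];

        if (argc != 2) {
            fprintf(stderr, "usage: %s <hostname>\n", argv[0]);
            return 1;
        }

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        /* Forward lookup: host name -> addresses. */
        if (getaddrinfo(argv[1], NULL, &hints, &res) != 0) {
            fprintf(stderr, "forward lookup failed for %s\n", argv[1]);
            return 1;
        }

        /* Reverse lookup each address and compare with the given name. */
        for (p = res; p != NULL; p = p->ai_next) {
            if (getnameinfo(p->ai_addr, p->ai_addrlen, host, sizeof(host),
                            NULL, 0, NI_NAMEREQD) != 0) {
                printf("no reverse entry for one address of %s\n", argv[1]);
                continue;
            }
            printf("%s reverse-resolves to %s (%s)\n", argv[1], host,
                   strcasecmp(host, argv[1]) == 0 ? "match" : "mismatch");
        }

        freeaddrinfo(res);
        return 0;
    }

A "mismatch" is not necessarily fatal (the given name may simply be unqualified), but a missing reverse entry is exactly the kind of problem that later shows up as an opaque authentication failure.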

  7. Necessary steps for the demo (cont’d) • Install Ninf-G2 • Problems frequently occurred due to inappropriate installation of the GT2 SDK • GT2 manual: • GRAM and Data: gcc32dbg • Info: gcc32dbgpthr • Asked for an additional installation of the Info SDK with the gcc32dbg flavor • Test the Ninf-G application • Can the Ninf-G server program connect back to the client? • If private IP addresses are used for the backend nodes, NAT must be available • These are application/middleware-specific requirements; requirements depend on the applications and middleware • A new Ninf-G application (TDDFT) needs the Intel Fortran Compiler • Another application needs GAMESS / Gaussian
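The question "can the Ninf-G server program connect back to the client?" can be approximated with a plain TCP probe run on a backend node, since the remote executable has to open a connection back to the client machine. The sketch below is a generic connectivity test under that assumption; the client host and port are placeholders, and the probe says nothing about Ninf-G's own protocol beyond basic reachability through NAT, firewalls, and TCP wrappers.

    /* Plain TCP connect-back probe: run on a backend node to check whether
     * it can reach the client host/port (placeholders) at all. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(int argc, char *argv[])
    {
        struct addrinfo hints, *res;
        int sock, rc;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <client-host> <port>\n", argv[0]);
            return 1;
        }

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        if ((rc = getaddrinfo(argv[1], argv[2], &hints, &res)) != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return 1;
        }

        sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (sock < 0 || connect(sock, res->ai_addr, res->ai_addrlen) != 0) {
            perror("connect-back failed (check NAT / firewall / TCP wrapper)");
            freeaddrinfo(res);
            return 1;
        }

        printf("TCP connection to %s:%s succeeded\n", argv[1], argv[2]);
        close(sock);
        freeaddrinfo(res);
        return 0;
    }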

  8. Lessons Learned • Initiation requires a great deal of effort • MDS is not scalable and still unstable • Needed to modify some parameters in grid-info-slapd.conf • The testbed was unstable • Unstable / poor network • System maintenance (incl. software version upgrades) without notification • only noticed when the application failed • “it worked well yesterday, but I’m not sure whether it works today”

  9. Lessons Learned (cont’d) • Difficulties caused by the grass-roots approach • It is not easy to keep the GT2 version coherent between sites • Different requirements for the Globus Toolkit between users • Most resources are not dedicated to the testbed • Resources may be busy / highly utilized • Need a Grid-level scheduler, or a fancy Grid reservation system? • (From the resource providers’ point of view) we need flexible control of donated resources • e.g. 32 nodes for default users, 64 nodes for specific groups, 256 nodes for my own organization

  10. Summary of current status • What has been done? • Resource sharing between more than 20 sites (853 CPUs were used by the Ninf-G application) • Use of GT2 as the common software • What hasn’t? • Formalize “how to use the Grid testbed” • I could use it, but it is difficult for others • I was given an account at each site through personal communication • Provide documentation • Keep the testbed stable • Develop management tools • Browse information • CA / certificate management

  11. Towards a production Grid • Define minimum requirements for Grid middleware • The Resource WG has the responsibility • NMI, TeraGrid software stack • Each site must follow the requirements • Keep the testbed as stable as possible • Understand that security is essential for international collaboration • What is the security (CA) policy in the Asia Pacific?

  12. Towards a production Grid (cont’d) • Draft an “Asia Pacific Grid Middleware Deployment Guide”, a recommendation document for the deployment of Grid middleware • Minimum requirements • Configuration • Draft an “Instruction of Grid Operation in the Asia Pacific Region”, which describes how to run a Grid Operation Center to support the management of a stable Grid testbed • Launch the Asia Pacific Grid Policy Management Authority ( http://www.apgridpma.org/ ) • Coordinate security levels in Asia • Interact with bodies outside Asia (DOEGrids PMA, EUGrid PMA) • A more sophisticated users’ guide is necessary

  13. Towards a production Grid (cont’d) • Each site should provide a document and/or web page for users • Requirements for users • How to obtain an account • Available resources • Hardware • Software and its configuration • Resource utilization policy • Support and contact information

  14. Future Plan • Should think about a GT3/GT4-based Grid testbed • Each CA must provide a CP/CPS • International collaboration • TeraGrid, UK eScience, EUDG, etc. • Run more applications to evaluate the feasibility of the Grid • Large-scale clusters + fat links • Many small clusters + thin links

  15. Summary • It is tough work to make resources available for applications • Many steps are required • It is tough to keep the testbed stable • Many issues remain to be solved towards a production Grid • Technical • Local and global schedulers • Dedication / reservation / co-allocation • Political • CA policy • How can I get an account on your site? • Both • Coordination of middleware • More interaction between the resource and applications WGs is necessary • Necessary procedures for resource sharing need to be established
