
AIST Grid Initiative in Japan and the Asia Pacific Region


Presentation Transcript


  1. AIST Grid Initiative in Japan and the Asia Pacific Region Yoshio Tanaka Grid Technology Research Center, Advanced Industrial Science and Technology, Japan

  2. Talk Contents • Introduction • myself and AIST • Research Activities • Ninf-G: GridRPC Programming Middleware • ApGrid: Asia Pacific Grid • Grid PSE Builder

  3. Self Introduction • 1995 • Received Ph.D. from Keio University • Parallel GC (Garbage Collection) • 1996~1999: Real World Computing Partnership (RWCP) • Programming and performance evaluation of SMP clusters • Firewall-compliant Globus Toolkit + MPICH-G • 2000: Electrotechnical Laboratory (ETL) • Ninf-G, ApGrid • 2001~: AIST

  4. What is AIST? • One of the largest Nat'l Labs in Japan • Research topics include • Environment • Materials • Bio/Life science • Standards (JIS/OSI) • Geographical survey • Semiconductor devices • Computer Science • etc. • 3,500 employees + 3,000 staff • Budget roughly $1,400M USD/FY2002 • [Map: AIST Tsukuba Main Campus and 7 other campuses across Japan; Tsukuba 40 km, Narita 50 km, Tokyo 50 km]

  5. Grid Technology Research Center • Establishment • Since Jan. 1, 2002 • 7-year term • 24th Research Center of AIST • Location • Tsukuba Central, Umezono 1-1, Tsukuba • Tokyo Office • Ueno area • 30 people for software development • Engaged in developing grid middleware, applications, and system technologies • Research budget approx. 1,000M JPY • One of the world's foremost Grid research centers, and the largest in Japan

  6. Grid Tech. Research Center Director: Satoshi Sekiguchi Deputy Director: Mitsuo Yokokawa

  7. AIST GTRC (Grid) Super Cluster • P32: IBM eServer 325, Opteron 2.0 GHz, 6 GB, 2-way x 1074 nodes, Myrinet 2000, 8.59 TFlops/peak • M64: Intel Tiger 4, Madison 1.3 GHz, 16 GB, 4-way x 131 nodes, Myrinet 2000, 2.72 TFlops/peak • F32: Linux Networx, Xeon 3.06 GHz, 2 GB, 2-way x 256+ nodes, GbE, 3.13 TFlops/peak • Total: 14.5 TFlops/peak, 3,188 CPUs

  8. National Research Grid Initiative (NAREGI) Project: Overview • A new Japanese MEXT National Grid R&D project • ~$(US)17M FY'03 (similar until FY'07) + $45M • One of two major Japanese Govt. Grid projects • c.f. "BusinessGrid" (~$(US)25M FY'03-05), METI • Collaboration of national labs, universities, and major computing and nanotechnology industries • Acquisition of computer resources underway (FY2003) • MEXT: Ministry of Education, Culture, Sports, Science and Technology

  9. NAREGI Software Stack • WP6: Grid-Enabled Apps • WP3: Grid Visualization, Grid PSE, Grid Workflow • WP2: Grid Programming (GridRPC, GridMPI) • WP4: Packaging • WP1: Grid Monitoring & Accounting, SuperScheduler, Grid VM (on Globus, Condor, UNICORE/OGSA) • WP5: Grid PKI, High-Performance Grid Networking

  10. Ninf-G: GridRPC Programming Middleware

  11. Layered Programming Model/Method (from easy but inflexible down to difficult but flexible) • Portal / PSE: GridPort, HotPage, GPDK, Grid PSE Builder, etc. • High-level Grid Middleware: MPI (MPICH-G2, PACX-MPI, …), GridRPC (Ninf-G, NetSolve, …) • Low-level Grid Middleware: Globus Toolkit • Primitives: sockets, system calls, …

  12. Some Significant Grid Programming Models/Systems • Data Parallel • MPI - MPICH-G2, Stampi, PACX-MPI, MagPie • Task Parallel • GridRPC – Ninf, NetSolve, Punch… • Distributed Objects • CORBA, Java/RMI, … • Data Intensive Processing • DataCutter, Gfarm, … • Peer-To-Peer • Various research and commercial systems • UD, Entropia, Parabon, JXTA, … • Others…

  13. GridRPC: RPC-based Programming Model • Utilization of remote supercomputers: over the Internet, the user ① calls remote procedures (remote libraries) and ② is notified of the results • Large-scale computing utilizing multiple supercomputers on the Grid

  14. GridRPC: RPC "tailored" for the Grid • Medium- to coarse-grained calls • Call duration < 1 sec to > 1 week • Task-parallel programming on the Grid • Asynchronous calls, 1000s of scalable parallel calls • Large matrix data & file transfer • Call-by-reference, shared-memory matrix arguments • Grid-level security (e.g., Ninf-G with GSI) • Simple client-side programming & management • No client-side stub programming or IDL management • Other features…

  15. GridRPC (cont'd) • vs. MPI • Client-server programming is suitable for task-parallel applications • Does not need co-allocation • Can use private-IP-address resources if NAT is available (at least when using Ninf-G) • Better fault tolerance • Activities at the GGF GridRPC WG • Define a standard GridRPC API; later deal with the protocol • Standardize only a minimal set of features; higher-level features can be built on top • Provide several reference implementations • Ninf-G, NetSolve, …

  16. Typical Scenario: Optimization Problems and Parameter Study on a Cluster of Clusters • Examples: structural optimization and the vehicle routing problem, each issuing many RPCs to the clusters • Slide courtesy of Prof. Fujisawa

  17. Sample Architecture and Protocol of a GridRPC System – Ninf • Server-side setup • Build the Remote Library Executable: the IDL compiler generates it from the IDL file and the numerical library • Register it to the Ninf Server • Client side • Call the remote library • Retrieve interface information (Interface Request / Interface Reply) • The Ninf Server forks and invokes the Remote Library Executable • The executable connects back to the client

  18. GridRPC: based on Client/Server model • Server-side setup • Remote libraries must be installed in advance • Write IDL files to describe interface to the library • Build remote libraries • Syntax of IDL depends on GridRPC systems • e.g. Ninf-G and NetSolve have different IDL • Client-side setup • Write a client program using GridRPC API • Write a client configuration file • Run the program

  19. The GridRPC API • Provide standardized, portable, and simple programming interface for Remote Procedure Call • Attempt to unify client access to existing grid computing systems (such as NetSolve and Ninf-G) • Working towards standardization through the GGF GridRPC WG • Initially standardize API; later deal with protocol • Standardize only minimal set of features; higher-level features can be built on top • Provide several reference implementations • Not attempting to dictate any implementation details

  20. Rough steps for RPC • Initialize: grpc_initialize(config_file); • Create a function handle (an abstraction of a remote library): grpc_function_handle_t handle; grpc_function_handle_init(&handle, host, port, "lib_name"); • RPC (call the remote procedure): grpc_call(&handle, args…); or grpc_call_async(&handle, args…);
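Putting these steps together, the following is a minimal client sketch following the call forms shown on this slide. The host name, port, library name "sample/mmul", configuration file name, and the matrix arguments are illustrative assumptions; exact signatures (for example, whether the handle is initialized with a host/port pair or a single server string) vary between GridRPC implementations and API revisions.

```c
/* Minimal GridRPC client sketch following the calls on this slide.
 * Host, port, library name, and config file are placeholders.       */
#include <stdlib.h>
#include "grpc.h"   /* GridRPC client header; name may vary by implementation */

int main(void)
{
    grpc_function_handle_t handle;
    int n = 4;
    double A[16] = {0}, B[16] = {0}, C[16] = {0};   /* small matrices for illustration */

    grpc_initialize("client.conf");                 /* read the client configuration file */

    /* Bind the handle to a remote library on a (hypothetical) server */
    grpc_function_handle_init(&handle, "server.example.org", 4000, "sample/mmul");

    /* Synchronous RPC: blocks until the remote library executable returns C */
    grpc_call(&handle, n, A, B, C);

    grpc_function_handle_destruct(&handle);
    grpc_finalize();
    return EXIT_SUCCESS;
}
```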

  21. Data Parallel Application • Call parallel libraries (e.g., MPI applications) running on a parallel computer • Backend "MPI" or backend "BLACS" should be specified in the IDL

  22. Task Parallel Application • Parallel RPCs to multiple servers using asynchronous calls

  23. Task Parallel Application • Asynchronous calls from the client to ServerA and ServerB, then waiting for the replies • Issue: grpc_call_async(...); • Wait: grpc_wait(sessionID); grpc_wait_all(); grpc_wait_any(idPtr); grpc_wait_and(idArray, len); grpc_wait_or(idArray, len, idPtr); • Cancel: grpc_cancel(sessionID); • Various task-parallel programs spanning clusters are easy to write
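As an illustration of this pattern, here is a small sketch that issues two asynchronous calls to two servers and waits for both. Host names, port, and the library name "sample/search" are assumptions; the sketch uses the session-ID style listed on the slide, where each asynchronous call yields an ID that the wait/cancel functions operate on (the exact way the ID is obtained differs between API revisions).

```c
/* Task-parallel sketch: two asynchronous RPCs in flight at once.
 * Server names, port, and library name are illustrative placeholders. */
#include "grpc.h"

int search_on_two_servers(int n, double *inA, double *outA,
                                 double *inB, double *outB)
{
    grpc_function_handle_t hA, hB;
    grpc_sessionid_t sidA, sidB;

    grpc_function_handle_init(&hA, "serverA.example.org", 4000, "sample/search");
    grpc_function_handle_init(&hB, "serverB.example.org", 4000, "sample/search");

    /* Both calls return immediately; the two servers work in parallel */
    grpc_call_async(&hA, &sidA, n, inA, outA);
    grpc_call_async(&hB, &sidB, n, inB, outB);

    /* Block until every outstanding session has completed
     * (grpc_wait(sidA) / grpc_wait(sidB) would do the same one by one) */
    grpc_wait_all();

    grpc_function_handle_destruct(&hA);
    grpc_function_handle_destruct(&hB);
    return 0;
}
```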

  24. Ninf Project • Started in 1994 • Collaborators from various organizations • AIST • Satoshi Sekiguchi, Umpei Nagashima, Hidemoto Nakada, Hiromitsu Takagi, Osamu Tatebe, Yoshio Tanaka, Kazuyuki Shudo, Hirotaka Ogawa • University of Tsukuba • Mitsuhisa Sato, Taisuke Boku • Tokyo Institute of Technology • Satoshi Matsuoka, Kento Aida • Tokyo Electronic University • Katsuki Fujisawa • Ochanomizu University • Atsuko Takefusa • Kyoto University • Masaaki Shimasaki

  25. History of Ninf Project • Milestones from 1994 to 2003: • Ninf project launched (1994) • Release of Ninf version 1 • Start of collaboration with the NetSolve team • Ninf-G development • Release of Ninf-G version 0.9 • Standard GridRPC API proposed • GridRPC WG at GGF8 / GGF9 • Release of Ninf-G version 1.0 • Ninf-G Ver. 2.0.0 release

  26. What is Ninf-G? • A software package which supports programming and execution of Grid applications using GridRPC • Ninf-G includes • C/C++ and Java APIs and libraries for software development • An IDL compiler for stub generation • Shell scripts to compile the client program and to build and publish remote libraries • Sample programs • Manuals

  27. Ninf-G: Features At-a-Glance • Ease-of-use, client-server, Numerical-oriented RPC system • No stub information at the client side • User’s view: ordinary software library • Asymmetric client vs. server • Built on top of the Globus Toolkit • Uses GSI, GRAM, MDS, GASS, and Globus-IO • Supports various platforms • Ninf-G is available on Globus-enabled platforms • Client APIs: C/C++, Java, Fortran

  28. Architecture of Ninf-G • Server side: the IDL compiler generates the Remote Library Executable from the IDL file and the numerical library, and the interface information is registered as an LDIF file in GRIS • Client side: the client retrieves the interface information from GRIS, exchanges an Interface Request / Interface Reply, GRAM forks and invokes the Remote Library Executable, and the executable connects back to the client over Globus-IO

  29. Demo System of a Climate Simulation • Integrating 2 Ninf-G programs: a climate simulation program (S-model: reading data, solving equations on several Ninf-G servers, averaging results) and a visualization program • Executed in a pipelined fashion • Accessed through the GridLib portal

  30. Replica Exchange Monte Carlo Simulation • Potential surface survey of molecules using the direct method (ab-initio calculation) • Random-walk survey enables surveying complicated potential surfaces • Ab-initio calculation enables precise energy calculation of molecules • Replica Exchange method enables efficient MC survey • Cost: T_total = N_smpl x T_calc, with N_smpl = 10^5 – 10^6, so T_total ~ 30 years!
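As a rough sanity check of that figure, assume each ab-initio energy evaluation takes on the order of 10^3 seconds (roughly 15–20 minutes; an assumed value, not stated on the slide):

```latex
T_{\mathrm{total}} = N_{\mathrm{smpl}} \times T_{\mathrm{calc}}
                   \approx 10^{6} \times 10^{3}\,\mathrm{s}
                   = 10^{9}\,\mathrm{s}
                   \approx 30\ \text{years},
\qquad N_{\mathrm{smpl}} = 10^{5}\text{--}10^{6}.
```

This is why the simulation is only feasible if the samples are evaluated in parallel on many Grid resources, as the next slide describes.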

  31. Gridifying the program • Two levels of parallelization • Coarse-grained: parallel Monte Carlo sampling • Fine-grained: parallel ab-initio energy calculation • Dynamic task scheduling and machine reconfiguration (see the sketch below) • Task scheduling for balancing the load on heterogeneous computing resources • Machine scheduling for reconfiguring machine sets on the fly • [Diagram: the REXMC client allocates tasks to and monitors MC sampling servers (T1, T2, T3) on a meta-computing test bed of 10 institutes / 20 supercomputers running the ab-initio energy calculations; a Bookkeeper issues reconfiguration requests and drives the dynamic scheduling]
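The coarse-grained level of this scheme maps naturally onto asynchronous GridRPC calls. The sketch below shows the general idea of dynamic task scheduling with grpc_wait_any(): whenever any outstanding call finishes, the freed server is handed the next Monte Carlo sample. The handle array, the task representation, and the session-to-server bookkeeping are illustrative assumptions, not the actual REXMC client code.

```c
/* Dynamic task scheduling sketch: keep NSERVERS remote libraries busy by
 * refilling a slot as soon as any asynchronous call completes.
 * Handles, task representation, and bookkeeping are illustrative only.  */
#include "grpc.h"

#define NSERVERS 3
#define NTASKS   1000

void schedule_samples(grpc_function_handle_t handle[NSERVERS])
{
    grpc_sessionid_t session[NSERVERS];   /* one outstanding call per server */
    double energy[NTASKS];
    int next = 0, done = 0;

    /* Prime every server with one sample */
    for (int s = 0; s < NSERVERS && next < NTASKS; s++, next++)
        grpc_call_async(&handle[s], &session[s], next, &energy[next]);

    /* Whenever any call completes, hand that server the next sample */
    while (done < NTASKS) {
        grpc_sessionid_t finished;
        grpc_wait_any(&finished);          /* ID of a completed session */
        done++;

        for (int s = 0; s < NSERVERS; s++) {
            if (session[s] == finished && next < NTASKS) {
                grpc_call_async(&handle[s], &session[s], next, &energy[next]);
                next++;
                break;
            }
        }
    }
}
```

A real scheduler would also weight task assignment by the observed speed of each server and drop or re-add servers when the Bookkeeper requests a reconfiguration, but the wait-any loop above is the core of the load-balancing idea.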

  32. HPC Challenge at SC2002: Metacomputing Test-bed • At AIST: REXMC client for C20 singlet, REXMC client for C20 triplet, and the Bookkeeper • 10 institutions (3 continents) / 20 parallel computers (7 types): • High Performance Computing Center Stuttgart (HLRS) • Sandia National Laboratories (SNL) • Pittsburgh Supercomputing Center (PSC) • Grid Technology Research Center (AIST) • Manchester Computing Centre (MCC) • National Center for High Performance Computing (NCHC) • Japan Atomic Energy Research Institute (JAERI) • Korea Institute of Science and Technology Information (KISTI) • European Center of Parallelism in Barcelona (CEPBA/CIRI) • Finnish IT center for Science (CSC)

  33. Current Status • Ninf-G Ver. 2 alpha is available at http://ninf.apgrid.org/ • Ninf-G Ver. 2 will be released by the end of this March

  34. ApGrid: Asia Pacific Partnership for Grid Computing

  35. ApGrid: Asia Pacific Partnership for Grid Computing • The ApGrid Testbed: an international Grid testbed spanning the Asia Pacific countries, with international collaboration and standardization ties to North America and Europe • ApGrid focuses on • Sharing resources, knowledge, and technologies • Developing Grid technologies • Helping the use of our technologies to create new applications • Collaboration on each other's work • Possible applications on the Grid • Bioinformatics (Rice Genome, etc.) • Earth Science (weather forecast, fluid prediction, earthquake prediction, etc.)

  36. PRAGMA Pacific Rim Application andGrid Middleware Assembly http://www.pragma-grid.net

  37. History and Future Plan • ApGrid and PRAGMA events during 2000–2002: • Kick-off meeting, Yokohama, Japan • Presentation @ GF5, Boston, USA • 1st ApGrid Workshop, Tokyo, Japan • Demo @ HPC Asia, Gold Coast, Australia • Presentation @ SC2001 SC Global Event • Presentation @ APAN, Shanghai, China • 1st Core Meeting, Phuket, Thailand • 1st PRAGMA Workshop, San Diego, USA • Demo @ iGrid2002, Amsterdam, Netherlands • Demo @ SC2002, Baltimore, USA (50 CPUs) • 2nd PRAGMA Workshop, Seoul, Korea • 2nd ApGrid Workshop / Core Meeting, Taipei, Taiwan

  38. History and Future Plan (cont'd) • Events during 2003–2004: • Presentation @ APAN, Hawaii, USA • 3rd PRAGMA Workshop, Fukuoka, Japan • Demo @ CCGrid, Tokyo, Japan (100 CPUs) • 4th PRAGMA Workshop, Melbourne, Australia (200 CPUs) • Asia Grid Workshop (HPC Asia), Oomiya, Japan • Demo & ApGrid informal meeting @ APAC'03, Gold Coast, Australia (250 CPUs) • 5th PRAGMA Workshop, Hsinchu, Taiwan (300 CPUs) • Demo @ SC2003, joint demo with TeraGrid, Phoenix, USA (853 CPUs) • 6th PRAGMA Workshop, Beijing, China • 7th PRAGMA Workshop, San Diego, USA • Demo @ SC2004, Pittsburgh, USA

  39. ApGrid Branch in Sun's Booth at SC2001

  40. Sun Grid Engine on the ApGrid Testbed • Ultra Enterprise Cluster + Sun Grid Engine (AIST, Japan) linked to the Sun Demo Station (Denver, USA) over 622 Mbps x 2 • Job submission via Globus

  41. Large-scale Sun Grid Engine Grid Testbed in Asia Pacific

  42. ApGrid/PRAGMA Testbed • Architecture, technology • Based on GT2 • Allow multiple CAs • Build MDS Tree • Grid middleware/tools from Asia Pacific • Ninf-G (GridRPC programming) • Nimrod-G (parametric modeling system) • SCMSWeb (resource monitoring) • Grid Data Farm (Grid File System), etc. • Status • 26 organizations (10 countries) • 27 clusters (889 CPUs)

  43. Lessons Learned • Initiation takes a lot of effort • Problems with installation of GT2/PBS/jobmanager • Installation/configuration of GT2/PBS/jobmanager is still not so easy for application people • Most sites needed help with the installation • Software requirements depend on the application and on the middleware used by the application • In order to run GridRPC (Ninf-G) applications, sites were asked to • Open the firewall / TCP wrappers • Additionally build the Info SDK bundle with gcc32dbg • Install Ninf-G • Change the configuration of xinetd/inetd • Enable NAT

  44. Lessons Learned (cont'd) • MDS is not scalable and still unstable • Terrible performance: a GIIS lookup takes several tens of seconds to minutes • Some parameters in grid-info-slapd.conf, such as sizelimit, timeout, and cachettl, should be set to appropriate values depending on your environment (number of registered objects, network performance between GRISes and GIISes, etc.) • Well-known problems: firewalls, private IP addresses, …

  45. Lessons Learned (cont'd) • Difficulties caused by the grass-roots approach • It is not easy to keep the GT2 version coherent between sites • Users have different requirements for the Globus Toolkit • Middleware developers need the newest one • Application developers are satisfied with the stable (older) one • It is not easy to keep up with the frequent version-ups of the Globus Toolkit • CoG is a current problem

  46. Lessons Learned (cont'd) • Difficulties caused by the grass-roots approach (cont'd) • Most resources are not dedicated to the ApGrid Testbed (though this is a common problem for Grids) • There may be busy resources • Need a grid-level scheduler, or a fancy Grid reservation system?

  47. Lessons Learned (cont'd) • Some resources are not stable • Example: if I issue many RPCs, some of them fail (but sometimes all of them complete) • Not yet resolved: GT2? Ninf-G? OS? Hardware? • Other instability • System maintenance (incl. software upgrades) without notification, noticed only when the application fails • It worked well yesterday, but I'm not sure whether it works today • But this is the Grid

  48. Observations • Still a "grass roots" organization • Less administrative formality (cf. PRAGMA, APAN, APEC/TEL, etc.) • Difficulty in establishing collaboration with others • Unclear membership rules • Join/leave, membership levels • Rights/obligations • Vague mission, but it has already collected (potentially) large computing resources

  49. Observations (cont'd) • Duplication of efforts on "similar" activities • Organization-wise • APAN – participation by country • PRAGMA – most member organizations overlap • Operation-wise • ApGrid testbed vs. PRAGMA resources: may cause confusion, though technically the same approach (multi-grid federation) • Network-wise • Primarily APAN – TransPAC • Skillful engineering team

  50. Summary of current status • Difficulties are caused not by technical problems but by sociological/political problems • Each site has its own policy • Account management • Firewalls • Trusted CAs • … • Differences in interests • Applications, middleware, networking, etc. • Differences in culture, language, etc. • Human interaction is very important
