NetSolve Happenings



  1. NetSolve Happenings A Progress Report of the NetSolve Grid Computing System Cluster and Computational Grids for Scientific Computing September 24-27, 2000 Le Château de Faverges de la Tour, Lyon, France.

  2. Outline
  • The Grid.
  • NetSolve Overview.
  • The Key to Success: Interoperability.
  • Applications.
  • Interoperability, Applications and NetSolve.

  3. Current Trends in HPC
  • Highlights of the TOP500 list (June 2000); percentages are the achieved fraction of peak performance.
  • #1: 9632-processor Intel-based “ASCI Red” at Sandia National Laboratory. 2379.6 Gflops (74.2%).
  • #2 & #3: 2144 Gflops & 1608 Gflops (55%, 52%).
  • Others in the top 10: LLNL, LANL, Leibniz Rechenzentrum (Munich), University of Tokyo.
  • #10: 815.1 Gflops. 1324 procs, Cray T3E900 (68.4%).
  • #250: 58.68 Gflops. 256 procs, Hitachi-based arch. (76.2%).
  • #500: 43.82 Gflops. 64 procs, SunHPC (400 MHz) (85.6%).

  4. Computational Grids
  • Motivation:
  • Regardless of the number and capacity of computational resources available, there will always be a need/desire for more computational power.
  • Innovations to increase computational capacity not only through hardware, but through software infrastructures as well.
  • It is often the case that all resources (data, storage facilities, computational servers, human users, etc.) are distributed, even globally.
  • Need for technology that reliably manages large collections of distributed computational resources, efficiently scheduling and allocating their services to meet the needs of users while providing robustness, high availability and quality of service.

  5. Computational Grids
  [Diagram: an application user drawing on distributed Grid resources.]

  6. Vision for the Grid
  • Uniform, location-independent, and transient access to the resources of science and engineering to facilitate the solution of large-scale, complex, multi-institutional, multidisciplinary data- and computation-based problems.
  • Resources can be:
  • Hardware (networks, CPUs, storage, etc.)
  • Software (libraries, modules, source code, etc.)
  • Human collaborators

  7. Attack of the Grid
  WebOS, Globus, IBP, IPG, NAG-NASA, NCSA Workbench, PUNCH, Webflow, NEOS, PVM, AppLeS, Habanero, Cumulvs, LoCI, Legion, TeraWeb, Everyware, NWS, Electronic Notebook, JINI, Condor, Harness, UniCore, Ninf, Ninja, Gateway, JiPANG, SinRG, NetSolve.

  8. The NetSolve Grid Environment • Brief Overview of the NetSolve System.

  9. NetSolve Overview
  • More than just a “not very well-defined user-level protocol!”
  • Problem Solving Environment Toolkit.
  • Client/Agent/Server system (a client-call sketch follows).
  • Remote access to hardware AND software.
  • “Robust, fault-tolerant, flexible, heterogeneous environment that provides dynamic management and allocation policies for distributed computational resources.”
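
  For flavor, a minimal sketch of the blocking client interface in C, modeled on the netsl() calls shown on slide 28. The header name, the problem name "dmatmul" and its argument order are assumptions for illustration, not the exact NetSolve API.

      /*
       * Minimal sketch of a blocking NetSolve client call, modeled on
       * the netsl() calls on slide 28.  "netsolve.h" and "dmatmul()"
       * are assumed names, not the exact API.
       */
      #include <stdio.h>
      #include "netsolve.h"   /* assumed NetSolve client header */

      int main(void)
      {
          double a[100], b[100], c[100];   /* 10x10 matrices */
          int i, n = 10, status;

          for (i = 0; i < 100; i++) {      /* sample input data */
              a[i] = 1.0;
              b[i] = 2.0;
          }

          /* The agent picks a server advertising the service; the call
             blocks until the result matrix c comes back over the network. */
          status = netsl("dmatmul()", n, a, b, c);
          if (status < 0) {
              fprintf(stderr, "NetSolve request failed (%d)\n", status);
              return 1;
          }
          printf("c[0] = %g\n", c[0]);
          return 0;
      }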

  10. NetSolve - The Big Picture
  [Cartoon: “Is that your final answer?” “Dude, I need more computer power... AND my software selection totally sucks! What’s the name of that rocking system again? NetSolve!”]
  [Diagram: the client sends a query to the agent, which acts as the information service and schedules the request onto the computational resources (servers); the service results flow back to the client.]

  11. PSEs and Applications
  [Architecture diagram, top to bottom:]
  • PSEs and applications: Matlab, SCIRun, custom C and Fortran codes.
  • NetSolve middleware: resource discovery, fault tolerance, system management, resource scheduling.
  • Proxies: NetSolve proxy, Globus proxy, Ninf proxy, Legion proxy.
  • Metacomputing resources: NetSolve, Globus, Ninf, Legion infrastructures.

  12. NetSolve Credits
  Sudesh Agrawal, Dorian Arnold, Dieter Bachmann, Susan Blackford, Henri Casanova, Jack Dongarra, Yuang Hang, Karine Heydemann, Michelle Miller, Keith Moore, Terry Moore, Ganapathy Raman, Keith Seymour, Sathish Vahdiyar, Tinghua Xu.

  13. Interoperability and the Grid

  14. The Problem
  • The goal of the Grid: “enable and maintain the controlled sharing of distributed resources to solve multidisciplinary problems of common interest to different groups or organizations.”
  • A hodgepodge of systems, each possessing its own unique perspective AND, UNFORTUNATELY, its own unique custom protocols and components.

  15. Why The Problem?
  • Sociological:
  • Of course, mine is bigger, better, ... And even if it is not, I cannot admit that, dismiss my own efforts, and use yours.
  • Technical:
  • Immaturity.
  • Doesn’t exactly fit needs.
  • Software problems.
  • Economical:
  • Reinvesting time and effort, throwing away existing code to incorporate someone else’s.
  • I’ve been funded for this, so ...

  16. The Problem (cont’d)
  • No single system will emerge as the single Grid computing system of choice:
  • Each has unique characteristics that appeal to different classes of users:
  • Ease of install/administration/maintenance.
  • Stringent security.
  • Ease of integration.
  • Performance.
  • Interface.
  • Services provided.
  • Code robustness/system maturity.
  • ...

  17. Q & A
  IF interoperability is indeed desirable, necessary, or both for the success of the Grid,
  AND the consensus is an unwillingness to change existing custom protocols, objects, etc.,
  THEN are we stuck?

  18. Current Solutions
  • Laborious integration efforts that only work between specific systems, typically under specialized circumstances.
  [Diagram: pairwise bridges, e.g. NetSolve proxies linking NetSolve to Globus, Condor-G linking Globus to Condor servers, and Ninf proxies linking NetSolve to Ninf.]

  19. Current Solutions (cont’d)
  • Computing portals as front-ends to sweep the dirt of un-interoperable systems under the cover.
  [Diagram: portals such as EveryWare, JiPANG and HotPage fronting NetSolve, Ninf, Legion, Globus, PBS and NPACI resources, with CAUTION and STOP signs between the underlying systems.]

  20. A Better Solution?
  • Representation standards for objects, protocols, services, etc. would be ideal. XML?
  • Is there a possibility of using _____ to allow us to keep our customizations while allowing other systems to translate/interpret them?

  21. NetSolve Interoperability
  • XML PDFs (Problem Description Files):
  • Use XML as the language to implement the description of software services.
  • Proliferation of XML tools and parsers to exploit (see the sketch below).
  • Collaboration with the Ninf project to establish a standardized IDL.
  • Investigate XML representations for “standard” Grid components: machines, storage, etc.
  • Standard objects/languages allow systems to share information; there still need to be some commonly understood protocols to allow inter-system transactions.
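
  To make “XML tools and parsers to exploit” concrete, here is a minimal sketch in C (the deck’s implementation language, per slide 29) that walks a toy service description with the expat parser. The <problem>/<input>/<output> element names are invented for illustration; they are not the NetSolve/Ninf IDL.

      /* Minimal sketch: scanning a toy XML problem description with
       * expat, a widely available C XML parser circa 2000.  The element
       * names are invented, not the NetSolve/Ninf IDL. */
      #include <stdio.h>
      #include <string.h>
      #include <expat.h>

      static void start(void *data, const char *el, const char **attr)
      {
          int i;
          (void)data;
          printf("element: %s", el);
          for (i = 0; attr[i]; i += 2)      /* attributes come in pairs */
              printf(" %s=\"%s\"", attr[i], attr[i + 1]);
          printf("\n");
      }

      int main(void)
      {
          const char *doc =
              "<problem name=\"dmatmul\">"
              "<input type=\"matrix\" element=\"double\"/>"
              "<input type=\"matrix\" element=\"double\"/>"
              "<output type=\"matrix\" element=\"double\"/>"
              "</problem>";
          XML_Parser p = XML_ParserCreate(NULL);

          XML_SetStartElementHandler(p, start);
          if (XML_Parse(p, doc, (int)strlen(doc), 1) == XML_STATUS_ERROR) {
              fprintf(stderr, "parse error: %s\n",
                      XML_ErrorString(XML_GetErrorCode(p)));
              return 1;
          }
          XML_ParserFree(p);
          return 0;
      }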

  22. NetSolve Interoperability
  • Within the current NetSolve framework:
  • Publishing the client-proxy interface allows other metacomputing systems to easily leverage NetSolve resources.
  • Implementing new proxies allows NetSolve client users to leverage other metacomputing systems.

  23. Client Proxies
  • Negotiate for metacomputing services on behalf of the client.
  • Allow the client to be more lightweight.
  • Proxies provide a translation between the “language” of the client and the “language” of the underlying services, i.e. NetSolve, Globus, etc. (a hypothetical sketch of such an interface follows).
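
  The slides do not spell out the client-proxy interface itself; purely as an illustration of the translation-layer idea, it could be a small table of function pointers in C that each back-end (NetSolve, Globus, Ninf, Legion) fills in. Every name below is hypothetical.

      /* Hypothetical sketch of a client-proxy boundary: one vtable of
       * function pointers per back-end.  None of these names come from
       * NetSolve; they only illustrate the translation layer. */
      typedef struct proxy_ops {
          const char *name;                      /* "netsolve", "globus", ... */
          int (*submit)(const char *problem,     /* send a request */
                        void **args, int nargs);
          int (*poll)(int request_id);           /* nonblocking status check */
          int (*wait)(int request_id);           /* block until completion */
      } proxy_ops;

      /* The lightweight client speaks only this interface; each proxy
         translates it into the protocol of its underlying system. */
      static int run_request(const proxy_ops *p, const char *problem,
                             void **args, int nargs)
      {
          int id = p->submit(problem, args, nargs);
          if (id < 0)
              return id;          /* submission failed */
          return p->wait(id);     /* proxy handles retries, data movement */
      }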

  24. PSEs and Applications
  [The architecture diagram from slide 11, repeated: PSEs and applications (Matlab, SCIRun, custom C and Fortran) over the NetSolve middleware (resource discovery, fault tolerance, system management, resource scheduling), over the proxies (NetSolve, Globus, Ninf, Legion), over the metacomputing resources.]

  25. Applications for the Grid
  • Heterogeneous application types/classes.
  • Independent-parallelism and pipeline simulations may represent a key class of applications that can perform efficiently on a globally distributed computational infrastructure.

  26. Data Persistence
  • Chain together a sequence of requests.
  • Analyze parameters to determine data dependencies. Essentially, a DAG is created in which nodes represent computational modules and arcs represent data flow.
  • Transmit the superset of all input/output parameters and make it persistent near the server(s) for the duration of the sequence execution.
  • Schedule the individual request modules for execution.

  27. Request Sequencing
  • Goals:
  • Transmit no unnecessary (redundant) data parameters.
  • Ensure all necessary data parameters are transmitted.
  • Execute modules simultaneously whenever possible.

  28. Request Sequencing Interface

  With sequencing:

      ...
      netsl_begin_sequence();
      netsl("command1", A, B, C);
      netsl("command2", A, C, D);
      netsl("command3", D, E, F);
      netsl_end_sequence(C, D);
      ...

  Without sequencing:

      ...
      netsl("command1", A, B, C);
      netsl("command2", A, C, D);
      netsl("command3", D, E, F);
      ...

  29. DAG Construction
  • “C” implementation.
  • Analyze all input/output references in the request sequence.
  • Two references are equal if they refer to the same memory address (sketched below).
  • Size parameters are checked for “subset” objects.
  • Only NetSolve “Matrices” and “Vectors” are checked.
  • The constructed DAG is scheduled for execution at a NetSolve server.
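
  As a sketch of the pointer-equality rule (hypothetical structures, not the NetSolve source): an arc from request i to request j is added when an output reference of i reappears as an input reference of j.

      /* Hypothetical sketch of the dependency rule on this slide: an arc
       * from request i to request j exists when an output reference of i
       * and an input reference of j point to the same memory address. */
      #define MAX_PARAMS 8
      #define MAX_REQS   16

      typedef struct request {
          const char *command;               /* e.g. "command1" */
          void *in[MAX_PARAMS];  int n_in;   /* input references */
          void *out[MAX_PARAMS]; int n_out;  /* output references */
      } request;

      /* 1 if some output of a is later used as an input of b. */
      static int depends_on(const request *a, const request *b)
      {
          int i, j;
          for (i = 0; i < a->n_out; i++)
              for (j = 0; j < b->n_in; j++)
                  if (a->out[i] == b->in[j])   /* same address */
                      return 1;
          return 0;
      }

      /* Fill the adjacency matrix: dag[i][j] = 1 means data flows from
         request i to request j, so j must wait for i. */
      static void build_dag(const request *seq, int n,
                            int dag[MAX_REQS][MAX_REQS])
      {
          int i, j;
          for (i = 0; i < n; i++)
              for (j = i + 1; j < n; j++)      /* only later requests */
                  dag[i][j] = depends_on(&seq[i], &seq[j]);
      }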

  30. DAG for Example Sequence
  [Diagram: inputs A and B feed command1, producing C; A and C feed command2, producing D; D and E feed command3, producing F.]

      ...
      netsl_begin_sequence();
      netsl("command1", A, B, C);
      netsl("command2", A, C, D);
      netsl("command3", D, E, F);
      netsl_end_sequence(C, D);
      ...

  31. Data Persistence (cont’d)
  [Diagram contrasting the two executions of the example sequence:]
  • Without sequencing, every call is a client round trip: the client sends command1(A, B) and receives result C, sends command2(A, C) and receives result D, sends command3(D, E) and receives result F.
  • With sequencing, the client issues sequence(A, B, E) once; input A and intermediate output C flow directly to the server running command2, intermediate output D and input E flow to the server running command3, and result F comes back to the client.

  32. Enhanced Sequencing
  • Multiple NetSolve server sequencing:
  • Currently only a single NetSolve server can be used to service an entire sequence.
  • If no single server possesses all the software, the requests cannot be executed as a sequence.
  • Truly parallel execution only on SMPs, like the SGI server used.
  • Investigate whether graph scheduling heuristics and algorithms for parallel machines can apply to distributed resources as well (see the sketch below).
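
  As one example of the kind of heuristic meant here, a sketch of classic list scheduling over a request DAG: repeatedly pick a ready module (all predecessors finished) and place it on the server with the earliest available slot. The DAG and costs are toy values; communication time and software availability, which matter on distributed resources, are deliberately ignored.

      /* Toy list scheduler for a request DAG over several servers. */
      #include <stdio.h>

      #define N_MODULES 3
      #define N_SERVERS 2

      int main(void)
      {
          /* dag[i][j] = 1: module i must finish before module j starts. */
          int dag[N_MODULES][N_MODULES] = {
              {0, 1, 0},   /* command1 -> command2 */
              {0, 0, 1},   /* command2 -> command3 */
              {0, 0, 0}
          };
          double cost[N_MODULES] = {2.0, 4.0, 1.0};  /* est. run times */
          double server_free[N_SERVERS] = {0.0, 0.0};
          double finish[N_MODULES];
          int scheduled[N_MODULES] = {0};
          int done = 0;

          while (done < N_MODULES) {
              int m, p, s, best;
              for (m = 0; m < N_MODULES; m++) {
                  double ready_at = 0.0;
                  if (scheduled[m]) continue;
                  /* ready only when every predecessor has finished */
                  for (p = 0; p < N_MODULES; p++)
                      if (dag[p][m]) {
                          if (!scheduled[p]) { ready_at = -1.0; break; }
                          if (finish[p] > ready_at) ready_at = finish[p];
                      }
                  if (ready_at < 0.0) continue;
                  /* place on the server that frees up earliest */
                  best = 0;
                  for (s = 1; s < N_SERVERS; s++)
                      if (server_free[s] < server_free[best]) best = s;
                  if (server_free[best] > ready_at) ready_at = server_free[best];
                  finish[m] = ready_at + cost[m];
                  server_free[best] = finish[m];
                  scheduled[m] = 1;
                  done++;
                  printf("module %d on server %d: start %.1f end %.1f\n",
                         m, best, ready_at, finish[m]);
              }
          }
          return 0;
      }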

  33. Data Logistics and Distributed Storage Infrastructures
  • Expand the Data Persistence model to multiple servers, using Distributed Storage Infrastructures (DSIs) to conveniently cache data parameters near all involved servers.
  • Example DSIs: IBP, GASS, ...
  • By using remote storage for request parameters, users can pre-allocate data to expedite services, or use already-remote data in NetSolve requests (sketched below).
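
  The slides do not show a client-side DSI API; purely as an illustration of pre-allocating remote data, the sketch below stages a large input into a depot once so later requests can reference the cached copy. ns_dsi_open/ns_dsi_write and the depot URL are invented stand-ins (stubbed here so the sketch compiles), not a real NetSolve or IBP interface.

      /* Hypothetical sketch: caching a large input in a DSI depot so that
       * NetSolve requests can reference it instead of re-sending it.
       * The ns_dsi_* names are invented stand-ins, stubbed for compilation. */
      #include <stdio.h>
      #include <stddef.h>

      typedef int dsi_handle;

      /* Stubs standing in for a real DSI client library (e.g. IBP). */
      static dsi_handle ns_dsi_open(const char *depot) {
          printf("open depot %s\n", depot); return 1;
      }
      static void ns_dsi_write(dsi_handle h, const void *buf, size_t len) {
          printf("cache %zu bytes via handle %d\n", len, h);
          (void)buf;
      }

      int main(void)
      {
          static double a[10000];           /* large input parameter */
          dsi_handle h;

          /* Stage the data once, near the servers that will use it... */
          h = ns_dsi_open("ibp://depot.example.edu");
          ns_dsi_write(h, a, sizeof a);

          /* ...then later requests would pass a reference to the cached
             copy instead of shipping the 80 KB again with each call. */
          printf("requests now reference handle %d\n", h);
          return 0;
      }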

  34. Multiple Server Sequencing and DSIs
  [Diagram: the client ships the sequence parameters into DSI data caches; a server cluster and additional servers read and write those caches while servicing the sequence.]

  35. Conclusion
  • There is little likelihood that any single system will emerge as the Grid system of choice. Therefore, the interoperability of systems and the standardization of protocols and object representations become highly desirable.
  • The Grid community should continue to develop the concepts and technologies necessary to facilitate a seamless Grid environment that is easy to use, highly available and highly efficient.
  • However, it should promote more cooperation and less competition in an effort to establish a global, heterogeneous Grid computing fabric that makes supercomputing power available to the masses.

  36. THE END! http://www.cs.utk.edu/netsolve
