Recent Developments of the Ninf Global Computing System Satoshi Matsuoka(TIT) Satoshi Sekiguchi(ETL)

Recent Developments of the Ninf Global Computing System Satoshi Matsuoka (TIT) Satoshi Sekiguchi (ETL) http://ninf.etl.go.jp

Today’s Talk • Global Computing Testbed Infrastructure (GCI) effort in Japan • APAN/TransPAC, Int’l Collaboration • Brief Intro to Ninf System • Recent Developments • Modeling and Simulating Global Computing • Better Scientific & Engineering Discipline

Global Computing Infrastructure • Collaborative Effort Starting in Japan to Establish Testbed for the Grid • Participants • ETL, Waseda-U, RWCP, Osaka-U, Tokyo Institute of Technology, etc. • Installation/Test Deployment of Multiple Grid Software and Apps • AppLeS/NWS, Condor, Globus, Legion, Netsolve, Ninf, Applications etc. planned • International Collab. thru APAN

APAN/TransPAC • Research-dedicated network within Asia and between Asia and North America • Being launched Sep. 4 • Perpetual link between Asian participants and vBNS sites • Need Grid software?! Both apps & systems [Figure: North America and the Asia-Pacific region linked at 35 Mbps (TransPAC) between Chicago STAR TAP and KDD Tokyo]

APAN Participants from Japan • Agriculture, Forestry and Fisheries Research Council • Agency of Industrial Science and Technology (AIST) • Communication Research Laboratories • Electrotechnical Laboratory (ETL w/ RWC and Tokyo Inst. Tech.) • Institute of Space and Astronautical Science (ISAS) • KDD R&D Laboratories • KEK (High Energy Accelerator Research Organization) • Medical Internet Exchange Association (MDX Association) • NASDA (National Space Development Agency of Japan) • National Cancer Center (NCC) • National Institute of Genetics (NIG) • NTT Laboratories (NTT Labs) • RIKEN (The Institute of Physical and Chemical Research) • University of Tokyo • WASEDA University • KEIO University et al. (WIDE)

Ninf Component Architecture [Figure: components shown are a client program calling Ninf_call(“linpack”, ..) through the Ninf Client Library, Ninf RPC, the Internet, Meta Servers, the Ninf DB Server, the Ninf Computational Server with Ninf Register, Ninf Executables, Stub Program, Ninf Procedure, the Ninf Stub Generator with its IDL File, and adapters to other global computing systems, e.g., NetSolve]

Brief History of Ninf • The first draft paper (Jun.’94) • A naive implementation (Sep.’94) w/PVM • Paper POOMA’95 at Santa Fe (Mar.’95) • Cray J90 installed as Ninf server Sep.’95 • The Metaserver introduced Feb.’96 • The First package released Jun.’96 • Ninf/Netsolve Collaboration, Fall ’97 • Extensive Tools Development Early ’97~

Architectural Layers of Ninf [Figure: layer diagram with the labels Application (NinfCalc+, ExcelNinf, Mathematica, ..., Numerical Scientific Computing Progs.), Mathematical Libraries, Ninf Client API (F77, C, Java, …), Programming Tool, Ninf MetaServer, Ninf DB, Resource Manager, NetSolve Adapter, Ninf Computation Server, NetSolve Server, Ninf Protocol, Service (FTP, HTTP), TCP/IP, Hardware (Gigabit Net, LAN, WAN)]

Resource Scheduling for the “Grid” • Max “performance” under Dynamic, Hetero. Env. • Computing Server Performance/Load • Network Topology/Bandwidth/Congestion • Multiple Users at Multiple Sites: HPC vs. Hi-thruput → Scheduling for the “Grid” • Existing Scheduling Systems: MetaServer (Ninf), Agent (NetSolve), AppLeS/NWS, Prophet • No frameworks to judge effectiveness • Difficult to perform large-scale experiments • Benchmarks → Fair Reproducibility also difficult

Objective of the Model • Simulation Model and Simulators for HPDS (the “Grid”) • Modeling Various Grid Environment • Large-scale Simulation • Reproducibility • Contents • Overview of the Simulation Model • Simulator and Validity of the Model • Application: Evaluation of different Schedulers

General Architecture of an HPDC System (Computing) • Clients • Computing Servers • Scheduling System • Schedulers (e.g. AppLeS, Prophet) • Performs Scheduling According to System/User Policy • Directory Service (ex. Globus-MDS) • Central Database of Resource Info • Monitors/Predictors (ex. NWS) • Monitors and Predicts Server and Network Status

Canonical Model of Typical Grid Execution • Request Generation • Query the Scheduler • Periodic Monitoring of Server and Network • Assign Appropriate Server • Execute Request • Return Computed Result [Figure: Clients A to C and Servers A to C spread over Site1, Site2, and the Internet, with a Scheduling Unit containing the Scheduler, Directory Service, and Monitor]

The Model for Grid Simulation • Requirements for a Simulation Model • Various Clients, Servers, and Network Topologies • Servers: Performance, Load, Variance over time • Network: Bandwidth, Throughput (congestion), Variance over time → Employ Queuing Theory • Characteristics of our Model • Simulation of large-scale execution environment • Reproducible, Fair Evaluation of Algorithms

Simulating the Grid with a Queuing Model --- Overview [Figure: Site1 and Site2 with Clients A, B, C and Servers A, B, C, mirrored as Site1’ and Site2’ with Clients A’, B’, C’; network queues Qns1 to Qns4 and Qnr1 to Qnr4; server queues Qs1 to Qs3]

Simulating the Grid with a Queuing Model (2) • Arrival Rate of Data into Qns: $\lambda_{ns} = \lambda_{ns\_request} + \lambda_{ns\_others}$ ($\lambda_{ns\_request}$: Request packets, $\lambda_{ns\_others}$: External Perturbation) • Arrival Rate of Jobs into Qs: $\lambda_{s} = \lambda_{s\_request} + \lambda_{s\_others}$ ($\lambda_{s\_request}$: Job Request, $\lambda_{s\_others}$: External Perturbation) [Figure: Client feeding the Network queue Qns, which feeds the Server queue Qs; request and external-perturbation arrivals enter each queue]

Processing at the Client • Emits Request w/ probability $\lambda_{request}$ • Queries Scheduler for a Server • Provides Info on Request • Computing steps, Amount of Data Transfer → The Scheduler assigns an appropriate server • Emits Request to the Assigned Server into Qns • Divides Data into Logical Packets → The Server Completes the Processing of Request • Client Receives Result Data from Qnr

Client Parameters • The Probability (rate) of Emitting a Logical Packet into Qns: $\lambda_{packet} = T_{net} / W_{packet}$ ($T_{net}$: Network Bandwidth, $W_{packet}$: Logical Packet Size) • Example: $T_{net}$ = 1.0 [MB/s], $W_{packet}$ = 0.01 [MB] → $\lambda_{packet}$ = 1.0 / 0.01 = 100

Processing at the Network • Describe Comm. Throughput w/ $\lambda_{ns\_others}$ → Employ the M/M/1/N Queue for Qns • Arrival at Qns: both Request Data packets and packets of External Perturbation • When the Queue is full, the request data packet is retransmitted • Each incoming data packet into Qns is processed for [data size / bandwidth] time, and then comes out of Qns

Network Throughput • Arrival Rate of External Perturbation --- Determines Network Throughput: $\lambda_{packet} / (\lambda_{ns\_others} + \lambda_{packet}) = T_{act} / T_{net}$ → $\lambda_{ns\_others} = (T_{net} / T_{act} - 1) \times \lambda_{packet}$ • Length of Qns --- Determines Latency: $W_{packet} \times N / T_{net} \approx T_{latency}$ → $N \approx T_{latency} \times T_{net} / W_{packet}$ (note: $N \geq 2$)
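
The throughput relation on this slide can be checked numerically. The following is a minimal sketch in Python, not the authors' Java simulator: the function name `simulate_qns`, its parameters, and the exponentially distributed service time are assumptions made for illustration, and request packets that find the queue full are only counted as dropped instead of being retransmitted. It steps through arrival and departure events of a finite FIFO queue and reports which fraction of the served packets were request packets.

```python
import random
from collections import deque


def simulate_qns(lam_request, lam_others, mu, capacity, departures, seed=0):
    """Event-by-event sketch of a finite FIFO queue (M/M/1/N style).

    lam_request: arrival rate of request packets
    lam_others:  arrival rate of external-perturbation packets
    mu:          service rate (bandwidth / packet size)
    capacity:    maximum number of packets held in the queue (N)
    Returns (share of served packets that were request packets,
             number of request packets that found the queue full).
    """
    rng = random.Random(seed)
    lam = lam_request + lam_others
    queue = deque()          # True = request packet, False = perturbation
    served_request = served_total = dropped_request = 0
    while served_total < departures:
        # With exponential timers, the next event is an arrival with
        # probability lam / (lam + mu); an empty queue cannot depart.
        total_rate = lam + (mu if queue else 0.0)
        if rng.random() < lam / total_rate:
            is_request = rng.random() < lam_request / lam
            if len(queue) < capacity:
                queue.append(is_request)
            elif is_request:
                dropped_request += 1   # would be retransmitted in the model
        else:
            served_total += 1
            if queue.popleft():
                served_request += 1
    return served_request / served_total, dropped_request


# Illustrative rates: 100 request packets/s, 900 perturbation packets/s,
# service rate 100 packets/s, queue length 10.
share, dropped = simulate_qns(100.0, 900.0, 100.0, 10, 50_000)
print(f"request share of served packets: {share:.3f}, dropped: {dropped}")
```

With these illustrative rates the request share is expected to settle close to 100 / (900 + 100) = 0.1, which is the ratio of actual to nominal throughput that the perturbation rate is meant to produce.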

Examples of Network Parameters • Under $T_{net}$ = 1.0 [MB/s], $W_{packet}$ = 0.01 [MB] ($\lambda_{packet}$ = 100), to simulate network throughput $T_{act}$ = 0.1 [MB/s] and $T_{latency}$ = 0.1: • Arrival Rate of External Perturbation: $\lambda_{ns\_others} = (T_{net} / T_{act} - 1) \times \lambda_{packet} = (1.0 / 0.1 - 1) \times 100 = 900$ • Queue Length: $N \approx T_{latency} \times T_{net} / W_{packet} = 0.1 \times 1.0 / 0.01 = 10$
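
The arithmetic on this slide can be reproduced with a few helper functions. This is a small sketch in Python; the function names are hypothetical and not part of the Ninf simulator, and rounding the queue length to the nearest integer is an assumption (the slides only state that N is at least 2).

```python
def packet_rate(t_net, w_packet):
    """Emission rate of logical packets: bandwidth / packet size."""
    return t_net / w_packet


def perturbation_rate(t_net, t_act, lam_packet):
    """External-perturbation arrival rate that lowers the usable
    throughput from t_net to t_act."""
    return (t_net / t_act - 1.0) * lam_packet


def queue_length(t_latency, t_net, w_packet):
    """Queue length N that yields roughly the requested latency.
    Rounded to an integer and never below 2 (assumed rounding rule)."""
    return max(2, round(t_latency * t_net / w_packet))


lam_packet = packet_rate(1.0, 0.01)                     # 100 packets/s
lam_others = perturbation_rate(1.0, 0.1, lam_packet)    # 900 packets/s
n = queue_length(0.1, 1.0, 0.01)                        # 10
print(lam_packet, lam_others, n)
```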

Processing at the Server • Describe Response time of Job Execution → Employ the M/M/1 Queue for Qs • Server Receiving Requests • All data packets come out of Qns and go into Qs • Each Job in Qs is processed for [Compute Amount / Server Performance] time • Returns Result of Request to the Client • Divides Return Data into Logical Packets • Emits logical packets into Qnr w/ probability $\lambda_{packet}$

Server Parameters • Arrival Rate of External Perturbation Jobs (EPJ) --- Determines Server Utilization: $\lambda_{s\_others} = T_{ser} / W_{s\_others} \times U$ ($T_{ser}$: Server Performance, $W_{s\_others}$: Av. computing steps of EPJ, $U$: Server Utilization) • Example Parameters • Under $T_{ser}$ = 100 [Mflops], $W_{s\_others}$ = 0.01 [Mflop], to simulate $U$ = 0.1: $\lambda_{s\_others} = 100 / 0.01 \times 0.1 = 1000$
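
The server-side parameter follows the same pattern as the network one. A brief Python sketch, with hypothetical function names, that computes the external-perturbation job rate for a target utilization and inverts it as a consistency check:

```python
def perturbation_job_rate(t_ser, w_others, utilization):
    """Arrival rate of external-perturbation jobs: each job needs
    w_others / t_ser seconds, so this rate keeps the server busy for
    the requested fraction of time."""
    return t_ser / w_others * utilization


def utilization_from_rate(t_ser, w_others, lam_others):
    """Inverse of the relation above: arrival rate times service time."""
    return lam_others * w_others / t_ser


lam = perturbation_job_rate(100.0, 0.01, 0.1)     # slide example: 1000
print(lam, utilization_from_rate(100.0, 0.01, lam))
```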

Characteristics of our Simulator • OO design - Simulation Env. is pluggable • Client, Server, and Network Topologies • Scheduling Models • Processing at the Network/Server • Randomness Distribution (Poisson, etc.) (Employ Abstract Factory Pattern) • Each object can have independent (pseudo-)random number sequence • Implemented w/ Java • Parallel Simulator planned
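
One way to picture the pluggable randomness described on this slide is the sketch below. It is written in Python for brevity, whereas the simulator itself is implemented in Java, and every class and function name here is hypothetical. An abstract factory produces a family of related distributions (inter-arrival times and sizes), and each distribution object owns a private generator so that its pseudo-random sequence is independent and reproducible.

```python
import random
from abc import ABC, abstractmethod


class Distribution(ABC):
    """A stream of positive values (inter-arrival times, sizes, ...)."""

    @abstractmethod
    def sample(self) -> float: ...


class Exponential(Distribution):
    def __init__(self, mean: float, seed: int):
        self._mean = mean
        self._rng = random.Random(seed)   # private, independent sequence

    def sample(self) -> float:
        return self._rng.expovariate(1.0 / self._mean)


class Fixed(Distribution):
    def __init__(self, value: float):
        self._value = value

    def sample(self) -> float:
        return self._value


class RandomnessFactory(ABC):
    """Abstract factory: one method per kind of distribution needed."""

    @abstractmethod
    def interarrival(self, rate: float, seed: int) -> Distribution: ...

    @abstractmethod
    def size(self, mean: float, seed: int) -> Distribution: ...


class PoissonFactory(RandomnessFactory):
    """Poisson arrivals with exponentially distributed sizes."""

    def interarrival(self, rate, seed):
        return Exponential(1.0 / rate, seed)

    def size(self, mean, seed):
        return Exponential(mean, seed)


class DeterministicFactory(RandomnessFactory):
    """Fixed spacing and fixed sizes, handy for debugging."""

    def interarrival(self, rate, seed):
        return Fixed(1.0 / rate)

    def size(self, mean, seed):
        return Fixed(mean)


def build_perturbation(factory, rate, mean_size, seed):
    """Client code depends only on the abstract factory interface."""
    return factory.interarrival(rate, seed), factory.size(mean_size, seed + 1)


gaps, sizes = build_perturbation(PoissonFactory(), 900.0, 0.01, seed=42)
print(gaps.sample(), sizes.sample())
```

Swapping `PoissonFactory()` for `DeterministicFactory()` changes every distribution in the run without touching the code that consumes them, which is the property the slide attributes to the Abstract Factory pattern.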

Evaluating Validity of the Model • Comparing Simulation to Actual Measurement on the Ninf System • Linpack --- Compute: $\frac{2}{3}n^3 + 2n^2$ [flops], Comm: $8n^2 + 20n$ [bytes] • Evaluation Environment • # server : client = 1 : 1, 1 : 4 • Server: ETL [J90, 4PE] • Clients (over the Internet): Ocha-U [SS10] (0.16 MB/s, 32 ms), U-Tokyo [Ultra1] (0.35 MB/s, 20 ms), NITech [Ultra2] (0.15 MB/s, 41 ms), TITech [Ultra1] (0.036 MB/s, 18 ms)
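
The Linpack cost model on this slide is easy to evaluate for a concrete problem size. A short Python sketch with hypothetical function names; the conversion 1 MB = 10^6 bytes used in the comments is an assumption.

```python
def linpack_flops(n):
    """Compute cost from the slide: 2/3 n^3 + 2 n^2 floating-point ops."""
    return 2 * n**3 / 3 + 2 * n**2


def linpack_bytes(n):
    """Communication cost from the slide: 8 n^2 + 20 n bytes."""
    return 8 * n**2 + 20 * n


n = 600
flops = linpack_flops(n)     # 144_720_000 flops, about 144.7 Mflop
nbytes = linpack_bytes(n)    # 2_892_000 bytes, about 2.9 MB (assuming 10^6)
print(flops, nbytes)
```

For n = 600 the request carries about 2.9 MB for about 145 Mflop of work, so on the slower links listed above the transfer time can easily exceed the compute time.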

Parameters of the Evaluation Model • Client • $\lambda_{request}$ = 1 / [Request time + interval] • $W_{packet}$ = 10, 50, 100 [KB] (fixed) • Network (FCFS) • Bandwidth $T_{net}$ = 1.5 [MB/s] • Ext. Perturb.: Av. Size = $W_{packet}$ (Exp. Dist.); $\lambda_{ns\_others}$, $\lambda_{nr\_others}$: Poisson Arrival • Server (FCFS), From Actual Measurements • Performance $T_{ser}$ = 500 [Mflops] (Cray J90) • EPJ: Av. Compute Steps = 10 [Mflop] (Exp. Dist.), Utilization 4 [%], Poisson Arrival

Evaluation of the Validity of the Simulation Model (1 : 1) • Almost Same Performance w/ actual measurements for different packet sizes → simulation cost could be reduced • Matches actual measurements for different problem sizes as well

Evaluation of the Validity of the Simulation Model (1 : 4) • Different Throughput Still Results in Close Match to Real Measurements [Figure: plot with axis ticks at 600, 1000, 1400]

Application: Evaluating Scheduling Algorithms with Simulation • 3 different basic scheduling algorithms • RR: Round-Robin • LOAD: Comp. Power + Load: min $(L + 1) / P$ ($L$: av. load, $P$: server perf.) • LOTH: Comp. Power + Load + Comm. Thruput: min $Comp / (P / (L + 1)) + Comm / T_{net}$ • Evaluation with Linpack / EP under hetero. environ. • Simple Prediction in addition to LOTH [Figure: Server A and Server B (400 Mops and 40 Mops), links of 1.08 MB/s and 0.2 MB/s, Clients 1 to 4]
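
The three policies on this slide can be written down in a few lines. The following Python sketch is an illustration under stated assumptions, not the MetaServer's code: the `Server` record, the function names, and the pairing of server speeds with link bandwidths in the example are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str
    perf: float        # P: server performance [Mflops]
    load: float        # L: average load
    bandwidth: float   # Tnet: client-to-server throughput [MB/s]


def make_round_robin(servers):
    """RR: hand out servers in a fixed rotation, ignoring all state."""
    rotation = cycle(servers)
    return lambda comp, comm: next(rotation)


def pick_load(servers, comp, comm):
    """LOAD: minimize (L + 1) / P."""
    return min(servers, key=lambda s: (s.load + 1) / s.perf)


def pick_loth(servers, comp, comm):
    """LOTH: minimize Comp / (P / (L + 1)) + Comm / Tnet."""
    return min(
        servers,
        key=lambda s: comp / (s.perf / (s.load + 1)) + comm / s.bandwidth,
    )


# Hypothetical setup: a fast server behind a slow link and a slow server
# behind a fast link. Request size is roughly Linpack with n = 600.
servers = [
    Server("fast-far", perf=400.0, load=0.0, bandwidth=0.2),
    Server("slow-near", perf=40.0, load=0.0, bandwidth=1.08),
]
comp, comm = 144.72, 2.892   # [Mflop], [MB]

print(pick_load(servers, comp, comm).name)   # fast-far
print(pick_loth(servers, comp, comm).name)   # slow-near
```

In this made-up configuration LOAD sends the communication-heavy request to the fast server even though its link dominates the total time, while LOTH accounts for the transfer and chooses the other server.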

Parameters for Evaluating Scheduling Algorithms • Clients • Problem Sizes: Linpack - 600, EP - $2^{21}$ • $\lambda_{request}$ = 1 / (Worst Request time + interval) (interval: Linpack 5 [sec], EP 20 [sec]), Poisson • Logical packets $W_{packet}$ = 100 [KB] (fixed), Poisson • Networks (FCFS) • $T_{net}$ = 1.5 [MB/s] • Ext. Perturb. Data: av. size = $W_{packet}$ (exp. distr.); $\lambda_{ns\_others}$, $\lambda_{nr\_others}$ are Poisson arrivals • Server (FCFS) • EPJ: av. computing = 10 [Mflop] (exp. distr.), Utilization 10 [%], Poisson arrival

Results of Evaluating Scheduling Algorithms • LOTH gives the best result: resource info best utilized • Prediction did not work as well as expected • LOAD performance of Linpack poor • Network bottleneck causes (false) low server utilization [SC97] • RR performs worst

Weakness of the Model and Simulator • Does not model • Inter-server communication • Interference between Networks • Distinguish Application vs. Job Scheduling • Need to model co-scheduling • Simulator not very fast --- parallelization

The Parallel Grid Simulators • Performance Simulator • 33 Pentium II 400 MHz, 128 MB • 100Base-T (Hub + Switch) • Windows NT 4.0 + Java (Ninflet) • Network Simulator • 33-node Pentium II 333 MHz, 256 MB / 12 GB • Switched 100Base-T • Linux / RWC SCore • 150 cm x 135 cm x 40 cm (MicroATX)

Recent Ninf Developments (1) • Metaserver resource management architecture • Java-based • Could plug in predictors (e.g., NWS), directory (MDS) • Ninf v2 Development • New Protocol and IDL • Numerical RPC protocol w/ Netsolve • Integration w/ CORBA • Utilize Globus DUROC in metaserver load management? • Security • local security (prev. ports) & global security (SSL) • preliminary performance

Recent Ninf Developments (2) • Matrix Workshop • Collab. w/ Matrix Market • Automatically get gen. Matrix from Matrix Market and Matrix Workshop • MPI backend • Run on Wiz and RWC cluster • Automatic data distribution planned • Other backends e.g. Condor? • Demo at SC98 • Fluid Dynamics • Run on RWC cluster, J90/cluster in Japan

Summary • Proposed a Simulation Model and a Simulator for HPDC-Grid Environment • Obtained almost equal results to real measurements, validating the effectiveness of the model • Evaluated basic scheduling algorithms using the model, and results were quantitatively similar to observation w/real measurements

Issues • Collaboration: Bridging the gap between Grid systems • Ninf-Netsolve experience • Various Grid collaborations happening • Agreement of Interfaces • Network Protocols • IDLs and other description • Library APIs • Data Formats

Standardization Due? • Interfaces are important • e.g., PVM vs. MPI • Success stories • Various Network Protocols • Programming Languages: C++, Java, etc. • SQL, HTML, etc. • Drawbacks • Standardizing “too early”

Related Issue - Coping with “Industrial Standards” • CORBA example • Easy to say “CORBA is not appropriate for the Grid” • Is this true? What is exactly missing? • Need real technical underpinnings • Most Grid systems support the ORB of CORBA as a transport • Higher-level services - CORBAServices

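Future Work • Improve the validity of the simulation model • Account for fluctuations in real networks • Diversify the job-processing disciplines at the computing servers • Round-Robin, etc. • Reduce simulation cost • Evaluate other scheduling methods for high-performance wide-area computing systems • Propose more suitable scheduling methods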
