
GRID activities at MTA SZTAKI


Presentation Transcript


  1. GRID activities at MTA SZTAKI Peter Kacsuk, MTA SZTAKI Laboratory of Parallel and Distributed Systems, www.lpds.sztaki.hu

  2. Contents • SZTAKI participation in EU and Hungarian Grid projects • P-GRADE (Parallel Grid Run-time and Application Development Environment) • Integration of P-GRADE and Condor • TotalGrid • Meteorology application by TotalGrid • Future plans

  3. Hungarian and international GRID projects (overview figure) • Hungarian projects: VISSZKI (Globus test, Condor test), DemoGrid (file system, monitoring, applications), SuperGrid (P-GRADE, portal, security, accounting) • Related EU and international projects and technologies: EU DataGrid, CERN LHC Grid, Cactus, Condor, APART-2, EU GridLab, EU COST SIMBEX

  4. EU Grid projects of SZTAKI • DataGrid – application performance monitoring and visualization • GridLab – grid monitoring and information system • APART-2 – leading the Grid performance analysis WP • SIMBEX – developing a European metacomputing system for chemists based on P-GRADE

  5. Hungarian Grid projects of SZTAKI • VISSZKI: explore and adopt Globus and Condor • DemoGrid: grid and application performance monitoring and visualization • SuperGrid (Hungarian Supercomputing Grid): integrating P-GRADE with Condor and Globus in order to provide a high-level program development environment for the Grid • ChemistryGrid (Hungarian Chemistry Grid): developing chemistry Grid applications in P-GRADE • JiniGrid (Hungarian Jini Grid): combining P-GRADE with Jini and Java; creation of the OGSA version of P-GRADE • Hungarian Cluster Grid Initiative: providing a nation-wide cluster Grid for universities

  6. Structure of the Hungarian Supercomputing Grid: NIIFI 2x64-processor Sun E10000, ELTE 16-processor Compaq AlphaServer, BME 16-processor Compaq AlphaServer, the SZTAKI 58-processor cluster and further university (ELTE, BME) clusters, connected by the 2.5 Gb/s Internet backbone

  7. The Hungarian Supercomputing GRID project (layered architecture) • GRID applications, accessed through a web based GRID portal • High-level parallel development layer: P-GRADE • Low-level parallel development: PVM, MW, MPI • Grid level job management: Condor-G • Grid middleware: Globus • Grid fabric: Condor and SGE running on the Compaq AlphaServers, the SUN HPC system and the clusters

  8. Distributed supercomputing: P-GRADE • P-GRADE (Parallel Grid Run-time and Application Development Environment) • A highly integrated parallel Grid application development system • Provides: • Parallel, supercomputing programming for the Grid • Fast and efficient development of Grid programs • Observation and visualization of Grid programs • Fault and performance analysis of Grid programs • Further development in the: • Hungarian Supercomputing Grid project • Hungarian Chemistry Grid project • Hungarian Jini Grid project

  9. Three layers of GRAPNEL

  10. Communication Templates • Pre-defined regular process topologies • process farm • pipeline • 2D mesh • User defines: • representative processes • actual size • Automatic scaling

  11. Mesh Template
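
  P-GRADE generates the underlying message-passing code from such templates. As a rough, hedged illustration (the generated code is not shown in the slides, so every name below is made up), a 2D mesh template corresponds roughly to the following C/MPI neighbour exchange:

      /* Hedged sketch: what a 2D mesh communication template could expand to
         in C/MPI. P-GRADE generates such code automatically; the names and the
         single halo-exchange step below are illustrative only. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char *argv[])
      {
          int rank, nprocs, up, down;
          int dims[2] = {0, 0}, periods[2] = {0, 0};
          double halo_out = 1.0, halo_in = 0.0;
          MPI_Comm mesh;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          /* "Actual size" of the template: a balanced dims[0] x dims[1] grid. */
          MPI_Dims_create(nprocs, 2, dims);
          MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &mesh);

          /* Each "representative process" only talks to its mesh neighbours;
             the second dimension (left/right) would be handled the same way. */
          MPI_Cart_shift(mesh, 0, 1, &up, &down);

          /* One halo-exchange step along the first dimension. */
          MPI_Sendrecv(&halo_out, 1, MPI_DOUBLE, down, 0,
                       &halo_in,  1, MPI_DOUBLE, up,   0,
                       mesh, MPI_STATUS_IGNORE);

          printf("rank %d received %.1f\n", rank, halo_in);
          MPI_Finalize();
          return 0;
      }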

  12. Hierarchical Debugging by DIWIDE

  13. Macrostep Debugging • Support for systematic debugging to handle the non-deterministic behaviour of parallel applications • Automatic deadlock detection • Replay technique with collective breakpoints • Systematic and automatic generation of Execution Trees • Testing parallel programs for every timing condition
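
  The replay part of macrostep debugging rests on making non-deterministic receives reproducible. A minimal, hedged sketch of that record/replay idea in C/MPI (an illustration of the technique, not the DIWIDE implementation; the logging format is invented):

      #include <mpi.h>
      #include <stdio.h>

      static FILE *order_log;     /* opened elsewhere, one log per process */
      static int   replay_mode;   /* 0 = record run, 1 = deterministic replay */

      /* Wrapper for wildcard receives: in record mode the actual sender is
         logged, in replay mode the logged sender is enforced, so the run can
         be reproduced macrostep by macrostep up to a collective breakpoint. */
      int logged_recv(void *buf, int count, MPI_Datatype type, int tag,
                      MPI_Comm comm, MPI_Status *status)
      {
          int source = MPI_ANY_SOURCE;
          if (replay_mode)
              fscanf(order_log, "%d", &source);

          int rc = MPI_Recv(buf, count, type, source, tag, comm, status);

          if (!replay_mode)
              fprintf(order_log, "%d\n", status->MPI_SOURCE);
          return rc;
      }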

  14. GRM semi-on-line monitor • Monitoring and visualising parallel programs at GRAPNEL level • Evaluation of long-running programs based on semi-on-line trace collection • Support for the debugger in P-GRADE by execution visualisation • Collection of both statistics and event traces • Application monitoring and visualization in the Grid • No loss of trace data at program abortion; the execution can be visualised up to the point of abortion

  15. PROVE Statistics Windows • Profiling based on counters • Analysis of very long running programs is enabled
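
  Counter-based profiling is what keeps very long runs analysable: only a handful of numbers per process grow, not a full event trace. A tiny illustrative C sketch of such counters (names invented, not the PROVE internals):

      #include <stdio.h>

      /* A few per-process counters instead of a full event trace. */
      typedef struct {
          long   msgs_sent;
          long   bytes_sent;
          double comm_seconds;
      } comm_stats;

      static comm_stats stats;

      void count_send(long bytes, double elapsed)
      {
          stats.msgs_sent++;
          stats.bytes_sent   += bytes;
          stats.comm_seconds += elapsed;
      }

      void dump_stats(int rank)
      {
          printf("process %d: %ld messages, %ld bytes, %.2f s in communication\n",
                 rank, stats.msgs_sent, stats.bytes_sent, stats.comm_seconds);
      }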

  16. PROVE: Visualization of Event Traces • User controlled focus on processors, processes and messages • Scrolling visualization windows forward and backwards

  17. Integration of Macrostep Debugging and PROVE

  18. Features of P-GRADE • Designed for non-specialist programmers • Enables fast reengineering of sequential programs for parallel computers and Grid systems • Unified graphical support in program design, debugging and performance analysis • Portability on: • supercomputers • heterogeneous clusters • components of the Grid • Two execution modes: • Interactive mode • Job mode

  19. P-GRADE Interactive Mode: typical usage on supercomputers or clusters. Development cycle: Design/Edit -> Compile -> Map -> Debug -> Monitor -> Visualize

  20. P-GRADE Job Mode with Condor: typical usage on clusters or in the Grid. Development cycle: Design/Edit -> Compile -> Condor Map -> Submit job, with Attach and Detach to connect to or disconnect from the running job
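
  The slides do not show the submit step itself; as a hedged example, a Condor submit description for a P-GRADE-generated PVM application might look roughly like this (the universe, file names and machine count are assumptions, not taken from the talk):

      # Illustrative Condor submit description; all names and values are made up.
      universe      = PVM
      executable    = grapnel_app
      machine_count = 1..8
      output        = grapnel_app.out
      error         = grapnel_app.err
      log           = grapnel_app.log
      queue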

  21. Condor/P-GRADE on the whole range of parallel and distributed systems: supercomputers, mainframes, clusters and the Grid (connected by Condor flocking), scaling up to GFlops performance
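
  Flocking itself is configured on the Condor side rather than in P-GRADE; a minimal sketch of the configuration knobs involved (host names are placeholders, and a real setup also needs matching security settings):

      # On the pool that submits the jobs: where they may flock to.
      FLOCK_TO   = condor.remote-pool.example.org
      # On the pool that accepts them: who may flock in.
      FLOCK_FROM = condor.home-pool.example.org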

  22. Berlin CCGrid Grid Demo workshop: flocking of P-GRADE programs by Condor. The same P-GRADE program runs at the Budapest, Madison and Westminster clusters, moved between the Condor pools by flocking

  23. Next step: check-pointing and migration of P-GRADE programs (controlled from the P-GRADE GUI in Wisconsin): 1. the P-GRADE program is downloaded to London as a Condor job and runs at the London cluster; 2. the London cluster becomes overloaded, so the program is check-pointed; 3. the P-GRADE program migrates to Budapest as a Condor job; 4. the program runs at the Budapest cluster

  24. Further development: TotalGrid • TotalGrid is a total Grid solution that integrates the different software layers of a Grid (see next slide) and provides companies and universities with: • exploitation of the free cycles of desktop machines in a Grid environment outside working hours • supercomputer capacity built from the institution's existing desktops, without further investment • development and testing of Grid programs

  25. Layers of TotalGrid: P-GRADE, PERL-GRID, Condor or SGE, and PVM or MPI, running over the Internet and local Ethernet

  26. PERL-GRID • A thin layer for Grid level job management between P-GRADE and various local job managers (Condor, SGE, etc.) and for file staging • Applied in the Hungarian Cluster Grid

  27. Hungarian Cluster Grid Initiative • Goal: To connect the 99 new clusters of the Hungarian higher education institutions into a Grid • Each cluster contains 20 PCs and a network server PC. • Day-time: the components of the clusters are used for education • At night: all the clusters are connected to the Hungarian Grid by the Hungarian Academic network (2.5 Gbit/sec) • Total Grid capacity by the end of 2003: 2079 PCs • Current status: • About 400 PCs are already connected at 8 universities • Condor-based Grid system • VPN (Virtual Private Network) • Open Grid: other clusters can join at any time

  28. Structure of the Hungarian Cluster Grid: in 2003, 99 Linux clusters of 21 PCs each (2079 PCs in total), each running Condor (=> TotalGrid), connected by the 2.5 Gb/s academic Internet backbone

  29. Live demonstration of TotalGrid • MEANDER Nowcast Program Package • Goal: ultra-short-range forecasting (30 minutes) of dangerous weather situations (storms, fog, etc.) • Method: analysis of all the available meteorological information to produce parameters on a regular mesh (10 km -> 1 km) • Collaborative partners: OMSZ (Hungarian Meteorology Service) and MTA SZTAKI

  30. Structure of MEANDER (data-flow figure): the inputs are ALADIN first guess data, SYNOP data, and satellite, radar and lightning observations; decoding, the CANARI analysis, the delta analysis, and the radar-to-grid and satellite-to-grid steps produce the basic fields (pressure, temperature, humidity, wind) and the derived fields (type of clouds, visibility, overcast, rainfall state, etc.) on the grid for the current time; the results are visualized as GIF images for end users and in HAWK for meteorologists

  31. P-GRADE version of MEANDER (application graph figure; the 25x, 10x, 25x and 5x labels mark the multiplicities of the replicated process groups)

  32. Implementation of the Delta method in P-GRADE

  33. Live demo of MEANDER based on TotalGrid (figure): the netCDF input is fetched from ftp.met.hu, the job is passed from P-GRADE through PERL-GRID to CONDOR-PVM for parallel execution, and the netCDF output is returned for HAWK visualization; the sites are connected by an 11/5 Mbit dedicated link, a 34 Mbit shared link and a 512 kbit shared link

  34. Results of the delta method • Temperature fields at 850 hPa pressure • Wind speed and direction on the 3D mesh of the MEANDER system

  35. On-line performance visualization in TotalGrid (figure): the same setup as the live demo (P-GRADE, PERL-GRID, CONDOR-PVM, netCDF input, 11/5 Mbit dedicated, 34 Mbit and 512 kbit shared links), with GRM trace data collected during the parallel execution and sent back for visualization

  36. PROVE performance visualisation

  37. P-GRADE: software development and execution on the Grid (figure): editing and debugging, performance analysis, and execution

  38. Applications in P-GRADE Completed applications • Meteorology: Nowcast package (Hungarian Meteorology Service) • Urban traffic simulation (Univ. of Westminster) Applications under development • Chemistry applications • Smog forecast system • Analysis of smog alarm strategies

  39. Further extensions of P-GRADE • Automatic check-pointing of parallel applications inside a cluster (already prototyped) • Dynamic load-balancing at cluster level • Fault-tolerant execution mechanism • Automatic check-pointing of parallel applications in the Grid (under development) • Automatic application migration in the Grid • Fault-tolerant execution mechanism in the Grid • Saving unfinished parallel jobs of the Cluster Grid • Extensions under design: parameter study support, connecting P-GRADE with GAT, and a workflow layer for complex Grid applications

  40. Workflow interpretation of MEANDER (figure): the application is decomposed into five jobs (1st to 5th job) executed as a workflow

  41. Conclusions • SZTAKI participates in the largest EU Grid projects and in all the Hungarian Grid projects • Main results: • P-GRADE (SuperGrid project) • Integration of P-GRADE and Condor (SuperGrid), demo at Berlin CCGrid • TotalGrid (Hungarian Cluster Grid) • Meteorology application in the Grid based on the P-GRADE and TotalGrid approaches, demo at the 5th EU DataGrid conference • Access to P-GRADE 8.2.2: www.lpds.sztaki.hu

  42. Thanks for your attention! Further information: www.lpds.sztaki.hu

  43. GRM semi-on-line monitor • Semi-on-line: stores trace events in local storage (off-line) and makes them available for analysis at any time during execution on user or system request (on-line pull model) • Advantages: • the state (performance) of the application can be analysed at any time • scalability: trace data can be analysed in smaller sections and deleted when no longer needed • less overhead/intrusion on the execution system than with on-line collection (see NetLogger) • less network traffic: pull model instead of push model, with collections initiated only from the top
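
  A minimal, hedged C sketch of the semi-on-line idea: events are appended to a cheap local buffer and shipped only when the main monitor pulls them. Buffer size, names and the transport are assumptions, not the GRM code:

      #include <stdio.h>
      #include <string.h>

      #define BUF_SIZE 65536

      static char   trace_buf[BUF_SIZE];
      static size_t trace_len = 0;

      /* Called by the instrumented application process: cheap local append. */
      void trace_event(const char *event)
      {
          size_t n = strlen(event);
          if (trace_len + n + 1 <= BUF_SIZE) {
              memcpy(trace_buf + trace_len, event, n);
              trace_buf[trace_len + n] = '\n';
              trace_len += n + 1;
          }
      }

      /* Called only when the main monitor requests data (the "pull"): the
         buffered events are shipped and the local section can be dropped. */
      void flush_on_pull(FILE *to_main_monitor)
      {
          fwrite(trace_buf, 1, trace_len, to_main_monitor);
          fflush(to_main_monitor);
          trace_len = 0;
      }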

  44. GRM/PROVE in the DataGrid project • Basic tasks: • Step 1: create a GRM/PROVE version that is independent of P-GRADE and runnable in the Grid • Step 2: connect the GRM monitor to the R-GMA information system

  45. GRM in the Grid (figure): a Main Monitor (MM) on the submit machine writes the trace file read by PROVE; on each site (PCs at Site 1 and Site 2), the application processes pass events through shared memory to a per-host Local Monitor (LM). The pull model gives smaller network traffic than NetLogger, and the local monitors make the system more scalable than NetLogger

  46. Start-up of Local Monitors (figure): the Main Monitor and its GUI run on the server host, and the job is submitted from the submit host; when the Grid broker and the local job manager start the application processes on a site, a Local Monitor is started on each execution host and connects back to the Main Monitor using its host:port address over the LAN/WAN. This mechanism is used in TotalGrid and in the live demo

  47. 2nd step: integration with R-GMA (figure): the application processes on the site machines publish their monitoring data into R-GMA, and the Main Monitor and PROVE on the client machine retrieve it from R-GMA

  48. Integration with R-GMA (figure): the instrumented application code contains a sensor that publishes trace events through the Producer API and the Producer Servlet; the Main Monitor retrieves them through the Consumer API and the Consumer Servlet; the Registry and Schema servlets (Registry API, Schema API) act as the "database of event types". Data is transferred as XML, and the interface is SQL-like: producers issue SQL CREATE TABLE and SQL INSERT, consumers issue SQL SELECT
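
  As a rough illustration of that SQL-flavoured interface (the table and column names are invented for this example, not taken from the actual R-GMA schema):

      -- A producer declares an event-type table and publishes rows into it.
      CREATE TABLE TraceEvent (appName VARCHAR(64), processId INTEGER,
                               eventType VARCHAR(32), eventTime REAL);
      INSERT INTO TraceEvent VALUES ('meander_delta', 12, 'msg_send', 1052298000.25);

      -- A consumer such as the Main Monitor selects the events it needs.
      SELECT * FROM TraceEvent WHERE appName = 'meander_delta';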
