
CrossGrid After the First Year: A Technical Overview



Presentation Transcript


  1. CrossGrid After the First Year: A Technical Overview. Marian Bubak, Maciej Malawski, and Katarzyna Zając (X# TAT), Institute of Computer Science & ACC CYFRONET AGH, Kraków, Poland. www.eu-crossgrid.org

  2. Main Objectives • A new category of Grid-enabled applications: compute- and data-intensive, distributed, layered, with near real-time response (person in the loop) • New programming tools • A more user-friendly, secure and efficient Grid • Interoperability with other Grids • Implementation of standards

  3. CrossGrid in a Nutshell • Interactive, Compute and Data Intensive Applications: interactive simulation and visualization of a biomedical system; flooding crisis team support; distributed data analysis in HEP; weather forecasting and air pollution modeling • Tool Environment: MPI code debugging and verification; metrics and benchmarks; interactive and semiautomatic performance evaluation tools • Application Specific Services: User Interactive Services; Grid Visualization Kernel • New Generic Grid Services: portals and roaming access; scheduling agents; application and Grid monitoring; optimization of data access • DataGrid Services • Globus Middleware • Fabric

  4. Key Features of CG Applications • Data: data generators and databases are geographically distributed; data are selected on demand • Processing: interactive; requires large processing capacity, both HPC and HTC • Presentation: complex data requires versatile 3D visualisation; must support interaction and feedback to other components

  5. Biomedical Application • Adding small modifications to the proposed structure results in immediate changes in the blood flow. • Online presentation of simulation results via a 3D environment. • The progress of the simulation and the estimated time of convergence should be available for inspection. [Diagram labels: LB flow simulation, Visualization, Interaction, VE, WD, PC, PDA]

  6. Basic Characteristics of Flood Simulation • Meteorological: intensive simulation (HPC), large input/output data sets, high availability of resources • Hydrological: parametric simulations (HTC); may require different models (heterogeneous simulations) • Hydraulic: many 1-D simulations (HTC); 2-D hydraulic simulations require HPC [Diagram labels: Data sources, Meteorological simulations, Hydrological simulations, Hydraulic simulations, Output visualization, Users]

  7. Distributed Data Analysis in HEP • Objectives: distributed data access; distributed data mining techniques with neural networks • Issues: typical interactive requests will run on O(TB) of distributed data; transfer/replication times for the whole data set are on the order of one hour; data are transferred once, in advance of the interactive session; the corresponding database servers are allocated, installed and set up before the interactive session starts [Diagram labels: Portal, XML in/out, Replica Manager, Resource Broker, Interactive Session Manager, Interactive Session, Interactive Session Worker, DB Installation, Database server, On-line output, Distributed Processing]
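
The transfer-time issue on this slide can be made concrete with rough arithmetic. The sketch below computes the sustained rate needed to move a data set within one hour. The data set sizes and the use of decimal terabytes are illustrative assumptions, not figures from the presentation.

```python
def required_bandwidth_gbit_s(data_bytes: float, seconds: float) -> float:
    """Sustained rate in Gbit/s needed to move data_bytes within seconds."""
    return data_bytes * 8 / seconds / 1e9


TB = 1e12          # decimal terabyte (assumption)
ONE_HOUR = 3600.0  # seconds

# Hypothetical data set sizes, chosen only to show the scaling.
for size_tb in (1, 5, 10):
    rate = required_bandwidth_gbit_s(size_tb * TB, ONE_HOUR)
    print(f"{size_tb:>2} TB in one hour needs about {rate:.2f} Gbit/s sustained")
```

Even a single terabyte needs more than 2 Gbit/s sustained for a full hour, which is consistent with the slide's point that data should be transferred once, before the interactive session begins.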

  8. Weather Forecasting and Air Pollution Modeling • Distributed/parallel code on Grid • Coupled Ocean/Atmosphere Mesoscale Prediction System • STEM-II Air Pollution Code • Integration of distributed databases • Data mining applied to downscaling weather forecasts

  9. Initial version of X# architecture • Applications: 1.1 BioMed; 1.2 Flooding; 1.3 Interactive Distributed Data Access; 1.3 Data Mining on Grid (NN); 1.4 Meteo Pollution • Supporting Tools: 2.2 MPI Verification; 2.3 Metrics and Benchmarks; 2.4 Performance Analysis; 3.1 Portal & Migrating Desktop • Applications Development Support: MPICH-G; 1.1, 1.2 HLA and others • App. Spec Services: 1.1 User Interaction Services; 1.3 Interactive Session Services; 1.1 Grid Visualisation Kernel • Generic Services: 3.2 Scheduling Agents; 3.4 Optimization of Grid Data Access; 3.3 Grid Monitoring; 3.1 Roaming Access; DataGrid Replica Manager; Globus Replica Manager; DataGrid Job Submission Service; GRAM; GridFTP; GIS / MDS; GSI; Globus-IO; Replica Catalog • Fabric: Resource Manager (CE); Resource Manager (SE); Resource Manager; 3.4 Optimization of Local Data Access; CPU; Secondary Storage; Tertiary Storage; Instruments (Satellites, Radars)

  10. Project Phases • M 1 - 3: requirements definition and merging • M 4 - 12: first development phase: design, 1st prototypes, refinement of requirements • M 13 - 24: second development phase: integration of components, 2nd prototypes • M 25 - 32: third development phase: complete integration, final code versions • M 33 - 36: final phase: demonstration and documentation

  11. Tools • MPI code debugging and verification • Metrics and benchmarks for the Grid environment • Grid-enabled Performance Measurement [Diagram labels: Benchmarks, G-PM, Performance Prediction Component, High Level Analysis Component, Performance Measurement Component, User Interface and Visualization Component, Grid Monitoring, RMD, PMD, MPI Verification, MARMOT, Application source code, Applications executing on Grid testbed]

  12. MPI Verification • verifies the correctness of parallel, distributed Grid applications (MPI) • technical basis: the MPI profiling interface, which allows a detailed analysis of the MPI application [Diagram labels: Client Side, Application or Test Tool, MPI Profiling Interface, Core Tool, Server Side, Additional Process (Debug Server)]
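
The MPI profiling interface works by interposition: the MPI standard exposes every routine under a second, `PMPI_`-prefixed name, so a tool can supply its own `MPI_` routine that inspects the call and then forwards it to the real one. The sketch below is a toy Python analogue of that pattern only. `FakeMPI`, `VerifyingMPI` and the two checks are hypothetical names invented for illustration; this is not the verification tool shown on the slide and not a real MPI binding.

```python
class FakeMPI:
    """Hypothetical stand-in for the MPI library (the PMPI_ entry points)."""

    def __init__(self, size):
        self.size = size        # number of ranks in the communicator
        self.delivered = []     # messages that reached the library

    def pmpi_send(self, payload, dest, tag):
        self.delivered.append((dest, tag, payload))


class VerifyingMPI:
    """Wrapper layer: checks each call, then forwards it to the library."""

    def __init__(self, library):
        self._library = library
        self.call_log = []      # every call seen, valid or not
        self.errors = []        # human-readable problem reports

    def send(self, payload, dest, tag):
        self.call_log.append(("send", dest, tag))
        if not 0 <= dest < self._library.size:
            self.errors.append(f"send: invalid destination rank {dest}")
            return
        if tag < 0:
            self.errors.append(f"send: negative tag {tag}")
            return
        self._library.pmpi_send(payload, dest, tag)


mpi = VerifyingMPI(FakeMPI(size=4))
mpi.send(b"ok", dest=1, tag=0)    # forwarded to the library
mpi.send(b"bad", dest=7, tag=0)   # caught by the wrapper
print(mpi.errors)
```

The application code keeps calling `send` unchanged; only the layer it is linked against differs, which is what makes this kind of analysis possible without modifying application source.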

  13. Benchmark Categories • Micro-benchmarks: for identifying basic performance properties of Grid services, sites, and constellations • Micro-kernels: generic HPC/HTC kernels, including general and often-used kernels in Grid environments • Application kernels: characteristic of representative CG applications [Diagram labels: Portal, gbView, gbControl, gbARC, gbRMP, Grid Benchsuite, GPM, SE storage, Embedding, Invocation, Retrieval, Storage/Retrieval, Direct Invocation, Invocation/Collection through GPM]

  14. Performance Measurement Tool G-PM • Components: performance measurement component (PMC); component for high-level analysis (HLAC); component for performance prediction (PPC), based on analytical performance models of application kernels; user interface and visualization component (UIVC) [Diagram labels: UIVC, Interface, HLAC, Measurement Interface, PMC, OCM-G Interface, OCM-G]

  15. User Interactive Service • enables end users to run distributed simulations in the Grid environment and to steer those simulations in near real time • uses OGSA mechanisms to call external resource brokers and job submission services, for efficient and transparent execution of the simulation on the Grid [Diagram labels: Interaction GridService, Visualisation GridService, Simulation GridService, RTIExec GridService, Registry, OGSA, WSDL, RTI, Tuple Space, functionality description + dynamic discovery of OGSA Services, Large On-line Data transfer, Short Messages and Events, GridFTP, SOAP/IIOP, TCP or UDP/IP]

  16. Grid Visualization Kernel • addresses the problems of distributed visualization on heterogeneous devices • allows Grid applications to be connected easily and transparently to existing visualisation tools (AVS, OpenDX, VTK, ...) • handles multiple concurrent input data streams • multiplexes compressed data and images efficiently across long-distance networks [Diagram labels: Simulation, Init Visualization, Update Visualization, Simulation Data, GVK Portal Server, GVK Visualization Planner, GVK Visualization pipeline, GRAM, GASS, MDS]

  17. New Grid Services • Portals and roaming access • Grid resource management • Grid monitoring • Optimization of data access

  18. Roaming Access – Current Design • Portal: easier access to and use of the Grid by applications • Migrating Desktop: a transparent, independent user environment • Roaming Access Server: responsible for managing user profiles, job submission, file transfers and Grid monitoring [Diagram labels: Web Browser, Application Portal Server, Desktop Portal Server, LDAP DataBase, Roaming Access Server, Replica Manager, Scheduling Agent, Command Line, Benchmarks]

  19. Scheduling Agents - Current Design • scheduling user jobs over the CrossGrid testbed infrastructure • submission based on Condor-G • support for sequential and MPI parallel jobs, batch jobs and interactive jobs • priorities and preferences determined by the user for each job [Diagram labels: Web Portal, Resource Broker, Resource list, Scheduling Agent, Logging, Bookkeeping, Job monitoring & JSS commands, JSS / CondorG, CE]
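
One way to picture per-job priorities and preferences is a small priority queue that places the most urgent jobs first and honours a preferred computing element (CE) when it has room. The sketch below is an assumption-laden illustration: the `Job` fields, the `schedule` function, the CE names and the "most free CPUs" fallback are all hypothetical, and nothing here reflects the actual Scheduling Agent or Condor-G interfaces.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    priority: int                       # higher means more urgent (user-chosen)
    cpus: int = 1                       # more than 1 stands in for an MPI job
    preferred_ce: Optional[str] = None  # user preference, may be absent


def schedule(jobs, free_cpus):
    """Place jobs on CEs in priority order.

    free_cpus maps a CE name to its free CPU count.
    Returns a dict mapping each job name to a CE name, or None if it must wait.
    """
    free = dict(free_cpus)
    # Negate priority for a max-heap; the index breaks ties in submission order.
    heap = [(-job.priority, index, job) for index, job in enumerate(jobs)]
    heapq.heapify(heap)
    placement = {}
    while heap:
        _, _, job = heapq.heappop(heap)
        candidates = [ce for ce, count in free.items() if count >= job.cpus]
        if job.preferred_ce in candidates:
            chosen = job.preferred_ce
        elif candidates:
            chosen = max(candidates, key=lambda ce: free[ce])
        else:
            chosen = None
        if chosen is not None:
            free[chosen] -= job.cpus
        placement[job.name] = chosen
    return placement


jobs = [
    Job("flood-batch", priority=1),
    Job("biomed-interactive", priority=9, cpus=4, preferred_ce="ce-a"),
    Job("hep-mpi", priority=5, cpus=8),
]
print(schedule(jobs, {"ce-a": 4, "ce-b": 8}))
```

In this run the interactive job takes its preferred CE, the parallel job fills the other, and the low-priority batch job waits, which is the behaviour one would want for person-in-the-loop applications competing with batch work.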

  20. Application Monitoring • OCM-G components: Service Managers; Local Monitors • Application processes • Tool(s) • External name service: component discovery [Diagram labels: Tool, OMIS, Service Manager, External Localization, Local Monitor, Shared Memory, Application Process]

  21. Infrastructure Monitoring • Invasive monitoring (based on Jiro technology) • Non-invasive monitoring (Santa-G) [Diagram labels: Jiro Infrastructure Services, Jiro info, MDS, MDS info, Static info, Globus Information DB, Performance Information, Non-invasive Monitoring System, Post-processing, Instruments]

  22. Data Access Design • Selection of specialized components best suited for data access operations • Estimation of data access latency and bandwidth inside the storage elements • Faster access to large tape-resident data through fragmentation
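
Latency and bandwidth estimates combine naturally into a simple cost model: estimated access time = latency + size / bandwidth, with the component of lowest estimated time selected. The sketch below assumes that model; the slide does not state the formula actually used, and the component names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageComponent:
    name: str
    latency_s: float        # time before the first byte arrives
    bandwidth_mb_s: float   # sustained transfer rate


def estimated_access_time(component, size_mb):
    """Simple model: fixed start-up latency plus transfer time."""
    return component.latency_s + size_mb / component.bandwidth_mb_s


def best_component(components, size_mb):
    """Pick the component with the lowest estimated access time."""
    return min(components, key=lambda c: estimated_access_time(c, size_mb))


# Hypothetical figures, for illustration only.
components = [
    StorageComponent("disk-cache", latency_s=0.01, bandwidth_mb_s=50.0),
    StorageComponent("tape-library", latency_s=90.0, bandwidth_mb_s=200.0),
]

for size_mb in (100, 10_000):
    chosen = best_component(components, size_mb)
    seconds = estimated_access_time(chosen, size_mb)
    print(f"{size_mb} MB: {chosen.name}, about {seconds:.2f} s")
```

Under these made-up numbers the small request goes to the low-latency component and the large one to the high-bandwidth component. The same model also suggests why fragmentation helps: fetching only a 500 MB fragment from the tape component is estimated at about 92.5 s, against 140 s for the whole 10,000 MB file.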

  23. Current status of CG Architecture [Diagram layers: Applications, Supporting Tools, Application Specific Services, Generic Services. Diagram components: Application, Portal and Migrating Desktop, Tools, Benchmarks, User Interaction Services, Grid Visualization Kernel, Infrastructure Monitoring, Roaming Access, OCM-G, Scheduling Agent, DataGrid Data Management, DataGrid Job Management, Data Access, Globus Toolkit]

  24. Application-centric view

  25. The Current Testbed • The current CrossGrid testbed is based on: EDG distribution releases 1.2.2 and 1.2.3 (production); EDG distribution release 1.4.3 (validation) • The current infrastructure permits: installation of initial prototypes of CrossGrid software releases (described in M12 Deliverables); testing applications using Globus and EDG middleware, and MPI; achieving compatibility with DataGrid and therefore extending Grid coverage in Europe

  26. Grid Service • Transient, stateful Web Service (created dynamically) • Described by WSDL • Identified by a Grid Service Handle (GSH) in the form of a URI • Can be queried for configuration and state in a standard way, via the Service Data mechanism
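
The four properties on this slide (dynamic creation, limited lifetime, a URI handle, queryable state) can be sketched as a small in-memory model. Everything named below is hypothetical: `Factory`, `GridService`, `find_service_data` and the `example.org` URI are illustrative stand-ins, not the OGSA or Globus Toolkit API, and no WSDL or network layer is involved.

```python
import itertools
import time


class GridService:
    """A transient, stateful service instance (toy model)."""

    def __init__(self, handle, lifetime_s):
        self.handle = handle                              # URI-style identifier
        self.expires_at = time.monotonic() + lifetime_s   # end of lifetime
        self.service_data = {}                            # queryable state

    def is_alive(self):
        return time.monotonic() < self.expires_at

    def find_service_data(self, name):
        """Uniform way to query configuration and state by name."""
        return self.service_data.get(name)


class Factory:
    """Creates service instances on demand and resolves their handles."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self, lifetime_s=60.0):
        handle = f"{self.base_uri}/{next(self._ids)}"
        service = GridService(handle, lifetime_s)
        self.instances[handle] = service
        return service

    def resolve(self, handle):
        """Return the live instance for a handle, or None if gone or expired."""
        service = self.instances.get(handle)
        if service is not None and not service.is_alive():
            del self.instances[handle]
            return None
        return service


factory = Factory("http://example.org/ogsa/services/simulation")
simulation = factory.create_service(lifetime_s=60.0)
simulation.service_data["progress"] = 0.25
print(simulation.handle)
print(factory.resolve(simulation.handle).find_service_data("progress"))
```

A client that holds only the handle can look the instance up and read its state, and an instance whose lifetime has run out simply stops resolving, which is the lifetime-management behaviour the slide describes.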

  27. Why use OGSA • Standards • "to be part of the Grid = to implement OGSA Grid protocols" • Interoperability in heterogeneous environments • Possible contribution to future Grid activities

  28. Grid Services – where? • Dynamic service creation and lifetime management to control the state of some process, e.g.: • user session in a portal • data transfer • running simulation. • Service data model can be applied to monitoring systems that can be used as information providers for other services. • Service discovery – to solve the bootstrap problem: • to connect the modules of a distributed simulation • to connect the application to a monitoring system

  29. Steps towards OGSA • Using Web Service interfaces and XML where possible • Experimenting with prototyping services using OGSA alpha releases • Applying Grid Service extensions to services • Solving GT2 - GT3 transition and compatibility issues

  30. Summary • Achievements of the first project year: • Software Requirements Specifications written, together with use cases • CrossGrid Architecture defined • Detailed Design documents for tools and new Grid services written (OO approach, UML) • First prototype of software running and documented • Detailed description of the test and integration procedures created • Testbed set up
