
The Problem Solving Environments of TeraGrid, Science Gateways, and the Intersection of the Two






Presentation Transcript


  1. The Problem Solving Environments of TeraGrid, Science Gateways, and the Intersection of the Two. Jim Basney (1), Stuart Martin (2), JP Navarro (2), Marlon Pierce (3), Tom Scavo (1), Leif Strand (4), Tom Uram (2,5), Nancy Wilkins-Diehr (6), Wenjun Wu (2), Choonhan Youn (6). (1) National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; (2) Argonne National Laboratory; (3) Indiana University; (4) California Institute of Technology; (5) University of Chicago; (6) San Diego Supercomputer Center, University of California at San Diego

  2. TeraGrid, what is it? A unique combination of fundamental CI components: • Dedicated high-speed, cross-country network • Staff and advanced support • 20 petabytes of storage • 2 PetaFLOPS of computation • Visualization

  3. Gateways, what are they? Problem Solving Environments for Science • Portal or client-server interfaces to high-end resources • Developments on the web and the explosion of digital data have increased the importance of the internet and the web for science • Only 16 years since the availability of web browsers • Developments in web technology: from static HTML to CGI forms to the wikis and social web pages of today • Full impact on science yet to be felt • The web usage model resonates with scientists • But persistence is needed if the web is to have a profound impact on science (this is key for all PSEs) • TeraGrid provides common infrastructure for gateway developers

  4. TeraGrid’s Infrastructure for Gateways • Problem • Local compute resources are typically not enough for Gateways • Goal • Make it easy to use any TeraGrid site from a Gateway • Approach • Provide a set of client APIs and command line tools for use in Gateways/portals • Maintain and deploy a set of common services on each site • Maintain and deploy some central services

  5. Infrastructure Capabilities • Information Discovery • Find deployed services • Get details about the compute resources • Data Management • Move data to and from compute resources • Execution Management • Submit and monitor remote computational jobs • Security • Make sure secure access is in place with all services and tools

  6. Security • Based on Grid Security Infrastructure (GSI) • Uses X.509 PKI • End entity certificates (e.g., issued to a person or host) • User proxy certificates (valid for a limited period of time) • Enables single sign-on to all TeraGrid resources • Enables delegation • Users/clients can disconnect and let services perform actions securely on their behalf • Integrated into grid middleware services • User Portal, MyProxy, GSISSH, GridFTP, GRAM, MDS, RFT, etc.
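To make the proxy-credential model concrete, here is a minimal sketch (not from the talk) of single sign-on from a script: one short-lived proxy is created up front, after which GSI-enabled tools reach multiple resources without further prompts. The host names are illustrative assumptions.

    import subprocess

    # Derive a short-lived X.509 proxy credential (12-hour lifetime)
    # from the user's end entity certificate.
    subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)

    # With the proxy in place, GSI-enabled tools authenticate without
    # prompting again: single sign-on across TeraGrid resources.
    # (Both host names below are hypothetical.)
    for host in ["login1.example.teragrid.org", "login2.example.teragrid.org"]:
        subprocess.run(["gsissh", host, "hostname"], check=True)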

  7. GSI in Action • [Diagram: on the GT4 client, grid-proxy-init derives an X.509 proxy certificate and proxy credential from the user's end entity credential; the Globus WS client presents the proxy certificate to a Globus web service running in the Java WS container on the GT4 server, where a gridmap file maps the identity to a local account.]

  8. Single Sign-On

  9. Gateway Workflow with GSISSH • Client does: • myproxy-logon (once) • Move files with gsiscp • Submit job with gsissh and LRM commands • [Diagram: gateway jobs flow over GSISSH to GSISSH services on Resource A and Resource B, each submitting local jobs to its scheduler (e.g., PBS or LSF) and compute nodes.]
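A minimal sketch of that client-side workflow, driving the same command-line tools from Python; the MyProxy server, user name, resource host, and job script are illustrative assumptions, not values from the talk.

    import subprocess

    MYPROXY_SERVER = "myproxy.example.org"       # assumed MyProxy host
    RESOURCE = "login.resource-a.example.org"    # assumed TeraGrid login node

    # 1. Retrieve a short-lived proxy credential (done once per session).
    subprocess.run(["myproxy-logon", "-s", MYPROXY_SERVER, "-l", "gatewayuser"],
                   check=True)

    # 2. Stage input files to the remote resource over GSI.
    subprocess.run(["gsiscp", "input.dat", RESOURCE + ":run/input.dat"], check=True)

    # 3. Submit the job with the site's local resource manager (PBS here).
    result = subprocess.run(["gsissh", RESOURCE, "qsub", "run/job.pbs"],
                            check=True, capture_output=True, text=True)
    print("Submitted job:", result.stdout.strip())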

  10. Remote Execution Management • Grid Resource Allocation and Management (GRAM) • Provides an abstraction layer on top of various local resource managers (PBS, Condor, LSF, SGE, …) • Defines a common job description language • Client API and command-line tools to asynchronously access remote LRMs • Fault tolerant • GSI security • "Job" workflow: • File staging before and after job execution • File cleanup afterwards • File staging requires delegation
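As a hedged illustration of the common job description language, the snippet below writes a minimal GT4 WS GRAM job description and submits it with the globusrun-ws client; the factory host is hypothetical, and the flags reflect typical GT4 usage rather than a verified TeraGrid recipe.

    import subprocess

    # Minimal GT4 WS GRAM job description: run /bin/echo via the remote LRM.
    job_xml = """<job>
      <executable>/bin/echo</executable>
      <argument>hello from GRAM</argument>
      <stdout>${GLOBUS_USER_HOME}/gram-test.out</stdout>
    </job>"""

    with open("job.xml", "w") as f:
        f.write(job_xml)

    # Submit through the ManagedJobFactoryService on a (hypothetical) host,
    # requesting the PBS factory type; GRAM translates this into a PBS job.
    subprocess.run(["globusrun-ws", "-submit",
                    "-F", "https://gram.example.teragrid.org:8443/wsrf/services/ManagedJobFactoryService",
                    "-Ft", "PBS",
                    "-f", "job.xml"], check=True)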

  11. Traditional LRM Interaction • Satisfies many users and use cases • TACC's Ranger (62,976 cores!) is the Costco of HTC ;-), one-stop shopping, so why do we need more? • [Diagram: local jobs go straight to the scheduler (e.g., PBS) and compute nodes of Resource A.]

  12. GRAM Benefit • Adds remote execution capability • Enables clients/devices to manage jobs from off the cluster (Gateways!) • [Diagram: remote GRAM4 jobs arrive through the gramJob API at the GRAM4 service on Resource A, which submits local jobs to the scheduler (e.g., PBS) and compute nodes.]

  13. GRAM Benefit • Provides scheduler abstraction • [Diagram: the same gramJob API submits GRAM4 jobs to GRAM4 services on Resource A and Resource B, which front different schedulers (e.g., PBS and LSF) and their compute nodes.]

  14. Gateway Perspective • Scalable job management • Interoperability • [Diagram: one gramJob API client fans GRAM4 jobs out to many resources, each running a GRAM4 service in front of its own scheduler and compute nodes.]

  15. Data Management - GridFTP • High-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks • GSI security • Third-party transfers • Parallel transfers • Striping • Lots of small files (LOSF) support • Can outperform other file transfer methods like scp • Limited in that it does not queue and throttle requests • Needs a reliable higher-level service, hence RFT
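A brief sketch of a third-party GridFTP transfer with the globus-url-copy client, using parallel TCP streams; both endpoints are hypothetical, and a valid proxy credential is assumed to already be in place.

    import subprocess

    # Third-party transfer: the client orchestrates a server-to-server copy
    # between two GridFTP endpoints, using 4 parallel TCP streams (-p 4).
    subprocess.run(["globus-url-copy", "-p", "4",
                    "gsiftp://siteA.example.org/data/run42/input.dat",
                    "gsiftp://siteB.example.org/scratch/run42/input.dat"],
                   check=True)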

  16. Data Management - RFT • Reliable File Transfer • Adds reliability on top of GridFTP • GSI security • Throttles requests • Retries non-fatal transfer errors • Resumes transfers from the last known position • Requires delegation in order to contact GridFTP servers on the user's behalf

  17. Science Gateway with Community Credential • [Diagram: a web browser authenticates to the gateway's web interface (WebAuthn); the gateway's webapp, running in a Java WS container, derives a proxy credential from a community credential, and its WS GRAM client presents the proxy certificate to the resource provider's WS GRAM service, which maps it to a community account.]

  18. GridShib-enabled GSI • [Diagram: on the GT4 client, GridShib SAML Tools bind a SAML assertion to the proxy certificate derived from the end entity credential; on the GT4 server, the GridShib SAML PIP in the Java WS container (with GridShib for GT) extracts the SAML into the security context for the Globus web service, where policy is evaluated and access is logged.]

  19. GridShib-enabled Science Gateway • [Diagram: the browser user authenticates to the gateway's web interface (WebAuthn); the webapp uses GridShib SAML Tools to bind the user's attributes and username as SAML to a proxy credential derived from the community credential; the WS GRAM client passes the proxy certificate to the resource provider's WS GRAM service, where the GridShib SAML PIP places the SAML in the security context for policy evaluation and logging.]

  20. Information Management • TeraGrid's Integrated Information Services are a network of web services that aggregate the availability of TeraGrid capability kits, software, and services across all the infrastructure providers • Where are the job submission, file-transfer, and login services needed by Gateways? • What is the queue status and estimated delay for each resource? • What are the available testbeds (non-production / experimental software)? • What are the Gateways (problem solving environments) available to users? (A sketch of a simple query against these services follows.)
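The next slides mention a WS/REST HTTP GET interface to these services; as a purely illustrative sketch, a gateway might answer "where can I submit jobs?" with a query like the one below. The resource path is a hypothetical placeholder; only the info.teragrid.org host appears in the talk.

    import urllib.request

    INFO_BASE = "http://info.teragrid.org"   # TeraGrid-wide information services
    # NOTE: this path is a hypothetical illustration, not a documented endpoint.
    url = INFO_BASE + "/services/gram"

    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode())          # e.g., a listing of GRAM services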

  21. High-Level Components • [Diagram: WS/REST HTTP GET clients reach the TeraGrid-wide information services through Apache 2.0, backed by a cache and Tomcat/WebMDS; WS/SOAP clients use the TeraGrid-wide WS MDS4 service; each service provider runs its own WS MDS4 information services, which feed the TeraGrid-wide layer.]

  22. High-Availability Design • [Diagram: clients reach the TeraGrid-wide information services at info.teragrid.org over static paths, or at info.dyn.teragrid.org over dynamic paths managed by TeraGrid dynamic DNS, in front of the service provider information services; server failover propagates globally in 15 minutes.]

  23. Today, there are approximately 29 gateways using the TeraGrid

  24. Selected Highlights from the PSE08 paper • The Social Informatics Data (SID) Grid • The Geosciences Network (GEON) • QuakeSim • Computational Infrastructure for Geodynamics (CIG) • Conclusions

  25. Social Informatics Data Grid • Heavy use of "multimodal" data • A subject might be viewing a video while a researcher collects heart rate and eye movement data • Events must be synchronized for analysis, and large datasets result • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov

  26. How does SIDGrid use the TeraGrid? • Computationally intensive tasks • Speech, gesture, facial expression, and physiological measurements • Media transcoding for pitch analysis of audio tracks • Once stored in raw form, data streams are converted to formats compatible with software for annotation, coding, integration, and analysis • fMRI image analysis • Workflows for massive job submissions and data transfers using the Virtual Data System (VDS) • Workflows converted to a concrete execution plan via the Pegasus grid planner • TeraGrid information service (MDS) • Replica Location Service (RLS) • DAGMan and Condor-G/GRAM (a sketch of a Condor-G submission follows)
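Since the slide names Condor-G/GRAM as the submission layer, here is a hedged sketch of what a single Condor-G grid-universe job might look like; the gatekeeper contact and file names are hypothetical, and the real SIDGrid workflows are generated by VDS/Pegasus rather than written by hand.

    import subprocess

    # A classic Condor-G submit description: route one transcoding job
    # through a (hypothetical) Globus GRAM gatekeeper to a remote PBS cluster.
    submit = "\n".join([
        "universe      = grid",
        "grid_resource = gt2 tg-login.example.org/jobmanager-pbs",
        "executable    = transcode.sh",
        "output        = transcode.out",
        "error         = transcode.err",
        "log           = transcode.log",
        "queue",
    ]) + "\n"

    with open("transcode.sub", "w") as f:
        f.write(submit)

    # Condor-G manages the remote GRAM job like any local Condor job.
    subprocess.run(["condor_submit", "transcode.sub"], check=True)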

  27. The goal of GEON is • to advance the field of geoinformatics and • to prepare and train current and future generations of geoscience researchers, educators, and practitioners in the use of cyberinfrastructure to further their research, education, and professional goals. • GEON provides several key features • data access, computational simulations, personal work spaces, and analysis environments • identifying best practices with the objective of dramatically advancing geoscience research and education.

  28. How does GEON use the TeraGrid? • Computationally intensive tasks • Rapidly construct earth models, access observed earthquake recordings, and run simulations to understand subsurface structure and the characteristics of seismic wave propagation • SYNSEIS (SYNthetic SEISmogram generation tool) provides access to seismic waveform data and simulates seismic records using 2D and 3D models • Conducts advanced calculations for simulating seismic waveforms of either earthquakes or explosions at regional distances (< 1000 km) • GSI (security), GAMA (account management), GridFTP (data transfer), GRAM (job submission), MyWorkspace (job monitoring) • Account management for classroom use; the MyProjects collaboration tool and tagging also serve students

  29. QuakeSim - Some Design Choices • Build portals out of portlets (Java standard) • Reuse capabilities from our Open Grid Computing Environments (OGCE) project, the REASoN GPS Explorer project, and many TeraGrid Science Gateways • Decorate with Google Maps, Yahoo UI gadgets, etc. • Use Java Server Faces to build individual component portlets • Build standalone tools, then convert to portlets at the very end • Use simple Web Services for accessing codes and data • Keep It Stateless … • Use Condor-G and Globus job and file management services for interacting with high performance computers • TeraGrid • Favor Google Maps and Google Earth for their simplicity, interactivity, and open APIs • Generate KML and GeoRSS • Use an Apache Maven-based build system and SVN on SourceForge

  30. [Architecture diagram: a browser interface speaks HTTP(S) to portlets with client stubs, which call WSDL-described services over SOAP/HTTP: a DB service (JDBC to the QuakeTables databases on Host 1), job submission/monitoring and file services atop operating and queuing systems (grid resources on Host 2), and a visualization or map service (Google Maps on Host 3).]

  31. Two Approaches to the Middle Tier • [Diagram: with a fat client, the portal component embeds a grid client and speaks the grid protocol (SOAP) directly to the grid service on the backend resource; with a thin client, the portal component calls an intermediate web service over HTTP + SOAP, and that service hosts the grid client that speaks the grid protocol (SOAP) to the grid service.]

  32. Daily RDAHMM Updates • Daily analysis and event classification of GPS data from REASoN's GRWS.

  33. Disloc output converted to KML and plotted.

  34. GeoFEST Finite Element Modeling portlet and plotting tools

  35. SWARM • [Diagram: desktop users and web portal/gateway-style applications call a standard web service interface; a request manager, with the QBET web service, resource ranking manager, data model manager, and fault manager (hosted by UCSB), maintains per-user job boards, job queues, and resource pools in an RDBMS; a job distributor holding tokens for resources X, Y, and Z hands work to a job execution manager (Condor-G with Birdbath), which runs it on grid-style and Condor high performance computing clusters, using a MyProxy server hosted by the TeraGrid project.] • "SWARM: Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters," S. L. Pallickara and M. E. Pierce, Friday, December 12, 2 p.m. to 2:30 p.m. http://escience2008.iu.edu/sessions/SWARM.shtml

  36. Computational Infrastructure for Geodynamics (CIG) • Membership-governed organization • 40 institutional members, 9 foreign affiliates • Supports and promotes Earth science by developing and maintaining software for computational geophysics

  37. How does CIG use the TeraGrid? • Seismograms allow scientists to understand ground motion • Computationally intensive simulations, run on TeraGrid using an assortment of 3D and 1D earth models, produce synthetic seismograms • Necessary input datasets provided via the portal • A daemon (Python, Pyre) constantly polls the web site looking for work to do (see the sketch after this list) • Uses GSI-OpenSSH and MyProxy credentials to submit jobs, monitor jobs, and transfer output back to the portal • Sends status updates to the web site using HTTP POST • Users can download results in ASCII and Seismic Analysis Code (SAC) format • Visualizations include "beachball" graphics depicting the earthquake's source mechanism, and maps showing the locations of the earthquake and the seismic stations, using GMT (http://gmt.soest.hawaii.edu/) • Researchers quickly receive results and can concentrate on the scientific aspects of the output rather than on the details of running the analysis on a supercomputer • Future directions • Parameter explorations • Custom earth models for users
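A minimal sketch of that polling daemon in plain Python (the real daemon is built on Pyre); the portal URL, endpoint names, and JSON work-list format are hypothetical, and gsissh stands in for the GSI-OpenSSH submission step.

    import json, subprocess, time
    import urllib.parse, urllib.request

    PORTAL = "https://portal.example.org"   # hypothetical portal base URL

    def post_status(job_id, status):
        # Report progress back to the web site via HTTP POST.
        data = urllib.parse.urlencode({"job": job_id, "status": status}).encode()
        urllib.request.urlopen(PORTAL + "/status", data)

    while True:
        # Poll the portal for pending simulation requests (hypothetical endpoint).
        with urllib.request.urlopen(PORTAL + "/pending") as resp:
            pending = json.load(resp)
        for job in pending:
            # Submit the simulation on a TeraGrid resource over GSI-OpenSSH;
            # a MyProxy-derived proxy credential is assumed to be in place.
            subprocess.run(["gsissh", job["resource"], "qsub", job["script"]],
                           check=True)
            post_status(job["id"], "submitted")
        time.sleep(60)   # poll once a minute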

  38. Conclusions • Technical requirements of some PSEs dictate seamless access to high-end compute and data resources • A robust, flexible and scalable infrastructure can provide a foundation for many PSEs • PSEs themselves must be treated as sustainable infrastructure • Researchers will not truly rely on PSEs for their work unless they have confidence that the PSE will remain operational for the long term and provide reliable services
