
Developing Computational Grid Portals Based on the GridPort Toolkit

Developing Computational Grid Portals Based on the GridPort Toolkit. Mary Thomas Texas Advanced Computing Center (TACC) (mthomas@tacc.utexas.edu) Presented at the 2002 NPACI Parallel Computing Institute, August 19-23, 2002, San Diego, CA. Outline. What is the Grid and why do we need it?



Presentation Transcript


  1. Developing Computational Grid Portals Based on the GridPort Toolkit Mary Thomas Texas Advanced Computing Center (TACC) (mthomas@tacc.utexas.edu) Presented at the 2002 NPACI Parallel Computing Institute, August 19-23, 2002, San Diego, CA

  2. Outline • What is the Grid and why do we need it? • What are computing portals? • The GridPort Toolkit • Programming Example • References

  3. HPC/Computational Science Environment is Complex • Environment is rich, varied, and changes: • Users have access to a variety of distributed resources (compute, storage, etc.), lifetimes ~5 years. • Interfaces, OS’s, compiler and debugging tools to these resources vary and change often. • Policies at sites differ, allocations change. • Using multiple resources can be cumbersome • Code compiling, tuning, optimization is difficult: • Users are reluctant to port code, and who can blame them • Need for simplification: • most research scientists are not computer scientists.

  4. http://www.npaci.edu

  5. Defining Grid Computing • “Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation.” • Foster, Kesselman, et al., “Anatomy of the Grid” (www.globus.org) • “Grids are persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations.”

  6. George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) • Goals: • Shift emphasis of earthquake engineering research from physical testing to integrated experimentation, computation, theory, databases, and model-based simulation • Improve seismic design and performance of civil and mechanical infrastructure systems. • Large NSF Program (MRE): • An $81.9 million, 5-year program (proposed in 1999). • Funding (millions): $7.7 (’00), $28.14 (’01), $24.4 (’02) • Includes approximately 20 major sites • To be developed by September 30, 2004 • Operational through September 30, 2014

  7. Distributed Terascale Facility http://www.neesgrid.org

  8. National Virtual Observatory for Earthquake Engineering Research http://www.neesgrid.org

  9. “Simple” Campus Grid

  10. What is a Portal? • Web sites that provide centralized access to a set of resources • Characterized by • Personalization • Security/authentication/authorization • What you see often changes based on what you are looking for (e.g., ads) • Navigation/choices • Gateway for Web access to distributed resources/information • Hub from which users can locate all the Web content that they commonly need.

  11. Portals Provide Simple Interfaces • Portals are web based, and that has advantages: • Users know & understand the web • Can serve as a layer in the middle-tier infrastructure of the Grid • Integrate various Grid services and resources • Users can be isolated from resource-specific details • Single web interface isolates system changes/differences • Not an end-all solution: several issues/challenges remain • Not all apps will run well on a grid (performance, scalability)

  12. Portals Integrate Grid + Web

  13. The Grid Portal (GridPort) Toolkit (http://gridport.npaci.edu) • Based on architecture developed for NPACI HotPage • Focus on computational scientists and application developers • Support development of application-level, customized science portals • Facilitate seamless web-based access to distributed compute resources and grid services • Built with commodity technologies: • Sits on top of the middle tier of the Grid • A web interface to these services • Set of simple, modular services and tools.

  14. GridPort Toolkit Requirements • Create a reusable toolkit that: • reduces complexity and is easy to install, port, adapt • integrates existing grid research & development efforts • minimizes burden on remote resources/admins • Base software design on common infrastructure • Use commodity WWW technologies wherever possible • Use Grid services and applications where available • Innovate solutions for the remainder • Provide security at all layers: • SSL, HTTPS, GSI certificate (PKI) • Any site should be able to host a user portal • Any user should be able to create their own user portal if they have accounts and a certificate

  15. Web Technologies Employed • Client Requirements: • Any browser: Communicator, IE > 4.0 • Support simple JavaScript, HTTPS, SSL • Server Requirements: • HTTP, HTTPS, SSL, Perl/CGI, SSH, FTP, Grid software • Netscape or Apache servers, Unix, Linux port (summer ’02) • Based on simple technology, this software is easily ported to, and used by, other sites. • Easy to modify and adapt to local site policies and requirements • Goal is to design a toolkit that is simple to implement, support, port, and develop

  16. Grid Technologies Employed • Security: • Globus Grid Security Infrastructure (GSI), SSH • MyProxy for remote proxies • Job Execution: • Globus/GRAM Gatekeeper, used to run interactive jobs and tasks on remote resources • Information: • Grid Information Services based primarily on proprietary information provider scripts and the Globus MDS 2.1 (Grid Resource Information Service/GRIS) • File Management: • SDSC Storage Resource Broker (SRB) for file collection management • GridFTP

  17. GridPort Architecture

  18. GridPort Layers • Client Layer: • consumers of Grid computing portals, typically Web browsers, PDAs, or even applications • Portals Layer: • application portals run on standard Web servers and process client requests and responses • Portal Services Layer: • services for application portals, including managing session state, portal accounts and file collections, and monitoring • Grid Services (Technologies) Layer: • software components and services needed to handle requests from the portal services • Resources Layer: • compute, archival, network, other

  19. Current Portal Services • Login/logout to grid services, single sign-on • Jobs: • Submit/cancel jobs to queues • Monitor jobs and track them • Web-based batch script builders • Command execution: • Any UNIX commands • Files: • Directory listing, file transfer/archival • File upload & download • SRB integration, default collections for users • GridFTP • Accounts: • Personalization • Webnewu, reslist

  20. GridPort “Interactive” Services

  21. GridPort + SRB Architecture

  22. GridPort Multi-Application Arch.

  23. Variety of GridPort Applications • NPACI/PACI HotPages (also @PACI/NCSA) • https://hotpage.npaci.edu • LAPK Portal: Pharmacokinetic Modeling (live demo of Pharmacokinetic Modeling Portal) • https://gridport.npaci.edu/LAPK • NBCR Portals: • GAMESS (General Atomic and Molecular Electronic Structure System): https://gridport.npaci.edu/GAMESS • AMBER: http://gridport.npaci.edu/Amber • Telescience (Ellisman) • https://gridport.npaci.edu/Telescience • Protein Data Bank CE Portal (Phil Bourne) • https://gridport.npaci.edu/CE

  24. Informational Services • Vertical portal to NPACI Resources and Services: • News/events, documentation, training, consulting • Simple tools: • application search, systems information • generation of batch scripts for all compute resources • Provides dynamic information for each resource: • Status Bar: live updates/operational status/utilization • Machine Usage: summary of machine status, load, queues • Queue Summaries: displays currently executing and queued jobs • Node Maps: graphical map of running applications mapped to nodes • Network Weather Service: connectivity information between a user’s local host and grid resources • Pulled from 3 possible sources: • MDS, web services, local cron jobs

  25. HotPage View: Job Submission

  26. Laboratory for Applied Pharmacokinetics (LAPK) Portal • Users are doctors, so an extremely simple interface is needed • Must be portable: run from many countries • Need to hide details such as • Type of resources (T3E), file storage, batch script details, compilation, UNIX command line • Major Success: • LAPK users can now run multiple jobs at one time using the portal. • Not possible before because developers had to keep codes & scripts simple enough for doctors to use on the T3E

  27. Laboratory for Applied Pharmacokinetics • Uses gridport.npaci.edu portal services/capabilities: • File upload/download between local host/portal/HPC systems • Job Submit: • submission (builds batch script, moves files to resource, submits jobs) • Job tracking: in the background, the portal tracks jobs on the system and moves results back to portal storage when done • Job cancel/delete • Job History: maintains relevant job information
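The "builds batch script" step can be pictured with a minimal sketch. The slides do not show GridPort's own builder, so the function name `build_batch_script`, the field names, and the PBS-style directives below are all assumptions chosen for illustration; a real site would emit whatever its scheduler expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not GridPort code): turn form-style job parameters
# into a PBS-style batch script and return it as one string.
sub build_batch_script {
    my (%job) = @_;

    # Numeric fields must be plain positive integers.
    for my $field (qw(cpus max_time)) {
        die "invalid $field\n"
            unless defined $job{$field} && $job{$field} =~ /\A[1-9][0-9]*\z/;
    }
    die "invalid queue\n" unless defined $job{queue} && $job{queue} =~ /\A\w+\z/;
    die "missing exe\n"   unless defined $job{exe} && length $job{exe};
    my $args = defined $job{args} ? $job{args} : '';

    # max_time is given in minutes; the walltime directive wants HH:MM:SS.
    my $walltime = sprintf "%02d:%02d:00",
        int($job{max_time} / 60), $job{max_time} % 60;

    return join "\n",
        "#!/bin/sh",
        "#PBS -q $job{queue}",
        "#PBS -l nodes=$job{cpus}",
        "#PBS -l walltime=$walltime",
        "$job{exe} $args",
        "";
}

print build_batch_script(
    queue => 'normal', cpus => 4, max_time => 90,
    exe   => './mpi_pi', args => '1000',
);
```

Keeping the builder a pure function that returns a string makes it easy to preview the script in the web page before anything is moved to the compute resource.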

  28. LAPK Job Submit and Job History

  29. Programming Example: Job Submit • Client: • Example of Client HTML page • HTML Code • Server: • Perl/CGI parser script running on server • GridPort Toolkit function code

  30. HotPage View: Job Submission

  31. JobSubmit Web Page

  32. JobSubmit HTML Code
      <FORM action="https://hotpage.npaci.edu/tools/cgi-bin/job_submit.cgi"
            method="post" enctype="application/x-www-form-urlencoded" name="job_submit">
        Arguments: <INPUT TYPE="text" NAME="args">
        Select Queue:
        <SELECT NAME="queue">
          <OPTION VALUE="low">low
          <OPTION VALUE="normal">normal
          <OPTION VALUE="high">high
          <OPTION VALUE="express">express
        </SELECT>
        Number of CPUs: <INPUT TYPE="text" NAME="cpus">
        Max Time (min): <INPUT TYPE="text" NAME="max_time">
        <INPUT TYPE="hidden" NAME="mach" VALUE="SSPHN">
        <INPUT TYPE="hidden" NAME="exe" VALUE="/rmount/paci/sdsc/mthomas/mpi_pi">
        <INPUT TYPE="submit">
      </FORM>
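The form posts its fields as `application/x-www-form-urlencoded`. In the slides the server relies on CGI.pm to decode that body; the hand-rolled sketch below is only meant to show what the encoding looks like on the wire. The function name `parse_form_body` and the sample values are assumptions, not part of GridPort.

```perl
use strict;
use warnings;

# Decode one application/x-www-form-urlencoded body into a hash.
# Simplified: a repeated key keeps only its last value.
sub parse_form_body {
    my ($body) = @_;
    my %field;
    for my $pair (split /[&;]/, $body) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
        }
        $field{$name} = $value;
    }
    return %field;
}

# A body a browser might send for the job-submit form (values are made up).
my $body = 'args=1000+-v&queue=normal&cpus=4&max_time=30'
         . '&mach=SSPHN&exe=%2Frmount%2Fpaci%2Fsdsc%2Fmthomas%2Fmpi_pi';

my %field = parse_form_body($body);
print "$_ = $field{$_}\n" for sort keys %field;
```

Note that the hidden `mach` and `exe` fields travel in the same body as the visible ones, so a client can change them as easily as it can change the queue.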

  33. JobSubmit: Server Perl/CGI Parser • Grabs the HTTP/CGI data, sends it to the GridPort subroutine, and waits for results
      #!/usr/local/bin/perl
      use CGI qw(:all);
      my $query = CGI->new;
      $| = 1;    # unbuffered output
      BEGIN {
          ### GET THE SCRIPT'S LOCATION AND THE GLOBAL VARS ###
          $MY_LOCATION = "tools/cgi-bin";
          chomp($CURRENT_DIR = `pwd`);    # drop the trailing newline from pwd
          ($PORTAL_ROOT, $rest) = split(/$MY_LOCATION/, $CURRENT_DIR);
          $GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi";
          require "$GLOBAL_VARS_CONFIG";
          require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi";
      }

  34. JobSubmit: Server Perl/CGI code (cont.)
      # load in code to do job submission through globus
      require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi";
      # subroutines to get/set user directories (home, work, current) and do job handling
      require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi";
      require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi";
      # read the form fields (names match the HTML form)
      my $args     = $query->param('args');
      my $queue    = $query->param('queue');
      my $cpus     = $query->param('cpus');
      my $max_time = $query->param('max_time');
      my $mach     = $query->param('mach');
      my $exe      = $query->param('exe');
      $exe = $exe . " $args";
      # run the command through Globus, trap output, return to caller process
      my @output = gridport_globus_job_submit($mach, $cpus, 60, $exe, $max_time, $queue);
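Because the form values are concatenated into a command line, a portal would normally check them before use. The slides do not show such a check, so the sketch below is an assumption: `validate_job_params` is a hypothetical name, the queue list is copied from the HTML form, and the permitted character set for arguments is an illustrative policy.

```perl
use strict;
use warnings;

# Queue names offered by the HTML form; anything else is rejected.
my %ALLOWED_QUEUE = map { $_ => 1 } qw(low normal high express);

# Hypothetical check, not part of GridPort: returns a list of error
# strings, empty when every field looks safe to pass on.
sub validate_job_params {
    my (%p) = @_;
    my @errors;

    push @errors, "unknown queue"
        unless defined $p{queue} && $ALLOWED_QUEUE{ $p{queue} };

    for my $field (qw(cpus max_time)) {
        push @errors, "$field must be a positive integer"
            unless defined $p{$field} && $p{$field} =~ /\A[1-9][0-9]*\z/;
    }

    # Arguments end up on a command line, so allow only a conservative
    # character set (an assumed policy; adjust per application).
    push @errors, "args contains unsupported characters"
        if defined $p{args} && $p{args} !~ m{\A[\w .,=/-]*\z};

    return @errors;
}

my @errors = validate_job_params(
    queue => 'normal', cpus => '4', max_time => '30', args => '1000; rm -rf ~',
);
print @errors ? "rejected: @errors\n" : "ok\n";
```

Returning a list of messages rather than dying on the first problem lets the portal page report every bad field in one response.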

  35. gridport_globus_job_submit
      sub gridport_globus_job_submit {
          my @job  = ();
          my $user = &get_username();
          ### get the input and set up globus
          my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_;
          &globus_config($user);    # verify data
          &mach_config($mach);      # verify data
          # build the globus command
          my $globus_submit = "$globus_job_submit{$machines{$mach}{gv}} ";
          $globus_submit .= "$machines{$mach}{name}{job} -np $cpus -queue $queue ";
          $globus_submit .= "-maxtime $max_cpu_time $exe";
          @job = run_command_timeout($globus_submit, $timeout);    # run job
          return @job;
      }
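The subroutine hands the finished command string to `run_command_timeout`, whose body is not shown in the slides. One common way to write such a helper in Perl uses `alarm` inside an `eval`; the sketch below is a guess at the general shape under that assumption, not GridPort's implementation, and is named `run_command_timeout_sketch` to keep that clear.

```perl
use strict;
use warnings;

# Hypothetical stand-in for run_command_timeout: run a command, collect
# its output lines, and give up after $timeout seconds.
sub run_command_timeout_sketch {
    my ($command, $timeout) = @_;
    my @output;
    my ($pid, $pipe);    # declared outside eval so cleanup can reach them

    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        $pid = open($pipe, '-|', $command) or die "cannot run: $!\n";
        @output = <$pipe>;
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;

    if (!$ok) {
        # Stop the child first: closing a piped handle waits for it to exit.
        # If the command went through a shell, this signals the shell only.
        kill 'TERM', $pid if $pid;
        close $pipe if $pipe;
        return ("ERROR: $err");
    }
    close $pipe;
    return @output;
}

print run_command_timeout_sketch('echo submitted', 5);
print run_command_timeout_sketch('sleep 10', 1);
```

The second call should come back with an error line after about one second instead of blocking for ten, which is the behaviour a web request waiting on a remote gatekeeper needs.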

  36. Related Portal Toolkits • Commercial: • Sun: Java Servlets, iPlanet • IBM: WebSphere • MSFT: .NET • Special interest groups: • uPortal, Javaspeed • R&D within Grid community: • GridPort Toolkit (http://gridport.npaci.edu, Perl) • Grid Portal Development Kit (GPDK, Java) • Gateway (Java, Geoffrey Fox) • CCA (Gannon) • Java CoG (http://www.globus.org/cog, G. von Laszewski) • PyGlobus (Keith Jackson, LBL)

  37. GridPort Team • The GridPort Project represents collaboration efforts spanning TACC, SDSC, NPACI: • Mary Thomas, Jay Boisseau, Rich Toscano, Shyamal Mitra (TACC) • Steve Mock, Maytal Dahan, Cathie Mills, Kurt Mueller (SDSC) • And input from other institutions: • Universities: Dr. Jim Browne, UT/CS; Drs. Dennis Gannon, Geoffrey Fox, and Marlon Pierce (Univ. of Indiana) • Argonne/ISI: Globus development team • NCSA/Alliance: Randy Butler, Doru Marcusiou, others • NASA/IPG, DoE SciDAC funding • GGF/GCE, Interoperable Web Services Testbed

  38. References • NPCI 2002 Tutorial: • http://gridport.npaci.edu/npci2002 • This tutorial, plus install and programming tutorials • GridPort Toolkit: • Contact: Mary Thomas (mthomas@tacc.utexas.edu) • Project website: https://gridport.npaci.edu • HotPage User Portals • https://hotpage.npaci.edu, https://hotpage.paci.org • Downloads • http://gridport.npaci.edu/download • GridPort Toolkit, NPACI HotPage
