
High Throughput Urgent Computing

Condor Week 2008. High Throughput Urgent Computing. Jason Cope (jason.cope@colorado.edu). Project collaborators: Argonne National Laboratory / University of Chicago (Pete Beckman, Suman Nadella, Nick Trebon); University of Wisconsin-Madison (Ian Alderman, Miron Livny). Urgent Computing Use Cases.


Presentation Transcript


  1. Condor Week 2008: High Throughput Urgent Computing. Jason Cope (jason.cope@colorado.edu)

  2. Project Collaborators
     • Argonne National Laboratory / University of Chicago
        • Pete Beckman
        • Suman Nadella
        • Nick Trebon
     • University of Wisconsin-Madison
        • Ian Alderman
        • Miron Livny

  3. Urgent Computing Use Cases

  4. High Throughput Urgent Computing
     • Urgent computing provides immediate, cohesive access to computing resources for emergency computations
     • Support for urgent high throughput computing environments is necessary
        • Support for high throughput emergency computing applications
        • Urgent cycle scavenging

  5. Resources for Urgent Computing Environments

  6. SPRUCE
     • Special PRiority Urgent Computing Environment (SPRUCE)
     • TeraGrid Science Gateway
     • http://spruce.teragrid.org
     • GOAL: Provide a cohesive urgent computing infrastructure for emergency computations
        • Authorization
        • Resource Selection
        • Resource Allocation
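The three goals on this slide (authorization, resource selection, resource allocation) can be read as a small pipeline. The Python sketch below illustrates that reading only; it is not SPRUCE's implementation, and the `Resource` fields, the urgency names, and the "earliest estimated start" selection rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical urgency names, ordered lowest to highest (an assumption, not SPRUCE's scheme).
URGENCY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Resource:
    name: str
    min_urgency: str          # lowest urgency this site agrees to honor (hypothetical)
    est_start_delay_s: float  # estimated wait before an urgent job starts (hypothetical)


def authorize(token_valid: bool) -> None:
    """Authorization: refuse to continue without a valid right-of-way token."""
    if not token_valid:
        raise PermissionError("no valid right-of-way token")


def select_resource(resources: List[Resource], urgency: str) -> Optional[Resource]:
    """Resource selection: keep sites that honor this urgency, pick the earliest start."""
    level = URGENCY_ORDER[urgency]
    eligible = [r for r in resources if URGENCY_ORDER[r.min_urgency] <= level]
    return min(eligible, key=lambda r: r.est_start_delay_s, default=None)


def allocate(resource: Resource, job_id: str) -> str:
    """Resource allocation: placeholder for handing the job to the site's resource manager."""
    return f"{job_id}@{resource.name}"


def plan_urgent_run(resources: List[Resource], urgency: str,
                    token_valid: bool, job_id: str) -> Optional[str]:
    """Run the three steps in order; None means no site accepts this urgency."""
    authorize(token_valid)
    chosen = select_resource(resources, urgency)
    return allocate(chosen, job_id) if chosen is not None else None
```

For example, with one site that only honors "high" requests and another that honors anything, a "low" request can only land on the second site, while a "high" request goes to whichever eligible site promises the earliest start.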

  7. SPRUCE Architecture Overview (1/2)
     [Diagram elements: Event; Automated Trigger; Human Trigger; First Responder; Right-of-Way Token; SPRUCE Gateway / Web Services]
     Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’
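The right-of-way token is the part of this architecture that carries authorization. A minimal sketch, assuming a token that stays inert until a trigger (automated or human) activates it, then remains valid for a fixed lifetime and permits requests up to a maximum urgency level. `RightOfWayToken` and its fields are hypothetical names, not SPRUCE's actual token format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RightOfWayToken:
    """Hypothetical model of a right-of-way token; all fields are assumptions."""
    token_id: str
    max_urgency: int                     # highest urgency level the holder may request
    lifetime_s: float                    # how long the token stays valid once activated
    activated_at: Optional[float] = None

    def activate(self, now: float) -> None:
        # The first activation starts the clock; later activations are ignored.
        if self.activated_at is None:
            self.activated_at = now

    def is_valid(self, now: float) -> bool:
        # Not yet activated, or past its lifetime, means not valid.
        if self.activated_at is None:
            return False
        return now < self.activated_at + self.lifetime_s

    def permits(self, urgency: int, now: float) -> bool:
        # A request is allowed only while valid and at or below the token's ceiling.
        return self.is_valid(now) and urgency <= self.max_urgency
```

Ignoring repeat activations is one design choice among several; it keeps a second trigger for the same event from silently extending the token's lifetime.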

  8. SPRUCE Architecture Overview (2/2)
     [Diagram elements: User Team Authentication; Choose a Resource; Urgent Computing Job Submission; Conventional Job Submission Parameters; Urgent Computing Parameters; SPRUCE Job Manager; Priority Job Queue; Local Site Policies; Supercomputer Resource]
     Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’
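The priority job queue and local site policies in this diagram can be sketched as follows, assuming integer urgency levels where 0 is a conventional job and a site policy that maps each level to an action. `PriorityJobQueue`, `SITE_POLICY`, and the action names are illustrative assumptions; each site defines its own policy.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Hypothetical local site policy: urgency level -> action the site takes.
# Level 0 is a conventional (non-urgent) job. These mappings are assumptions.
SITE_POLICY = {0: "normal", 1: "next-to-run", 2: "preempt"}


class PriorityJobQueue:
    """Higher urgency runs first; jobs at the same level keep submission order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, job_id: str, urgency: int = 0) -> str:
        """Queue a job and return the action the site policy prescribes for it.

        Acting on that action (for example, preempting a running job) is left
        to the caller in this sketch.
        """
        if urgency not in SITE_POLICY:
            raise ValueError(f"urgency {urgency} not allowed by local site policy")
        # heapq is a min-heap, so negate urgency to pop the most urgent job first.
        heapq.heappush(self._heap, (-urgency, next(self._seq), job_id))
        return SITE_POLICY[urgency]

    def next_job(self) -> Optional[str]:
        """Pop the next job to run, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Submitting two conventional jobs and then an urgent one leaves the urgent job at the head of the queue, while the conventional jobs keep their relative order behind it.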

  9. SPRUCE Resources
     • Deployed on TeraGrid resources at IU, NCSA, NCAR, Purdue, TACC, SDSC, and UC/ANL
     • Supported resource managers
        • PBS
        • PBS Pro
        • LSF
        • SGE
        • LoadLeveler
        • Cobalt
     • Local and Grid resource managers supported

  10. SPRUCE and Condor
     [Diagram elements: User Team Authentication; Choose a Resource; Urgent Computing Job Submission; Conventional Job Submission Parameters; Urgent Computing Parameters; SPRUCE Job Manager; Local Site Policies; Condor Pool]
     Adapted from Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’

  11. SPRUCE / Condor Integration
     • Added support for urgent computing ClassAd attributes
        • SPRUCE_URGENCY
        • SPRUCE_TOKEN_VALID
        • SPRUCE_TOKEN_VALID_CHECK_TIME
     • Modifications to the Condor schedd that support identifying SPRUCE jobs
     • SPRUCE Grid ASCII Helper Protocol (GAHP) Server
        • Asynchronously invokes SPRUCE Web service operations
        • GAHP calls integrated into the Condor schedd
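The three attribute names come from the slide, but their exact semantics do not. The sketch below assumes that `SPRUCE_TOKEN_VALID` caches the result of the last token check and `SPRUCE_TOKEN_VALID_CHECK_TIME` records when that check happened, and it models the GAHP server's asynchronous Web service calls with Python's `asyncio`. Job ads are plain dicts, `validate` stands in for the SPRUCE Web service operation, and `RECHECK_INTERVAL_S` is a hypothetical setting; none of these are Condor or SPRUCE APIs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

RECHECK_INTERVAL_S = 300.0  # hypothetical: how long a cached token check is trusted

JobAd = Dict[str, object]
Validator = Callable[[JobAd], Awaitable[bool]]


def is_spruce_job(ad: JobAd) -> bool:
    # Assumption: a job counts as urgent when it carries an urgency attribute.
    return "SPRUCE_URGENCY" in ad


async def refresh_token_status(ad: JobAd, validate: Validator, now: float) -> bool:
    """Reuse a recent cached answer, otherwise re-check the token and update the ad."""
    last = ad.get("SPRUCE_TOKEN_VALID_CHECK_TIME")
    if last is not None and now - last < RECHECK_INTERVAL_S:
        return bool(ad["SPRUCE_TOKEN_VALID"])
    valid = await validate(ad)  # stand-in for the asynchronous Web service call
    ad["SPRUCE_TOKEN_VALID"] = valid
    ad["SPRUCE_TOKEN_VALID_CHECK_TIME"] = now
    return valid


async def refresh_all(ads: List[JobAd], validate: Validator, now: float) -> List[bool]:
    """Run every needed check concurrently instead of blocking on each one in turn."""
    return list(await asyncio.gather(
        *(refresh_token_status(ad, validate, now) for ad in ads if is_spruce_job(ad))
    ))
```

Issuing the checks concurrently reflects why asynchronous invocation matters here: the scheduler should not stall on one slow Web service call while other jobs wait to be matched.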

  12. SPRUCE / Condor Integration

  13. SPRUCE / Condor Integration
     • SPRUCE provides an authorization mechanism for access to Condor resources
        • “Right-of-Way” access to Condor resources
        • Same authorization infrastructure for supercomputer and Grid resource access
     • Leverage existing Condor features to enhance scheduling policies
        • Job ranking / suspension / preemption
        • Site administrators define local scheduling policies
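In Condor itself, site administrators express these decisions through startd policy expressions such as `RANK`, `SUSPEND`, and `PREEMPT`. The Python sketch below only mimics that kind of logic to show how an urgency level could drive ranking, suspension, and preemption; the thresholds `SUSPEND_AT` and `PREEMPT_AT`, the `Slot` type, and the rule that idle slots rank first are hypothetical choices, not the project's actual policy.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical site-administrator thresholds (assumptions for illustration).
SUSPEND_AT = 1  # incoming urgency at or above this may suspend less urgent work
PREEMPT_AT = 2  # incoming urgency at or above this may preempt (evict) it


@dataclass
class Slot:
    name: str
    running_urgency: Optional[int]  # None means the slot is idle


def action_for(slot: Slot, incoming: int) -> str:
    """Decide what a newly arriving job of urgency `incoming` does on this slot."""
    if slot.running_urgency is None:
        return "start"
    if incoming <= slot.running_urgency:
        return "wait"  # never displace work that is at least as urgent
    if incoming >= PREEMPT_AT:
        return "preempt"
    if incoming >= SUSPEND_AT:
        return "suspend"
    return "wait"


def rank_slots(slots: List[Slot], incoming: int) -> List[Slot]:
    """Drop slots the job cannot claim; rank idle slots first, then least urgent victims."""
    usable = [s for s in slots if action_for(s, incoming) != "wait"]
    return sorted(usable, key=lambda s: -1 if s.running_urgency is None else s.running_urgency)
```

Ranking idle slots ahead of occupied ones means an urgent job only disturbs running work when no free capacity exists, and even then it displaces the least urgent job first.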

  14. SPRUCE / Condor Status
     • Prototype completed in August 2007
        • Demonstrated urgent authorization and scheduling capabilities
        • Deployed and tested on equipment at the University of Colorado
     • Currently revising the prototype for a stable software release
        • Condor 7.0 support
        • Final software development iteration before official release
        • Evaluation of SPRUCE-related software integrated into larger Condor pools

  15. Future Work
     • High throughput support for urgent computing applications
        • SURA SCOOP CH3D Grid Appliance
     • Many additional evaluation tasks
        • Application requirements
        • Security
        • Deadline scheduling / response time
        • Reliability / fault tolerance analysis
        • Data management

  16. High Throughput Urgent Computing
     Questions?
     jason.cope@colorado.edu
     http://spruce.teragrid.org
