
TeraGrid National CyberInfrastructure for Scientific Research


Presentation Transcript


  1. TeraGrid National CyberInfrastructure for Scientific Research Philip Blood Senior Scientific Specialist Pittsburgh Supercomputing Center April 23, 2010

  2. What is the TeraGrid? The TeraGrid (TG) is the world’s largest open scientific discovery infrastructure, providing: • Computational resources • Data storage, access, and management • Visualization systems • Specialized gateways for scientific domains • Centralized user services, allocations, and usage tracking • All connected via high-performance networks

  3. TeraGrid Governance • 11 Resource Providers (RPs) funded under individual agreements with NSF • Mostly different: start and end dates, goals, and funding models • 1 Coordinating Body – Grid Integration Group (GIG) • University of Chicago/Argonne • Subcontracts to all RPs and six other universities • ~10 Area Directors lead coordinated work across TG • ~18 Working Groups with members from many RPs work on day-to-day issues • Requirements Analysis Teams (RATs) formed to handle short-term issues • TeraGrid Forum sets policies and is responsible for the overall TeraGrid • Each RP and the GIG votes in the TG Forum Slide courtesy of Dan Katz

  4. Who Uses TeraGrid (2008)

  5. TeraGrid Objectives • DEEP Science: Enabling Petascale Science • Make science more productive through integrated set of advanced computational resources • Address key challenges prioritized by users • WIDE Impact: Empowering Communities • Bring TeraGrid capabilities to the broad science community • Partnerships with community leaders, “Science Gateways” to make access easier • OPEN Infrastructure, OPEN Partnership • Provide a coordinated, general purpose, reliable set of services • Free and open to U.S. scientific research community and their international partners

  6. Introduction to the TeraGrid • TG Portal and Documentation • Compute & Visualization Resources • More than 1.5 petaflops of computing power • Data Resources • Can obtain allocations of data storage facilities • Over 100 Scientific Data Collections made available to communities • Science Gateways • How to Apply for TG Services & Resources • User Support & Successes • Central point of contact for support of all systems • Personal User Support Contact • Advanced Support for TeraGrid Applications (ASTA) • Education and training events and resources

  7. TeraGrid User Portal • Web-based single point of contact for: • Access to your TeraGrid accounts and allocated resources • Interfaces for data management, data collections, and other user tasks and resources • Access to the TeraGrid Knowledge Base, Help Desk, and online training

  8. TeraGrid User Portal: portal.teragrid.org Many features (certain resources, documentation, training, consulting, allocations) do not require a portal account!

  9. portal.teragrid.org Documentation Find Information about TeraGrid www.teragrid.org • Click “Knowledge Base” link for quick answers to technical questions • Click “User Info” link to go to www.teragrid.org--> User Support • Science Highlights • News and press releases • Education, outreach and training events and resources

  10. portal.teragrid.orgResources Resources by Category • Shows status of systems currently in production • Click on names for more info on each resource

  11. www.teragrid.orgUserSupportResources Resources by Site • Complete listing of TG resources (including those not yet running) • Scroll through list to see details on each resource • Click on icons to go to user guides for each resource

  12. A few examples of different types of TG resources... Slide courtesy of Dan Katz

  13. Massively Parallel Resources • Ranger@TACC • First NSF 'Track2' HPC system • 504 TF • 15,744 Quad-Core AMD Opteron processors • 123 TB memory, 1.7 PB disk • Kraken@NICS (UT/ORNL) • Second NSF 'Track2' HPC system • 1 PF Cray XT5 system • 16,512 compute sockets, 99,072 cores • 129 TB memory, 3.3 PB disk • Blue Waters@NCSA • NSF Track 1 • 10 PF peak • Coming in 2011 Slide courtesy of Dan Katz

  14. Shared Memory Resources • Pople@PSC: • SGI Altix system • 768 Itanium 2 cores • 1.5 TB global shared memory • Primarily for large shared memory and hybrid applications • Nautilus@NICS • SGI UltraViolet • 1024 cores (Intel Nehalem) • 16 GPUs • 4 TB global shared memory • 1 PB file system • Visualization and analysis • Ember@NCSA (coming in September) • SGI UltraViolet (1536 Nehalem cores)

  15. Visualization & Analysis Resources • Longhorn@TACC: • Dell/NVIDIA Visualization and Data Analysis Cluster • a hybrid CPU/GPU system • designed for remote, interactive visualization and data analysis, but it also supports production, compute-intensive calculations on both the CPUs and GPUs via off-hour queues • TeraDRE@Purdue: • Subcluster featuring NVIDIA GeForce 6600GT GPUs • Used for rendering graphics with Maya, POV-ray, and Blender (among others) • Spur@TACC: • Sun Visualization Cluster • 128 compute cores / 32 NVIDIA FX5600 GPUs • Spur is intended for serial and parallel visualization applications that take advantage of large per-node memory, multiple computing cores, and multiple graphics processors.

  16. Other Specialized TG Resources Data-Intensive Computing • Dash@SDSC: • 64 Intel Nehalem compute nodes (512 cores) • 4 I/O nodes (32 cores) • vSMP (virtual shared memory) aggregates memory across 16 nodes, allowing applications to address 768 GB • 4 TB of flash memory (fast file I/O subsystem or fast virtual memory) Heterogeneous CPU/GPU Computing • Lincoln@NCSA: • 192 Dell PowerEdge 1950 nodes (1536 cores) • 96 NVIDIA Tesla S1070 units High Throughput Computing • Condor Pool@Purdue • Pool of more than 27,000 processors • Various architectures and operating systems • Excellent for parameter sweeps and serial applications (see the sketch below)
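To make the parameter-sweep use case concrete, here is a minimal sketch (in Python) of generating a Condor submit description and handing it to the scheduler. The executable name, parameter values, and file names are hypothetical placeholders, not details from the slide.

```python
# Minimal sketch: submit one Condor job per parameter value.
# "run_sim", the temperatures, and all file names are hypothetical.
import subprocess

temperatures = [300, 310, 320, 330]  # hypothetical sweep values

# Classic Condor submit syntax: repeat arguments/queue once per job.
lines = [
    "universe = vanilla",
    "executable = run_sim",
    "output = sweep.$(Process).out",
    "error  = sweep.$(Process).err",
    "log    = sweep.log",
]
for t in temperatures:
    lines.append(f"arguments = {t}")
    lines.append("queue")

with open("sweep.sub", "w") as f:
    f.write("\n".join(lines) + "\n")

# Hand the submit description to the Condor scheduler.
subprocess.run(["condor_submit", "sweep.sub"], check=True)
```

Each queued job runs independently, which is exactly the serial, embarrassingly parallel pattern a large heterogeneous Condor pool suits best.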

  17. Data Storage Resources • Global File System • GPFS-WAN • 700 TB disk storage at SDSC, historically mounted at a few TG sites • Licensing issues prevent further use • Data Capacitor (Lustre-WAN) • Mounted on a growing number of TG systems • 535 TB storage at IU, including databases • Ongoing work to improve performance and authentication infrastructure • Another Lustre-WAN implementation being built by PSC • pNFS is a possible path for global file systems, but is far from being production-ready • Data Collections • Allocable storage at SDSC and IU (files, databases) for collections used by communities • Tape Storage • Allocable resources available at IU, NCAR, NCSA, SDSC • Most sites provide "free" archival tape storage with compute allocations • Access is generally through GridFTP, via the portal or the command line (see the sketch below) Adapted from slide by Dan Katz
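As an illustration of command-line GridFTP access, here is a minimal Python sketch that wraps the globus-url-copy client. The hostname and paths are hypothetical placeholders; real endpoints and grid-certificate authentication would come from your TeraGrid site.

```python
# Minimal sketch: pull an archived file over GridFTP with globus-url-copy.
# The server name and paths below are hypothetical, not real TG endpoints.
import subprocess

src = "gsiftp://archive.example.org/home/user/run42.tar"  # hypothetical
dst = "file:///scratch/user/run42.tar"                    # local destination

# -vb prints transfer performance; -p 4 requests 4 parallel TCP streams.
subprocess.run(["globus-url-copy", "-vb", "-p", "4", src, dst], check=True)
```

Giving two gsiftp:// URLs instead of a file:// destination requests a third-party transfer, moving data directly between two GridFTP servers without staging it locally.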

  18. portal.teragrid.orgResourcesData Collections Data collections represent permanent data storage that is organized, searchable, and available to a wide audience, either a for a collaborative group or the scientific public in general

  19. What is a Science Gateway? • A Science Gateway • Enables scientific communities of users with a common scientific goal • Uses high performance computing • Has a common interface • Leverages community investment • Three common forms: • Web-based portals • Application programs running on users' machines but accessing services in TeraGrid • Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

  20. How can a Gateway help? • Make science more productive • Researchers use same tools • Complex workflows • Common data formats • Data sharing • Bring TeraGrid capabilities to the broad science community • Lots of disk space • Lots of compute resources • Powerful analysis capabilities • A nice interface to information

  21. Gateway Highlight: nanoHUB Harnesses TeraGrid for Education • Nanotechnology education • Used in dozens of courses at many universities • Teaching materials • Collaboration space • Research seminars • Modeling tools • Access to cutting-edge research software

  22. Gateway Highlight: SCEC Produces Hazard Map • PSHA (Probabilistic Seismic Hazard Analysis) map for California, using the newly released Earthquake Rupture Forecast (UCERF2.0), calculated using the SCEC Science Gateway • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years • High-resolution map, significant CPU use

  23. How can I build a gateway? • Web information available: www.teragrid.org/programs/sci_gateways • How to turn your project into a science gateway • Details about current gateways • Guidance on writing a winning gateway proposal for a TeraGrid allocation • Download code and instructions • Building a simple gateway tutorial • Talk to us • Biweekly telecons to get advice from others • Potential assistance from TeraGrid staff • Nancy Wilkins-Diehr, wilkinsn@sdsc.edu • Vickie Lynch, lynchve@ornl.gov

  24. Some Current Science Gateways • Biology and Biomedicine Science Gateway • Open Life Sciences Gateway • The Telescience Project • Grid Analysis Environment (GAE) • Neutron Science Instrument Gateway • TeraGrid Visualization Gateway, ANL • BIRN • Open Science Grid (OSG) • Special PRiority and Urgent Computing Environment (SPRUCE) • National Virtual Observatory (NVO) • Linked Environments for Atmospheric Discovery (LEAD) • Computational Chemistry Grid (GridChem) • Computational Science and Engineering Online (CSE-Online) • GEON (GEOsciences Network) • Network for Earthquake Engineering Simulation (NEES) • SCEC Earthworks Project • Network for Computational Nanotechnology and nanoHUB • GIScience Gateway (GISolve) • Gridblast Bioinformatics Gateway • Earth Systems Grid • Astrophysical Data Repository (Cornell) Slide courtesy of Nancy Wilkins-Diehr

  25. portal.teragrid.orgResourcesScience Gateways Explore current TG science gateways from the TG User Portal

  26. How One Uses TeraGrid • 1. Get an allocation for your project through POPS • 2. The allocation PI adds users • 3. Use TeraGrid resources: via the User Portal, Science Gateways, or the command line, users reach compute services (HPC, HTC, CPUs, GPUs, VMs), data services, and visualization services at the Resource Providers (RP 1, RP 2, RP 3, ...), all tied together by shared TeraGrid infrastructure (accounting, network, authorization, ...) Slide courtesy of Dan Katz

  27. How to Get Started on the TeraGrid Two methods: • Direct • PIs on allocations must be researchers at US institutions • Postdocs can be PIs, but not graduate students • Decide which systems you wish to apply for • Read resource descriptions on www.teragrid.org • Send email to help@teragrid.org with questions • Create a login on the proposal system (POPS) • Apply for a Startup allocation on POPS • Total allocation on all resources must not exceed 200K SUs (core-hours) • Each machine has an individual limit for Startup allocations (check the resource description) • Campus Champion • Contact your Campus Champion and discuss your computing needs • Send them your contact information and you'll be added to the CC account • Experiment with various TG systems to discover the best ones for you • Apply for your own Startup account via the "Direct" method

  28. Proposal System: https://pops-submit.teragrid.org The POPS welcome page provides a What's New area, references to guides, policies, and resources, a summary of how to use the system, and deadline dates

  29. You can get a lot for a little! By submitting an abstract, your CV, and filling out a form, you get: • A Startup allocation • Up to 200,000 SUs (core-hours) on TG systems for one year • That is the equivalent of 8,333 days (22.8 years) of processing time on a single core (see the quick check below) • Access to consulting from TeraGrid personnel regarding your computational challenges • Opportunity to apply for Advanced Support • Requires an additional 1-page justification of your need for advanced support • Can be done together with your Startup request, or at any time after that
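The quoted single-core equivalence is simple arithmetic, as this quick check confirms:

```python
# Back-of-the-envelope check of the Startup allocation figure above.
sus = 200_000          # service units, i.e. core-hours
days = sus / 24        # continuous single-core days
years = days / 365.25
print(f"{days:.0f} days = {years:.1f} years")  # -> 8333 days = 22.8 years
```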

  30. Access to resources • Terminal: ssh, gsissh • Portal: TeraGrid User Portal, Gateways • Once logged in to the portal, click on "Login" • Also, single sign-on (SSO) from the command line (see the scripted-access sketch below) Slide courtesy of Dan Katz
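For scripted, non-interactive terminal access, a minimal sketch using the paramiko SSH library is shown below. The hostname and username are hypothetical placeholders; key-based SSH is assumed, and GSI single sign-on via gsissh would replace this plain SSH client.

```python
# Minimal sketch: run a command on a (hypothetical) TG login node over SSH.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("login.hpc.example.edu", username="yourlogin")  # hypothetical

# Check your queued jobs and print the scheduler's reply.
stdin, stdout, stderr = client.exec_command("qstat -u yourlogin")
print(stdout.read().decode())
client.close()
```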

  31. TGUP Data Mover • Drag-and-drop Java applet in the user portal • Uses GridFTP, third-party transfers, RESTful services, etc. Slide courtesy of Dan Katz

  32. Need Help? • First, try searching the Knowledge Base or other documentation • Submit a ticket: send an email to help@teragrid.org or use the TeraGrid User Portal "Consulting" tab • You can also call the TeraGrid Help Desk 24/7: 1-866-907-2383

  33. A User Experience: TeraGrid Support Enabling New Science • Jan. 2004: DL_POLY, ~13,000 atoms • NAMD: 740,000 atoms (60X larger!)

  34. TeraGrid to the Rescue Fall 2004: Granted allocations at PSC, NCSA, SDSC

  35. Where to Run? • EVERYWHERE • Minimize/pre-equilibrate on the ANL IA-64 system (high availability/long queue time) • Smaller simulations (350,000 atoms) on the NCSA IA-64s (and later Lonestar) • Large simulations (740,000 atoms) on highly scalable systems: the PSC XT3 and SDSC DataStar • TeraGrid infrastructure critical for: • Archiving data • Moving data between sites • Analyzing data on the TeraGrid

  36. Result: Opened new phenomena to investigation through simulation • Membrane remodeling • Blood, P.D. and Voth, G.A., Proc. Natl. Acad. Sci. 103, 15068 (2006)

  37. Personalized User Support: Critical to Success • Contacted by TG Support to determine needs • Worked closely with TG Support on (then) new XT3 architecture to keep runs going during initial stages of machine operation • TG worked closely with application developers to find problems with the code and improve performance • Same pattern established throughout TeraGrid for User Support • Special advanced support can be obtained for deeper needs (improving codes/workflows/etc.) by applying in POPS (and providing description of needs)

  38. Applying for Advanced Support • Go to teragrid.org → Help & Support • Review the criteria and write a 1-page justification • Submit it with your regular proposal, or as a supplement later on

  39. Advanced Support: Improving Parallel Performance of Protein Folding Simulations The UNRES molecular dynamics (MD) code utilizes a carefully-derived mesoscopic protein force field to study and predict protein folding pathways by means of molecular dynamics simulations. http://www.chem.cornell.edu/has/

  40. Load Imbalance Detection in UNRES • Profiling looked only at time spent in the important MD phase • Multiple causes of load imbalance were observed, as well as a serial bottleneck • In this case, the developers were unaware that the chosen algorithm would create load imbalance • Available algorithms were re-examined, and one was found with much better load balance that was also faster in serial • A serial function causing a bottleneck was also parallelized
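The measurement behind this kind of diagnosis can be as simple as timing the dominant phase on every MPI rank and comparing. Below is a minimal sketch using mpi4py; the workload function is a hypothetical stand-in for the MD phase, illustrating the idea rather than the actual PSC profiling toolchain.

```python
# Minimal sketch: per-rank timing of a dominant phase to expose imbalance.
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def do_md_phase():
    # Hypothetical stand-in for the expensive MD force evaluation;
    # deliberately imbalanced here so the ratio below exceeds 1.
    time.sleep(0.01 * (rank + 1))

t0 = time.perf_counter()
do_md_phase()
elapsed = time.perf_counter() - t0

# Gather each rank's phase time on rank 0 and report max/mean:
# 1.0 is perfect balance; larger means ranks idle while the slowest finishes.
times = comm.gather(elapsed, root=0)
if rank == 0:
    mean = sum(times) / len(times)
    print(f"max/mean phase time: {max(times) / mean:.2f}")
```

Run under mpiexec (e.g., `mpiexec -n 8 python imbalance.py`); a ratio well above 1 is the signature that this kind of profiling reveals.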

  41. Major Serial Bottleneck and Load Imbalance in UNRES Eliminated • After looking at the performance profiling done by the PSC, developers discovered that they could use an algorithm with much better load balance and faster serial performance. • Code now runs 4x faster!

  42. TG App: Predicting Storms • Hurricanes and tornadoes cause massive loss of life and damage to property • TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed • Major goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes • Nightly reservation at PSC, spawning jobs at NCSA as needed for details • Input, output, and intermediate data transfers • Delivers "better than real time" prediction • Used 675,000 CPU hours for the season • Used 312 TB of HPSS storage at PSC Slide courtesy of Dennis Gannon, ex-IU, and the LEAD Collaboration

  43. App: GridChem • Different licensed applications with different queues • Workflow scheduling is planned Slide courtesy of Joohyun Kim

  44. Apps: GENIUS and Materials • Materials: fully atomistic simulations of clay-polymer nanocomposites (LAMMPS on TeraGrid) • GENIUS: modeling blood flow before (during?) surgery (HemeLB on LONI) • Why cross-site/distributed runs? • Rapid turnaround: conglomeration of idle processors to run a single large job • Run big-compute and big-memory jobs not possible on a single machine Slide courtesy of Steven Manos and Peter Coveney

  45. TeraGrid Annual Conference • Showcases capabilities, achievements and impact of TeraGrid in research • Presentations, demos, posters, visualizations • Tutorials, training and peer support • Student competitions and volunteer opportunities • www.teragrid.org/tg10

  46. Campus Champions Program • Source of local, regional, and national high-performance computing and cyberinfrastructure information on their home campus • Source of information about TeraGrid resources and services that will benefit their campus • Source of startup accounts to quickly get researchers and educators using their allocation of time on TeraGrid resources • Direct access to TeraGrid staff www.teragrid.org/web/eot/campus_champions

  47. TeraGrid HPC Education and Training • Workshops, institutes and seminars on high-performance scientific computing • Hands-on tutorials on porting and optimizing code for the TeraGrid systems • On-line self-paced tutorials • High-impact educational and visual materials suitable for K–12, undergraduate and graduate classes • www.teragrid.org/web/eot/workshops

  48. HPC University • Virtual Organization to advance researchers’ HPC skills • Catalog of live and self-paced training • Schedule series of training courses • Gap analysis of materials to drive development • Work with educators to enhance the curriculum • Search catalog of HPC resources • Schedule workshops for curricular development • Leverage good work of others • Offer Student Research Experiences • Enroll in HPC internship opportunities • Offer Student Competitions • Publish Science and Education Impact • Promote via TeraGrid Science Highlights, iSGTW • Publish education resources to NSDL-CSERD http://hpcuniv.org/

  49. Sampling of Training Topics Offered • HPC Computing • Introduction to Parallel Computing • Toward Multicore Petascale Applications • Scaling Workshop: Scaling to Petaflops • Effective Use of Multi-core Technology • TeraGrid-Wide BlueGene Applications • Domain-specific Sessions • Petascale Computing in the Biosciences • Visualization • Introduction to Scientific Visualization • Remote/Collaborative TeraScale Visualization on the TeraGrid • Other Topics • Rocks Linux Cluster Workshop • LCI International Conference on HPC Clustered Computing • Over 30 online asynchronous tutorials

  50. Broaden Awareness through CI Days • Work with campuses to develop leadership in promoting CI to accelerate scientific discovery • Catalyze campus-wide and regional discussions and planning • Collaboration of Open Science Grid, Internet2, National LambdaRail, EDUCAUSE, the Minority Serving Institution Cyberinfrastructure Empowerment Coalition, TeraGrid, and local and regional organizations • Identify Campus Champions http://www.cidays.org
