
TeraGrid Deep Highlights: SAB Presentation, January 14, 2008


Presentation Transcript


  1. TeraGrid Deep Highlights, SAB Presentation, January 14, 2008. Ralph Roskies, Scientific Director, Pittsburgh Supercomputing Center, roskies@psc.edu

  2. Overview • The TeraGrid DEEP goal is enabling discovery of new science using high-end computational resources • Will present brief descriptions of a sampling of recent achievements (most also found in the TeraGrid Science Highlights booklet of 2007) • Will also point out the value added by the TeraGrid, beyond simply providing well-managed, reliable platforms for the research • Will comment briefly on the transformational implications of this work • Will conclude with comments about the petascale era.

  3. Cosmology: Mike Norman*, UCSD • Small (1 part in 10^5) spatial inhomogeneities 380,000 years after the Big Bang, as revealed by WMAP satellite data, get transformed by gravitation into the pattern of severe inhomogeneities (galaxies, stars, voids, etc.) that we see today. • Enormously demanding computations that will clearly use petascale computing when available. • Uniform meshes won't do; one must zoom in on dense regions to capture the key physical processes: gravitation (including dark matter), shock heating, and radiative cooling of gas. So an adaptive mesh refinement scheme is needed (they use 7 levels of mesh refinement). * Indicates ASTA support. The filamentary structure in this simulation, in a cube 1.5 billion light years on a side, is also seen in real-life observations such as the Sloan Digital Sky Survey.
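To make the adaptive mesh refinement idea concrete, here is a minimal, illustrative sketch of a refinement criterion: cells whose density exceeds a chosen overdensity threshold are flagged to receive a finer child grid. The threshold and the mock density field are assumptions for illustration; ENZO's actual criteria (dark-matter overdensity, Jeans length, shocks) and data structures are far more elaborate.

```python
# Toy refinement-flagging sketch in the spirit of AMR codes; not ENZO's algorithm.
import numpy as np

def flag_for_refinement(density, overdensity_threshold=8.0):
    """Flag cells whose density exceeds the mean by a chosen factor;
    flagged cells would receive a finer child grid at the next level."""
    return density > overdensity_threshold * density.mean()

rng = np.random.default_rng(0)
density = rng.lognormal(mean=0.0, sigma=1.5, size=(64, 64, 64))  # mock gas density
flags = flag_for_refinement(density)
print(f"{flags.sum()} of {flags.size} cells flagged for the next refinement level")
```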

  4. Cosmology • Benefits from large-memory capabilities (NCSA Altix, with 1.5 TB of shared memory, and 2 TB of distributed memory on the IBM DataStar at SDSC). • Adaptive mesh refinement is very hard to load-balance on distributed-memory machines. • SDSC helped make major improvements in the scaling and efficiency of the code (ENZO). • Used an NCSA-developed tool (Amore) for high-quality visualizations; also used the HDF format, developed at NCSA, to facilitate the data handling of 8 TB of data, which was stored at SDSC.
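As a small illustration of the HDF data handling mentioned above, the sketch below writes a multi-level AMR snapshot with h5py. The group/dataset layout is invented for illustration and is not ENZO's actual on-disk schema.

```python
# Minimal h5py sketch of storing multi-level AMR output in HDF5.
import h5py
import numpy as np

with h5py.File("amr_snapshot.h5", "w") as f:
    f.attrs["redshift"] = 0.5                    # example global attribute
    for level in range(3):                       # 3 toy levels instead of 7
        n = 32 * 2 ** level
        grp = f.create_group(f"level_{level}")
        grp.attrs["cell_size"] = 1.0 / n
        grp.create_dataset("density",
                           data=np.random.rand(n, n, n).astype("f4"),
                           compression="gzip")
print("wrote amr_snapshot.h5")
```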

  5. Cosmology: Tiziana di Matteo, Carnegie Mellon U. • Included the effect of black holes on large-scale simulations of the universe. • Found that black holes regulate galaxy formation: as they swallow gas, they radiate so much energy that they stop the inflow of gas. • Worked with PSC to improve scaling and use hybrid MPI/shared-memory programming for GADGET. This led to better understanding of what was necessary for petascalability, and led to the successful PetaApps proposal. Gas density is shown (increasing with brightness) with temperature (increasing from blue to red). Yellow circles indicate black holes (diameter increasing with mass). At about 6 billion years, the universe has many black holes and a pronounced filamentary structure.
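A conceptual sketch of the distributed-memory half of such a hybrid code: each MPI rank owns a slab of particles and global quantities are obtained with reductions. This is not GADGET's code; the particle counts and masses are mock values, and the thread-level parallelism inside each rank is omitted. Run with, e.g., `mpiexec -n 4 python sketch.py`.

```python
# mpi4py sketch of per-rank particle ownership plus a global reduction.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_total = 1_000_000                       # toy particle count
n_local = n_total // size                 # simple equal split per rank
rng = np.random.default_rng(rank)
masses = rng.uniform(0.5, 1.5, n_local)   # mock particle masses

local_mass = masses.sum()
total_mass = comm.allreduce(local_mass, op=MPI.SUM)  # global sum across ranks
if rank == 0:
    print(f"total mass across {size} ranks: {total_mass:.3e}")
```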

  6. Protein Structure: David Baker, U. of Washington • How is the 3-D structure of a protein determined by its sequence of amino acids? • David Baker's Rosetta code has proved the best at predicting protein structure from sequence in the biennial CASP competitions (Critical Assessment of protein Structure Prediction). • Can then design enzymes to accomplish particular tasks by investigating the folding pattern of alternate sequences. Protein structure prediction by the Rosetta code, showing the predicted structure (blue), the X-ray structure (red, unknown when the prediction was calculated), and a low-resolution NMR structure (green).

  7. Protein Structure • For CASP7, used 1.3 M hours on the NCSA Condor resource to identify promising targets (coarse resolution), then refined 22 promising targets with 730,000 hours on the SDSC Blue Gene. • SDSC helped improve scaling to run on the 40,960-processor Blue Gene at IBM, which reduced the running time for a single prediction to 3 hours, instead of weeks on a typical 1000-processor cluster. (Blue Gene is well suited to compute-intensive, small-memory tasks.) • A web portal, called Robetta, allows researchers to run Rosetta jobs without programming. NCSA worked with Baker's team and the Condor group at U. of Wisconsin to integrate the Condor workload management system supporting Robetta with NCSA computing resources. This substantially reduces time to completion.
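A hedged sketch of the two-stage pattern described above: a cheap coarse pass over many candidate models, followed by expensive refinement of only the most promising targets. The scoring and submission functions are placeholders, not the actual Rosetta/Robetta or Condor interfaces.

```python
# Coarse-then-refine selection sketch; all names and scores are hypothetical.
import heapq
import random

def coarse_score(candidate):
    # Placeholder for a low-resolution, Rosetta-style energy evaluation.
    return random.random()

def submit_refinement(candidate):
    # Placeholder for handing a candidate to the large parallel resource.
    print(f"refining candidate {candidate}")

candidates = range(1000)                  # mock pool of candidate models
scored = [(coarse_score(c), c) for c in candidates]
best = heapq.nsmallest(22, scored)        # keep the 22 lowest-energy targets
for _, c in best:
    submit_refinement(c)
```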

  8. Storm Prediction: Ming Xue*, U. of Oklahoma • Better alerts for thunderstorms, especially supercells that spawn tornados, could save millions of dollars and many lives. • Unprecedented experiment, run every day from April 15 to June 8 (tornado season), to test the ability of storm-scale ensemble prediction under real forecasting conditions for the US east of the Rockies. • First time for • ensemble forecasting at storm scale (it had been used for larger-scale models) • real-time forecasting in a simulated operational environment • Successful predictions of the overall pattern and evolution of many of the convective-scale features, sometimes out to the second day, and good ability to capture storm-scale uncertainties. Top: prediction 21 hours ahead of time for May 24, 2007; bottom: observed.
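A minimal numpy illustration of why an ensemble helps: the member-to-member spread gives a handle on storm-scale forecast uncertainty. The fields below are synthetic stand-ins; the real ensemble members are full 3-D model states.

```python
# Ensemble mean and spread from a toy 10-member forecast set.
import numpy as np

rng = np.random.default_rng(42)
n_members, ny, nx = 10, 120, 160                 # 10-member toy ensemble
truth = rng.gamma(shape=2.0, scale=3.0, size=(ny, nx))
members = truth + rng.normal(0.0, 2.0, size=(n_members, ny, nx))  # mock forecasts

ens_mean = members.mean(axis=0)      # best single estimate of the field
ens_spread = members.std(axis=0)     # larger spread = lower confidence there
print(f"domain-average spread: {ens_spread.mean():.2f}")
```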

  9. Storm Modeling • 10-member ensembles (at 4 km resolution) ran for 6.5 to 9.5 hours each day, using 66 Cray XT3 processors at PSC. One 600-processor high-resolution model (2 km resolution) ran for 9 hours. >100× more computing daily than the most sophisticated National Weather Service operational forecasts. • Transferred 2.6 TB of data daily to Norman, Oklahoma. • PSC optimized I/O, and modified the reservation and job-processing logic of its job-scheduling software to implement auto-scheduling of the runs and related post-processing (760 jobs/day), demonstrating the ability to use the Cray XT3, a very large capability resource, on a scheduled, real-time basis. • Also used the TeraGrid gateway LEAD to test on-demand forecasts, triggered automatically in regions where storms were likely to develop. Those ran on the NCSA Tungsten system at 2 km resolution.

  10. Earthquake Analysis: Jacobo Bielak* and David O'Hallaron, CMU • Civil engineers want to predict how the earth will shake, taking into account subsurface soil properties and the nature of seismic waves. • The CMU team and SCEC (Southern California Earthquake Center) create realistic 3-D models of earthquakes in the Los Angeles basin, using empirical information about the inhomogeneous basin properties. The changing nature of soil characteristics demands adaptive meshing (but only once). • It is computationally very demanding to find the 'high-frequency' (above 1 Hz) properties, because these involve shorter wavelengths and thus finer meshes. But these frequencies are what matter to the building engineers.
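A back-of-the-envelope illustration of why higher frequencies are so costly: the element size must resolve the shortest wavelength present. The shear-wave speed and points-per-wavelength values below are assumed for illustration only, not taken from the Hercules setup.

```python
# Required element size versus maximum frequency (illustrative values).
v_s_min = 200.0          # m/s, soft-soil shear-wave speed (assumed)
points_per_wavelength = 10

for f_max in (0.5, 1.0, 2.0):                    # Hz
    wavelength = v_s_min / f_max                 # shortest wavelength to resolve
    h = wavelength / points_per_wavelength       # required element size
    print(f"f_max = {f_max:3.1f} Hz -> element size ~ {h:5.1f} m")

# Halving the element size in 3-D (plus the smaller time step it forces)
# raises the cost by roughly a factor of 16, which is why >1 Hz runs are so hard.
```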

  11. Earthquake Analysis • The Quake team, a fruitful collaboration of computer scientists, computational scientists, and PSC consultants, developed the Hercules code for the PSC Cray XT3; it does the meshing, the load balancing, the wave propagation, and the visualization. It won the SC06 Analytics Challenge Award. • Runs on the whole XT3 and sustains over a teraflop. PSC helped optimize the code and developed the ability to stream results to remote sites, enabling the researchers to interact with the calculations in real time and change what is being visualized. • PSC also developed visualization tools to compare results from Hercules with those of SCEC (uniform meshes) to validate results.
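A simple sketch of the kind of cross-validation mentioned in the last bullet: compare synthetic seismograms from two solvers at the same station and report a misfit. The traces here are synthetic stand-ins, not actual Hercules or SCEC output.

```python
# Relative RMS misfit between two synthetic seismograms.
import numpy as np

t = np.linspace(0.0, 60.0, 6001)                       # 60 s sampled at 100 Hz
hercules = np.exp(-0.1 * t) * np.sin(2 * np.pi * 0.8 * t)          # mock trace A
uniform_mesh = hercules + np.random.default_rng(1).normal(0, 0.01, t.size)  # mock trace B

rms_misfit = np.sqrt(np.mean((hercules - uniform_mesh) ** 2))
rel_misfit = rms_misfit / np.sqrt(np.mean(hercules ** 2))
print(f"relative RMS misfit between the two solutions: {rel_misfit:.1%}")
```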

  12. Viral replication in flu: Gregory Voth, U. of Utah • The influenza A M2 channel is a four-helix trans-membrane channel believed to play a key role in the viral life cycle by transporting protons into the cell. • Showed how the M2 channel operates as a proton conductor in responding to acidic conditions on either side of the cell membrane, and how the anti-flu drug amantadine blocks the channel. (M2 exhibits unique pH-gated behavior, as opposed to the voltage-gated behavior of many other proton channels.) • The next step is to study strains of M2 that do not bind amantadine, to see whether other compounds could function as effective anti-flu drugs. The M2 channel with the proton-conducting water wire disrupted by the presence of the anti-flu drug amantadine. The helices of the M2 channel (blue), the proton-gating His37 residues (mauve), and the proton-blocking amantadine molecule (orange) are depicted. The lipid bilayer membrane is not shown so that the channel can be seen more clearly.

  13. Viral replication in flu • Used clusters at TACC (Lonestar) and Indiana U. (Big Red), with significant advice from those centers on compiler options and the MPI environment, which improved efficiency. • These were hybrid QM/MM (quantum mechanics/molecular mechanics) simulations with an effective QM potential. They also used systems at NCSA to simulate explicit proton transfer using purely molecular dynamics simulations.

  14. Urban Water: Jim Uber, U. of Cincinnati; K. Mahinthakumar, R. Ranjithan & D. Brill, NCSU • Urban water systems cover hundreds of square miles and include thousands of miles of pipe. • Developed methods to locate the source of contaminants in urban water systems, and approaches to limiting their impact (for security, public health, and the regional economy). • Using observed sensor data, they run an iterative process: simulate various candidate sources, compare with the real sensor data, and launch another set of simulations. • Have run realistic simulations of the Cincinnati Water Works. The team is learning to cope with problem situations, for example engaging different or new sensors when some within the network malfunction. • As a result, they also developed better algorithms for distinguishing different contamination sources that present similar sensor profiles.
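A hedged sketch of the simulate-and-compare loop described above: propose candidate contamination sources, run a (here, fake) water-quality model, and rank candidates by how well they match observed sensor readings. The real system drives EPANET-style network simulations and an evolutionary search; `simulate_sensors` below is a stand-in.

```python
# One iteration of candidate-source ranking against observed sensor data.
import numpy as np

rng = np.random.default_rng(7)
n_sensors, n_candidates = 50, 200

def simulate_sensors(source_id):
    # Placeholder for a network water-quality simulation for one source node.
    rng_local = np.random.default_rng(source_id)
    return rng_local.uniform(0.0, 1.0, n_sensors)

observed = simulate_sensors(123) + rng.normal(0, 0.02, n_sensors)  # "true" source is node 123

misfits = {c: np.sum((simulate_sensors(c) - observed) ** 2)
           for c in range(n_candidates)}
best = sorted(misfits, key=misfits.get)[:5]
print("most likely source nodes:", best)   # the next iteration would refine around these
```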

  15. Urban Water • Uses hundreds of processors simultaneously at NCSA, SDSC, and Argonne (the original DTF systems). They worked with the Argonne team to build a framework that figures out how many jobs to send to which site, based on the length of each system's queues. • SDSC helped them with the grid software, including cross-site runs using MPICH-G2. • Early simulations modeled a few hundred sensors; these have now grown to 11,000, but a city network could have 300,000.
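A small sketch of queue-aware job placement in the spirit of the framework described above: send more work to the site with the shortest queue. The site names and queue depths are made up; the real framework queried live queue information.

```python
# Split a batch of jobs across sites, weighting each site by 1/(queue depth + 1).
def plan_placement(n_jobs, queue_depths):
    """Return a per-site job count favoring sites with shorter queues."""
    weights = {site: 1.0 / (depth + 1) for site, depth in queue_depths.items()}
    total = sum(weights.values())
    return {site: round(n_jobs * w / total) for site, w in weights.items()}

queues = {"NCSA": 12, "SDSC": 3, "Argonne": 30}   # hypothetical queued-job counts
print(plan_placement(100, queues))                # most jobs go to the SDSC system
```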

  16. Zeolite Databases: Michael Deem*, Rice U.; David Earl, U. of Pittsburgh • Zeolites are silicate minerals with a porous, Swiss-cheese-like structure. For decades, chemists have relied on zeolites to catalyze chemical reactions on an industrial scale. They are used to make everything from gasoline and asphalt to laundry detergent and aquarium filters. • In the past 50 years, the catalog of naturally occurring zeolites (there are about 50 of them) has been bolstered to approximately 180 with the addition of synthetic varieties. • Deem and Earl used the TeraGrid to identify potentially new zeolites by searching for hypothetically stable structures. Their database now contains over 3.5 million structures. By studying the catalog, scientists might find structures that are more efficient, either in terms of energy inputs or in waste byproducts.

  17. Zeolite Databases • Used systems at TACC, Purdue, Argonne, NCSA, and SDSC. • TACC developed tools like MyCluster, harnessing the distributed, heterogeneous resources available on the TeraGrid network into a single virtual environment for the management and execution of their simulation runs. • At Purdue, the application was run within a Condor pool of more than 7000 processors using standard Linux tools for job management. 4M+ processor hours were used in 22 months. A performance engineer supported application revisions and job-management scripts to: • Track execution time to detect runaways and terminate gracefully. • Add application self-checkpointing to recover from job and system failures. • Control the number of jobs in the queue to a practical maximum given system capabilities. • Dynamically adjust the cap to hold the percentage of jobs executing at 80-90% of the total in the queue (the number executing varied from <100 to a peak of ~2000).
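An illustrative sketch of the cap-adjustment policy in the last bullet: keep the number of executing jobs at roughly 80-90% of the jobs held in the queue by nudging the submission cap up or down. The thresholds, step size, and monitoring inputs are assumptions; the real scripts queried the Condor pool directly.

```python
# Simple feedback rule for a job-submission cap (hypothetical parameters).
def adjust_cap(cap, n_executing, n_queued, lo=0.80, hi=0.90,
               step=50, cap_min=100, cap_max=2000):
    frac = n_executing / max(n_queued, 1)
    if frac < lo:
        cap = max(cap_min, cap - step)   # pool saturated: stop piling jobs on
    elif frac > hi:
        cap = min(cap_max, cap + step)   # pool has headroom: queue more work
    return cap

cap = 1000
for executing, queued in [(700, 1000), (950, 1000), (450, 600)]:  # mock monitoring samples
    cap = adjust_cap(cap, executing, queued)
    print(f"executing/queued = {executing}/{queued} -> new cap {cap}")
```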

  18. Transformative Aspects • Many of the scientific breakthroughs are potentially transformative. • Large-scale computations have been intrinsic to the enormous strides cosmology has taken in the past two decades. • Understanding the detailed atomic-level mechanisms behind biological systems leads to major opportunities in the design of improvements. • Similarly, atomic-level understanding will transform the design of new materials (zeolites, magnetic storage, alloys, …). • Computation is transforming how we can deal with disaster warning and mitigation (storms, earthquakes, attacks on water systems).

  19. Transformative Aspects • There is also a substantial transformation in how we do the science. • Faster turnaround leads to greater researcher productivity and changes the questions we ask in all disciplines. • It also allows things like interactive steering, ensemble forecasts, and better uncertainty analysis. • High-speed networks allow a much greater span of collaborative activities, and better use of distributed heterogeneous resources.

  20. Futures in the Petascale Era • The scientific need for petascale computing is widely documented. • TeraGrid is energized by the prospect of Track 1 and Track 2 systems. • There is serious concern about community readiness to deal with scaling needs, and widespread agreement that NSF should identify and support key needed software developments. NSF has already held competitions (and announced some awards) for • PetaApps, to support the development of petascale scientific applications • SDCI, to support the development of petascale tools • CI-TEAM, to support learning and workforce development. • TeraGrid held a meeting for winners and likely winners. It produced recommendations, including:

  21. Some PetaApps Meeting Recommendations (will factor into Year 4 and 5 planning) • Support a forum for exchanging experiences of researchers at scale • Use TG08 meeting (June) to bring the groups together again to share findings • Provide a repository of information about petascale computing • Track 1 and Track 2 specifications, to facilitate appropriate algorithm development • Libraries, tools, workflow tools, and characterize advantages/strengths to help users select appropriate tools (workflow, optimization, parallelization, debugging, profiling etc.) • Exemplars, case studies, best practices, etc. • Special allocation and job scheduling policy for debugging massively parallel applications • Petascale support services, with expertise in algorithm alternatives, workflows

  22. Questions??
