Presentation Transcript


  1. Center for High-End Computing Systems Srinidhi Varadarajan Director

  2. Motivation • We need a paradigm shift to make supercomputers more usable for mainstream computational scientists. • A similar shift occurred in computing in the 1970s, when the arrival of inexpensive minicomputers in academia spurred a large body of computing research. • Results from this research flowed back to industry, creating a growth cycle that led to computing becoming a commodity. • This requires a comprehensive “rethink” of programming languages, runtime systems, operating systems, scheduling, reliability, and operations and management. • Moving to petascale-class systems significantly complicates this challenge. • We need a computing environment that can efficiently and usably span the scales from department-sized systems to national resources.

  3. Perspectives • Most of the “big iron” today is concentrated in DoD, DOE, NASA and NSF supercomputing centers. • Their mandate is national interest – their goal is to provide stable production cycles to computational scientists. • The future of supercomputing – research into supercomputing itself – is necessarily different from providing stable production cycles.

  4. Vision • Our goal is to build a world-class research group focused on high-end systems research. • This involves research in architectures, networks, power optimization, operating systems, compilers and programming models, algorithms, scheduling and reliability. • Our faculty hiring in systems is targeted to cover the breadth of these research areas. • The center is involved in research and development work, including design and prototyping of systems and development of production-quality systems software. • The goal is to design and build the software infrastructure that makes HPC systems usable by the broad computational science and engineering community. • Provide support to high-performance computing users on campus. This involves the center in supporting actual applications, which are then profiled to gauge the performance impact of its research.

  5. Structure • CHECS was set up in the College of Engineering in Sep. 2005. • Funded by the College of Engineering. • The center consists of several core research labs with affiliated faculty. • Has affiliated faculty within and outside of CS with domain expertise. • Complemented by an industry affiliates program.

  6. Research Labs • Computing Systems Research Lab (CSRL) • Distributed Systems and Storage Lab (DSSL) • Laboratory for Advanced Scientific Computing and Applications (LASCA) • Parallel Emerging Architectures Research Lab (PEARL) • Scalable Performance Laboratory (SCAPE) • Systems, Networking and Renaissance Grokking Lab (SyNeRGY)

  7. CS Faculty/Students

  8. Usability • Flows: Threads based distributed shared memory programming model • MPI On-Ramp: Removing the difficulties of mapping communication design abstractions to MPI code through visual tools and code generation • Operation stacking framework: algorithms and tools for improving the performance of large-scale ensemble computations • ReSHAPE: improving utilization and throughput on clusters via dynamically re-sizable parallel computations • Code Generation on Steroids: enhancing the functionality of automatically generated code through Generative Aspect Oriented Programming
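The MPI On-Ramp bullet above is about lowering a high-level communication abstraction into MPI calls. As a minimal illustration of the kind of abstraction involved, here is a toy point-to-point channel in Python (threads and queues standing in for MPI ranks and `MPI_Send`/`MPI_Recv`; this is not the actual MPI On-Ramp API or generated code):

```python
import threading
import queue

class Channel:
    """Toy blocking point-to-point channel -- the sort of high-level
    send/recv abstraction a tool could lower to MPI_Send/MPI_Recv."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def recv(self):
        return self._q.get()  # blocks until a message arrives

def worker(inbox, outbox):
    # "Rank 1": receive a chunk of work, send back a result.
    data = inbox.recv()
    outbox.send(sum(data))

to_worker, from_worker = Channel(), Channel()
t = threading.Thread(target=worker, args=(to_worker, from_worker))
t.start()
to_worker.send([1, 2, 3, 4])   # "Rank 0" sends work...
result = from_worker.recv()    # ...and blocks on the reply.
t.join()
print(result)  # 10
```

The usability gap the slide refers to is exactly the distance between a design-level channel like this and hand-written MPI code with explicit ranks, tags, and buffer management.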

  9. Operating Systems • FlexiCache: Improving OS file system performance by developing an interface to support a repertoire of (pluggable) cache replacement policies in the kernel • Cadus: Co-scheduling of real-time threads and garbage collection • Practical fair-sharing scheduling: finding and automatically adopting policies for stock kernels • ‘MAGNETizing’ SystemTap: Enabling dynamic, on-the-fly probing and export of kernel information
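To make the FlexiCache idea concrete, here is a user-space sketch of a cache that takes its replacement policy as a pluggable object, with LRU as one instance. This is only an illustration of the interface concept; FlexiCache itself is an in-kernel interface, and the class names here are invented for the example:

```python
from collections import OrderedDict

class LRUPolicy:
    """One pluggable replacement policy; others (LFU, MRU, ...)
    could be swapped in behind the same touch/evict interface."""
    def __init__(self):
        self.order = OrderedDict()

    def touch(self, block):
        self.order.pop(block, None)
        self.order[block] = True   # most recently used at the end

    def evict(self):
        block, _ = self.order.popitem(last=False)  # least recently used
        return block

class Cache:
    def __init__(self, capacity, policy):
        self.capacity, self.policy = capacity, policy
        self.blocks = {}

    def get(self, block, load):
        if block not in self.blocks:
            if len(self.blocks) >= self.capacity:
                del self.blocks[self.policy.evict()]
            self.blocks[block] = load(block)   # cache miss: fetch
        self.policy.touch(block)
        return self.blocks[block]

cache = Cache(2, LRUPolicy())
for block in ["a", "b", "a", "c"]:
    cache.get(block, load=lambda b: b.upper())
print(sorted(cache.blocks))  # ['a', 'c'] -- 'b' was least recently used
```

The point of the pluggable design is that `Cache` never changes when the policy does, which is what makes it practical to experiment with replacement policies in a running kernel.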

  10. Power Aware Computing • High-performance, power-aware computing: frameworks for power, energy, and thermal measurement, analysis, and optimization • Frameworks: PowerPack, MISER • Supercomputing in small spaces: low-power & power-aware supercomputing
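The core trade-off behind power-aware computing can be seen in a textbook dynamic-power model: power scales roughly with the cube of frequency (voltage scaling with frequency), while CPU-bound runtime scales with its inverse. This toy calculation illustrates the trade-off that frameworks like PowerPack and MISER measure and optimize on real hardware; it is not their actual model or API:

```python
def cpu_bound_energy(freq, work=1.0, c=1.0):
    """Textbook model: P ~ c * f^3, runtime ~ work / f for CPU-bound
    code, so E = P * t ~ c * f^2 * work. Units are arbitrary."""
    power = c * freq ** 3
    runtime = work / freq
    return power * runtime

# Halving frequency quarters the energy of a purely CPU-bound phase...
print(cpu_bound_energy(1.0), cpu_bound_energy(0.5))  # 1.0 0.25
# ...but doubles its runtime, so a scheduler must trade the two off --
# and memory- or I/O-bound phases shift the trade-off further, since
# slowing the CPU there costs little time.
```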

  11. Architectures • Programming Layered Multiprocessors: a unified programming approach for layered shared-memory multiprocessors, with multithreaded or multicore execution components • MELISSES: Continuous hardware monitors for power-performance adaptation schemes on layered parallel architectures

  12. Runtime Systems • Top: A framework for flexible, high-level instrumentation of binaries • DyniX: A framework for combined static/dynamic analysis of Java code • déjà vu: Transparent checkpointing and recovery for parallel applications • Weaves: Runtime system for adaptive compositional codes
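Checkpointing, as in the déjà vu project above, means periodically persisting enough state to resume after a failure. déjà vu does this transparently at the system level for parallel applications; the sketch below shows only the application-level idea, with an atomic-rename step so a crash mid-write never corrupts the last good checkpoint (file names are invented for the example):

```python
import os
import pickle
import tempfile

def checkpoint(state, path):
    # Dump to a temp file, then rename over the old checkpoint:
    # os.replace is atomic, so the file is never half-written.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def restore(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "app.ckpt")
state = {"iteration": 0, "partial_sum": 0}
for i in range(1, 6):
    state["iteration"] = i
    state["partial_sum"] += i
    checkpoint(state, path)       # persist progress each iteration

recovered = restore(path)         # after a simulated failure
print(recovered)  # {'iteration': 5, 'partial_sum': 15}
```

What makes transparent checkpointing of parallel codes hard, and interesting, is everything this sketch omits: capturing state the application never names (stack, heap, kernel state) and coordinating a consistent cut across communicating processes.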

  13. Networks • High-performance networking: architecture, protocols, performance (modeling, evaluation, auto-tuning) in system-area & wide-area networks • Open Network Emulator: Integrated environment for simulation and direct-code execution of network protocols

  14. Numerical Methods • Surrogate approximation: mathematical construction of functional approximations using sparse data in high dimensions, with ultimate application to multidisciplinary design optimization (MDO) • Robust design optimization: solving optimization problems with stochastic variables and constraints. • pDIRECT: massively parallel direct search algorithms for global optimization • Mathematical software for terascale machines: scalable algorithms for polynomial systems of equations, global optimization, MDO, and interpolatory approximation.
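Direct search methods like those behind pDIRECT optimize without derivatives by repeatedly subdividing the search space and sampling. The 1-D sketch below shows only that divide-and-sample idea on a unimodal function; the real DIRECT algorithm keeps a set of "potentially optimal" rectangles so it can handle multimodal functions, and pDIRECT parallelizes the sampling at massive scale:

```python
def trisect_search(f, lo, hi, depth=30):
    """Greedy 1-D direct search: split [lo, hi] into thirds, sample
    each third's midpoint, recurse on the most promising third.
    Derivative-free; reliable here only because f is unimodal."""
    for _ in range(depth):
        third = (hi - lo) / 3.0
        mids = [lo + 0.5 * third, lo + 1.5 * third, lo + 2.5 * third]
        vals = [f(x) for x in mids]
        i = vals.index(min(vals))          # keep the best third
        lo, hi = lo + i * third, lo + (i + 1) * third
    return (lo + hi) / 2.0

best = trisect_search(lambda x: (x - 2) ** 2, 0.0, 5.0)
print(round(best, 6))  # 2.0
```

Each iteration shrinks the interval by a factor of three using only function values, which is why such methods suit the expensive black-box simulations that arise in multidisciplinary design optimization.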

  15. Applications • mpiBLAST: high-performance bioinformatics • Stochastic modeling: parameter estimation for stochastic cell cycle models • Remote sensing: parallel algorithms for remote sensing applications • WBCSim: a problem solving environment for wood based composites manufacturing processes.

  16. Facilities • System X: 2200-processor PowerPC cluster with InfiniBand interconnect • Anantham: 400-processor Opteron cluster with Myrinet interconnect • Several 8–32 processor research clusters • 12-processor SGI Altix shared-memory system • 8-processor AMD Opteron shared-memory system • 16-core AMD Opteron shared-memory system • 16-node PlayStation 3 cluster • Currently building a 2400-core x86 cluster for research in power-aware computing, programming models and fault tolerance

  17. Publications

  18. Funding

  19. CHECS Outreach • Training • Summer FDI on parallel computation: 2005, 2006, 2007. Average attendance of 15–20 faculty plus graduate students. • Offered 6-hour short courses on MPI and OpenMP parallel programming to graduate students each semester; average attendance of 15. • Anantham: approximately 500,000 jobs were run in 2007 alone, the vast majority by CoE users, including students working with Andrew Duggleby (ME), Walt O’Brien (ME), and David Cox (ChemE). • Visitors: Reed, Fowler, Munoz, Dongarra • Chair of System X allocation committee • Developing a senior-level course in Computational Science & Engineering, to be cross-listed with ESM

  20. CHECS HPC Consulting 2006-07

  21. Industrial Impact • Developed industrial affiliates program modeled on MPRG. • In negotiations with Merrill Lynch to get them as the first affiliate. • Two venture funded startups originated from CHECS. • Created the Green500 list that ranks the most energy-efficient supercomputers.

  22. Recent Achievements • 5 NSF CAREER awards • 2 DOE CAREER awards • 3 IBM Faculty awards • Dean’s Award for Excellence in Research • 2 VT Faculty Fellows • Best Paper Award at PPoPP • Won Storage Challenge at Supercomputing 2007 • 2 Faculty in HPCWire List of People to Watch in Supercomputing • 1 Faculty in MIT TR100 list of young researchers
