
SCD Update


Presentation Transcript


  1. SCD Update Tom Bettge Deputy Director Scientific Computing Division National Center for Atmospheric Research Boulder, CO USA User Forum 17-19 May 2005

  2. NCAR/SCD [chart] Position of NCAR/SCD supercomputers by procurement year, beginning with the 1996 procurement; systems shown include IBM Power3 and IBM Power4 (position axis 1-350).

  3. SCD Update • Production HEC Computing • Mass Storage System • Services • Server Consolidation and Decommissions • Physical Facility Infrastructure Update • Future HEC at NCAR

  4. News: Production Computing • Redeployed SGI 3800 as Data Analysis engine • chinook became tempest • departure of dave • IBM Power 3 blackforest decommissioned Jan 2005 • Loss of 2.0 Tflops of peak computing capacity • IBM Linux Cluster lightning joined production pool March 2005 • Gain of 1.1 Tflops of peak computing capacity • 256 processors (128 dual node configuration) • 2.2 GHz AMD Opteron processors • 6 TByte FastT500 RAID with GPFS • 40% faster than bluesky (1.3 GHz POWER4) cluster on parallel POP and CAM simulations • 3rd party vendor compilers

  5. Resource Usage FY04 • At the end of FY04, the combined supercomputing capacity at NCAR was ~11 TFLOPs • Roughly 81% of that capacity was used for climate simulation and analysis (Climate & IPCC)

  6. bluesky Workload by Facility April 2005

  7. Computing Demand • Science driving demand for scientific computing: • Summer 2004: CSL requests 1.5x availability • Sept 2004: NCAR requests 2x availability • Sept 2004: University requests 3x availability • March 2005: University requests 1.7x availability

  8. Computational Campaigns • BAMEX, Spring 2003 • IPCC, FY 2004 • MMM Spring Real-Time Forecasts, Spring 2004 • WRF Real-Time Hurricane Forecast, Fall 2004 • DTC Winter Real-Time Forecasts, Winter 2004-2005 • MMM Spring Real-Time Forecast, Spring 2005 • MMM East Pacific Hurricane Formation, July 2005

  9. bluesky 8-way

  10. bluesky 32-way

  11. Servicing the Demand: NCAR Computing Facility • SCD’s supercomputers are well utilized ... • ... yet average job queue-wait times† are measured in hours (they were minutes in 2004), not days († April 2005 average)

  12. Average bluesky Queue-Wait Times (HH:MM)

  13. bluesky Queue Wait Times • Contributing factors: blackforest removed; lightning charging did not start until March 1 • Corrective (minor) actions taken: • Disallow “batch” node_usage=shared jobs • Increase utility of the “share” nodes (4 nodes, 128 pes) • Shift the “facility” split (CSL/Community) from 50/50 to 45/55 • More accurately reflects the actual allocation distribution • Reduce the premium charge from 2.0x to 1.5x (see the charging sketch below) • Encourages use of premium when critical turnaround is needed • Reduce the NCAR 30-day allocation limit from 130% to 120% • Matches other groups (level playing field) • SCD is watching closely.
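Aside: the slides do not give the SCD GAU charging formula. The sketch below only illustrates, under the assumption that a job's charge scales as processors × wall-clock hours × a machine compute factor × the queue premium, how dropping the premium multiplier from 2.0x to 1.5x changes what a premium job costs; the function name and job numbers are hypothetical.

```python
# Hypothetical GAU-charging sketch (the real SCD formula is not given in the slides).
# Assumed form: processors * wall-clock hours * machine compute factor * queue premium.
def gau_charge(processors: int, wallclock_hours: float,
               compute_factor: float = 1.0, premium: float = 1.0) -> float:
    """Return the GAUs charged for one job under the assumed formula."""
    return processors * wallclock_hours * compute_factor * premium

# Example: a 64-processor, 6-hour premium job before and after the change.
job = dict(processors=64, wallclock_hours=6.0, compute_factor=1.0)
print(gau_charge(**job, premium=2.0))   # old premium: 768.0 GAUs
print(gau_charge(**job, premium=1.5))   # new premium: 576.0 GAUs
```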

  14. [Chart] Average compute factor per GAU charged, Jan 1 through May 1, 2005

  15. Mass Storage System

  16. Mass Storage System • Disk cache expanded to service files up to 100 MB; 60% of files in this size class are now read from cache rather than requiring a tape mount • Deployment of 200 GB cartridges (previously 60 GB) • Now over 500 TB of data on these cartridges • Drives provide a 3x increase in transfer rate • A full silo holds 1.2 PB; 5 silos hold 6 PB of data • Users have recently moved to the single-copy class of service (motivated by GAU compute charges) • Embarking on a project to address future MSS growth: • Manageable growth rate • User management tools (identify, remove, etc.) • User access patterns / user education (archive selectively, tar; see the sketch below) • Compression
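To make "archive selectively, tar" concrete: bundling many small output files into a single tar archive before writing to the MSS keeps the file count (and the number of tape mounts on later retrieval) down. The sketch below is a generic illustration only; the directory and file names are hypothetical, and the final copy to the MSS is left to whatever transfer tool the user normally runs.

```python
# Illustration of "archive selectively, tar": bundle selected small files into
# one tar archive so the MSS stores a single large file instead of many small ones.
# Paths are hypothetical; the MSS copy step itself is not shown.
import tarfile
from pathlib import Path

run_dir = Path("run_output")      # hypothetical directory of small model output files
bundle = Path("run_output.tar")

with tarfile.open(bundle, "w") as tar:
    # Archive selectively: keep only the files worth preserving (here, NetCDF output).
    for f in sorted(run_dir.glob("*.nc")):
        tar.add(f, arcname=f.name)

print(f"Wrote {bundle}; copy this single file to the MSS instead of the individual files")
```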

  17. SCD Customer Support • Consistent with the SCD reorganization • Phased deployment, Dec 2004 to May 2005 • Advantages: • Enhanced service – Computer Production Group 24/7 • Effectively utilize other SCD groups in customer support • Easier questions handled sooner • Harder questions routed to the correct group sooner • Feedback plan • SCD will provide a balanced set of services to enable researchers to easily and effectively utilize community resources.

  18. Server Decommissions • MIGS – MSS access from remote sites • Decommissioned April 12, 2005 • Other contemporary methods now available • IRJE – job submittal to the supercomputers (made obsolete by the firewall) • Decommissioned March 21, 2005 • Front-end server consolidation to a single new server over the next few months: • UCAR front-end Sun server (meeker) • UCAR front-end Linux server (longs) • Joint SCD/CSS Sun computational server (k2) • SCD front-end Sun server (niwot)

  19. Physical Facility Infrastructure Update • Chilled water upgrade continues • Brings cooling up to the power capacity of the data center • Startup of the new chiller went flawlessly on March 15 • May 19-22: last planned shutdown • Stand-by generators proved themselves again during the March 13 outage and the Xcel power drops on April 29 • Design phase of the electrical distribution upgrades to be completed by late 2005 • Risk assessment identified concerns about substation 3 • Supplies power to the data center (station is near its lifetime limit) • Additional testing completed Feb. 26 • Awaiting report

  20. Future Plans for HEC at NCAR…

  21. SCD Strategic Plan: High-End Computing Within the current funding envelope, achieve a 25-fold increase over current sustained computing capacity in five years (a back-of-the-envelope growth check follows below). SCD also intends to pursue opportunities for substantial additional funding for computational equipment and infrastructure to support the realization of demanding institutional science objectives. SCD will continue to investigate and acquire experimental hardware and software systems. • IBM BlueGene/L, 1Q2005
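As a quick sanity check of that target (assuming uniform year-over-year growth, which the plan does not state), a 25-fold increase in five years implies an annual growth factor of 25^(1/5) ≈ 1.90, i.e. sustained capacity would need to roughly double every year:

```python
# Back-of-the-envelope check of the strategic-plan target.
# Uniform annual growth is an assumption, not stated in the plan.
target_factor = 25.0
years = 5
annual_growth = target_factor ** (1 / years)
print(f"Required annual growth factor: {annual_growth:.2f}x")  # ~1.90x, roughly doubling each year
```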

  22. SCD Target Capacity

  23. Challenges in Achieving 2006-2007 Goals • Capability vs. capacity • Costs (price/performance) • Need/desire for capability computing (define!) • How to balance capability and capacity within the center? • NCAR/SCD “fixed income” • Business plans • Evaluating the Year 5 option with IBM • Engaging vendors to informally analyze the SCD Strategic Plan for HEC • Likely to enter a year-long procurement for 4Q2006 deployment of additional capacity and capability

  24. Beyond 2006 • Data center limitations / data center expansion • NCAR center limits on power/cooling/space will be reached with the 2006 computing addition • Requirements for a new center have been compiled • Conceptual design for the new center is near completion • Funding options being developed with UCAR • Opportunity of the NSF Petascale Computing Initiative • Commitment to balanced and sustained investment in robust cyberinfrastructure: • Supercomputing systems • Mass storage • Networking • Data management systems • Software tools and frameworks • Services and expertise • Security

  25. Scientific Computing Division Strategic Plan 2005-2009: to serve the computing, research, and data management needs of atmospheric and related sciences. www.scd.ucar.edu

  26. Questions
