
HPCx: an Overview

HPCx: an Overview. Dr Arthur Trew, Director, EPCC. What is HPCx? HPCx is the latest in a series of HPC services for UK academia: £52.9M from UK Research Councils, £1M from IBM for HPC R&D at EPCC, and £600k from IBM for Life Sciences outreach. UoE HPCx Ltd runs the contract.


Presentation Transcript


  1. HPCx: an Overview. Dr Arthur Trew, Director, EPCC

  2. what is HPCx?
     • HPCx is the latest in a series of HPC services for UK academia
     • £52.9M from UK Research Councils
     • £1M from IBM for HPC R&D at EPCC
     • £600k from IBM for Life Sciences outreach
     • UoE HPCx Ltd runs the contract
     • wholly-owned subsidiary of the University
     • CCLRC, EPCC and IBM are subcontractors
     • EPSRC’s objectives for the procurement were
     • “to deliver the optimum service resulting in world-leading science”
     • “address the problems in scaling codes to capability levels (512+)”
     • … so, the challenges we face are to
     • support change from capacity to capability
     • develop more scalable codes
     • science support is the key to success
     IBM Team Talent Meeting

  3. the story so far
     • Phase 1 service started on 9 December 2002
     • The first year was extremely successful
     • CPU utilisation grown to >80%
     • 25+ user groups, ~350 users

  4.
     Metric                              TSL    FSL    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec    Ave
     Technology serviceability (%)       80     99.2   99.6   98.0   96.8   99.9   99.8   100.0  99.5   98.5   99.6   98.2   99.9   100.0  99.1
     Technology MTBF (hours)             200    300    293    183    81     732    732           366    183    418    209    1464          300
     Number of AV FTEs                   7.5    10     12.6   11.7   12.6   13.5   11.6   12.9   12.4   9.8    10.9   11.2   13.0   10.8   11.9
     Number of training days per month   30/12  40/12  10/1   17/2   17/3   24/4   33/5   33/6   33/7   33/8   35/9   40/10  49/11  50/12  50/12
     queries resolved <3 days (%)        85     97     98.7   98.7   97.8   100.0  100.0  100.0  100.0  98.5   100.0  100.0  100.0  100.0  99.5
     Number of A&M FTEs                  3.75   5.75   8.2    7.1    7.9    5.4    5.4    5.6    6.7    5.1    6.7    7.9    6.5    5.4    6.5
     A&M serviceability (%)              80     100    99.4   99.6   99.9   100.0  99.9   99.5   100.0  99.9   99.9   98.8   99.9   99.7   99.7

     • … but the colony switch did have poor performance and reliability
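The monthly figures on slide 4 can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: the slide does not define TSL and FSL, so they are treated here as lower thresholds that a monthly value should meet or exceed, and "Ave" is assumed to be the arithmetic mean of the twelve monthly values. The names `METRICS`, `months_below` and `summarise` are hypothetical and introduced purely for illustration; the numbers are copied from the slide.

```python
from statistics import mean

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# (TSL, FSL, twelve monthly values, reported "Ave"), copied from slide 4.
# Treating TSL/FSL as lower thresholds is an assumption, not stated on the slide.
METRICS = {
    "Technology serviceability (%)": (
        80, 99.2,
        [99.6, 98.0, 96.8, 99.9, 99.8, 100.0, 99.5, 98.5, 99.6, 98.2, 99.9, 100.0],
        99.1,
    ),
    "Queries resolved <3 days (%)": (
        85, 97,
        [98.7, 98.7, 97.8, 100.0, 100.0, 100.0, 100.0, 98.5, 100.0, 100.0, 100.0, 100.0],
        99.5,
    ),
    "A&M serviceability (%)": (
        80, 100,
        [99.4, 99.6, 99.9, 100.0, 99.9, 99.5, 100.0, 99.9, 99.9, 98.8, 99.9, 99.7],
        99.7,
    ),
}


def months_below(values, threshold):
    """Return the names of the months whose value is strictly below threshold."""
    return [month for month, value in zip(MONTHS, values) if value < threshold]


def summarise(name):
    """Recompute the yearly mean and list the months below each threshold."""
    tsl, fsl, values, reported = METRICS[name]
    return {
        "mean": mean(values),
        "reported": reported,
        "below_tsl": months_below(values, tsl),
        "below_fsl": months_below(values, fsl),
    }


if __name__ == "__main__":
    for name in METRICS:
        s = summarise(name)
        print(f"{name}: mean={s['mean']:.2f} (reported {s['reported']}), "
              f"below TSL: {s['below_tsl'] or 'none'}, "
              f"below FSL: {s['below_fsl'] or 'none'}")
```

Under these assumptions, technology serviceability never drops below the TSL figure, but falls below the FSL figure in four months (Feb, Mar, Aug, Oct), and the recomputed means agree with the reported "Ave" column to within rounding.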

  5. looking forward
     • the Phase 1 → Phase 2 upgrade is the most risky part of the project
     • new hardware, colony → federation
     • new software, PSSP → CSM
     • EPSRC has funded a small Phase 2 development machine
     • so, the support teams are more prepared
     • but switch performance is (currently) poor
     • … and unlikely to satisfy EPSRC
     • termination is unlikely but the relationship with EPSRC could be less cordial post-Phase 2
