
Towards energy efficient HPC: HP Apollo 8000 at Cyfronet, Part I


Presentation Transcript


  1. Towards energy efficient HPC: HP Apollo 8000 at Cyfronet, Part I. Patryk Lasoń, Marek Magryś

  2. ACC Cyfronet AGH-UST • established in 1973 • part of AGH University of Science and Technology in Krakow, PL • provides free computing resources for scientific institutions • centre of competence in HPC and Grid Computing • IT service management expertise (ITIL, ISO 20k) • member of PIONIER • operator of the Krakow MAN • home for Zeus

  3. International projects

  4. PL-Grid Consortium • Consortium creation – January 2007 • a response to requirements from Polish scientists • driven by ongoing Grid activities in Europe (EGEE, EGI_DS) • Aim: significant extension of the computing resources provided to the scientific community (start of the PL-Grid Programme) • Development based on: projects funded by the European Regional Development Fund as part of the Innovative Economy Programme, close international collaboration (EGI, ….), previous projects (5FP, 6FP, 7FP, EDA…) • National Network Infrastructure available: Pionier National Project • computing resources: Top500 list • Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities • PL-Grid Consortium members: 5 Polish High Performance Computing centres, representing the communities, coordinated by ACC Cyfronet AGH

  5. PL-Grid infrastructure • Polish national IT infrastructure supporting e-Science • based upon the resources of the most powerful academic resource centres • compatible and interoperable with the European Grid • offering grid and cloud computing paradigms • coordinated by Cyfronet • Benefits for users: one infrastructure instead of 5 separate compute centres, unified access to software, compute and storage resources, non-trivial quality of service • Challenges: unified monitoring, accounting and security, creating an environment of cooperation rather than competition • Federation – the key to success

  6. Competence Centre in the Field of Distributed Computing Grid Infrastructures – the PLGrid Core project • Budget: 104 949 901,16 PLN in total, including EC funding of 89 207 415,99 PLN • Duration: 01.01.2014 – 30.11.2015 • Project Coordinator: Academic Computer Centre CYFRONET AGH • The main objective of the project is to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.

  7. PLGrid Core project – services • Basic infrastructure services: uniform access to distributed data, PaaS Cloud for scientists, MapReduce-type application maintenance environment • End-user services: technologies and environments implementing the Open Science paradigm, computing environment for interactive processing of scientific data, platform for development and execution of large-scale applications organized in workflows, automatic selection of scientific literature, environment supporting data farming mass computations

  8. HPC at Cyfronet [timeline, 2007–2013: Baribal, Panda, Zeus, Mars, Zeus vSMP, Platon U3, Zeus FPGA, Zeus GPU]

  9. ZEUS

  10. 374 TFLOPS, #176 on the Top500 list, #1 in Poland

  11. Zeus • over 1300 servers • HP BL2x220c blades • HP BL685c fat nodes (64 cores, 256 GB) • HP BL490c vSMP nodes (up to 768 cores, 6 TB) • HP SL390s GPGPU nodes (2x, 8x) • InfiniBand QDR (Mellanox+Qlogic) • >3 PB of disk storage (Lustre+GPFS) • Scientific Linux 6, Torque/Moab

  12. Zeus - statistics • 2400 registered users • >2000 jobs running simultaneously • >22000 jobs per day • 96 000 000 computing hours in 2013 • jobs lasting from minutes to weeks • jobs from 1 core to 4000 cores
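For a sense of scale, the yearly total already pins down the average load. A minimal arithmetic sketch, assuming the slide's "computing hours" means core-hours:

```python
# Average number of cores kept busy around the clock in 2013,
# derived only from the totals quoted on the slide.
core_hours = 96_000_000
hours_in_2013 = 365 * 24                  # 8760
print(round(core_hours / hours_in_2013))  # ~10959 cores in use on average
```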

  13. Cooling [diagram: hot aisle / cold aisle layout with racks – hot aisles at 40°C, cold aisle at 20°C]

  14. Future system

  15. Why upgrade? • Jobs growing • Users hate queuing • New users, new requirements • Technology moving forward • Power bill staying the same

  16. New building

  17. Requirements • Petascale system • Lowest TCO • Energy efficient • Dense • Good MTBF • Hardware: core count, memory size, network topology, storage

  18. Cooling

  19. Direct Liquid Cooling! • Up to 1000x more efficient heat exchange than air • Less energy needed to move the coolant • Hardware can handle it: CPUs ~70°C, memory ~80°C • Hard to cool 100% of HW with liquid: network switches, PSUs
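A rough volumetric comparison shows where the efficiency gap comes from; the material properties below are standard textbook values, not numbers from the slide:

```python
# Heat a coolant can carry per cubic metre per kelvin: density * specific heat.
water_j_per_m3K = 998.0 * 4186.0   # ~4.2 MJ/(m^3*K)
air_j_per_m3K   = 1.2   * 1005.0   # ~1.2 kJ/(m^3*K)
ratio = water_j_per_m3K / air_j_per_m3K
# ~3500x per unit volume, consistent with the slide's "up to 1000x" order of magnitude.
print(f"water carries ~{ratio:,.0f}x more heat per unit volume than air")
```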

  20. MTBF • The less movement the better: fewer pumps, fewer fans, fewer HDDs • Example: pump MTBF: 50 000 hrs, fan MTBF: 50 000 hrs, 1800-node system MTBF: 7 hrs
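A back-of-the-envelope series-system estimate reproduces the 7-hour figure; the number of moving parts per node is an assumption for illustration, not taken from the slide:

```python
# For N independent components with constant failure rates, the aggregate
# MTBF is roughly the component MTBF divided by N.
component_mtbf_hrs = 50_000        # pump or fan MTBF quoted on the slide
nodes = 1800
moving_parts_per_node = 4          # assumed, e.g. node fans plus pumps (illustrative only)
n_components = nodes * moving_parts_per_node          # 7200 moving parts
print(round(component_mtbf_hrs / n_components, 1))    # ~6.9 hours, close to the ~7 h quoted
```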

  21. The topology [diagram: core InfiniBand switches connect a service island (service nodes, storage nodes) and three computing islands of 576 computing nodes each]

  22. It should count • Max job size ~10k cores • Fastest CPUs, but compatible with old codes • Two sockets are enough • CPUs, not accelerators • Newest memory, and more than before • Fast interconnect: still InfiniBand, but no need for a full CBB fat tree

  23. The hard part • Public institution, public tender • Strict requirements: 1.65 PFLOPS, max. 1728 servers; 128 GB DDR4 per node; warm water cooling, no pumps inside nodes; InfiniBand topology; compute+cooling, dry-cooler only • Criteria: price, power, space

  24. And the winner is… • HP Apollo 8000 • Most energy efficient • The only solution with 100% warm water cooling • Least floor space needed • Lowest TCO

  25. Even more Apollo • Focuses also on the '1' in PUE! • Power distribution • Fewer fans • Detailed monitoring: 'energy to solution' • Safer maintenance • Fewer cables • Prefabricated piping • Simplified management
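For context, PUE is the ratio of total facility power to IT power, so the '1' in the metric is the IT term itself: cutting node fans and power-conversion losses lowers real consumption even when the PUE value barely moves. As a reminder of the definition (not a figure from the slide):

$$ \mathrm{PUE} \;=\; \frac{P_{\text{facility}}}{P_{\text{IT}}} \;=\; 1 + \frac{P_{\text{overhead}}}{P_{\text{IT}}} $$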

  26. System configuration • 1.65 PFLOPS (top 30 of the current Top500) • 1728 nodes, Intel Haswell E5-2680v3 • 41472 cores, 13824 per island • 216 TB DDR4 RAM • PUE ~1.05, 680 kW total power • 15 racks, 12.99 m² • System ready for non-disruptive upgrade • Scientific Linux 6 or 7
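The headline figures can be cross-checked from the node count; a minimal sketch assuming two 12-core, 2.5 GHz E5-2680v3 sockets per node (16 double-precision FLOPs per cycle with AVX2+FMA) and the 128 GB per node quoted on the tender slide:

```python
# Rough consistency check of the slide's numbers (per-node config assumed in comments).
nodes = 1728
cores = nodes * 2 * 12                    # two 12-core E5-2680v3 CPUs per node -> 41472
peak_pflops = cores * 2.5e9 * 16 / 1e15   # 2.5 GHz x 16 DP FLOPs/cycle -> ~1.66 PFLOPS
ram_tb = nodes * 128 / 1024               # 128 GB per node -> 216 TB
it_power_kw = 680 / 1.05                  # total power / PUE -> ~648 kW reaching the IT gear
print(cores, round(peak_pflops, 2), ram_tb, round(it_power_kw))  # 41472 1.66 216.0 648
```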

  27. Prometheus • Created humans • Gave fire to the people • Accelerated innovation • Defeated Zeus

  28. Deployment plan • Contract signed on 20.10.2014 • Installation of the primary loop started on 12.11.2014 • First delivery (service island) expected on 24.11.2014 • Apollo piping should arrive before Christmas • Main delivery in January • Installation and acceptance in February • Production work from Q2 2015

  29. Future plans • Benchmarking and Top500 submission • Evaluation of Scientific Linux 7 • Moving users from the previous system • Tuning of applications • Energy-aware scheduling • First experience presented at HP-CAST 24

  30. prometheus@cyfronet.pl

  31. More information • www.cyfronet.krakow.pl/en • www.plgrid.pl/en
