
Scientific Computing Developments at STFC


Presentation Transcript


  1. Scientific Computing Developments at STFC Peter Oliver Scientific Computing Department Oct 2012

  2. Outline • STFC • Compute and Data • National and International Services • Hartree Centre • Summary

  3. STFC Operate World-Class Science Facilities

  4. STFC sites:
  • Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
  • UK Astronomy Technology Centre, Edinburgh
  • Polaris House, Swindon, Wiltshire
  • Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus
  • Chilbolton Observatory, Stockbridge, Hampshire
  • Joint Astronomy Centre, Hawaii
  • Isaac Newton Group of Telescopes, La Palma

  5. What we do… • The nuts and bolts that make it work • Enable scientists, engineers and researchers to develop world-class science, innovation and skills

  6. SCARF
  • Provides resources for STFC facilities, staff and their collaborators
  • ~2,700 cores, InfiniBand interconnect, Panasas filesystem
  • Managed as one entity
  • ~50 peer-reviewed publications/year
  • Additional capacity added each year for general use; facilities such as CLF add capacity using their own funds
  • NGS partner
  • Local access using MyProxy SSO: users log in with their Federal ID and password
  • UK e-Science Certificate access

  7. NSCCS (National Service for Computational Chemistry Software)
  • Provides national and international compute, training and support
  • EPSRC Mid-Range Service
  • SGI Altix UV SMP system: 512 CPUs, 2TB shared memory
  • A large-memory SMP was chosen over a traditional cluster as this best suits the computational chemistry applications
  • Supports over 100 active users, producing ~70 peer-reviewed papers per year
  • Over 40 applications installed
  • Authentication using NGS technologies
  • Portal to submit jobs, allowing access for less computationally aware chemists

  8. Tier-1 Architecture
  • >8,000 processor cores
  • >500 disk servers (10PB)
  • "STK" tape robot (10PB)
  • >37 dedicated T10000 tape drives (A/B/C)
  [Architecture diagram: OPN and SJ5 network links, CASTOR storage instances for ATLAS, CMS, LHCb and GEN, CPU and storage pools]

  9. E-infrastructure South
  • Consortium of UK universities: Oxford, Bristol, Southampton, UCL
  • Formed the Centre for Innovation, with STFC as a partner
  • Two new services (£3.7M):
    • IRIDIS – Southampton – x86-64
    • EMERALD – STFC – GPGPU cluster
  • Part of a larger investment in e-infrastructure:
    • A Midland Centre of Excellence (£1M), led by Loughborough University
    • West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
    • E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
    • MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick

  10. EMERALD
  • Provides resources to the consortium and partners
  • Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
  • Largest production GPU facility in the UK: 372 Nvidia Tesla M2090 GPUs
  • Scientific applications still under discussion; computational chemistry front runners are AMBER, NAMD, GROMACS and LAMMPS
  • Eventually hundreds of applications covering all sciences

  11. EMERALD • 6 racks

  12. EMERALD
  • System software:
    • Red Hat Enterprise Linux 6.x
    • Platform LSF
    • CUDA toolkit, SDK and libraries
    • Intel and Portland compilers
  • Scientific applications still under discussion; computational chemistry front runners are AMBER, NAMD, GROMACS and LAMMPS
  • Eventually hundreds of applications covering all sciences

  13. EMERALD
  • Managing a GPU cluster
  • Headline: GPUs are more power efficient and give more Gflops/Watt than x86-64 servers
  • Reality: true, but…
    • Each 4U chassis draws ~1.2 kW per U of rack space, so a full rack requires 40+ kW
    • Hard to cool: additional in-row coolers and containment needed
    • Uneven power demand stresses the air conditioning and power infrastructure
    • A 240-GPU job takes the cluster from 31 kW idle to 80 kW almost instantly (a rough per-GPU estimate is sketched below)
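The power figures above allow a rough per-GPU estimate. A minimal back-of-envelope sketch in Python, assuming the 31 kW to 80 kW swing is attributable entirely to the 240 GPUs in the job (an assumption; CPU, fan and PSU overhead changes are ignored):

    # Rough per-GPU power swing implied by the EMERALD figures quoted above.
    # Assumption: the idle-to-load jump is attributed entirely to the 240 GPUs
    # in the job (CPU, fan and PSU overhead changes are ignored).

    idle_kw = 31.0      # cluster power with GPUs idle
    loaded_kw = 80.0    # cluster power during a 240-GPU job
    gpus_in_job = 240

    watts_per_gpu = (loaded_kw - idle_kw) * 1000 / gpus_in_job
    print(f"~{watts_per_gpu:.0f} W swing per GPU")   # ~204 W

    # Chassis density quoted on the slide: ~1.2 kW per U of rack space.
    kw_per_u = 1.2
    print(f"A 4U chassis draws ~{kw_per_u * 4:.1f} kW at that density")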

  14. JASMIN/CEMS
  • CEDA data storage & services
    • Curated data archive
    • Archive management services
    • Archive access services (HTTP, FTP, helpdesk, ...)
  • Data-intensive scientific computing
    • Global/regional datasets & models
    • High spatial and temporal resolution
  • Private cloud
  • Flexible access to high-volume & complex data for the climate & earth observation communities
    • Online workspaces
    • Services for sharing & collaboration

  15. JASMIN/CEMS
  • Deadline (or funding gone!): 31st March 2012 for "doing science"
  • Government procurement: £5M tender to order in under 4 weeks
  • Machine room upgrades and the large cluster build competed for time
  • Bare floor to operation in 6 weeks
  • 6 hours from power-off to 4.6PB of ActiveStor 11 mounted at RAL
  • "Doing science" on 14th March
  • 3 satellite site installs in parallel (Leeds 100TB, Reading 500TB, ISIC 600TB)
  [Timeline: Oct 2011 to 8 Mar 2012: BIS funds, tender, order, build, network, complete]

  16. JASMIN/CEMS at RAL
  • 12 racks with mixed servers and storage
  • 15 kW/rack peak (180 kW total)
  • Enclosed cold aisle + in-aisle cooling
  • 600 kg/rack (7.2 tonnes total)
  • Distributed 10Gb network (1 Terabit/s bandwidth)
  • Single 4.5PB global file system
  • Two VMware vSphere pools of servers with dedicated image storage
  • 6 weeks from bare floor to working 4.6PB

  17. JASMIN/CEMS Infrastructure Configuration
  • Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3TB drives in total)
  • Computing: a 'cloud' of hundreds of virtual machines hosted on 20 Dell R610 servers
  • Networking: 10Gb Gnodal throughout; "lightpath" dedicated links to UK and EU supercomputers
  • Physical: 12 racks, enclosed aisle with in-line chillers
  • Capacity: RAL 4.6PB usable (6.6PB raw), equivalent to 920,000 DVDs (a 1.47 km-high tower of DVDs)
  • High performance: 1.03 Tb/s total storage bandwidth, equivalent to copying 1,500 DVDs per minute (a quick cross-check of these equivalences is sketched below)
  • Single namespace: one single file system, managed as one system
  • Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
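As a rough cross-check of the DVD equivalences quoted above, a minimal sketch assuming a 4.7 GB single-layer DVD and decimal (SI) units throughout (the slide's 920,000 figure corresponds to roughly 5 GB per DVD):

    # Sanity check of the DVD equivalences quoted for JASMIN/CEMS.
    # Assumptions: 4.7 GB per single-layer DVD, decimal (SI) units.

    dvd_gb = 4.7
    usable_pb = 4.6
    bandwidth_tbit_s = 1.03

    dvds = usable_pb * 1e6 / dvd_gb                   # PB -> GB, then per DVD
    print(f"4.6 PB ~= {dvds:,.0f} DVDs")              # ~979,000 (slide quotes 920,000)

    gb_per_minute = bandwidth_tbit_s * 1e3 / 8 * 60   # Tb/s -> GB/s -> GB/min
    print(f"1.03 Tb/s ~= {gb_per_minute / dvd_gb:,.0f} DVDs per minute")  # ~1,600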

  18. JASMIN/CEMS Networking
  • Gnodal 10Gb networking: 160 x 10Gb ports in a 4 x GS4008 switch stack
  • Compute:
    • 23 Dell servers for VM hosting (VMware vCentre + vCloud) and HPC access to storage
    • 8 Dell servers for compute
    • Dell EqualLogic iSCSI arrays (VM images)
    • All 10Gb connected
  • The 10Gb network has already been upgraded with 80 more Gnodal 10Gb ports for compute expansion

  19. What is Panasas Storage?
  • "A complete hardware and software storage solution" (director blades plus storage blades)
  • Ease of management: a single management console for 4.6PB
  • Performance: parallel access via DirectFlow, NFS and CIFS; fast parallel reconstruction
  • ObjectRAID: all files stored as objects, with a RAID level per file and vertical, horizontal and network parity
  • Distributed parallel file system: parts (objects) of files on every blade, and all blades transmit/receive in parallel (a toy illustration follows below)
  • Global namespace
  • Battery UPS, enough to shut down cleanly
  • 1 x 10Gb uplink per shelf; performance scales with size
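To illustrate the idea of per-file objects spread across blades with a parity object per stripe, here is a conceptual toy sketch; the object size, blade count and simple XOR parity scheme are illustrative assumptions, not Panasas's actual ObjectRAID layout or DirectFlow protocol:

    # Toy illustration of per-file object placement: a file is split into
    # fixed-size objects spread across blades, with one XOR parity object per
    # stripe. A conceptual sketch only, NOT Panasas's ObjectRAID format.

    from itertools import zip_longest

    OBJECT_SIZE = 64 * 1024   # illustrative object size
    DATA_BLADES = 9           # data objects per stripe (one parity object added)


    def byte_xor(values):
        acc = 0
        for v in values:
            acc ^= v
        return acc


    def xor_parity(objects):
        """XOR corresponding bytes of the objects in a stripe
        (shorter objects are padded with zeros)."""
        return bytes(byte_xor(cols) for cols in zip_longest(*objects, fillvalue=0))


    def stripe_file(data: bytes):
        """Return a list of stripes; each stripe maps a blade index to the
        object stored there, with the parity object on the last blade."""
        objects = [data[i:i + OBJECT_SIZE] for i in range(0, len(data), OBJECT_SIZE)]
        stripes = []
        for s in range(0, len(objects), DATA_BLADES):
            group = objects[s:s + DATA_BLADES]
            stripe = {blade: obj for blade, obj in enumerate(group)}
            stripe[DATA_BLADES] = xor_parity(group)   # parity on the extra blade
            stripes.append(stripe)
        return stripes


    if __name__ == "__main__":
        stripes = stripe_file(b"climate model output " * 40000)
        print(f"{len(stripes)} stripes spread across {DATA_BLADES + 1} blades")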

  20. PanActive Manager

  21. Panasas in Operation
  • Performance: random IO 400MB/s per host, sequential IO 1GB/s per host
  • External performance: 10Gb connected, sustained 6Gb/s
  • Reliability: 1,133 blades, 206 power supplies and 103 shelf network switches (1,442 components in total)
    • Soak testing revealed 27 faults; 7 faults in operation, with no loss of service
    • ~0.6% failure per year, compared with ~5% per year for commodity storage (see the sketch below)
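A minimal sketch of the annualised failure-rate comparison, assuming the 7 in-operation faults correspond to roughly one year of service:

    # Rough annualised failure rate from the component counts quoted above.
    # Assumption: the 7 in-operation faults cover roughly one year of service
    # (the slide quotes ~0.6% per year).

    components = 1133 + 206 + 103      # blades + power supplies + shelf switches
    faults_in_operation = 7

    annual_rate = faults_in_operation / components * 100
    print(f"{components} components, {faults_in_operation} faults "
          f"~= {annual_rate:.1f}% per year")   # ~0.5%, vs ~5% quoted for
                                               # commodity storage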

  22. Infrastructure Solutions
  • R89 machine room:
    • ~800 m2 of machine room floor, ~240 42U racks, 2 office floors
    • 4.5 MW infrastructure for power and cooling
    • UPS backed by a 950 kW generator for critical services
    • 4 chillers providing 2.25 MW of cooling, plus 750 kW on standby
  • £1M of e-infrastructure improvements led by SCT: cold aisle containment, 12 x 32 kW in-row chillers fed by chilled water
  • Future: space for a large project, space for expansion, new build as necessary
  • ISIC Hyperwall: 28 panels driven by 28 servers for HPC visualisation

  23. Infrastructure Solutions: Systems Management
  • Backups: system and user data
  • SVN: codes and documentation
  • Monitoring: Ganglia, Cacti, power management
  • Alerting: Nagios
  • Security: intrusion detection, patch monitoring
  • Deployment: Kickstart, LDAP, inventory database
  • VMware: server consolidation, extra resilience
    • 150+ virtual servers supporting all e-Science activities
    • Development cloud

  24. e-Infrastructures
  • Lead role in national and international e-infrastructures
  • Authentication: lead and develop the UK e-Science Certificate Authority
    • ~30,000 certificates issued in total, ~3,000 current
    • Easy integration with the UK Access Management Federation
  • Authorisation: use existing EGI tools
  • Accounting: lead and develop EGI APEL accounting (see the rough arithmetic sketched below)
    • 500M records, 400GB of data; ~282 sites publish records
    • ~12GB/day loaded into the main tables
    • Detailed records usually cover 13 months, with summary data back to 2003
  • Integrated into existing HPC-style services
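A rough sketch of what those APEL volumes imply per record and per day, assuming the 400GB of data corresponds to the 500M stored records:

    # Rough arithmetic on the APEL accounting volumes quoted above.
    # Assumption: the 400 GB of data corresponds to the 500M stored records.

    records = 500e6
    total_gb = 400.0
    daily_gb = 12.0

    kb_per_record = total_gb * 1e6 / records            # GB -> KB
    records_per_day = daily_gb * 1e6 / kb_per_record
    print(f"~{kb_per_record:.1f} KB per record, "
          f"~{records_per_day / 1e6:.0f}M records loaded per day")   # ~0.8 KB, ~15M/day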

  25. e-Infrastructures
  • Lead role in national and international e-infrastructures
  • User management: lead and develop the NGS UAS service
    • A common portal for project owners to manage project and user allocations, display trends and make decisions (policing)
  • Information (what services are available?): lead and develop the EGI information portal GOCDB
    • 2,180 registered GOCDB users belonging to 40 registered NGIs
    • 1,073 registered sites hosting a total of 4,372 services
    • 12,663 downtime entries entered via GOCDB
  • Training & support
    • Training Marketplace: a tool developed to promote training opportunities, resources and materials
    • SeIUCCR summer schools: supporting 30 students for a 1-week course (120 applicants)

  26. Hartree Centre
  • Drive the adoption of "HPC" across UK industry, to improve competitiveness and generate wealth with next-generation applications and cloud-enabled user interfaces
  • Deliver step-change improvements in software functionality and scalability in order to address grand challenge problems
  • Establish a world-class "HPC" technology & skills development centre at Daresbury
  [Organisation diagram: Hartree Centre, Scientific Computing (CSED & e-Science), the Business Innovation Department, other collaborators, and the Daresbury Research Collaboratory in association with IBM]

  27. Hartree Centre • Develop & enhance research collaboration • nationally & internationally • on key focus areas for the UK • Foster innovation in academia & commerce • Educational focus on computational science & engineering • Help to establish a better career structure for CSE staff

  28. Hartree - Compute and Data
  • A six-rack Blue Gene/Q system comprising:
    • 98,304 cores, providing 70,778,880 core hours per month
    • 6,144 nodes, each with 16 cores & 16 GB
    • 1.26 Pflop/s peak
    • 5PB of storage backed by 15PB of tape
  • An 8,192-core iDataPlex system, providing 5,898,240 core hours per month, comprising:
    • 16 cores per node across 2 sockets (Intel Sandy Bridge, AVX etc.)
    • 252 nodes with 32 GB per node, 256 nodes with 128 GB per node, 4 nodes with 256 GB per node
    • 12 nodes with Nvidia X3090 GPUs
    • 196 Tflop/s peak
    • ScaleMP virtualization software allows up to 4TB of virtual shared memory
  (The core-hour figures are checked in the sketch below.)
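The monthly core-hour figures follow directly from the core counts; a minimal check, assuming a 30-day month at 24 hours/day and full availability:

    # Check of the Hartree core-hour figures quoted above.
    # Assumption: a 30-day month at 24 hours/day, 100% availability.

    hours_per_month = 24 * 30

    for name, cores in [("Blue Gene/Q", 98_304), ("iDataPlex", 8_192)]:
        print(f"{name}: {cores} cores x {hours_per_month} h "
              f"= {cores * hours_per_month:,} core hours/month")
    # Blue Gene/Q: 70,778,880 -- matches the slide
    # iDataPlex:    5,898,240 -- matches the slide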

  29. Hartree - Visualisation Screens

  30. Summary
  • High performance computing and data: SCARF, NSCCS, JASMIN, EMERALD, GridPP Tier-1
  • Managing e-infrastructures: authentication, authorisation and accounting; resource discovery; user management, help and training
  • Hartree Centre: HPC in industry, next-generation exascale codes

  31. Information • Website • http://www.stfc.ac.uk/SCD • Contact • Pete Oliver 01235 445164 • peter.oliver@stfc.ac.uk • www.stfc.ac.uk/hartree • hartree@stfc.ac.uk Questions?
