
The New Era of Coprocessor in Supercomputing - 并行计算中协处理应用的新时代-


Presentation Transcript


  1. The New Era of Coprocessor in Supercomputing -并行计算中协处理应用的新时代- Marc XAB, M.A. - 桜美林大学大学院 Country Manager 5/07/2013 @ BAH! Oil & Gas - Rio de Janeiro, Brazil Super Micro Computer Inc. Rua Funchal, 418. São Paulo – SP www.supermicro.com/brazil

  2. Networking in Rio

  3. Company Overview • San Jose (headquarters); Fremont facility • Revenues: FY10 $721M, FY11 $942M, FY12 $1B • Global footprint: >70 countries, 700 customers, 6,800 SKUs • Production facilities in the US, EU and Asia • Engineering: 70% of workforce in engineering; SSI member • Market share: #1 in the server channel • Corporate focus: leader in energy-efficient, HPC and application-optimized systems • Fortune 2012 100 Fastest-Growing Companies

  4. COPROCESSOR (协处理器) • A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). • Operations performed by a coprocessor may include floating-point arithmetic, graphics, signal processing, string processing, encryption, or I/O interfacing with peripheral devices. • Math coprocessor – a chip that handles floating-point operations and mathematical computations in a computer. • Graphics processing unit (GPU) – a separate card that handles graphics rendering and can improve performance in graphics-intensive applications such as games. • Secure cryptoprocessor – a dedicated computer-on-a-chip or microprocessor for carrying out cryptographic operations, embedded in packaging with multiple physical security measures that give it a degree of tamper resistance. • Network coprocessor (网络协处理器). • ……..

  5. The Trend Indicated by the Green500

  6. Case Study – Submerged Liquid Cooling "Submerged Supermicro Servers Accelerated by GPUs" • Removed fans and heat sinks • Used SSDs & an updated BIOS • Reversed the handles • Supermicro 1U (single CPU) with two coprocessors • No requirement for room-level cooling • Operates at PUE ~1.12 • ~25 kW per rack is the break-even point between regular air cooling and submerged liquid cooling [Chart: cost efficiency vs. kW per rack, air cooling vs. submerged liquid cooling, crossover at ~25 kW]

  7. Tesla: 2-3x Faster Every 2 Years [Chart: DP GFLOPS per watt and thousands of cores vs. year, 2008–2014 — T10 (2008), Fermi with 512 cores (2010), Kepler (2012), Maxwell (2014)]

  8. GPU Supercomputer Momentum [Chart: number of GPU-accelerated systems on the Top500, 2008–2013 — 4x growth, 52 systems as of the June 2012 Top500; Tesla, the first double-precision GPU, launched in 2008, followed by Fermi]

  9. Case Study – PNNL • Expected to rank among the world's 20 fastest machines • Research on climate and environmental science, chemical processes, biology-based fuels that can replace fossil fuels, new materials for energy applications, etc. • Supermicro FatTwin™ with 2x MIC 5110P per node

  10. Case Study – PNNL Supermicro FatTwin™ with 2x MIC 5110P per node • Theoretical peak processing speed of 3.4 petaflops • 42 racks / 195,840 cores • 1440 compute nodes with conventional processors and Intel Xeon Phi "MIC" accelerators • 128 GB memory per node • FDR Infiniband network • 2.7 petabyte shared parallel file system (60 gigabytes per second read/write)

  11. Programming Paradigm • Made easier: the Xeon Phi programming model and its optimizations are shared with the Intel Xeon • CUDA (Compute Unified Device Architecture) – a parallel computing platform and programming model; CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs • Don't complicate it

  12. Keynotes • This is a new era of hybrid computing – heterogeneous architectures with PCI-E-based coprocessors • Specialized (application-optimized) design is required for GPU/MIC applications and future HPC scalability • There is more to come on the industry roadmap: new technologies, power management and system architectures • Configurable cooling & power for energy efficiency and performance are increasingly critical • The trend towards heterogeneous architectures poses many challenges for system builders and software developers in making efficient use of the resources • The programming paradigm and the investment it requires are an important part of the selection criteria

  13. HPC Coprocessor Applications – massively parallel architecture accelerates scientific & engineering applications • Oil and gas / seismic: seismic imaging, seismic interpretation, reservoir modeling, seismic inversion • Weather and climate: weather, atmospheric and ocean modeling, space sciences • Scientific simulation: computational fluid dynamics, materials science, molecular dynamics, quantum chemistry • Creation & design: mechanical design & simulation, structural mechanics, electronic design automation • Data mining: data-parallel mathematics, extending Excel with OLAP for planning & analysis, database and data-analysis acceleration • Imaging and computer vision: medical imaging, visualization & docking, filmmaking & animation • Computational finance: options pricing, risk analysis, algorithmic trading

  14. Hybrid Computing – a pioneer's timeline, 2008–2013 • Where it started: 7U GPU blades (20 CPUs + 20 GPUs); Tesla S1070 standalone box with 4 GPUs • Followed by: workstation / 4U, 2U 4-GPU, 1U 4-GPU (the fastest 1U server in the world), 2U GPU with QDR IB onboard, PCI-E x16 X9 (UP), 1U 2-GPU/MIC, 1U 3-GPU, 2U Twin, 1U Twin™ (the most powerful PSC), X9 2U 6-GPU/MIC, X9 (DP) 1U 4-GPU/MIC • Mainstream today: FatTwin™ 2-node with 8 GPUs or MICs per node and FatTwin™ 4-node with 3 GPUs or MICs per node – density and efficiency • NVIDIA Kepler & Intel Xeon Phi support for ultra-high-efficiency GPGPU

  15. Communication Between Coprocessors • The model used by existing CPU-GPU heterogeneous architectures for GPU-GPU communication: data travels via the CPU and an InfiniBand (IB) Host Channel Adapter (HCA) and switch, or another proprietary interconnect • Implementation example: data transfer between cooperating GPUs in separate nodes of a TCA cluster, enabled by the PEACH2 chip • Schematic of the PEARL network within a CPU/GPU cluster (Source: Tsukuba University)

  16. Designing GPU/MIC Optimized Systems • Performance – PCI-E lane arrangement, PCB placement, interconnect • Mechanical design – mounting, location, space utilization • Thermal – airflow, fan speed control, location, noise control • Power support – PSU efficiency, wattage options, power management, number and location of power connectors

  17. Summary • Coprocessor and Applications • Performance and Efficiency • Top500 & Green500 • Hybrid Computing & HPC • GPU/MIC Optimized Systems • Design Considerations • Performance • Mechanical Design • Thermal & Cooling • Power Support

  18. Thank You! Marc XAB marc.xab@supermicro.com

  19. Conference Puzzle How do you put an ELEPHANT in a refrigerator?

  20. Conference Puzzle
