1 / 29

Accelerated Computing: The Path Forward

Supercomputing has swept rapidly from the far edges of science to the heart of our everyday lives. And propelling it forward – bringing it into the mobile phone already in your pocket and the car in your driveway – is GPU acceleration, NVIDIA CEO Jen-Hsun Huang told a packed house at a rollicking event kicking off this week’s SC15 annual supercomputing show in Austin. The event draws 10,000 researchers, national lab directors and others from around the world.

nvidia
Télécharger la présentation

Accelerated Computing: The Path Forward

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Accelerated Computing: The Path Forward Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 | Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD

  2. COMMODITY DISRUPTS CUSTOM SOURCE: Top500

  3. ACCELERATED COMPUTING: THE PATH FORWARD “It’s time to start planning for the end of Moore’s Law, and it’s worth pondering how it will end, not just when.” Robert Colwell Director, Microsystems Technology Office, DARPA ACCELERATED COMPUTING: THE PATH FORWARD “It’s time to start planning for the end of Moore’s Law, and it’s worth pondering how it will end, not just when.” Robert Colwell Director, Microsystems Technology Office, DARPA

  4. NVIDIA ACCELERATES COMPUTING Productive Programming Model & Tools Expert Co-Design Accessibility APPLICATION MIDDLEWARE SYS SW LARGE SYSTEMS PROCESSOR 0.0 0.5 1.0 1.5 2.0 2.5 3.0 2008 2009 2010 2011 2012 2013 2014 NVIDIA GPU x86 CPU Fast GPU Engineered for High Throughput TFLOPS M2090 M1060 K20 K80 K40 Fast GPU + Strong CPU NVIDIA ACCELERATES COMPUTING Productive Programming Model & Tools Expert Co-Design Accessibility APPLICATION MIDDLEWARE SYS SW LARGE SYSTEMS PROCESSOR 0.0 0.5 1.0 1.5 2.0 2.5 3.0 2008 2009 2010 2011 2012 2013 2014 NVIDIA GPU x86 CPU Fast GPU Engineered for High Throughput TFLOPS M2090 M1060 K20 K80 K40 Fast GPU + Strong CPU

  5. 0 25 50 75 100 125 2013 2014 2015 100+ accelerated systems now on Top500 list 1/3 of total FLOPS powered by accelerators NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers Tesla supercomputers growing at 50% CAGR over past five years Top500: # of Accelerated Supercomputers ACCELERATORS SURGE IN WORLD’S TOP SUPERCOMPUTERS 0 25 50 75 100 125 2013 2014 2015 100+ accelerated systems now on Top500 list 1/3 of total FLOPS powered by accelerators NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers Tesla supercomputers growing at 50% CAGR over past five years Top500: # of Accelerated Supercomputers ACCELERATORS SURGE IN WORLD’S TOP SUPERCOMPUTERS

  6. MACHINE LEARNING HPC’S 1ST CONSUMER KILLER-APP “NADELLA: SMART AGENTS LIKE CORTANA WILL REPLACE THE WEB BROWSER” -BI FACEBOOK MESSENGER ADDS FACIAL RECOGNITION YOUTUBE: CLICK-TO-BUY ADS GOOGLE PHOTOS: ML-POWERED FEATURES MICROSOFT OPEN-SOURCES DMTK GOOGLE OPEN-SOURCES TENSORFLOW MACHINE LEARNING HPC’S 1ST CONSUMER KILLER-APP “NADELLA: SMART AGENTS LIKE CORTANA WILL REPLACE THE WEB BROWSER” -BI FACEBOOK MESSENGER ADDS FACIAL RECOGNITION YOUTUBE: CLICK-TO-BUY ADS GOOGLE PHOTOS: ML-POWERED FEATURES MICROSOFT OPEN-SOURCES DMTK GOOGLE OPEN-SOURCES TENSORFLOW

  7. TESLA FOR MACHINE LEARNING 10M Users 40 years of video/day 270M Items sold/day 43% on mobile devices TESLA M4TESLA M40 HYPERSCALE SUITE POWERFUL: Fastest Deep Learning Performance LOW POWER: Highest Hyperscale Throughput GPU Accelerated FFmpeg Image Compute Engine GPU REST Engine TESLA FOR MACHINE LEARNING 10M Users 40 years of video/day 270M Items sold/day 43% on mobile devices TESLA M4TESLA M40 HYPERSCALE SUITE POWERFUL: Fastest Deep Learning Performance LOW POWER: Highest Hyperscale Throughput GPU Accelerated FFmpeg Image Compute Engine GPU REST Engine

  8. MACHINE LEARNING REVOLUTIONIZING TRANSPORTATION “Toyota Invests $1 Billion in Artificial Intelligence in U.S.” — U.S. News & World Report MACHINE LEARNING REVOLUTIONIZING TRANSPORTATION “Toyota Invests $1 Billion in Artificial Intelligence in U.S.” — U.S. News & World Report

  9. 39% 45% 55% 62% 66% 72% 75% 79% 83% 30% 40% 50% 60% 70% 80% 90% 100% 7/8 7/22 8/5 8/19 9/2 9/16 9/30 10/1410/28 END-TO-END MACHINE LEARNING PLATFORM FOR AUTONOMOUS CARS NVDRIVENET on KITTI Object Detection BAIDU 11/10 DRIVE PX DIGITS DevBox NVIDIA 39% 45% 55% 62% 66% 72% 75% 79% 83% 30% 40% 50% 60% 70% 80% 90% 100% 7/8 7/22 8/5 8/19 9/2 9/16 9/30 10/1410/28 END-TO-END MACHINE LEARNING PLATFORM FOR AUTONOMOUS CARS NVDRIVENET on KITTI Object Detection BAIDU 11/10 DRIVE PX DIGITS DevBox NVIDIA

  10. . MACHINE LEARNING REVOLUTIONIZING AUTONOMOUS MACHINES

  11. . JETSON TX1 Supercomputer on a Module 10x Energy Efficiency Alexnet GPU 1 TFLOPS 256-core Maxwell CPU 64-bit ARM A57s Memory 4GB LPDDR4 26 GB/s Power Under 10W 0 10 20 30 40 50 Intel Core i7-6700K (Skylake) Jetson TX1 Images/Sec/Watt Under 10W for typical use cases

  12. . PC GAMING SUPERCOMPUTING EVERYWHERE Titan X for PC Tesla in the Cloud Jetson TX1 for Robots DRIVE PX for Auto

  13. . Ian Buck, VP of Accelerated Computing, NVIDIA SC15 | Nov. 16, 2015 ACCELERATED COMPUTING: THE PATH FORWARD

  14. . TESLA ACCELERATES DISCOVERY AND INSIGHT 270M Items sold/day 43% on mobile devices SIMULATION TESLA ACCELERATED COMPUTING VISUALIZATIONMACHINE LEARNING

  15. . “Approximately a third of HPC systems operating today are equipped with accelerators and nearly half of all newly deployed systems have them.” ACCELERATED COMPUTING: A TIPPING POINT FOR HPC, Intersect360 Nov 2015

  16. . 70% OF TOP HPC APPS NOW ACCELERATED VASP NOW ACCELERATED Typically Consumes 10-25% of HPC System INTERSECT360 SURVEY OF TOP APPS Top 10 HPC Apps 90% Accelerated Top 50 HPC Apps 70% Accelerated 1 Dual K80 Server 1.3x 4 Dual CPU Servers 1.0x Intersect360, Nov 2015 “HPC Application Support for GPU Computing” Dual-socket Xeon E5-2690 v2 3GHz, Dual Tesla K80, FDR InfiniBand Dataset: NiAl-MD Blocked Davidson

  17. . 370 GPU-Accelerated Applications www.nvidia.com/appscatalog

  18. . TESLA FOR SIMULATION LIBRARIES TESLA ACCELERATED COMPUTING LANGUAGESDIRECTIVES ACCELERATED COMPUTING TOOLKIT

  19. . TESLA K80 World’s Fastest Accelerator for HPC 0 5 10 15 20 25 30 Tesla K80 Server Dual CPU Server # of Days AMBER Benchmark: PME-JAC-NVE Simulation for 1 microsecond CPU: E5-2698v3 @ 2.3GHz. 64GB System Memory, CentOS 6.2 CUDA Cores 2496 Peak DP 1.9 TFLOPS Peak DP w/ Boost 2.9 TFLOPS GDDR5 Memory 24 GB Bandwidth 480 GB/s Power 300 W Simulation Time from 1 Month to 1 Week 5x Faster AMBER Performance

  20. . APPLICATION PERFORMANCE BOOSTS DATA CENTER THROUGHPUT TESLA K80: 5X FASTER 1/3 OF NODES ACCELERATED, 2X SYSTEM THROUGHPUT 100 Jobs Per Day 220 Jobs Per Day CPU-only System Accelerated System 0x 5x 10x 15x QMCPACK LAMMPS CHROMA NAMD AMBER K80 CPU CPU: Dual E5-2698 v3@2.3GHz 3.6GHz, 64GB System Memory, CentOS 6.2 GPU: Single Tesla K80, Boost enabled Speed-up vs Dual CPU

  21. . OPENACC DELIVERS TRUE PERF PORTABILITY Paving the Path Forward: Single Code for All HPC Processors 4.1x 5.2x 7.1x 4.3x 5.3x 7.1x7.6x 11.9x 30.3x 0x 5x 10x 15x 20x 25x 30x 35x 359.MINIGHOST (MANTEVO) NEMO (CLIMATE & OCEAN) CLOVERLEAF (PHYSICS) CPU: MPI + OpenMP CPU: MPI + OpenACC CPU + GPU: MPI + OpenACC SpeedupvsSingleCPUCore Application Performance Benchmark 359.miniGhost: CPU: Intel Xeon E5-2698 v3, 2 sockets, 32-cores total, GPU: Tesla K80- single GPU NEMO: Each socket CPU: Intel Xeon E5-‐2698 v3, 16 cores; GPU: NVIDIA K80 both GPUs CLOVERLEAF: CPU: Dual socket Intel Xeon CPU E5-2690 v2, 20 cores total, GPU: Tesla K80 both GPUs

  22. . TESLA HYPERSCALE FOR MACHINE LEARNING 10M Users 40 years of video/day 270M Items sold/day 43% on mobile devices TESLA M4TESLA M40 HYPERSCALE SUITE POWERFUL: Fastest Deep Learning Performance LOW POWER: Highest Hyperscale Throughput GPU Accelerated FFmpeg Image Compute Engine GPU REST Engine

  23. . TESLA M40 World’s Fastest Accelerator for Deep Learning 0 1 2 3 4 5 6 7 8 9 Tesla M40 CPU 8x Faster Caffe Performance # of Days Caffe Benchmark: AlexNet training throughput based on 20 iterations, CPU: E5-2697v2 @ 2.70GHz. 64GB System Memory, CentOS 6.2 CUDA Cores 3072 Peak SP 7 TFLOPS GDDR5 Memory 12 GB Bandwidth 288 GB/s Power 250W Reduce Training Time from 8 Days to 1 Day

  24. . TESLA M4 Highest Throughput Hyperscale Workload Acceleration CUDA Cores 1024 Peak SP 2.2 TFLOPS GDDR5 Memory 4 GB Bandwidth 88 GB/s Form Factor PCIe Low Profile Power 50 – 75 W Video Processing 4x Image Processing 5x Video Transcode 2x Machine Learning Inference 2x H.264 & H.265, SD & HD Stabilization and Enhancements Resize, Filter, Search, Auto-Enhance Preliminary specifications. Subject to change.

  25. . TESLA FOR VISUALIZATION IRAY TESLA ACCELERATED COMPUTING INDEXOPTIX VISUALIZATION TOOLS FOR HPC

  26. . GROWING ADOPTION IN CLIMATE & WEATHER MeteoSwiss Deploys World’s First Accelerated Weather Supercomputer 2x higher resolution for daily forecasts 14x more simulation with ensemble approach for medium range forecasts NOAA Chooses Tesla To Improve Weather Forecast Research Develop global model with 3km resolution, five-fold increase from today’s resolution Improved resolution requires 40x higher in computational complexity

  27. . NEXT-GEN SUPERCOMPUTERS ARE GPU-ACCELERATED SIMULATION TESLA ACCELERATED COMPUTING VISUALIZATIONMACHINE LEARNING SUMMIT SIERRA U.S. Dept. of Energy Pre-Exascale Supercomputers for Science IBM Watson Breakthrough Natural Language Processing for Cognitive Computing NOAA New Supercomputer for Next-Gen Weather Forecasting

  28. . ENTERPRISE World’s first in-situ weather simulation, running on Meteoswiss supercomputer Simulation of deadly tornado that hit El Reno, Oklahoma on May 24, 2011 ACCELERATED SCIENCE AND DATA ANALYTICS ON DISPLAY AT SC’15 Analyze quantum effects in nanowire using 10,800 GPUs on TITAN supercomputer Predicting drug reactions for personalized medicine with GPU-powered IBM Spark

More Related