
Accelerators




Presentation Transcript


  1. Accelerators. Ran Ginosar, Avinoam Kolodny, Yuval Cassuto, Koby Crammer, Shmuel Wimer, Dani Lichinski

  2. (Research) (motivation) questions • We love accelerators, but… • What accelerators? • What workload? What “killer applications”? • Why study / develop them? • Who needs them? • What architecture(s)? • What goals are we seeking to fulfill? • In addition to winning ICRI-CI research grants

  3. Why accelerators? • The semiconductor industry sells $300B/year (Intel is about 10% of that) • 1M high-profit chips/day • $100/chip, $100M/day. Mostly CPUs. • 10% of revenues, 100-1000% gross profit • 90M low-cost chips/day • $10/chip, $900M/day. 50% gross profit • Growth < 10% • In the year 2023? • Need to expand into another rich industry • Store-and-compute accelerators will be the driver

  4. Which industry is… • Rich • Much richer than semiconductors • Under-utilized • Begs for progress (and can pay for it) • Critical, will not disappear • Video? Entertainment? Communication?

  5. Health Care • $2.5 Trillion in the US alone • Already 10x the entire global semiconductor industry • $4.5T by 2020 • Globally probably 3x that, $15T by 2020 • Key challenge: • Today: imprecise, statistics-based diagnosis and treatment • Develop it into a more efficient, more successful discipline by combining science & computing

  6. Future health care is computerized (store and compute) • Medical/health data on about 10B people • Genomics, proteomics (5 GB/person) • Health & medical records (1 GB/person) • Continuously accumulating sensor readings (4 GB/person) • Medical, environmental, food & drugs • Monitor and process all individuals • Machine learning • Predict and alert on medical conditions • Individualize drugs, diets, treatments

  7. Storage required • 10 GB/person • 10B people • 10^20 bytes (100 ExaBytes, 100 Mega-TeraBytes) • 100 million of today’s 1 TByte disks. 100+ data centers • 500 MegaWatts to store, read and write • $350 Million / year
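The storage estimate is simple arithmetic on the per-person figures above. A minimal sketch in Python; the electricity price used to reproduce the $350M/year figure is an assumption, not stated on the slide:

```python
# Back-of-the-envelope storage sizing, using the slide's per-person figures.
PEOPLE = 10e9                  # 10B people
BYTES_PER_PERSON = 10e9        # 10 GB/person (genomics + records + sensors)

total_bytes = PEOPLE * BYTES_PER_PERSON        # 1e20 bytes = 100 ExaBytes
disks = total_bytes / 1e12                     # 1 TByte disks -> ~100 million

POWER_MW = 500                 # slide's estimate for storing, reading, writing
PRICE_PER_KWH = 0.08           # assumed rate; roughly reproduces $350M/year
annual_cost = POWER_MW * 1e3 * 24 * 365 * PRICE_PER_KWH

print(f"total: {total_bytes:.0e} bytes, {disks:.0e} disks of 1 TB")
print(f"energy cost: ${annual_cost / 1e6:.0f}M/year at {POWER_MW} MW")
```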

  8. Computing required • Run through 50% of the data each day • Perform 10 op / byte • 10^21 OP/day ≈ 10^16 OP/sec • Only 10M cores of 1 GOPS each • 100 data centers • Power: only 10 MegaWatt • 2% of storage power
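Continuing the sketch under the same assumptions; the slide rounds to orders of magnitude, so the exact values land slightly below its headline numbers:

```python
# Back-of-the-envelope compute sizing, continuing from the storage estimate.
SECONDS_PER_DAY = 86_400

total_bytes = 1e20                         # from the storage slide
bytes_touched_per_day = 0.5 * total_bytes  # run through 50% of the data daily
ops_per_day = 10 * bytes_touched_per_day   # 10 op/byte -> 5e20, ~1e21 OP/day
ops_per_sec = ops_per_day / SECONDS_PER_DAY    # ~6e15, i.e. ~1e16 OP/s

cores = ops_per_sec / 1e9                  # 1 GOPS per core -> roughly 10M cores
print(f"{ops_per_day:.1e} OP/day, {ops_per_sec:.1e} OP/s, {cores:.1e} cores")
```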

  9. Solution: move computing closer to the data • The HMC (Hybrid Memory Cube) industry already makes the first step • 100,000 TSV (through-silicon via) vertical interconnects

  10. Not yet there • Wish to get closer: stack memory on top of the CPU? • No, too hot: • The CPU operates above 100ºC • DRAM is useless above 85ºC • Solution: • Dispose of the CPU • Create a 3D low-power (low-temperature), uniform-power-density, high-performance store & compute machine

  11. Store & Compute • 1 TByte / chip in 2020 • Combined DRAM + NVM • Accelerators: • 1000-core “many-core” (MIMD) • Associative Processors (SIMD) • Internal + external networks • [Figure: a 3D accelerator stacking NVM layers on a DRAM+SRAM layer, and a 2D accelerator of processor-memory (p-m) nodes connected by a NoC]

  12. Challenges • Need 100M chips • Max 0.1 W / chip • Total 10 MWatt • 100-1000 data centers • [Figure: 20 mm × 20 mm chip (5 mm scale mark); 2D accelerator of NVM and processor-memory (p-m) nodes on a NoC; 500 chips, 50 Watt]
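A quick consistency check on these numbers, again only a sketch around the figures the slide states:

```python
# Consistency check on the slide's chip count and power budget.
chips = 100e6              # 100M chips
watts_per_chip = 0.1       # 0.1 W max per chip
print(f"total power: {chips * watts_per_chip / 1e6:.0f} MW")   # 10 MW
# The figure's "500 chips, 50 Watt" matches: 500 * 0.1 W = 50 W per 500-chip unit,
# and 100e6 / 500 = 200,000 such units spread over 100-1000 data centers.
print(f"units of 500 chips each: {chips / 500:,.0f}")
```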

  13. More challenges • Understand the workload • Understand the algorithms • Architect the store & compute accelerators • Low, low, low power • High (data-intensive) performance

  14. Approaches • Associative processors (a toy sketch follows this list) • Classic store & compute • Uniform power distribution • Massive parallelism • Very low power • Orthogonal-access SIMD processors • Sequential and parallel access • Mitigate the data-movement bottleneck
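A toy model of the associative-processing idea in NumPy, not the actual hardware: every memory row is compared against a key in parallel (“tagged”), and only tagged rows are written, so data never streams out to a host processor. The array layout and the particular operation are illustrative assumptions.

```python
import numpy as np

# Toy associative-processor primitive: parallel compare (tag) + masked write.
rng = np.random.default_rng(0)
memory = rng.integers(0, 100, size=(1_000_000, 2))   # rows of [key, value]

def associative_update(mem, key, delta):
    """Tag all rows whose key matches, then update only the tagged rows."""
    tags = mem[:, 0] == key       # compare phase: one tag bit per row
    mem[tags, 1] += delta         # write phase: masked, word-parallel update
    return int(tags.sum())

matched = associative_update(memory, key=42, delta=7)
print(f"updated {matched} rows in place, with no data movement to a host CPU")
```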

  15. Approaches • Average-case computing (see the sketch below) • An ALU that runs faster than the worst case • And dissipates less power than the worst case • Enables a low-power, just-in-time architecture • Personalized vision/graphics for personal mobile devices • Inspires workload understanding • Memristive processors and resistive memories • Presented by Yuval Cassuto
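One way to see why an average-case ALU can usually finish early: for random operands, the longest carry chain in an n-bit addition is far shorter than the n-bit worst case the clock period normally budgets for. A small simulation sketch (the 64-bit width and uniform operand distribution are assumptions):

```python
import random

def longest_carry_chain(a, b, n=64):
    """Length of the longest run of carry propagation in an n-bit ripple add."""
    carry, longest, run = 0, 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        carry = (x + y + carry) >> 1     # carry out of bit i
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest

random.seed(0)
chains = [longest_carry_chain(random.getrandbits(64), random.getrandbits(64))
          for _ in range(10_000)]
# Worst case is 64 bits; the average longest chain is roughly log2(64) ≈ 6.
print(f"worst case: 64 bits, average observed: {sum(chains) / len(chains):.1f} bits")
```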
