
Let’s Open Up New Fields for Next 10X!



Presentation Transcript


  1. Let’s Open Up New Fields for Next 10X! Koji Inoue, Kyushu University, Japan ISLPED'08@Bangalore

  2. We Need More Performance! But… • Higher performance is needed more than ever! • Supercomputing, Desktop, Laptop, Cellular Phone, Home Games, and so on • But power is a SERIOUS problem! • Device Reliability • Global Warming • Need drastic improvement! • Not Incremental! [Figure: Total Power Consumption in Japan, in billions of kWh, for 2006, 2025, and 2050; annotated 5X and 12X. Source: http://www.meti.go.jp/press/20071207005/03_G_IT_ini.pdf]

  3. How? • The fundamental concepts for low power have matured in “Our Current Field”! • DVS, DVFS, Selective Activation, Exploiting Prediction, and so on… • Suggestion • Move to a new (or strange) field! • Orchestrate computing resources effectively!

  4. New Fields! • Revisit the system stack from circuit/architecture level to algorithm level on new fields! • Superconducting rapid single-flux-quantum (SFQ) • New Reconfigurable Devices • 3D-IC Implementation with Wireless Links • Institutions shown: Yokohama National University, Nagoya University, Advanced Industrial Science and Technology, Keio University

  5. The Case for SFQ-RDP Project • 10 TFLOPS Desk-top Computer • RSFQ with new architecture • Kyushu University, Yokohama National University, Nagoya University, SRL/ISTEC

  6. Superconducting rapid single-flux-quantum (SFQ): Device Level • Superconducting wire • Ballistic propagation • MCM developed by SRL [Figure: Energy-delay products; axes: gate delay [s] and bit energy [J]]

  7. Large-Scale Reconfigurable Data-Path for SFQ: Architecture Level • 1K FPUs operate at 80 GHz • Reconfigurable operand network • Much simpler organization for SFQ design (no feedback loops) • Strike a good balance between parallel execution and sequential execution

  8. How To Exploit A Number of FPUs: Compiler Level • Large DFGs are extracted from source code • They are executed in a pipelined fashion • SFQ-RDP is used as an “Efficient Accelerator” [Figure labels: Application Program, DFG Extractor (w/ source-level optimization), Mapping]
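The extraction step on this slide can be illustrated with a small sketch. It is not the project's DFG Extractor: it only handles a single arithmetic expression, and the function names `extract_dfg` and `pipeline_stages` are hypothetical. It uses Python's standard `ast` module to turn the expression into a data-flow graph, then assigns each node a level, assuming every operator takes one pipeline stage.

```python
import ast

def extract_dfg(expr: str):
    """Build a data-flow graph (nodes, edges) for one arithmetic expression.

    nodes[i] is an operator name ("Add", "Mult", ...) or an input label;
    each edge (src, dst) means node src produces an operand of node dst.
    """
    nodes, edges = [], []

    def visit(n):
        idx = len(nodes)
        if isinstance(n, ast.BinOp):
            nodes.append(type(n.op).__name__)
            for child in (n.left, n.right):
                edges.append((visit(child), idx))
        elif isinstance(n, ast.Name):
            nodes.append(n.id)
        elif isinstance(n, ast.Constant):
            nodes.append(repr(n.value))
        else:
            raise ValueError(f"unsupported syntax: {type(n).__name__}")
        return idx

    visit(ast.parse(expr, mode="eval").body)
    return nodes, edges

def pipeline_stages(nodes, edges):
    """Level of each node: 0 for inputs, 1 + deepest operand for operators.

    Assumption of this sketch: one pipeline stage per operator.
    """
    level = [0] * len(nodes)
    # Operands always get a larger index than their consumer, so visiting
    # edges by descending source index finalizes operands first.
    for src, dst in sorted(edges, key=lambda e: -e[0]):
        level[dst] = max(level[dst], level[src] + 1)
    return level

nodes, edges = extract_dfg("a*b + c*d")
levels = pipeline_stages(nodes, edges)
print(nodes, edges, max(levels))
```

Nodes on the same level have no dependence on one another, so they could be mapped to different FPUs, while successive levels form the pipeline.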

  9. How To Exploit A Number of FPUs: Algorithm Level Computation of molecular orbital while (I < 1000): tei(4,4,4,4)=(((3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*(3+2*q*(4*QCx*QDx+QDx**2+QCx**2*(1+2*q*QDx**2)))*f(0,t))/(p**2*q**2)+(4*(3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*PQx*(QCx+QDx)*(3+2*q*QCx*QDx)*f(1,t))/(p*q*(p+q))(4*(PAx+PBx)*(3+2*p*PAx*PBx)*PQx*(3+2*q*(4*QCx*QDx+QDx**2+QCx**2*(1+2*q*QDx**2)))*f(1,t))/(p*q*(p+q))(8*(PAx+PBx)*(3+2*p*PAx*PBx)*(QCx+QDx)*(3+2*q*QCx*QDx)*(((p+q)*f(1,t))+2*p*PQx**2*q*f(2,t)))/(p*q*(p+q)**2)+(2*(3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*(3+q*(QCx**2+4*QCx*QDx+QDx**2))*(((p+q)*f(1,t))+2*p*PQx**2*q*f(2,t)))/(p*q**2*(p+q)**2)+(2*(3+p*(PAx**2+4*PAx*PBx+PBx**2))*(3+2*q*(4*QCx*QDx+QDx**2+QCx**2*(1+2*q*QDx**2)))*(((p+q)*f(1,t))+2*p*PQx**2*q*f(2,t)))/(p**2*q*(p+q)**2)+(4*(3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*PQx*(QCx+QDx)*(3*(p+q)*f(2,t)+2*p*PQx**2*q*f(3,t)))/(q*(p+q)**3)\+(8*(3+p*(PAx**2+4*PAx*PBx+PBx**2))*PQx*(QCx+QDx)*(3+2*q*QCx*QDx)*(3*(p+q)*f(2,t)+2*p*PQx**2*q*f(3,t)))/(p*(p+q)**3)(8*(PAx+PBx)*(3+2*p*PAx*PBx)*PQx*(3+q*(QCx**2+4*QCx*QDx+QDx**2))*(3*(p+q)*f(2,t)+2*p*PQx**2*q*f(3,t)))/(q*(p+q)**3)(4*(PAx+PBx)*PQx*(3+2*q*(4*QCx*QDx+QDx**2+QCx**2*(1+2*q*QDx**2)))*(3*(p+q)*f(2,t)+2*p*PQx**2*q*f(3,t)))/(p*(p+q)**3)+((3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*(3*(p+q)**2*f(2,t)+4*p*PQx**2*q*(3*(p+q)*f(3,t)+p*PQx**2*q*f(4,t))))/(q**2*(p+q)**4)(8*(PAx+PBx)*(3+2*p*PAx*PBx)*(QCx+QDx)*(3*(p+q)**2*f(2,t)+4*p*PQx**2*q*(3*(p+q)*f(3,t)+p*PQx**2*q*f(4,t))))/(q*(p+q)**4)(8*(PAx+PBx)*(QCx+QDx)*(3+2*q*QCx*QDx)*(3*(p+q)**2*f(2,t)+4*p*PQx**2*q*(3*(p+q)*f(3,t)+p*PQx**2*q*f(4,t))))/(p*(p+q)**4)+(4*(3+p*(PAx**2+4*PAx*PBx+PBx**2))*(3+q*(QCx**2+4*QCx*QDx+QDx**2))*(3*(p+q)**2*f(2,t)+4*p*PQx**2*q*(3*(p+q)*f(3,t)+p*PQx**2*q*f(4,t))))/(p*q*(p+q)**4)+((3+2*q*(4*QCx*QDx+QDx**2+QCx**2*(1+2*q*QDx**2)))*(3*(p+q)**2*f(2,t)+4*p*PQx**2*q*(3*(p+q)*f(3,t)+p*PQx**2*q*f(4,t))))/(p**2*(p+q)**4)(4*p*(PAx+PBx)*(3+2*p*PAx*PBx)*PQx*(15*(p+q)**2
*f(3,t)+4*p*PQx**2*q*(5*(p+q)*f(4,t)+p*PQx**2*q*f(5,t))))/(q*(p+q)**5)+(8*(3+p*(PAx**2+4*PAx*PBx+PBx**2))*PQx*(QCx+QDx)*(15*(p+q)**2*f(3,t)+4*p*PQx**2*q*(5*(p+q)*f(4,t)+p*PQx**2*q*f(5,t))))/(p+q)**5+(4*PQx*q*(QCx+QDx)*(3+2*q*QCx*QDx)*(15*(p+q)**2*f(3,t)+4*p*PQx**2*q*(5*(p+q)*f(4,t)+p*PQx**2*q*f(5,t))))/(p*(p+q)**5)(8*(PAx+PBx)*PQx*(3+q*(QCx**2+4*QCx*QDx+QDx**2))*(15*(p+q)**2*f(3,t)+4*p*PQx**2*q*(5*(p+q)*f(4,t)+p*PQx**2*q*f(5,t))))/(p+q)**5+(8*(PAx+PBx)*(QCx+QDx)*(15*(p+q)**3*f(3,t)+30*p*PQx**2*q*(p+q)*(3*(p+q)*f(4,t)+2*p*PQx**2*q*f(5,t))8*p**3*PQx**6*q**3*f(6,t)))/(p+q)**6+(2*(3+p*(PAx**2+4*PAx*PBx+PBx**2))*(15*(p+q)**3*f(3,t)30*p*PQx**2*q*(p+q)*(3*(p+q)*f(4,t)+2*p*PQx**2*q*f(5,t))+8*p**3*PQx**6*q**3*f(6,t)))/(q*(p+q)**6)+(2*(3+q*(QCx**2+4*QCx*QDx+QDx**2))*(15*(p+q)**3*f(3,t)30*p*PQx**2*q*(p+q)*(3*(p+q)*f(4,t)+2*p*PQx**2*q*f(5,t))+8*p**3*PQx**6*q**3*f(6,t)))/(p*(p+q)**6)
I++
• Operation count: 787 MUL, 261 ADD, 69 FUNC
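To show how an operation tally like the one on this slide can be produced, here is a minimal sketch that counts operators in an expression with Python's `ast` module. The helper `count_ops` is hypothetical, and `FRAGMENT` is a shortened piece adapted from the first term of the expression, since the transcribed expression appears to have lost some operators and does not parse as written. The counting rules are assumptions of this sketch: an integer power `x**n` costs `n - 1` multiplications, subtraction counts as ADD, division is tallied separately, and each call counts as one FUNC. The slide's own convention is not stated, so this is not expected to reproduce its numbers.

```python
import ast
from collections import Counter

# Shortened fragment adapted from the first term of the slide's expression.
FRAGMENT = "(3+2*p*(4*PAx*PBx+PBx**2+PAx**2*(1+2*p*PBx**2)))*f(0,t)/(p**2*q**2)"

def count_ops(expr: str) -> Counter:
    """Tally MUL / ADD / DIV / FUNC operations in one arithmetic expression."""
    counts = Counter()
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.Call):
            counts["FUNC"] += 1
        elif isinstance(node, ast.BinOp):
            if isinstance(node.op, ast.Mult):
                counts["MUL"] += 1
            elif isinstance(node.op, (ast.Add, ast.Sub)):
                counts["ADD"] += 1
            elif isinstance(node.op, ast.Div):
                counts["DIV"] += 1
            elif isinstance(node.op, ast.Pow):
                exp = node.right
                if not (isinstance(exp, ast.Constant)
                        and isinstance(exp.value, int)
                        and exp.value >= 1):
                    raise ValueError("only positive integer exponents are handled")
                # Assumption: x**n is expanded into n - 1 multiplications.
                counts["MUL"] += exp.value - 1
    return counts

print(dict(count_ops(FRAGMENT)))
```

Even this one fragment needs over a dozen multiplications, which is why the full expression yields a data-flow graph large enough to keep many FPUs busy.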

  10. Koji’s Message (from Core to Data-Center) • Move to new fields! • Orchestrate computing resources effectively! • Efficient acceleration (Parallelization, Specialization) • Strike a good balance among many things (Concurrency Management)

  11. Thanks!!

  12. New Fields! • Revisit the system stack from circuit/architecture level to algorithm level on emerging devices!

  13. [Figure: system configuration; labels: 4.2 K, SFQ 0.5 µm process, FPU, ORN, SB, SMAC] • CMOS CPU (1 chip) • 2 TB memory module (FB-DIMM [DDR3@1333MHz, 128 GB] × 16 modules) • SFQ RDP (32 FPUs × 32 chips) (4 GFLOPS/FPU) • 1024 FPUs per MCM (34 chips) × 4 MCMs • Memory bandwidth per MCM: 256 GB/s (= 16 GB/s × 16 channels)
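The figures on this slide can be cross-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the constant names are made up for illustration, "× 4 MCM" is read as four MCMs in the system, and the peak numbers assume every FPU sustains its rated 4 GFLOPS at once, which the slide does not claim.

```python
# Values as stated on the slide.
FPUS_PER_CHIP = 32
RDP_CHIPS_PER_MCM = 32
GFLOPS_PER_FPU = 4
MCM_COUNT = 4                 # assumption: "x4 MCM" means four MCMs in total
CHANNELS_PER_MCM = 16
GB_PER_S_PER_CHANNEL = 16
DIMM_CAPACITY_GB = 128
DIMM_COUNT = 16

fpus_per_mcm = FPUS_PER_CHIP * RDP_CHIPS_PER_MCM           # FPUs on one MCM
peak_gflops_per_mcm = fpus_per_mcm * GFLOPS_PER_FPU        # ideal peak per MCM
peak_tflops_system = peak_gflops_per_mcm * MCM_COUNT / 1000
bandwidth_gb_s_per_mcm = CHANNELS_PER_MCM * GB_PER_S_PER_CHANNEL
memory_tb = DIMM_CAPACITY_GB * DIMM_COUNT / 1024
# Memory bytes available per floating-point operation at ideal peak.
bytes_per_flop = bandwidth_gb_s_per_mcm / peak_gflops_per_mcm

print(f"FPUs per MCM:          {fpus_per_mcm}")
print(f"Peak per MCM:          {peak_gflops_per_mcm} GFLOPS")
print(f"Peak for the system:   {peak_tflops_system} TFLOPS")
print(f"Bandwidth per MCM:     {bandwidth_gb_s_per_mcm} GB/s")
print(f"Memory capacity:       {memory_tb} TB")
print(f"Bytes per FLOP (peak): {bytes_per_flop}")
```

The FPU count, bandwidth, and capacity agree with the slide's 1024 FPUs, 256 GB/s, and 2 TB. The low bytes-per-FLOP ratio suggests why large DFGs matter: intermediate values have to flow between FPUs over the operand network rather than through memory.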
