Computing Without Processors Thesis Proposal
Computing Without Processors: Thesis Proposal • Mihai Budiu, July 30, 2001 • Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems) • This presentation uses TeXPoint by George Necula
Four Types of Research • Solve nonexistent problems • Solve past problems • Solve current problems • Solve future problems
The Law (source: Intel)
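The Crossover Phenomenon [Plot: technology vs. time]
Example Crossover [Plot: access speed (ns) vs. time for CPU and DRAM; regions labeled "no caches" and "caches"; marked values: 200, 1980]
Signal Propagation [Plot: die size and distance covered in 1 clock (mm) vs. time; marked value: 20; "now" marked]
Reliability & Yield [Plot: defects/chip vs. time; curves: occurring, tolerable; markers: new process, now]
Energy [Plot: power vs. time; curves: CPU consumption, thermal dissipation; marked value: 100 W; "now" marked]
Instruction-Level Parallelism (ILP) [Plot: instructions vs. time; curves: fetch, commit; "now" marked]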
Premises of this Research • We will have lots of gates • Moore’s law continues • Nanotechnology • Contemporary architectures do not scale
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work
ASH: Application-Specific Hardware [Diagram: HLL program, Compiler, Circuit, Reconfigurable hardware]
ASH: A Scalable Architecture (Thesis Statement) • Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture. • We can provide scalable compilers for translating high-level languages into hardware.
Example int f(void) { int i=0, j = 0; for (; i < 10; i++) j += i; return j; }
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work
ASH and Nanotechnology • Build reconfigurable hardware using nanotechnology • Low power: 10^10 gates use less than 2 W • Low cost: nanocents/gate • High density: 10^5x over CMOS [Figure: huge structures; a nano-RAM cell, with a CMOS RAM cell shown in yellow]
A Limit Study of Performance • A graph of the whole program execution [Figure legend: control-flow transfer, basic block, memory write, memory read, memory word]
Typical Program Graph (g721_e) [Figure labels: memcpy; memory reads; control-flow transfer; 100% code cluster; 100% memory cluster]
How Time Is Spent • No caches: reads expensive • No speculation
Lesson The spatial model of computation has different properties.
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Future work
CASH: Compiling for ASH • Program to circuits • Memory partitioning • Interconnection net
Compilation • 1. Program: int reverse(int x) { int k, r = 0; for (k = 0; k < 32; k++) { r = r << 1; r |= x & 1; x = x >> 1; } return r; } • 2. Split-phase Abstract Machines: computations & local storage; unknown-latency ops • 3. Configurations placed independently • 4. Placement on chip [Slide label: Reliability]
Split-phase Abstract Machines [Diagram: CFG divided into SAM 1, SAM 2, SAM 3; slide label: Power]
Hyperblock => SAM • Single-entry, multiple exit • May contain loops
SAM => FSM [Diagram: states Start, Loop, Exit, Exit; local memory; remote memory]
The SAM FSM [Diagram: computation block with args, results, start, exit; register; predicates (control); combinational logic]
Computation = Dataflow • Programs => circuits, e.g. x = a & 7; ... y = x >> 2; • Variables => wires + tokens • No token store; no token matching • Local communication only [Diagram: a and 7 feed &, producing x; x and 2 feed >>; label: Signals]
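The "variables => wires + tokens" mapping above can be sketched as a small software model in C. This is only an illustration under stated assumptions, not the CASH implementation: the `wire_t` type and the helpers `wire_const`, `wire_empty`, `op_and`, `op_shr`, and `circuit` are hypothetical names introduced here. Each wire carries a value plus a valid bit (the token), and an operator fires only when all of its input tokens are present, so no token store or matching is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a wire holds one value and a "valid" bit (the token). */
typedef struct {
    uint32_t value;
    bool     valid;
} wire_t;

/* Drive a constant or an input value onto a wire. */
static wire_t wire_const(uint32_t v) {
    wire_t w = { v, true };
    return w;
}

/* An empty wire: no token has arrived yet. */
static wire_t wire_empty(void) {
    wire_t w = { 0, false };
    return w;
}

/* AND operator: fires only when both input tokens are present. */
static wire_t op_and(wire_t a, wire_t b) {
    if (!a.valid || !b.valid) return wire_empty();
    return wire_const(a.value & b.value);
}

/* Right-shift operator, same firing rule. */
static wire_t op_shr(wire_t a, wire_t n) {
    if (!a.valid || !n.valid) return wire_empty();
    assert(n.value < 32);  /* shift amount must be below the word width */
    return wire_const(a.value >> n.value);
}

/* The slide's fragment: x = a & 7; y = x >> 2; */
static wire_t circuit(wire_t a) {
    wire_t x = op_and(a, wire_const(7));
    return op_shr(x, wire_const(2));
}
```

Calling `circuit(wire_const(29))` yields a valid token, while `circuit(wire_empty())` yields none, which mirrors the idea that results appear only once their inputs have arrived.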
Tokens & Synchronization • Tokens signal operation completion • Possible implementations: local, global, static [Diagram: three variants labeled Local, Global, Static; signals shown: data, valid, ack, reset]
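One way to picture token synchronization with `valid` and `ack` signals is a one-place channel between a producer and a consumer. The sketch below is a software analogy only; it assumes that the "local" variant on the slide is a valid/ack handshake, which the transcript does not confirm, and `channel_t`, `channel_send`, and `channel_recv` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-place channel between a producer and a consumer. */
typedef struct {
    int  data;
    bool valid;  /* raised by the producer when data is meaningful */
} channel_t;

/* Producer side: succeeds only if the previous token was acknowledged. */
static bool channel_send(channel_t *ch, int data) {
    assert(ch != NULL);
    if (ch->valid) return false;   /* consumer has not acked yet */
    ch->data  = data;
    ch->valid = true;
    return true;
}

/* Consumer side: taking the token acts as the ack and frees the channel. */
static bool channel_recv(channel_t *ch, int *out) {
    assert(ch != NULL && out != NULL);
    if (!ch->valid) return false;  /* no token present */
    *out      = ch->data;
    ch->valid = false;             /* ack: producer may send again */
    return true;
}
```

The producer cannot overwrite a token that has not been consumed, so completion of one operation is signaled to its neighbor without any global clocking assumption in the model.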
ILP and Eager Muxes • Example: if (x > 0) y = -x; else y = b*x; • Static single assignment implemented in hardware [Diagram: inputs b, x, 0; operators -, >, *, !; output y; one path marked "slow"; regions labeled Computation, Predicates, Speculation]
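The slide's example can be read as computing both branch values speculatively and letting the predicate pick one. A minimal C sketch of that reading follows; `eager_mux` and `compute_y` are hypothetical names, and the model captures only the selection logic, not timing (in hardware, an eager mux would presumably forward the selected input without waiting for the slower multiplier).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Software model of a mux: the predicate selects one of two
   already-computed candidate values. */
static int eager_mux(bool pred, int if_true, int if_false) {
    return pred ? if_true : if_false;
}

/* Slide fragment: if (x > 0) y = -x; else y = b*x; */
static int compute_y(int x, int b) {
    assert(x > INT_MIN);        /* -x would overflow for INT_MIN */
    bool p    = x > 0;          /* predicate */
    int  neg  = -x;             /* speculated "then" value */
    int  prod = b * x;          /* speculated "else" value (assumes no overflow) */
    return eager_mux(p, neg, prod);
}
```

Because neither branch has side effects, evaluating both is safe; the predicate matters only at the point where the two definitions of `y` merge.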
Predicates • Guard side effects: memory access (e.g. *q = 2;), procedure calls • Control looping • Decide exit branch • Select variable definition (x=... x=... then ...=x)
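Two of the predicate roles listed above, guarding a side effect and selecting a variable definition, can be sketched in C. This is an illustrative model with hypothetical helper names (`guarded_store`, `select_def`), not code from CASH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicated store: the memory write happens only when the guard is true. */
static void guarded_store(bool pred, int *q, int value) {
    assert(q != NULL);
    if (pred) *q = value;
}

/* Select between two definitions of x, using the predicate of the
   path that produced the first definition. */
static int select_def(bool pred_first, int x_first, int x_second) {
    return pred_first ? x_first : x_second;
}
```

For instance, `guarded_store(p, q, 2)` corresponds to the slide's `*q = 2;` executed under predicate `p`: the value 2 may be available early, but memory changes only if `p` holds.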
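Computing Predicates • Correct for irreducible graphs • Correct even when speculatively computed • Can be eagerly computed [Diagram labels: s, t, b]
Loops + Dataflow = Pipelining • Example: for (i=0; i < 10; i++) a[i] += i; [Diagram: i initialized to 0 and incremented by 1; address computed from &a[0]; load, +, store operators; elements a[0], a[1], a[2], a[3] in flight]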
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work
Evolutionary Path • The problem with ASH: resources [Diagram labels: Microprocessors, ASH]
CPU+ASH [Diagram: CPU (support computation + OS + VM); ASH (core computation); Memory]
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work
Scalable Performance [Plot: performance vs. time; curves: ASH, CPU; "now" marked]
Summary • Contemporary CPU architecture faces lots of problems • Application-Specific Hardware (ASH) provides a scalable technology • Compiling HLL into hardware dataflow machines is an effective solution
Timeline • CASH core • Explore architectural/compiler trade-offs • Hw/sw partitioning (ASH + CPU) • Loop parallelization • Memory partitioning • Cost models • ASH simulation • Write thesis [Chart: time axis marked 06/01, 09/01, 12/01, 04/02, 06/02, 09/02, 12/02; "now" marked]
Extras • Related work • Reconfigurable hardware • Other cross-over phenomena • A CPU + ASH study • More about predicates
Related Work • Hardware synthesis from HLL • Reconfigurable hardware • Predicated execution • Dataflow machines • Speculative execution • Predicated SSA
Reconfigurable Hardware [Diagram: interconnection network; universal gates and/or storage elements; programmable switches]
Main RH Ingredient: RAM Cell • Universal gate = RAM • Switch controlled by a 1-bit RAM cell [Diagram: a RAM with contents 0 0 0 1 used as a gate; labels: a0, a1, a2, &, data, data in, control]