26 September 2000 James C. Lyke

Cellular Automata Based Reconfigurable Systems as a Transitional Approach to Gigascale Electronic Architectures 26 September 2000 James C. Lyke

Outline • Introduction • Background • Description of reconfigurable cellular automata (RCA) arrays • Summary of current status

Introduction • Constraints common to all molecular systems • Limited interconnection fan-in/fan-out • No effective lithography approach capable of adequate throughput • Design intolerance to random defects

The smaller the devices, the bigger the problems • Building the devices: very hard • Building the architectures: not easy • How do you harness 10^12 simple devices? • Design capture, synthesis, verification? • How do you wire them together? • How do you assemble and package them? • How do you test finished devices? • How do you address yield issues? • How do you rectify design errors in “gigascale” designs?

Trends in architectures • Interconnection growth • Increased use of programmable logic devices in digital design • Field programmable gate array (FPGA) devices

Pad-limiting due to terminal count explosion

Factors contributing to explosion in interconnections with diminishing scale • Non-scalability of resistance (R ~ L/A) • Packing considerations force minimum average length of interconnections to increase • Dimensionality of design • Hierarchy of design

The challenge of interconnect • Rent’s rule establishes the growth of terminals (interface signals) as a function of gate count [1] • An empirical explanation: $T = A G^{p}$ • T: terminal count • A: terminals per sub-module • G: gate count • p: Rent’s exponent (0 < p < 1) • [1] Bakoglu, Interconnections and Packaging for VLSI, Addison-Wesley, Reading MA, 1990
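To give a feel for how strongly the exponent drives terminal count, the sketch below evaluates Rent's rule for a few values of p. The function name `rent_terminals` and the sample values (A = 4, G = 10^6) are assumptions chosen for illustration; they are not figures from the talk.

```python
def rent_terminals(gate_count: float, terminals_per_submodule: float,
                   rent_exponent: float) -> float:
    """Evaluate Rent's rule, T = A * G**p."""
    if not 0.0 <= rent_exponent <= 1.0:
        raise ValueError("Rent's exponent is expected to lie in [0, 1]")
    return terminals_per_submodule * gate_count ** rent_exponent


if __name__ == "__main__":
    A = 4.0   # assumed terminals per sub-module (illustrative value)
    G = 1e6   # assumed gate count (illustrative value)
    for p in (0.0, 0.5, 0.8, 1.0):
        print(f"p = {p:.1f}: T = {rent_terminals(G, A, p):,.0f}")
```

With these assumed values, moving p from 0.5 to 0.8 raises the terminal count by well over an order of magnitude, which is the pad-limiting effect the earlier slide refers to.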

Complex integrated circuits usually have [Figure labels: p=1, p=0, p=0.8, p=0.5]

Nanoscale Interconnect • To be manageable at large scales of complexity, the exponent of Rent’s rule must be consistent with dimensionality • Two-dimensional (planar) systems: p < 1/2 • Three-dimensional systems: p < 2/3 • Rent’s rule is a statistical observation and a guideline, and must be used with care • A complex design may have different Rent’s exponents at different hierarchical levels and regions of design
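The dimensional limits on p can be motivated by a boundary-counting argument. This is a sketch, assuming gates fill a d-dimensional region of linear size \ell at uniform density and that terminals can only leave through the boundary of that region; the symbols \ell, d, and T_max are introduced here for illustration.

```latex
G \propto \ell^{d}, \qquad
T_{\max} \propto \ell^{\,d-1} \propto G^{(d-1)/d}
\quad\Longrightarrow\quad
p \le \frac{d-1}{d} =
\begin{cases}
  1/2, & d = 2 \ \text{(planar)} \\
  2/3, & d = 3
\end{cases}
```

If the design demands terminals faster than the boundary can supply them, the region must grow beyond what the gates alone require, which is one way to read the "packing considerations" noted earlier.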

Field Programmable Gate Arrays (FPGAs)

Requirements for Complex Digital Design • Ability to form arbitrary arrangement of: • Logic • Memory • Interconnect • Field programmable gate arrays (FPGAs) emulate complex systems and allow these arrangements to be programmed

Structure of RAM-based FPGA [Figure: array of configurable logic blocks (CLBs) joined by unspecified interconnection]

Adding user memory to LUT [Figure: LUT with inputs a, b, c and output f, followed by a D flip-flop (D, Q); annotation: short either, but not both]

Routing in FPGA Devices [Figure labels: pass transistor, memory bit]

Simple example • Design problem: F = A AND B; G = C AND D

Typical FPGA (corner of XC3020) • Many details suppressed • Source: Xilinx datasheet

Binary Cellular Automata: A lattice of computing points • Lattice of uniformly spaced point sites in 1, 2, or 3 dimensions • Each point has a value of {0} or {1} • Value of each site updated at discrete time intervals • Updates are computed as a function of local neighborhood only
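The update scheme described on this slide can be sketched for the one-dimensional, neighborhood-3 case. The 8-bit rule encoding (bit index formed from left, center, right) follows the common Wolfram numbering, and the wrap-around edges are a simplification; both are assumptions of this sketch rather than details from the talk.

```python
def ca_step(cells: list[int], rule: int) -> list[int]:
    """Advance a 1-D binary CA by one time step (neighborhood 3, wrap-around edges)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right   # 0..7 selects one rule bit
        nxt.append((rule >> index) & 1)
    return nxt


def ca_history(cells: list[int], rule: int, steps: int) -> list[list[int]]:
    """Return the initial row plus `steps` successive generations."""
    rows = [list(cells)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    # Rule 90: each site becomes the XOR of its left and right neighbors.
    for row in ca_history([0, 0, 0, 1, 0, 0, 0], rule=90, steps=3):
        print("".join(str(v) for v in row))
```

Printing successive generations as rows lays the time axis out in space, so each printed row depends only on the row directly above it.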

Conversion of 1-D CA into a 2-D spatial computation structure

Cellular Automata for Molecular Electronics • Cell behavior normally fixed and homogeneous across entire array • Turing complete • Normally perfect (mathematical abstraction) • Not practical for molecular implementation due to defects • Recover from this by relaxing first assumption • Allow rules at any site to be chosen from a set that is “Boolean complete”

Redefine CA sites as look-up tables (LUTs) • A 3-input LUT (LUT-3) can implement all cellular automata rules of neighborhood 3 [Figure: LUT3 with inputs A, B, C]
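A 3-input LUT holds 8 configuration bits, one per input combination, so its 256 possible configurations coincide with the 256 neighborhood-3 rules. The sketch below models that, and lets each site carry its own LUT. The names `Lut3`, `config_for`, and `evaluate_row`, the bit ordering, and the wrap-around edges are all assumptions made for illustration.

```python
from itertools import product


class Lut3:
    """A 3-input look-up table configured by 8 bits (one per input combination)."""

    def __init__(self, config_bits: int):
        if not 0 <= config_bits < 256:
            raise ValueError("a LUT-3 has exactly 8 configuration bits")
        self.config_bits = config_bits

    def __call__(self, a: int, b: int, c: int) -> int:
        return (self.config_bits >> ((a << 2) | (b << 1) | c)) & 1


def config_for(func) -> int:
    """Build the 8-bit configuration that makes a LUT-3 compute func(a, b, c)."""
    bits = 0
    for a, b, c in product((0, 1), repeat=3):
        if func(a, b, c):
            bits |= 1 << ((a << 2) | (b << 1) | c)
    return bits


def evaluate_row(luts: list[Lut3], inputs: list[int]) -> list[int]:
    """Each site applies its own LUT to its left, own, and right input (edges wrap)."""
    if len(luts) != len(inputs):
        raise ValueError("one LUT per site is required")
    n = len(inputs)
    return [lut(inputs[(i - 1) % n], inputs[i], inputs[(i + 1) % n])
            for i, lut in enumerate(luts)]


if __name__ == "__main__":
    majority = Lut3(config_for(lambda a, b, c: a + b + c >= 2))
    parity = Lut3(config_for(lambda a, b, c: (a + b + c) % 2))
    print(evaluate_row([parity, majority, parity], [1, 1, 0]))
```

Because the rule is stored per site, the same array can host a different Boolean function at every cell.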

Boolean functions as CA rules

Reconfigurable Cellular Array (RCA)

Reconfigurable cellular automata: advantages for molecular architecture • Periodic structure • Amenable to chemical self-assembly • Reconfigurability: the logical behavior of each cell is independently and repeatedly programmable after fabrication • Low interconnection demand • Defect tolerance

Equivalence (functional isomorphisms) between CA and random logic forms [Figure: panels (a) to (e) showing 3LUT cells and equivalent logic forms with inputs A, B, C, D and output F]

Defect tolerance: before and after

How template choice affects implementation of 4-input majority gate on three different RCA templates [Figure labels: size: 21, size: 12, size: 20]

3LUT tile of single cell type [Figure: array of identical cells, each labeled A]

2-LUT system based on two cell types

Example of more complex architectures using RCA tiles • Equivalent representations [Figure labels: combinational logic, register array, signals x, Y(t), Y(x,h), h, h(t)]

Another RCA Architecture (Example) • Creates feedback path necessary for general clock-mode sequential behavior [Figure: clocked register files (bit arrays) alternating with m x n tiles of LUTs]

Detail of bit array between tiles

Configuration of RCA • LUT memories implemented as shift registers (2-phase clock) • Multiple configuration chains for large devices • Not a fault-tolerant method of bit-stream distribution
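The serial configuration scheme can be modeled as a single shift chain whose contents are then sliced into per-LUT memories. This is a minimal sketch: the function names, the 8-bits-per-LUT slicing, and the one-bit-per-clock model are assumptions, and the 2-phase clocking is not modeled.

```python
def shift_in(chain_bits: list[int], bitstream: list[int]) -> list[int]:
    """Shift a serial bitstream into a configuration chain, one bit per clock.

    Each new bit enters at position 0; the bit at the far end falls off.
    """
    state = list(chain_bits)
    for bit in bitstream:
        state = [bit] + state[:-1]
    return state


def lut_configs(state: list[int], bits_per_lut: int = 8) -> list[list[int]]:
    """Slice the chain contents into consecutive per-LUT configuration words."""
    return [state[i:i + bits_per_lut] for i in range(0, len(state), bits_per_lut)]


if __name__ == "__main__":
    chain = [0] * 16                              # two LUT-3 memories in one chain
    stream = [1, 1, 1, 0, 1, 0, 0, 0] + [0] * 8   # hypothetical bitstream
    print(lut_configs(shift_in(chain, stream)))
```

Note that the first bit shifted in ends up farthest from the chain input, and that a single broken cell would cut off everything downstream of it, which is the sense in which this distribution method is not fault-tolerant.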

A side effect: “cones of influence” • A computation result has a limited range of propagation • Dead zones • Inputs: cannot be reached • Outputs: no results can be used
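The limited propagation range can be made concrete with a small geometric sketch. It assumes a rectangular array in which each cell reads only its three nearest neighbors in the previous row and the edges do not wrap; the function names and this particular template are assumptions, since other RCA templates would give differently shaped cones.

```python
def cone_of_influence(width: int, depth: int, source_col: int) -> list[tuple[int, int]]:
    """For each row 0..depth, return the (lowest, highest) column an input at
    (row 0, source_col) can influence when signals spread one column per row."""
    return [(max(0, source_col - r), min(width - 1, source_col + r))
            for r in range(depth + 1)]


def unreachable_outputs(width: int, depth: int, source_col: int) -> list[int]:
    """Columns of the final row that lie outside the input's cone of influence."""
    lo, hi = cone_of_influence(width, depth, source_col)[-1]
    return [c for c in range(width) if c < lo or c > hi]


if __name__ == "__main__":
    print(cone_of_influence(width=8, depth=2, source_col=0))
    print(unreachable_outputs(width=8, depth=2, source_col=0))
```

Under these assumptions, an output that must combine two inputs has to sit where their cones overlap, which constrains placement in a way conventional FPGA routing does not.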

Routing heuristics for RCA structures may require simple modifications • (left) netlist example to be routed • (center) results (incorrect) from typical FPGA routing tool (error node in red) • (right) corrected results for RCA • Requires definition of a node resolution function (node in yellow)

Training neural nets to design circuits: Results of experimental neural-net based design tools, demonstrating combined “heuristics” (simultaneous technology mapping, placement, and routing)

An n-input look-up table is adequately modeled by a perceptron network with n neurons in its hidden layer * • based on analysis of Vapnik-Chervonenkis dimension • proved with brute force simulation for n = 3 case

3LUT tile and neural network model

How neural nets are trained to design circuits • Abstract a neural net model • Use truth table as the training set (= test set) • Build back-propagation system around tile to train (adjust weights of neurons) • Train / re-train with randomized version of training set until convergence occurs (if it occurs)
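The training loop outlined above can be illustrated at the scale of a single LUT: a perceptron network with n hidden neurons is trained by back-propagation on a truth table, reshuffling the training set each epoch. This is only a sketch of the general technique, not the experimental tool described in the talk; the function name, learning rate, epoch count, seed, and the choice of a 3-input majority function as the target are all assumptions.

```python
import math
import random
from itertools import product


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_lut_model(truth_table, n_inputs=3, epochs=2000, lr=0.5, seed=0):
    """Train an n-input, n-hidden-neuron, 1-output network on a truth table.

    Returns (forward, initial_mse, final_mse). Convergence is not guaranteed.
    """
    rng = random.Random(seed)
    n_hidden = n_inputs
    # Each neuron stores its input weights followed by one bias term.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    samples = list(truth_table.items())

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1]) for ws in w_h]
        y = sigmoid(sum(w * hi for w, hi in zip(w_o, h)) + w_o[-1])
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial = mse()
    for _ in range(epochs):
        rng.shuffle(samples)                      # randomized training order
        for x, t in samples:
            h, y = forward(x)
            d_out = (y - t) * y * (1 - y)         # output-layer error term
            d_hid = [d_out * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] -= lr * d_out * h[j]
            w_o[-1] -= lr * d_out
            for j in range(n_hidden):
                for i in range(n_inputs):
                    w_h[j][i] -= lr * d_hid[j] * x[i]
                w_h[j][-1] -= lr * d_hid[j]
    return forward, initial, mse()


if __name__ == "__main__":
    majority = {bits: int(sum(bits) >= 2) for bits in product((0, 1), repeat=3)}
    model, before, after = train_lut_model(majority)
    print(f"MSE before: {before:.4f}, after: {after:.4f}")
    for bits in majority:
        print(bits, round(model(bits)[1]))
```

As on the slide, the truth table serves as both training set and test set, so the final check is simply whether the rounded network output reproduces every row.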

Neural network circuit designer [Figure labels: tally, compare, offset]

NN-produced results for 2-bit multiplier • Designs are not optimal, but they work • Could be improved with post-processing to remove nonsense constructs

Comparison of Conventional FPGAs to Reconfigurable Cellular Arrays (RCAs) • Similarities • Both use LUTs • Both are software configured with serial bitstreams • Differences • RCAs have no programmable routing • RCAs support only nearest-neighbor connections • RCAs have a much simpler (periodic) structure

Benchmark rationale • The design of FPGA architectures and their ability to express architectures is empirical • Benchmark suites exist (e.g. MCNC, PREP) to permit comparison of FPGA architectures and algorithms • Comparative benchmark findings are the best currently known way of establishing a yardstick, given that most steps in CAD are NP-complete and optimality cannot be proved in general

Other issue #1 • Departure from regularity • Small world (0<p<1 fraction of connections dislocated from lattice) • Semi-structured (Most LUTs point in the right direction) • Amorphous / random structure • Challenge: find O(N) algorithms for “structure discovery” • Question: Can we establish statistical evidence that semi-structured / amorphous cellular networks are adequate as media for hosting complex designs?

Other issue #2 • Signal delivery from “outside world” • X-Y signal grid of 100 microns • Molecular (fractal?) distribution network may be required to combat signal starvation at nanoscale network level [Figure labels: 1 cm^2 chip, 100 um, ~840 molecular gates, signal terminal from “outside world”]

Summary • Reconfigurable cellular arrays are promising as a molecular-scale architecture • Interconnect, defect tolerance, self-assembly • Templates can be tuned to a specific molecular concept • Even as an abstract approach, some important loose ends need to be dealt with • Configuration bitstream • Hierarchical assembly specifics • Proof that the medium is competitive with a standard FPGA approach if it could be scaled to molecular levels
