Enumera CPU: Low-Cost Supercomputing Solutions
History of the CPU Core
• 1968 Chuck Moore invents the FORTH language
• 1980 Developed by Chuck Moore at Forth Inc.
• 1983 Broke from Forth Inc. to become Novix
• 1985 Harris buys Novix; the chip becomes the Harris RTX family
• 1990 Started development of the Sh-Boom CPU
• 1995 Patriot Scientific acquires it
• 1994 MPU 21 developed for Offete Enterprises
• 1993 to 1998 F21
• 1996 to 1998 ITV I21 CPU
• 2000 Chuck Moore joins the Enumera project
Patents Held by Chuck Moore
Harris
• US05070451 12/03/1991 Forth specific language microprocessor
• US05319757 06/07/1994 FORTH specific language microprocessor
Nanotronics
• US05440749 08/08/1995 High performance, low cost microprocessor architecture
• US05530890 06/25/1996 High performance, low cost microprocessor
• US05604915 02/18/1997 Data processing system having load dependent bus timing
Patriot Scientific
• US05659703 08/19/1997 Microprocessor system with hierarchical stack and method of operation
• US05784584 07/21/1998 High performance microprocessor using instructions that operate within instruction groups
• US05809336 09/15/1998 High performance microprocessor having variable speed system clock
Stack Computers
These chips are stack-based processors. There are no registers in the “normal sense”; they use a stack instead. They are often referred to as FORTH chips, but the language they use is very different from FORTH. The CPUs will support more common languages such as C. The book whose cover is shown at left is by Philip Koopman.
Advantages of Stack Architecture
• Context switching for interrupts takes essentially zero time.
• The ALU and operands do not need to be pipelined, because the operands are immediately available at the top of the stack.
• Code is smaller than CISC programs by a factor of 2.5 to 8.
Comments from Phil Koopman’s book:
• Size and weight
• Power and cooling
• Operating environment
• Cost
• High-performance parallel processing
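The point about operands being immediately available at the top of the stack can be illustrated with a toy evaluator. This is a minimal Python sketch of a generic stack machine, not the F21 instruction set; the opcode names (`push`, `dup`, `add`, `mul`) and the `run` function are hypothetical and chosen only for illustration.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a data stack.

    Hypothetical opcodes for illustration only; this is not the F21
    instruction set.
    """
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: " + op)
    return stack


# (2 + 3) * 4: no instruction names a source or destination register.
program = [
    ("push", 2),
    ("push", 3),
    ("add", None),
    ("push", 4),
    ("mul", None),
]
result = run(program)
print(result)
```

Because `add` and `mul` carry no operand fields, each instruction can be encoded very compactly, which is one plausible reading of the code-size advantage claimed above.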
Quick Look at the F21
1996 versions of Chuck Moore’s designs:
• Process: 0.8 um (current state of the art is 0.18 um)
• Volts: 5.0
• MIPS: 500 (asynchronous design, no crystal, voltage dependent)
• mA: 20
• mW: 100
The F21 comes with:
• A network co-processor for parallel processing
• A video co-processor (NTSC out in the I21 design)
• A serial co-processor
• An analog processor (6-bit, 40 MHz sample rate)
Other points:
• 1 instruction per clock cycle
• A total of 16,000 transistors!
• Very predictable behavior
• Full code simulator for Windows
• Designed with OKAD (pronounced “Oh CAD”), the VLSI design environment
A Closer Look at OKAD
• Custom software written by Chuck
• Will allow for full electrical simulation
• Will allow for macro and micro thermal simulation (this allows for better optimization of design performance)
Enumera’s Background
John Sokol started Enumera to expand research into parallel computing. This began as an extension of work using clusters for video compression and high-performance web and video serving. The first project was 100 Pentium CPUs in a 6" rack, using Ethernet multicast boot on diskless motherboards. Designs for over 300 CPUs in a 6" (single depth) rack were drafted but dropped when we found the Chuck Moore CPUs.
My Background in Parallel Computing
• 1980 Experimented with 8x 6809 CPUs with shared memory (wire wrapped); developed a RAM memory bank passing system.
• 1984 Built a cluster of 20+ Commodore 64s.
• 1988 to 1992 Had the opportunity to work on a Cray Y-MP, then ported finite element code for Maxwell's equations to an SGI 4-CPU SMP box.
• 1993 At Sun Microsystems, used 384 globally distributed servers to deliver live and recorded video to 13,000 viewers.
• 1994 At Wells Fargo, worked on an architecture to use 600 HP-UX boxes located in the bank branches to reduce load on the mainframes.
• 1995 to present Have been using clusters of loosely coupled servers for video compression, serving, neural networks and genetic programming.
Current Version of Chip Being Developed
These numbers are proprietary.

process   V     MIPS   mA   mW
0.8 um    5.0    500   20   100   F21 (tested)
0.35 um   3.0   1100   15    45
0.35 um   3.3   1200   16    54
0.18 um   1.5   2000   12    18
0.18 um   1.8   2400   12    21

Numbers are guesstimates pending simulation. Speed can be throttled by a factor of 1024, reducing power to uW.
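The figures above can be cross-checked with simple arithmetic. The Python sketch below recomputes power as volts times milliamps for each listed row and estimates the throttled power. It assumes, and this is an assumption not stated in the slides, that power falls linearly with clock rate when the speed is throttled; the function names are made up for this example.

```python
# (process_um, volts, mips, milliamps, milliwatts) as listed in the table
ROWS = [
    (0.8, 5.0, 500, 20, 100),
    (0.35, 3.0, 1100, 15, 45),
    (0.35, 3.3, 1200, 16, 54),
    (0.18, 1.5, 2000, 12, 18),
    (0.18, 1.8, 2400, 12, 21),
]


def power_mw(volts, milliamps):
    """Power in milliwatts from supply voltage and current draw."""
    return volts * milliamps


def mips_per_mw(mips, milliwatts):
    """Throughput per unit of power."""
    return mips / milliwatts


def throttled_power_uw(milliwatts, factor=1024):
    """Power in microwatts after throttling.

    ASSUMPTION: power scales linearly with clock rate; the slides only
    say that throttling by 1024 brings power down to the uW range.
    """
    return milliwatts * 1000.0 / factor


for process, volts, mips, ma, mw in ROWS:
    print(
        f"{process} um: V*I = {power_mw(volts, ma):.1f} mW "
        f"(listed {mw} mW), {mips_per_mw(mips, mw):.1f} MIPS/mW, "
        f"throttled ~{throttled_power_uw(mw):.1f} uW"
    )
```

Under that linear assumption, the listed 18 mW and 21 mW parts land in the tens of microwatts when throttled, which is consistent with the "reducing power to uW" remark.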
Two major directions in which this can be used:
1.) As low-cost, low-power devices, using a single core and co-processors.
2.) As an ultra-high-performance cluster on a chip, still low in power and cost.
As a stand-alone processor, the F21 already uses less power than the DragonBall, ARM processors, or MIPS cores such as the NEC VR4181. The F21 is also much lower in cost, and runs over 100x faster than a typical Motorola DragonBall (2.7 MIPS). Enumera plans to release a version of this without the video co-processor and with some internal DRAM as a single-chip MP3 player. Co-processors dedicated to audio or I/O processing can easily be added.
As a single-chip computing cluster (multiple cores on one chip), there are many advantages. The cores are asynchronous, so this will average out the draw on the power supply. Each core will have 4 to 6 bi-directional serial interfaces, each running at 1+ gigabit per second, to its neighboring CPU cores. Each Gbps serial link will be addressable as a simple register read and write. A patent is being filed for a cooling and power delivery system to allow for enormous clusters and RAM on very large chips. In theory it would be possible to build a single device running at 10 TeraFlops!
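To make the neighbor-link idea concrete, here is a minimal Python sketch that models the cores as a rectangular grid in which each core links to its north, south, west and east neighbors. The grid topology is an assumption: the slides say only that each core has 4 to 6 links to neighboring cores, and this sketch models the 4-link case. The function names and the 1 Gbps default are illustrative.

```python
def neighbors(row, col, rows, cols):
    """Return the in-bounds north, south, west and east neighbors of a core.

    ASSUMPTION: a 4-neighbor rectangular mesh; the actual topology is
    not given in the slides.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def link_count(rows, cols):
    """Count bi-directional links; each link is shared by two cores."""
    link_ends = sum(
        len(neighbors(r, c, rows, cols))
        for r in range(rows)
        for c in range(cols)
    )
    return link_ends // 2


def aggregate_gbps(rows, cols, gbps_per_link=1.0):
    """Total link bandwidth per direction, at an assumed rate per link."""
    return link_count(rows, cols) * gbps_per_link


print(neighbors(0, 0, 4, 4))   # a corner core has only two neighbors
print(link_count(4, 4), "links in a 4x4 mesh")
```

Edge and corner cores use fewer links than interior cores in this model, so some of the 4 to 6 interfaces would remain free there, for example for off-chip connections.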
The minimum size MOSIS (a fab) will charge for is 7 cm2. On a chip of this size, a 7x7 array (49 CPUs) with RAM could be built. Co-processors could also be added. Each CPU would operate at 2400 MIPS, so 49 CPUs would deliver a total of about 117 billion operations per second. The power consumption would be about 1 watt (1.8 volts at 500 mA). With this level of computing power, new applications that were unthinkable before now become possible.
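The totals for the 7x7 array follow directly from the per-core figures. The short Python sketch below redoes that arithmetic; the per-core value of 21 mW is taken from the 0.18 um, 1.8 V guesstimate given earlier, and the function name is made up for this example.

```python
def array_totals(rows, cols, mips_per_core, mw_per_core):
    """Return (core count, total MIPS, total watts) for a rows x cols array."""
    cores = rows * cols
    total_mips = cores * mips_per_core
    total_watts = cores * mw_per_core / 1000.0
    return cores, total_mips, total_watts


# 7x7 array of 0.18 um cores at 1.8 V (2400 MIPS and 21 mW per core)
cores, total_mips, total_watts = array_totals(7, 7, 2400, 21)
print(f"{cores} cores, {total_mips / 1000:.1f} billion operations per second, "
      f"{total_watts:.2f} W")
```

This ignores any power drawn by on-chip RAM, co-processors and the serial links, so it is a lower bound on the real figure rather than a prediction.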
The 0.18 um core is expected to be able to do A/D conversion at 8+ bits at over 1 GHz! A new tracking A/D design is also expected to increase performance further. An ultra-high-speed PWM output design will reduce the size and complexity of power drivers for motors and actuators.
• Disposable computing (MP3 players and PDAs, as calculators and watches are today).
• Portable real-time ray-traced images.
• Software radio decoders / receivers, phased arrays.
• More advanced error correction and trellis coding schemes.
• More complex servo algorithms.
• Video and audio codecs in low-cost appliances.
• Encryption.
• Real-time embedded systems. Different tasks could be assigned to several processors.
Applications for the Enumera Processor Design in Hard Drives
• Improved servos and more advanced coding systems than Viterbi (increasing storage capacity and reducing cost).
• Reduction of component count; possibly eliminating external RAM.
• Addition of video and audio compression.
• Addition of TCP/IP servers on the drive itself.
• Addition of RAID on the drive.