Digital Systems Design 2

Programmable Logic and Storage Devices • Chapter 8: “Advanced Digital Design with the Verilog HDL”, Michael D. Ciletti. • Memory, CPLDs and FPGAs: Chapter 10, “Digital Design Principles and Practices”, John F. Wakerly, Prentice Hall, 2001, Third Edition.

Programmable Logic and Storage Devices • Advances in hardware technology have increased the density, complexity, and size of field-programmable gate arrays (FPGAs), making them an attractive and cost-efficient alternative to semi-custom application-specific integrated circuits (ASICs). • The opportunity to realize large circuits in FPGAs has created pressure to change the method by which circuits are designed for FPGA-based applications: • Schematic entry tools can be productive and efficient when designs are small. • The trend is toward larger and larger designs targeted for FPGAs. • Thus, a language-based design methodology has become essential to FPGA-based design flows.

Programmable Logic and Storage Devices • Technologies available for implementing digital circuits range from: • Standard integrated circuits (ICs), used in low-density/low-performance applications, • To cell-based and full-custom ICs for high-density/high-performance circuits. • Standard ICs: • Can be manufactured cheaply, • Implement very limited, basic functionality at low levels of integration. • Customized ICs: • Implement specialized functionality with a high level of integration, • Have a small market, • Create inventory risk, because the quantities that can be sold may not warrant the expense of their development and production. • Programmable Logic Devices: • Lie between the two extremes of density and performance that characterize standard parts and full-custom circuits. • Were born out of the necessity created by two conflicting realities: • Large, dense, high-performance circuits cannot be built practically or economically from discrete devices. • Dedicated ICs cannot be produced economically to satisfy a diversity of low-volume applications.

Programmable Logic and Storage Devices • PLDs: • Read-Only Memories (ROMs) • Programmable Logic Arrays (PLAs) • Programmable Array Logic (PAL) devices • Complex Programmable Logic Devices (CPLDs) • Field-Programmable Gate Arrays (FPGAs), and • Mask-Programmable Gate Arrays (MPGAs).

Programmable Logic Devices • For the most up-to-date PLDs see: www.e-insite.net/ednmag • PLDs have: • A fixed architecture • Functionality that is programmed for a specific application • Programming is done by: • The manufacturer: mask-programmable logic devices (MPLDs) • The end user: field-programmable logic devices (FPLDs). • Three basic characteristics distinguish PLDs from each other: • An architecture of identical basic functional units • A programmable interconnection fabric, and • A programming technology.

Programmable Logic Devices • The first type of PLD considered has the AND-OR plane structure shown in the figure. • This type of architecture is used to implement ROMs, PLAs, and PALs. • It implements Boolean expressions in sum-of-products (SOP) form: • The AND plane forms product terms selectively from the inputs, and • The OR plane forms outputs from sums of selected product terms. • A programmable interconnect fabric joins the two planes, so that the outputs implement sum-of-products expressions of the inputs. • Whether and how a plane can be programmed determines the particular type of PLD that is implemented by the overall structure. [Figure: AND-OR plane structure of a programmable logic device. Inputs feed the AND plane, which forms the product terms; the OR plane combines them to drive the outputs.]
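The AND-OR structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real device: the function names, the data layout, and the two example functions F0 and F1 are all assumptions made for the sketch.

```python
def and_plane(inputs, product_terms):
    """Form product terms. Each term maps an input index to the value
    (0 or 1) that input must have, i.e. true or complemented literal."""
    return [all(inputs[i] == v for i, v in term.items()) for term in product_terms]


def or_plane(term_values, output_connections):
    """Form each output as the OR (sum) of its selected product terms."""
    return [any(term_values[t] for t in terms) for terms in output_connections]


# Hypothetical programming of the two planes for inputs (A, B, C):
#   term 0 = A·B',  term 1 = C,  term 2 = A'·C
#   F0 = term 0 + term 1,  F1 = term 0 + term 2
PRODUCT_TERMS = [{0: 1, 1: 0}, {2: 1}, {0: 0, 2: 1}]
OUTPUTS = [[0, 1], [0, 2]]


def evaluate(a, b, c):
    """Evaluate both sum-of-products outputs for one input combination."""
    terms = and_plane((a, b, c), PRODUCT_TERMS)
    return tuple(int(x) for x in or_plane(terms, OUTPUTS))
```

Changing `PRODUCT_TERMS` corresponds to programming the AND plane and changing `OUTPUTS` to programming the OR plane. Which of the two is fixed and which is programmable is what separates the device types discussed in these slides.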

Storage Devices • The architecture used to implement PLDs lends itself to the implementation of storage devices. • Storage devices can be: • Read-only, or • Random-access, depending on whether the contents of a memory cell can be written during normal operation of the device. • ROM (read-only memory) is a device programmed to hold certain contents, which remain unchanged during operation and after power is removed from the device. • RAM (random-access memory), in contrast, has contents that can be changed during operation and that vanish when power is removed.

Read-Only Memory (ROM) • A 2^n x b ROM consists of an addressable array of semiconductor memory cells organized as 2^n words of b bits each. • ROM interface: • n inputs, the address lines. • b outputs, called bit lines. • ROM is non-volatile memory: its contents are preserved even when no power is applied.

Read-Only Memory (ROM) • AND-OR planes for a ROM: [Figure: address inputs A(0) ... A(n-1) drive a nonprogrammable address decoder (the AND plane), which forms the 2^n minterms (word lines). The OR plane is the 2^n x b memory array, which drives the b outputs (bit lines) D(0) ... D(b-1).]

Using ROM for “Random” Combinational Functions • A ROM can implement any combinational function: for each input bit pattern (the address) it stores the corresponding output bit pattern. • Example: Truth table for a 3-input, 4-output combinational logic function.

Equivalent 2-to-4 decoder with output-polarity control
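As a rough sketch of how an 8 x 4 ROM could realize such a decoder, the Python below fills the ROM from a behavioral description and then reads it by address. The pin assignment is an assumption, since the truth table itself is not reproduced here: A2 is taken as the polarity input (0 gives active-low outputs, 1 gives active-high) and A1, A0 select the output.

```python
def decoder_word(address):
    """Return the 4-bit word D3..D0 stored at a 3-bit address A2 A1 A0."""
    polarity = (address >> 2) & 1   # assumed: A2 controls output polarity
    select = address & 0b11         # assumed: A1 A0 select one of four outputs
    one_hot = 1 << select
    return one_hot if polarity else (~one_hot & 0xF)


# The ROM is just the function tabulated over all 8 addresses.
ROM = [decoder_word(a) for a in range(8)]


def read_rom(a2, a1, a0):
    """Look up the stored word, as the hardware would on each access."""
    return ROM[(a2 << 2) | (a1 << 1) | a0]
```

Once the table is filled in, the "logic" is only a lookup, so any other 3-input, 4-output function would cost exactly the same hardware.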

Using ROM for “Random” Combinational Functions • Another function that can be built with a ROM is unsigned binary multiplication. • A typical realization of a 4x4 multiplier requires too many product terms (36) to be built in one pass through a conventional PLD’s AND-OR array. • With a ROM, the function can be realized in one pass through a 2^8 x 8 (256 x 8) ROM. • The contents of a ROM are normally specified by a file that contains one entry for every address in the ROM. • The nice thing about ROM-based design is that one can usually write a simple program in a high-level language to calculate what should be stored in the ROM.
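A simple program of the kind mentioned above might look like the following Python sketch. It assumes the upper four address bits carry one operand and the lower four the other; the listing format is also an assumption, since real ROM programmers accept a variety of file formats.

```python
def build_multiplier_rom():
    """Return the 256 bytes of a 4x4 unsigned multiplier ROM."""
    rom = []
    for address in range(256):
        x = address >> 4      # assumed: high nibble is the multiplicand
        y = address & 0xF     # assumed: low nibble is the multiplier
        rom.append(x * y)     # product is at most 15 * 15 = 225, fits in 8 bits
    return rom


def hex_listing(rom, per_line=16):
    """Format the contents as text, one line per 16 addresses."""
    lines = []
    for base in range(0, len(rom), per_line):
        words = " ".join(f"{w:02X}" for w in rom[base:base + per_line])
        lines.append(f"{base:02X}: {words}")
    return lines
```

For example, `build_multiplier_rom()[0xFF]` is 225, the product of 15 and 15.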

Two-dimensional decoding • Suppose that one wants to build a 128 x 1 ROM. • The straightforward solution requires a 7-to-128 decoder: • 128 7-input NAND gates, • 14 buffers and inverters with a fanout of 64 each. • ROMs with 1M bits or more are available commercially, and they do not use a linear decoder structure, which would require a 20-to-1,048,576 decoder. • The structure used instead is called two-dimensional decoding. It reduces the decoder size to something on the order of the square root of the number of addresses. • The basic idea in two-dimensional decoding is to arrange the ROM cells in an array that is as close as possible to square. • The next illustration depicts a possible internal structure for a 128 x 1 ROM.

Two-dimensional decoding

Two-dimensional decoding • As can be seen, two-dimensional decoding allows a 128 x 1 ROM to be built with a 3-to-8 decoder and a 16-input multiplexer (whose complexity is comparable to that of a 4-to-16 decoder). • A 1M x 1 ROM could be built with a 10-to-1024 decoder and a 1024-input multiplexer, which is a lot simpler than the one-dimensional alternative. • Besides reducing decoding complexity, two-dimensional decoding has one other benefit: it leads to a chip whose physical dimensions are close to square, which is important for chip fabrication and packaging. • In ROMs with multiple data outputs, the storage arrays corresponding to each data output may be made narrower in order to achieve an overall chip layout that is closer to square. For example, the next figure shows a possible layout of a 32K x 8 ROM.
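The address split behind two-dimensional decoding can be sketched as follows. Which address bits go to the row decoder and which to the column multiplexer is an assumption here (upper three bits to the rows, lower four to the columns); `decoder_sizes` is a hypothetical helper that compares the linear and two-dimensional structures.

```python
ROWS, COLS = 8, 16  # 128 x 1 ROM arranged as an 8 x 16 cell array


def split_address(address):
    """Split a 7-bit address into (row, column) selects."""
    row = address >> 4    # assumed: upper 3 bits drive the 3-to-8 row decoder
    col = address & 0xF   # assumed: lower 4 bits drive the 16-input multiplexer
    return row, col


def read_bit(array, address):
    """Read one bit: the decoder enables a row, the multiplexer picks a column."""
    row, col = split_address(address)
    return array[row][col]


def decoder_sizes(address_bits):
    """Return (linear decoder outputs, row decoder outputs, multiplexer inputs)."""
    row_bits = address_bits // 2
    col_bits = address_bits - row_bits
    return 2 ** address_bits, 2 ** row_bits, 2 ** col_bits
```

With these definitions, `decoder_sizes(7)` gives `(128, 8, 16)` and `decoder_sizes(20)` gives `(1048576, 1024, 1024)`, matching the two cases discussed above.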

Possible layout of a 32K x 8 ROM

Commercial ROM Types • A modern ROM is fabricated as a single IC chip; one that stores 4M bits sells for under $5. • Various methods are used to “program” the information stored in a ROM: • Mask-programmable ROMs. • The manufacturer has to be provided with the ROM contents in order to create one or more customized masks to manufacture ROMs with the required pattern. • ROM manufacturers impose a mask charge of several thousand dollars for the customized aspects of mask-ROM production. Because of mask charges and the four-week delay typically required to obtain programmed chips, mask ROMs are normally used today only in very high-volume applications. • For low-volume applications there are more cost-effective choices, discussed next.

Commercial ROM Types • Programmable read-only memory (PROM) • Similar to a mask ROM, except that the customer may store data values (program the PROM) in just a few minutes. • A PROM is manufactured with all of its diodes or transistors “connected”. This corresponds to having all bits at a particular value (typically “1”). A PROM programmer can then be used to set desired bits to the opposite value. • In bipolar PROMs this is done by vaporizing tiny fusible links inside the PROM corresponding to each bit. • A link is vaporized by selecting it using the PROM’s address and data lines, and then applying a high-voltage pulse (10-30 V) to the device through a special input pin. • Early reliability problems with vaporized links were solved, and reliable fusible-link technology is nowadays used not only in bipolar PROMs but also in bipolar PLD circuits.

Commercial ROM Types • Erasable programmable read-only memory (EPROM): • An EPROM is programmable just like a PROM. • In addition, it can be “erased” to the all-1s state by exposing it to ultraviolet light. • EPROM uses a different technology called “floating-gate MOS”. • EPROM manufacturers guarantee that a properly programmed bit will retain 70% of its charge for at least 10 years, even if the part is stored at 125 °C. • The insulating material surrounding the floating gate becomes slightly conductive when exposed to ultraviolet light of a certain wavelength, which allows the EPROM contents to be erased. • The most common application of EPROMs is to store programs in microprocessor systems. • EPROMs are typically used during development. ROMs and PROMs are used once the program is finalized, because they usually cost less than EPROMs of similar capacity.

Commercial ROM Types • Electrically Erasable Programmable Read-Only Memory (EEPROM). • It is like an EPROM, except that individual stored bits may be erased electrically. • Floating gates in an EEPROM are surrounded by a much thinner insulating layer and can be erased by applying a voltage of the opposite polarity to the charging voltage to the non-floating gate. • Large EEPROMs (1M bit and larger) allow erasing only in fixed-size blocks, typically 128-512 Kbits (16-64 Kbytes) at a time. These memories are typically called flash EPROMs or flash memories. • An EEPROM can be reprogrammed only a limited number of times, because the insulating layer wears out.

Logic Symbols for standard EPROMs in 28-pin dual in-line packages.

ROM Applications • In addition to the most common application of ROMs, program storage in microprocessor systems, there are many other applications in which a ROM provides a low-cost realization of a complex or “random” combinational logic function. • Example of voice signals: • When an analog voice signal enters a typical telephone system, it is sampled 8,000 times per second and converted into a sequence of 8-bit bytes representing the analog signal at each sampling point. • This example will show how ROM-based circuits can easily deal with this highly encoded information.

Coding Voice Samples • The simplest 8-bit encoding of the sign and amplitude of an analog signal would be an 8-bit integer in the two’s-complement or signed-magnitude system. • 8-bit linear encoding yields only 2^8 = 256 different values. • This corresponds to a dynamic range in signal power of 20*log10(256) ≈ 48 dB. • By comparison, compact audio discs use a 16-bit linear encoding with a theoretical dynamic range of 20*log10(2^16) ≈ 96 dB.
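The dynamic-range figures above are easy to check numerically. The helper name below is an assumption made for this sketch; the formula is the 20·log10 expression used on the slide.

```python
import math


def dynamic_range_db(levels):
    """Dynamic range in dB for a linear code with `levels` distinct values."""
    return 20 * math.log10(levels)


range_8_bit = dynamic_range_db(2 ** 8)     # about 48.2 dB
range_16_bit = dynamic_range_db(2 ** 16)   # about 96.3 dB
```

Each additional bit of linear encoding adds roughly 6 dB, which is why doubling the word length from 8 to 16 bits doubles the figure.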

Coding of Voice Samples • The North American telephone network uses an 8-bit companded encoding called μ-law PCM (pulse code modulation). • The next figure shows the format of an 8-bit coded byte: a sort of floating-point representation containing sign (S), exponent (E) and mantissa (M) fields. • The analog value V represented by a byte in this format is given by the formula: • V = (1 - 2S) * [2^E * (2M + 33) - 33] • An analog signal represented in this format can range from -8159*k to +8159*k, where k is an arbitrary scale factor. • The range of the signals is 2*8159 and the smallest difference that can be represented is only 2 (when E = 0), so the dynamic range is 20*log10(8159) ≈ 78 dB. [Figure: byte format. Bit 7 is the sign S, bits 6-4 are the exponent E, and bits 3-0 are the mantissa M.]
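The decoding formula above translates directly into code. The sketch below implements only that formula with the field layout from the figure (S in bit 7, E in bits 6-4, M in bits 3-0); the function name is an assumption, and this is not meant as a standards-compliant telephone codec.

```python
def mu_law_decode(byte):
    """Return the linear value V for one 8-bit S/E/M coded byte."""
    s = (byte >> 7) & 1       # sign: 0 is positive, 1 is negative
    e = (byte >> 4) & 0b111   # 3-bit exponent
    m = byte & 0b1111         # 4-bit mantissa
    return (1 - 2 * s) * ((2 ** e) * (2 * m + 33) - 33)


# With E = 0 adjacent codes differ by 2; with E = 7 they differ by 256.
step_small = mu_law_decode(0x01) - mu_law_decode(0x00)
step_large = mu_law_decode(0x7F) - mu_law_decode(0x7E)
```

The growing step size is the point of the format: small signals are resolved finely and large signals coarsely. The largest value this formula itself produces is 8031 (E = 7, M = 15).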

Coding of Voice Samples • In many types of phone connections the voice signal is purposely attenuated by a few decibels to make things work better. • Given a μ-law PCM byte, a digital attenuator must produce a different PCM byte that represents the original analog signal multiplied by a specified attenuation factor. • One way to build a digital attenuator is shown in the next figure. • Each block in the figure can be built with perhaps a dozen MSI chips, or with a CPLD or FPGA. [Figure: digital attenuator. The 8-bit input byte passes through a μ-law-to-linear decoder (14-bit output), a 14x14 multiplier that applies the 14-bit attenuation factor, and a linear-to-μ-law encoder that produces the 8-bit output byte.]

Coding of Voice Samples • Alternatively, the digital attenuator can be realized with a single inexpensive 8K x 8 ROM. • This ROM can apply any of 32 different attenuation factors to a μ-law input byte: 32 tables of 256 entries each fill its 8,192 words. • The five high-order address bits select a table, and the eight low-order address bits select an entry.
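The contents of such an attenuator ROM could be generated along the following lines. This is a sketch under several assumptions: table t is taken to mean t dB of attenuation (the slides do not list the 32 factors), the encoder simply picks the nearest representable magnitude, and the decode step uses the S/E/M formula given earlier rather than a full telephone-standard codec.

```python
from bisect import bisect_left

# The 128 positive magnitudes, indexed by the 7-bit (E, M) code.
MAGNITUDES = [(2 ** e) * (2 * m + 33) - 33 for e in range(8) for m in range(16)]


def decode(byte):
    """Coded byte to linear value."""
    sign = -1 if byte & 0x80 else 1
    return sign * MAGNITUDES[byte & 0x7F]


def encode(value):
    """Linear value to the nearest coded byte, clamping out-of-range input."""
    mag = min(abs(value), MAGNITUDES[-1])
    i = bisect_left(MAGNITUDES, mag)
    if i > 0 and mag - MAGNITUDES[i - 1] <= MAGNITUDES[i] - mag:
        i -= 1
    return (0x80 if value < 0 else 0) | i


def build_attenuator_rom(num_tables=32):
    """Return 32 * 256 = 8192 bytes, addressed as (table << 8) | input_byte."""
    rom = []
    for table in range(num_tables):         # high-order 5 address bits
        factor = 10 ** (-table / 20)        # assumed: `table` dB of attenuation
        for byte in range(256):             # low-order 8 address bits
            rom.append(encode(decode(byte) * factor))
    return rom
```

The decoder, multiplier, and encoder of the block diagram all collapse into the loop body, which runs once at design time instead of once per sample in hardware.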

Digital Conference Circuit • In the analog telephone network, it is easy to make a conference connection between three or more parties: • Just connect the analog phone wires together and you get an analog summing junction. • In the digital network, a digital conference circuit must include a digital adder that produces output samples corresponding to the sums of the input samples. • We have seen how to create binary adders for 8-bit operands. However, binary adders cannot process μ-law PCM bytes directly. • The 8-bit μ-law PCM bytes must be converted to 14-bit linear format, • The signals can then be added, • The resulting signal must then be converted back to 8-bit μ-law PCM, as in the previous example. • Again, one could create a complex adder circuit, or alternatively the same function can be performed by a single 64K x 8 ROM. • The ROM has 16 address inputs, accommodating two 8-bit μ-law PCM operands. • For each pair of operand values, the corresponding ROM address contains the pre-computed 8-bit μ-law PCM sum.
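A sketch of how the 64K x 8 adder ROM might be filled is shown below. It assumes the first operand occupies the upper eight address bits and the second the lower eight, uses nearest-magnitude encoding, and saturates out-of-range sums at the largest code; all helper names are hypothetical, and the coding follows the S/E/M formula from the earlier slide.

```python
from bisect import bisect_left

# The 128 positive magnitudes, indexed by the 7-bit (E, M) code.
LEVELS = [(2 ** e) * (2 * m + 33) - 33 for e in range(8) for m in range(16)]


def decode(byte):
    """Coded byte to linear value."""
    return (-1 if byte & 0x80 else 1) * LEVELS[byte & 0x7F]


def encode(value):
    """Linear value to the nearest coded byte; large sums saturate."""
    mag = min(abs(value), LEVELS[-1])
    i = bisect_left(LEVELS, mag)
    if i > 0 and mag - LEVELS[i - 1] <= LEVELS[i] - mag:
        i -= 1
    return (0x80 if value < 0 else 0) | i


def adder_rom_word(address):
    """Pre-computed sum for one 16-bit address holding two operand bytes."""
    a = address >> 8      # assumed: first operand in the upper 8 address bits
    b = address & 0xFF    # assumed: second operand in the lower 8 address bits
    return encode(decode(a) + decode(b))


def build_adder_rom():
    """Return all 65,536 bytes of the adder ROM."""
    return [adder_rom_word(address) for address in range(1 << 16)]
```

Note how the saturation policy lives entirely in `encode`: handling out-of-range sums costs one line in the generator program and no extra hardware.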

ROM-based Designs (Advantages) • The previous two examples illustrate many advantages of building complex combinational functions with ROMs. • Most complex functions: • Are generally difficult to design with custom digital logic, • But are straightforward to realize with a ROM. • For a moderately complex function, a ROM-based circuit is usually faster than a circuit using multiple SSI/MSI devices and PLDs, and often faster than an FPGA or custom LSI chip in a comparable technology. • The program that generates the ROM contents can easily be structured to handle unusual or undefined cases that would require additional hardware in any other design. For example, the adder function of the previous example can easily handle out-of-range sums. • A ROM’s function is easily modified just by changing the stored pattern, usually without changing any external connections. For example, the PCM attenuator and adder ROMs in the previous examples can be changed to use 8-bit A-law PCM, the standard digital voice coding in Europe. • The prices of ROMs and other structured logic devices are always dropping, making them more economical, and their densities are always increasing, expanding the scope of problems that can be solved with a single chip.

ROM-based Designs (Disadvantages) • For simple to moderately complex functions, a ROM-based circuit: • may cost more, • consume more power, or • run slower than a circuit using a few SSI/MSI devices and PLDs or a small FPGA. • For functions with more than 20 inputs, a ROM-based circuit is impractical because of the limit on available ROM sizes. For example, one wouldn’t build a 16-bit adder in ROM: it would require billions and billions of bits.
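The size limit follows from the exponential growth of a ROM with its input count, which a short calculation makes concrete. The helper below is an illustrative assumption; the 16-bit adder is taken to have two 16-bit operands, no carry input, and a 17-bit result (sum plus carry out).

```python
def rom_bits(num_inputs, num_outputs):
    """Total storage for a ROM: one word of num_outputs bits per input pattern."""
    return (2 ** num_inputs) * num_outputs


conference_adder = rom_bits(16, 8)   # the 64K x 8 ROM: 524,288 bits
adder_16_bit = rom_bits(32, 17)      # 16-bit adder: about 73 billion bits
```

Every extra input doubles the ROM, so going from 16 to 32 address lines multiplies the size by 65,536.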

Complex Programmable Logic Devices • Since their inception years ago, programmable logic devices have been very flexible workhorses of digital design. • As IC technology advanced, there was naturally great interest in creating larger PLD architectures to take advantage of increased chip density. The question is: why didn’t manufacturers just scale the existing architectures? • For example, if DRAM densities increased by a factor of 64 over the last 10 years, why couldn’t manufacturers scale the 16V8 (16 input signals and their complements, and a number of 16-variable product terms) to create a “128V64”? Such a device would have 64 I/O pins, and some number (say 8) of 128-variable product terms for each of its 64 logic macrocells.

Complex Programmable Logic Devices • Could this new “128V64” chip combine the functions of a large collection of 16V8s and offer terrific performance and flexibility, with any input usable in any output function? • The new chip would be very flexible, but it would not have good performance.

How to expand PLD architecture? • Increase the number of inputs and outputs in a conventional PLD? • E.g., 16V8 --> 20V8 --> 22V10. • Why not --> 32V16 --> 128V64? • Problems: • n times the number of inputs and outputs requires n^2 times as much chip area, which is too costly • Logic gets slower as the number of inputs to the AND array increases • Solution: multiple PLDs with a relatively small programmable interconnect. • Less general than a single large PLD, but a software “fitter” can partition a design into smaller PLD blocks.
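The n^2 growth can be illustrated by counting programmable connections in the AND array, used here as a rough proxy for chip area. The function is a sketch; the figure of 8 product terms per macrocell is the "say 8" assumption from the earlier slide, not a property of every device.

```python
def and_array_cells(inputs, macrocells, terms_per_macrocell=8):
    """Count AND-array crosspoints: columns times product-term rows."""
    columns = 2 * inputs                      # each input and its complement
    rows = macrocells * terms_per_macrocell   # one row per product term
    return columns * rows


cells_16v8 = and_array_cells(16, 8)       # 32 columns x 64 rows
cells_128v64 = and_array_cells(128, 64)   # 256 columns x 512 rows
growth = cells_128v64 // cells_16v8       # 8 times the pins, 64 times the array
```

Scaling inputs and outputs by 8 therefore scales the array by 64, and every product term also becomes eight times wider, which is the source of the speed penalty.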

CPLDs vs. FPGAs • CPLD architecture: • Small number of largish PLDs (e.g., “36V18”) on a single chip • Programmable interconnect between PLDs

CPLDs vs. FPGAs • FPGA architecture: • Much larger number of smaller programmable logic blocks • Embedded in a sea of lots and lots of programmable interconnect

CPLD families • Identical individual PLD blocks (Xilinx “FBs”) replicated in different family members. • Different number of PLD blocks • Different number of I/O pins • Many CPLDs have fewer I/O pins than macrocells • “Buried” Macrocells -- provide needed logic terms internally but these outputs are not connected externally. • IC package size dictates # of I/O pins but not the total # of macrocells. • Typical CPLD families have devices with differing resources in the same IC package.

Xilinx XC9500 CPLD Family • The Xilinx XC9500 series is a family of CPLDs with a similar architecture but varying numbers of external input/output pins and internal PLDs (which Xilinx calls function blocks, or FBs). • Each internal PLD has 36 inputs and 18 macrocells and outputs, and might be called a “36V18”. • As shown in the table on the next slide, devices in the family are named according to the number of macrocells they contain. • The smallest has 2 FBs and 36 macrocells, and • The largest has 16 FBs and 288 macrocells.

Xilinx CPLDs • Notice overlap in resource availability in a particular package.

Xilinx CPLDs • Another feature of this family is that a given chip, such as the XC95108, is available in several different packages. This is important not only to accommodate different manufacturing practices but also to provide some choice and potential savings in the number of external I/O pins provided. In most applications, it is not necessary for all internal signals of a state machine or subsystem to be visible to and used by the rest of the system. • Thus, even though the XC95108 has 108 macrocells, the outputs of at most 69 of them can be connected externally in the 84-pin PLCC version of the device. • In fact, many of the 69 I/O pins would typically be used for inputs, in which case even fewer outputs would be visible externally. • Note that the remaining macrocell outputs are still quite usable internally, since they can be hooked up through the CPLD’s programmable interconnect. • Macrocells whose outputs are usable only internally are sometimes called buried macrocells.

Xilinx 9500-family CPLD architecture

Xilinx 9500-family CPLD architecture • I/O pins can be used as input, output or bidirectional pins according to the device’s programming. • Special-purpose pins: • GCK – global clock • GSR – global set/reset • GTS – global three-state controls; one of these signals can be selected in each macrocell to output-enable the corresponding output driver when the macrocell’s output is hooked up to an external I/O pin. • Only 4 FBs are shown in the previous schematic diagram; however, the XC9500 architecture scales to accommodate 16 FBs in the XC95288.

Xilinx 9500-family CPLD architecture • Regardless of the specific family member, each FB programmably receives 36 signals from the switch matrix. • The inputs to the switch matrix are the 18 macrocell outputs from each of the FBs and the external inputs from the I/O pins. • Each FB also has 18 outputs that run “under” the switch matrix as shown in the previous figure connecting to the I/O blocks. • These are merely the output-enable signals for the I/O block output drivers; • They are used when the FB macrocell’s output is hooked up to an external I/O pin.

9500-family function blocks (FBs) architecture • 18 macrocells per FB • 36 inputs per FB (partitioning challenge, but also reason for relatively compact size of FBs) • Macrocell outputs can go to I/O cells or back into switch matrix to be routed to this or other FBs.

9500-family function blocks (FBs) architecture • The basic XC9500 FB programmable AND array has just 90 product terms. • However, it also has product-term allocation. • This mechanism allows a macrocell’s unused product terms to be used by other nearby macrocells in the same FB. • Next slide depicts a logic diagram of the XC9500 product-term allocator and macrocell.

[Figure: 9500-series macrocell (18 per FB). Labeled elements: up to 5 product terms; programmable inversion or XOR product term; set control; reset control; global clock or product-term clock; OE control.]

[Figure: 9500-series product-term allocator. Programmable steering elements share terms from the macrocells above and below.]

9500-series I/O block • Analog controls in addition to logic ones: • Slew-rate control. The rise and fall times of the output signals can be set to be fast or slow. • Pull-up resistor. When enabled, the pull-up resistor prevents output pins from floating as the CPLD is powered up. This is useful if the outputs drive active-low enable inputs of other logic that is not supposed to be enabled during power-up. • User-programmable ground. This feature reallocates an I/O pin to be a ground pin rather than a signal pin. Extra ground pins are needed to handle the high dynamic currents that flow when multiple outputs switch simultaneously.

Switch matrix for XC95108 • A switch matrix could be anything from a limited set of multiplexers to a full crossbar. • Multiplexers: small and fast, but fitting is difficult. • Crossbar: easy fitting, but large and slow. • Finding a complete set of connections through a sparse switch matrix is an NP-complete problem. • For each different CPLD-based design, a set of switch-matrix connections must be found by “fitter” software. • Typically this software, together with the overall CPLD design, is part of the manufacturer’s “secret sauce”.

FPGAs • Historically, FPGA architectures and companies began around the same time as CPLDs. • Xilinx launched the world’s first commercial FPGA in 1985, with the vintage XC2000 device family. • XC3000 and XC4000 families soon followed, setting the stage for today’s Spartan and Virtex device families. • Each evolution of devices brought improvements in density, performance, voltage levels, pin counts, and functionality. • Thus XC4000, Spartan and Spartan/XL devices have the same basic architecture. • FPGAs are closer to “programmable ASICs” -- large emphasis on interconnection routing • Timing is difficult to predict -- multiple hops vs. the fixed delay of a CPLD’s switch matrix. • But more “scalable” to large sizes. • FPGA programmable logic blocks have only a few inputs and 1 or 2 flip-flops, but there are a lot more of them compared to the number of macrocells in a CPLD.

General FPGA chip architecture [Figure: general FPGA chip architecture; the programmable logic block is a.k.a. CLB, “configurable logic block”.]

Xilinx 4000-series FPGAs
