Board Memory

Presentation Transcript


  1. Board Memory EE138 – SJSU

  2. Objectives • Defining the various types of board memory • Discussing memory management of onboard memory • Discussing memory performance

  3. Primary memory • Is typically made up of: • The memory IC • An address bus • A data bus

  4. Memory IC • In general, a memory IC is made up of three units: • the memory array, • the address decoder, and • the data interface.

  5. Common types of ROM used on embedded boards • Mask ROM (MROM): data bits are permanently programmed into the microchip by the manufacturer of the external MROM chip. MROM designs are usually based upon MOS (NMOS, CMOS) or bipolar transistor-based circuitry. • One-Time Programmable ROM (OTP or OTPROM): as its name implies, this type of ROM can be programmed (permanently) only one time, but it can be programmed outside the manufacturing factory using a ROM burner. • Erasable Programmable ROM (EPROM): UV erasable and electrically reprogrammable. • Electrically Erasable Programmable ROM (EEPROM)

  6. Random-Access Memory (RAM) • The two main types of RAM are • static RAM (SRAM), and • dynamic RAM (DRAM)

  7. SRAM • Static RAMs (SRAMs) use flip-flops as their basic storage elements. Example: the 2147H static RAM (data sheet; address row and column decoders).

  8. Address row and column decoders select a memory cell (figure: example address bits 111111 and 000010)

  9. Internal configuration of a single cell of an SRAM memory, all N-channel MOSFETs

  10. SRAM OPERATION
Write ops: both –CS and –WE must be LOW for Din to pass through its three-state buffer A1. Read ops: –CS must be LOW and –WE must be HIGH for Dout to receive data from its three-state buffer A2.

To select this particular memory cell, the A0 to A11 address lines on the 2147H are decoded into a single row and column in such a way as to place a HIGH on the internal memory array line labeled Bit Select. This HIGH turns on Q3 and Q4, which allow the data to enter (or leave) the cross-connection circuit (Q5 and Q6) that holds the data bit.

To store a 1 in the memory cell, Din is made HIGH, and –CS and –WE are made LOW. The HIGH passes through Q3 to the gate of Q6, turning it ON. An ON transistor essentially acts like a short, placing 0 V (0) at the drain of Q6, which also places 0 V at the gate of Q5, turning it OFF. Because Q5 is OFF, its drain voltage will be close to VDD (1). Q1 and Q2 provide the bias for the memory transistors, Q5 and Q6.

Once the data are loaded into the cell, the Bit Select line can go LOW, allowing the 2147H to access another cell. This isolates the memory cell from the outside world by turning Q3 and Q4 OFF. As long as VDD is still applied, the 1 that was loaded into the cell remains there in the form of an ON Q6, which holds Q5 in an OFF state, which in turn holds Q6 in an ON state, similar to the cross-NAND S-R flip-flop. The need to maintain a VDD supply voltage makes this a volatile memory: if VDD is turned off, no one can predict what state Q5 and Q6 will return to when power is reapplied.

To read data out of the memory cell, –CS is made LOW and –WE is made HIGH, enabling the A2 inverting buffer. Then, when this particular cell is selected by a HIGH on Bit Select, the value on Q6 is inverted and passed out to Dout as the original level that was input at Din.
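
A minimal behavioral sketch of this read/write gating in C; the type, function, and signal names (sram_cell, sram_cycle, cs_n, we_n) are illustrative stand-ins, not taken from the 2147H data sheet:

```c
#include <stdbool.h>
#include <stdio.h>

/* Behavioral model of one SRAM cell. The stored flag stands in for the
 * Q5/Q6 cross-coupled pair; bit_select models the Q3/Q4 access gates;
 * cs_n and we_n are the active-LOW -CS and -WE control signals. */
typedef struct { bool stored; } sram_cell;

void sram_cycle(sram_cell *c, bool bit_select, bool cs_n, bool we_n,
                bool din, bool *dout) {
    if (!bit_select) return;          /* cell isolated: Q3/Q4 OFF     */
    if (!cs_n && !we_n)               /* write: both -CS and -WE LOW  */
        c->stored = din;              /* Din passes through buffer A1 */
    else if (!cs_n && we_n)           /* read: -CS LOW, -WE HIGH      */
        *dout = c->stored;            /* Dout driven via buffer A2    */
}

int main(void) {
    sram_cell cell = { false };
    bool dout = false;
    sram_cycle(&cell, true, false, false, true, &dout);  /* store a 1    */
    sram_cycle(&cell, true, false, true, false, &dout);  /* read it back */
    printf("read %d\n", dout);                           /* prints: read 1 */
    return 0;
}
```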

  11. SRAM Timing Diagram

  12. Dynamic RAM (DRAM) • DRAMs use internal capacitors as their basic storage elements. • Additional refresh circuitry is needed to maintain the charge on the internal capacitors (the refresh period was once 2 ms; newer technology has extended it to 64 ms). • Slower to read or write a bit than SRAM. • Less expensive per bit and much higher density. • Most applications requiring large amounts of read/write memory will use DRAM instead of SRAM.

  13. Simplified schematic of a single DRAM memory cell.

  14. DRAM OPERATION
The storage capacitor is isolated from the Din or Dout line until the MOSFET control transistor is turned ON momentarily by a write or read pulse at its gate.

To write a 1 to the capacitor, Din is set to 1 (+VCC) and the DRAM control circuitry provides a pulse to the gate of the selected memory cell. This pulse turns the transistor ON, shorting the drain to the source and placing +VCC on the capacitor (depending on the DRAM family, VCC will be +5 V or +3 V, and in some cases even lower).

To read the data at the cell location, the DRAM control circuitry redirects the drain of the MOSFET to the Dout line and issues a read pulse on its gate. This shorts the transistor, connecting the capacitor directly to Dout.

Capacitors cannot hold their charge forever, so each capacitor has to be recharged on a regular basis; the interval is called the refresh period. The figure on the upper right shows the voltage on a storage capacitor that is initially 0 V and then loaded with a 1: the voltage immediately starts dropping and requires refreshing before its level drops below a recognizable 1 level.
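
A toy model in C of why the refresh period exists; the supply voltage, 1-level threshold, and leakage rate are assumed numbers for illustration, not values from any DRAM data sheet:

```c
#include <stdio.h>

/* One DRAM storage capacitor: its charge leaks over time and must be
 * refreshed before the voltage drops below a recognizable 1 level.
 * All constants here are illustrative assumptions. */
#define VCC         3.3   /* volts written into the cell for a 1      */
#define THRESHOLD   1.5   /* minimum voltage still readable as a 1    */
#define LEAK_PER_MS 0.05  /* assumed leakage in volts per millisecond */

int main(void) {
    double v = VCC;                       /* write a 1 into the cell     */
    for (int ms = 1; ms <= 64; ms++) {
        v -= LEAK_PER_MS;                 /* capacitor slowly discharges */
        if (v < THRESHOLD) {
            printf("refresh needed by %d ms\n", ms);
            v = VCC;                      /* refresh: restore full charge */
        }
    }
    return 0;
}
```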

  15. DRAM

  16. DRAM Read Timing Diagram

  17. DRAM Write Timing Diagram

  18. Mask ROMs • Manufacturers will make a custom mask ROM, a cover placed over the silicon chip during fabrication that determines the permanent logic state to be formed at each memory location. • Most desktop computers use mask ROMs to contain their operating system and to execute procedures that do not change, such as decoding the keyboard and generating characters for the display. The figure below shows how a 1 or 0 is derived when the IC manufacturer alters the source connection of a single MOSFET.

  19. Fusible-Link PROMs • IC manufacturers provide user-programmable ROMs (PROMs). They are available in standard configurations such as 4K x 4, 4K x 8, 8K x 4, and so on. • Initially, every memory cell has a fusible link, keeping its output at 0 (see below). A 0 is changed to a 1 by sending a high-enough current through the fuse to permanently open it, making the output of that cell a 1. • Once the fusible link is burned open, the data are permanently stored in the PROM and can be read over and over again. • The process of programming such a large number of locations is best done by a PROM programmer or microprocessor development system (MDS).

  20. Level-2+ Caches • Level 2+ (level 2 and higher) cache is the level of memory that exists between the CPU and main memory in the memory hierarchy.

  21. 3 Cache Schemes • Data is usually stored in cache in one of three schemes: • direct mapped, • set associative, or • fully associative.

  22. Direct Mapped Cache The diagram above illustrates a blocked, direct-mapped cache for a computer that uses 32-bit data words and 32-bit byte addresses
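
A minimal direct-mapped lookup sketched in C, borrowing the sizes from the worked example later in the deck (16-byte blocks, a 1K cache with 64 lines, 32-bit byte addresses); all names here are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NLINES 64                          /* 1K cache / 16-byte blocks */

typedef struct {
    bool     valid;                        /* has this line been filled? */
    unsigned tag;                          /* bits 31-10 of the address  */
    uint8_t  data[16];                     /* one 16-byte block          */
} cache_line;

static cache_line cache[NLINES];           /* zero-initialized: all invalid */

bool lookup(uint32_t addr, uint8_t *byte) {
    unsigned offset = addr & 0xF;          /* byte offset: bits 3-0   */
    unsigned index  = (addr >> 4) & 0x3F;  /* cache index: bits 9-4   */
    unsigned tag    = addr >> 10;          /* tag:         bits 31-10 */
    cache_line *line = &cache[index];
    if (line->valid && line->tag == tag) { /* hit: valid and tags match */
        *byte = line->data[offset];
        return true;
    }
    return false;                          /* miss: would fetch from memory */
}

int main(void) {
    uint8_t b;
    printf("hit=%d\n", lookup(0x1234abcd, &b)); /* cold cache: prints hit=0 */
    return 0;
}
```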

  23. Cache – 2-Way Set Associative

  24. Cache – N-Way Set Associative

  25. Cache – Fully Associative

  26. Cache – Fully Associative

  27. Cache – Fully Associative

  28. Cache Swapping Schemes • Optimal, using future reference time, swapping out pages that won't be used in the near future. • Least recently used (LRU), which swaps out the pages that were used least recently. • FIFO (first in, first out), which, as its name implies, swaps out the oldest pages, regardless of how often they are accessed. While a simpler algorithm than LRU, FIFO is much less efficient. • Not recently used (NRU), which swaps out pages that were not used within a certain time period. • Second chance, a FIFO scheme with a reference bit: a page whose bit is "0" is swapped out (the reference bit is set to "1" when an access occurs and reset to "0" after the check). • Clock paging, in which pages are replaced in clock order (according to how long they have been in memory) if they haven't been accessed (the reference bit is set to "1" when an access occurs and reset to "0" after the check); see the sketch below.
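
A minimal sketch of the clock scheme in C, assuming a made-up four-frame system; second chance works the same way but over a FIFO queue rather than a circular scan:

```c
#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static int  page[NFRAMES] = { 10, 11, 12, 13 };           /* resident pages */
static bool ref[NFRAMES]  = { true, false, true, false }; /* reference bits */
static int  hand = 0;                                     /* the clock hand */

/* Scan frames in circular (clock) order: a page with reference bit 1
 * gets its bit cleared and is passed over; the first page found with
 * bit 0 is the one swapped out. */
int evict(void) {
    for (;;) {
        if (!ref[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        ref[hand] = false;                /* accessed recently: clear bit */
        hand = (hand + 1) % NFRAMES;
    }
}

int main(void) {
    int f = evict();
    printf("evict page %d from frame %d\n", page[f], f); /* page 11, frame 1 */
    return 0;
}
```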

  29. Memory Management of External Memory - MEMC • A memory controller (MEMC) is used to • implement and provide glueless interfaces to the different types of memory in the system, such as SRAM and DRAM, and to • synchronize access to memory and verify the integrity of the data being transferred. • Memory controllers access memory directly, using the memory's own physical two-dimensional addresses. • The controller manages requests from the master processor, accesses the appropriate banks, awaits feedback, and returns that feedback to the master processor. • In cases where a memory controller mainly manages one type of memory, it may be referred to by that memory's name: DRAM controller, cache controller, and so forth.

  30. MEMC Sample Circuits

  31. Memory management units - MMU • Memory management units (MMUs) mainly give a system the flexibility of a larger virtual (abstract) memory space within a smaller actual physical memory. • The MMU, shown in Figure 5-20, can exist outside the master processor and is used to translate logical (virtual) addresses into physical addresses (memory mapping), as well as to handle memory security (memory protection), control cache, handle bus arbitration between the CPU and memory, and generate appropriate exceptions.
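
A minimal C sketch of the MMU's memory-mapping step through a single-level page table, assuming 1K pages (matching the worked example later in the deck) and a made-up table; protection checks and exception generation are elided:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 10                   /* 1K pages: 10-bit byte offset */
#define NPAGES    8                    /* tiny illustrative page table */

/* Made-up virtual-page -> physical-frame mapping. */
static uint32_t page_table[NPAGES] = { 5, 2, 7, 0, 1, 3, 6, 4 };

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = (vaddr >> PAGE_BITS) % NPAGES; /* virtual page no. */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    uint32_t frame  = page_table[vpn];               /* physical frame   */
    return (frame << PAGE_BITS) | offset;            /* offset unchanged */
}

int main(void) {
    unsigned va = 0x805;                             /* page 2, offset 5 */
    printf("0x%x -> 0x%x\n", va, (unsigned)translate(va)); /* 0x805 -> 0x1c05 */
    return 0;
}
```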

  32. Virtual Memory http://www.cs.nyu.edu/~gottlieb/courses/2001-02-fall/arch/lectures/lecture-22.html

  33. TLB – Translation Look Aside Buffer http://www.cs.nyu.edu/~gottlieb/courses/2001-02-fall/arch/lectures/lecture-22.html

  34. Cache and Memory Working Together http://www.cs.nyu.edu/~gottlieb/courses/2001-02-fall/arch/lectures/lecture-22.html

  35. Cache and Memory Working Together http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html Let's try to put together some examples of simultaneous TLB and L1 cache lookups. For example, let's look at the simplest case: we'll make both the TLB and the L1 cache direct-mapped. Let's assume the following specifications:

  36. Cache and Memory Working Together http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html • Cache • The block size is 16 bytes, so the byte offset field is 4 bits. • The total size of the cache is 1K, so there are 1K/16 = 64 blocks. Since it's direct mapped, we've got a six-bit index field. • We've used up 10 bits; since the physical address is 32 bits, that tells us we've got a 22-bit tag. • So this looks like the following: • Tag: bits 31-10 • Cache index: bits 9-4 • Byte offset: bits 3-0 • Virtual Memory • The page size is 1K, so the byte offset field is 10 bits. • That leaves us a 22-bit virtual page number. • Virtual page number: bits 31-10 • Byte offset: bits 9-0 • TLB • We get the field breakdown for the TLB by further dividing the VPN. • Since we've got 64 translations and a direct-mapped organization, the 22-bit VPN gets divided into: • TLB tag: bits 31-16 • TLB index: bits 15-10

  37. Cache and Memory Working Together http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html • Let's put specific numbers on this: we'll try to read one byte from virtual address 0x1234abcd. • The byte offset field contains d (bits 3-0 of the address). • The cache index field contains 3c (bits 9-4 of the address). • The TLB index field contains 2a (bits 15-10 of the address). • The TLB tag field contains 1234 (bits 31-16 of the address). • So now we go through the following steps: • 1. We look up translation 2a in the TLB and cache line 3c in the cache. • 2. We obtain the TLB tag from the TLB and the cache tag from the cache. • 3. We ask whether: • The TLB entry is valid. • The TLB tag is 1234 (that's the TLB tag from our virtual address). • We have permission to perform the requested access. • The cache entry is valid. • The cache tag from the cache entry matches the cache tag from the TLB entry. • 4. If the answer to all of the questions in step 3 was "yes", we've got both a valid translation and a cache hit. We can either obtain our data from the cache or write our value to the cache. A short check of this field breakdown follows below.
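
A short C check of the field breakdown above for virtual address 0x1234abcd:

```c
#include <stdio.h>

int main(void) {
    unsigned va = 0x1234abcd;
    printf("byte offset = %x\n", va & 0xF);          /* bits 3-0:   d    */
    printf("cache index = %x\n", (va >> 4) & 0x3F);  /* bits 9-4:   3c   */
    printf("TLB index   = %x\n", (va >> 10) & 0x3F); /* bits 15-10: 2a   */
    printf("TLB tag     = %x\n", va >> 16);          /* bits 31-16: 1234 */
    return 0;
}
```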

  38. Transfers Between Cache and Memory http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html • We have two competing requirements: • we'd like to bring an entire cache line in from memory in one transfer (for bandwidth), but • we want to have as few data lines as possible (for cost). • There are really three feasible solutions here: • The fastest (but most expensive) approach is to use a memory bus that's as wide as a cache line. Any time you have a miss, you can do a single memory transfer. • The cheapest (but slowest) approach is to use a memory bus that's narrower than a cache line; then, on a miss, it takes several memory transfers to bring the whole line in. • The third approach is a compromise between the first two: use the narrower bus from the second approach, but find a way to overlap the memory accesses. The traditional way to implement this approach was to have several distinct memory modules: you'd start a read from each of them in turn, and the data would arrive from them on consecutive cycles. The current solution is to use fast page DRAM or synchronous DRAM. With both of these technologies, we can transfer from the internal DRAM cells (comparatively slow) into some substantially faster static memory on the memory chip, and then transfer the data from that static memory much more quickly than we could from DRAM. PC100 and PC133 SDRAM use four transfers of 64 bits each to fill a cache line on a system with a 32-byte cache line.
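
The burst arithmetic from the last sentence, as a short C calculation (the 32-byte line and 64-bit bus come from the PC100/PC133 example above):

```c
#include <stdio.h>

int main(void) {
    int line_bytes = 32;           /* cache line size in bytes      */
    int bus_bits   = 64;           /* memory data bus width in bits */
    int transfers  = line_bytes / (bus_bits / 8);
    printf("%d transfers per line fill\n", transfers); /* prints 4 */
    return 0;
}
```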

  39. DMA

  40. Board Memory and Performance Solutions for improving the bandwidth of main memory include: • Integrating a Harvard-based architecture, with separate instruction and data memory buffers and ports • for systems that expect to perform a high number of memory accesses and computations on a large amount of data. • Using DRAMs, such as DRDRAM and SLDRAM, that integrate bus signals into one line, • to decrease the time it takes to arbitrate the memory bus to access memory. • Using more memory interface connections (pins), • increasing transfer bandwidth. • Using a higher signaling rate on memory interface connections (pins). • Implementing a memory hierarchy with multiple levels of cache, • which has faster memory access times than those of other types of memory.

  41. Improving Memory Performance • Introducing cache, which means fewer of the slower DRAM accesses and a decrease in the average main memory access time; a non-blocking cache will especially decrease cache miss penalties. • average total memory access time = (cache hit time + (cache miss rate * cache miss penalty)) + (% cache misses * average main memory access time) • where (cache hit time + (cache miss rate * cache miss penalty)) = average cache access time (a worked example follows below). • Reducing the cache miss rate by increasing cache block sizes or implementing pre-fetching (hardware or software), a technique by which data and/or instructions theoretically needed in the future are transferred from main memory and stored in cache. • Implementing pipelining, the process of breaking down the various functions associated with accessing memory into steps and overlapping some of those steps. While pipelining doesn't help latency (the time it takes to execute one instruction), it does increase throughput, by decreasing the time it takes for cache writes, for example, and thus reducing cache write "hit" times. The pipeline rate is limited only by its slowest pipeline stage. • Using several smaller multi-level caches rather than one big cache, since smaller caches reduce the miss penalty and average access (hit) time, whereas a larger cache has a longer cycle time and lengthens the pipe stages of an implemented pipeline. • Integrating main memory onto the master processor.
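
A worked example of the access-time formula in C; the hit time, miss rate, miss penalty, and main-memory access time are assumed numbers chosen for illustration, not figures from the text:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;   /* ns per cache hit                          */
    double miss_rate    = 0.05;  /* fraction of cache accesses that miss      */
    double miss_penalty = 20.0;  /* ns of extra cost per cache miss           */
    double pct_misses   = 0.05;  /* fraction of accesses going to main memory */
    double mem_time     = 60.0;  /* ns per average main memory access         */

    double avg_cache = hit_time + miss_rate * miss_penalty;
    double avg_total = avg_cache + pct_misses * mem_time;
    printf("average cache access time = %.1f ns\n", avg_cache); /* 2.0 ns */
    printf("average total access time = %.1f ns\n", avg_total); /* 5.0 ns */
    return 0;
}
```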

  42. Summary • This chapter introduced some of the basic hardware concepts involving memory that are typically found on an embedded board: • the different types of board memory, and • the basic electrical elements used in building them. • Certain types of memory can be, or are only, located outside the master processor on the embedded board itself: • certain types of ROM and RAM (summarized in Table 5-1), as well as auxiliary memory. • The chapter ends with an introduction to some of the key performance issues that revolve around board memory.

  43. REFERENCES • Noergaard, Tammy. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Second Edition. ISBN 978-0-12-382196-6. • Kleitz, William. Digital Electronics: A Practical Approach with VHDL, Ninth Edition. ISBN-13: 978-0-13-254303-3.
